Rewrite native thread-local storage #116123

joboet · 2023-09-24T17:57:43Z

The current native thread-local storage implementation has become quite messy, uses nondescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that with a new implementation that also allows more layout optimizations and may improve performance by eliminating unnecessary TLS accesses.
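Here is a minimal sketch of what "state and data in one place" can look like. It rests on some assumptions: the names `State`, `LazyStorage`, `get_or_init`, `destroy`, `NAME` and `name_len` are hypothetical, and this is not the code in the PR. A lazily initialized value and its initialization state share one enum, so the fast path is a single read of a single slot. Placed in a thread-local, that corresponds to one TLS access rather than separate accesses for the state and the data.

```rust
use std::cell::UnsafeCell;

// Hypothetical names for illustration; this is not the PR's actual code.
// The initialization state and the value share one enum, so a single slot
// answers both "is it initialized?" and "where is the value?".
enum State<T> {
    Initial,
    Alive(T),
    Destroyed,
}

pub struct LazyStorage<T> {
    state: UnsafeCell<State<T>>,
}

impl<T> LazyStorage<T> {
    pub const fn new() -> Self {
        LazyStorage { state: UnsafeCell::new(State::Initial) }
    }

    /// Returns a pointer to the value, running `init` on first access,
    /// or `None` once the value has been destroyed.
    pub fn get_or_init(&self, init: impl FnOnce() -> T) -> Option<*const T> {
        // Fast path: a single read of the combined slot.
        // SAFETY: `UnsafeCell` makes this type !Sync, so only the owning
        // thread can reach it; this shared borrow ends before any write.
        match unsafe { &*self.state.get() } {
            State::Alive(v) => return Some(v as *const T),
            State::Destroyed => return None,
            State::Initial => {}
        }
        // Slow path: run the initializer before touching the slot again.
        let value = init();
        // SAFETY: no references into the slot are live at this point.
        let slot = unsafe { &mut *self.state.get() };
        if matches!(slot, State::Initial) {
            *slot = State::Alive(value);
        }
        // Otherwise a re-entrant `init` already filled the slot; that value
        // is kept and the freshly built one is dropped when this returns.
        match slot {
            State::Alive(v) => Some(v as *const T),
            _ => None,
        }
    }

    /// Drops the value and marks the slot as destroyed, as a TLS
    /// destructor would do at thread exit.
    ///
    /// # Safety
    /// No pointer obtained from `get_or_init` may be used afterwards.
    pub unsafe fn destroy(&self) {
        let old = unsafe { std::mem::replace(&mut *self.state.get(), State::Destroyed) };
        drop(old);
    }
}

thread_local! {
    // A `const` initializer works here because `LazyStorage::new` is a const fn.
    static NAME: LazyStorage<String> = const { LazyStorage::new() };
}

/// Reads the lazily initialized thread-local string and returns its length.
pub fn name_len() -> usize {
    NAME.with(|storage| {
        let ptr = storage
            .get_or_init(|| String::from("main"))
            .expect("not destroyed yet");
        // SAFETY: the value stays alive until this thread's TLS destructors run.
        unsafe { (*ptr).len() }
    })
}
```

Keeping the state inside the same enum also lets the compiler place the discriminant in unused bit patterns (niches) of `T` when `T` has them. That is one kind of layout optimization this arrangement can enable.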

This does not change the recursive initialization behaviour I described in this comment, so it should be a library-only change. Changing that behaviour should be quite easy now, however.

r? @m-ou-se
@rustbot label +T-libs

library/std/src/sys/common/thread_local/fast_local.rs

joboet · 2023-11-24T12:34:34Z

@rustbot author
until I fix CI.

m-rph · 2023-12-28T18:01:44Z

library/std/src/sys/common/thread_local/fast_local.rs

+        fn __init() -> $t {
+            $init
+        }


I have been implementing a clippy lint that suggests making thread-local initializers const where possible, and it relies on this structure existing and returning exactly $init. As long as this doesn't change, the lint should keep working.

rust-lang/rust-clippy#12015
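For context on the lint m-rph mentions, the user-facing difference is between a lazily initialized `thread_local!` declaration and a `const`-initialized one. The sketch below uses only the stable `thread_local!` syntax. The macro's internal `__init` function is not shown, and `bump` and `greeting_len` are illustrative helpers, not part of std or clippy.

```rust
use std::cell::{Cell, RefCell};
use std::thread::LocalKey;

thread_local! {
    // Lazy form: the initializer runs on the first access from each thread,
    // so the storage also has to track whether that has happened yet.
    static LAZY_COUNTER: Cell<u32> = Cell::new(0);

    // `const` form: `Cell::new(0)` is evaluated at compile time, so no
    // lazy-initialization state is needed. This is the kind of rewrite a
    // "can be made const" lint would suggest for the declaration above.
    static CONST_COUNTER: Cell<u32> = const { Cell::new(0) };

    // Cannot be made `const`: `String::from` is not a const fn.
    static GREETING: RefCell<String> = RefCell::new(String::from("hello"));
}

/// Increments a per-thread counter and returns its new value.
pub fn bump(key: &'static LocalKey<Cell<u32>>) -> u32 {
    key.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

/// Returns the length of this thread's greeting.
pub fn greeting_len() -> usize {
    GREETING.with(|g| g.borrow().len())
}
```

Both counter declarations behave the same from the caller's point of view. Only the initialization strategy, and therefore the bookkeeping the macro expansion needs, differs.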

bors · 2024-01-13T16:39:05Z

☔ The latest upstream changes (presumably #117285) made this pull request unmergeable. Please resolve the merge conflicts.

Dylan-DPC · 2024-02-15T15:09:47Z

@joboet any updates on this? thanks

joboet · 2024-02-15T15:15:06Z

@rustbot label +S-blocked

I want to do some other cleanups before this can be merged.

joboet · 2024-03-12T13:06:45Z

@rustbot ready

The other improvements proposed by #110897 become much easier once this is merged, so I'm prioritising this one again.

joboet · 2024-03-16T11:22:05Z

I'm curious as to whether this affects performance. This should reduce the number of TLS accesses, after all.
@bors try @rust-timer queue

joboet · 2024-03-18T18:06:02Z

I've split up the implementation into different types for const/lazy initialization, so we are no longer at the mercy of the optimizer to recognize a discriminant update to the state enum without copying the TLS value around. This is probably better from a readability standpoint as well, even though it duplicates some very similar code.
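The const/lazy split can be sketched as follows. The names here are hypothetical (`EagerState`, `EagerStorage`, `get`, `register_dtor`, `EAGER`, `read_eager`), not the PR's real types.

A const-initialized value is valid from the start, so only destructor registration needs tracking. Suppose it shared a single enum such as `Initial(T)`/`Alive(T)`/`Destroyed`. Then moving from `Initial` to `Alive` would be a discriminant update, and the optimizer would have to recognize that `T` need not be copied. With a separate state field stored next to the value in the same slot, only the small flag changes.

```rust
use std::cell::{Cell, UnsafeCell};

// Hypothetical names; a sketch of the idea, not the PR's code.
#[allow(dead_code)] // `Destroyed` is never set in this sketch
#[derive(Clone, Copy)]
enum EagerState {
    Initial,   // value valid, destructor not registered yet
    Alive,     // destructor registered
    Destroyed, // destructor has run
}

pub struct EagerStorage<T> {
    // The flag lives next to the value, in the same storage slot.
    state: Cell<EagerState>,
    val: UnsafeCell<T>,
}

impl<T> EagerStorage<T> {
    pub const fn new(val: T) -> Self {
        EagerStorage {
            state: Cell::new(EagerState::Initial),
            val: UnsafeCell::new(val),
        }
    }

    /// Returns a pointer to the value. `register_dtor` is a hypothetical
    /// stand-in for platform destructor registration; it runs at most once.
    pub fn get(&self, register_dtor: impl FnOnce()) -> Option<*const T> {
        match self.state.get() {
            EagerState::Alive => Some(self.val.get() as *const T),
            EagerState::Destroyed => None,
            EagerState::Initial => {
                register_dtor();
                // Only the small state field is written; the value stays put.
                self.state.set(EagerState::Alive);
                Some(self.val.get() as *const T)
            }
        }
    }
}

thread_local! {
    static EAGER: EagerStorage<u64> = const { EagerStorage::new(42) };
}

/// Reads the const-initialized thread-local value.
pub fn read_eager() -> u64 {
    EAGER.with(|s| {
        let ptr = s.get(|| {}).expect("not destroyed yet");
        // SAFETY: the value lives until this thread's TLS destructors run.
        unsafe { *ptr }
    })
}
```

The cost of this design is the code duplication mentioned above. A lazy type and an eager type carry similar match logic, but each can keep its hot path free of value moves.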

Let's try perf again.
@bors try @rust-timer queue

joboet · 2024-03-18T20:52:12Z

@bors try @rust-timer queue

bors · 2024-03-18T20:53:23Z

⌛ Trying commit 6bc1647 with merge 765aea1...

bors · 2024-03-18T22:26:43Z

☀️ Try build successful - checks-actions
Build commit: 765aea1 (765aea13f10fbcaea0da13cb3ad88f9d65be842d)

rust-timer · 2024-03-18T23:40:18Z

Finished benchmarking commit (765aea1): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.4%]	4
Regressions ❌ (secondary)	1.3%	[1.3%, 1.3%]	3
Improvements ✅ (primary)	-0.8%	[-2.2%, -0.3%]	6
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.4%	[-2.2%, 0.4%]	10

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	6.4%	[0.8%, 18.7%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-4.2%	[-9.3%, -0.2%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.8%	[-9.3%, 18.7%]	7

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.7%	[1.6%, 1.7%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.5%	[-2.1%, -0.9%]	2
Improvements ✅ (secondary)	-6.1%	[-6.1%, -6.1%]	1
All ❌✅ (primary)	0.1%	[-2.1%, 1.7%]	4

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.0%, 0.6%]	33
Regressions ❌ (secondary)	0.1%	[0.0%, 1.4%]	38
Improvements ✅ (primary)	-0.2%	[-0.5%, -0.0%]	22
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-0.5%, 0.6%]	55

Bootstrap: 667.419s -> 668.667s (0.19%)
Artifact size: 312.79 MiB -> 312.84 MiB (0.02%)

joboet · 2024-03-20T12:03:01Z

That's definitely better. Still, there are more regressions than I'd hoped. My guess is that LLVM misses some optimizations on const-initialized thread-locals because the state is now in the same global as the data. I don't know how best to resolve that in the general case, as that separation is the cause of the improvements. In the specific case of proc-macros, I think we can optimize the TLS usage a bit, but that's for another PR.

Given that the average performance has improved, I'd say that the regressions are worth it. What do you think, @m-ou-se?
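The "average" here refers to the primary "All" rows of the perf report. Those rows appear consistent with a count-weighted arithmetic mean of the regression and improvement rows. For instruction count, (4 × 0.3% + 6 × (−0.8%)) / 10 = −0.36%, which rounds to the reported −0.4%.

The sketch below repeats that check for each table. Its assumption is that rustc-perf summarizes this way, which the report itself does not state. The inputs are the already-rounded means from the report, so small gaps are expected. `combined_mean` and `PRIMARY_ROWS` are illustrative names.

```rust
/// Count-weighted arithmetic mean of a regressions row and an improvements
/// row, each given as (mean change in %, number of benchmarks).
pub fn combined_mean(regressions: (f64, u32), improvements: (f64, u32)) -> f64 {
    let (reg_mean, reg_n) = regressions;
    let (imp_mean, imp_n) = improvements;
    (reg_mean * reg_n as f64 + imp_mean * imp_n as f64) / (reg_n + imp_n) as f64
}

/// Primary rows copied from the perf report:
/// (metric, regressions, improvements, reported "All" mean in %).
pub const PRIMARY_ROWS: [(&str, (f64, u32), (f64, u32), f64); 4] = [
    ("instruction count", (0.3, 4), (-0.8, 6), -0.4),
    ("max RSS", (6.4, 4), (-4.2, 3), 1.8),
    ("cycles", (1.7, 2), (-1.5, 2), 0.1),
    ("binary size", (0.3, 33), (-0.2, 22), 0.1),
];
```

Only the instruction-count row determines the overall result. The other metrics are described in the report as less reliable.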

saethlin · 2024-03-20T13:10:58Z

The cachegrind diffs for the regressions are still dominated by __tls_get_addr so I think it's wrong to blame missed optimizations.

joboet · 2024-03-20T13:54:27Z

> The cachegrind diffs for the regressions are still dominated by __tls_get_addr so I think it's wrong to blame missed optimizations.

No, it is not wrong.

For const-initialized TLS variables, which are what the regressing benchmarks use, the old implementation performs at least two TLS references on the fast path (which is what counts here): one for the state and one for the data. This PR reduces that to a single reference.

I'll interpret an increase in instructions for __tls_get_addr as an increase in the number of calls to that function. Since the number of TLS references in the rustc-generated IR has decreased, this can only mean that either

- the number of TLS references that LLVM was able to remove in the old implementation is more than twice that of the new implementation, or
- LLVM generates more calls to __tls_get_addr for the new implementation than necessary. This can happen because LLVM sees thread-local references as references to globals, which it doesn't like to cache. This was supposedly fixed for Linux, but given how the IR works, there is always a risk of introducing more calls.
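A user-level illustration of the caching question follows. It is not a change to std, and `HITS`, `count_per_iteration` and `count_hoisted` are hypothetical names. The two functions below do the same work, but the second performs the `LocalKey::with` lookup once around the loop instead of once per iteration. Whether the first version actually emits more __tls_get_addr calls depends on the target, the TLS model and what LLVM manages to hoist, which is exactly the uncertainty discussed here.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical counter used only for this illustration.
    static HITS: Cell<u64> = const { Cell::new(0) };
}

/// Calls `with` once per iteration: each call goes through the
/// thread-local lookup unless the optimizer hoists it out of the loop.
pub fn count_per_iteration(n: u64) -> u64 {
    for _ in 0..n {
        HITS.with(|h| h.set(h.get() + 1));
    }
    HITS.with(|h| h.get())
}

/// Calls `with` once around the loop: the thread-local is looked up once
/// and the loop works on the `&Cell<u64>` it hands out.
pub fn count_hoisted(n: u64) -> u64 {
    HITS.with(|h| {
        for _ in 0..n {
            h.set(h.get() + 1);
        }
        h.get()
    })
}
```

Comparing the generated assembly of the two functions for a given target is one way to see whether the lookup was hoisted in the first case.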

In either of those cases, the problem is bad optimization.

bors · 2024-04-06T17:23:33Z

☔ The latest upstream changes (presumably #123339) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2024-04-27T09:45:36Z

☔ The latest upstream changes (presumably #124428) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot assigned m-ou-se Sep 24, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 24, 2023

joboet mentioned this pull request Sep 25, 2023

std::thread::local internals allow race conditions in safe but unstable code. #43733

Closed

RustyYato suggested changes Sep 27, 2023

library/std/src/sys/common/thread_local/fast_local.rs (outdated, resolved)

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 24, 2023

joboet marked this pull request as draft November 24, 2023 12:34

joboet mentioned this pull request Dec 28, 2023

New Lint: [thread_local_initializer_can_be_made_const] rust-lang/rust-clippy#12026

Merged

m-rph reviewed Dec 28, 2023

rustbot added the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Feb 15, 2024

joboet force-pushed the rewrite_native_tls branch from 08bc1c6 to 7b69733 on March 12, 2024 13:04

rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Mar 12, 2024

joboet marked this pull request as ready for review March 12, 2024 13:04

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 16, 2024

joboet force-pushed the rewrite_native_tls branch from a087a37 to 6c3b701 on March 18, 2024 18:02

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 18, 2024

joboet force-pushed the rewrite_native_tls branch from 6c3b701 to d668fd7 on March 18, 2024 19:23

joboet force-pushed the rewrite_native_tls branch from d668fd7 to 6bc1647 on March 18, 2024 19:47

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 18, 2024

joboet force-pushed the rewrite_native_tls branch from 6bc1647 to 507d6c9 on March 28, 2024 14:08

joboet mentioned this pull request Apr 1, 2024

Tracking issue for cleaning up std's thread_local implementation details #110897

Open

22 tasks

joboet added 2 commits April 8, 2024 12:25

std: rewrite native thread-local storage

4727b6f

delete UI tests that only check internal implementation details of thread-locals

911ead7

joboet force-pushed the rewrite_native_tls branch from 507d6c9 to 911ead7 on April 8, 2024 10:26

joboet mentioned this pull request Apr 26, 2024

thread_local: be excruciatingly explicit in dtor code #124387

Merged
