Rewrite native thread-local storage #116123
@rustbot author
fn __init() -> $t {
    $init
}
I have been impl'ing a lint in clippy to suggest `const` if possible, and it relies on this structure existing and returning exactly `$init`. As long as this doesn't change, the lint should keep working.
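For readers unfamiliar with the two forms the lint distinguishes, here is a minimal sketch of a lazily initialized and a `const`-initialized `thread_local!`. The names `LAZY_COUNTER`, `CONST_COUNTER` and `bump_both` are made up for illustration, and the sketch says nothing about how the lint itself inspects the macro expansion.

```rust
use std::cell::Cell;

thread_local! {
    // Lazy form: the initializer expression runs on first access in each thread.
    static LAZY_COUNTER: Cell<u32> = Cell::new(0);

    // Const form: the initializer is a constant expression wrapped in `const { .. }`,
    // so the value can be stored directly without a lazy-initialization step.
    static CONST_COUNTER: Cell<u32> = const { Cell::new(0) };
}

/// Increments both counters for the current thread and returns the new values.
fn bump_both() -> (u32, u32) {
    let lazy = LAZY_COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    });
    let konst = CONST_COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    });
    (lazy, konst)
}
```

Both forms behave the same from the caller's side: each thread sees its own counters, starting from the initializer value.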
☔ The latest upstream changes (presumably #117285) made this pull request unmergeable. Please resolve the merge conflicts.
@joboet any updates on this? thanks
@rustbot label +S-blocked
I want to do some other cleanups before this can be merged.
08bc1c6 to 7b69733
I'm curious as to whether this affects performance. This should reduce the number of TLS accesses, after all.
a087a37 to 6c3b701
I've split up the implementation into different types for Let's try perf again.
Rewrite native thread-local storage
(part of rust-lang#110897)
The current native thread-local storage implementation has become quite messy, uses undescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that by using a new implementation that also allows more layout optimizations and potentially increases performance by eliminating unnecessary TLS accesses.
This does not change the recursive initialization behaviour I described in [this comment](rust-lang#110897 (comment)), so it should be a library-only change. Changing that behaviour should be quite easy now, however.
r? `@m-ou-se`
`@rustbot` label +T-libs
6c3b701 to d668fd7
d668fd7 to 6bc1647
@bors try @rust-timer queue
☀️ Try build successful - checks-actions
Finished benchmarking commit (765aea1): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 667.419s -> 668.667s (0.19%)
That's definitely better. Still, some more regressions than I'd hoped. My guess is that LLVM misses some optimizations on Given that the average performance has improved, I'd say that the regressions are worth it. What do you think, @m-ou-se?
The cachegrind diffs for the regressions are still dominated by
No, it is not wrong. For I'll interpret an increase in instructions for
In either of those cases, the problem is bad optimization.
6bc1647 to 507d6c9
☔ The latest upstream changes (presumably #123339) made this pull request unmergeable. Please resolve the merge conflicts.
507d6c9 to 911ead7
☔ The latest upstream changes (presumably #124428) made this pull request unmergeable. Please resolve the merge conflicts.
(part of #110897)
The current native thread-local storage implementation has become quite messy, uses undescriptive names and unnecessarily adds code to the macro expansion. This PR tries to fix that by using a new implementation that also allows more layout optimizations and potentially increases performance by eliminating unnecessary TLS accesses.
This does not change the recursive initialization behaviour I described in this comment, so it should be a library-only change. Changing that behaviour should be quite easy now, however.
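One way to picture the "fewer TLS accesses" point: if the initialization state and the value live in a single object, one lookup of that object reaches both. The sketch below models this with a plain struct and safe code rather than a real thread-local static. `State`, `LazySlot`, `with` and `destroy` are hypothetical names chosen for illustration and are not taken from this PR's implementation.

```rust
use std::cell::RefCell;

/// Hypothetical three-state slot: state and value share one location.
enum State<T> {
    Initial,
    Alive(T),
    Destroyed,
}

/// Illustrative stand-in for a lazily initialized thread-local slot.
struct LazySlot<T> {
    state: RefCell<State<T>>,
}

impl<T> LazySlot<T> {
    const fn new() -> Self {
        LazySlot { state: RefCell::new(State::Initial) }
    }

    /// Runs `f` on the value, initializing it with `init` on first use.
    /// Returns `None` once the slot has been destroyed.
    fn with<R>(&self, init: impl FnOnce() -> T, f: impl FnOnce(&T) -> R) -> Option<R> {
        let needs_init = matches!(*self.state.borrow(), State::Initial);
        if needs_init {
            // No borrow is held while `init` runs, so it may touch the slot itself.
            let value = init();
            *self.state.borrow_mut() = State::Alive(value);
        }
        let state = self.state.borrow();
        match &*state {
            State::Alive(value) => Some(f(value)),
            _ => None,
        }
    }

    /// Marks the slot as destroyed and drops any stored value.
    fn destroy(&self) {
        // Swap the state first; the old value is dropped after the borrow ends.
        let old = self.state.replace(State::Destroyed);
        drop(old);
    }
}
```

In this model the first call to `with` runs the initializer, later calls reuse the stored value, and calls after `destroy` return `None`, all by inspecting one `state` field.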
r? @m-ou-se
@rustbot label +T-libs