-
Notifications
You must be signed in to change notification settings - Fork 12.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gvn: Promote/propagate const local array #126444
base: master
Are you sure you want to change the base?
Conversation
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri |
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[WIP] gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used PromoteTemps pass. Fix rust-lang#73825 ### Current status - [ ] Waiting for [consensus](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F). r? ghost
This comment has been minimized.
This comment has been minimized.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e26c0b3): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary 3.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -4.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%, secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 671.518s -> 674.73s (0.48%) |
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[WIP] gvn: Promote/propagate const local array Rewriting of rust-lang#125916 which used PromoteTemps pass. Fix rust-lang#73825 ### Current status - [ ] Waiting for [consensus](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Could.20const.20read-only.20arrays.20be.20const.20promoted.3F). r? ghost
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (7e160d4): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary 1.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary 9.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 670.938s -> 673.147s (0.33%) |
This comment has been minimized.
This comment has been minimized.
big arrays make the diff hard to read
How does it affect debug build ? Isn't gvn run only if |
Don't mind max RSS, it's quite noisy. Unless there are regressions across the board, you can mostly ignore it. |
this slips through commmit "shorten array in tests"
@@ -382,7 +396,16 @@ impl<'body, 'tcx> VnState<'body, 'tcx> { | |||
let ty = match kind { | |||
AggregateTy::Array => { | |||
assert!(fields.len() > 0); | |||
Ty::new_array(self.tcx, fields[0].layout.ty, fields.len() as u64) | |||
let field_ty = fields[0].layout.ty; | |||
// Ignore nested array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you write why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just feel that nested arrays are rarer than 1-demensional arrays.
And it is kinda redundant that in MIR output that each Local
is const promoted as a const array.
If nested array are allowed, we would have.
let a = [[b], [c]];
promoted as
_b = const [.. ];
_c = const [.. ];
-_a = const [move _b, moved _c];
+_a = const [[..], [..]];
We could const promote only _a
, but it is more complex to ignore _b and _c.
if ty.layout.size().bytes() <= STACK_THRESHOLD { | ||
return None; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case, should we attempt to fallback to the next branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the fallback is return None
at line L471 ? Maybe I read the code wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the matches!(ty.abi, Abi::Scalar(..) | Abi::ScalarPair(..))
branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But ty.abi
of array is not Abi::Scalar*
though. And with STACK_THRESHOLD, I fallback to default behavior on master branch. Also I committed 4fb498e.
But I don't get what the differences between ecx.intern_with_temp_alloc
and ecx.alloc
, in the case of big array (sizeof(array) > stack_size
). Or should I use rustc_const_eval::const_eval::machine::MemoryKind
?
// Ignore arrays in operands. | ||
// Prevent code bloat that makes | ||
// `_2 = _1` now resolved to `_2 = <evaluated array>`. | ||
let disallow_dup_array = if rvalue.ty(self.local_decls, self.tcx).is_array() | ||
&& let Some(locals) = self.rev_locals.get(value).as_deref() | ||
&& let [first, ..] = locals[..] | ||
{ | ||
first != lhs.local | ||
} else { | ||
false | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this is meant to do. Why is this only checking the first inside rev_local
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without it, duplicate arrays in code will be duplicated in mir output.
Consider this
// code
a = [1,2,3];
b = [1,2,3];
// mir with and without this condition.
_1 = const [1,2,3];
-_2 = _1;
+_2 = const [1,2,3];
In short, we want to run try_as_local
for duplicate arrays. But I don't know how often it is to worry about it.
StorageDead(_12); | ||
StorageLive(_13); | ||
StorageLive(_14); | ||
_14 = [const 1_i32, const 0_i32, const 0_i32]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this one made a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sizeof(_14) is 12 bytes ( < STACK_THRESHOULD
). After codegen, it is just a mov xmm
instruction, which is very cheap.
Co-authored-by: Camille Gillot <gillot.camille@gmail.com>
This comment has been minimized.
This comment has been minimized.
more explicit ir checks
dbg: move to sub functions for easier filtering tracing
This breaks many invariants of the pass. `new_index` may not be newly created. The caching is what `evaluated` does.
This comment has been minimized.
This comment has been minimized.
Could not assign reviewer from: |
Rewriting of #125916 which used
PromoteTemps
pass.This allows promoting constant local arrays as anonymous constants. So that's in codegen for
a local array, rustc outputs
llvm.memcpy
(which is easy for LLVM to optimize) instead of a seriesof
store
on stack (a.k.a in-place initialization). This makes rustc on par with clang on this specific case.See more in #73825 or zulip for more info.
Here is a simple micro benchmark that shows the performance differences between promoting arrays or not.
Prior discussions on zulip.
This patch saves about 600 KB (~0.5%) of
![image](https://private-user-images.githubusercontent.com/15225902/339982907-0e37559c-f5d9-4cdf-b7e3-a2956fd17bc1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAzOTE0MjQsIm5iZiI6MTcyMDM5MTEyNCwicGF0aCI6Ii8xNTIyNTkwMi8zMzk5ODI5MDctMGUzNzU1OWMtZjVkOS00Y2RmLWI3ZTMtYTI5NTZmZDE3YmMxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzA3VDIyMjUyNFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTY0ZDZlNDY3YmUxNTI3MWRjNzgwM2I3NTI3ZmVhNmY5YzVhYTY2ODg1Y2MyOWMzOGQ0NWEzMzMzNzkxYmFjM2UmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.q-KeEqc3dZCdJbff78pW8o-lRNrHl4bYC7SDqOWCDqk)
librustc_driver.so
.Fix #73825
r? cjgillot
Unresolved questions
I think that promoting nested arrays is bloating codegen.
If yes, the test should be updated to make arrays larger than 32 bytes.
Is this concerning thatcall(move _1)
is nowcall(const [array])
?It reverted back to
call(move _1)