perf: change `RawVec` `grow_one` from `#[inline(never)]` to `#[inline]` #146819
I noticed that `grow_one` was not inlined when calling `push`, which is probably why the compiler loses the information about the capacity (fixes rust-lang#82801). Is there a way I can test this to make sure the issue is fixed?

BTW, the original PR that added the attribute (rust-lang#91352) wrote:

> [...] I tried lots of minor variations on this, e.g. different inlining attributes. This was the best one I could find. [...]

I have never contributed to Rust, so I did not know how to test that this actually fixes the problem, but I'm fairly certain it does.

Consider this very basic example (Godbolt, rustc 1.90.0, `-C opt-level=3 -C target-feature=+avx2 -C codegen-units=1`):

```rust
#[no_mangle]
fn extend_offsets(offsets: &[usize]) -> Vec::<usize> {
    let mut intermediate = Vec::<usize>::with_capacity(offsets.len());
    for &offset in offsets {
        intermediate.push(offset)
    }
    intermediate
}
```

Here `grow_one` is not inlined, which prevents the loop from using SIMD.

If, however, we use [`push_within_capacity`](rust-lang#100486):

```rust
#![feature(vec_push_within_capacity)]

#[no_mangle]
fn extend_offsets(offsets: &[usize]) -> Vec::<usize> {
    let mut intermediate = Vec::<usize>::with_capacity(offsets.len());
    for &offset in offsets {
        intermediate.push_within_capacity(offset).unwrap()
    }
    intermediate
}
```

it will use SIMD.
@bors try @rust-timer queue
perf: change `RawVec` `grow_one` from `#[inline(never)]` to `#[inline]`
The way to do this would be a codegen test. I think you could add a new function at You should also add another function that doesn't have the capacity information and passes with (not sure how familiar you are with filecheck but
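As a rough illustration of the codegen-test idea above, here is a minimal sketch. The function names, the `CHECK` patterns, and where the file would live in the test suite are all assumptions made for illustration. In particular, which symbol (if any) survives inlining after this change has not been verified, so the patterns would need adjusting against the real LLVM IR.

```rust
//@ compile-flags: -Copt-level=3
// Hypothetical codegen-test sketch: names and CHECK patterns are assumptions,
// not taken from the existing test suite.

// Capacity is reserved up front, so the growth path should be removable.
// CHECK-LABEL: @push_with_known_capacity
// CHECK-NOT: grow_one
#[unsafe(no_mangle)] // edition-independent spelling of #[no_mangle]
pub fn push_with_known_capacity(offsets: &[usize]) -> Vec<usize> {
    let mut out = Vec::with_capacity(offsets.len());
    for &offset in offsets {
        out.push(offset);
    }
    out
}

// No capacity information is available, so a growth path is expected to remain.
// The pattern below assumes a `grow_one` symbol is still visible in the IR.
// CHECK-LABEL: @push_without_capacity
// CHECK: grow_one
#[unsafe(no_mangle)]
pub fn push_without_capacity(out: &mut Vec<usize>, offset: usize) {
    out.push(offset);
}
```

The directives are ordinary comments to the compiler, so the file still builds and behaves as normal Rust; only FileCheck interprets them.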
@nnethercote it's been forever so I doubt it, but any idea what metrics you were going for here?
Finished benchmarking commit (2b1a9e5): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never

Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage): Results (primary 3.4%, secondary -3.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles: Results (primary 2.6%, secondary 1.9%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size: Results (primary 0.1%, secondary 0.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Bootstrap: 471.07s -> 478.071s (1.49%)
After debugging locally for the last 3 hours: the failure to optimize into SIMD is not because of the
How do you help the optimizer in this case?
I haven't dug into it but you may be able to play with using rust/library/alloc/src/raw_vec/mod.rs Lines 425 to 426 in dd7fda5
assert_unchecked can make compile times worse.
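For readers unfamiliar with this kind of optimizer hint, here is a minimal user-level sketch of `std::hint::assert_unchecked` applied to the `extend_offsets` example from the PR description. Placing the hint in user code rather than inside `RawVec` is an assumption made for illustration, and it has not been verified that this alone makes the loop vectorize.

```rust
use std::hint::assert_unchecked;

/// Copies `offsets` into a new `Vec`, telling the optimizer before each push
/// that spare capacity remains (illustrative only, not the library change).
pub fn extend_offsets(offsets: &[usize]) -> Vec<usize> {
    let mut intermediate = Vec::<usize>::with_capacity(offsets.len());
    for &offset in offsets {
        // SAFETY: `with_capacity(n)` guarantees `capacity() >= n`, and at this
        // point fewer than `n` elements have been pushed, so `len < capacity`.
        unsafe { assert_unchecked(intermediate.len() < intermediate.capacity()) };
        intermediate.push(offset);
    }
    intermediate
}
```

If the asserted condition were ever false, the behavior would be undefined, so such a hint is only appropriate where the invariant is easy to prove, as it is here.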
The usual process here involves playing with the implementations and looking at the result of If you're getting really deep into things, godbolt also has "add new"->"opt pipeline" for looking at optimizations done by each LLVM pass one at a time. Can be helpful for looking at things more incrementally, rather than just the coarse diff between
So I found the problem. LLVM does not eliminate dead code in some cases:

```rust
const COUNT: usize = 10000;

#[no_mangle]
pub fn run() -> usize {
    let mut inter = 0;
    let mut my_cap = COUNT;
    for _ in 0..COUNT {
        let len = inter;
        if len == my_cap {
            something(len);
        }
        inter += 1;
    }
    return inter;
}

#[inline(never)]
fn something(len: usize) {
    std::hint::black_box(len as i128);
}

pub fn main() {
    std::hint::black_box(run());
}
```

it will eliminate:

```rust
if len == my_cap {
    something(len);
}
```

but if you change the code of the

```rust
if len == my_cap {
    something(len);
    my_cap += 1;
}
```

it will not eliminate the looking at Rust MIR, the will try to look inside LLVM (Ugh, CPP)
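To make explicit why the branch is dead in the second variant as well, here is a small runnable sketch. The call counter `CALLS` and the helper `calls()` are additions for illustration and are not part of the original snippet. `my_cap` only changes inside the branch, and the branch requires `len == my_cap`, which cannot happen because `len` stays below `COUNT`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const COUNT: usize = 10000;

// Hypothetical counter, added only to observe whether the branch is ever taken.
static CALLS: AtomicUsize = AtomicUsize::new(0);

#[inline(never)]
fn something(len: usize) {
    CALLS.fetch_add(1, Ordering::Relaxed);
    std::hint::black_box(len as i128);
}

/// The variant with `my_cap += 1` inside the branch.
pub fn run_growing_cap() -> usize {
    let mut inter = 0;
    let mut my_cap = COUNT;
    for _ in 0..COUNT {
        let len = inter;
        // `len` ranges over 0..COUNT while `my_cap` starts at COUNT and can
        // only grow, so this condition is never true.
        if len == my_cap {
            something(len);
            my_cap += 1;
        }
        inter += 1;
    }
    inter
}

/// Number of times `something` was called.
pub fn calls() -> usize {
    CALLS.load(Ordering::Relaxed)
}
```

Proving the branch dead here needs an inductive argument over the loop (the invariant `my_cap >= COUNT > len`), since `my_cap` is no longer a loop-invariant constant. That may be why the optimizer handles the two variants differently, though this is a guess rather than something established in the thread.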