Nightly performance issue with vec::extend_with_element #32155
Comments
If I'm reading callgrind/kcachegrind output correctly, this is being called by io::buffered::BufReader::...::with_capacity, but the extend_with_element function is the thing that's actually taking all the time.
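To exercise the call path named in the profile, a small harness along these lines could be profiled under callgrind or perf. This is only a sketch: `read_all_with_capacity` is a hypothetical helper, not code from the reported program, and whether `BufReader::with_capacity` zero-fills its buffer (and so reaches `extend_with_element`) depends on the standard library version.

```rust
use std::io::{BufReader, Read};

/// Hypothetical harness: builds a fresh `BufReader` with an explicit
/// capacity for every input and reads it to the end.
/// Returns the total number of bytes read.
fn read_all_with_capacity(inputs: &[&[u8]], capacity: usize) -> std::io::Result<usize> {
    let mut total = 0;
    for &input in inputs {
        // The internal buffer is allocated here. Whether it is zero-filled
        // depends on the standard library version (assumption, see above).
        let mut reader = BufReader::with_capacity(capacity, input);
        let mut out = Vec::new();
        total += reader.read_to_end(&mut out)?;
    }
    Ok(total)
}
```

Calling it with many small inputs and a large capacity makes buffer construction dominate the run, which is the situation the profile above describes.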
Thanks for the report! This may be related to some minor tweak in the standard library or maybe an LLVM update (not sure), but is there something that we can run locally to help diagnose as well? The main profilers I've used at least are
I can reproduce with for example the code version rustc 1.9.0-nightly (998a672 2016-03-07)
I'll have to check if #31999 could have caused this.
@bluss are you sure? The IR for this function looks exactly the same on stable/beta/nightly for me:

    pub fn foo(n: usize) -> Vec<u8> {
        vec![0; n]
    }
I'm trying to reduce a testcase, I've only gotten it down to this: https://gist.github.com/bluss/c31b308feb347067ab19 The case that is slow in nightly is fillvec_vec_macro_u8. It's slow in
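For readers without access to the gist, the kind of comparison being described can be sketched as follows. The function names echo the test cases mentioned in this thread, but the bodies and the `time_it` helper are assumptions for illustration, not the contents of the gist.

```rust
use std::time::{Duration, Instant};

// Hypothetical variants: each builds a zero-filled vector a different way.
fn fill_vec_macro_u8(n: usize) -> Vec<u8> {
    vec![0u8; n]
}

fn fill_vec_macro_u32(n: usize) -> Vec<u32> {
    vec![0u32; n]
}

fn fill_resize_u8(n: usize) -> Vec<u8> {
    let mut v = Vec::with_capacity(n);
    v.resize(n, 0u8);
    v
}

/// Runs `f` `iters` times and returns the total elapsed wall-clock time.
fn time_it<T, F: FnMut() -> T>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        std::hint::black_box(f());
    }
    start.elapsed()
}
```

Timing each variant with the same `n` under different toolchains (for example `time_it(1000, || fill_vec_macro_u8(1 << 20))`) gives a rough comparison; a proper benchmark harness would be more reliable than this wall-clock loop.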
The .as_ptr() change is unrelated (I tried with it reverted).
@bluss From what I could see, this was happening on Beta as well. So it might be easier to find the problem commit on that branch.
steveklabnik added the I-slow label Mar 11, 2016
I've done some further investigation, and found the compiler version where this bug was first introduced. On rust-nightly 2016-02-11 the regression did not occur; on rust-nightly 2016-02-12 (and all following tested versions) the regression did occur. Interestingly, the results of the testcase made by @bluss were almost identical for both of these compiler versions. Strangely, though, there has been a tenfold speedup for resize, vec_macro_u32 and vec_macro_u8 between the 2016-02-11/12 nightlies and a recent nightly. For testing, I'm currently using perf report to identify a high count of vec::Vec$LT$T$GT$::extend_with_element when running the "psq" command from this repo. There is also a high count of "isolate_freepages_block", "je_arena_malloc_large", and "arena_chunk_alloc" on the problem compilers, so I'll try investigating further in that direction.
jemalloc was updated to 4.1.0 a week ago; we also now use it without the je_ prefix (which affects some LLVM malloc special-casing).
Interestingly, this doesn't seem to occur when I use opt-level=2 with a recent nightly compiler.
Alright, so after waiting through many hours of compilation, I've worked out it's definitely the noalias change introduced in a17fb64. (Or at least, the regression occurs on this commit, but does not occur for the previous commit.) I'll also mention again that this does impact the beta channel, so if there is some kind of fix for this, it would need to be backported. If you want me to do any further diagnostics, I'll be happy to run them. But for now I don't think there is much else I can do with this.
What I see is that in the program psq, in the original report, with current nightly, extend_with_element compiles into a loop where it repeatedly sets the length of the vector in the loop. This is intended to be something that the optimizer lifts out of the loop.

           │ e0:   mov    BYTE PTR [rdx+rcx*1],0x0
     28,57 │       lea    rsi,[rdi+rcx*1]
           │       mov    QWORD PTR [rbx+0x10],rsi
           │       mov    BYTE PTR [rdx+rcx*1+0x1],0x0
           │       lea    rsi,[rdi+rcx*1+0x1]
           │       mov    QWORD PTR [rbx+0x10],rsi
           │       mov    BYTE PTR [rdx+rcx*1+0x2],0x0
     14,29 │       lea    rsi,[rdi+rcx*1+0x2]
           │       mov    QWORD PTR [rbx+0x10],rsi
           │       mov    BYTE PTR [rdx+rcx*1+0x3],0x0
     28,57 │       lea    rsi,[rdi+rcx*1+0x3]
           │       mov    QWORD PTR [rbx+0x10],rsi
     14,29 │       add    rcx,0x4
           │       cmp    r8,rcx
           │     ↑ jne    e0
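The shape of that loop, one store into the buffer followed by one store of the new length for every element, can be sketched in Rust as below, next to a variant that keeps the length in a local and writes it back once. Both functions are hypothetical illustrations restricted to `u8`; they are not the standard library's `extend_with_element`, and whether a given compiler hoists the length store out of the first loop is exactly what is in question in this issue.

```rust
use std::ptr;

/// Hypothetical: writes the length back to the Vec on every iteration,
/// relying on the optimizer to lift that store out of the loop.
fn extend_len_in_loop(v: &mut Vec<u8>, n: usize, value: u8) {
    v.reserve(n);
    unsafe {
        let mut len = v.len();
        let base = v.as_mut_ptr();
        for _ in 0..n {
            // In bounds: `reserve(n)` guarantees capacity >= old len + n.
            ptr::write(base.add(len), value);
            len += 1;
            // The length store sits inside the loop body.
            v.set_len(len);
        }
    }
}

/// Hypothetical: tracks progress in locals and stores the length once,
/// so no hoisting by the optimizer is needed.
fn extend_len_once(v: &mut Vec<u8>, n: usize, value: u8) {
    v.reserve(n);
    unsafe {
        let len = v.len();
        let base = v.as_mut_ptr().add(len);
        for i in 0..n {
            ptr::write(base.add(i), value);
        }
        // Single store after all elements are initialized.
        v.set_len(len + n);
    }
}
```

Both produce the same vector. For a general `T` whose `clone` can panic, storing the length only at the end would need extra care so that already-written elements are not leaked, which is why this sketch is limited to `u8`.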
bluss referenced this issue May 9, 2016: Closed: Performance issue in `write_all` (`Vec::extend_from_slice`) #33518
tikue added a commit to tikue/tarpc that referenced this issue May 28, 2016
Triage: Still exists in rustc 1.13.0-nightly (923bac4 2016-09-06)
keeperofdakeys commented Mar 9, 2016
When compiling this program with nightly and beta, the runtime increases by 1/3, and perf reports that "vec::Vec$LT$T$GT$::extend_with_element::..." is taking 20% of the runtime. This doesn't occur with the stable compiler.
Are there any tools, or perf options, that I can use to further debug this?