Performance regression between v1.76.0 and v1.77.2 #125543
Comments
Here are the contents of
|
@jmillikin How does the performance go on 1.78.0, the current stable? |
v1.78.0 generates the same assembly for |
wow, and that's the version that actually has the LLVM upgrade. |
I tried using
cc @cjgillot who may have some ideas on whether the new code is working as intended or not. My suspicion is that unification of
searched nightlies: from nightly-2023-12-21 to nightly-2024-02-01
bisected with cargo-bisect-rustc v0.6.8
Host triple: x86_64-unknown-linux-gnu
cargo bisect-rustc --start=1.76.0 --end=1.77.0 --script=./test.sh |
@jmillikin I also tried bisecting this, but had some problems with the script. If you don't mind, what's the content of your |
It is super duper hacky -- literally grepping for an instruction pattern that only shows up in the fast version.
#!/bin/sh
set -eux
RUSTFLAGS="-Copt-level=3" cargo build --release
objdump -M intel "$CARGO_TARGET_DIR/x86_64-unknown-linux-gnu/release/main" --no-addresses --no-show-raw-insn --disassemble='benchmark_decode_u32' | grep 'd,DWORD PTR'
exit $?
A more portable version might be to invoke |
Not sure if this is useful, but I found a way to make v1.76.0 emit the same code as v1.77.2:
use std::ptr;

pub fn decode_u32(buf: &[u8; 5]) -> (u32, usize) {
    let prefix_0 = buf[0];
    if prefix_0 < 0x80 {
        return (prefix_0 as u32, 1);
    }
    if prefix_0 < 0xF0 {
        // Load the first four bytes as a little-endian u32.
        let x = u32::from_le(unsafe {
            ptr::from_ref(buf).cast::<u32>().read_unaligned()
        });
        // Re-read the first byte instead of reusing prefix_0.
        let prefix_1 = buf[0];
        // let prefix_1 = prefix_0;
        if prefix_1 < 0b11000000 {
            let decoded = (x & 0x3F) | ((x & 0xFF00) >> 2);
            return (decoded, 2);
        }
        if prefix_1 < 0b11100000 {
            let decoded = (x & 0x1F) | ((x & 0xFFFF00) >> 3);
            return (decoded, 3);
        }
        let decoded = (x & 0x0F) | ((x & 0xFFFFFF00) >> 4);
        return (decoded, 4);
    }
    // Load the four bytes that follow the prefix byte.
    let decoded = u32::from_le(unsafe {
        ptr::from_ref(buf)
            .cast::<u8>()
            .add(1)
            .cast::<u32>()
            .read_unaligned()
    });
    (decoded, ((prefix_0 & 0x0F) + 2) as usize)
}
This version is compiled to the same assembly as the original in both v1.76.0 and v1.77.2. If the statement |
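As a sanity check on the snippet above, here is a minimal sketch showing that the two spellings of `prefix_1` (re-reading `buf[0]` versus reusing `prefix_0`) return identical results, so any difference in the generated assembly would come from the optimizer and not from the semantics. The `decode_u32_variant` name and its `RELOAD` const parameter are hypothetical and exist only for this comparison; because the choice is a compile-time constant here, this sketch is not expected to reproduce the codegen difference itself.

```rust
use std::ptr;

/// Same decoder as in the comment above. `RELOAD` (a hypothetical knob added
/// for this sketch) selects whether the second check re-reads `buf[0]`.
pub fn decode_u32_variant<const RELOAD: bool>(buf: &[u8; 5]) -> (u32, usize) {
    let prefix_0 = buf[0];
    if prefix_0 < 0x80 {
        return (prefix_0 as u32, 1);
    }
    if prefix_0 < 0xF0 {
        // SAFETY: `buf` holds 5 bytes, so a 4-byte unaligned read at offset 0 is in bounds.
        let x = u32::from_le(unsafe { ptr::from_ref(buf).cast::<u32>().read_unaligned() });
        let prefix_1 = if RELOAD { buf[0] } else { prefix_0 };
        if prefix_1 < 0b11000000 {
            return ((x & 0x3F) | ((x & 0xFF00) >> 2), 2);
        }
        if prefix_1 < 0b11100000 {
            return ((x & 0x1F) | ((x & 0xFFFF00) >> 3), 3);
        }
        return ((x & 0x0F) | ((x & 0xFFFFFF00) >> 4), 4);
    }
    // SAFETY: a 4-byte unaligned read at offset 1 ends at byte 5, still in bounds.
    let decoded = u32::from_le(unsafe {
        ptr::from_ref(buf).cast::<u8>().add(1).cast::<u32>().read_unaligned()
    });
    (decoded, ((prefix_0 & 0x0F) + 2) as usize)
}

fn main() {
    // Arbitrary payload bytes; only the first byte drives the branching.
    let tail = [0x12, 0x34, 0x56, 0x78];
    for first in 0..=u8::MAX {
        let buf = [first, tail[0], tail[1], tail[2], tail[3]];
        assert_eq!(
            decode_u32_variant::<true>(&buf),
            decode_u32_variant::<false>(&buf)
        );
    }
    println!("both spellings agree for all 256 prefix bytes");
}
```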
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-medium |
Disabling GVN via |
Code
I'm working on some low-level bit-manipulation code, and discovered that v1.77 generates code that runs at about half the speed compared to the output of v1.76. Since the v1.77 release notes don't mention anything about an LLVM upgrade I figure there must be something going wrong in the rustc LLVM IR generation.
Compilable reproduction:
The inner loop is decode_u32(), which is sort of ugly but isn't doing anything especially complex.
The exact timing varies depending on the distribution of input values. The attached reproduction uses a distribution for which the new code is about twice as slow.
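To illustrate how the input distribution can be varied when timing a decoder like this, here is a rough sketch of a timing harness. It is not the attached reproduction: `stand_in_decode`, `make_inputs`, `time_decoder`, the xorshift generator, and the chosen percentages are all assumptions made for illustration, and the stand-in decoder should be swapped for the real `decode_u32` to get meaningful numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in decoder; replace with the real `decode_u32` to measure it.
fn stand_in_decode(buf: &[u8; 5]) -> (u32, usize) {
    if buf[0] < 0x80 {
        (buf[0] as u32, 1)
    } else {
        (u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]), 5)
    }
}

/// Tiny xorshift64 generator so the sketch needs no external crates.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Build `n` inputs where roughly `percent_short` percent start with a
/// one-byte prefix (first byte below 0x80).
fn make_inputs(n: usize, percent_short: u64, seed: u64) -> Vec<[u8; 5]> {
    let mut state = seed.max(1); // xorshift must not start at zero
    (0..n)
        .map(|_| {
            let r = xorshift(&mut state);
            let bytes = r.to_le_bytes();
            let mut buf = [bytes[0], bytes[1], bytes[2], bytes[3], bytes[4]];
            if (r >> 40) % 100 < percent_short {
                buf[0] &= 0x7F; // force the short encoding
            } else {
                buf[0] |= 0x80; // force a multi-byte encoding
            }
            buf
        })
        .collect()
}

/// Run `decode` over `inputs` `rounds` times; the checksum keeps the work observable.
fn time_decoder(
    decode: impl Fn(&[u8; 5]) -> (u32, usize),
    inputs: &[[u8; 5]],
    rounds: usize,
) -> (Duration, u64) {
    let start = Instant::now();
    let mut checksum = 0u64;
    for _ in 0..rounds {
        for buf in inputs {
            let (value, len) = decode(black_box(buf));
            checksum = checksum.wrapping_add(value as u64).wrapping_add(len as u64);
        }
    }
    (start.elapsed(), black_box(checksum))
}

fn main() {
    for percent_short in [90, 50, 10] {
        let inputs = make_inputs(4096, percent_short, 0x9E37_79B9_7F4A_7C15);
        let (elapsed, checksum) = time_decoder(stand_in_decode, &inputs, 1000);
        println!("{percent_short}% short: {elapsed:?} (checksum {checksum})");
    }
}
```

Build with optimizations (for example `cargo build --release`) under each toolchain being compared; wall-clock numbers from a loop like this are noisy, so treat them as a rough signal only.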
Version it worked on
It most recently worked on: Rust v1.76.0
Version with regression
rustc --version --verbose: