MIR: global performance regression with Vec resize() or extend_from_slice(), since 1.12 #40267
Comments
Can confirm. Semi-minified code (not to depend on formatters which generate tons of junk IR - plz fix @eddyb by introducing ):

```rust
#![crate_type="rlib"]

/// Audio sample rate for the test set, used for realtime speed
/// calculation
const SAMPLE_RATE: f64 = 48000.0;
/// Total length of samples the filter benchmarks are run on
const SAMPLE_COUNT: u64 = 524288;
/// Select how many IIR filters should be applied consecutively
/// on each buffer during the benchmark
const FILTER_COUNT: usize = 100;
const BUFFER_LEN: usize = 128;

/// 2nd order biquad filter
#[derive(Copy)]
struct Biquad {
    b0: f64,
    b1: f64,
    b2: f64,
    a1: f64,
    a2: f64,
    x1: f64,
    x2: f64,
    y1: f64,
    y2: f64,
}

impl Clone for Biquad {
    fn clone(&self) -> Biquad {
        *self
    }
}

impl Biquad {
    fn new() -> Biquad {
        Biquad {
            b0: 0.0,
            b1: 0.0,
            b2: 0.0,
            a1: 0.0,
            a2: 0.0,
            x1: 0.0,
            x2: 0.0,
            y1: 0.0,
            y2: 0.0,
        }
    }
}

fn iir(buf: &mut [f64], bq: &mut Biquad) {
    for i in 0..buf.len() {
        let x = buf[i];
        buf[i] = (bq.b0 * x) + (bq.b1 * bq.x1) + (bq.b2 * bq.x2) - (bq.a1 * bq.y1) -
                 (bq.a2 * bq.y2);
        bq.x2 = bq.x1;
        bq.x1 = x;
        bq.y2 = bq.y1;
        bq.y1 = buf[i];
    }
}

#[cfg(slow)]
pub fn foo() {
    println!("Create an empty vector, resized then discarded");
    let mut vec_test: Vec<f64> = Vec::new();
    vec_test.resize(1234, 0.0);
}

#[inline(never)]
pub fn sample(buffer_len: usize) {
    let buffer_count = SAMPLE_COUNT / buffer_len as u64;
    for _ in 0..10 {
        let mut buf = vec![0.0; buffer_len];
        let mut biquads = [Biquad::new(); FILTER_COUNT];
        for _ in 0..buffer_count {
            for f in 0..FILTER_COUNT {
                iir(buf.as_mut_slice(), &mut biquads[f]);
            }
        }
    }
}
```

The fast inner loop looks like:
And the slow one looks like (notice the additional, un-extracted stores - looks like some aliasing issue):
The 4 extra stores are to the
resize's grow loop has already been patched with SetLenOnDrop for an aliasing issue, maybe that now backfires?
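For readers unfamiliar with the SetLenOnDrop idea mentioned here, a simplified sketch follows. It is not the standard library's actual code: `FixedBuf` and `extend_with` are hypothetical names made up for illustration, and the guard is reduced to its essentials. The point is that the length lives in a local while the loop writes elements, and is only stored back through the reference when the guard is dropped, so the optimizer does not have to assume the element stores alias the length field.

```rust
/// Simplified sketch of a "set length on drop" guard.
/// Assumption: this only mirrors the general idea, not the real std internals.
struct SetLenOnDrop<'a> {
    len: &'a mut usize,
    local_len: usize,
}

impl<'a> SetLenOnDrop<'a> {
    fn new(len: &'a mut usize) -> Self {
        let local_len = *len;
        SetLenOnDrop { len, local_len }
    }

    /// Bump the local copy only; nothing is written through `len` here.
    fn increment_len(&mut self, n: usize) {
        self.local_len += n;
    }
}

impl<'a> Drop for SetLenOnDrop<'a> {
    /// The single write-back of the length happens when the guard goes away.
    fn drop(&mut self) {
        *self.len = self.local_len;
    }
}

/// Hypothetical fixed-capacity buffer, used only to exercise the guard.
struct FixedBuf {
    data: [f64; 16],
    len: usize,
}

impl FixedBuf {
    /// Append `n` copies of `value`, updating `len` once at the end.
    fn extend_with(&mut self, n: usize, value: f64) {
        assert!(self.len + n <= self.data.len());
        let start = self.len;
        let mut guard = SetLenOnDrop::new(&mut self.len);
        for slot in &mut self.data[start..start + n] {
            *slot = value;
            guard.increment_len(1);
        }
        // `guard` is dropped here and writes the final length back.
    }
}
```

Calling `extend_with(4, 1.5)` on an empty `FixedBuf` leaves `len` at 4, and the update is still applied if the loop body panics, because it runs in `Drop`.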
The problem is the missing SROA, which seems to result from an un-inlined call to In the
Of course, an interesting question is why are things "fast" on non-MIR trans.
This patch fixes the slowness:
I'll still want to check what pre-MIR is doing in this context to ensure we are not missing anything important.
@arielb1 I wasn't able to confirm that the patch fixed the slowness: results were identical with patched rustc 1.15.1 source on x86-64, running on a Core i5-750.
Now confirmed working with patched rustc 1.15.1 x86-64 on i5-750! My mistake in the previous test might have been to re-compile and x.py dist an already compiled rustc repo after applying the patch. Maybe not all the necessary elements were re-built that way.
Confirmed the fix is successful on Raspberry Pi 3 also.
Thanks, the confirmation of the fix is valuable information. This should stay open until a fix is merged.
It seems that the IR created by MIR contains cross-basic-block memsets that confuse LLVM. Running
Paradoxically, the reason LICM makes things fast is because it replaces code in the style of:
With code in the style of
Which LLVM surprisingly enough handles better. I suppose MemCpyOpt needs a "merge adjacent memset" optimization.
Hi! sorry, is there any news about this problem in 2020?
A bug introduced with rustc 1.12.0 with MIR enabled.
This was originally reported imprecisely as #40044
If a vector is resized with resize() or extended with extend_from_slice() anywhere in the code, algorithms operating on any other vector will run slower.
The performance impact varies with the compiler targets and CPU characteristics.
Real world examples measured on an audio DSP algorithm
armv7
Raspberry Pi 3, Rustc 1.12.0: 14% slower
Raspberry Pi 3, Rustc 1.15.1: 11% slower
Nexus 5, Rustc 1.15: 1% slower
arm
Raspberry Pi, Rustc 1.15.1: 8% slower
x86-64
Core i5-750, Rustc 1.12.1 or Rustc 1.15.1: 0.4% slower
When compiling using Rustc 1.12.0 with MIR disabled, the performance regression does not occur.
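The percentages above are relative slowdowns between a fast and a slow build. A minimal sketch of how such a figure could be computed from wall-clock timings is shown below; `time_it` and `percent_slower` are hypothetical helpers written for illustration and are not taken from the test project, whose measurement method may differ.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns the wall-clock time it took.
fn time_it<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}

/// Relative slowdown of `regressed` versus `baseline`, in percent.
/// A positive result means `regressed` took longer.
fn percent_slower(baseline: Duration, regressed: Duration) -> f64 {
    (regressed.as_secs_f64() / baseline.as_secs_f64() - 1.0) * 100.0
}
```

For instance, a baseline run of 100 ms against a regressed run of 114 ms works out to roughly 14% slower. In practice each build would be timed several times and the best or median run compared, since a difference as small as the 0.4% reported on x86-64 is easily lost in measurement noise.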
Test and demo project
https://github.com/supercurio/rust-issue-mir-vec-slowdown