Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer overflow in rayon-core 1.8 causes process abort on 32 bit CPUs #797

Closed
ebarnard opened this issue Sep 11, 2020 · 6 comments · Fixed by #800
Closed

Integer overflow in rayon-core 1.8 causes process abort on 32 bit CPUs #797

ebarnard opened this issue Sep 11, 2020 · 6 comments · Fixed by #800

Comments

@ebarnard
Copy link

ebarnard commented Sep 11, 2020

I have some software that aborts when compiled as a 32-bit executable but runs fine as a 64-bit executable. It seems that Counters::plus() is causing word to overflow leading to Rayon aborting the process.

The abort is in a deadlock detection test that tries to trigger a race condition that affected the software's original deadlock detection, so it ends up calling rayon::join quite a lot in a short space of time (but not anywhere near 2^32 times). It occurs in debug and release mode (with overflow checks enabled).

Reverting to rayon 1.3.1 and rayon-core 1.7.1 fixes the issue.

I'm working on a minimal example as I can't share the original software.

Backtrace

thread '<unnamed>' panicked at 'attempt to add with overflow', /home/pi/.cargo/registry/src/github.com-1285ae84e5963aae/rayon-core-1.8.0/src/sleep/counters.rs:226:19
thread '<unnamed>' panicked at 'attempt to add with overflow', /home/pi/.cargo/registry/src/github.com-1285ae84e5963aae/rayon-core-1.8.0/src/sleep/counters.rs:226:19
thread '<unnamed>' panicked at 'attempt to add with overflow', /home/pi/.cargo/registry/src/github.com-1285ae84e5963aae/rayon-core-1.8.0/src/sleep/counters.rs:226:19
thread '<unnamed>' panicked at 'attempt to add with overflow', /home/pi/.cargo/registry/src/github.com-1285ae84e5963aae/rayon-core-1.8.0/src/sleep/counters.rs:226:19
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1076
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1537
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:217
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:526
  11: rust_begin_unwind
             at src/libstd/panicking.rs:437
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
  13: core::panicking::panic
             at src/libcore/panicking.rs:50
  14: rayon_core::sleep::Sleep::announce_sleepy
  15: rayon_core::registry::WorkerThread::wait_until_cold
  16: rayon_core::registry::ThreadBuilder::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Rayon: detected unexpected panic; aborting
@cuviper
Copy link
Member

cuviper commented Sep 11, 2020

Hmm, I thought the JEC was supposed to use wrapping add -- @nikomatsakis do you know why that's a bare addition?

@cuviper
Copy link
Member

cuviper commented Sep 11, 2020

Prior to a0efb4a it was using fetch_add, which implicitly wraps.

@ebarnard
Copy link
Author

A minimal reproduction of this bug on 32-bit platforms is:

fn main() {
    let mut i = 0;
    let mut j = 0;

    for _ in 0..u32::MAX {
        rayon::join(|| i += 1, || j += 1);
    }

    println!("{} {}", i, j);
}

@cuviper
Copy link
Member

cuviper commented Sep 15, 2020

Thanks - I can reliably reproduce the overflow panic with 100,000 iterations, and it works when changing Counters::plus to use wrapping_add. With u32::MAX iterations, it hasn't completed yet, but at least it hasn't crashed yet either... 😅

@nikomatsakis
Copy link
Member

@cuviper I believe it was meant to be a wrapping add indeed, I think I just wasn't thinking. I guess I will review the code to double check my understanding.

@nikomatsakis
Copy link
Member

From reading the code, and the README.md in particular, I definitely think it was meant to rollover.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants