
Miri detects undefined behavior in the test suite #952

Closed
HeroicKatora opened this issue Jun 29, 2022 · 5 comments · Fixed by #1013

@HeroicKatora

(Found while scrolling through https://miri.saethlin.dev/ub?crate=rayon&version=1.5.3)

Undefined Behavior: deallocating while item is protected: [SharedReadOnly for <2273653> (call 593590)]
    --> /root/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.3/src/registry.rs:479:9
     |
479  |         })
     |         ^ deallocating while item is protected: [SharedReadOnly for <2273653> (call 593590)]

Since this points at the end of the closure, is it referring to deallocating a local variable, such as a dangling reference to the job?


Further investigation, running with a full backtrace: it seems the issue only occurs when running tests multi-threaded. The exact location varies from run to run (non-deterministic). The command I used to run the test suite is:

MIRIFLAGS="-Zmiri-backtrace=full -Zmiri-disable-isolation" cargo +nightly miri test -- --test-threads=2

The source location it pinpoints stays the same:

error: Undefined Behavior: deallocating while item is protected: [SharedReadOnly for <18205362> (call 4948034)]
    --> /home/andreas/code/investigate/rayon/rayon-core/src/registry.rs:479:9
     |
479  |         })
     |         ^ deallocating while item is protected: [SharedReadOnly for <18205362> (call 4948034)]
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
             
     = note: inside closure at /home/andreas/code/investigate/rayon/rayon-core/src/registry.rs:479:9
     = note: inside `std::thread::LocalKey::<rayon_core::latch::LockLatch>::try_with::<[closure@rayon_core::registry::Registry::in_worker_cold<[closure@rayon_core::join_context<[closure@src/iter/plumbing/mod.rs:417:17: 425:18], [closure@src/iter/plumbing/mod.rs:426:17: 434:18], iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>>::{closure#0}], (iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>)>::{closure#0}], (iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>)>` at /home/andreas/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:445:16
     = note: inside `std::thread::LocalKey::<rayon_core::latch::LockLatch>::with::<[closure@rayon_core::registry::Registry::in_worker_cold<[closure@rayon_core::join_context<[closure@src/iter/plumbing/mod.rs:417:17: 425:18], [closure@src/iter/plumbing/mod.rs:426:17: 434:18], iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>>::{closure#0}], (iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>)>::{closure#0}], (iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>)>` at /home/andreas/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:421:9
note: inside `rayon_core::registry::Registry::in_worker_cold::<[closure@rayon_core::join_context<[closure@src/iter/plumbing/mod.rs:417:17: 425:18], [closure@src/iter/plumbing/mod.rs:426:17: 434:18], iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>>::{closure#0}], (iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>)>` at /home/andreas/code/investigate/rayon/rayon-core/src/registry.rs:461:9
    --> /home/andreas/code/investigate/rayon/rayon-core/src/registry.rs:461:9
     |
461  | /         LOCK_LATCH.with(|l| {
462  | |             // This thread isn't a member of *any* thread pool, so just block.
463  | |             debug_assert!(WorkerThread::current().is_null());
464  | |             let job = StackJob::new(
...    |
478  | |             job.into_result()
479  | |         })
     | |__________^
note: inside `rayon_core::registry::in_worker::<[closure@rayon_core::join_context<[closure@src/iter/plumbing/mod.rs:417:17: 425:18], [closure@src/iter/plumbing/mod.rs:426:17: 434:18], iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>>::{closure#0}], (iter::collect::consumer::CollectResult<std::vec::Vec<i32>>, iter::collect::consumer::CollectResult<std::vec::Vec<i32>>)>` at /home/andreas/code/investigate/rayon/rayon-core/src/registry.rs:879:13
    --> /home/andreas/code/investigate/rayon/rayon-core/src/registry.rs:879:13
     |
879  |             global_registry().in_worker_cold(op)
     |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@saethlin (Contributor)

Interesting, this always runs into the issue for me:

MIRIFLAGS=-Zmiri-disable-isolation cargo +nightly miri test -p rayon-core join_counter_overflow

@HeroicKatora (Author)

Miri pinpoints the creation of the protector tag as follows:

$ MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-track-pointer-tag=2782933" cargo +nightly miri test -p rayon-core join_counter_overflow

   --> rayon-core/src/latch.rs:372:5
    |
372 | /     fn set(&self) {
373 | |         L::set(self);
374 | |     }
    | |_____^ created tag 2782933
    |
    = note: inside `<&latch::LockLatch as latch::Latch>::set` at rayon-core/src/latch.rs:372:5
note: inside `<job::StackJob<&latch::LockLatch, [closure@rayon-core/src/registry.rs:465:17: 469:18], ((), ())> as job::Job>::execute` at rayon-core/src/job.rs:114:9
   --> rayon-core/src/job.rs:114:9
    |
114 |         this.latch.set();
    |         ^^^^^^^^^^^^^^^^
note: inside `job::JobRef::execute` at rayon-core/src/job.rs:57:9
   --> rayon-core/src/job.rs:57:9
    |
57  |         (self.execute_fn)(self.pointer)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `registry::WorkerThread::execute` at rayon-core/src/registry.rs:752:9
   --> rayon-core/src/registry.rs:752:9
    |
752 |         job.execute();
    |         ^^^^^^^^^^^^^
note: inside `registry::WorkerThread::wait_until_cold` at rayon-core/src/registry.rs:729:17
   --> rayon-core/src/registry.rs:729:17
    |
729 |                 self.execute(job);
    |                 ^^^^^^^^^^^^^^^^^
note: inside `registry::WorkerThread::wait_until::<latch::CountLatch>` at rayon-core/src/registry.rs:703:13
   --> rayon-core/src/registry.rs:703:13
    |
703 |             self.wait_until_cold(latch);
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `registry::main_loop` at rayon-core/src/registry.rs:836:5
   --> rayon-core/src/registry.rs:836:5
    |
836 |     worker_thread.wait_until(my_terminate_latch);
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `registry::ThreadBuilder::run` at rayon-core/src/registry.rs:55:18
   --> rayon-core/src/registry.rs:55:18
    |
55  |         unsafe { main_loop(self.worker, self.registry, self.index) }
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at rayon-core/src/registry.rs:100:20
   --> rayon-core/src/registry.rs:100:20
    |
100 |         b.spawn(|| thread.run())?;
    |                    ^^^^^^^^^^^^

@HeroicKatora (Author)

A protected tag is one created for a function parameter, and it is popped when that function exits. This leads me to believe that the call to latch.set() must still be active. As it runs on another thread, is it merely still scheduled or, worse, could it be blocked? Note that LockLatch::set contains code that takes a mutex by calling lock().
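
A minimal, self-contained sketch of that situation (hypothetical types, not rayon's actual code): the waiter can observe the store and free the latch while the setter's protected &self is still live, which is exactly the "deallocating while item is protected" error above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Minimal stand-in for a latch; not rayon's real type.
struct Latch(AtomicBool);

impl Latch {
    fn set(&self) {
        // Stacked Borrows protects `&self` until this function returns.
        self.0.store(true, Ordering::Release);
        // If the waiter observes the store and frees the latch before we
        // return, Miri reports "deallocating while item is protected".
    }
}

fn main() {
    let latch = Box::new(Latch(AtomicBool::new(false)));
    // Smuggle the address to the setter thread (sketch only).
    let addr = &*latch as *const Latch as usize;
    let setter = thread::spawn(move || unsafe { (*(addr as *const Latch)).set() });
    while !latch.0.load(Ordering::Acquire) {
        thread::yield_now();
    }
    drop(latch); // can race with the tail of `set`; Miri may flag this
    setter.join().unwrap();
}
```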

@cuviper (Member) commented Jun 29, 2022

I think this is a generalization of rust-lang/rust#55005, and I also mentioned it in the Zulip discussion here. The current proposal would stop emitting LLVM dereferenceable for !Freeze types, with a corresponding Miri change, but that's not settled.
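
For context, a small illustration of why !Freeze matters here (assumed names, not rayon code): a latch has interior mutability, and per the proposal above such types would lose the dereferenceable attribute that currently asserts the pointee stays allocated for the whole call.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// A latch-like type is `!Freeze` because its atomic field contains an
// `UnsafeCell`, i.e. it has interior mutability.
struct MiniLatch {
    flag: AtomicBool,
}

// Today `&MiniLatch` arguments get LLVM `dereferenceable`, asserting the
// latch stays allocated for the entire call; the proposal referenced
// above would drop that attribute for `!Freeze` pointees.
fn set(latch: &MiniLatch) {
    latch.flag.store(true, Ordering::Release);
}

fn main() {
    let latch = MiniLatch { flag: AtomicBool::new(false) };
    set(&latch);
    assert!(latch.flag.load(Ordering::Acquire));
}
```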

The problem is that the code that sets the latch makes it immediately possible for the waiter to resume and deallocate the latch, while the setter may still hold its reference. We had real bugs with that in SpinLatch which are hopefully resolved in practice, though Miri still flags them. For LockLatch, only Miri cares so far.

@cuviper (Member) commented Jan 19, 2023

I believe #1011 will fix this, but I would appreciate review on that. In my test, join_counter_overflow previously hit the error quickly, and now runs fine for a long time, longer than I was willing to wait. Hopefully that's just Miri being slow, and not a state perturbation that hides the error until later...
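
For reference, a hedged sketch of the general shape such a fix can take (hypothetical names; see #1011/#1013 for the actual change): pass the latch by raw pointer, so no protected &self reference spans the window in which the waiter may deallocate it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct Latch(AtomicBool);

impl Latch {
    /// # Safety
    /// `this` must point to a live latch when called. The waiter may
    /// deallocate the latch as soon as the store below becomes visible.
    unsafe fn set(this: *const Self) {
        // Taking `*const Self` instead of `&self` means no Stacked
        // Borrows protector outlives the atomic store; the only
        // reference created is the short-lived one for `store` itself.
        (*this).0.store(true, Ordering::Release);
    }
}

fn main() {
    let latch = Box::new(Latch(AtomicBool::new(false)));
    unsafe { Latch::set(&*latch) };
    assert!(latch.0.load(Ordering::Acquire));
}
```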

cuviper linked a pull request on Jan 22, 2023 that will close this issue.
bors bot closed this as completed in #1013 on Jan 22, 2023.