Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sharded shuffle concurrency test failures with rust 1.78 #1036

Closed
andyleiserson opened this issue May 2, 2024 · 4 comments
Closed

Sharded shuffle concurrency test failures with rust 1.78 #1036

andyleiserson opened this issue May 2, 2024 · 4 comments

Comments

@andyleiserson
Copy link
Collaborator

I see this failure consistently on my local machine (aarch64-apple-darwin) after upgrading to rust 1.78, but not in CI (e.g. https://github.com/private-attribution/ipa/actions/runs/8929174792/job/24526364966?pr=1035).

Command: cargo test -p ipa-core --release --features "shuttle multi-threading" -- shuffle::sharded::tests::empty (non_empty_input also fails)

Output:

---- protocol::ipa_prf::shuffle::sharded::tests::empty stdout ----
TestWorld random seed 9634094103382193149
2024-05-02T19:42:39.728593Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.728612Z  INFO {h=A}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728625Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.728627Z  INFO {h=B}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728633Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.728636Z  INFO {h=C}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728646Z  INFO {h=A}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: close time.busy=3.08µs time.idle=31.9µs
2024-05-02T19:42:39.728651Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: close time.busy=21.5µs time.idle=38.1µs
2024-05-02T19:42:39.728957Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728965Z  INFO {h=B}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: close time.busy=583ns time.idle=337µs
2024-05-02T19:42:39.728967Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: close time.busy=5.67µs time.idle=336µs
2024-05-02T19:42:39.728969Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.728974Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: new
WARNING: Shuttle only correctly models SeqCst atomics and treats all other Orderings as if they were SeqCst. Bugs caused by weaker orderings like Relaxed may be missed. See https://docs.rs/shuttle/*/shuttle/sync/atomic/index.html#warning-about-relaxed-behaviors for details or to disable this warning.
2024-05-02T19:42:39.728995Z  INFO {h=C}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: close time.busy=500ns time.idle=359µs
2024-05-02T19:42:39.728997Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: close time.busy=18.5µs time.idle=345µs
TestWorld random seed 5841473747150548803
2024-05-02T19:42:39.729123Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729125Z  INFO {h=A}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729132Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: close time.busy=2.71µs time.idle=173µs
2024-05-02T19:42:39.729135Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: close time.busy=5.38µs time.idle=160µs
2024-05-02T19:42:39.729138Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: close time.busy=2.29µs time.idle=161µs
2024-05-02T19:42:39.729140Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729141Z  INFO {h=A}:transport_loop{id=ShardIndex(1)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729148Z  INFO {h=A}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729149Z  INFO {h=A}:transport_loop{id=ShardIndex(2)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729156Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729157Z  INFO {h=B}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729162Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729163Z  INFO {h=B}:transport_loop{id=ShardIndex(1)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729168Z  INFO {h=B}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729170Z  INFO {h=B}:transport_loop{id=ShardIndex(2)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729176Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729177Z  INFO {h=C}:transport_loop{id=ShardIndex(0)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729184Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729185Z  INFO {h=C}:transport_loop{id=ShardIndex(1)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729191Z  INFO {h=C}: ipa_core::helpers::transport::in_memory::sharding: new
2024-05-02T19:42:39.729192Z  INFO {h=C}:transport_loop{id=ShardIndex(2)}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729493Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729499Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729533Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729840Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729845Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.729850Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.730153Z  INFO transport_loop{id=A}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.730159Z  INFO transport_loop{id=B}: ipa_core::helpers::transport::in_memory::transport: new
2024-05-02T19:42:39.730163Z  INFO transport_loop{id=C}: ipa_core::helpers::transport::in_memory::transport: new
test panicked in task 'main-thread'
failing schedule:
"
9107e50ac8c6dcd4f3eb9db51b00000000020004000000ff0a0806000c0000000000000e000a
0a00000a000a0e0000000000000000000000000000100c0000000c0000000c00100c00000000
0000000000000000000000001200000008080000000800080000001200000000000000000000
00000000000000000000000000000000000014080a0c000016180000001a00201c001e220000
2400002a2826002c00002e00323000003600ff69000000000000702c2c2c7000742c00003030
0000000030743000000000000000000000000000000000000000000000000000000078003838
7c3c00383878003c3c7c0000003c000000000000000000000000000000000000000000000000
0000000000000000000080280084282880003030308400302800000000000000000000000000
0000000000000000000000000000000000000000008c8800343434003488003c003c003c8c00
3c00000000000000000000000000000000000000000000000000000000000000000000000090
00280028000000942828902c2c002c942c000000000000000000000000000000000000000000
0000000000000000000000000098282800009c2c282c280000000000000000982c2c009c0000
000000000000000000000000000000000000000000a0340000a438383838a40000340034a000
3400000000000000000000000000000000000000000000000000000000000000000000000000
0000a8006c000000006c6ca86c00000000000000000000000000000000b00044004400ac4440
404000004400ac00b00000400000000000000000000000000000000000000000000000000000
00000000000000000000000000000000b42c2c002c0000b4b80030302c30b800000030000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000bc545454005400bc0000000000000000000000000000000000000000c000c4
4400484848000048c4004444c000440000000000000000000000000000000000000000000000
00000000000000000000000000000000000000ccc830303000cc2828300000002828c8000000
000000000000000000000000000000000000000000000000000000000000000000d05c5c5cd0
5c000000000000000000000000000000000000000000000000000000d4000000600000006060
d400600000000000000000000000d834343434000000dcd83c00003c003c3c0000dc00000000
00000000000000000000000000000000000000000000000000000000e0000000400040e40048
4848004800e440000000e0000000000000400000000000000000000000000000000000000000
00000000e84040004000e8ec404800000000004848ec00480000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000f0680000680000
686800f00000000000000000000000000000000000f40000f80038380038f800343400380000
000034f400003400000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000fc4040004000000041fc4444440044000100000000000000
0000000000000000000000000000000000000000000000000000000000000000040100000050
00005050040100500000000000000000000000000000000008010c013c3c00003c3c38000000
0038380c39080100000000000000000000000000000000000000000000000000000000000010
4d4c00004c10014c000000000000000000000000000000000000000000000000140144001801
4448000048444400001449000048001801000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000
"
pass that string to `shuttle::replay` to replay the failure
thread 'protocol::ipa_prf::shuffle::sharded::tests::empty' panicked at ipa-core/src/helpers/transport/stream/collection.rs:90:25:
assertion failed: will_wake
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   4: <ipa_core::helpers::transport::receive::LogErrors<S,T,E> as futures_core::stream::Stream>::poll_next
   5: <ipa_core::helpers::buffers::unordered_receiver::Receiver<S,C,M> as core::future::future::Future>::poll
   6: ipa_core::helpers::gateway::receive::MpcReceivingEnd<M>::receive::{{closure}}::{{closure}}
   7: ipa_core::protocol::ipa_prf::shuffle::sharded::shuffle::{{closure}}
   8: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
   9: <futures_util::future::maybe_done::MaybeDone<Fut> as core::future::future::Future>::poll
  10: <futures_util::future::join_all::JoinAll<F> as core::future::future::Future>::poll
  11: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
  12: <futures_util::stream::futures_ordered::OrderWrapper<T> as core::future::future::Future>::poll
  13: <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
  14: <futures_util::stream::futures_ordered::FuturesOrdered<Fut> as futures_core::stream::Stream>::poll_next
  15: <futures_util::stream::stream::collect::Collect<St,C> as core::future::future::Future>::poll
  16: <ipa_core::test_fixture::world::TestWorld<ipa_core::test_fixture::world::WithShards<_,D>> as ipa_core::test_fixture::world::Runner<ipa_core::test_fixture::world::WithShards<_,D>>>::semi_honest::{{closure}}
  17: ipa_core::protocol::ipa_prf::shuffle::sharded::tests::sharded_shuffle::{{closure}}
  18: shuttle::future::block_on
  19: shuttle::thread::thread_fn
  20: generator::stack::StackBox<F>::call_once
  21: generator::gen_impl::gen_init
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    protocol::ipa_prf::shuffle::sharded::tests::empty

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 30 filtered out; finished in 0.01s

error: test failed, to rerun pass `-p ipa-core --lib`
@akoshelev
Copy link
Collaborator

Can you try to pin the shuttle version 0.6 and see if it works. There was also a release of 0.7 along with rustc that could've broken it.

@andyleiserson
Copy link
Collaborator Author

It fails with either version 0.6 or 0.7 of shuttle. Unfortunately only release builds fail.

I added some debug prints to StreamCollection::add_waker that suggest it is a false positive, because the original and replacement wakers are the same until cloned in add_waker. (The pairs of hex values are the data and vtable members of the raw waker.)

2024-05-02T21:49:30.542275Z  INFO ipa_core::helpers::transport::stream::collection: add_waker(0x6000038a40e0): H2 protocol/iter0/cardinality (0x14c307ad0 0x1004ec598)
2024-05-02T21:49:30.542310Z  INFO ipa_core::helpers::transport::stream::collection: inserting waker 0x14c307ad0 0x1004ea2f0
...
2024-05-02T21:49:35.377006Z  INFO ipa_core::helpers::transport::stream::collection: add_waker(0x6000038a40e0): H2 protocol/iter0/cardinality (0x14c307ad0 0x1004ec598)
2024-05-02T21:49:35.377050Z  INFO ipa_core::helpers::transport::stream::collection: found existing waker 0x14c307ad0 0x1004ea2f0

The original and cloned vtables even have the same contents, even though they are distinct in memory:

(lldb) x/4gx 0x1004ea2f0
0x1004ea2f0: 0x000000010004e958 0x000000010004de28
0x1004ea300: 0x000000010004ea80 0x000000010004d824
(lldb) x/4gx 0x1004ec598
0x1004ec598: 0x000000010004e958 0x000000010004de28
0x1004ec5a8: 0x000000010004ea80 0x000000010004d824

@andyleiserson
Copy link
Collaborator Author

Here we go: rust-lang/rust#119863

@akoshelev
Copy link
Collaborator

akoshelev commented May 3, 2024

great catch, I guess we need to get rid of will_wake checks - they don't do us any good.

Will work on that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants