Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreaded code that calls Stronghold::load_client panics in various ways #353

Closed
PhilippGackstatter opened this issue May 4, 2022 · 2 comments
Assignees
Labels
bug Something isn't working rust Pull requests that update Rust code

Comments

@PhilippGackstatter
Copy link
Collaborator

Bug description

Multi-threaded code that uses Stronghold::load_client panics in various ways, depending on some parameters.

Rust version

Which version of Rust are you running?

  • Rust version: 1.60.0

Stronghold version

Which version of Stronghold are you using?

Current dev-refactor branch, commit: 629466da.

Hardware specification

What hardware are you using?

  • Operating system: Ubuntu 20.04

If hardware details are important, I'm happy to provide them.

Steps To reproduce the bug

Explain how the maintainer can reproduce the bug.

Run the following test multiple times. On my machine, the results were different across executions.

#[test]
fn test_stronghold_multi_threading() {
    let client_path = b"client_path".to_vec();
    let client_path2 = b"client_path2".to_vec();

    let stronghold1 = Stronghold::default();

    let client = stronghold1.create_client(&client_path).unwrap();
    let client2 = stronghold1.create_client(&client_path2).unwrap();

    stronghold1.write_client(&client_path).unwrap();
    stronghold1.write_client(&client_path2).unwrap();

    let stronghold2 = stronghold1.clone();

    let t1 = std::thread::spawn(move || {
        for i in 0..20 {
            let cl = stronghold1.load_client(&client_path).unwrap();
            cl.store().insert(b"test".to_vec(), b"value".to_vec(), None).unwrap();
        }
    });

    let t2 = std::thread::spawn(move || {
        for i in 0..20 {
            let cl = stronghold2.load_client(&client_path2).unwrap();
            cl.store().insert(b"test".to_vec(), b"value".to_vec(), None).unwrap();
        }
    });

    t1.join().unwrap();
    t2.join().unwrap();
}

Expected behaviour

Test always finishes successfully, i.e. without panics.

Actual behaviour

I observed 5 different results:

  1. Test binary finishes successfully (exit code 0)
  2. thread '' panicked at 'called Result::unwrap() on an Err value: LockAcquireFailed', client/src/tests/interface_tests.rs:162:60
  3. thread '' panicked at 'Releases exceeded retains', engine/runtime/src/boxed.rs:188:9
  4. Caused by: process didn't exit successfully: ~/git/stronghold.rs/target/debug/deps/iota_stronghold-18f1a25c50491033 test_stronghold_multi_threading --nocapture (signal: 11, SIGSEGV: invalid memory reference)
  5. fish: Job 1, './target/debug/deps/iota_strong…' terminated by signal SIGSEGV (Address boundary error)

Regarding case 2: The current stronghold code uses try_lock everywhere, which means that if a lock cannot be acquired immediately, an error is returned to a user. In my opinion, this should be changed to a lock call, i.e. the call should block until the lock can be acquired. It should not be the stronghold user's responsibility to implement retry behaviour. This error was the original cause for writing the above test, which turned out to result in other behaviour, too.

The issue seems to come from Stronghold::load_client invocations, and in turn from the Snapshot::get_state call, which calls KeyStore::get_key. I think somewhere in that code area, the issue originates from, though I did not track it any further. The call to the store can be replaced with a procedure execution and will result in the same behaviour, so that specific line is most likely not the cause of the errors.

@PhilippGackstatter PhilippGackstatter added bug Something isn't working rust Pull requests that update Rust code labels May 4, 2022
@felsweg-iota felsweg-iota self-assigned this May 12, 2022
@felsweg-iota
Copy link
Contributor

Hey @PhilippGackstatter thanks for reporting this bug.

As we discussed earlier, using mutexes / locks was intended to be only a temporary solution. try_lock() was used as an mechanism to always fail with an error, instead of running into deadlocks with lock()

thread '' panicked at 'Releases exceeded retains', engine/runtime/src/boxed.rs:188:9
This is a race condition.

With rust >= 1.61.x the currently employed std::sync::Mutex are denied by clippy, because this MutexGuard is held across an await point.

Solving this bug would either require proof of absence of dead locks, or finalizing the stm based approach.

@PhilippGackstatter
Copy link
Collaborator Author

I can confirm the test I attached in the issue runs (with minimal changes due to the changed interface, i.e. replacing load_client with get_client) and the example that we deactivated due to this issue also runs. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working rust Pull requests that update Rust code
Projects
None yet
Development

No branches or pull requests

2 participants