Multithreaded code that calls `Stronghold::load_client` panics in various ways #353

PhilippGackstatter · 2022-05-04T16:04:30Z

Bug description

Multi-threaded code that uses Stronghold::load_client panics in various ways, depending on some parameters.

Rust version

Which version of Rust are you running?

Rust version: 1.60.0

Stronghold version

Which version of Stronghold are you using?

Current dev-refactor branch, commit: 629466da.

Hardware specification

What hardware are you using?

Operating system: Ubuntu 20.04

If hardware details are important, I'm happy to provide them.

Steps To reproduce the bug

Explain how the maintainer can reproduce the bug.

Run the following test multiple times. On my machine, the results were different across executions.

#[test]
fn test_stronghold_multi_threading() {
    let client_path = b"client_path".to_vec();
    let client_path2 = b"client_path2".to_vec();

    let stronghold1 = Stronghold::default();

    // Create both clients and persist them so they can be loaded later.
    let _client = stronghold1.create_client(&client_path).unwrap();
    let _client2 = stronghold1.create_client(&client_path2).unwrap();

    stronghold1.write_client(&client_path).unwrap();
    stronghold1.write_client(&client_path2).unwrap();

    // Second handle for the second thread.
    let stronghold2 = stronghold1.clone();

    // Each thread repeatedly loads its own client and writes to its store.
    let t1 = std::thread::spawn(move || {
        for _ in 0..20 {
            let cl = stronghold1.load_client(&client_path).unwrap();
            cl.store().insert(b"test".to_vec(), b"value".to_vec(), None).unwrap();
        }
    });

    let t2 = std::thread::spawn(move || {
        for _ in 0..20 {
            let cl = stronghold2.load_client(&client_path2).unwrap();
            cl.store().insert(b"test".to_vec(), b"value".to_vec(), None).unwrap();
        }
    });

    t1.join().unwrap();
    t2.join().unwrap();
}

Expected behaviour

Test always finishes successfully, i.e. without panics.

Actual behaviour

I observed 5 different results:

1. Test binary finishes successfully (exit code 0)
2. thread '<unnamed>' panicked at 'called Result::unwrap() on an Err value: LockAcquireFailed', client/src/tests/interface_tests.rs:162:60
3. thread '<unnamed>' panicked at 'Releases exceeded retains', engine/runtime/src/boxed.rs:188:9
4. Caused by: process didn't exit successfully: ~/git/stronghold.rs/target/debug/deps/iota_stronghold-18f1a25c50491033 test_stronghold_multi_threading --nocapture (signal: 11, SIGSEGV: invalid memory reference)
5. fish: Job 1, './target/debug/deps/iota_strong…' terminated by signal SIGSEGV (Address boundary error)

Regarding case 2: The current Stronghold code uses try_lock everywhere, which means that if a lock cannot be acquired immediately, an error is returned to the user. In my opinion, this should be changed to a lock call, i.e. the call should block until the lock can be acquired. It should not be the Stronghold user's responsibility to implement retry behaviour. This error was the original reason for writing the above test, which then turned out to produce the other results, too.
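To illustrate the difference between the two calls, here is a minimal sketch using only std::sync::Mutex. It is not Stronghold code: the functions increment_try_lock and increment_lock and the error strings are hypothetical and only mimic the LockAcquireFailed behaviour described above.

```rust
use std::sync::{mpsc, Arc, Mutex, TryLockError};
use std::thread;

/// Non-blocking access: fails right away if another thread holds the lock.
/// The error strings are made up for this sketch.
fn increment_try_lock(counter: &Mutex<u32>) -> Result<u32, &'static str> {
    match counter.try_lock() {
        Ok(mut guard) => {
            *guard += 1;
            Ok(*guard)
        }
        Err(TryLockError::WouldBlock) => Err("LockAcquireFailed"),
        Err(TryLockError::Poisoned(_)) => Err("LockPoisoned"),
    }
}

/// Blocking access: waits until the lock becomes available.
fn increment_lock(counter: &Mutex<u32>) -> Result<u32, &'static str> {
    let mut guard = counter.lock().map_err(|_| "LockPoisoned")?;
    *guard += 1;
    Ok(*guard)
}

fn main() {
    let counter = Arc::new(Mutex::new(0u32));
    let (held_tx, held_rx) = mpsc::channel::<()>();
    let (release_tx, release_rx) = mpsc::channel::<()>();

    // Another thread takes the lock and keeps it until told to release it.
    let holder = {
        let counter = Arc::clone(&counter);
        thread::spawn(move || {
            let _guard = counter.lock().unwrap();
            held_tx.send(()).unwrap();
            release_rx.recv().unwrap();
        })
    };

    held_rx.recv().unwrap(); // the lock is now held by `holder`

    // try_lock does not wait, so the caller gets an error under contention.
    let contended = increment_try_lock(&counter);

    // lock simply blocks until `holder` drops its guard.
    release_tx.send(()).unwrap();
    let blocking = increment_lock(&counter);

    holder.join().unwrap();
    println!("try_lock: {:?}", contended);
    println!("lock:     {:?}", blocking);
}
```

With try_lock, every caller has to decide what to do on contention (retry, back off, give up), whereas lock hides that behind a blocking wait, at the cost of possible deadlocks if locks are taken in inconsistent order.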

The issue seems to come from Stronghold::load_client invocations, and in turn from the Snapshot::get_state call, which calls KeyStore::get_key. I think the issue originates somewhere in that code area, though I did not track it any further. The call to the store can be replaced with a procedure execution and results in the same behaviour, so that specific line is most likely not the cause of the errors.


felsweg-iota · 2022-05-23T07:55:16Z

Hey @PhilippGackstatter thanks for reporting this bug.

As we discussed earlier, using mutexes / locks was intended to be only a temporary solution. try_lock() was used as a mechanism to always fail with an error instead of running into deadlocks with lock().

thread '<unnamed>' panicked at 'Releases exceeded retains', engine/runtime/src/boxed.rs:188:9
This is a race condition.

With Rust >= 1.61.x, the currently employed std::sync::Mutex is denied by clippy, because its MutexGuard is held across an await point.

Solving this bug would require either a proof of the absence of deadlocks or finalizing the STM-based approach.
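For readers unfamiliar with the clippy complaint mentioned above (presumably the clippy::await_holding_lock lint), the following is a minimal sketch of the usual way to avoid it: confine the MutexGuard to a block that ends before the await. Nothing here is taken from Stronghold; block_on, NoopWaker, YieldOnce and increment are hypothetical helpers written only so that the example runs with the standard library alone.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; sufficient for a busy-polling executor.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, only for this illustration.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Future that returns Pending once, standing in for real async work.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Increments the counter without keeping the guard alive across the await.
async fn increment(counter: &Mutex<u32>) -> u32 {
    let value = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // guard is dropped here, before the await point

    YieldOnce(false).await;
    value
}

fn main() {
    let counter = Mutex::new(0);
    let value = block_on(increment(&counter));
    println!("counter after one increment: {value}");
}
```

If the guard were instead bound in the function body and still alive at the await, the lock would stay held while the task is suspended, which is the situation the lint warns about.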

PhilippGackstatter · 2022-10-13T10:42:11Z

I can confirm the test I attached in the issue runs (with minimal changes due to the changed interface, i.e. replacing load_client with get_client) and the example that we deactivated due to this issue also runs. Thank you!

PhilippGackstatter added the bug (Something isn't working) and rust (Pull requests that update Rust code) labels May 4, 2022

PhilippGackstatter mentioned this issue May 4, 2022

Upgrade to new Stronghold interface iotaledger/identity.rs#787

Merged

10 tasks

felsweg-iota self-assigned this May 12, 2022

PhilippGackstatter mentioned this issue May 23, 2022

Add Diffie-Hellman key exchange for encryption to Account iotaledger/identity.rs#809

Merged

10 tasks

vuongDang mentioned this issue Oct 12, 2022

Concurrency with locks #441

Merged

7 tasks

felsweg-iota closed this as completed Oct 24, 2022
