Reproduce deadlock #3995
Conversation
Force-pushed from `ce5bb52` to `5580add`
```diff
@@ -804,6 +804,7 @@ impl JsonRpcClient for WsClient {
         return WsClientBuilder::default()
             .use_webpki_rustls()
             .max_concurrent_requests(u16::MAX as usize)
+            .request_timeout(Duration::from_secs(3600))
```
maybe this is just making it really slow?
```rust
let db = Database::new(
    fedimint_rocksdb::RocksDb::open(tempfile::tempdir().unwrap()).unwrap(),
    decoders,
);
```
Can't repro with only the client db changed (with server memdb), and I thought this was a client-side deadlock; maybe it is server-side?

This could be a separate deadlock, not the same one as #3989.
This looks like … Will keep debugging.
I've pin-pointed the problem to AlephBFT hanging in:

```rust
impl std::io::Write for UnitSaver {
    fn write(&mut self, buffer: &[u8]) -> std::io::Result<usize> {
        self.buffer.extend_from_slice(buffer);
        Ok(buffer.len())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        info!(units_index = %self.units_index, "##### UNIT FLUSH START");
        block_on(async {
            let mut dbtx = self.db.begin_transaction().await;
            dbtx.insert_new_entry(&AlephUnitsKey(self.units_index), &self.buffer)
                .await;
            dbtx.commit_tx_result()
                .await
                .expect("This is the only place where we write to this key");
        });
        info!(units_index = %self.units_index, "##### UNIT FLUSH END");
        self.buffer.clear();
        self.units_index += 1;
        Ok(())
    }
}
```

It is using … I'm pretty sure this …

Anyway, I think it makes sense: the only thing that this PR does is add …

The original purpose of this PR was to show that there are some problems with the database that are not about a wrong implementation of MVCC in `MemDatabase`, but I think the above goes to show that this is not the case (at least not here).
Negative. `yield_now` is a very simple future implementation that will be pending once, then always ready after that. There is nothing tokio-specific there. This issue isn't an incompatibility between async runtimes, because there is only one runtime in this project: tokio.
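To illustrate the "pending once, then always ready" behavior described above, here is a hand-rolled yield-style future plus a minimal parking executor, using only the standard library (all names here are my own; this is a sketch of the semantics, not tokio's actual implementation):

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// A yield_now-style future: Pending on the first poll (after waking itself
// so it gets rescheduled), Ready on every poll after that.
struct YieldNow {
    polls: u32,
}

impl Future for YieldNow {
    type Output = u32; // report how many times we were polled

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.polls == 1 {
            cx.waker().wake_by_ref(); // ask the executor to poll us again
            Poll::Pending
        } else {
            Poll::Ready(self.polls)
        }
    }
}

// Minimal single-future executor: poll, park until woken, repeat.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(), // unpark() is sticky, so no lost wakeup
        }
    }
}

fn main() {
    let polls = block_on(YieldNow { polls: 0 });
    assert_eq!(polls, 2); // pending once, then ready
    println!("yield_now-style future completed after {polls} polls");
}
```

Note there is indeed nothing runtime-specific in the future itself: it only talks to whatever executor polls it through the `Waker` in its `Context`.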
Edit: the below is wrong, I think. The … The log line before that …
@dpc so you tried this approach, I assume? (`fedimint/devimint/src/external.rs`, lines 208 to 214 at 9c36777)
Still relevant?
No, this was related to the AlephBFT sync/async bug.
This is an attempt to reproduce the deadlock issue that arose in #3989.
We can see it's possible to reproduce the issue with 2 small changes from master:

- `MemDatabase` to `RocksDb`
- `task::yield_now()` to `commit_tx()`
I also increased the timeout of the connection requests just to avoid noise in the logs due to the low 60s default timeout.
One way to see a test deadlocking for debug purposes is to run:

```shell
RUST_LOG=trace,hyper=info,soketto=info,fedimint_server::net::framed=debug cargo test --package fedimint-mint-tests --test fedimint_mint_tests -- sends_ecash_out_of_band --exact --nocapture
```
The goal of this PR is just to show that there is some issue in the Fedimint db layer that isn't exclusive to `MemDatabase` and that can be reproduced without adding new tests. Created an issue to track it: #3995