Skip to content

Replay-immune votes#2087

Closed
achamayou wants to merge 13 commits into
microsoft:masterfrom
achamayou:replay_immune_proposal_id
Closed

Replay-immune votes#2087
achamayou wants to merge 13 commits into
microsoft:masterfrom
achamayou:replay_immune_proposal_id

Conversation

@achamayou
Copy link
Copy Markdown
Member

Draft open for discussion. Final part of #1649.

This is using only a digest of the request, not bound to the root of the tree. There is a difficulty in obtaining past_root(read_version) in a thread-safe way: compaction may have occurred between taking read_version and calling past_root() on the history.

I am not clear on the best way to solve this, my intuition is that it is necessary to create a variant of the initial get_view() call that atomically marks the version in the history as being a compaction watermark, and to release that watermark once past_root() has been called. That's fairly awkward, because the initial get_view() happens deep in our code, before the user transaction. In the case of BFT, it's even worse, the read version is set before we can possibly know this is a "propose", on the initial read from the request table.

Ideas would be most welcome.

@eddyashton
Copy link
Copy Markdown
Member

We should add a note in the CHANGELOG.

For root retrieval:

  • Since we already store a range of recent nodes (for receipt retrieval), maybe we can rely on this rather than additional locking? ie - assume that by the time a proposal gets to this point, the flushed index won't be MAX_HISTORY_LEN ahead of this transaction's read index, so the past root will still be retrievable. I don't know what the safety net is when this heuristic fails - I guess if past_root says "that's too early", this transaction needs to throw an "I feel like I've conflicted please retry me" exception? But this feels like opening a dangerous situation where the service is so busy (and making constant progress on business transactions!) that you can't open any proposals...
  • If we're trying to lock versions in the history so they won't be compacted, my gut says we'd need to do it on every transaction, rather than trying to work out which are proposals before doing any reads. Maybe we can identify proposals early in the execution (ie - just after reading from the requests table in the BFT case), and try to lock a version then (shrinking the gap before we call past_root so it hopefully isn't compacted, and only doing the lock for the few transactions that need it), but this would still be best-effort and need error handling like above for when the tree had been flushed in the interim.

I'll keep thinking about this.

@achamayou
Copy link
Copy Markdown
Member Author

  • Add note in CHANGELOG.md

Relaxing/changing the history interval is good, but only helps probabilistically as you rightly point out. Conflicting on failure in this case is awkward because there is no guarantee the retry will succeed. In pure CFT, it might be ok to do that as a first pass and then do something more expensive the second time, but that will need to happen every time on backups in BFT.

I will set up a meeting to discuss this further, but please feel free to mention any ideas here too.

@achamayou achamayou changed the title Replay immune proposal Replay-immune proposals Jan 18, 2021
@ghost
Copy link
Copy Markdown

ghost commented Jan 18, 2021

replay_immune_proposal_id@17675 aka 20210118.19 vs master ewma over 50 builds from 17031 to 17654
images

@eddyashton
Copy link
Copy Markdown
Member

I think there's a simpler answer here. Rather than trying to intercept the first call to get_view (where the read version is currently set), we can just add a new call on the Tx, immediately after its constructed but before we've done anything else to it, which both gets the root and sets the read version. The Txs are now always constructed inside frontend.h, when we have an RpcContext to query to distinguish proposals. Something like:

// In frontend.h, everywhere we call `create_tx()`
auto tx = tables.create_tx();
if (request_needs_iniital_merkle_root(ctx))
{
  tx.fetch_merkle_root(); // Maybe needs to take history as an arg? 
}

...

bool request_needs_initial_merkle_root(...)
{
  return ctx->get_request_path().ends_with("proposal");
}

...

const auto tx_read_merkle_root = tx.get_read_merkle_root();
if (!tx_read_merkle_root.has_value())
{
 // Uh oh
}
proposal_id = fmt::format(...);

@achamayou
Copy link
Copy Markdown
Member Author

@eddyashton that sounds good, there are a few implications:

  1. We need a lock on history
  2. History needs to keep track of the current version, because we don't want a lock that also covers the store to be able to grab this in sync
  3. BFT backups need to reset the read version after they've deserialised the request (but they probably do that already?). I don't know how we can make sure they haven't truncated their tree though.

@achamayou
Copy link
Copy Markdown
Member Author

After discussion, let's go with this with the twist that you proposed: add a read dependency on signatures. That way we're guaranteed never to have a signature (and therefore a possible compaction) between our read_version and our commit version. This should hold even on BFT backups, without having to bundle any additional meta data.

@achamayou achamayou changed the title Replay-immune proposals Replay-immune votes Jan 21, 2021
@achamayou
Copy link
Copy Markdown
Member Author

achamayou commented Jan 22, 2021

I am going to split this change in at least the following items:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants