-
Notifications
You must be signed in to change notification settings - Fork 601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save observed instances of ChunkStateWitness to the database for later analysis #11137
Conversation
Save the latest observed instances of ChunkStateWitness in the database for later analysis and debugging. Sometime witness validation goes wrong, and we're not sure why. It's impossible to debug such cases without the witness, and currently there's no way to view the offending witness. Saving the witnesses in the database will allow us to inspect weird witnesses, even if the node crashes.
Saving observed witnesses could be an attack vector. Someone could send thousands of witnesses and we'd save them all, which could overload the database and cause a denial of service. We want to save invalid witnesses, so we can't really filter anything out. Let's add a config option which controls whether the witnesses are saved. It's disabled by default, as saving witnesses poses a security risk.
Add a command which allows to print observed witnesses with the given height and shard id.
I just saw that the issue description mentions that we already save witnesses in some cases x.x
Is this correct? I didn't find anything like that in the code. We save |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #11137 +/- ##
==========================================
+ Coverage 71.10% 71.15% +0.04%
==========================================
Files 773 775 +2
Lines 154176 154033 -143
Branches 154176 154033 -143
==========================================
- Hits 109622 109596 -26
+ Misses 40109 40003 -106
+ Partials 4445 4434 -11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -43,6 +43,10 @@ impl Client { | |||
transactions_storage_proof, | |||
)?; | |||
|
|||
if self.config.save_latest_witnesses { | |||
self.chain.chain_store.save_lateset_chunk_state_witness(&state_witness)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit typo, lateset->latest
pub shard_id: u64, | ||
/// Each witness has a random UUID to ensure that the key is unique. | ||
/// It allows to store multiple witnesses with the same height and shard_id. | ||
pub random_uuid: [u8; 16], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be the account id of the chunk producer that sent the witness? from chunk producer side it will be single, from chunk validator side it should expect a single witness from each chunk producer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should include epoch_id in the key as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but I'm not sure if there's any benefit to it.
Usually there'll be only one witness per (height, shard_id). The situations where there is more than one will be weird, where anything can happen, including sending multiple witnesses by the same chunk producer.
For debugging I think it's best to save everything that we receive, without deduplication. If I were debugging witness issues I'd like to know that the chunk producer is sending duplicate messages, and deduplicating by account id would hide that from me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, yes, using account id would hide the case of the same chunk producer sending duplicate witnesses. and witness already has the account it for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should include epoch_id in the key as well
I think it's impossible to have multiple possible epochs at a height. There's exactly one valid block per height, and this block will always have a clearly defined epoch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but we could receive witnesses on different epochs. I can add an epoch filter as well.
|
||
impl LatestWitnessesKey { | ||
/// `LatestWitnessesKey` has custom serialization to ensure that the binary representation | ||
/// starts with big-endan height and shard_id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit typo in big-endan
|
||
// Go over witnesses with increasing (height, shard_id) and remove them until the limits are satisfied. | ||
// Height and shard id are stored in big-endian representation, so sorting the binary representation is | ||
// the same as sorting the integeres. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit typo in integeres
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh crap, I forgot to post the comments... Had reviewed this some time ago.
pub shard_id: u64, | ||
/// Each witness has a random UUID to ensure that the key is unique. | ||
/// It allows to store multiple witnesses with the same height and shard_id. | ||
pub random_uuid: [u8; 16], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub shard_id: u64, | ||
/// Each witness has a random UUID to ensure that the key is unique. | ||
/// It allows to store multiple witnesses with the same height and shard_id. | ||
pub random_uuid: [u8; 16], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should include epoch_id in the key as well
@@ -43,6 +43,10 @@ impl Client { | |||
transactions_storage_proof, | |||
)?; | |||
|
|||
if self.config.save_latest_witnesses { | |||
self.chain.chain_store.save_lateset_chunk_state_witness(&state_witness)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also be calling this function at process_chunk_state_witness
where we first receive the state witness from an external chunk producer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also called in process_chunk_state_witness
witness: &ChunkStateWitness, | ||
) -> Result<(), std::io::Error> { | ||
let serialized_witness = borsh::to_vec(witness)?; | ||
let serialized_witness_size: u64 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dumb question, why aren't we working with usize here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to serialize the size, so I wanted it to have a clearly defined representation. The exact type of usize
depends on the architecture, and I didn't want to do that in serialized data.
I guess it doesn't really matter, because no one is going to move the saved witnsses data to a different arch, but it feels more proper to use u64
.
I just learned that there's a |
There should be a return there, we don't want to save big witnesses.
/// `LatestWitnessesKey` has custom serialization to ensure that the binary representation | ||
/// starts with big-endian height and shard_id. | ||
/// This allows to query using a key prefix to find all witnesses for a given height (and shard_id). | ||
pub fn serialized(&self) -> [u8; 64] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need manual ser/deser instead of just using borsh?
from what I understand we use borsh whenever possible when encoding data in the db
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want the rows to be ordered by (height, shard_id) in big endian representation. I'm not sure if borsh would do that, it was quicker and easier to manually serialize it.
Save all of the observed instances of
ChunkStateWitness
to the database.They can later be fetched for debugging and analysis.
Sometimes things go wrong with witness validation, and we're not sure why. In such cases it would be good to take a look at the witness that failed the validation, but currently there's no way to see the witness, it disappears after validation. Saving the witnesses in the database will allow us to inspect the witnesses after it failed validation and see what exactly is wrong, even after node crash.
Adds a new database column
DBCol::LatestChunkStateWitnesses
, which keeps a set of latest observed witnesses. Size of this set is limited to4GB
and60*30 (30 min)
of instances. When the limit is hit, the witness with the lowest height is removed from the set.We can't really save all witnesses, at 4MB/s that would add up to 345GB/day, so we're forced to garbage collect the old witnesses. Having only the latest witnesses should be enough for debugging and analysis.
Witness saving could potentially be an attack vector, as we save all witnesses, even the ones that failed validation. An attacker could spam the node with thousands of witnesses, which would all be saved to the database, which could cause denial of service.
Because of that the feature is guarded by a new config option:
save_latest_witnesses
. By default it's false, so production nodes won't save anything, and won't be vulnerable to such attacks. It can be selectively enabled on test/canary nodes when needed.A new database command is added to access the saved witnesses:
The tool allows to print observed witnesses with given height (and optionally shard id).
Either as a debug print
{:?}/{:#?}
, or as a binary blob that can be pasted into rust code.I'm not sure if that's the best way to expose it for debugging, but it was the easiest one to implement. Maybe it should somehow be integrated with
debug-ui
? I'm not very familiar with it, idk if it'd make sense.Fixes: #11110
Similar to: #10599