-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store proof info (keys, uptime, version) directly in LMDB #952
Conversation
It looks like there are several issues. First we were restoring the keys from state_archive which has old proof information (10k blocks old). But supposing we load the latest proof info, that information that get serializeds to DB is actually 12hrs old (since we re-derive everything from 12hrs back), if you shut down and restart your node. It also seems like a commit got lost somewhere, I thought I remember you deleting the checkpointing part in this function. But right now we're still only storing quorums from the
So actually, the proof information being stored is probably only 12 blocks old. The missing x25519 pubkeys might be from restoring from the archive and not the latest proof. I don't think the moving of x25519 map init to the end of the init function is the fix, rather that it was restoring from the wrong state and also this decouples the x25519 from the load step which doesn't feel right. |
I think we should just move to proof info to a completely separate database; it doesn't really have much to do with state and isn't something that we ever care about more than the latest value (also it isn't affected by rollbacks). |
ab7c98e
to
a01036c
Compare
Mostly rewrote this PR to remove proofs from stored states: This extracts uptime proof data entirely from service node states,
and update it every time a proof is received. Upon restart, we load all This separation required some restructuring: most notably changing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just not too sure about restoring the uptime proofs. Reason being we didn't trust that the P2P layer is a fully connected graph, so some nodes might never see the uptime proof at all- so we made it relay aggressively in general.
I don't know. It's hard to say, the only way I'd really trust it is when we convert the system to use your zmq layer.
src/blockchain_db/lmdb/db_lmdb.cpp
Outdated
|
||
typedef mdb_block_info_3 mdb_block_info; | ||
static_assert(sizeof(mdb_block_info_3) == sizeof(mdb_block_info_1) + 16, "unexpected mdb_block_info struct sizes"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we don't even need old migration code and the various version, v3 is more than 2 hard forks old at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's there already I'm kind of tempted to leave it in, mainly because it won't break the case where someone finds some ancient lmdb they forgot about and tries to update lokid. Not too concerned about that, but like I said, the support is already there.
: height{state.height} | ||
, key_image_blacklist{std::move(state.key_image_blacklist)} | ||
, only_loaded_quorums{state.only_stored_quorums} | ||
, block_hash{state.block_hash} | ||
, sn_list{snl} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the blockchain dependency in the constructor in #956 I'd prefer we don't give state_t access to the service node list if we don't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really going to be feasible with a much more significant rearchitecting: state_t
is the place where we process state changes and registrations but it needs to be able to get at proof data to update it. Taking it in the constructor like this is slightly cleaner than the alternative of needing to take it as an argument in multiple methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally processing of state changes should be done in service_node_list, which then sets up the state_t as needed, but like I said, that's a bigger change than I wanted to introduce at this point.
I've added a commit to restore that aggressive sending by clearing our own timestamp when loading. |
Various code improvements to LMDB code (no logic changes): - compile less code by moving the cursor macros (which are used often) into local functions - removes the TXN_POSTFIX_RDONLY macro because it does exactly nothing - remove useless `inline` from various functions in C++ only has one purpose: to tell the compiler that it's okay to have multiple definitions (and just throw the extra away when linking). It is not used by any modern C++ compiler to actually decide when to inline, and so has no use in a .cpp file. (Suspect this was written by a C programmer who doesn't fully understand C++). - Remove the C `typedef struct blah { ... } blah;` idiom which has no purpose in C++. - Use inheritance on the nearly-duplicate `mdb_block_info_{1,2,3} structs rather than repeating all the field in each one. This also removed a completely unneeded offsetof into them that was being used to avoid having to learn C++. (Also adds a static_assert in case anyone ever changes some contained type and breaks things). - Fix bi_weight comment about why it is really uin64_t - Add a missing method `override`. - Replace "22" buried in the code with a constant indicating what it is for. (Oh, reader, you wanted to know what it's used for? It's the number of different databases allowed in the LMDB) - Move `m_cur_whatever` macros into the .cpp file, and avoid using them in any code we've added in loki. These are incredibly useless macros: they are basically just defined to slightly shorten a name. I'm guessing that, at some point, these members got moved and rather than fix the remaining code someone defined a bunch of macros to deal with the rename. Yuck. (Getting rid of *all* of them (i.e. replacing all `m_cur_whatever` with `m_cursors->whatever` would be preferable, but would make future merge conflicts more painful, so I left the macros as is for the upstream databases but removed our service_node_data one.) - Rename `mdb_txn_cursors` members to drop the leading `m_txc_` prefix. When the struct is nothing more than a POD collection of values the prefix makes no sense at all. This allows us the write `m_cursors->service_node_data` instead of `m_cursors->m_txc_service_node_data` (or the horrible macro, above, which seems designed to just macro around this naming horror).
Passing by reference is a more natural C++ way of doing this. Keeps pointer constructors for compatibility.
Looking up by height currently goes via a mostly-empty implicitly created state_t, which is wasteful but was necessary with C++11. C++14 `std::set`s let us avoid this by making `state_t` and heights directly comparable for set lookups.
This extracts uptime proof data entirely from service node states, instead storing (some) proof data as its own first-class object in the code and backed by the database. We now persistently store: - timestamp - version - ip & ports - ed25519 pubkey and update it every time a proof is received. Upon restart, we load all the proofs from the database, which means we no longer lose last proof received times on restart, and always have the most recently received ip and port information (rather than only having whatever it was in the most recently stored state, which could be ~12 blocks ago). It also means we don't need to spam the network with proofs for up to half an hour after a restart because we now know when we last sent (and received) our own proof before restarting. This separation required some restructuring: most notably changing `state_t` to be constructed with a `service_node_list` pointer, which it needs both directly and for BlockchainDB access. Related to this is also eliminating the separate BlockchainDB pointer stored in service_node_list in favour of just accessing it via m_blockchain (which now has a new `has_db()` method to detect when it has been initialized).
This would wipe out proof data if the rescan crossed a registration, and wipe out the timestamp if it crossed a decommission. We don't want to do either when rescanning: our latest proof data would have already been reset the first time we cross the registration/decomm transactions.
This renames mdb_block_info_3 to mdb_block_info so that _1 and _2 are the old versions (you can't predeclare a typedef), and use it rather than a void pointer in the 64bit block value fetcher.
Also import boost::endian namespace because the boost::endian's throughout the file were needlessly verbose (the little_to_native calls are already obvious).
This restores the more aggressive sending of uptime proofs after a restart.
521a927
to
d0abfdd
Compare
Rebased and resolved conflicts (and squashed the last couple fixes into the previous commits) |
Edit: see revamped version below.
The map was being initialized too early, from before recent state proofdata had been set, and so on startup quorumnet SN authentication was
failing to recognize incoming SN connections until a proof had been
received from the remote SN.
This also adds a shared mutex around the x25519 map access.