Store proof info (keys, uptime, version) directly in LMDB #952

jagerman · 2019-12-04T20:54:27Z

Edit: see revamped version below.

The map was being initialized too early, from before recent state proof
data had been set, and so on startup quorumnet SN authentication was
failing to recognize incoming SN connections until a proof had been
received from the remote SN.

This also adds a shared mutex around the x25519 map access.

Doy-lee · 2019-12-04T23:12:04Z

It looks like there are several issues. First we were restoring the keys from state_archive which has old proof information (10k blocks old). But supposing we load the latest proof info, that information that get serializeds to DB is actually 12hrs old (since we re-derive everything from 12hrs back), if you shut down and restart your node.

It also seems like a commit got lost somewhere, I thought I remember you deleting the checkpointing part in this function. But right now we're still only storing quorums from the last immutable checkpoint -> chain height in short_term_state_cull_height

  static uint64_t short_term_state_cull_height(uint8_t hf_version, cryptonote::BlockchainDB const *db, uint64_t block_height)
  {
    size_t constexpr DEFAULT_SHORT_TERM_STATE_HISTORY = 6 * STATE_CHANGE_TX_LIFETIME_IN_BLOCKS;
    static_assert(DEFAULT_SHORT_TERM_STATE_HISTORY >= BLOCKS_EXPECTED_IN_HOURS(12), // Arbitrary, but raises a compilation failure if it gets shortened.
        "not enough short term state storage for blink quorum retrieval!");
    uint64_t result =
        (block_height < DEFAULT_SHORT_TERM_STATE_HISTORY) ? 0 : block_height - DEFAULT_SHORT_TERM_STATE_HISTORY;

    if (hf_version >= cryptonote::network_version_13_enforce_checkpoints)
    {
      uint64_t latest_height = db->height() - 1;
      cryptonote::checkpoint_t checkpoint;
      if (db->get_immutable_checkpoint(&checkpoint, latest_height))
        result = std::min(result, checkpoint.height - 1);
    }
    return result;
  }

So actually, the proof information being stored is probably only 12 blocks old.

The missing x25519 pubkeys might be from restoring from the archive and not the latest proof. I don't think the moving of x25519 map init to the end of the init function is the fix, rather that it was restoring from the wrong state and also this decouples the x25519 from the load step which doesn't feel right.

jagerman · 2019-12-05T00:40:07Z

I think we should just move to proof info to a completely separate database; it doesn't really have much to do with state and isn't something that we ever care about more than the latest value (also it isn't affected by rollbacks).

jagerman · 2019-12-08T17:48:23Z

Mostly rewrote this PR to remove proofs from stored states:

This extracts uptime proof data entirely from service node states,
instead storing (some) proof data as its own first-class object in the
code and backed by the database. We now persistently store:

timestamp
version
ip & ports
ed25519 pubkey

and update it every time a proof is received. Upon restart, we load all
the proofs from the database, which means we no longer lose last proof
received times on restart, and always have the most recently received ip
and port information (rather than only having whatever it was in the
most recently stored state, which could be ~12 blocks ago). It also
means we don't need to spam the network with proofs for up to half an
hour after a restart because we now know when we last sent (and
received) our own proof before restarting.

This separation required some restructuring: most notably changing
state_t to be constructed with a service_node_list pointer, which it
needs both directly and for BlockchainDB access. Related to this is
also eliminating the separate BlockchainDB pointer stored in
service_node_list in favour of just accessing it via m_blockchain (which
now has a new has_db() method to detect when it has been initialized).

Doy-lee

I'm just not too sure about restoring the uptime proofs. Reason being we didn't trust that the P2P layer is a fully connected graph, so some nodes might never see the uptime proof at all- so we made it relay aggressively in general.

I don't know. It's hard to say, the only way I'd really trust it is when we convert the system to use your zmq layer.

Doy-lee · 2019-12-08T23:38:07Z

src/blockchain_db/lmdb/db_lmdb.cpp


-typedef mdb_block_info_3 mdb_block_info;
+static_assert(sizeof(mdb_block_info_3) == sizeof(mdb_block_info_1) + 16, "unexpected mdb_block_info struct sizes");


Technically we don't even need old migration code and the various version, v3 is more than 2 hard forks old at this point.

Since it's there already I'm kind of tempted to leave it in, mainly because it won't break the case where someone finds some ancient lmdb they forgot about and tries to update lokid. Not too concerned about that, but like I said, the support is already there.

src/blockchain_db/lmdb/db_lmdb.cpp

src/cryptonote_core/service_node_list.cpp

Doy-lee · 2019-12-09T01:00:13Z

src/cryptonote_core/service_node_list.cpp

  : height{state.height}
  , key_image_blacklist{std::move(state.key_image_blacklist)}
  , only_loaded_quorums{state.only_stored_quorums}
  , block_hash{state.block_hash}
+  , sn_list{snl}


I've removed the blockchain dependency in the constructor in #956 I'd prefer we don't give state_t access to the service node list if we don't have to.

It's not really going to be feasible with a much more significant rearchitecting: state_t is the place where we process state changes and registrations but it needs to be able to get at proof data to update it. Taking it in the constructor like this is slightly cleaner than the alternative of needing to take it as an argument in multiple methods.

Ideally processing of state changes should be done in service_node_list, which then sets up the state_t as needed, but like I said, that's a bigger change than I wanted to introduce at this point.

jagerman · 2019-12-09T18:49:43Z

I'm just not too sure about restoring the uptime proofs. Reason being we didn't trust that the P2P layer is a fully connected graph, so some nodes might never see the uptime proof at all- so we made it relay aggressively in general.

I've added a commit to restore that aggressive sending by clearing our own timestamp when loading.

src/blockchain_db/lmdb/db_lmdb.cpp

src/cryptonote_core/service_node_list.cpp

Various code improvements to LMDB code (no logic changes): - compile less code by moving the cursor macros (which are used often) into local functions - removes the TXN_POSTFIX_RDONLY macro because it does exactly nothing - remove useless `inline` from various functions in C++ only has one purpose: to tell the compiler that it's okay to have multiple definitions (and just throw the extra away when linking). It is not used by any modern C++ compiler to actually decide when to inline, and so has no use in a .cpp file. (Suspect this was written by a C programmer who doesn't fully understand C++). - Remove the C `typedef struct blah { ... } blah;` idiom which has no purpose in C++. - Use inheritance on the nearly-duplicate `mdb_block_info_{1,2,3} structs rather than repeating all the field in each one. This also removed a completely unneeded offsetof into them that was being used to avoid having to learn C++. (Also adds a static_assert in case anyone ever changes some contained type and breaks things). - Fix bi_weight comment about why it is really uin64_t - Add a missing method `override`. - Replace "22" buried in the code with a constant indicating what it is for. (Oh, reader, you wanted to know what it's used for? It's the number of different databases allowed in the LMDB) - Move `m_cur_whatever` macros into the .cpp file, and avoid using them in any code we've added in loki. These are incredibly useless macros: they are basically just defined to slightly shorten a name. I'm guessing that, at some point, these members got moved and rather than fix the remaining code someone defined a bunch of macros to deal with the rename. Yuck. (Getting rid of *all* of them (i.e. replacing all `m_cur_whatever` with `m_cursors->whatever` would be preferable, but would make future merge conflicts more painful, so I left the macros as is for the upstream databases but removed our service_node_data one.) - Rename `mdb_txn_cursors` members to drop the leading `m_txc_` prefix. When the struct is nothing more than a POD collection of values the prefix makes no sense at all. This allows us the write `m_cursors->service_node_data` instead of `m_cursors->m_txc_service_node_data` (or the horrible macro, above, which seems designed to just macro around this naming horror).

Passing by reference is a more natural C++ way of doing this. Keeps pointer constructors for compatibility.

Looking up by height currently goes via a mostly-empty implicitly created state_t, which is wasteful but was necessary with C++11. C++14 `std::set`s let us avoid this by making `state_t` and heights directly comparable for set lookups.

This extracts uptime proof data entirely from service node states, instead storing (some) proof data as its own first-class object in the code and backed by the database. We now persistently store: - timestamp - version - ip & ports - ed25519 pubkey and update it every time a proof is received. Upon restart, we load all the proofs from the database, which means we no longer lose last proof received times on restart, and always have the most recently received ip and port information (rather than only having whatever it was in the most recently stored state, which could be ~12 blocks ago). It also means we don't need to spam the network with proofs for up to half an hour after a restart because we now know when we last sent (and received) our own proof before restarting. This separation required some restructuring: most notably changing `state_t` to be constructed with a `service_node_list` pointer, which it needs both directly and for BlockchainDB access. Related to this is also eliminating the separate BlockchainDB pointer stored in service_node_list in favour of just accessing it via m_blockchain (which now has a new `has_db()` method to detect when it has been initialized).

This would wipe out proof data if the rescan crossed a registration, and wipe out the timestamp if it crossed a decommission. We don't want to do either when rescanning: our latest proof data would have already been reset the first time we cross the registration/decomm transactions.

This renames mdb_block_info_3 to mdb_block_info so that _1 and _2 are the old versions (you can't predeclare a typedef), and use it rather than a void pointer in the 64bit block value fetcher.

Also import boost::endian namespace because the boost::endian's throughout the file were needlessly verbose (the little_to_native calls are already obvious).

This restores the more aggressive sending of uptime proofs after a restart.

jagerman · 2019-12-10T02:28:19Z

Rebased and resolved conflicts (and squashed the last couple fixes into the previous commits)

jagerman force-pushed the sn-startup-key-fix branch from ab7c98e to a01036c Compare December 8, 2019 17:42

jagerman changed the title ~~Fix x25519 -> pubkey map generation~~ Store proof info (keys, uptime, version) directly in LMDB Dec 8, 2019

Doy-lee reviewed Dec 9, 2019

View reviewed changes

src/blockchain_db/lmdb/db_lmdb.cpp Outdated Show resolved Hide resolved

src/cryptonote_core/service_node_list.cpp Outdated Show resolved Hide resolved

jagerman added 10 commits December 9, 2019 22:25

db_txn_guard: allow BlockchainDB passed by reference

06bdc7d

Passing by reference is a more natural C++ way of doing this. Keeps pointer constructors for compatibility.

Eliminate state_t lookup temporaries (C++14)

c8ca6a0

Looking up by height currently goes via a mostly-empty implicitly created state_t, which is wasteful but was necessary with C++11. C++14 `std::set`s let us avoid this by making `state_t` and heights directly comparable for set lookups.

Take mdb_block_info pointer in 64bit value helper

9797a6a

This renames mdb_block_info_3 to mdb_block_info so that _1 and _2 are the old versions (you can't predeclare a typedef), and use it rather than a void pointer in the 64bit block value fetcher.

Iterate directly on service node info map

0829df4

Log warning when ed25519 -> x25519 conversion fails

2817148

Simplify service_node_proof_serialized ctor

ed0adc3

Also import boost::endian namespace because the boost::endian's throughout the file were needlessly verbose (the little_to_native calls are already obvious).

Clear our own proof timestamp when loading

d0abfdd

This restores the more aggressive sending of uptime proofs after a restart.

jagerman force-pushed the sn-startup-key-fix branch from 521a927 to d0abfdd Compare December 10, 2019 02:27

Doy-lee merged commit 7054982 into oxen-io:dev Dec 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store proof info (keys, uptime, version) directly in LMDB #952

Store proof info (keys, uptime, version) directly in LMDB #952

jagerman commented Dec 4, 2019 •

edited

Loading

Doy-lee commented Dec 4, 2019

jagerman commented Dec 5, 2019

jagerman commented Dec 8, 2019

Doy-lee left a comment

Doy-lee Dec 8, 2019

jagerman Dec 9, 2019

Doy-lee Dec 9, 2019

jagerman Dec 9, 2019

jagerman Dec 9, 2019

jagerman commented Dec 9, 2019

jagerman commented Dec 10, 2019


		typedef mdb_block_info_3 mdb_block_info;
		static_assert(sizeof(mdb_block_info_3) == sizeof(mdb_block_info_1) + 16, "unexpected mdb_block_info struct sizes");

Store proof info (keys, uptime, version) directly in LMDB #952

Store proof info (keys, uptime, version) directly in LMDB #952

Conversation

jagerman commented Dec 4, 2019 • edited Loading

Doy-lee commented Dec 4, 2019

jagerman commented Dec 5, 2019

jagerman commented Dec 8, 2019

Doy-lee left a comment

Choose a reason for hiding this comment

Doy-lee Dec 8, 2019

Choose a reason for hiding this comment

jagerman Dec 9, 2019

Choose a reason for hiding this comment

Doy-lee Dec 9, 2019

Choose a reason for hiding this comment

jagerman Dec 9, 2019

Choose a reason for hiding this comment

jagerman Dec 9, 2019

Choose a reason for hiding this comment

jagerman commented Dec 9, 2019

jagerman commented Dec 10, 2019

jagerman commented Dec 4, 2019 •

edited

Loading