Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store proof info (keys, uptime, version) directly in LMDB #952

Merged
merged 10 commits into from
Dec 10, 2019

Conversation

jagerman
Copy link
Member

@jagerman jagerman commented Dec 4, 2019

Edit: see revamped version below.

The map was being initialized too early, from before recent state proof
data had been set, and so on startup quorumnet SN authentication was
failing to recognize incoming SN connections until a proof had been
received from the remote SN.

This also adds a shared mutex around the x25519 map access.

@Doy-lee
Copy link
Collaborator

Doy-lee commented Dec 4, 2019

It looks like there are several issues. First we were restoring the keys from state_archive which has old proof information (10k blocks old). But supposing we load the latest proof info, that information that get serializeds to DB is actually 12hrs old (since we re-derive everything from 12hrs back), if you shut down and restart your node.

It also seems like a commit got lost somewhere, I thought I remember you deleting the checkpointing part in this function. But right now we're still only storing quorums from the last immutable checkpoint -> chain height in short_term_state_cull_height

  static uint64_t short_term_state_cull_height(uint8_t hf_version, cryptonote::BlockchainDB const *db, uint64_t block_height)
  {
    size_t constexpr DEFAULT_SHORT_TERM_STATE_HISTORY = 6 * STATE_CHANGE_TX_LIFETIME_IN_BLOCKS;
    static_assert(DEFAULT_SHORT_TERM_STATE_HISTORY >= BLOCKS_EXPECTED_IN_HOURS(12), // Arbitrary, but raises a compilation failure if it gets shortened.
        "not enough short term state storage for blink quorum retrieval!");
    uint64_t result =
        (block_height < DEFAULT_SHORT_TERM_STATE_HISTORY) ? 0 : block_height - DEFAULT_SHORT_TERM_STATE_HISTORY;

    if (hf_version >= cryptonote::network_version_13_enforce_checkpoints)
    {
      uint64_t latest_height = db->height() - 1;
      cryptonote::checkpoint_t checkpoint;
      if (db->get_immutable_checkpoint(&checkpoint, latest_height))
        result = std::min(result, checkpoint.height - 1);
    }
    return result;
  }

So actually, the proof information being stored is probably only 12 blocks old.

The missing x25519 pubkeys might be from restoring from the archive and not the latest proof. I don't think the moving of x25519 map init to the end of the init function is the fix, rather that it was restoring from the wrong state and also this decouples the x25519 from the load step which doesn't feel right.

@jagerman
Copy link
Member Author

jagerman commented Dec 5, 2019

I think we should just move to proof info to a completely separate database; it doesn't really have much to do with state and isn't something that we ever care about more than the latest value (also it isn't affected by rollbacks).

@jagerman jagerman changed the title Fix x25519 -> pubkey map generation Store proof info (keys, uptime, version) directly in LMDB Dec 8, 2019
@jagerman
Copy link
Member Author

jagerman commented Dec 8, 2019

Mostly rewrote this PR to remove proofs from stored states:

This extracts uptime proof data entirely from service node states,
instead storing (some) proof data as its own first-class object in the
code and backed by the database. We now persistently store:

  • timestamp
  • version
  • ip & ports
  • ed25519 pubkey

and update it every time a proof is received. Upon restart, we load all
the proofs from the database, which means we no longer lose last proof
received times on restart, and always have the most recently received ip
and port information (rather than only having whatever it was in the
most recently stored state, which could be ~12 blocks ago). It also
means we don't need to spam the network with proofs for up to half an
hour after a restart because we now know when we last sent (and
received) our own proof before restarting.

This separation required some restructuring: most notably changing
state_t to be constructed with a service_node_list pointer, which it
needs both directly and for BlockchainDB access. Related to this is
also eliminating the separate BlockchainDB pointer stored in
service_node_list in favour of just accessing it via m_blockchain (which
now has a new has_db() method to detect when it has been initialized).

Copy link
Collaborator

@Doy-lee Doy-lee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just not too sure about restoring the uptime proofs. Reason being we didn't trust that the P2P layer is a fully connected graph, so some nodes might never see the uptime proof at all- so we made it relay aggressively in general.

I don't know. It's hard to say, the only way I'd really trust it is when we convert the system to use your zmq layer.


typedef mdb_block_info_3 mdb_block_info;
static_assert(sizeof(mdb_block_info_3) == sizeof(mdb_block_info_1) + 16, "unexpected mdb_block_info struct sizes");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we don't even need old migration code and the various version, v3 is more than 2 hard forks old at this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's there already I'm kind of tempted to leave it in, mainly because it won't break the case where someone finds some ancient lmdb they forgot about and tries to update lokid. Not too concerned about that, but like I said, the support is already there.

src/blockchain_db/lmdb/db_lmdb.cpp Outdated Show resolved Hide resolved
src/blockchain_db/lmdb/db_lmdb.cpp Outdated Show resolved Hide resolved
src/cryptonote_core/service_node_list.cpp Show resolved Hide resolved
src/cryptonote_core/service_node_list.cpp Outdated Show resolved Hide resolved
: height{state.height}
, key_image_blacklist{std::move(state.key_image_blacklist)}
, only_loaded_quorums{state.only_stored_quorums}
, block_hash{state.block_hash}
, sn_list{snl}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the blockchain dependency in the constructor in #956 I'd prefer we don't give state_t access to the service node list if we don't have to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really going to be feasible with a much more significant rearchitecting: state_t is the place where we process state changes and registrations but it needs to be able to get at proof data to update it. Taking it in the constructor like this is slightly cleaner than the alternative of needing to take it as an argument in multiple methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally processing of state changes should be done in service_node_list, which then sets up the state_t as needed, but like I said, that's a bigger change than I wanted to introduce at this point.

@jagerman
Copy link
Member Author

jagerman commented Dec 9, 2019

I'm just not too sure about restoring the uptime proofs. Reason being we didn't trust that the P2P layer is a fully connected graph, so some nodes might never see the uptime proof at all- so we made it relay aggressively in general.

I've added a commit to restore that aggressive sending by clearing our own timestamp when loading.

src/blockchain_db/lmdb/db_lmdb.cpp Outdated Show resolved Hide resolved
src/cryptonote_core/service_node_list.cpp Outdated Show resolved Hide resolved
Various code improvements to LMDB code (no logic changes):

- compile less code by moving the cursor macros (which are used often)
  into local functions

- removes the TXN_POSTFIX_RDONLY macro because it does exactly nothing

- remove useless `inline` from various functions in C++ only has one
  purpose: to tell the compiler that it's okay to have multiple
  definitions (and just throw the extra away when linking).  It is not
  used by any modern C++ compiler to actually decide when to inline, and
  so has no use in a .cpp file.  (Suspect this was written by a C
  programmer who doesn't fully understand C++).

- Remove the C `typedef struct blah { ... } blah;` idiom which has no
  purpose in C++.

- Use inheritance on the nearly-duplicate `mdb_block_info_{1,2,3}
  structs rather than repeating all the field in each one.  This also
  removed a completely unneeded offsetof into them that was being used
  to avoid having to learn C++.  (Also adds a static_assert in case
  anyone ever changes some contained type and breaks things).

- Fix bi_weight comment about why it is really uin64_t

- Add a missing method `override`.

- Replace "22" buried in the code with a constant indicating what it is
  for.  (Oh, reader, you wanted to know what it's used for?  It's the
  number of different databases allowed in the LMDB)

- Move `m_cur_whatever` macros into the .cpp file, and avoid using them
  in any code we've added in loki.  These are incredibly useless
  macros: they are basically just defined to slightly shorten a name.
  I'm guessing that, at some point, these members got moved and rather
  than fix the remaining code someone defined a bunch of macros to deal
  with the rename.  Yuck.

  (Getting rid of *all* of them (i.e. replacing all `m_cur_whatever`
  with `m_cursors->whatever` would be preferable, but would make future
  merge conflicts more painful, so I left the macros as is for the
  upstream databases but removed our service_node_data one.)

- Rename `mdb_txn_cursors` members to drop the leading `m_txc_` prefix.
  When the struct is nothing more than a POD collection of values the
  prefix makes no sense at all.  This allows us the write
  `m_cursors->service_node_data` instead of
  `m_cursors->m_txc_service_node_data` (or the horrible macro, above,
  which seems designed to just macro around this naming horror).
Passing by reference is a more natural C++ way of doing this.  Keeps
pointer constructors for compatibility.
Looking up by height currently goes via a mostly-empty implicitly
created state_t, which is wasteful but was necessary with C++11.  C++14
`std::set`s let us avoid this by making `state_t` and heights directly
comparable for set lookups.
This extracts uptime proof data entirely from service node states,
instead storing (some) proof data as its own first-class object in the
code and backed by the database.  We now persistently store:

- timestamp
- version
- ip & ports
- ed25519 pubkey

and update it every time a proof is received.  Upon restart, we load all
the proofs from the database, which means we no longer lose last proof
received times on restart, and always have the most recently received ip
and port information (rather than only having whatever it was in the
most recently stored state, which could be ~12 blocks ago).  It also
means we don't need to spam the network with proofs for up to half an
hour after a restart because we now know when we last sent (and
received) our own proof before restarting.

This separation required some restructuring: most notably changing
`state_t` to be constructed with a `service_node_list` pointer, which it
needs both directly and for BlockchainDB access.  Related to this is
also eliminating the separate BlockchainDB pointer stored in
service_node_list in favour of just accessing it via m_blockchain (which
now has a new `has_db()` method to detect when it has been initialized).
This would wipe out proof data if the rescan crossed a registration, and
wipe out the timestamp if it crossed a decommission.  We don't want to
do either when rescanning: our latest proof data would have already been
reset the first time we cross the registration/decomm transactions.
This renames mdb_block_info_3 to mdb_block_info so that _1 and _2 are
the old versions (you can't predeclare a typedef), and use it rather
than a void pointer in the 64bit block value fetcher.
Also import boost::endian namespace because the boost::endian's
throughout the file were needlessly verbose (the little_to_native calls
are already obvious).
This restores the more aggressive sending of uptime proofs after a
restart.
@jagerman
Copy link
Member Author

Rebased and resolved conflicts (and squashed the last couple fixes into the previous commits)

@Doy-lee Doy-lee merged commit 7054982 into oxen-io:dev Dec 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants