
Track and expose chain integrity info from HeaderDB #1924

Merged

Conversation

cburgdorf (Contributor, Author) commented Apr 28, 2020:

What was wrong?

Py-EVM allows persisting headers with a trusted score via its persist_checkpoint_header API. In effect, this allows persisting chains with arbitrary holes between segments, and that flexibility enables various different sync mechanisms.

However, eventually, sync algorithms might want to review the integrity of the chain and fix up holes to get back to a state that doesn't rely on trusted checkpoints.

How was it fixed?

  1. A new utility function calculate_gaps was created to track gaps in chains of BlockNumber. It can be used to track gaps independently for different purposes (e.g. headers, blocks, etc.).

  2. The HeaderDB utilizes this function to continuously track gaps in the chain

  3. Gaps in the header chain can be read using the header_db.get_header_chain_gaps() API.

  4. The return type is Tuple[BlockRange, ...] with the following characteristics: it is an ordered sequence of block ranges describing the integrity of the chain of headers. Each block range describes a missing segment of the chain, with inclusive boundaries: the first value is the first missing block number of that segment and the second value is the last missing one. The last block range in the sequence is expected to have -1 as its right-hand value, meaning that gap is open-ended. Consequently, the value for an empty chain is ((1, -1),): 1 is the first missing number and the upper end is unknown.
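As a quick usage sketch of the API described above (values and names like block_10_header and trusted_score are purely illustrative, header_db is assumed to already contain a genesis header, and note that the exact gap representation was later revised during review, see the discussion further down):

# Persist a checkpoint header at block 10 with a trusted score.
header_db.persist_checkpoint_header(block_10_header, trusted_score)

# Blocks 1-9 are now missing, and everything above 10 is still unknown.
# Each entry is an inclusive (first_missing, last_missing) range; -1 marks the
# open-ended gap at the tip.
assert header_db.get_header_chain_gaps() == ((1, 9), (11, -1))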

To-Do

  • Clean up commit history

Cute Animal Picture

put a cute animal picture link inside the parentheses

cburgdorf force-pushed the christoph/feat/chain_integrity_info branch from 6ef230a to 1c8fbfd on April 28, 2020 15:01
cburgdorf force-pushed the christoph/feat/chain_integrity_info branch from 1c8fbfd to e81d1b1 on April 29, 2020 08:48
cburgdorf force-pushed the christoph/feat/chain_integrity_info branch from 3b2a386 to 380b2b6 on May 7, 2020 09:50
eth/db/header.py Outdated
@@ -319,6 +320,8 @@ def _persist_header_chain(
rlp.encode(curr_chain_head),
)

cls._add_block_number_to_hash_lookup(db, curr_chain_head)
cburgdorf (Contributor, Author):

@carver Can I get your opinion on this?

I discovered a problem that we currently have in Py-EVM. Say you write header 10 as a checkpoint header, leading to a gap from 1 to 9. At some later point I go and fetch headers 1, 2, 3 from a peer with the intention of starting to close the gap. Now the problem is: how do we know that headers 1, 2 and 3 actually belong to the canonical chain? The call to persist_header_chain does validate the headers in the sense that they match the previous header (genesis, in this example), but it does not know whether they will eventually lead back to the header 10 that we accepted with a trusted score. It could also be an alternative history that we are accepting here. Eventually, as we close the gap, we will know whether the headers match up to our expected header 10, but until then there is a window of uncertainty where we can't be entirely sure.

I have two ideas how to go about that.

  1. We could hold off assigning block numbers until we have finally closed the gap, and then patch up the block numbers backwards from the closing point until we no longer hit a missing one. But that would mean that we cannot read headers by block number while we are still working on closing the gap. We could still track the gap with the header_db.get_header_chain_gaps() API, though.

  2. We could add a safeguard along the lines of: if we are writing headers with a block number lower than that of our current head, and we do not yet have block numbers assigned to any headers at those heights, accept them as canonical headers, even though we cannot be entirely sure that they will eventually lead to our previously accepted trusted-score header.
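A very rough sketch of what the safeguard in (2) could look like (illustrative only; should_force_canonical is a hypothetical helper, not existing Py-EVM API, and the thread below ends up taking a different, GapChange-based route):

from eth.exceptions import HeaderNotFound


def should_force_canonical(header_db, header) -> bool:
    # Only headers below the current canonical head are candidates.
    if header.block_number >= header_db.get_canonical_head().block_number:
        return False
    try:
        # Is a canonical header already known at this height?
        header_db.get_canonical_block_hash(header.block_number)
    except HeaderNotFound:
        # No canonical entry at this height yet: accept the incoming header as
        # canonical, even though we cannot yet prove that it links up to the
        # trusted checkpoint above the gap.
        return True
    else:
        return False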

Member:

I believe that the chain rules should supersede a checkpointed header. In the case that we checkpoint at height 10, then fill in 1-9 and find out that 9 is not the parent of 10, and that there is an alternate header at height 10 that would be picked by the fork choice rule, we should discard our non-canonical 10 and all headers onward and re-sync.

At present we may be in a position where if we detect this situation we need to simply crash the client. In practice I believe that we are unlikely to end up in this scenario without the user doing something very odd.

Contributor:

At present we may be in a position where if we detect this situation we need to simply crash the client.

Yeah, I think this is the best path, at least in the short term. If the user asserts something that trinity finds to be false, we should crash.

That means we should be extra careful to help people not pick a checkpoint that gets uncled (it was already true, just a reminder).

One thing to note: it gives other clients a way to crash our client, if they know where our checkpoint is: they respond with an uncle at the last unknown header in the gap. So one way to avoid crashing hard is to treat the very last header as invalid if it's not a parent of the canonical block at the child's slot. We would have to print loud warnings, though. But having a client crash that is externally triggerable is a problem.

cburgdorf (Contributor, Author) commented May 12, 2020:

Thank you both for your input, it was really helpful.

I was staring at the screen for a while, puzzled about how to come up with an API that:

  • doesn't need a new explicit param such as force_canonical
  • passes all our tests that ensure tracking of forks and the canonical chain works as it does prior to this PR
  • doesn't result in lots of additional db reads to figure out when it would be appropriate to assign a header a block number (making it part of the canonical chain) while we write headers that patch gaps.

At some point it occurred to me that _update_header_chain_gaps could return not just the new gaps but also what action was taken to get there (e.g. a gap was shrunk).

We can return this info without any additional db read and so I came up with a rule that says we always assign the block number in cases where we know that we have patched a gap in the chain.

This passes all existing tests without incurring much overhead and I'm quite happy with the API. That said, we could also move it to Trinity if you don't feel comfortable introducing these changes to Py-EVM. But I guess it would involve monkey patching Py-EVM to do the tracking which I'm not sure is a good idea.

Anyway, I'm not finished with the PR yet. This is just an intermediate update.

One thing to note: it gives other clients a way to crash our client, if they know where our checkpoint is: they respond with an uncle at the last unknown header in the gap. So one way to avoid crashing hard is to treat the very last header as invalid if it's not a parent of the canonical block at the child's slot. We would have to print loud warnings, though. But having a client crash that is externally triggerable is a problem.

Yeah, I think that part is easy to handle on the Trinity side. 👍

cburgdorf force-pushed the christoph/feat/chain_integrity_info branch 2 times, most recently from b1199f8 to 3448492 on May 12, 2020 09:26
cburgdorf force-pushed the christoph/feat/chain_integrity_info branch 6 times, most recently from 992eb1c to 1a6cbd1 on May 13, 2020 13:26
cburgdorf marked this pull request as ready for review on May 13, 2020 13:32
cburgdorf force-pushed the christoph/feat/chain_integrity_info branch from 1a6cbd1 to ccce771 on May 13, 2020 13:39
cburgdorf (Contributor, Author):

@carver @pipermerriam I think this is ready to get picked apart.

There's also a WIP PR in Trinity that uses this

ethereum/trinity#1714

The gap tracking should be easily reusable to track gaps in the chain of blocks (as opposed to just the chain of headers, as in this PR), but I'm not yet sure whether we need further changes to Py-EVM to properly allow backfilling of blocks (see my comment in the Py-EVM channel).

cburgdorf requested a review from carver on May 13, 2020 13:48
carver (Contributor) left a comment:

Looks great! 👍 That is, if you are good with something like the proposed data structure change; otherwise, let's chat a bit to find something that works. (Once the data structure gets locked in, it will be a bit painful to change.)

def calculate_gaps(newly_persisted: BlockNumber, base_gaps: Tuple[BlockRange, ...]) -> GapInfo:

# If we have a fresh chain, our highest missing number can only be 1
highest_missing_number = 1 if base_gaps == () else base_gaps[-1][0]
Contributor:

The variable name confused me on the first read, because the highest missing number should be the top end of the range, right? Maybe this is the...

  • highest_missing_gap_start (ugh, so long)
  • highest_range_start
  • latest_range_start
  • future_range_start (I guess the highest range is always the one at the tail, from head.number + 1 to -1?)

cburgdorf (Contributor, Author):

guess the highest range is always the one at the tail

Correct, but future_range_start doesn't really imply that (to me). We could be talking about a range for a new gap somewhere in the center.

What do you think about known_missing_tip? As in: the tip might be far further ahead, but to us, right now, this is what we think is the first missing number at the tip of the chain.

Contributor:

Hm, I think of the tip as the header that we have, so missing tip was a little confusing. Maybe tip_child?

cburgdorf (Contributor, Author):

Sounds good to me!

# we are shrinking the gap at the tail
updated_center = ((gap[0], gap[1] - 1,),)
gap_change = GapChange.GapShrink
else:
Contributor:

If you like it, this could be an elif gap[0] <= newly_persisted <= gap[1]: and we could add an else: raise Exception("Invariant").

Obviously, the search above should already have handled that, but I suppose it could help catch bugs if someone tries to tweak the search.

cburgdorf (Contributor, Author):

Surprise, I didn't know about this more concise syntax. But I think it should be elif gap[0] < newly_persisted < gap[1]: (non-inclusive) because this arm is all about being not at the edges and these edge cases (pun intended) are handled already in the previous branches.
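For illustration, the overall branch structure being discussed here (including the invariant guard) can be seen in this toy function, which is not the real calculate_gaps but mirrors its cases:

def classify_write(newly_persisted: int, gap: tuple) -> str:
    # Assumes the caller has already located the gap that contains newly_persisted.
    if gap[0] == gap[1] == newly_persisted:
        return "gap filled"
    elif newly_persisted == gap[0]:
        return "gap shrinks at its left edge"
    elif newly_persisted == gap[1]:
        return "gap shrinks at its right edge (the branch shown in the diff above)"
    elif gap[0] < newly_persisted < gap[1]:
        return "gap splits in two"
    else:
        # The surrounding search should make this unreachable; keeping the guard
        # helps catch bugs if the search is ever tweaked.
        raise Exception("Invariant")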

Contributor:

Hah, yeah, totally 👍

eth/db/header.py Outdated
@@ -211,6 +256,9 @@ def _persist_header_chain(
rlp.encode(curr_chain_head),
)
score = cls._set_hash_scores_to_db(db, curr_chain_head, score)
gap_change, gaps = cls._update_header_chain_gaps(db, curr_chain_head)
if gap_change is GapChange.GapShrink or gap_change is GapChange.GapSplit:
cls._add_block_number_to_hash_lookup(db, curr_chain_head)
Contributor:

This deserves an inline comment. I suppose something like:

The caller implicitly asserts that persisted headers are canonical here. This assertion is made when persisting blocks that are older than already persisted canonical blocks. So we just canonicalize them without any further verification. A block persisted at the tip would be a GapChange.TailWrite, instead.

cburgdorf (Contributor, Author):

This assertion is made when persisting blocks that are older than already persisted canonical blocks.

Getting nitpicky: just "being older" doesn't qualify. One can write a bunch of older headers for a parallel history and it won't trigger this write. There has to be a known gap being written to in order to trigger it.

That has led me to the following elaboration:

The caller implicitly asserts that persisted headers are canonical here.
This assertion is made when persisting headers that are known to be part of a gap
in the canonical chain. While we do validate that the headers are valid, we can not
be 100 % sure that they will eventually lead to the expected child that fills the gap.
E.g. there is nothing preventing one from persisting an uncle as the final header to
close the gap, even if that will not be the parent of the next consecutive header.
Py-EVM makes the assumption that client code will check and prevent such a scenario.

That said, now that I wrote all this, and now that we have GapChange.GapFilled, wouldn't it be better to simply make that extra check in Py-EVM when we fill a gap? I mean, load the existing known child (the checkpoint header) and if the ends don't match up, raise an exception. It would then be Trinity's job to handle this, which probably means just logging a warning and trying to close the gap another time (under the assumption that someone tried to feed us an uncle for whatever reason).
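For illustration, the extra check being proposed could be as small as this (a sketch only; validate_gap_fill is a hypothetical helper, and how the expected child gets looked up matters, as a later comment in this review points out):

from eth_utils import ValidationError


def validate_gap_fill(header, expected_child) -> None:
    # expected_child is the known header just above the gap (e.g. the checkpoint
    # header). The gap-closing header must be its parent.
    if header.hash != expected_child.parent_hash:
        raise ValidationError(
            f"{header} closes the gap but is not the parent of {expected_child}"
        )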

carver (Contributor) commented May 14, 2020:

now that we have GapChange.GapFilled, wouldn't it be better to simply make that extra check in Py-EVM when we fill a gap. I mean, load the existing known child (the checkpoint header) and if the ends don't match up, raise an exception.

Yup, that seems reasonable to me.

This is also a reminder that we have to handle this scenario, with a series of inserted headers between A0 and A4, but the B branch is actually uncled. The top row is the presumed canonical one.

A0 -> __ -> __ -> __ -> A4

A0 -> B1 -> __ -> __ -> A4

A0 -> B1 -> B2 -> __ -> A4

A0 -> B1 -> B2 -> __ -> A4  # Tries to insert B3, but it's not a parent of A4, and raises an exception

# client somehow finds the A chain and starts inserting it
# it is deemed non-canonical at first (current PR functionality)
A0 -> B1 -> B2 -> __ -> A4
   \_ A1

A0 -> B1 -> B2 -> __ -> A4
   \_ A1 -> A2

# When inserting A3, pyevm notices that it fills a gap, and links with the child
# It notices that A3's parent is present but non-canonical, and makes 
# B* non-canonical and A[1-3] canonical
A0 -> A1 -> A2 -> A3 -> A4
   \_ B1 -> B2

cburgdorf (Contributor, Author):

Good point! I added a test for that (it also asserts that an exception is raised when trying to close the gap with B3; I found it simpler to cover that in one test with two assertions).

# is open-ended. E.g. (500, -1) means: Every header from number 500 upwards is missing.
# [[first_missing, last_missing], ..., [first_missing, -1]]
# Since RLP doesn't define signed integers, we convert the right-hand side from signed_to_unsigned
# before entries are written and convert from unsigned_to_signed after entries are read from disk.
Contributor:

I think we can avoid a bit of this explanation and contortion with a new data structure. This is the only thing holding me back from a 👍 on the PR, since it is worth discussing if you disagree (and once the data structure is locked it will be a bit painful to change it).

What do you think about redefining the gaps native and rlp types like this?

BlockRange = Tuple[BlockNumber, BlockNumber]
ChainGaps = Tuple[
  # series of gaps before the chain head
  Tuple[BlockRange, ...],
  # beginning of the unknown future
  BlockNumber,
]

...

chain_gaps = rlp.sedes.List((
  rlp.sedes.CountableList(rlp.sedes.List((uint32, uint32))),
  uint32,
))

Then we wouldn't need any custom encode/decode methods for the storage at all, we could just use rlp directly.

I think the whole gap calculation is about the same or improved using the above data structure, and I figure you might like that the types can be tightened from int to BlockNumber.

Another perk is that (I think) it stops mattering if the gaps are sorted or not. It's probably not important, but nice to have a bit more flexibility.
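For illustration, the proposed representation round-trips through plain rlp calls like this (a sketch; big_endian_int stands in here for the uint32 sedes used in the snippet above):

import rlp
from rlp.sedes import CountableList, List, big_endian_int

chain_gaps_sedes = List((
    # series of gaps before the chain head
    CountableList(List((big_endian_int, big_endian_int))),
    # beginning of the unknown future
    big_endian_int,
))

# A chain with checkpoint headers at 250 and 500 and nothing else persisted:
gaps = (((1, 249), (251, 499)), 501)
encoded = rlp.encode(gaps, sedes=chain_gaps_sedes)
decoded = rlp.decode(encoded, sedes=chain_gaps_sedes)
# decoded carries the same values back as nested sequences, with no custom
# signed/unsigned conversion needed.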

cburgdorf (Contributor, Author) commented May 14, 2020:

Oh, I think your proposal is great! It's a really nice improvement to tighten the representation of the chain integrity. 👍

Two examples to double check I got it right:

Now -> Then

  1. An empty chain with only genesis

((1, -1),) -> ((), 1)

  2. A chain with checkpoint headers at 250 and 500 and no other headers

((1, 249), (251, 499), (501, -1)) -> (((1, 249), (251, 499)), 501)

and I figure you might like that the types can be tightened from int to BlockNumber

I didn't do that to avoid bloating the calculations with things like new_tail = ((highest_missing_number, gap_end), (newly_persisted + BlockNumber(1), BlockNumber(-1)),) but since we got rid of the -1 part, this may now become practical 👍

cburgdorf (Contributor, Author):

Ah yeah, I just noticed your example further down (GapChange.GapSplit, (((2, 2), (4, 4)), 6)) which indicates we are on the same page 👍

cburgdorf force-pushed the christoph/feat/chain_integrity_info branch from 123d3d3 to 4e594ce on May 14, 2020 11:12
try:
encoded_gaps = db[SchemaV1.make_header_chain_gaps_lookup_key()]
except KeyError:
return GENESIS_CHAIN_GAPS
cburgdorf (Contributor, Author):

@carver I want to point out this change. Previously we were just returning an empty tuple here, but with the updated data structure an empty tuple doesn't match the type anymore, which is why I came up with GENESIS_CHAIN_GAPS. On the plus side, this slightly improves the calculate_gaps logic.
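Presumably the constant simply encodes the ((), 1) state discussed above, along the lines of this sketch (the exact definition and type names live in eth/db/chain_gaps.py):

from eth_typing import BlockNumber

# No gaps below the head yet; block 1 is the first header we have never seen.
GENESIS_CHAIN_GAPS = ((), BlockNumber(1))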

cburgdorf (Contributor, Author):

Thanks for the review @carver
I'm leaving this open for now to give you a chance for a second look. Main points are:

  1. Do you like the changes regarding the new data structure? (I do)
  2. What do you think about the proposed name change?
  3. What do you think about tightening the rule so that Py-EVM does one extra check in the GapFill case and throws if the gap was filled with an uncle? (Not yet included in the changes)

carver (Contributor) left a comment:

Yeah, looks good!

Should be good to go after adding two tests:

  • that py-evm can handle the re-canonicalization of a newly-correct gap-fill (diagram in another comment)
  • an exception is raised when trying to fill a gap that doesn't match both parent and child

(and I think implementations for both :))

cburgdorf force-pushed the christoph/feat/chain_integrity_info branch from 96dd136 to 3f3fe83 on May 15, 2020 18:02
eth/db/header.py Outdated
# turn out not to be. That means by the time we write headers 1-5 again (but from chain A)
# we do not recognize it as writing to a known gap. Therefore by the time we write header 6,
# when we finally realize that we are attempting to fill in a gap, we have to backtrack
# the ancestors to add them to the canonical chain.
cburgdorf (Contributor, Author) commented May 15, 2020:

@carver This is quite a wall of text, but it helped me understand why we always want the call to _find_new_ancestors here.

Contributor:

Hm, well, I think we can drop the last paragraph (or at least move it to the test), since it's basically just a test description, and I think it's the test you already wrote. (Could reference test_gap_filling_handles_uncles_correctly in a comment here if you want)

assert_headers_eq(headerdb.get_canonical_head(), pseudo_genesis)


def test_gap_filling_handles_uncles_correctly(headerdb, genesis_header):
cburgdorf (Contributor, Author):

@carver I found it easier to make two assertions in the same test rather than two individual tests to avoid a big chunk of copy and paste.

Contributor:

Yeah, no problem really. In fact, I think it would help to generalize it so we could test the other scenarios I mentioned in other comments. Just thought of this one, too:

A3 (set checkpoint)
B0:2
B2 (fail)
A2

I'm not quite sure what should happen here. We don't have A0 or A1 yet. We don't want to fill the gap with an unlinked chain, but we don't want to drop A2 as failing either, I think. Maybe we could drop B1 as canonical, tending to prefer later canonical blocks as (more) correct?? That would be quite a weird mutation to handle, though. We don't have any logic yet for adding gaps that were previously filled, and maybe don't want to add any.

carver (Contributor) commented May 15, 2020:

Rather than having to come up with all these scenarios, and definitely missing some, I wonder if we can do a hypothesis test to cover more scenarios, to test a bunch of permutations. Something like these inputs:

  1. permutations of [(chain, index) for chain in ('chain_a', 'chain_b') for index in range(10)] -- which header to persist next with persist_header_chain()
  2. series of 20 booleans (should the previous header in the list ^ be joined with the next, or should it be persisted individually? some of these at the end would be unused. Would be ignored if the next header is from a different chain) Oof this is ugly, but I think we want to test ranges of headers in addition to individual ones.
  3. integer from 0-9 (which chain_a element to persist as a checkpoint header at the beginning)
    (Or maybe there's a better way I'm not seeing yet)

At the end, I think we can always assert_is_canonical_chain(headerdb, chain_a), right?

It even seems wise to test interlacing persist_checkpoint_header() in between the persist_header_chain calls, with multiple checkpoints (all of chain a, though).

cburgdorf (Contributor, Author):

I'm not familiar with hypothesis, but I've built something homegrown that is very simple yet flexible. Maybe that gets the job done?

carver (Contributor) left a comment:

I think I see a couple unhandled cases. Thoughts?

# Persist the checkpoint header with a trusted score
# chain_a[7] has block number 8 because `chain_a` doesn't include the genesis
pseudo_genesis = chain_a[7]
headerdb.persist_checkpoint_header(pseudo_genesis, 154618822656)
Contributor:

Instead of this magic number, maybe sum(header.difficulty for header in chain_a[0:8])?


eth/db/header.py Outdated
Comment on lines 309 to 316
# What we don't know is if this header will eventually lead up to the upper end of the
# gap (the checkpoint header) or if this is an alternative history. But we do ensure the
# integrity eventually. For one, we check the final header that would close the gap and if
# it does not match our expected child (the checkpoint header), we will reject it.
# Additionally, if we have persisted a chain of alternative history under the wrong
# assumption that it would be the canonical chain, but then at a later point it turns out it
# isn't and we overwrite it with the correct canonical chain, we make sure to
# recanonicalize all affected headers.
Contributor:

Let's see if we can tighten this up a bit. Something like:

What if this assertion is later found to be false? At gap fill time, we can detect if the chains don't link (and raise a ValidationError). Also, when a true canonical header is added eventually, we need to canonicalize all the true headers.

eth/db/header.py Outdated
return

if gap_change is GapChange.GapFill:
expected_child = cls._get_canonical_head(db)
carver (Contributor) commented May 15, 2020:

Canonical head as the child of the gap seems wrong to me. What about this series of added headers:

A2 (set checkpoint)
A3
B0:2
A0:2

(using a and b from the test, and python index/slice rules)

Then A3 is the canonical head when we are closing the gap with A1

cburgdorf (Contributor, Author):

You are absolutely right (and I admire how you spot these things during a review!). I added a fix + test case to cover this.
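Presumably the fix replaces the canonical-head lookup from the diff hunk above with a lookup of the canonical header directly above the gap, roughly along these lines (a sketch only; helper names are assumed rather than taken from the final diff):

if gap_change is GapChange.GapFill:
    # The child of the gap is the canonical header at the next block number,
    # which is not necessarily the canonical head (in the scenario above, A3 is
    # the head while A2 is the child of the gap being closed by A1).
    child_number = BlockNumber(header.block_number + 1)
    expected_child = cls._get_canonical_block_header_by_number(db, child_number)
    if header.hash != expected_child.parent_hash:
        raise ValidationError(f"{header} does not link up to {expected_child}")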

Comment on lines +328 to +329
for ancestor in cls._find_new_ancestors(db, header, genesis_parent_hash):
cls._add_block_number_to_hash_lookup(db, ancestor)
Contributor:

Not sure this handles everything. What about this scenario:

A3 (set checkpoint)
B2
A2
A0
A1

Is A2 canonical now?

cburgdorf (Contributor, Author) commented May 18, 2020:

This will currently raise a ParentNotFound exception which I believe is the right thing to do. At least, everything becomes a bit simpler if we just continue with the assumption that headers must always be written sequentially. In other words, gaps aside, we can not write A2 before A1 was written.

But I will add a test for that.
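A test along these lines could look roughly like this (a sketch; it leans on the headerdb and genesis_header fixtures and the mk_header_chain helper used elsewhere in the test module, and the checkpoint score follows the sum-of-difficulties suggestion from an earlier comment):

import pytest

from eth.exceptions import ParentNotFound


def test_cannot_write_headers_out_of_order_into_gap(headerdb, genesis_header):
    headerdb.persist_header(genesis_header)
    chain_a = mk_header_chain(genesis_header, length=5)

    # Checkpointing A3 opens a gap covering A0..A2.
    score = sum(header.difficulty for header in chain_a[0:4])
    headerdb.persist_checkpoint_header(chain_a[3], score)

    # Writing A2 before A0 and A1 is rejected: gaps aside, headers must be
    # written sequentially.
    with pytest.raises(ParentNotFound):
        headerdb.persist_header_chain([chain_a[2]])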

Contributor:

Ah, right, that does make things simpler. Though I think most of the complexity is added back in with checkpoints. What about multiple checkpoints? Want to make sure re-canonicalization happens correctly (for example, if the checkpoint fills a gap).

A3 (set checkpoint)
B0:2 (now canonical)
A2 (set checkpoint)
<- what happens here?
A0:2
<- what about here?

cburgdorf (Contributor, Author):

Oh you are bringing up an interesting case there! When we set A2 as a new checkpoint it becomes the new canonical head and we stop caring about A3. I think that is the way it should be (For some reason, the user decided to set the checkpoint to A2, imho that should be given precedence).

But now what happens is that A2 is closing a gap "backwards", which isn't what we want here! We get into a situation where our gap tracking would think we have a well-connected chain, which isn't the case!

So, my idea is that persist_checkpoint_header must be altered to notice situations when it is backwards filling a gap and then do one of these things:

  1. Raise an exception. Let client code know that it cannot pick this checkpoint before it brings the database into a different state.

  2. Alternatively, remove the B1 header to create a new gap so that we continue to track this missing connection. Removing just the single header before A2 should be enough, because we do not know where exactly the fork might be (which headers are uncles and which are actual parents).

cburgdorf (Contributor, Author):

I will track this one with an issue and merge the PR as is



carver (Contributor) commented May 15, 2020:

(BTW, thanks for not squashing your changes, it's been nice to just read the new commit each time)

cburgdorf force-pushed the christoph/feat/chain_integrity_info branch 2 times, most recently from 94d9f63 to 992fed3 on May 18, 2020 10:32
cburgdorf (Contributor, Author):

@carver This is ready for another review

cburgdorf force-pushed the christoph/feat/chain_integrity_info branch from 992fed3 to 596711f on May 18, 2020 10:35
cburgdorf force-pushed the christoph/feat/chain_integrity_info branch from 596711f to 5058522 on May 18, 2020 10:36
pipermerriam (Member) left a comment:

I haven't fully ingested the logic of this PR but looking at the tests, I realized that this might be an interesting candidate for the hypothesis machinery for testing state machines

We have a number of different state transitions which appear to be reasonably well defined, and I believe that we have a decent set of testable properties about the integrity of the chain. I'll try to give this a more thorough look-through this week.

carver (Contributor) commented May 18, 2020:

I haven't fully ingested the logic of this PR but looking at the tests, I realized that this might be an interesting candidate for the hypothesis machinery for testing state machines

That's a good reminder. It seems promising. It's possible that an interactive draw could work well (or even better). Something like:

def test_always_canonicalized(data):
    CHAIN_A, CHAIN_B = 0, 1
    chain_length = 10
    headers = [
        # chain A: eventually canonical
        mk_header_chain(genesis_header, length=chain_length),
        # chain B: non-canonical
        mk_header_chain(genesis_header, length=chain_length),
    ]
    missing_indices = [
        set(range(chain_length)),  # chain A 
        set(range(chain_length)),  # chain B
    ] 
    
    @to_tuple 
    def get_valid_extensions(of_missing_indices):
        for chain_index, header_indices in enumerate(of_missing_indices):
            for header_index in header_indices: 
                if header_index == 0 or header_index - 1 not in header_indices: 
                    yield (chain_index, header_index) 
    
    num_checkpoints = 0 

    while len(missing_indices[CHAIN_A]):
        action = data.draw(st.sampled_from(ActionEnum))
        if action == ActionEnum.CHECKPOINT:
            # Will py-evm reject a checkpoint that already exists? Assuming so, we can
            # just check the non-present ones...
            checkpoint_index = data.draw(st.sampled_from(sorted(missing_indices[CHAIN_A])))
            checkpoint = headers[CHAIN_A][checkpoint_index]
            checkpoint_score = get_score(genesis, headers[CHAIN_A][:checkpoint_index])
            headerdb.persist_checkpoint_header(checkpoint, checkpoint_score) 

            # keep track of whether any checkpoints were added, so we eventually
            # guarantee A as canonical
            num_checkpoints += 1 
             
            missing_indices[CHAIN_A].discard(checkpoint_index)
                 
        elif action == ActionEnum.PERSIST_CHAIN:
            # choose the series of headers to add 
            valid_extensions = get_valid_extensions(missing_indices) 
            chain_idx, next_chain_start = data.draw(st.sampled_from(valid_extensions))
            next_chain_len = data.draw(st.integers(min_value=1, max_value=chain_length))
            chain_range_end = next_chain_start + next_chain_len
            next_headers = headers[chain_idx][next_chain_start:chain_range_end]

            # persist them to chain
            headerdb.persist_header_chain(next_headers)

            # mark persisted headers as not missing
            for inserted_index in range(next_chain_start, chain_range_end):
                missing_indices[chain_idx].discard(inserted_index)

    if num_checkpoints == 0:
        (subsequent_header, ) = mk_header_chain(headers[CHAIN_A][-1], length=1)
        headerdb.persist_checkpoint_header(subsequent_header, get_score(...))

    assert is_canonical_chain(headers[CHAIN_A])

(I got a little carried away, because the hypothesis API looked fun to toy with)

carver (Contributor) left a comment:

I have some vague sense that we may be missing more cases, but I think this passes the threshold for merging, if you want.

I do think it's worth looking at the hypothesis testing next to try to solidify our confidence in handling more cases (see previous comment).

So if you want to just keep amending this PR with those tests and further discussion, that's great too.

cburgdorf (Contributor, Author) commented May 19, 2020:

I have some vague sense that we may be missing more cases

Yeah, I opened #1928 to track one of them

but I think this passes the threshold for merging, if you want.

Yeah, I think it's good enough to take it and track and fix edge cases with subsequent PRs.

I do think it's worth looking at the hypothesis testing next to try to solidify our confidence in handling more cases (see previous comment).

Yeah, it's probably worth doing that, but getting the backfill out the door has higher priority for me right now :). But if it's fun for you to continue with the hypothesis API, I'm more than happy if you decide to replace the current tests :)

cburgdorf merged commit 4beff35 into ethereum:master on May 19, 2020