
Some sync crash fixups #1321

Merged
merged 3 commits into from Sep 26, 2018

Conversation

@carver (Contributor) commented Sep 26, 2018

What was wrong?

Two crashes during sync, both during peer switchover:

  1. The new peer reinserts a header into the queue that the fast-syncer is already trying to download a body for.
  2. The full syncer catches up just after a new peer has started syncing. It causes havoc to have the peer start skipping some headers midstream; better to let it insert the duplicates and let the solution to #1 handle the situation.

How was it fixed?

  1. Fixed with a new DuplicateTasks exception that is caught, so the relevant headers can be skipped.
  2. Only check whether headers are already in the database on the very first run.

Plus some logging improvements.
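The fix in #1 can be sketched roughly as follows. This is a toy illustration, not the actual trinity code: `HeaderQueue`, `register_tasks`, and `add_headers` are hypothetical stand-ins, and headers are reduced to plain strings.

```python
from typing import Sequence, Tuple


class DuplicateTasks(Exception):
    # Illustrative stand-in for the real exception; it carries the
    # offending task ids so the caller can skip exactly those.
    def __init__(self, msg: str, duplicates: Tuple[str, ...]) -> None:
        super().__init__(msg)
        self.duplicates = duplicates


class HeaderQueue:
    # Toy queue that refuses to re-register a task id for completion.
    def __init__(self) -> None:
        self._registered = set()

    def register_tasks(self, headers: Sequence[str]) -> None:
        dupes = tuple(h for h in headers if h in self._registered)
        if dupes:
            raise DuplicateTasks(
                f"Cannot re-register task ids {dupes!r} for completion",
                dupes,
            )
        self._registered.update(headers)


def add_headers(queue: HeaderQueue, headers: Sequence[str]) -> Sequence[str]:
    # On a peer switchover, the new peer may resend headers that are
    # already being downloaded; catch the exception and retry with
    # only the headers that are actually new.
    try:
        queue.register_tasks(headers)
    except DuplicateTasks as exc:
        new_headers = [h for h in headers if h not in exc.duplicates]
        queue.register_tasks(new_headers)
        return new_headers
    return list(headers)
```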

There's another typing tangent that I wasted a lot of time on. Grumble. Any suggestions for making it work? (where the type: ignore is)

Cute Animal Picture

put a cute animal picture link inside the parentheses

@carver carver changed the title Some sync crash fixups [WIP] Some sync crash fixups Sep 26, 2018
# Tried a bunch of things that failed to make mypy happy:
# - cast(DuplicateTasks[BlockHeader], exc) -- refuses to cast
# - except DuplicateTasks[BlockHeader]: -- doesn't catch the exception
# - OrderedTaskPreparation.DuplicateTasks to inherit TTask -- mypy refuses
Member

Can you add a line up above the try:

exc: DuplicateTasks[BlockHeader]
try:
    ...
except DuplicateTasks as exc:
    ...

I think that's how you tell mypy ahead of time what type something is.

Member

Alternate idea:

duplicates = cast(Tuple[BlockHeader, ...], exc.duplicates)
self.header_queue.complete(batch_id, duplicates)

Less elegant/type-safe...
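The cast-based workaround can be made concrete like this. A minimal sketch only: `DuplicateTasks` is reimplemented here as a hypothetical generic exception, and `BlockHeader` is reduced to a trivial class.

```python
from typing import Generic, Tuple, TypeVar, cast

TTask = TypeVar('TTask')


class BlockHeader:
    def __init__(self, block_number: int) -> None:
        self.block_number = block_number


class DuplicateTasks(Exception, Generic[TTask]):
    def __init__(self, msg: str, duplicates: Tuple[TTask, ...]) -> None:
        super().__init__(msg)
        self.duplicates = duplicates


def complete_batch(headers: Tuple[BlockHeader, ...]) -> Tuple[int, ...]:
    try:
        raise DuplicateTasks("re-registered", headers)
    except DuplicateTasks as exc:
        # The caught exception is effectively unparameterized, so
        # instead of casting the exception itself (which mypy refuses),
        # cast just the `duplicates` attribute to recover the element type.
        duplicates = cast(Tuple[BlockHeader, ...], exc.duplicates)
        return tuple(h.block_number for h in duplicates)
```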

Contributor Author

Pre-defining the exception type still complained... going with the duplicates cast.

Member

I rabbit-holed on this a while ago. There's a solution that works for simple types (though not this one), but it has a runtime cost.

if isinstance(v, SomeType):
    do_SomeType_specific_thing(v)

mypy correctly understands that within the if context v must be SomeType. However, there's no way to do the equivalent of if isinstance(v, Tuple[SomeType, ...]) that I could find. 😢
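A runnable illustration of that limitation, under the same assumptions: isinstance narrows simple types, but parameterized generics like Tuple[int, ...] are rejected at runtime, so the equivalent check has to be done element by element (the runtime cost mentioned above).

```python
from typing import Tuple, Union


def double(v: Union[int, str]) -> int:
    # mypy narrows `v` to int inside this branch.
    if isinstance(v, int):
        return v * 2
    return len(v)


def sum_ints(v: object) -> int:
    # isinstance(v, Tuple[int, ...]) raises TypeError at runtime:
    # subscripted generics can't be used with instance checks.
    try:
        isinstance(v, Tuple[int, ...])  # type: ignore
    except TypeError:
        pass
    # The workaround: check the container, then every element.
    if isinstance(v, tuple) and all(isinstance(x, int) for x in v):
        return sum(v)
    raise TypeError("expected a tuple of ints")
```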

@pipermerriam (Member) left a comment

No review requested yet, but I did a full pass on this and it looks 👍

@carver
Copy link
Contributor Author

carver commented Sep 26, 2018

Yeah, I marked it WIP because I had no idea what the p2p.discovery failure was. I spent some time digging into it, but I can't reproduce it locally. I'm having a hard time even guessing what might have happened:

  • it's not a timeout, the test didn't run that long
  • it didn't get nonsense data; that would have shown up in a warning log

So far, my best guess is that both alice and bob bound to the same UDP port, and alice responded to her own request for nodes with an empty list. I'm still not sure why that happened in 0.02s when the timeout is 0.9s (the wait for a response is supposed to hang until it gets the desired number of nodes, regardless of intermediate responses).
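The bind-collision theory is at least checkable: by default a second UDP socket cannot bind an address and port that is already in use, so two sockets sharing a port would imply an SO_REUSE* flag was set somewhere. A small sketch (alice and bob here are just placeholder sockets, not the discovery code):

```python
import socket

# alice binds an ephemeral UDP port on localhost.
alice = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
alice.bind(("127.0.0.1", 0))
port = alice.getsockname()[1]

# bob trying to bind the same port fails by default with EADDRINUSE...
bob = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    bob.bind(("127.0.0.1", port))
    shared = True
except OSError:
    shared = False

# ...so if alice and bob really shared a port, something opted into
# address reuse, letting alice receive (and answer) her own request.
alice.close()
bob.close()
```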

Error message was: "Cannot re-register task id * for completion"
Otherwise, on a peer switchover, it's possible to redownload chunks of
headers, and then have to skip the next chunk, which really borks
things.
@carver carver changed the title [WIP] Some sync crash fixups Some sync crash fixups Sep 26, 2018
@carver carver merged commit f61a486 into ethereum:master Sep 26, 2018
@carver carver deleted the fix-sync-switch-crash branch September 26, 2018 19:24