
Some sync crash fixups #1321

Merged
merged 3 commits into from Sep 26, 2018

Conversation

@carver (Contributor) commented Sep 26, 2018

What was wrong?

Two crashes during sync, both during peer switchover:

  1. The new peer reinserts a header into the queue that the fast-syncer is already trying to download a body for.
  2. The full syncer catches up just after a new peer has started syncing. It causes havoc to have the peer start skipping some headers midstream; better to let it insert the duplicates and let the solution to #1 handle the situation.

How was it fixed?

  1. Fixed with a new DuplicateTasks exception that is caught, so the relevant headers can be skipped.
  2. Only check whether headers are already in the database on the very first run.

Plus some logging improvements.
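The fix in #1 can be sketched roughly as follows. This is a toy illustration, not the actual trinity code: `HeaderQueue`, `register_tasks`, and `add_headers` are hypothetical stand-ins, and headers are reduced to plain strings.

```python
from typing import Sequence, Tuple


class DuplicateTasks(Exception):
    # Illustrative stand-in for the real exception; it carries the
    # offending task ids so the caller can skip exactly those.
    def __init__(self, msg: str, duplicates: Tuple[str, ...]) -> None:
        super().__init__(msg)
        self.duplicates = duplicates


class HeaderQueue:
    # Toy queue that refuses to re-register a task id for completion.
    def __init__(self) -> None:
        self._registered = set()

    def register_tasks(self, headers: Sequence[str]) -> None:
        dupes = tuple(h for h in headers if h in self._registered)
        if dupes:
            raise DuplicateTasks(
                f"Cannot re-register task ids {dupes!r} for completion",
                dupes,
            )
        self._registered.update(headers)


def add_headers(queue: HeaderQueue, headers: Sequence[str]) -> Sequence[str]:
    # On a peer switchover, the new peer may resend headers that are
    # already being downloaded; catch the exception and retry with
    # only the headers that are actually new.
    try:
        queue.register_tasks(headers)
    except DuplicateTasks as exc:
        new_headers = [h for h in headers if h not in exc.duplicates]
        queue.register_tasks(new_headers)
        return new_headers
    return list(headers)
```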

There's another typing tangent that I wasted a lot of time on. Grumble. Any suggestions for making it work? (where the type: ignore is)

Cute Animal Picture

put a cute animal picture link inside the parentheses

@carver carver changed the title Some sync crash fixups [WIP] Some sync crash fixups Sep 26, 2018
# Tried a bunch of things that failed to make mypy happy:
# - cast(DuplicateTasks[BlockHeader], exc) -- refuses to cast
# - except DuplicateTasks[BlockHeader]: -- doesn't catch the exception
# - OrderedTaskPreparation.DuplicateTasks to inherit TTask -- mypy refuses
Member

Can you add a line up above the try:

exc: DuplicateTasks[BlockHeader]
try:
    ...
except DuplicateTasks as exc:
    ...

I think that's how you tell mypy ahead of time what type something is.

Member

Alternate idea:

duplicates = cast(Tuple[BlockHeader, ...], exc.duplicates)
self.header_queue.complete(batch_id, duplicates)

Less elegant/type-safe...
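The cast-based workaround can be made concrete like this. A minimal sketch only: `DuplicateTasks` is reimplemented here as a hypothetical generic exception, and `BlockHeader` is reduced to a trivial class.

```python
from typing import Generic, Tuple, TypeVar, cast

TTask = TypeVar('TTask')


class BlockHeader:
    def __init__(self, block_number: int) -> None:
        self.block_number = block_number


class DuplicateTasks(Exception, Generic[TTask]):
    def __init__(self, msg: str, duplicates: Tuple[TTask, ...]) -> None:
        super().__init__(msg)
        self.duplicates = duplicates


def complete_batch(headers: Tuple[BlockHeader, ...]) -> Tuple[int, ...]:
    try:
        raise DuplicateTasks("re-registered", headers)
    except DuplicateTasks as exc:
        # The caught exception is effectively unparameterized, so
        # instead of casting the exception itself (which mypy refuses),
        # cast just the `duplicates` attribute to recover the element type.
        duplicates = cast(Tuple[BlockHeader, ...], exc.duplicates)
        return tuple(h.block_number for h in duplicates)
```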

Contributor Author

Pre-defining the exception type still complained... going with the duplicates cast.

Member

I rabbit-holed on this a while ago. There's a solution that works for simple types (though not this one), but it has a runtime cost.

if isinstance(v, SomeType):
    do_SomeType_specific_thing(v)

mypy correctly understands that within the if context v must be SomeType. However, there's no way to do the equivalent of if isinstance(v, Tuple[SomeType, ...]) that I could find. 😢
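A runnable illustration of that limitation, under the same assumptions: isinstance narrows simple types, but parameterized generics like Tuple[int, ...] are rejected at runtime, so the equivalent check has to be done element by element (the runtime cost mentioned above).

```python
from typing import Tuple, Union


def double(v: Union[int, str]) -> int:
    # mypy narrows `v` to int inside this branch.
    if isinstance(v, int):
        return v * 2
    return len(v)


def sum_ints(v: object) -> int:
    # isinstance(v, Tuple[int, ...]) raises TypeError at runtime:
    # subscripted generics can't be used with instance checks.
    try:
        isinstance(v, Tuple[int, ...])  # type: ignore
    except TypeError:
        pass
    # The workaround: check the container, then every element.
    if isinstance(v, tuple) and all(isinstance(x, int) for x in v):
        return sum(v)
    raise TypeError("expected a tuple of ints")
```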

@pipermerriam (Member) left a comment

No review requested yet, but I did a full pass on this and it looks 👍

@carver
Copy link
Contributor Author

carver commented Sep 26, 2018

Yeah, I marked it WIP because I had no idea what the p2p.discovery failure was. I spent some time digging into it, but I can't reproduce it locally. I'm having a hard time even guessing what might have happened:

  • it's not a timeout, the test didn't run that long
  • it didn't get nonsense data; that would have shown up in a warning log

So far, my best guess is that both alice and bob bound to the same UDP port, and alice responded to her own request for nodes with an empty list. I'm still not sure why that happened in 0.02s when the timeout is 0.9s (the wait for a response is supposed to hang until it gets the desired number of nodes, regardless of intermediate responses).
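The bind-collision theory is at least checkable: by default a second UDP socket cannot bind an address and port that is already in use, so two sockets sharing a port would imply an SO_REUSE* flag was set somewhere. A small sketch (alice and bob here are just placeholder sockets, not the discovery code):

```python
import socket

# alice binds an ephemeral UDP port on localhost.
alice = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
alice.bind(("127.0.0.1", 0))
port = alice.getsockname()[1]

# bob trying to bind the same port fails by default with EADDRINUSE...
bob = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    bob.bind(("127.0.0.1", port))
    shared = True
except OSError:
    shared = False

# ...so if alice and bob really shared a port, something opted into
# address reuse, letting alice receive (and answer) her own request.
alice.close()
bob.close()
```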

Error message was: "Cannot re-register task id * for completion"
Otherwise, on a peer switchover, it's possible to redownload chunks of
headers, and then have to skip the next chunk, which really borks
things.
@carver carver changed the title [WIP] Some sync crash fixups Some sync crash fixups Sep 26, 2018
@carver carver merged commit f61a486 into ethereum:master Sep 26, 2018
@carver carver deleted the fix-sync-switch-crash branch September 26, 2018 19:24