This repository has been archived by the owner on Jul 1, 2021. It is now read-only.

Fix pruning with consecutive finished dependencies #347

Merged on Mar 8, 2019 (7 commits)

@carver carver commented Mar 1, 2019

What was wrong?

Fixes #344

I added a test to reproduce it: if you set two chained finished dependencies and then try to prune the oldest one, it crashes, because we didn't link one finished dependency as a dependency of the other.

How was it fixed?

Check whether the parent of a finished dependency is already present and, if so, link it as a dependency. Pruning now works as expected.
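
To illustrate the idea, here is a minimal sketch. It is not the actual Trinity code: `ToyTaskTracker`, `set_finished_dependency`, `roots`, `prune`, and the `link_finished_parents` flag are hypothetical names made up for this example, and the real crash path is not reproduced. It only shows why linking a finished dependency to an already-present parent matters when the oldest task is pruned.

```python
from typing import Dict, Hashable, List, Optional, Set


class ToyTaskTracker:
    """Hypothetical, simplified tracker; not the Trinity implementation."""

    def __init__(self, link_finished_parents: bool = True) -> None:
        # Flag exists only to contrast the linked and unlinked behaviors.
        self._link_finished_parents = link_finished_parents
        # task id -> parent task id (None means "no known parent", i.e. a root)
        self._parent_of: Dict[Hashable, Optional[Hashable]] = {}
        # task id -> ids of tasks that depend on it
        self._children_of: Dict[Hashable, Set[Hashable]] = {}

    def set_finished_dependency(self, task_id: Hashable, parent_id: Hashable) -> None:
        """Record an already-finished task, linking it to its parent if present."""
        parent_known = parent_id in self._parent_of
        self._parent_of[task_id] = None
        self._children_of[task_id] = set()
        if self._link_finished_parents and parent_known:
            # The parent is already tracked, so record the dependency.
            self._parent_of[task_id] = parent_id
            self._children_of[parent_id].add(task_id)

    def roots(self) -> List[Hashable]:
        """Return tasks with no known parent, in insertion order."""
        return [task for task, parent in self._parent_of.items() if parent is None]

    def prune(self, task_id: Hashable) -> None:
        """Drop a root task; its children become the new roots."""
        if self._parent_of[task_id] is not None:
            raise ValueError("only a root (oldest) task can be pruned")
        del self._parent_of[task_id]
        for child in self._children_of.pop(task_id):
            self._parent_of[child] = None


# With linking: B is recorded as depending on A, so A is the single oldest task.
fixed = ToyTaskTracker(link_finished_parents=True)
fixed.set_finished_dependency("A", parent_id="genesis")
fixed.set_finished_dependency("B", parent_id="A")
roots_before = fixed.roots()
fixed.prune("A")
roots_after = fixed.roots()

# Without linking: the chain A -> B is invisible and both tasks look like roots.
unlinked = ToyTaskTracker(link_finished_parents=False)
unlinked.set_finished_dependency("A", parent_id="genesis")
unlinked.set_finished_dependency("B", parent_id="A")
unlinked_roots = unlinked.roots()
```

In the linked case, pruning `"A"` promotes `"B"` to the new root. In the unlinked case the tracker has no record that `"B"` descends from `"A"`, which is the kind of inconsistency that pruning logic relying on the chain cannot handle.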

@carver carver mentioned this pull request Mar 1, 2019

cburgdorf commented Mar 2, 2019

This is from 7b9e956fedf936c63ed1413ebec7face6962fe5c (this PR's commit at the time of writing):

HeaderNotFound: https://gist.github.com/cburgdorf/3d74f0b9a1c56ee8c9bda108ac7fcf51


@cburgdorf cburgdorf left a comment

Not a real disapproval, but I wanted to raise awareness of the newly attached log, which shows a HeaderNotFound bug that was produced under this PR.

@cburgdorf

It also seems that, with this PR, I still run into the error where the process pool dies with a BrokenProcessPool.

   DEBUG  03-03 16:23:34      HeaderMeatSyncer  Problem downloading headers from peer, dropping...
Traceback (most recent call last):
  File "/home/ubuntu/trinity/trinity/sync/common/headers.py", line 605, in _run_fetch_segment
    completed_headers = await peer.wait(self._fetch_segment(peer, parent_header, length))
  File "/home/ubuntu/trinity/p2p/cancellable.py", line 20, in wait
    return await self.wait_first(awaitable, token=token, timeout=timeout)
  File "/home/ubuntu/trinity/p2p/cancellable.py", line 42, in wait_first
    return await token_chain.cancellable_wait(*awaitables, timeout=timeout)
  File "/home/ubuntu/trinity/venv/lib/python3.6/site-packages/cancel_token/token.py", line 152, in cancellable_wait
    return done.pop().result()
  File "/home/ubuntu/trinity/trinity/sync/common/headers.py", line 689, in _fetch_segment
    self._stitcher.register_tasks(headers, ignore_duplicates=True)
  File "/home/ubuntu/trinity/trinity/_utils/datastructures.py", line 561, in register_tasks
    self._roots.add(task_id, dependency_id)
  File "/home/ubuntu/trinity/trinity/_utils/tree_root.py", line 247, in add
    node_root, original_depth = self._get_new_root(node_id, parent_id)
  File "/home/ubuntu/trinity/trinity/_utils/tree_root.py", line 326, in _get_new_root
    parent_root = self._roots[parent_id]
KeyError: b'\n;\xa9\x1c\xe90r\x80\xaf\xbb\xf1\x94\xe6\xf84^\xdd\x89X6T\x89\x96`\xf7AlqL\xe57\xd2'
   DEBUG  03-03 16:23:34            FullServer  Receiving handshake from Address(54.154.125.249:udp:33354|tcp:33354)
   DEBUG  03-03 16:23:34               ETHPeer  Disconnecting from remote peer <Node(0xa1f8@54.154.125.249)>; reason: too_many_peers
   ERROR  03-03 16:23:36   FastChainBodySyncer  Unknown error when getting receipts
Traceback (most recent call last):
  File "/home/ubuntu/trinity/trinity/sync/full/chain.py", line 809, in _request_receipts
    receipt_bundles = await peer.requests.get_receipts(batch)
  File "/home/ubuntu/trinity/trinity/protocol/eth/exchanges.py", line 127, in __call__
    timeout,
  File "/home/ubuntu/trinity/trinity/protocol/common/exchanges.py", line 83, in get_result
    timeout,
  File "/home/ubuntu/trinity/trinity/protocol/common/managers.py", line 293, in get_result
    payload
  File "/home/ubuntu/trinity/p2p/service.py", line 236, in _run_in_executor
    return await self.wait(loop.run_in_executor(executor, callback, *args))
  File "uvloop/loop.pyx", line 2512, in uvloop.loop.Loop.run_in_executor
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 452, in submit
    raise BrokenProcessPool('A child process terminated '
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
 WARNING  03-03 16:23:36               ETHPeer  Task <coroutine object FastChainBodySyncer._run_receipt_download_batch at 0x7f780f274938> finished unexpectedly: A child process terminated abruptly, the process pool is not usable anymore

carver commented Mar 8, 2019

I'm having a very difficult time reproducing these. Sync has been running for hours without triggering them. In the meantime, I added a more extensive test and a bit more logging, which will hopefully provide more information the next time it happens.

I think the best option is to merge, since this fixes a bug that happens more often. I'll be continuing to test and improve sync after merge. I'll leave the PR overnight and merge it in the morning if there are no objections.

@cburgdorf cburgdorf left a comment

I'm 👍 on merging this and tracking #377 separately.

@carver carver mentioned this pull request Mar 8, 2019

carver commented Mar 8, 2019

Great, tracking #387 and #377 separately. Will merge after the lint issue is confirmed fixed.

Development

Successfully merging this pull request may close these issues.

Crash on task pruning