Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Realtime Fetcher to wait for small skips #2470

Merged
merged 1 commit into from
Oct 14, 2019
Merged

Conversation

pasqu4le
Copy link
Contributor

@pasqu4le pasqu4le commented Jul 30, 2019

Closes #1556
and solves this error of fetcher=internal_transaction:

Exception:
** (Postgrex.Error) ERROR 23503 (foreign_key_violation) insert or update on table "internal_transactions" violates foreign key constraint "internal_transactions_transaction_hash_fkey"
    table: internal_transactions
    constraint: internal_transactions_transaction_hash_fkey

Motivation

While searching for the root cause of the internal_transactions issue an odd correlation with the order of the realtime's fetcher indexed block was noticed and found to be consistently happening.
Upon further inspection this also seems to be the cause (or at least one of the causes) for that same fetcher to be refetching the same block multiple time.

The source of the problem

These errors both seem to happen when the newHeads websocket subscription skips a small amount of block numbers, just to be followed by those very same numbers that were skipped, without repeating the number that caused the skipping. For example:

...
100
101
104
102
103
105
106
...

This is (correctly) handled by the fetcher as the case number 3 described here: #1189
and it behaves like this:

...
100 -> fetches 100
101 -> fetches 101
104 -> fetches 102, 103, and 104
102 -> fetches 102
103 -> fetches 103
105 -> fetches 104 and 105
106 -> fetches 106
...

effectively treating it as a reorg.

Why this causes both errors

This kind of block number mis-ordering seems to be happening very often and may very well be caused by an incorrect order of events.
Even if they are indeed reorgs, this causes a lot of close-in-time refetching (problem n.1) and because internal_transactions get fetched asynchronously this may happen after the reorg but before the block has been refetched (so their transactions do not exist in the DB yet <-> problem n.2).
For clarity, using the same numbers of the example abobe, this order of events can happen:

  1. fetcher receives number 104 so blocks with number 102, 103 and 104 get fetched and an async task for their internal transactions starts
  2. fetcher receives number 102 and treats it as a reorg, so it waits a little before refetching it
  3. async job for number 102 started in point 1 finished and has internal_transactions contained in the new version of the block. These will fail to insert.
  4. after waiting from point 2 refetch block number 102 will be refetched

Changelog

Bug Fixes

The proposed solution is to allow the Realtime fetcher ignore small (settable) skips and wait for the blocks following the last one inserted to arrive in order.
Obviously if it gets notified of the blocks following the maximum allowed skip the whole window gets fetched and inserted.
This way small skips will not create multiple fetching of the same blocks and the fetching will be at most a few blocks old.

Checklist for your PR

  • I added an entry to CHANGELOG.md with this PR
  • If I added new functionality, I added tests covering it.
  • If I fixed a bug, I added a regression test to prevent the bug from silently reappearing again.
  • I checked whether I should update the docs and did so if necessary

@coveralls
Copy link

coveralls commented Jul 30, 2019

Pull Request Test Coverage Report for Build 29fbc4c9-5280-4b2a-9dc8-03c80c376022

  • 12 of 25 (48.0%) changed or added relevant lines in 2 files are covered.
  • 6 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.06%) to 77.753%

Changes Missing Coverage Covered Lines Changed/Added Lines %
apps/indexer/lib/indexer/block/catchup/fetcher.ex 5 6 83.33%
apps/indexer/lib/indexer/block/realtime/fetcher.ex 7 19 36.84%
Files with Coverage Reduction New Missed Lines %
apps/indexer/lib/indexer/fetcher/token.ex 1 80.0%
apps/indexer/lib/indexer/fetcher/token_balance.ex 2 87.88%
apps/indexer/lib/indexer/block/fetcher.ex 3 89.02%
Totals Coverage Status
Change from base Build 83e998d5-3840-4b2b-9716-89de2de57b30: -0.06%
Covered Lines: 5323
Relevant Lines: 6846

💛 - Coveralls

@vbaranov
Copy link
Member

vbaranov commented Aug 1, 2019

@pasqu4le great update! Could you please fix merging conflicts to keep the feature with indexing in the range, that Anatoly implemented here #2458?

@pasqu4le
Copy link
Contributor Author

pasqu4le commented Aug 1, 2019

@vbaranov sure, done

@pasqu4le pasqu4le force-pushed the pp-realtime-skipping branch 2 times, most recently from 45a0256 to 11879de Compare August 1, 2019 16:34
Copy link
Member

@vbaranov vbaranov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pasqu4le

From stg (after installing the current update):

2019-08-02T11:58:21.543 application=indexer fetcher=block_realtime block_number=8271158 count=179 [debug] fetching transaction receipts
2019-08-02T11:58:23.669 application=indexer fetcher=block_realtime block_number=8271158 count=179 [debug] fetching transaction receipts

Should this PR cover this issue? Or this is not an issue at all?

@pasqu4le
Copy link
Contributor Author

pasqu4le commented Aug 5, 2019

Hey @vbaranov yes it should cover this in most cases where it is an issue, but it may very well not be in your specific case.

For instance this could be a legitimate reorg and you'd see that in the logs.

@vbaranov
Copy link
Member

vbaranov commented Aug 5, 2019

Hey @vbaranov yes it should cover this in most cases where it is an issue, but it may very well not be in your specific case.

For instance this could be a legitimate reorg and you'd see that in the logs.

@pasqu4le

I didn't see reorg in logs. In addition, this block not in the list of forked.

One more example (ETH Mainnet chain):

2019-08-05T17:43:17.403 application=indexer fetcher=block_realtime block_number=8290493 count=88 [debug] fetching transaction receipts
2019-08-05T17:43:19.388 application=indexer fetcher=block_realtime block_number=8290493 [debug] Fetched and imported.
2019-08-05T17:43:19.470 application=indexer fetcher=block_realtime block_number=8290493 count=88 [debug] fetching transaction receipts
2019-08-05T17:43:19.624 application=indexer fetcher=block_realtime block_number=8290494 count=141 [debug] fetching transaction receipts
2019-08-05T17:43:24.074 application=indexer fetcher=block_realtime block_number=8290493 [debug] Fetched and imported.
2019-08-05T17:43:24.804 application=indexer fetcher=block_realtime block_number=8290494 [debug] Fetched and imported.

Problem: Sometimes the realtime fetcher's `newHeads` websocket subscription skips a small amount of block numbers, just to be followed by those very same numbers that were skipped, without repeating the number that caused the skipping.
This may be cause by out-of-order events or be a legitimate reorg, but in any case this causes close-in-time refetching which in turn leads to async insertions problems.

Solution: allow the Realtime fetcher to keep track of a small (settable) skip and wait for the block number inside the skip to arrive before fetching the whole window.
Obviously if it gets notified of the blocks following the skip (or it is forced into polling) the whole window gets fetched and inserted.
@vbaranov vbaranov merged commit ee4b87f into master Oct 14, 2019
@vbaranov vbaranov deleted the pp-realtime-skipping branch October 14, 2019 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready for review This PR is ready for reviews.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Realtime indexer indexing already indexed blocks
4 participants