feat: Make sure we don't store duplicate actor events caused by reorgs in events.db #11015
First pass.
Couple of small things but they're up to you. Otherwise, LGTM!
}
} else {
// event already exists, lets mark it as not reverted
res, err := tx.Stmt(ei.stmtRestoreEvent).Exec(
Technically, due to the transaction semantics, if any events are recorded, all should be recorded. So we should be able to check:
- Are any events recorded for this tipset. If so, unmark them as reverted and return.
- Otherwise, insert them.
But, we can also keep the current logic unless it becomes a performance issue.
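The simplification suggested above can be sketched with a small in-memory model. This is not the Lotus code: `eventStore`, `tipsetRef`, `event`, `apply`, and `revert` are hypothetical names, and a map stands in for the SQLite tables. It only illustrates the "restore if anything exists for the tipset, otherwise insert" flow.

```go
package main

import "fmt"

// tipsetRef identifies a tipset by height and key (hypothetical model type).
type tipsetRef struct {
	Height int64
	Key    string
}

// event is a simplified stand-in for one stored actor event row.
type event struct {
	Emitter  string
	Index    int
	Reverted bool
}

// eventStore models events.db as a map from tipset to its events.
type eventStore struct {
	byTipset map[tipsetRef][]event
}

func newEventStore() *eventStore {
	return &eventStore{byTipset: make(map[tipsetRef][]event)}
}

// apply records the events of a tipset. If any events already exist for the
// tipset, it assumes all of them were recorded earlier (one transaction per
// tipset), clears their reverted flag, and reports restored=true. Otherwise
// it inserts a copy of evs.
func (s *eventStore) apply(ts tipsetRef, evs []event) (restored bool) {
	if existing := s.byTipset[ts]; len(existing) > 0 {
		for i := range existing {
			existing[i].Reverted = false
		}
		return true
	}
	stored := make([]event, len(evs))
	copy(stored, evs)
	for i := range stored {
		stored[i].Reverted = false
	}
	s.byTipset[ts] = stored
	return false
}

// revert marks every event of the tipset as reverted without deleting it.
func (s *eventStore) revert(ts tipsetRef) {
	evs := s.byTipset[ts]
	for i := range evs {
		evs[i].Reverted = true
	}
}

// count returns how many events are stored for the tipset.
func (s *eventStore) count(ts tipsetRef) int {
	return len(s.byTipset[ts])
}

func main() {
	s := newEventStore()
	ts := tipsetRef{Height: 100, Key: "tipset-a"}
	evs := []event{{Emitter: "f0100", Index: 0}, {Emitter: "f0100", Index: 1}}

	s.apply(ts, evs)             // first time: insert
	s.revert(ts)                 // reorg away from the tipset
	restored := s.apply(ts, evs) // reorg back: restore, no duplicates
	fmt.Println(restored, s.count(ts))
}
```

In this model, re-applying a tipset after a revert leaves the event count unchanged, which is the property the PR is after. Whether a single per-tipset existence check is cheaper than the per-event check in the real database would need measuring.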
You'll have to tell the linter to "go away" by appending
@fridrik01 @Stebalien So, the migration in this PR is...slow. On @TippyFlitsUK's machine, it seemed to be progressing at less than one epoch per second, which would take (at least) 400k seconds, or roughly 5 days, for anyone with history going back to FEVM launch. Further, it blocks the daemon's API from coming alive, which is an issue for SPs, RPC service providers, etc. Can we:
One second per epoch is really strange. It should be milliseconds (we have a constant number of SQL queries per epoch). I've filed an issue and will investigate: #11056. Parallelism should be the easiest way to speed this up, but even then, I'm not sure how much parallelism we can get out of SQLite here.
Fixes: filecoin-project/ref-fvm#1350
Context
When Lotus is configured to index actor events (EnableEthRPC=true and DisableHistoricFilterAPI=false), it watches for new tipsets and stores all actor events in a SQLite database at sqlite/events.db.
However, users have been reporting that these events are duplicated and hard to understand (see filecoin-project/ref-fvm#1350).
To address this, I originally implemented a quick fix (#10897) that handled it at the API level, but the team decided we want a less hacky fix in which duplicate events are never stored in the database in the first place.
This PR implements this at the DB level, where depending on whether we are applying or reverting messages we:
I introduced a new events migration that adds an index on (height,tipset_key), which I confirmed is used by all relevant SQL statements:
Furthermore, this PR also includes a migration of existing actor events so that they no longer contain duplicates.
Test plan
I started a node with the changes in this PR and let it run for 40 minutes (epochs 2988646-2988730). Then I started another node from the master branch and compared the actor events generated within that epoch range between the two instances:
As expected there were no differences.
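A comparison along these lines could be sketched as follows. This is an assumption-laden illustration, not the procedure actually used: `eventRow` is a hypothetical flattened row type, and loading rows from each node's events.db is left out. The diff is a multiset diff, so a duplicated row on one side shows up as a difference.

```go
package main

import (
	"fmt"
	"sort"
)

// eventRow is a hypothetical flattened view of one stored actor event.
type eventRow struct {
	Height    int64
	TipsetKey string
	Emitter   string
	Index     int
	Reverted  bool
}

// canonical keeps rows with from <= Height <= to and returns them as sorted
// strings so two dumps can be compared regardless of row order.
func canonical(rows []eventRow, from, to int64) []string {
	out := []string{}
	for _, r := range rows {
		if r.Height >= from && r.Height <= to {
			out = append(out, fmt.Sprintf("%d|%s|%s|%d|%t",
				r.Height, r.TipsetKey, r.Emitter, r.Index, r.Reverted))
		}
	}
	sort.Strings(out)
	return out
}

// diffEvents returns the rows present only in a and only in b within the
// epoch range, treating each side as a multiset.
func diffEvents(a, b []eventRow, from, to int64) (onlyA, onlyB []string) {
	ca, cb := canonical(a, from, to), canonical(b, from, to)
	i, j := 0, 0
	for i < len(ca) && j < len(cb) {
		switch {
		case ca[i] == cb[j]:
			i++
			j++
		case ca[i] < cb[j]:
			onlyA = append(onlyA, ca[i])
			i++
		default:
			onlyB = append(onlyB, cb[j])
			j++
		}
	}
	onlyA = append(onlyA, ca[i:]...)
	onlyB = append(onlyB, cb[j:]...)
	return onlyA, onlyB
}

func main() {
	// Made-up sample rows; real rows would come from each node's events.db.
	prNode := []eventRow{{Height: 2988650, TipsetKey: "k1", Emitter: "f0100", Index: 0}}
	masterNode := []eventRow{{Height: 2988650, TipsetKey: "k1", Emitter: "f0100", Index: 0}}

	onlyPR, onlyMaster := diffEvents(prNode, masterNode, 2988646, 2988730)
	fmt.Println(len(onlyPR), len(onlyMaster))
}
```

Two empty result slices correspond to the "no differences" outcome reported above.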