feat:chain: Message Index #10452

Merged: 48 commits, Mar 22, 2023

Conversation

@vyzo commented Mar 11, 2023

Overview

This PR introduces an MVP of a message index that allows us to accelerate StateSearchMessage and related functionality, and eventually accelerate critical chain calls (follow-up).

The index is implemented using an sqlite db on disk, and is maintained in the background using head change notifications.

How to enable

The index is optional and can be enabled via configuration:

[Index]
EnableMsgIndex = true

Management Tools

We provide a lotus-shed command with two tools:

  • lotus-shed msgindex backfill allows you to backfill the index.
  • lotus-shed msgindex prune allows you to prune old index entries.

Related Issue

See #10326

Design Notes

This departs from the discussion in #10326 by implementing a simpler solution that has no memory component and stores everything in sqlite.
On head change notifications, we first delete data from reverted tipsets and then insert data for applied tipsets (in a transaction).
This allows us to trivially deal with finality commitments.

At the db level, the index uses a single table instead of multiple tables with foreign keys. The denormalization is intentional: we don't want to deal with foreign key constraints and be forced to use joins, build indices, etc., as this is a barebones index on its own. The cost is the duplication of the tipset Cid, which bloats records by some 90% (an extra Cid); that is not a terrible storage cost.
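
For illustration, here is a minimal sketch of the shape this describes -- one denormalized table, with reverts and applies handled in a single transaction. Schema, column, and function names are illustrative, not the PR's actual code:

package index

import (
    "database/sql"

    "github.com/filecoin-project/lotus/chain/types"
)

// One denormalized table; the tipset Cid is duplicated into every message row.
// The ON CONFLICT clause lets manual backfills coexist with rows the node
// inserts on its own (see the lotus-shed update later in this thread).
const ddl = `CREATE TABLE IF NOT EXISTS messages (
    cid        VARCHAR(80) PRIMARY KEY ON CONFLICT REPLACE,
    tipset_cid VARCHAR(80) NOT NULL,
    epoch      INTEGER NOT NULL
)`

// applyHeadChange deletes rows for reverted tipsets and inserts rows for
// applied tipsets in one transaction, so the index is never half-updated.
func applyHeadChange(db *sql.DB, revert, apply []*types.TipSet) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once Commit succeeds

    for _, ts := range revert {
        if _, err := tx.Exec("DELETE FROM messages WHERE tipset_cid = ?", ts.Key().String()); err != nil {
            return err
        }
    }
    for _, ts := range apply {
        _ = ts // insert one (cid, tipset_cid, epoch) row per message in ts (elided)
    }
    return tx.Commit()
}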

@vyzo marked this pull request as ready for review March 12, 2023 13:36
@vyzo requested a review from a team as a code owner March 12, 2023 13:36
@vyzo changed the title from [WIP] Message Index to feat:chain: Message Index Mar 12, 2023
@raulk added this to the Eth RPC perf + fixes milestone Mar 12, 2023
@raulk (Member) left a comment

Going through it, expect more comments shortly.

}
}

ctx, cancel := context.WithCancel(context.Background())
Member:

This isn't the right context to use. Let's make the constructor accept a context, like the Mpool, and feed the lifecycle context during dependency injection.

Contributor (Author):

I disagree on this. This is a background component; it is certainly the right context.

Passing a lifecycle context is an antipattern, and we are doing it wrong.
I am sure @Stebalien has strong opinions here.

Member:

We shouldn't be using contexts for lifecycle management, agreed, but that's Lotus' reality today and we should just adhere to it. If we want to move away from that pattern, it should be done as part of a larger refactor, not in a piecemeal way.

Contributor (Author):

I will invoke the @Stebalien shell here. Codewise it is easy to hang this context from the lifecycle context.

Contributor:

The lifecycle context should be fine to use. If it is created before the OnClose hook is set, it is guaranteed to be cancelled after Close is called - which is the whole point of the lifecycle system. Lifecycle contexts are somewhat of a hack, but in most cases they just make cleanup of things slightly more robust.

The other thing that context is used for is metrics / tracing, and in this specific case I believe there actually are metrics which will be dropped due to this background context not being wired to the main MetricsCtx.
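
For reference, a minimal sketch of the wiring being discussed, assuming the helpers.MetricsCtx / LifecycleCtx pattern Lotus uses elsewhere in node/modules:

package modules

import (
    "context"

    "go.uber.org/fx"

    "github.com/filecoin-project/lotus/node/modules/helpers"
)

// Instead of context.Background(), derive the background context from the
// injected metrics context and the fx lifecycle: it then carries the metrics
// and tracing wiring, and is cancelled when the node shuts down.
func backgroundCtx(mctx helpers.MetricsCtx, lc fx.Lifecycle) context.Context {
    return helpers.LifecycleCtx(mctx, lc)
}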

Contributor (Author):

well ok, i will wire it then.

Contributor (Author):

done.

@raulk (Member) left a comment

More comments, let's reuse the event observer logic by extracting it first to a common component.

for {
    select {
    case <-x.sema:
        err := x.processHeadChanges(ctx)
Member:

Unfortunately this logic is not as straightforward as it appears here. This index only cares about executed tipsets, which means that we need to consume the parent of each head, but only when the head has actually been executed itself. Sadly this logic consumes speculative tipsets while they are being formed, which is not what we want.

In other words, we should track tipsets with a confidence of 2, i.e. once the execution tipset has been committed to chain. Then we can add the messages from the execution tipset, which is head.height - 2 at minimum, or more if there are null rounds.

inclusion ts <--- execution ts <--- head

Also, we should extract and reuse the event observer which already tracks tipsets, performs reverts, and does reconciliation by replaying a stream of reverts and applies for a particular chain path.

That component is actually not interdependent with the event index, so it can be extracted easily!

Contributor (Author):

For readers of this PR: see the comments below; we are going with the inclusion tipset.

@vyzo commented Mar 12, 2023

We discussed sync, and we think that:

  1. It is ok to use the inclusion tipset.
  2. We can drop the execution index.

In this light, we probably need to do very little.

We can even keep the coalescer instead of the event observer if that gets tricky to integrate. Also note that confidence can be implemented at read time as well, by comparing inclusion height with the current head.
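
For example, a read-time confidence check could look roughly like this (a sketch, not part of the PR):

package index

import (
    "github.com/filecoin-project/go-state-types/abi"

    "github.com/filecoin-project/lotus/chain/store"
)

// haveConfidence reports whether a message included at epoch is buried at
// least confidence epochs below the current head, and is thus safe to serve.
func haveConfidence(cs *store.ChainStore, epoch, confidence abi.ChainEpoch) bool {
    return cs.GetHeaviestTipSet().Height()-epoch >= confidence
}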

@raulk (Member) left a comment

Just had a chat with @vyzo. While having the execution tipset and the execution index would be ideal, it'll be easier to get this merged if we stick to what the discussion in #10326 ends up suggesting. So:

  1. Let's store the inclusion tipset, which is what this logic is actually storing now (even though the field is incorrectly named as execution tipset -- fix that).
  2. Let's remove the execution index.
  3. Reuse the events observer, extracting it from the chain/events package since it's not coupled to events.
  4. Set a confidence of 1 (see how the headEventsObserver does it). Not 2 since we're tracking the inclusion tipset and not the execution tipset now.
  5. Align the code with the existing event index to avoid code entropy.

TODOs in addition to the above

(perhaps incomplete, @vyzo please add if I omitted something)

  • Add itests.
  • Add a backfill command in lotus-shed.
  • Make sure we're using the right sqlite pragmas to: (1) fsync on transaction commit, and (2) allow opening the sqlite db from multiple processes (so that the backfill script can operate while the node is live); see the sketch after this list.
  • Populate the index on snapshot import (run a synthetic reconciliation?)
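
For the pragmas item, one plausible setup (illustrative; the exact pragmas are left to the implementation):

package index

import "database/sql"

// applyPragmas sketches the two requirements above: WAL mode lets a second
// process (the backfill script) open the db while the node is live, and
// synchronous=FULL makes sqlite fsync the WAL on every transaction commit.
func applyPragmas(db *sql.DB) error {
    for _, pragma := range []string{
        "PRAGMA journal_mode = WAL",
        "PRAGMA synchronous = FULL",
    } {
        if _, err := db.Exec(pragma); err != nil {
            return err
        }
    }
    return nil
}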

@arajasek (Contributor) left a comment

Thanks for the PR! This generally looks good. I have a few open questions:

  • Did you try / can we try having the ChainStore own this index? I think it belongs there more, "searching back for a message" is very much a chain operation (even if its caller is currently the stmgr).
  • As written, we can only backfill manually. That means if your node is down for 30 mins and then rebooted, you'll be missing index entries for that period. Are we okay with that?
  • We currently only reconcile on startup. Did you consider triggering the reconciliation process internally if the indexer ends up in some wonky state (e.g. errors while processing a head change)? It also might be useful for callers of the index to be able to "tell" the indexer that it's inconsistent and trigger a reconciliation (e.g. if for some reason we didn't actually find the message in the returned tipset).
  • On the flip side: as written with the current user of this index, we only need the tipset CID in the MsgInfo struct for reconciliation, a process we don't expect to need -- we don't actually need it for the lookback, because we load the "execution" tipset and pull the info out of it anyway. This means that we could drop the size of this db by ~half (one of the 2 CIDs) and have this just be a (msgCid, height) mapping. In essence we can get a much smaller database footprint (50%) at the cost of much slower / worse reconciliation (either manually inspecting every record, or just dropping any records that callers report as "bad").
  • I have pretty high confidence in the correctness of this thanks to how simple it is. Even so, could we consider adding a "paranoid" mode, that uses the cache, then does the lookback anyway, and screams if results don't match? This can be run by us, as well as any integration partners with high security standards for this portion of the codebase (we have some such partners).
  • I think it'll be the case that if this is slow, it'll slow down overall sync. Is that so, and if so can we maybe add some timing metrics?
  • I'm strongly in favour of having a single table with the minor duplication here 👍


db, err := sql.Open("sqlite3", dbPath)
if err != nil {
    // TODO [nice to have]: automatically delete corrupt databases
Contributor:

Can we capture this (and other TODOs) in an Enhancements issue in Lotus, please? (But feel free to leave the TODOs in code too.)

Contributor (Author):

yes, of course.

Part of making the TODOs was to discuss them and see what we need to do in the initial MVP; the rest can go into a follow-up issue.

return xerrors.Errorf("error preparing statement: %w", err)
}

curTs := cs.GetHeaviestTipSet()
Member:

This is racy unless we can guarantee that the chainstore hasn't been "started" yet. Is that the case @arajasek?

Otherwise, we need a way to atomically get the head and subscribe at the same time. Alternatively, we can defer this reconcile step until we get the first event from the chainstore and reconcile then.

Member:

Actually, that second part is kind of nice because it avoids blocking on start.

Contributor (Author):

Yeah, but I am not convinced it is a problem -- let's wait for @arajasek to chime in here.

Contributor:

Yeah, I don't think this is a problem -- the chainstore shouldn't have "started", and even if it had, I don't see a problem if this races.

I do agree with @Stebalien that reconciling on the first event would be nicer, but it's non-blocking IMO.
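
A rough sketch of that reconcile-on-first-event variant (hypothetical names; a sync.Once guard is one way to do it):

package index

import (
    "context"
    "sync"

    "github.com/filecoin-project/lotus/chain/store"
    "github.com/filecoin-project/lotus/chain/types"
)

type msgIndex struct {
    cs            *store.ChainStore
    reconcileOnce sync.Once // guards the lazy, first-event reconciliation
}

// onHeadChange defers reconciliation until the first head change notification,
// so startup never blocks and there is no get-head/subscribe race to worry about.
func (x *msgIndex) onHeadChange(ctx context.Context, rev, app []*types.TipSet) error {
    var err error
    x.reconcileOnce.Do(func() {
        err = x.reconcileWithChain(ctx, x.cs.GetHeaviestTipSet())
    })
    if err != nil {
        return err
    }
    return x.processHeadChange(ctx, rev, app)
}

// elided: the hypothetical reconciliation and indexing steps.
func (x *msgIndex) reconcileWithChain(ctx context.Context, head *types.TipSet) error { return nil }
func (x *msgIndex) processHeadChange(ctx context.Context, rev, app []*types.TipSet) error { return nil }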

msgIndex := &msgIndex{
    db:   db,
    cs:   cs,
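    // buffered flag channel: a send marks pending head changes for the worker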
    sema: make(chan struct{}, 1),
Member:

ultra-nit: this is a flag, not a semaphore.

Contributor (Author):

ha! I can rename if you insist, you are technically right.

@vyzo commented Mar 14, 2023

@arajasek thanks for the review, these are some great questions.

Did you try / can we try having the ChainStore own this index? I think it belongs there more, "searching back for a message" is very much a chain operation (even if its caller is currently the stmgr).

I have considered this and I think it has great potential.
If we had the index owned by the ChainStore a whole lot of things could be significantly improved in terms of performance.

Some very basic things like GetTipsetByHeight, GetMessagesForTipset, etc can go through the index.

The main reason I didn't do it for the MVP is that it is a much more invasive change in the most consensus-critical part of the system. So a different standard of applicability and testing must be adhered to there, and I'd rather do it in a follow-up.

The main question is whether it needs to become synchronous in terms of head change notification processing, in order for the ChainStore to keep the authoritative-source-of-truth invariant. We discussed this with @Stebalien and we think it doesn't have to: we can check whether the current tipset (or the one we are based on in general) is in the index, and in that case proceed and use it -- otherwise fall back to what we already have. We should probably tweak the coalescer delay down a bit too, and generally experiment with it. Note that the reason for not wanting synchronous operation is that we would thrash the database with writes/deletes while the tipset is building, which we want to avoid for obvious reasons (perf, durability of SSDs, and so on).
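
A rough sketch of that fallback check (hypothetical names and interface; none of this is in the PR):

package stmgr

import (
    "context"

    "github.com/ipfs/go-cid"

    "github.com/filecoin-project/lotus/chain/types"
)

// MsgIndex is the subset of the index we would need here (hypothetical).
type MsgIndex interface {
    HasTipset(ctx context.Context, tsk types.TipSetKey) (bool, error)
}

// searchMsg consults the index only when it has already caught up to the
// tipset we are based on; otherwise it falls back to the existing chain walk,
// so the index never needs to process head changes synchronously.
func searchMsg(ctx context.Context, idx MsgIndex, ts *types.TipSet, mcid cid.Cid,
    viaIndex, viaWalk func(context.Context, cid.Cid) (*types.TipSet, error)) (*types.TipSet, error) {
    if ok, err := idx.HasTipset(ctx, ts.Key()); err == nil && ok {
        return viaIndex(ctx, mcid) // fast path through the index
    }
    return viaWalk(ctx, mcid) // existing slow path
}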

At any rate, I think that having the index owned by the ChainStore is a natural progression from the MVP and a very important follow-up for subsequent PRs.

As written, we can only backfill manually. That means if your node is down for 30 mins and then rebooted, you'll be missing index entries for that period. Are we okay with that?

I think we are ok for MVP, but we do need to do something about it. The first thing to do (as discussed with @raulk) is to fill the index on snapshot import, and that should probably cover us. We can also consider backfilling when first creating the index as well, but I think snapshot import is a better place to handle this.

At any rate, it is part of the follow-up plan to address this.

We currently only reconcile on startup. Did you consider triggering the reconciliation process internally if the indexer ends up in some wonky state (eg. errors while processing head change). It also might be useful for callers of the index to be able to "tell" the indexer that it's inconsistent, and trigger a reconciliation (eg. if for some reason we didn't actually find the message in the returned tipset)

Yeah, that's a good point. The problem is the synchronous nature of it, hence startup seemed like the logical place to do it. We should definitely consider expanding the adaptive capabilities in a follow-up, but I am not convinced it is necessary.

On the flip side: as written with the current user of this index, we only need the tipset CID in the MsgInfo struct for reconciliation, a process we don't expect to need -- we don't actually need it for the lookback, because we load the "execution" tipset and pull the info out of it anyway. This means that we could drop the size of this db by ~half (one of the 2 CIDs) and have this just be a (msgCid, height) mapping. In essence we can get a much smaller database footprint (50%) at the cost of much slower / worse reconciliation (either manually inspecting every record, or just dropping any records that callers report as "bad").

I don't think we should do that; having the tipset cid in the index opens up great opportunities. See the notes above about ChainStore ownership, but at the very least we can return the execution tipset this way (in addition to the inclusion tipset) on queries if we have it. I left a TODO for an optimization follow-up in the search implementation.

I have pretty high confidence in the correctness of this thanks to how simple it is. Even so, could we consider adding a "paranoid" mode, that uses the cache, then does the lookback anyway, and screams if results don't match? This can be run by us, as well as any integration partners with high security standards for this portion of the codebase (we have some such partners).

That's probably a good idea, and it will definitely help us build confidence.

I think it'll be the case that if this is slow, it'll slow down overall sync. Is that so, and if so can we maybe add some timing metrics?

I don't think it will be slow (sqlite is an amazing piece of software), but you are absolutely right that we should have some timing measurements. I will add some logging in that regard.

I'm strongly in favour of having a single table with the minor duplication here +1

yeah, it makes things so much simpler! And it is not that big of a waste, that 2x is not that bad.

@vyzo commented Mar 16, 2023

Updates:

  • lotus-shed now expands the repo dir to avoid the tilde footgun
  • added an ON CONFLICT clause in the table for the message cid, to avoid conflicts from stuff inserted manually with shed
  • addressed @arajasek review comments.

@arajasek commented Mar 20, 2023

I think this is good to land, but I would like to see the "followup" plan laid out soon. Can you quickly write that up / open some issues, @vyzo? Stuff that we've discussed (in decreasing order of importance) includes:

  • PR filling the index on snapshot import (basically a must-have)
  • issue describing moving the index into chainstore (and challenges / open qs described above)
  • issues for the TODOs in code (happy to have the TODOs too, but we need issues to track them)
  • issue / PR for "paranoid" mode

(and I'm sure you have more next steps ideas in your 🧠)

@vyzo commented Mar 20, 2023

yessir, i shall do.

@jennijuju (Member):

merge block: can we have some user documentation in the PR description? i.e.: what kind of configuration exists? any tooling to manage the db?

@vyzo commented Mar 21, 2023

merge block: can we have some user documentation in the PR description? i.e.: what kind of configuration exists? any tooling to manage the db?

@jennijuju done, updated the PR description.

@vyzo mentioned this pull request Mar 22, 2023
@vyzo commented Mar 22, 2023

@arajasek follow-up issue: #10536
