feat:chain: Message Index #10452

Merged: 48 commits, Mar 22, 2023

Conversation

@vyzo commented Mar 11, 2023

Overview

This PR introduces an MVP of a message index that allows us to accelerate StateSearchMessage and related functionality, and eventually accelerate critical chain calls (follow-up).

The index is implemented using an sqlite db on disk, and is maintained in the background using head change notifications.

How to enable

The index is optional and can be enabled via configuration:

[Index]
EnableMsgIndex = true

Management Tools

We provide a lotus-shed command with two tools:

  • lotus-shed msgindex backfill allows you to backfill the index.
  • lotus-shed msgindex prune allows you to prune old index entries.

Related Issue

See #10326

Design Notes

This departs from the discussion in #10326 by implementing a simpler solution that has no memory component and stores everything in sqlite.
On head change notifications, we first delete data from reverted tipsets and then insert data for applied tipsets (in a transaction).
This allows us to trivially deal with finality commitments.

At the db level, the index uses a single table instead of multiple tables with foreign keys. The denormalization is intentional: we don't want to deal with foreign key constraints and be forced to use joins, build indices, etc., as this is a barebones index on its own. The cost is the duplication of the tipset Cid, which bloats records by some 90% (an extra Cid); that is not a terrible storage cost.
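
For illustration, here is a minimal sketch of the shape this describes -- one denormalized table, with reverts and applies handled in a single transaction. Schema, column, and function names are illustrative, not the PR's actual code:

package index

import (
    "database/sql"

    "github.com/filecoin-project/lotus/chain/types"
)

// One denormalized table; the tipset Cid is duplicated into every message row.
// The ON CONFLICT clause lets manual backfills coexist with rows the node
// inserts on its own (see the lotus-shed update later in this thread).
const ddl = `CREATE TABLE IF NOT EXISTS messages (
    cid        VARCHAR(80) PRIMARY KEY ON CONFLICT REPLACE,
    tipset_cid VARCHAR(80) NOT NULL,
    epoch      INTEGER NOT NULL
)`

// applyHeadChange deletes rows for reverted tipsets and inserts rows for
// applied tipsets in one transaction, so the index is never half-updated.
func applyHeadChange(db *sql.DB, revert, apply []*types.TipSet) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once Commit succeeds

    for _, ts := range revert {
        if _, err := tx.Exec("DELETE FROM messages WHERE tipset_cid = ?", ts.Key().String()); err != nil {
            return err
        }
    }
    for _, ts := range apply {
        _ = ts // insert one (cid, tipset_cid, epoch) row per message in ts (elided)
    }
    return tx.Commit()
}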

@vyzo marked this pull request as ready for review March 12, 2023 13:36
@vyzo requested a review from a team as a code owner March 12, 2023 13:36
@vyzo changed the title from [WIP] Message Index to feat:chain: Message Index Mar 12, 2023
@raulk added this to the Eth RPC perf + fixes milestone Mar 12, 2023
@raulk (Member) left a comment

Going through it, expect more comments shortly.

}
}

ctx, cancel := context.WithCancel(context.Background())
Member:

This isn't the right context to use. Let's make the constructor accept a context, like the Mpool, and feed the lifecycle context during dependency injection.

Contributor (Author):

I disagree on this. This is a background component; it is certainly the right context.

Passing a lifecycle context is an antipattern, and we are doing it wrong.
I am sure @Stebalien has strong opinions here.

Member:

We shouldn't be using contexts for lifecycle management, agreed, but that's Lotus' reality today and we should just adhere to it. If we want to move away from that pattern, it should be done as part of a larger refactor, not in a piecemeal way.

Contributor (Author):

I will invoke the @Stebalien shell here. Codewise it is easy to hang this context from the lifecycle context.

Contributor:

The lifecycle context should be fine to use. If it is created before the OnClose hook is set, it is guaranteed to be cancelled after Close is called - which is the whole point of the lifecycle system. Lifecycle contexts are somewhat of a hack, but in most cases they just make cleanup of things slightly more robust.

The other thing that context is used for is metrics / tracing, and in this specific case I believe there actually are metrics which will be dropped due to this background context not being wired to the main MetricsCtx.
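
For reference, a minimal sketch of the wiring being discussed, assuming the helpers.MetricsCtx / LifecycleCtx pattern Lotus uses elsewhere in node/modules:

package modules

import (
    "context"

    "go.uber.org/fx"

    "github.com/filecoin-project/lotus/node/modules/helpers"
)

// Instead of context.Background(), derive the background context from the
// injected metrics context and the fx lifecycle: it then carries the metrics
// and tracing wiring, and is cancelled when the node shuts down.
func backgroundCtx(mctx helpers.MetricsCtx, lc fx.Lifecycle) context.Context {
    return helpers.LifecycleCtx(mctx, lc)
}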

Contributor (Author):

well ok, i will wire it then.

Contributor (Author):

done.

@raulk (Member) left a comment

More comments, let's reuse the event observer logic by extracting it first to a common component.

for {
    select {
    case <-x.sema:
        err := x.processHeadChanges(ctx)
Member:

Unfortunately this logic is not as straightforward as it appears here. This index only cares about executed tipsets, which means that we need to consume the parent of each head, but only when the head has actually been executed itself. Sadly this logic consumes speculative tipsets while they are being formed, which is not what we want.

In other words, we should track tipsets with a confidence of 2, i.e. once the execution tipset has been committed to chain. Then we can add the messages from the execution tipset, which is head.height - 2 at minimum, or more if there are null rounds.

inclusion ts <--- execution ts <--- head

Also, we should extract and reuse the event observer which already tracks tipsets, performs reverts, and does reconciliation by replaying a stream of reverts and applies for a particular chain path.

That component is actually not interdependent with the event index, so it can be extracted easily!

Contributor (Author):

For readers of this PR: see the comments below; we are going with the inclusion tipset.

@vyzo commented Mar 12, 2023

We discussed sync, and we think that:

  1. It is ok to use the inclusion tipset.
  2. We can drop the execution index.

In this light, we probably need to do very little.

We can even keep the coalescer instead of the event observer if that gets tricky to integrate. Also note that confidence can be implemented at read time as well, by comparing inclusion height with the current head.
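
For example, a read-time confidence check could look roughly like this (a sketch, not part of the PR):

package index

import (
    "github.com/filecoin-project/go-state-types/abi"

    "github.com/filecoin-project/lotus/chain/store"
)

// haveConfidence reports whether a message included at epoch is buried at
// least confidence epochs below the current head, and is thus safe to serve.
func haveConfidence(cs *store.ChainStore, epoch, confidence abi.ChainEpoch) bool {
    return cs.GetHeaviestTipSet().Height()-epoch >= confidence
}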

@raulk (Member) left a comment

Just had a chat with @vyzo. While having the execution tipset and the execution index would be ideal, it'll be easier to get this merged if we stick to what the discussion in #10326 ends up suggesting. So:

  1. Let's store the inclusion tipset, which is what this logic is actually storing now (even though the field is incorrectly named as execution tipset -- fix that).
  2. Let's remove the execution index.
  3. Reuse the events observer, extracting it from the chain/events package since it's not coupled to events.
  4. Set a confidence of 1 (see how the headEventsObserver does it). Not 2 since we're tracking the inclusion tipset and not the execution tipset now.
  5. Align the code with the existing event index to avoid code entropy.

TODOs in addition to the above

(perhaps incomplete, @vyzo please add if I omitted something)

  • Add itests.
  • Add a backfill command in lotus-shed.
  • Make sure we're using the right sqlite pragmas to: (1) fsync on transaction commit, and (2) allow opening the sqlite db from multiple processes (so that the backfill script can operate while the node is live); see the sketch after this list.
  • Populate the index on snapshot import (run a synthetic reconciliation?)
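
For the pragmas item, one plausible setup (illustrative; the exact pragmas are left to the implementation):

package index

import "database/sql"

// applyPragmas sketches the two requirements above: WAL mode lets a second
// process (the backfill script) open the db while the node is live, and
// synchronous=FULL makes sqlite fsync the WAL on every transaction commit.
func applyPragmas(db *sql.DB) error {
    for _, pragma := range []string{
        "PRAGMA journal_mode = WAL",
        "PRAGMA synchronous = FULL",
    } {
        if _, err := db.Exec(pragma); err != nil {
            return err
        }
    }
    return nil
}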

@arajasek (Contributor) left a comment

Thanks for the PR! This generally looks good. I have a few open questions:

  • Did you try / can we try having the ChainStore own this index? I think it belongs there more, "searching back for a message" is very much a chain operation (even if its caller is currently the stmgr).
  • As written, we can only backfill manually. That means if your node is down for 30 mins and then rebooted, you'll be missing index entries for that period. Are we okay with that?
  • We currently only reconcile on startup. Did you consider triggering the reconciliation process internally if the indexer ends up in some wonky state (e.g. errors while processing a head change)? It also might be useful for callers of the index to be able to "tell" the indexer that it's inconsistent and trigger a reconciliation (e.g. if for some reason we didn't actually find the message in the returned tipset).
  • On the flip side: as written with the current user of this index, we only need the tipset CID in the MsgInfo struct for reconciliation, a process we don't expect to need -- we don't actually need it for the lookback, because we load the "execution" tipset and pull the info out of it anyway. This means that we could drop the size of this db by ~half (one of the 2 CIDs) and have this just be a (msgCid, height) mapping. In essence we can get a much smaller database footprint (50%) at the cost of much slower / worse reconciliation (either manually inspecting every record, or just dropping any records that callers report as "bad").
  • I have pretty high confidence in the correctness of this thanks to how simple it is. Even so, could we consider adding a "paranoid" mode, that uses the cache, then does the lookback anyway, and screams if results don't match? This can be run by us, as well as any integration partners with high security standards for this portion of the codebase (we have some such partners).
  • I think it'll be the case that if this is slow, it'll slow down overall sync. Is that so, and if so can we maybe add some timing metrics?
  • I'm strongly in favour of having a single table with the minor duplication here 👍


db, err := sql.Open("sqlite3", dbPath)
if err != nil {
    // TODO [nice to have]: automatically delete corrupt databases
Contributor:

Can we capture this (and other TODOs) in an Enhancements issue in Lotus, please? (But feel free to leave the TODOs in code too.)

Contributor (Author):

yes, of course.

Part of making the TODOs was to discuss them and see what we need to do in the initial MVP; the rest can go into a follow-up issue.

return xerrors.Errorf("error preparing statement: %w", err)
}

curTs := cs.GetHeaviestTipSet()
Member:

This is racy unless we can guarantee that the chainstore hasn't been "started" yet. Is that the case @arajasek?

Otherwise, we need a way to atomically get the head and subscribe at the same time. Alternatively, we can defer this reconcile step until we get the first event from the chainstore and reconcile then.

Member:

Actually, that second part is kind of nice because it avoids blocking on start.

Contributor (Author):

Yeah, but I am not convinced it is a problem -- let's wait for @arajasek to chime in here.

Contributor:

Yeah, I don't think this is a problem -- the chainstore shouldn't have "started", and even if it had, I don't see a problem if this races.

I do agree with @Stebalien that reconciling on the first event would be nicer, but it's non-blocking IMO.
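
A rough sketch of that reconcile-on-first-event variant (hypothetical names; a sync.Once guard is one way to do it):

package index

import (
    "context"
    "sync"

    "github.com/filecoin-project/lotus/chain/store"
    "github.com/filecoin-project/lotus/chain/types"
)

type msgIndex struct {
    cs            *store.ChainStore
    reconcileOnce sync.Once // guards the lazy, first-event reconciliation
}

// onHeadChange defers reconciliation until the first head change notification,
// so startup never blocks and there is no get-head/subscribe race to worry about.
func (x *msgIndex) onHeadChange(ctx context.Context, rev, app []*types.TipSet) error {
    var err error
    x.reconcileOnce.Do(func() {
        err = x.reconcileWithChain(ctx, x.cs.GetHeaviestTipSet())
    })
    if err != nil {
        return err
    }
    return x.processHeadChange(ctx, rev, app)
}

// elided: the hypothetical reconciliation and indexing steps.
func (x *msgIndex) reconcileWithChain(ctx context.Context, head *types.TipSet) error { return nil }
func (x *msgIndex) processHeadChange(ctx context.Context, rev, app []*types.TipSet) error { return nil }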

msgIndex := &msgIndex{
    db:   db,
    cs:   cs,
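    // buffered flag channel: a send marks pending head changes for the worker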
    sema: make(chan struct{}, 1),
Member:

ultra-nit: this is a flag, not a semaphore.

Contributor (Author):

ha! I can rename if you insist, you are technically right.

@vyzo commented Mar 14, 2023

@arajasek thanks for the review, these are some great questions.

Did you try / can we try having the ChainStore own this index? I think it belongs there more, "searching back for a message" is very much a chain operation (even if its caller is currently the stmgr).

I have considered this and I think it has great potential.
If we had the index owned by the ChainStore a whole lot of things could be significantly improved in terms of performance.

Some very basic things like GetTipsetByHeight, GetMessagesForTipset, etc can go through the index.

The main reason I didn't do it for the MVP is that it is a much more invasive change in the most consensus-critical part of the system. So a different standard of applicability and testing must be adhered to there, and I'd rather do it in a follow-up.

The main question is whether it needs to become synchronous in terms of head change notification processing, in order for the ChainStore to keep the authoritative-source-of-truth invariant. We discussed this with @Stebalien and we think it doesn't have to: we can check whether the current tipset (or the one we are based on in general) is in the index, and in that case proceed and use it -- otherwise fall back to what we already have. We should probably tweak the coalescer delay down a bit too, and generally experiment with it. Note that the reason for not wanting synchronous operation is that we would thrash the database with writes/deletes while the tipset is building, which we want to avoid for obvious reasons (perf, durability of SSDs, and so on).
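
A rough sketch of that fallback check (hypothetical names and interface; none of this is in the PR):

package stmgr

import (
    "context"

    "github.com/ipfs/go-cid"

    "github.com/filecoin-project/lotus/chain/types"
)

// MsgIndex is the subset of the index we would need here (hypothetical).
type MsgIndex interface {
    HasTipset(ctx context.Context, tsk types.TipSetKey) (bool, error)
}

// searchMsg consults the index only when it has already caught up to the
// tipset we are based on; otherwise it falls back to the existing chain walk,
// so the index never needs to process head changes synchronously.
func searchMsg(ctx context.Context, idx MsgIndex, ts *types.TipSet, mcid cid.Cid,
    viaIndex, viaWalk func(context.Context, cid.Cid) (*types.TipSet, error)) (*types.TipSet, error) {
    if ok, err := idx.HasTipset(ctx, ts.Key()); err == nil && ok {
        return viaIndex(ctx, mcid) // fast path through the index
    }
    return viaWalk(ctx, mcid) // existing slow path
}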

At any rate, I think that having the index owned by the ChainStore is a natural progression from the MVP and a very important follow-up for subsequent PRs.

As written, we can only backfill manually. That means if your node is down for 30 mins and then rebooted, you'll be missing index entries for that period. Are we okay with that?

I think we are ok for MVP, but we do need to do something about it. The first thing to do (as discussed with @raulk) is to fill the index on snapshot import, and that should probably cover us. We can also consider backfilling when first creating the index as well, but I think snapshot import is a better place to handle this.

At any rate, it is part of the follow-up plan to address this.

We currently only reconcile on startup. Did you consider triggering the reconciliation process internally if the indexer ends up in some wonky state (eg. errors while processing head change). It also might be useful for callers of the index to be able to "tell" the indexer that it's inconsistent, and trigger a reconciliation (eg. if for some reason we didn't actually find the message in the returned tipset)

Yeah, that's a good point. The problem is the synchronous nature of it, hence startup seemed like the logical place to do it. We should definitely consider expanding the adaptive capabilities in a follow-up, but I am not convinced it is necessary.

On the flip side: as written with the current user of this index, we only need the tipset CID in the MsgInfo struct for reconciliation, a process we don't expect to need -- we don't actually need it for the lookback, because we load the "execution" tipset and pull the info out of it anyway. This means that we could drop the size of this db by ~half (one of the 2 CIDs) and have this just be a (msgCid, height) mapping. In essence we can get a much smaller database footprint (50%) at the cost of much slower / worse reconciliation (either manually inspecting every record, or just dropping any records that callers report as "bad").

I don't think we should do that; having the tipset cid in the index opens up great opportunities. See the notes above about ChainStore ownership, but at the very least we can return the execution tipset this way (in addition to the inclusion tipset) on queries if we have it. I left a TODO for an optimization follow-up in the search implementation.

I have pretty high confidence in the correctness of this thanks to how simple it is. Even so, could we consider adding a "paranoid" mode, that uses the cache, then does the lookback anyway, and screams if results don't match? This can be run by us, as well as any integration partners with high security standards for this portion of the codebase (we have some such partners).

That's probably a good idea, and it will definitely help us build confidence.

I think it'll be the case that if this is slow, it'll slow down overall sync. Is that so, and if so can we maybe add some timing metrics?

I don't think it will be slow (sqlite is an amazing piece of software), but you are absolutely right that we should have some timing measurements. I will add some logging in that regard.

I'm strongly in favour of having a single table with the minor duplication here +1

yeah, it makes things so much simpler! And it is not that big of a waste, that 2x is not that bad.

@vyzo commented Mar 16, 2023

Updates:

  • lotus-shed now expands the repo dir to avoid the tilde footgun
  • added an ON CONFLICT clause in the table for the message cid, to avoid conflicts from stuff inserted manually with shed
  • addressed @arajasek review comments.

@arajasek commented Mar 20, 2023

I think this is good to land, but I would like to see the "followup" plan laid out soon. Can you quickly write that up / open some issues, @vyzo? Stuff that we've discussed (in decreasing order of importance) includes:

  • PR filling the index on snapshot import (basically a must-have)
  • issue describing moving the index into chainstore (and challenges / open qs described above)
  • issues for the TODOs in code (happy to have the TODOs too, but we need issues to track them)
  • issue / PR for "paranoid" mode

(and I'm sure you have more next steps ideas in your 🧠)

@vyzo commented Mar 20, 2023

yessir, i shall do.

@jennijuju (Member):

merge block: can we have some user documentation in the PR description? i.e.: what kind of configuration exists? any tooling to manage the db?

@vyzo commented Mar 21, 2023

merge block: can we have some user documentation in the PR description? i.e.: what kind of configuration exists? any tooling to manage the db?

@jennijuju done, updated the PR description.

@vyzo mentioned this pull request Mar 22, 2023
@vyzo commented Mar 22, 2023

@arajasek follow-up issue: #10536
