
Use Raft index to determine if database needs auto-backup #1594

Closed

otoolep opened this issue Jan 13, 2024 · 11 comments · Fixed by #1599 or #1600

Comments

@otoolep
Member

otoolep commented Jan 13, 2024

Started with #1585

@jtackaberry -- thinking more about your suggestion. The initial upload check is difficult to line up with the Raft log. It could be done, but would involve a bunch of code throughout the system. But I do think it would be worth reducing the initial check to just sum generation, and not dump-and-sum. This can be done easily enough, at the cost of a read (but that's faster than reading and writing). If the sum differs then just dump.
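
Roughly, the check-before-dump flow might look like this (a minimal sketch with placeholder names -- `lastUploaded` and the file path are illustrative, not rqlite code):

```go
package main

import (
	"bytes"
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

// fileSum checksums the SQLite file in a single read-only pass.
func fileSum(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

func main() {
	var lastUploaded []byte // hypothetical: sum recorded at the last upload
	sum, err := fileSum("db.sqlite")
	if err != nil {
		panic(err)
	}
	if bytes.Equal(sum, lastUploaded) {
		fmt.Println("unchanged -- skip the dump-and-upload")
		return
	}
	fmt.Println("sum differs -- dump and upload")
}
```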

I've also thought about changing to MD5; SHA-256 is probably too much -- and slower.

@jtackaberry
Contributor

This can be done easily enough, at the cost of a read (but that's faster than reading and writing). If the sum differs then just dump.

It's unfortunate there's no way to get this from the database file itself. I got a bit excited when I skimmed over pragma data_version but, no, that's not suitable.

What will be read? The underlying sqlite file or querying the data via the sqlite API? If the former, is it reliable that the sqlite file will be bit-identical across nodes (once the WAL is checkpointed obviously)?

In any case, I agree the read-only check first should be significantly faster than a dump-and-sum and it's worth avoiding that when it's unnecessary for larger datasets (especially those that don't change much).

I've also thought about changing to MD5; SHA-256 is probably too much -- and slower.

If you're looking to change to md5 for performance reasons, you might look at hash/fnv instead. FNV-1a should be a bit faster still than MD5.
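
For a rough sense of the difference, a quick-and-dirty timing sketch (a zero-filled buffer as a stand-in workload, not a proper benchmark):

```go
package main

import (
	"crypto/md5"
	"fmt"
	"hash"
	"hash/fnv"
	"time"
)

// timeHash reports how long h takes to digest buf once.
func timeHash(name string, h hash.Hash, buf []byte) {
	start := time.Now()
	h.Write(buf)
	h.Sum(nil)
	fmt.Printf("%-7s %v\n", name, time.Since(start))
}

func main() {
	buf := make([]byte, 256<<20) // 256 MiB stand-in for a database file
	timeHash("md5", md5.New(), buf)
	timeHash("fnv-1a", fnv.New64a(), buf)
}
```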

BTW, where does compression happen on upload nowadays? gzip logic was removed from uploader.go in aa2cf2d but it's not clear how compression works now.

@otoolep
Member Author

otoolep commented Jan 13, 2024

It's unfortunate there's no way to get this from the database file itself. I got a bit excited when I skimmed over pragma data_version but, no, that's not suitable.

No, in principle it would have to come from the Raft log, since its indices are the only shared authoritative knowledge across the cluster. You'd need to know which Raft log index was associated with a specific copy of the database, and upload that index to S3 along with the data. But threading that information through to the uploader might be complicated. Furthermore, not every index in the Raft log represents an actual change to the database (a STRONG read, for example). So next you have to weed out the non-database-affecting log entries. It starts to get complicated.

What will be read? The underlying sqlite file or querying the data via the sqlite API? If the former, is it reliable that the sqlite file will be bit-identical across nodes (once the WAL is checkpointed obviously)?

Actually, that's a good point. Not necessarily, no, not bit-for-bit. The logical contents of the database will be the same, but checkpointing can take place at a different point on each node. Hmm, maybe my idea wouldn't work after all. Even my current optimization probably doesn't work as well as I hoped. Good catch. We might be stuck with a dump-and-sum after a Leader change. Well, at least it's robust.

BTW, where does compression happen on upload nowadays? gzip logic was removed from uploader.go in aa2cf2d but it's not clear how compression works now.

Here: https://github.com/rqlite/rqlite/blob/v8.16.4/store/store.go#L1250 (the Backup() method is called by the Provider).
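
Roughly speaking it's compress-on-write. A minimal illustration using the standard library (not the actual Backup() code):

```go
package main

import (
	"compress/gzip"
	"io"
	"os"
)

// compressTo streams src through gzip into dst.
func compressTo(dst io.Writer, src io.Reader) error {
	gw := gzip.NewWriter(dst)
	if _, err := io.Copy(gw, src); err != nil {
		gw.Close()
		return err
	}
	return gw.Close() // flush and finalize the gzip stream
}

func main() {
	src, err := os.Open("backup.db")
	if err != nil {
		panic(err)
	}
	defer src.Close()
	dst, err := os.Create("backup.db.gz")
	if err != nil {
		panic(err)
	}
	defer dst.Close()
	if err := compressTo(dst, src); err != nil {
		panic(err)
	}
}
```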

@otoolep
Member Author

otoolep commented Jan 13, 2024

We might be stuck with a dump-and-sum after a Leader change. Well, at least it's robust.

And even that wouldn't help. While the database may be logically the same, the underlying file may not be bit-for-bit identical. I reckon we will still get a fair number of unnecessary uploads for the reasons you state. Hmmm, I'll need to think about this, because you are right that the only bulletproof logic is "the database data in the Cloud is associated with this Raft log index, and if there are committed log entries later than that index which actually change the local database, an upload is needed".

It could be done, but I need to think about whether it's worth the effort, relative to everything else I could work on in rqlite right now... hmmm.

@otoolep changed the title from "Only do checksum check (not dump) when checking for initial upload" to "Use Raft index to determine if database needs auto-backup" on Jan 13, 2024
@otoolep
Member Author

otoolep commented Jan 13, 2024

Changing the title to reflect the only bullet-proof fix to this issue. It might go something like this (a rough sketch follows the list):

  • The system (store) would need to track which Raft index last resulted in an actual change to the database. Call that index LX.
  • When the Store provides a copy of the database to the Uploader, it would also provide LX.
  • The Uploader would store LX in the metadata it uploads along with the object.
  • One of the conditions for upload would then become "has there been an entry committed to the Raft log that is later than LX, and is of a type which changes the database? If so, upload required."
  • When a node wants to learn LX, it fetches it from the storage system. From then on it keeps a local copy of the value.
  • When a node becomes Leader, it needs to zero out any previous value it had for LX, as it may be invalid.
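
A rough sketch of the condition in the fourth bullet (hypothetical types and names, not rqlite's actual API):

```go
package main

import "fmt"

// logEntry is a stand-in for a committed Raft log entry.
type logEntry struct {
	Index     uint64
	MutatesDB bool // e.g. an INSERT, but not a STRONG read or a no-op
}

// uploadNeeded reports whether any committed entry after lx changed the database.
func uploadNeeded(lx uint64, committed []logEntry) bool {
	for _, e := range committed {
		if e.Index > lx && e.MutatesDB {
			return true
		}
	}
	return false
}

func main() {
	lx := uint64(100) // LX, fetched from the cloud storage metadata
	log := []logEntry{
		{Index: 101, MutatesDB: false}, // a STRONG read
		{Index: 102, MutatesDB: true},  // an actual write
	}
	fmt.Println("upload needed:", uploadNeeded(lx, log)) // true
}
```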

@otoolep
Member Author

otoolep commented Jan 13, 2024

Actually, this mightn't be as change-intensive as I thought. Playing around with an idea here: #1596

@jtackaberry
Contributor

I find all this extremely relatable: you have some random idea, you're not entirely sure it's worth the effort in practice, but some aspect of it begins to pique your technical curiosity; you dwell on it more and develop clarity on how to do it, see that from a technical-purity perspective it's clearly correct while still being unsure whether it's worth doing in practice, yet ultimately you feel compelled to do it because it's The Right Way.

Or at least that's my interpretation of the above. Possibly I'm just projecting. :)

@otoolep
Member Author

otoolep commented Jan 13, 2024

Or at least that's my interpretation of the above. Possibly I'm just projecting. :)

Yes, that tends to be how it goes! Thanks for pushing on this, I believe it will be better in the end.

@otoolep otoolep linked a pull request Jan 13, 2024 that will close this issue
@otoolep
Member Author

otoolep commented Jan 13, 2024

#1599 adds an end-to-end test to check that a Leader change doesn't result in an upload. It may actually pass because the SQLite file under the node that becomes Leader is bit-for-bit the same. In a case as simple as this that's not surprising. In fact, from looking at the SQLite file format doc, it may be possible to prove that the files will always be bit-for-bit identical after the WAL has been checkpointed (which is what happens just before a backup). I'm not sure.

However, I plan to add the Raft-index-based check anyway, since that should be bullet-proof. For testing purposes I can force the SQLite file to be different -- so the node fails the hash check -- by modifying it directly.
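
For illustration, the perturbation could be as simple as flipping a byte past the SQLite header (the path and offset here are arbitrary):

```go
package main

import "os"

// perturbFile inverts one byte of the file in place, so a subsequent
// hash check sees different contents. Test-only; offset 1024 is safely
// past the 100-byte SQLite header.
func perturbFile(path string) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	buf := make([]byte, 1)
	if _, err := f.ReadAt(buf, 1024); err != nil {
		return err
	}
	buf[0] ^= 0xFF // invert one byte
	_, err = f.WriteAt(buf, 1024)
	return err
}

func main() {
	if err := perturbFile("test.db"); err != nil {
		panic(err)
	}
}
```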

@otoolep
Member Author

otoolep commented Jan 13, 2024

This is not actually done yet.

@otoolep otoolep reopened this Jan 13, 2024
@otoolep otoolep linked a pull request Jan 13, 2024 that will close this issue
@otoolep
Member Author

otoolep commented Jan 14, 2024

All done, thanks again @jtackaberry -- this should be more solid, and it actually turned out more elegant in the end. There may still be the odd case where an upload happens without being strictly necessary, but I think those will be few and far between.
