
Use Raft index to determine if database needs auto-backup #1594

Closed

otoolep opened this issue Jan 13, 2024 · 11 comments · Fixed by #1599 or #1600

Comments

@otoolep
Member

otoolep commented Jan 13, 2024

Started with #1585

@jtackaberry -- thinking more about your suggestion. The initial upload check is difficult to line up with the Raft log. It could be done, but would involve a bunch of code throughout the system. But I do think it would be worth reducing the initial check to just sum generation, and not dump-and-sum. This can be done easily enough, at the cost of a read (but that's faster than reading and writing). If the sum differs then just dump.
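
Roughly, the check-before-dump flow might look like this (a minimal sketch with placeholder names -- `lastUploaded` and the file path are illustrative, not rqlite code):

```go
package main

import (
	"bytes"
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

// fileSum checksums the SQLite file in a single read-only pass.
func fileSum(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

func main() {
	var lastUploaded []byte // hypothetical: sum recorded at the last upload
	sum, err := fileSum("db.sqlite")
	if err != nil {
		panic(err)
	}
	if bytes.Equal(sum, lastUploaded) {
		fmt.Println("unchanged -- skip the dump-and-upload")
		return
	}
	fmt.Println("sum differs -- dump and upload")
}
```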

I've also thought about changing to MD5; SHA-256 is probably too much -- and slower.

@jtackaberry
Contributor

This can be done easily enough, at the cost of a read (but that's faster than reading and writing). If the sum differs then just dump.

It's unfortunate there's no way to get this from the database file itself. I got a bit excited when I skimmed over pragma data_version but, no, that's not suitable.

What will be read? The underlying sqlite file or querying the data via the sqlite API? If the former, is it reliable that the sqlite file will be bit-identical across nodes (once the WAL is checkpointed obviously)?

In any case, I agree the read-only check first should be significantly faster than a dump-and-sum and it's worth avoiding that when it's unnecessary for larger datasets (especially those that don't change much).

I've also thought about changing to MD5; SHA-256 is probably too much -- and slower.

If you're looking to change to md5 for performance reasons, you might look at hash/fnv instead. FNV-1a should be a bit faster still than MD5.
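
For a rough sense of the difference, a quick-and-dirty timing sketch (a zero-filled buffer as a stand-in workload, not a proper benchmark):

```go
package main

import (
	"crypto/md5"
	"fmt"
	"hash"
	"hash/fnv"
	"time"
)

// timeHash reports how long h takes to digest buf once.
func timeHash(name string, h hash.Hash, buf []byte) {
	start := time.Now()
	h.Write(buf)
	h.Sum(nil)
	fmt.Printf("%-7s %v\n", name, time.Since(start))
}

func main() {
	buf := make([]byte, 256<<20) // 256 MiB stand-in for a database file
	timeHash("md5", md5.New(), buf)
	timeHash("fnv-1a", fnv.New64a(), buf)
}
```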

BTW, where does compression happen on upload nowadays? gzip logic was removed from uploader.go in aa2cf2d but it's not clear how compression works now.

@otoolep
Member Author

otoolep commented Jan 13, 2024

It's unfortunate there's no way to get this from the database file itself. I got a bit excited when I skimmed over pragma data_version but, no, that's not suitable.

No, in principle it would have to come from the Raft log, since its indices are the only shared authoritative knowledge across the cluster. You'd need to know which Raft log index was associated with a specific copy of the database, and upload that index to S3 along with the data. But threading that information through to the uploader might be complicated. Furthermore, not every index in the Raft log represents an actual change to the database (a STRONG read, for example). So next you have to weed out the non-database-affecting log entries. It starts to get complicated.

What will be read? The underlying sqlite file or querying the data via the sqlite API? If the former, is it reliable that the sqlite file will be bit-identical across nodes (once the WAL is checkpointed obviously)?

Actually, that's a good point. Not necessarily, no, not bit-for-bit. The logical contents of the database will be the same, but checkpointing can take place at a different point on each node. Hmm, maybe my idea wouldn't work after all. Even my current optimization probably doesn't work as well as I hoped. Good catch. We might be stuck with a dump-and-sum after a Leader change. Well, at least it's robust.

BTW, where does compression happen on upload nowadays? gzip logic was removed from uploader.go in aa2cf2d but it's not clear how compression works now.

Here: https://github.com/rqlite/rqlite/blob/v8.16.4/store/store.go#L1250 (the Backup() method is called by the Provider).
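
Roughly speaking it's compress-on-write. A minimal illustration using the standard library (not the actual Backup() code):

```go
package main

import (
	"compress/gzip"
	"io"
	"os"
)

// compressTo streams src through gzip into dst.
func compressTo(dst io.Writer, src io.Reader) error {
	gw := gzip.NewWriter(dst)
	if _, err := io.Copy(gw, src); err != nil {
		gw.Close()
		return err
	}
	return gw.Close() // flush and finalize the gzip stream
}

func main() {
	src, err := os.Open("backup.db")
	if err != nil {
		panic(err)
	}
	defer src.Close()
	dst, err := os.Create("backup.db.gz")
	if err != nil {
		panic(err)
	}
	defer dst.Close()
	if err := compressTo(dst, src); err != nil {
		panic(err)
	}
}
```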

@otoolep
Member Author

otoolep commented Jan 13, 2024

We might be stuck with a dump-and-sum after a Leader change. Well, at least it's robust.

And even that wouldn't help. While the database may be logically the same, the underlying file may not be bit-for-bit identical. I reckon we will still get a fair number of unnecessary uploads for the reasons you state. Hmmm, I'll need to think about this, because you are right that the only bulletproof logic is "the database data in the Cloud is associated with this Raft log index, and if there are committed log entries later than that index which actually change the local database, an upload is needed".

It could be done, but I need to think about whether it's worth the effort, relative to everything else I could work on in rqlite right now... hmmm.

@otoolep changed the title from "Only do checksum check (not dump) when checking for initial upload" to "Use Raft index to determine if database needs auto-backup" on Jan 13, 2024
@otoolep
Member Author

otoolep commented Jan 13, 2024

Changing the title to reflect the only bullet-proof fix to this issue. It might go something like this (a rough sketch follows the list):

  • The system (store) would need to track which Raft index last resulted in an actual change to the database. Call that index LX.
  • When the Store provides a copy of the database to the Uploader, it would also provide LX.
  • The Uploader would store LX in the metadata it uploads along with the object.
  • One of the conditions for upload would then become "has there been an entry committed to the Raft log that is later than LX, and is of a type which changes the database? If so, upload required."
  • When a node wants to learn LX, it fetches it from the storage system. From then on it keeps a local copy of the value.
  • When a node becomes Leader, it needs to zero out any previous value it had for LX, as it may be invalid.
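
A rough sketch of the condition in the fourth bullet (hypothetical types and names, not rqlite's actual API):

```go
package main

import "fmt"

// logEntry is a stand-in for a committed Raft log entry.
type logEntry struct {
	Index     uint64
	MutatesDB bool // e.g. an INSERT, but not a STRONG read or a no-op
}

// uploadNeeded reports whether any committed entry after lx changed the database.
func uploadNeeded(lx uint64, committed []logEntry) bool {
	for _, e := range committed {
		if e.Index > lx && e.MutatesDB {
			return true
		}
	}
	return false
}

func main() {
	lx := uint64(100) // LX, fetched from the cloud storage metadata
	log := []logEntry{
		{Index: 101, MutatesDB: false}, // a STRONG read
		{Index: 102, MutatesDB: true},  // an actual write
	}
	fmt.Println("upload needed:", uploadNeeded(lx, log)) // true
}
```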

@otoolep
Member Author

otoolep commented Jan 13, 2024

Actually, this mightn't be as change-intensive as I thought. Playing around with an idea here: #1596

@jtackaberry
Contributor

I find all this extremely relatable: you have some random idea, you're not entirely sure it's worth the effort in practice, but some aspect of it begins to pique your technical curiosity; you dwell on it more and develop clarity on how to do it, see that from a technical-purity perspective it's clearly correct while still being unsure whether it's worth doing in practice, yet ultimately you feel compelled to do it because it's The Right Way.

Or at least that's my interpretation of the above. Possibly I'm just projecting. :)

@otoolep
Member Author

otoolep commented Jan 13, 2024

Or at least that's my interpretation of the above. Possibly I'm just projecting. :)

Yes, that tends to be how it goes! Thanks for pushing on this, I believe it will be better in the end.

@otoolep otoolep linked a pull request Jan 13, 2024 that will close this issue
@otoolep
Member Author

otoolep commented Jan 13, 2024

#1599 adds an end-to-end test to check that a Leader change doesn't result in an upload. It may actually pass because the SQLite file under the node that becomes Leader is bit-for-bit the same. In a case as simple as this that's not surprising. In fact, from looking at the SQLite file format doc, it may be possible to prove that the files will always be bit-for-bit identical after the WAL has been checkpointed (which is what happens just before a backup). I'm not sure.

However, I plan to add the Raft-index-based check anyway, since that should be bullet-proof. For testing purposes I can force the SQLite file to be different -- so the node fails the hash check -- by modifying it directly.
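
For illustration, the perturbation could be as simple as flipping a byte past the SQLite header (the path and offset here are arbitrary):

```go
package main

import "os"

// perturbFile inverts one byte of the file in place, so a subsequent
// hash check sees different contents. Test-only; offset 1024 is safely
// past the 100-byte SQLite header.
func perturbFile(path string) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	buf := make([]byte, 1)
	if _, err := f.ReadAt(buf, 1024); err != nil {
		return err
	}
	buf[0] ^= 0xFF // invert one byte
	_, err = f.WriteAt(buf, 1024)
	return err
}

func main() {
	if err := perturbFile("test.db"); err != nil {
		panic(err)
	}
}
```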

@otoolep
Member Author

otoolep commented Jan 13, 2024

This is not actually done yet.

@otoolep otoolep reopened this Jan 13, 2024
@otoolep otoolep linked a pull request Jan 13, 2024 that will close this issue
@otoolep
Member Author

otoolep commented Jan 14, 2024

All done, thanks again @jtackaberry -- this should be more solid, and it actually turned out more elegant in the end. There may still be the odd case where an upload happens without being strictly necessary, but I think those will be few and far between.
