Use Raft index to determine if database needs auto-backup #1594
It's unfortunate there's no way to get this from the database file itself. I got a bit excited when I skimmed over it.

What will be read? The underlying SQLite file, or will the data be queried via the SQLite API? If the former, is it reliable that the SQLite file will be bit-identical across nodes (once the WAL is checkpointed, obviously)? In any case, I agree that doing the read-only check first should be significantly faster than a dump-and-sum, and it's worth avoiding the dump when it's unnecessary for larger datasets (especially those that don't change much).
If you're looking to change to MD5 for performance reasons, you might look at hash/fnv instead; FNV-1a should be a bit faster still than MD5. BTW, where does compression happen on upload nowadays? The gzip logic was removed from uploader.go in aa2cf2d, but it's not clear how compression works now.
No, in principle it would have to come from a read of the Raft log, since its indices are the only shared, authoritative knowledge across the cluster. You'd need to know which Raft log index was associated with a specific copy of the database, and upload that index with the data to S3. But threading that info through to the uploader might be complicated. Furthermore, not every index in the Raft log represents an actual change to the database (a STRONG read, for example). So next you'd have to weed out the non-database-affecting log entries. It starts to get complicated.
Actually, that's a good point. Not necessarily, no, not bit-for-bit. The logical contents of the database will be the same, but checkpointing can take place at a different point on each node. Hmm, maybe my idea wouldn't work after all. Even my current optimization probably doesn't work as well as I hoped. Good catch. We might be stuck with a dump-and-sum after a Leader change. Well, at least it's robust.
Here: https://github.com/rqlite/rqlite/blob/v8.16.4/store/store.go#L1250 (the Backup() method is called by the Provider).
And even that wouldn't help. While the database may logically be the same, the underlying file may not be bit-for-bit identical. I reckon we will still get a fair number of unnecessary uploads, for the reasons you state. Hmmm, I'll need to think about this, because you are right that the only bulletproof logic is: "this database data in the Cloud is associated with this Raft log index, and if there are committed log entries later than that index which actually change the local database, an upload is needed". It could be done, but I need to think about whether it's worth the effort, relative to everything else I could work on in rqlite right now... hmmm.
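That bulletproof rule can be sketched in a few lines. This is a hypothetical illustration, not rqlite's actual implementation -- `LogEntry` and `needsUpload` are invented names, and deciding whether an entry mutates the database is the hard part glossed over here:

```go
package main

import "fmt"

// LogEntry is a hypothetical view of a committed Raft entry: its index,
// and whether applying it mutated the SQLite database. A STRONG read,
// for example, consumes an index but changes nothing.
type LogEntry struct {
	Index   uint64
	Mutates bool
}

// needsUpload applies the rule from the discussion above: the data in the
// cloud is associated with lastUploaded (a Raft index), and a new upload
// is needed only if some committed entry after that index changed the DB.
func needsUpload(lastUploaded uint64, committed []LogEntry) bool {
	for _, e := range committed {
		if e.Index > lastUploaded && e.Mutates {
			return true
		}
	}
	return false
}

func main() {
	log := []LogEntry{
		{Index: 10, Mutates: true},
		{Index: 11, Mutates: false}, // e.g. a STRONG read
		{Index: 12, Mutates: true},
	}
	fmt.Println(needsUpload(12, log)) // false: nothing newer mutated the DB
	fmt.Println(needsUpload(10, log)) // true: index 12 mutated it
}
```

Note this skips uploads even across a Leader change, since the decision depends only on cluster-wide Raft indices, not on the local file's bytes.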
Changing the title to reflect the only bulletproof fix for this issue.
Actually, this mightn't be as change-intensive as I thought. I'm playing around with an idea here: #1596
I find all this extremely relatable: have some random idea, be not entirely sure if it's worth the effort in practice, but some aspect of it begins to pique your technical curiosity; you begin dwelling on it more and develop clarity on how to do it; you see that from a technical-purity perspective it's clearly correct, but are still probably unsure whether it's worth doing in practice, yet are ultimately compelled to do it anyway because it's The Right Way. Or at least that's my interpretation of the above. Possibly I'm just projecting. :)
Yes, that tends to be how it goes! Thanks for pushing on this, I believe it will be better in the end.
#1599 adds an end-to-end test to check that a Leader change doesn't result in an upload. It may actually pass because the SQLite file under the node that becomes Leader actually is bit-for-bit the same. In a case as simple as this, that's not surprising. In fact, from looking at the SQLite file format doc, it may be possible to prove that the files will always be bit-for-bit the same after the WAL has been checkpointed (which is what happens just before a backup). I'm not sure. However, I plan to add the Raft-index-based check anyway, since that should be bulletproof. For the purposes of testing, I can force the SQLite file to be different, so the node fails the hash check, by modifying the file directly.
This is not actually done yet. |
All done, thanks again @jtackaberry -- this should be more solid, and actually was more elegant in the end. There may still be the odd case where an upload happens without being strictly necessary, but I think those will be few and far between. |
Started with #1585
@jtackaberry -- thinking more about your suggestion. The initial upload check is difficult to line up with the Raft log. It could be done, but would involve changes to a bunch of code throughout the system. But I do think it would be worth reducing the initial check to just sum generation, not dump-and-sum. That can be done easily enough, at the cost of a read (but a read is faster than a read plus a write). If the sum differs, then just dump.
I've also thought about changing to MD5; SHA-256 is probably too much -- and slower.