Immediate put version should be kept before full_history_ts_low to guarantee atomicity #9106
Comments
Could you elaborate on this? Currently, the purpose of |
To clarify, "transaction" here means a transaction layer built on top of RocksDB, not RocksDB's own transaction layer. To save space, applications that use MVCC usually keep only a partial history of each key. That's the exact purpose of
It doesn't work. The example given in the issue doesn't involve any ongoing transactions: all queries are performed after the updates have finished and compaction is done. |
Thanks @BusyJay for catching this. I don't think the current behavior is intended. It is a bug triggered when there are two different versions of the same user key, one larger than or equal to full_history_ts_low and the other smaller than full_history_ts_low. The correct behavior is to still keep the largest version below full_history_ts_low. Its timestamp may be zeroed out, though. |
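The retention rule described in this comment can be sketched as a small model. This is only an illustration in Python, not RocksDB's compaction code: the function name `versions_to_keep` is hypothetical, timestamps are modeled as plain integers, and the possible zeroing of the surviving timestamp is ignored.

```python
def versions_to_keep(timestamps, full_history_ts_low):
    """Model of the intended rule for ONE user key.

    Keeps every version at or above full_history_ts_low, plus the single
    largest version below it (if any). Returns the kept timestamps sorted
    in ascending order. Hypothetical helper, not a RocksDB API.
    """
    at_or_above = sorted(ts for ts in timestamps if ts >= full_history_ts_low)
    below = [ts for ts in timestamps if ts < full_history_ts_low]
    if below:
        # The largest version below the cutoff must survive so that a read
        # at full_history_ts_low still finds a value for this key.
        return [max(below)] + at_or_above
    return at_or_above
```

For a key with versions at timestamps 1, 2 and 4 and a cutoff of 3, this model keeps 2 and 4: version 2 is what a reader at timestamp 3 needs to see.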
I am going to do the fix in #9116. |
Closing since the fix is in master and will be available in next release. |
Expected behavior
One use of user timestamps is to provide transaction functionality. Writes from the same transaction usually carry the same user timestamp. Suppose there are four timestamps t1 < t2 < t3 < t4, and three transactions have been committed as follows:
And full_history_ts_low is set to t3. To satisfy atomicity, a read should see either all of a transaction's writes or none of them. So the following query results should be guaranteed:
Actual behavior
Because RocksDB deletes every version older than full_history_ts_low unless it is the key's largest version, query 1 will return only v2, and None for both k1 and k3. The result of query 2 is as expected.
Whether this behavior is correct depends on how we define full_history_ts_low. If full_history_ts_low is defined as the minimum readable version, then the current behavior is correct. But that makes full_history_ts_low useless for implementing correct transactions, and applications would need to develop their own GC algorithm. If it is defined as the minimum timestamp that guarantees ACID, then the current behavior is wrong: the most recent put version before full_history_ts_low should be kept. User timestamps would then be a drop-in design for common transaction timestamps.
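To make the two definitions concrete, here is a minimal Python model of a timestamped read before and after garbage collection. Everything in it is an assumption for illustration: the keys, values and timestamps are hypothetical (they are not the transactions from this issue), and `read_at`, `gc` and `snapshot` are made-up helpers, not RocksDB APIs. The `keep_newest_below=False` branch only approximates the behavior reported above.

```python
def read_at(versions, read_ts):
    """Return the value of the newest version with timestamp <= read_ts, or None."""
    visible = [ts for ts in versions if ts <= read_ts]
    return versions[max(visible)] if visible else None


def gc(versions, ts_low, keep_newest_below):
    """Drop history below ts_low for one key (simplified model).

    keep_newest_below=True : always keep the newest version below ts_low.
    keep_newest_below=False: keep a version below ts_low only when it is the
                             key's newest version overall (reported behavior).
    """
    kept = {ts: value for ts, value in versions.items() if ts >= ts_low}
    below = [ts for ts in versions if ts < ts_low]
    if below and (keep_newest_below or not kept):
        newest = max(below)
        kept[newest] = versions[newest]
    return kept


def snapshot(store, read_ts):
    """Read every key in the store at the same timestamp."""
    return {key: read_at(versions, read_ts) for key, versions in store.items()}


# Hypothetical history: key -> {timestamp: value}.
db = {
    "a": {1: "a1", 2: "a2", 4: "a4"},
    "b": {1: "b1"},
}
TS_LOW = 3

before = snapshot(db, TS_LOW)
reported = snapshot({k: gc(v, TS_LOW, False) for k, v in db.items()}, TS_LOW)
intended = snapshot({k: gc(v, TS_LOW, True) for k, v in db.items()}, TS_LOW)
```

In this model, a read at `TS_LOW` sees `a2` and `b1` before collection. Under the reported behavior, key `a` loses version 2 because a newer version at timestamp 4 exists, so the same read now finds nothing for `a` while `b` is unchanged. Under the intended rule, the read returns the same result as before collection.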