Update WriteBatch::AssignTimestamp API #9205
Conversation
This pull request was exported from Phabricator. Differential Revision: D31735186
Force-pushed from b7a3e74 to 4f3e6e7
Force-pushed from 4f3e6e7 to 3b35bde
Thanks for the patch @riversand963 !
Force-pushed from 3b35bde to a4d5506
Force-pushed from a4d5506 to 32b2d3b
Thanks @ltamasi for the review!
LGTM, I just have some minor comments. Thanks @riversand963 !
Force-pushed from 32b2d3b to 72b7c43
Force-pushed from 72b7c43 to ef24532
Summary: Pull Request resolved: facebook#9205

Update the WriteBatch::AssignTimestamp() APIs so that they take an additional argument: a function object called `checker` that encapsulates the user-specified logic for checking timestamp sizes.

WriteBatch is a building block used by multiple other RocksDB components, each of which may track timestamp information in different data structures. For example, a transaction can either write to `WriteBatchWithIndex`, which is a `WriteBatch` with an index, or write directly to a raw `WriteBatch` if `Transaction::DisableIndexing()` is called. `WriteBatchWithIndex` keeps a mapping from column family id to comparator, and the transaction needs to keep similar information for the `WriteBatch` if the user calls `Transaction::DisableIndexing()` (dynamically) so that we will know the size of each timestamp later. The bookkeeping info maintained by `WriteBatchWithIndex` and `Transaction` should not overlap. When we later call `WriteBatch::AssignTimestamp()`, we need to use these data structures to guarantee that we do not accidentally assign timestamps to keys from column families that disable timestamps.

Reviewed By: ltamasi

Differential Revision: D31735186

fbshipit-source-id: 7e8dd4af5dcb85e1368e98b10b222a241c20f91b
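The checker-driven assignment described above can be made concrete with a toy model. This is only a sketch, not the real RocksDB implementation (the actual API lives in rocksdb/write_batch.h); `Status`, `Entry`, and the in-memory batch below are simplified stand-ins:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for rocksdb::Status, for illustration only.
enum class Status { kOk, kInvalidArgument };

// Simplified write-batch entry: timestamp is empty until assigned.
struct Entry {
  uint32_t cf_id;
  std::string user_key;
  std::string ts;
};

// Toy version of the updated API: assign `ts` to every entry whose column
// family enables timestamps, using `checker` to learn (or validate) each
// column family's timestamp size.
template <typename Checker>
Status AssignTimestamp(std::vector<Entry>& batch, const std::string& ts,
                       Checker checker) {
  for (Entry& e : batch) {
    size_t ts_sz = ts.size();
    Status s = checker(e.cf_id, ts_sz);
    if (s != Status::kOk) return s;
    if (ts_sz == 0) continue;  // column family disables timestamps: skip
    e.ts = ts;
  }
  return Status::kOk;
}
```

In this sketch the checker reports the column family's timestamp size through `ts_sz`; a size of 0 means the column family disables timestamps and the entry is skipped, matching the behavior discussed later in this thread.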
Force-pushed from ef24532 to 48efc98
The Linter error is a warning about memcpy.
This seems like a very awkward and ad-hoc addition. Why not have the column families know their own timestamp size to do their own checking, rather than requiring every user to awkwardly opt-in to checking? Then any read or write op can use the same info to validate timestamp sizes.
// Experimental.
// Assign timestamp to write batch.
Do you mean overwrite the timestamp of all existing entries in the write batch?
Not all entries. If a key belongs to a column family that disables timestamps, this key will be skipped. Since all keys in the write batch that enable timestamps share the same timestamp, I think it's OK to say "assign a timestamp to the write batch".
Since all keys in the write batch, if enable timestamp, share the same timestamp
Where does this come from? The API documentation for WriteBatch::Put suggests you can assign a timestamp per key.
Regarding that: the API doc for Status Put(ColumnFamilyHandle* column_family, const Slice& key, const Slice& value) suggests it supports timestamps, for which I can find no test.
From the API:
"as long as key points to a contiguous buffer with timestamp appended after user key."
The way this is phrased suggests to me that space for the timestamp has to be part of the buffer pointed to by the Slice, but beyond the bounds (size) of the Slice. (To match DB::Put, the key Slice size does not include the timestamp.)
Also:
"as long as the timestamp is the last Slice in (SliceParts) key"
I don't see such a limitation in the implementation, and it seems to break the intended flexibility / encapsulation provided by SliceParts: the user of SliceParts should treat it logically as one contiguous data buffer. ("A set of Slices that are virtually concatenated together.")
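To make the buffer layout quoted above concrete, here is a small sketch. The `Slice` struct below is a minimal stand-in for rocksdb::Slice (the real class is in rocksdb/slice.h), shown only to illustrate the reading that the timestamp sits past the Slice's bounds:

```cpp
#include <cstddef>
#include <string>

// Minimal stand-in for rocksdb::Slice, for illustration only.
struct Slice {
  const char* data;
  size_t size;
};

// Lay out "user key, then timestamp" in one contiguous buffer. The returned
// Slice's size covers only the user key; the timestamp occupies the bytes
// immediately past the Slice's bounds, matching DB::Put's convention that
// the key Slice size excludes the timestamp.
inline Slice MakeKeyWithTrailingTimestamp(std::string& buf,
                                          const std::string& user_key,
                                          const std::string& ts) {
  buf = user_key + ts;  // contiguous: user key followed by timestamp
  return Slice{buf.data(), user_key.size()};
}
```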
We are talking about AssignTimestamp(const Slice& ts), not Put. The result of this API is that all keys, if timestamps are enabled, will have the same timestamp.
The result of this API is that all keys, if enable timestamp, will have the same timestamp.
What about keys added after AssignTimestamp? Your description doesn't make it clear. If the data model were more clear to the API user, this might be obvious, but the data model with timestamps is not clear from the API comments. The name "AssignTimestamp" alone (not to mention "Assign timestamp to write batch") suggests a data model in which a timestamp is tracked separately from individual entries to apply to them at Write time. I had to look at the implementations to understand the contract for AssignTimestamp(s) (and Put).
The keys that do not exist in the WB when calling the AssignTimestamp API will obviously not have their timestamp set to the value. We can make that clear in the API comments.
Since WriteBatch::Handler is public, users can always define their own handler, call Iterate(), and then insert new keys after that. Therefore, your concern applies to the entire WriteBatch::Handler, and I do not have a good way of preventing that.
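The point about Handler being public can be sketched with a toy model. `MiniWriteBatch` and the `Handler` base class below are simplified stand-ins, not the real RocksDB classes; they only show that any user-defined handler passed to Iterate() observes every entry:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for rocksdb::WriteBatch::Handler: a visitor over batch entries.
class Handler {
 public:
  virtual ~Handler() = default;
  virtual void Put(uint32_t cf_id, const std::string& key,
                   const std::string& value) = 0;
};

// Stand-in for rocksdb::WriteBatch, reduced to Put and Iterate.
class MiniWriteBatch {
 public:
  void Put(uint32_t cf_id, std::string key, std::string value) {
    entries_.push_back({cf_id, std::move(key), std::move(value)});
  }
  // Replay every entry through the user-provided handler.
  void Iterate(Handler* h) const {
    for (const auto& e : entries_) h->Put(e.cf_id, e.key, e.value);
  }

 private:
  struct Entry {
    uint32_t cf_id;
    std::string key;
    std::string value;
  };
  std::vector<Entry> entries_;
};

// Example user handler: count entries per column family.
class CountingHandler : public Handler {
 public:
  void Put(uint32_t cf_id, const std::string&, const std::string&) override {
    ++counts[cf_id];
  }
  std::unordered_map<uint32_t, int> counts;
};
```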
#include <atomic>
#include <functional>
Is this needed if not using std::function? (Would std::function be better?)
I can remove #include <functional>.
// This requires that all keys, if enable timestamp, (possibly from multiple
// column families) in the write batch have timestamps of the same format.
// checker: callable object to check the timestamp sizes of column families.
// User can call checker(uint32_t cf, size_t& ts_sz) which does the
User can call
This seems like the user is providing the function (of course someone can call their own function(!)), and the implementation expects this behavior from the provided function.
Yes, it's expected from user if they opt in.
// 2. if cf's timestamp size is 0, then set ts_sz to 0 and return OK.
// 3. otherwise, compare ts_sz with cf's timestamp size and return
//    Status::InvalidArgument() if different.
template <typename Checker = TimestampChecker>
Also, the Checker aspect of this API is effectively for internal use only, because users would have to provide their own implementation of non-default template instantiation.
If you choose to store extra information in each I have thought about storing just the
Also keep in mind that we cannot assume the number of column families is small and the space overhead is minimal.
Why not either
By the way, I had to look at the implementation of
And/or checking on subsequent
This is suboptimal. Some users, e.g. WriteBatchWithIndex, already track the cf->comparator mapping. Why do we need to consume extra space and CPU to build another set of ColumnFamilyHandles? Consider another case: a transaction tracks the cf->comparator mapping for certain writes, but writes to the raw WriteBatch for others.
Same. If many column families share the same timestamp size, this can waste space. I do not need a map from cf->timestamp; I just need an (unordered) set. There is a plan to improve the WriteBatch::Put(), Delete(), etc. APIs as part of #8946 (TBD). After that, or as part of that, we can improve the documentation.
Transaction can use internal APIs with fewer safeguards against user error. It looks like this functionality could have been added to WriteBatchInternal.
Why? NewIterator and other functions take a ColumnFamilyHandle, so you have access to the comparator whenever needed without saving to the WBWI, right? Even with all entries in one skip list, you do not need comparator for column family n + 1 to know that an entry in column family n comes before it. But to avoid getting distracted by design critiques and performance concerns, let's re-focus on a key point: HISTORY.md
This is false. The feature only works when including the internal header db/write_batch_internal.h.
Not really.
This is and should be a backward-compatible public API change. We should allow users to update/set timestamps of keys in a write batch, which is already a public API. In many cases, you do NOT know the timestamp until you decide to commit the write batch. Given the current implementation, I believe the checking should be user-defined too. If we are accurate and explicit about the behavior of
Thinking about this more to bridge the gap between the current code and the review comments from @pdillinger. One solution may be removing the
I am also not a big fan of that, but it's also used to index into the
I don't think that helps the public API user. prot_info_ is private, and ProtectionInfo, where the indexing occurs, is not defined in the public API.
I think that would be better.
Even with #9278 it's still an awkward API, with unclear/inconsistent documentation.

Why does it say "user can call checker" rather than "AssignTimestamp calls checker"? What does it mean to "Assign timestamp to write batch" if some entries are in CFs without timestamp? And (repeating myself) does the timestamp apply to future entries added to the write batch? And if it's called "assign" instead of "overwrite," does that mean the entries did not or should not have timestamps before the call?

"ret: OK if assignment succeeds" and "if cf's timestamp size is 0, then set ts_sz to 0 and return OK" seem inconsistent. If there is no timestamp for an entry's CF, there is no timestamp assignment for that entry.

If you are intending that 'checker' could be rather general/stable for use across many WriteBatches and potentially other APIs to be added to WriteBatch, why is it tied to the notion of checking the particular semantics for AssignTimestamp/AssignTimestamp (which BTW happen to be different from each other in handling of the no-timestamp case)? Why not just have the user provide a function that takes cf_id and returns the expected timestamp size (or SIZE_MAX for "I don't know but fail")? What's wrong with that simpler callback? And should we have the user provide this function at WriteBatch construction time rather than at AssignTimestamp time?

If you are intending the user to exercise more flexibility in the implementation of their checker (e.g. I don't know, skip some CFs with timestamp?), why is there no flexibility in the specified behavior of checker?
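The simpler callback suggested in this comment could be sketched as follows. The names here are hypothetical, not part of the actual RocksDB API: a function from column family id to its expected timestamp size, with SIZE_MAX meaning "unknown; fail":

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical callback type: cf_id -> expected timestamp size.
// SIZE_MAX signals "I don't know but fail".
using TimestampSizeFn = std::function<size_t(uint32_t /*cf_id*/)>;

// Build such a callback from a map of known column-family timestamp sizes.
inline TimestampSizeFn MakeTimestampSizeFn(
    std::unordered_map<uint32_t, size_t> known_sizes) {
  return [sizes = std::move(known_sizes)](uint32_t cf_id) -> size_t {
    auto it = sizes.find(cf_id);
    return (it == sizes.end()) ? SIZE_MAX : it->second;
  };
}
```

Unlike the checker form, this callback carries no AssignTimestamp-specific semantics, so (per the comment above) it could plausibly be supplied once at WriteBatch construction time and reused by other operations.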
@pdillinger |