New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvdb+etcd+tests: change etcd flattened bucket key derivation to make it compatible with bbolt #4411
Conversation
2125a37
to
7329734
Compare
c7c9e82
to
488debf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look pretty good, definitely makes sense now that prefixing would produce inconsistent orderings 👍
main comment is that we should move the new test to the kvdb interface level so that we are positive it actually matches the bbolt implementation.
02026f6
to
9c62a42
Compare
|
||
// Use the prefetched value only if it is for | ||
// an existing key. | ||
if getValue.rev != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we assume bucket keys always have a revision of 0 since they're never modified after initial creation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the idea is that if a key is non-existent then its ModVersion
is always zero which we use to assert upon commit that certain keys (bucket/value) are non existent.
In this particular case we check if the prefetch set contains the key we're trying to read with a ModVersion
of 0. The reason such key might be in the prefetch is because upon commit we update the read set values with the transaction response and use that as our prefetch set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ⛩
channeldb/kvdb/etcd/stm.go
Outdated
rset := s.rset.cmps(s.lset) | ||
// Remove excluded keys from the read set. | ||
for key := range s.xset { | ||
delete(s.rset, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's clear what this does, but it's not totally obvious why we would do this. if the bucket is accessed during the transaction, it's not clear to me why it should be removed from the read set and how we can guarantee consistency with this change.
is this commit necessary to correct the range iteration? seems maybe like an orthogonal optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this offline. To summarize: exclude set while in theory would reduce the cmp set and thereby make transactions less likely to collide are dangerous when another transaction tries to delete a bucket that is excluded in the current transaction. As a conclusion I removed the exclude set concept completely (while still keeping the changes for removing the lock set which is still wrong due to increasing contention on top level buckets).
0c15c47
to
c379c9c
Compare
This commit changes the key derivation algo we use to emulate buckets similar to bbolt. The issue with prefixing keys with either a bucket or a value prefix is that the cursor couldn't effectively iterate trough all keys in a bucket, as it skipped the bucket keys. While there are multiple ways to fix that issue (eg. two pointers, iterating value keys then bucket keys, etc), the cleanest is to instead of prefixes in keys we use a postfix indicating whether a key is a bucket or a value. This also simplifies all operations where we (recursively) iterate a bucket and is equivalent with the prefixing key derivation with the addition that bucket and value keys are now continous.
… tests This commit moves makeTestDB to db.go and exports it so that we'll be able to use this function in other unit tests to make them testable with etcd if needed.
This commit removes the lock set which was used to only add bucket keys to the tx predicate while also bumping their mod version. This was useful to reduce the size of the compare set but wasn't useful to reduce contention as top level buckets were always in the lock set.
9780956
to
aecccb6
Compare
e835d8b
to
57fc106
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
This commit extends compatibility with the bbolt kvdb implementation, which returns ErrIncompatibleValue in case of a bucket/value key collision. Furthermore the commit also adds an extra precondition to the transaction when a key doesn't exist. This is needed as we fix reads to a snapshot revision and other writers may commit the key otherwise.
This PR is part of a multi PR effort to speed up our etcd wrapper and make all tests succeed:
Key derivation: #4411
Range cache: #4433
Integration tests: #4402
This PR enables testing some remaining packages that use channeldb on etcd backend.
Furthermore the PR also changes the flattened bucket key derivation such that the bucket/value prefix becomes a postfix instead. This fixes incompatibility with the bbolt backend, where buckets containing bucket keys were not picked up by our cursor wrappers. The issue was discovered on while investigating failing nursery tests which are also migrated to be able to run on etcd too.