-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SST Partitioner interface that allows to split SST files #6957
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The PR is highly overlap with #5201, which I leave behind for long while (sorry!) I'm closing mine and hope this one can get merged.
include/rocksdb/sst_partitioner.h
Outdated
// Called with key that is right after the key that was stored into the SST | ||
// Returns true of partition boundary was detected and compaction should | ||
// create new file. | ||
virtual bool ShouldPartition(const Slice& key) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall this method also take current output file size as parameter, so the partitioner can also split SST by size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split by size is functionality that compaction knows and there is a lot of options around it. Do we really want to introduce another way? Isn't it confusing? Main purpose of this change is to split based on data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For our use case splitting SST is just a heuristic, not a mandatory. I want to be more flexible in that I can do "split SST based on key if the file is not too small or too large, otherwise override the behavior by size".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method meant to tell you where it is safe to partition -- like it is okay to partition between data sets -- or meant to say "you really should not partition now"? If it is the latter, should the logic be reversed (ShouldNotPartition)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It mean to say "please partition now".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is "please partition now". I have put there two parameters to make it easier (removed reset)
db/compaction/compaction_job.cc
Outdated
@@ -945,6 +950,10 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) { | |||
key, value, ikey.sequence, ikey.type); | |||
sub_compact->num_output_records++; | |||
|
|||
if (partitioner.get() != nullptr) { | |||
partitioner->Reset(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we merge the Reset
and ShouldPartition
call into one method call? (echoing the same comment from #5201 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible, but then the loop will need to copy previous key. Right now the partitioner can remember only the bit it needs. It is hard to say if that copy will take more time, but I think virtual call is pretty cheap compared to memory copy of X number of bytes. So should I do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point about memcopy. But it looks like in the current approach the Reset
needs a memcopy as well. The key
is a slice obtain from c_iter
, whose lifetime is ended after c_iter->Next()
is called. The partitioner needs to make memcopy to keep it. I'm not sure why ASAN test don't complain.
In case I didn't misread the code and memcopy is inevitable, I think we can change to the following flow:
for KV in c_iter {
... // process KV
... // check other SST partition condition
if (partitioner->ShouldPartition(next_key)) {
output_file_ended = true;
}
if (output_file_ended) {
... // partition SST
partitioner->Reset(next_key);
}
}
That way the partitioner is invoke (#keys + #files) times, and it can decide on its own whether to memcopy keys per key or per file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reset removed and should partition is with two arguments
@yiwu-arbug thanks for looking on the PR. |
include/rocksdb/sst_partitioner.h
Outdated
// Called with key that is right after the key that was stored into the SST | ||
// Returns true of partition boundary was detected and compaction should | ||
// create new file. | ||
virtual bool ShouldPartition(const Slice& key) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method meant to tell you where it is safe to partition -- like it is okay to partition between data sets -- or meant to say "you really should not partition now"? If it is the latter, should the logic be reversed (ShouldNotPartition)?
include/rocksdb/sst_partitioner.h
Outdated
virtual bool ShouldPartition(const Slice& key) = 0; | ||
|
||
// Called for key that was stored into the SST | ||
virtual void Reset(const Slice& key) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what this method does based on the comment. Also Reset is too close of a name to reset for std::unique_ptr. Can we pick a more descriptive name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reset removed.
include/rocksdb/sst_partitioner.h
Outdated
virtual const char* Name() const = 0; | ||
}; | ||
|
||
extern SstPartitionerFactory* NewSstPartitionerFixedPrefixFactory( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been introducing into classes that have potentially multiple implementations and loading a "CreateFromString" static method. You might want to check out how that is being done and see if something similar would work for you (Env::LoadEnv is a comparable method that I hope to deprecate). I am hoping to introduce the exact same CreateFromString method signature into many classes (as soon as the PRs are reviewed).
#include <memory> | ||
#include <string> | ||
|
||
#include "rocksdb/rocksdb_namespace.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this code be limited by ROCKSDB_LITE or not? I do not know what the plans for LITE is or not and if this is necessary in LITE mode or should be skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know. Functionality is like 20 lines of code. Please let me know if I should not include in LITE
db/compaction/compaction_job.cc
Outdated
return false; | ||
} | ||
Slice key_fixed(key.data_, std::min(key.size_, len_)); | ||
return key_fixed.compare(last_key_) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be based on a Comparator rather than raw bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We compare prefix (number of bytes) for equality. I do not see much benefit to use comparator for this partitioner.
@koldat @mrambacher when you guys are happy with the C++ API and functionality, let me know and I will review the Java stuff :-) |
I have rebased to master head and squashed all changesets to make it clean. I have also added current file size to ShouldPartition that I have missed. |
include/rocksdb/sst_partitioner.h
Outdated
|
||
// Returns true if partition boundary was detected and compaction should | ||
// create new file. | ||
virtual bool ShouldPartition(const Slice& last_user_key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we create a new options struct for ShouldPartition
parameter, so that the interface is extendable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand. We have the Context there already. If you want to pass struct then we should place there also prev and current key there. Is that what you mean? I have also fixed documentation to make the purpose easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what I mean. It makes the API easier to extend later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed to
virtual PartitionerResult ShouldPartition(const PartitionerRequest& request) = 0;
I have rebased and put latest review changes. I have changed the signature to: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Anyone from the rocksdb team can do further review and merge? Thanks much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as a follow up, but I hope this feature is added to crash_test. Start with adding an option in db_stress and add to the crash_test parameter list.
// If non-nullptr, use the specified factory for a function to determine the | ||
// partitioning of sst files. This helps compaction to split the files | ||
// on interesting boundaries (key prefixes) to make propagation of sst | ||
// files less write amplifying (covering the whole key space). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we comment and say it's a experimental feature for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@koldat if you rebase against master and add a comment for the feature is experimental, we can try to merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forward some our internal static analysis warnings. It will be good to address them.
db/compaction/sst_partitioner.cc
Outdated
public: | ||
SstPartitionerFixedPrefix(size_t len) : len_(len) {} | ||
|
||
virtual ~SstPartitionerFixedPrefix(){}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db/compaction/sst_partitioner.cc:17:41: warning: extra ';' after member function definition
virtual ~SstPartitionerFixedPrefix(){};
^
and
db/compaction/sst_partitioner.cc:17:11: warning: '~SstPartitionerFixedPrefix' overrides a destructor but is not marked 'override'
virtual ~SstPartitionerFixedPrefix(){};
^
include/rocksdb/sst_partitioner.h:46:11: note: overridden virtual function is here
virtual ~SstPartitioner(){};
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
db/compaction/sst_partitioner.cc
Outdated
size_t len_; | ||
}; | ||
|
||
class SstPartitionerFixedPrefixFactory : public SstPartitionerFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Struct and class definitions should either be defined in a header file, declared in a header file with the same base name and path as the definition file, or in an anonymous namespace to prevent possible shadowing of other classes."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
db/compaction/sst_partitioner.cc
Outdated
|
||
class SstPartitionerFixedPrefix : public SstPartitioner { | ||
public: | ||
SstPartitionerFixedPrefix(size_t len) : len_(len) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linter asks us to add "explicit".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
db/compaction/sst_partitioner.cc
Outdated
|
||
std::shared_ptr<SstPartitionerFactory> NewSstPartitionerFixedPrefixFactory( | ||
size_t prefix_len) { | ||
return std::shared_ptr<SstPartitionerFactory>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linter asks us to use make_shared
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
db/compaction/sst_partitioner.cc
Outdated
|
||
class SstPartitionerFixedPrefixFactory : public SstPartitionerFactory { | ||
public: | ||
SstPartitionerFixedPrefixFactory(size_t len) : len_(len) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. add explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
db/compaction/sst_partitioner.cc
Outdated
|
||
namespace ROCKSDB_NAMESPACE { | ||
|
||
class SstPartitionerFixedPrefix : public SstPartitioner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Struct and class definitions should either be defined in a header file, declared in a header file with the same base name and path as the definition file, or in an anonymous namespace to prevent possible shadowing of other classes."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@koldat ping |
@yiwu-arbug I'll do next week. Was on vacation, sorry. |
@koldat has updated the pull request. You must reimport the pull request before landing. |
@siying I have done all review changes, squashed and rebased to master. The only question I have is about crash test. You want me to write new test in db_stress using sst_partitioner? It would be quite big task, because we need to define first what should be outcome of such a test. The partitioning is always application specific. On the other hand the only impact on compaction is just "two lines" and in case it is not enabled there is no impact. Please let me know if I should go and create something. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@koldat there can be multiple levels on the partitioned about what we can validate. From a minimal level, we can just apply a prefix partitioned, enable it in stress test by say 1/3 of the time, and watch it doesn't generate wrong query results or crash the service. That's already very helpful. Validating logic correctness is lower priority. Again, I still hope the stress test to be done, but can be done as a follow-up PR. |
@koldat has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@koldat sorry but after rebasing the branch to latest master, it seems to show more errors. Can you help fix them? |
…titioning function
@koldat has updated the pull request. You must reimport the pull request before landing. |
@siying I have removed your merge, rebased to master and fixed the build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This pull request has been merged in cd4592c. |
2 similar comments
This pull request has been merged in cd4592c. |
This pull request has been merged in cd4592c. |
Thank you for your contribution! |
…) (#184) Summary: SST Partitioner interface that allows to split SST files during compactions. It basically instruct compaction to create a new file when needed. When one is using well defined prefixes and prefixed way of defining tables it is good to define also partitioning so that promotion of some SST file does not cover huge key space on next level (worst case complete space). Pull Request resolved: facebook#6957 Reviewed By: ajkr Differential Revision: D22461239 fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e Signed-off-by: Yi Wu <yiwu@pingcap.com> Co-authored-by: Tomas Kolda <koldat@gmail.com>
Summary: SST Partitioner interface that allows to split SST files during compactions. It basically instruct compaction to create a new file when needed. When one is using well defined prefixes and prefixed way of defining tables it is good to define also partitioning so that promotion of some SST file does not cover huge key space on next level (worst case complete space). Pull Request resolved: facebook#6957 Reviewed By: ajkr Differential Revision: D22461239 fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e
SST Partitioner interface that allows to split SST files during compactions.
It basically instruct compaction to create a new file when needed. When one is using well defined prefixes and prefixed way of defining tables it is good to define also partitioning so that promotion of some SST file does not cover huge key space on next level (worst case complete space).