SST Partitioner interface that allows to split SST files #6957

Closed

Contributor

@koldat koldat commented Jun 8, 2020

SST Partitioner interface that allows splitting SST files during compactions.

It basically instructs compaction to create a new file when needed. When one uses well-defined prefixes and a prefix-based way of defining tables, it is good to also define partitioning, so that promoting an SST file does not cover a huge key space on the next level (in the worst case, the complete key space).

Contributor

@yiwu-arbug yiwu-arbug left a comment

Thanks! This PR overlaps heavily with #5201, which I left untouched for a long while (sorry!). I'm closing mine and hope this one can get merged.

// Called with key that is right after the key that was stored into the SST
// Returns true if partition boundary was detected and compaction should
// create new file.
virtual bool ShouldPartition(const Slice& key) = 0;
Contributor

Should this method also take the current output file size as a parameter, so that the partitioner can also split SSTs by size?

Contributor Author

Splitting by size is functionality that compaction already has, and there are a lot of options around it. Do we really want to introduce another way? Isn't that confusing? The main purpose of this change is to split based on data.

Contributor

For our use case, splitting SSTs is just a heuristic, not mandatory. I want the flexibility to say "split the SST based on key if the file is neither too small nor too large; otherwise override that behavior by size".
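The heuristic described here can be sketched in isolation. The block below is only an illustration under stated assumptions: `SizeBounds`, `Prefix`, and `ShouldCutFile` are hypothetical names that are not part of RocksDB, keys are modeled as `std::string`, and the thresholds are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical thresholds for the sketch; these are not RocksDB options.
struct SizeBounds {
  uint64_t min_file_size;  // below this, key boundaries are ignored
  uint64_t max_file_size;  // at or above this, the file is always cut
};

// Returns the first `len` bytes of `key` (the whole key if it is shorter).
inline std::string Prefix(const std::string& key, size_t len) {
  return key.substr(0, len);
}

// Decides whether to start a new output file before writing `current_key`.
// Size takes precedence at both extremes; in between, a prefix change cuts.
inline bool ShouldCutFile(const std::string& prev_key,
                          const std::string& current_key,
                          uint64_t current_file_size, size_t prefix_len,
                          const SizeBounds& bounds) {
  assert(bounds.min_file_size <= bounds.max_file_size);
  if (current_file_size >= bounds.max_file_size) return true;  // too large
  if (current_file_size < bounds.min_file_size) return false;  // too small
  return Prefix(prev_key, prefix_len) != Prefix(current_key, prefix_len);
}
```

With bounds of 10 and 100 bytes, a prefix change at size 50 cuts the file, the same change at size 5 does not, and any key at size 100 cuts regardless of prefix.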

Contributor

Is this method meant to tell you where it is safe to partition -- like it is okay to partition between data sets -- or meant to say "you really should not partition now"? If it is the latter, should the logic be reversed (ShouldNotPartition)?

Contributor

It is meant to say "please partition now".

Contributor Author

It is "please partition now". I have given it two parameters to make this easier (and removed Reset).

@@ -945,6 +950,10 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
key, value, ikey.sequence, ikey.type);
sub_compact->num_output_records++;

if (partitioner.get() != nullptr) {
partitioner->Reset(key);
Contributor

Can we merge the Reset and ShouldPartition call into one method call? (echoing the same comment from #5201 (comment))

Contributor Author

It is possible, but then the loop will need to copy the previous key. Right now the partitioner can remember only the part it needs. It is hard to say whether that copy will take more time, but I think a virtual call is pretty cheap compared to a memory copy of X bytes. So should I do that?

Contributor

@yiwu-arbug yiwu-arbug Jun 17, 2020

Good point about the memcopy. But it looks like in the current approach Reset needs a memcopy as well. The key is a Slice obtained from c_iter, whose lifetime ends once c_iter->Next() is called, so the partitioner needs to copy it to keep it. I'm not sure why the ASAN tests don't complain.

If I didn't misread the code and the memcopy is inevitable, I think we can change to the following flow:

for KV in c_iter {
    ... // process KV
    ... // check other SST partition condition
    if (partitioner->ShouldPartition(next_key)) {
         output_file_ended = true;
    }
    if (output_file_ended) {
         ... // partition SST
         partitioner->Reset(next_key);
    }
}

That way the partitioner is invoked (#keys + #files) times, and it can decide on its own whether to memcopy keys per key or per file.
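A standalone model of that flow may make the call count easier to see. This is a sketch, not the code in compaction_job.cc: `PrefixPartitioner` and `SplitIntoFiles` are hypothetical names, keys are plain `std::string` values rather than Slices, and the boundary check is done before each key is written instead of on a look-ahead key. The partitioner copies only the prefix, once per output file, so it never holds a reference into the iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical partitioner: ShouldPartition runs per key, Reset per file.
class PrefixPartitioner {
 public:
  explicit PrefixPartitioner(size_t len) : len_(len) { assert(len > 0); }

  // True when `next_key` has a different prefix from the current file.
  bool ShouldPartition(const std::string& next_key) {
    ++calls_;
    return next_key.compare(0, len_, file_prefix_) != 0;
  }

  // Stores an owned copy of the prefix, so it stays valid after the
  // source key is gone.
  void Reset(const std::string& first_key) {
    ++calls_;
    file_prefix_ = first_key.substr(0, len_);
  }

  size_t calls() const { return calls_; }

 private:
  size_t len_;
  std::string file_prefix_;
  size_t calls_ = 0;
};

// Groups sorted keys into "files", cutting wherever the partitioner asks.
inline std::vector<std::vector<std::string>> SplitIntoFiles(
    const std::vector<std::string>& sorted_keys, PrefixPartitioner& p) {
  std::vector<std::vector<std::string>> files;
  for (const std::string& key : sorted_keys) {
    if (files.empty() || p.ShouldPartition(key)) {
      files.emplace_back();  // start a new output file
      p.Reset(key);          // one prefix copy per file
    }
    files.back().push_back(key);
  }
  return files;
}
```

In this model the first key skips the ShouldPartition call, so the partitioner is invoked (#keys - 1) + #files times, which matches the estimate above up to a constant.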

Contributor Author

Reset is removed and ShouldPartition now takes two arguments.

@koldat
Contributor Author

koldat commented Jun 17, 2020

@yiwu-arbug thanks for looking at the PR.

virtual bool ShouldPartition(const Slice& key) = 0;

// Called for key that was stored into the SST
virtual void Reset(const Slice& key) = 0;
Contributor

I am not sure what this method does based on the comment. Also, Reset is too close a name to reset for std::unique_ptr. Can we pick a more descriptive name?

Contributor Author

Reset removed.

virtual const char* Name() const = 0;
};

extern SstPartitionerFactory* NewSstPartitionerFixedPrefixFactory(
Contributor

I have been introducing a static "CreateFromString" method for loading into classes that potentially have multiple implementations. You might want to check out how that is being done and see if something similar would work for you (Env::LoadEnv is a comparable method that I hope to deprecate). I am hoping to introduce the exact same CreateFromString method signature into many classes (as soon as the PRs are reviewed).

#include <memory>
#include <string>

#include "rocksdb/rocksdb_namespace.h"
Contributor

Should this code be limited by ROCKSDB_LITE or not? I do not know what the plans for LITE are, or whether this is necessary in LITE mode or should be skipped.

Contributor Author

I do not know. The functionality is about 20 lines of code. Please let me know if I should not include it in LITE.

return false;
}
Slice key_fixed(key.data_, std::min(key.size_, len_));
return key_fixed.compare(last_key_) != 0;
Contributor

Shouldn't this be based on a Comparator rather than raw bytes?

Contributor Author

We compare the prefix (a fixed number of bytes) for equality. I do not see much benefit in using a comparator for this partitioner.
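To make the byte-wise check concrete, here is a minimal sketch of a fixed-length prefix equality test. `PrefixDiffers` is a hypothetical helper written for this illustration and operates on `std::string` rather than Slice. It assumes that two keys the comparator treats as equal are also byte-identical in their first `len` bytes; a comparator that ignores some bytes would break that assumption, which is the concern raised in the question above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// True when the first `len` bytes of `a` and `b` differ. Keys shorter than
// `len` are compared in full, so "ab" and "abc" differ for len == 3.
inline bool PrefixDiffers(const std::string& a, const std::string& b,
                          size_t len) {
  assert(len > 0);
  const size_t na = std::min(a.size(), len);
  const size_t nb = std::min(b.size(), len);
  // Only equality matters here, so no ordering (comparator) is needed.
  return na != nb || std::memcmp(a.data(), b.data(), na) != 0;
}
```

Because the check only asks whether the prefix changed, and never which prefix sorts first, a raw byte comparison is sufficient under that assumption.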

@adamretter
Collaborator

@koldat @mrambacher when you guys are happy with the C++ API and functionality, let me know and I will review the Java stuff :-)

@koldat
Contributor Author

koldat commented Jun 23, 2020

I have rebased onto master head and squashed all changesets to make it clean. I have also added the current file size to ShouldPartition, which I had missed.


// Returns true if partition boundary was detected and compaction should
// create new file.
virtual bool ShouldPartition(const Slice& last_user_key,
Contributor

Shall we create a new options struct for the ShouldPartition parameters, so that the interface is extensible?

Contributor Author

I am not sure I understand. We already have the Context there. If you want to pass a struct, then we should also put the previous and current keys in it. Is that what you mean? I have also fixed the documentation to make the purpose easier to understand.

Contributor

Yes, that's what I mean. It makes the API easier to extend later.

Contributor Author

I have changed it to
virtual PartitionerResult ShouldPartition(const PartitionerRequest& request) = 0;

@koldat
Contributor Author

koldat commented Jul 2, 2020

I have rebased and applied the latest review changes. I have changed the signature to:
virtual PartitionerResult ShouldPartition(const PartitionerRequest& request) = 0;
The result is an enum to which we can later add kPartitionMaybe; for now there is only no/must. The request has the same arguments as before, but is extensible (additional fields can be added without an API change).
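The request/result shape described in this comment could look roughly like the sketch below. It is a self-contained model, not the contents of include/rocksdb/sst_partitioner.h: the enumerator names, the field names, the use of `std::string` in place of Slice, and the `FixedPrefixPartitioner` class are all assumptions made for illustration. Only the `ShouldPartition(const PartitionerRequest&)` signature and the `Name()` method come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed enumerator names; the thread only says "no/must" for now.
enum class PartitionerResult { kNotRequired, kRequired };

// Assumed field names. New fields can be added here later without
// changing the virtual method's signature.
struct PartitionerRequest {
  const std::string* prev_user_key;
  const std::string* current_user_key;
  uint64_t current_output_file_size;
};

class SstPartitioner {
 public:
  virtual ~SstPartitioner() = default;
  virtual const char* Name() const = 0;
  virtual PartitionerResult ShouldPartition(
      const PartitionerRequest& request) = 0;
};

// Hypothetical implementation: cut whenever the fixed-length prefix changes.
class FixedPrefixPartitioner : public SstPartitioner {
 public:
  explicit FixedPrefixPartitioner(size_t len) : len_(len) {}

  const char* Name() const override { return "FixedPrefixPartitioner"; }

  PartitionerResult ShouldPartition(
      const PartitionerRequest& request) override {
    assert(request.prev_user_key != nullptr);
    assert(request.current_user_key != nullptr);
    const bool same_prefix =
        request.prev_user_key->compare(0, len_, *request.current_user_key, 0,
                                       len_) == 0;
    return same_prefix ? PartitionerResult::kNotRequired
                       : PartitionerResult::kRequired;
  }

 private:
  size_t len_;
};
```

Passing a struct means a later field, or a later enumerator such as the kPartitionMaybe value mentioned above, would not require existing implementations to change their override's signature.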

Contributor

@yiwu-arbug yiwu-arbug left a comment

LGTM. Can anyone from the RocksDB team do a further review and merge? Thanks much.

@yiwu-arbug
Contributor

@siying @ajkr

Contributor

@siying siying left a comment

Maybe as a follow-up, but I hope this feature gets added to crash_test. Start by adding an option in db_stress and then add it to the crash_test parameter list.

// If non-nullptr, use the specified factory for a function to determine the
// partitioning of sst files. This helps compaction to split the files
// on interesting boundaries (key prefixes) to make propagation of sst
// files less write amplifying (covering the whole key space).
Contributor

Can we add a comment saying it's an experimental feature for now?

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Contributor

siying commented Jul 9, 2020

@koldat if you rebase against master and add a comment saying the feature is experimental, we can try to merge it.

Contributor

@siying siying left a comment

Forwarding some of our internal static analysis warnings. It would be good to address them.

public:
SstPartitionerFixedPrefix(size_t len) : len_(len) {}

virtual ~SstPartitionerFixedPrefix(){};
Contributor

db/compaction/sst_partitioner.cc:17:41: warning: extra ';' after member function definition
  virtual ~SstPartitionerFixedPrefix(){};
                                        ^

and

db/compaction/sst_partitioner.cc:17:11: warning: '~SstPartitionerFixedPrefix' overrides a destructor but is not marked 'override'
  virtual ~SstPartitionerFixedPrefix(){};
          ^

include/rocksdb/sst_partitioner.h:46:11: note: overridden virtual function is here
  virtual ~SstPartitioner(){};
          ^

Contributor Author

Fixed

size_t len_;
};

class SstPartitionerFixedPrefixFactory : public SstPartitionerFactory {
Contributor

"Struct and class definitions should either be defined in a header file, declared in a header file with the same base name and path as the definition file, or in an anonymous namespace to prevent possible shadowing of other classes."

Contributor Author

Fixed


class SstPartitionerFixedPrefix : public SstPartitioner {
public:
SstPartitionerFixedPrefix(size_t len) : len_(len) {}
Contributor

The linter asks us to add "explicit".

Contributor Author

Fixed


std::shared_ptr<SstPartitionerFactory> NewSstPartitionerFixedPrefixFactory(
size_t prefix_len) {
return std::shared_ptr<SstPartitionerFactory>(
Contributor

The linter asks us to use make_shared

Contributor Author

Fixed


class SstPartitionerFixedPrefixFactory : public SstPartitionerFactory {
public:
SstPartitionerFixedPrefixFactory(size_t len) : len_(len) {}
Contributor

Same here. add explicit.

Contributor Author

Fixed


namespace ROCKSDB_NAMESPACE {

class SstPartitionerFixedPrefix : public SstPartitioner {
Contributor

"Struct and class definitions should either be defined in a header file, declared in a header file with the same base name and path as the definition file, or in an anonymous namespace to prevent possible shadowing of other classes."

Contributor Author

Fixed
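Taken together, the static analysis comments above ask for four things: no stray ';' after a member function body, 'override' on the overriding destructor, 'explicit' on single-argument constructors, and std::make_shared in the factory function, with the concrete class kept in an anonymous namespace. A minimal sketch of that shape follows; the class and function names are shortened stand-ins chosen for this illustration, not the code that was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for the public factory interface declared in a header.
class SstPartitionerFactory {
 public:
  virtual ~SstPartitionerFactory() = default;
  virtual const char* Name() const = 0;
};

namespace {  // keeps the concrete class local to this translation unit

class FixedPrefixFactory : public SstPartitionerFactory {
 public:
  // 'explicit' prevents an accidental size_t -> factory conversion.
  explicit FixedPrefixFactory(size_t len) : len_(len) {}

  // Marked 'override', and no ';' after the body.
  ~FixedPrefixFactory() override {}

  const char* Name() const override { return "FixedPrefixFactory"; }

  size_t len() const { return len_; }

 private:
  size_t len_;
};

}  // namespace

// make_shared allocates the object and its control block together.
std::shared_ptr<SstPartitionerFactory> NewFixedPrefixFactory(
    size_t prefix_len) {
  assert(prefix_len > 0);
  return std::make_shared<FixedPrefixFactory>(prefix_len);
}
```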

@yiwu-arbug
Contributor

@koldat ping

@koldat
Contributor Author

koldat commented Jul 17, 2020

@yiwu-arbug I'll do it next week. I was on vacation, sorry.

@facebook-github-bot
Contributor

@koldat has updated the pull request. You must reimport the pull request before landing.

@koldat
Contributor Author

koldat commented Jul 22, 2020

@siying I have done all review changes, squashed and rebased to master.

The only question I have is about the crash test. Do you want me to write a new test in db_stress using sst_partitioner? It would be quite a big task, because we first need to define what the outcome of such a test should be. The partitioning is always application-specific. On the other hand, the impact on compaction is just "two lines", and when the feature is not enabled there is no impact. Please let me know if I should go ahead and create something.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Contributor

siying commented Jul 23, 2020

@koldat there can be multiple levels of what we validate for the partitioner. At a minimum, we can just apply a prefix partitioner, enable it in the stress test, say, 1/3 of the time, and check that it doesn't generate wrong query results or crash the service. That's already very helpful. Validating logic correctness is lower priority. Again, I still hope the stress test gets done, but it can be done as a follow-up PR.

@facebook-github-bot
Contributor

@koldat has updated the pull request. You must reimport the pull request before landing.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying
Contributor

siying commented Jul 23, 2020

@koldat sorry, but after rebasing the branch onto the latest master, it seems to show more errors. Can you help fix them?

@facebook-github-bot
Contributor

@koldat has updated the pull request. You must reimport the pull request before landing.

@koldat
Contributor Author

koldat commented Jul 24, 2020

@siying I have removed your merge, rebased to master and fixed the build.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

This pull request has been merged in cd4592c.

@siying
Contributor

siying commented Jul 27, 2020

Thank you for your contribution!

yiwu-arbug pushed a commit to tikv/rocksdb that referenced this pull request Aug 11, 2020
…) (#184)

Summary:
SST Partitioner interface that allows splitting SST files during compactions.

It basically instructs compaction to create a new file when needed. When one uses well-defined prefixes and a prefix-based way of defining tables, it is good to also define partitioning, so that promoting an SST file does not cover a huge key space on the next level (in the worst case, the complete key space).

Pull Request resolved: facebook#6957

Reviewed By: ajkr

Differential Revision: D22461239

fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e
Signed-off-by: Yi Wu <yiwu@pingcap.com>

Co-authored-by: Tomas Kolda <koldat@gmail.com>
codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021

6 participants