Add a class for measuring the amount of garbage generated during compaction #8426
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
const std::unordered_map<uint64_t, BlobInOutFlow>& flows() const {
  return flows_;
}
What's our plan for how `flows()` will be used? It feels like it's exposing too much internal information. Would it be worth defining some functions for that, like `GetGarbageBytes(file_num)`?
In order to log the amount of additional garbage for all blob files, we will need to iterate through the entire `flows_` map (i.e. it's not just lookup by file number), along these lines:
const auto& flows = blob_garbage_meter.flows();
for (const auto& pair : flows) {
  const uint64_t blob_file_number = pair.first;
  const BlobGarbageMeter::BlobInOutFlow& flow = pair.second;
  assert(flow.IsValid());
  if (flow.HasGarbage()) {
    // ... process flow, log garbage
  }
}
P.S. It also comes in handy for unit tests.
Okay, then we might need some comments about `BlobInOutFlow` and `BlobStats`.
Sure, let me add a bit of explanation.
LGTM.
I would prefer hiding more information as a general rule of class design, but on the other hand, these are relatively simple internal classes (and there may also be a performance reason), so it looks good to me.
const BlobStats& GetInFlow() const { return in_flow_; }
const BlobStats& GetOutFlow() const { return out_flow_; }
The same question as for `flows()`: do we really need to give the caller access to that?
These are used in unit tests; otherwise, they're not necessary strictly speaking.
Agree about hiding as much as possible as a general rule. In this particular case though, we do need to be able to both iterate through the entire collection and do lookups by key (i.e. map/dictionary functionality). Theoretically we could hide the map behind an interface, but like you said, that would introduce a performance penalty.
@ltamasi has updated the pull request. You must reimport the pull request before landing.
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Follow-up to #8426. The patch adds a new kind of `InternalIterator` that wraps another one and passes each key-value encountered to `BlobGarbageMeter` as inflow. This iterator will be used as an input iterator for compactions when the input SSTs reference blob files.
Pull Request resolved: #8443
Test Plan: `make check`
Reviewed By: jay-zhuang
Differential Revision: D29311987
Pulled By: ltamasi
fbshipit-source-id: b4493b4c0c0c2e3c2ecc33c8969a5ef02de5d9d8
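The wrap-and-count idea in the follow-up summary above can be illustrated with a small self-contained sketch. Everything in it is hypothetical and simplified for illustration: `Iter`, `VectorIter`, `CountingIter`, `BlobRef`, and `InflowBytes` are stand-ins invented here, not RocksDB's `InternalIterator` or the iterator added in #8443, and the real code decodes blob references from key-values rather than carrying them in a struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical record: one key-value that references a blob.
struct BlobRef {
  uint64_t blob_file_number;
  uint64_t blob_bytes;
};

// Hypothetical minimal iterator interface (not RocksDB's InternalIterator).
class Iter {
 public:
  virtual ~Iter() = default;
  virtual void SeekToFirst() = 0;
  virtual void Next() = 0;
  virtual bool Valid() const = 0;
  virtual const BlobRef& Current() const = 0;
};

// Simple in-memory iterator; invalid until SeekToFirst() is called.
class VectorIter : public Iter {
 public:
  explicit VectorIter(std::vector<BlobRef> refs)
      : refs_(std::move(refs)), pos_(refs_.size()) {}
  void SeekToFirst() override { pos_ = 0; }
  void Next() override {
    assert(Valid());
    ++pos_;
  }
  bool Valid() const override { return pos_ < refs_.size(); }
  const BlobRef& Current() const override {
    assert(Valid());
    return refs_[pos_];
  }

 private:
  std::vector<BlobRef> refs_;
  std::size_t pos_;
};

// Inflow bytes accumulated per blob file number.
using InflowBytes = std::unordered_map<uint64_t, uint64_t>;

// Wraps another iterator and records every entry it lands on as inflow.
// Note: re-seeking would count entries again; this sketch assumes a
// single forward pass, as in a compaction input scan.
class CountingIter : public Iter {
 public:
  CountingIter(Iter* wrapped, InflowBytes* inflow)
      : wrapped_(wrapped), inflow_(inflow) {}
  void SeekToFirst() override {
    wrapped_->SeekToFirst();
    CountCurrent();
  }
  void Next() override {
    wrapped_->Next();
    CountCurrent();
  }
  bool Valid() const override { return wrapped_->Valid(); }
  const BlobRef& Current() const override { return wrapped_->Current(); }

 private:
  void CountCurrent() {
    if (!wrapped_->Valid()) {
      return;
    }
    const BlobRef& ref = wrapped_->Current();
    (*inflow_)[ref.blob_file_number] += ref.blob_bytes;
  }

  Iter* wrapped_;
  InflowBytes* inflow_;
};

}  // namespace sketch
```

Because the wrapper exposes the same interface as the iterator it wraps, the consumer of the input does not need to know that counting is happening.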
…ST (#8450)
Summary: The patch builds on `BlobGarbageMeter` and `BlobCountingIterator` (introduced in #8426 and #8443 respectively) and ties it all together. It measures the amount of garbage generated by a compaction and logs the corresponding `BlobFileGarbage` records as part of the compaction job's `VersionEdit`. Note: in order to have accurate results, `kRemoveAndSkipUntil` for compaction filters is implemented using iteration.
Pull Request resolved: #8450
Test Plan: Ran `make check` and the crash test script.
Reviewed By: jay-zhuang
Differential Revision: D29338207
Pulled By: ltamasi
fbshipit-source-id: 4381c432ac215139439f6d6fb801a6c0e4d8c128
Summary:
This is part of an alternative approach to #8316.
Unlike that approach, this one relies on key-values getting processed one by one
during compaction, and does not involve persistence.
Specifically, the patch adds a class `BlobGarbageMeter` that can track the number
and total size of blobs in a (sub)compaction's input and output on a per-blob file
basis. This information can then be used to compute the amount of additional
garbage generated by the compaction for any given blob file by subtracting the
"outflow" from the "inflow."
Note: this patch only adds `BlobGarbageMeter` and associated unit tests. I plan to
hook up this class to the input and output of `CompactionIterator` in a subsequent PR.
Test Plan:
`make check`
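The inflow/outflow bookkeeping described in the summary can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the patch: the names `BlobStats`, `BlobInOutFlow`, `flows()`, `IsValid()`, `HasGarbage()`, `GetInFlow()`, and `GetOutFlow()` come from the discussion above, while `Add`, `AddInFlow`, `AddOutFlow`, `GetGarbageCount`, `GetGarbageBytes`, `TotalGarbageBytes`, and the `(file number, bytes)` signatures of `ProcessInFlow`/`ProcessOutFlow` are hypothetical. Skipping outflow for files that never appeared in the input is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

namespace sketch {

// Number and total size of blobs seen on one side of the compaction.
class BlobStats {
 public:
  void Add(uint64_t bytes) {  // hypothetical helper
    ++count_;
    bytes_ += bytes;
  }
  uint64_t GetCount() const { return count_; }
  uint64_t GetBytes() const { return bytes_; }

 private:
  uint64_t count_ = 0;
  uint64_t bytes_ = 0;
};

// Inflow and outflow for a single blob file.
class BlobInOutFlow {
 public:
  void AddInFlow(uint64_t bytes) { in_flow_.Add(bytes); }
  void AddOutFlow(uint64_t bytes) { out_flow_.Add(bytes); }

  const BlobStats& GetInFlow() const { return in_flow_; }
  const BlobStats& GetOutFlow() const { return out_flow_; }

  // Assumption: a blob reference can only leave the compaction if it
  // entered it, so outflow never exceeds inflow.
  bool IsValid() const {
    return in_flow_.GetCount() >= out_flow_.GetCount() &&
           in_flow_.GetBytes() >= out_flow_.GetBytes();
  }
  bool HasGarbage() const {
    assert(IsValid());
    return in_flow_.GetCount() > out_flow_.GetCount();
  }
  // Additional garbage = inflow - outflow.
  uint64_t GetGarbageCount() const {
    assert(IsValid());
    return in_flow_.GetCount() - out_flow_.GetCount();
  }
  uint64_t GetGarbageBytes() const {
    assert(IsValid());
    return in_flow_.GetBytes() - out_flow_.GetBytes();
  }

 private:
  BlobStats in_flow_;
  BlobStats out_flow_;
};

// Tracks flows per blob file number.
class GarbageMeter {
 public:
  void ProcessInFlow(uint64_t blob_file_number, uint64_t bytes) {
    flows_[blob_file_number].AddInFlow(bytes);
  }
  void ProcessOutFlow(uint64_t blob_file_number, uint64_t bytes) {
    // Assumption: files not seen in the input (e.g. newly written blob
    // files) cannot gain garbage here, so they are not tracked.
    auto it = flows_.find(blob_file_number);
    if (it == flows_.end()) {
      return;
    }
    it->second.AddOutFlow(bytes);
  }
  const std::unordered_map<uint64_t, BlobInOutFlow>& flows() const {
    return flows_;
  }

 private:
  std::unordered_map<uint64_t, BlobInOutFlow> flows_;
};

// Iterates over all flows, mirroring the loop from the review thread.
inline uint64_t TotalGarbageBytes(const GarbageMeter& meter) {
  uint64_t total = 0;
  for (const auto& pair : meter.flows()) {
    const BlobInOutFlow& flow = pair.second;
    assert(flow.IsValid());
    if (flow.HasGarbage()) {
      total += flow.GetGarbageBytes();
    }
  }
  return total;
}

}  // namespace sketch
```

Exposing the map through `flows()` keeps both whole-collection iteration and per-file lookup available to callers without an extra abstraction layer, which is the trade-off discussed in the review comments.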