Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a class for measuring the amount of garbage generated during compaction #8426

Closed
wants to merge 19 commits into from

Conversation

ltamasi
Copy link
Contributor

@ltamasi ltamasi commented Jun 18, 2021

Summary:
This is part of an alternative approach to #8316.
Unlike that approach, this one relies on key-values getting processed one by one
during compaction, and does not involve persistence.

Specifically, the patch adds a class BlobGarbageMeter that can track the number
and total size of blobs in a (sub)compaction's input and output on a per-blob file
basis. This information can then be used to compute the amount of additional
garbage generated by the compaction for any given blob file by subtracting the
"outflow" from the "inflow."

Note: this patch only adds BlobGarbageMeter and associated unit tests. I plan to
hook up this class to the input and output of CompactionIterator in a subsequent PR.

Test Plan:
make check

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment on lines +87 to +89
const std::unordered_map<uint64_t, BlobInOutFlow>& flows() const {
return flows_;
}
Copy link
Contributor

@jay-zhuang jay-zhuang Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's our plan on how flows() will be used? Feels like it's exposing too much internal information. Will it worth defining some functions for that? like GetGarbageBytes(file_num).

Copy link
Contributor Author

@ltamasi ltamasi Jun 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to log the amount of additional garbage for all blob files, we will need to iterate through the entire flows_ map (i.e. it's not just lookup by file number), along these lines:

     const auto& flows = blob_garbage_meter.flows();

     for (const auto& pair : flows) {
        const uint64_t blob_file_number = pair.first;
        const BlobGarbageMeter::BlobInOutFlow& flow = pair.second;

        assert(flow.IsValid());
        if (flow.HasGarbage()) {
            // ... process flow, log garbage
        }
     }

P.S. It also comes in handy for unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, then we might need some comments about BlobInOutFlow and BlobStats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me add a bit of explanation.

Copy link
Contributor

@jay-zhuang jay-zhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I would prefer hiding more information as a general rule of class design, but on the other hand, it's a relatively simple internal classes (and maybe also for performance reason), it looks good to me.

Comment on lines +57 to +58
const BlobStats& GetInFlow() const { return in_flow_; }
const BlobStats& GetOutFlow() const { return out_flow_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same question as flows(), do we really need to give the caller for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are used in unit tests; otherwise, they're not necessary strictly speaking.

@ltamasi
Copy link
Contributor Author

ltamasi commented Jun 22, 2021

LGTM.

I would prefer hiding more information as a general rule of class design, but on the other hand, it's a relatively simple internal classes (and maybe also for performance reason), it looks good to me.

Agree about hiding as much as possible as a general rule. In this particular case though, we do need to be able to both iterate through the entire collection and do lookups by key (i.e. map/dictionary functionality). Theoretically we could hide the map behind an interface but like you said, that would introduce a performance penalty.

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ltamasi merged this pull request in 065bea1.

facebook-github-bot pushed a commit that referenced this pull request Jun 23, 2021
Summary:
Follow-up to #8426 .

The patch adds a new kind of `InternalIterator` that wraps another one and
passes each key-value encountered to `BlobGarbageMeter` as inflow.
This iterator will be used as an input iterator for compactions when the input
SSTs reference blob files.

Pull Request resolved: #8443

Test Plan: `make check`

Reviewed By: jay-zhuang

Differential Revision: D29311987

Pulled By: ltamasi

fbshipit-source-id: b4493b4c0c0c2e3c2ecc33c8969a5ef02de5d9d8
facebook-github-bot pushed a commit that referenced this pull request Jun 24, 2021
…ST (#8450)

Summary:
The patch builds on `BlobGarbageMeter` and `BlobCountingIterator`
(introduced in #8426 and
#8443 respectively)
and ties it all together. It measures the amount of garbage
generated by a compaction and logs the corresponding `BlobFileGarbage`
records as part of the compaction job's `VersionEdit`. Note: in order
to have accurate results, `kRemoveAndSkipUntil` for compaction filters
is implemented using iteration.

Pull Request resolved: #8450

Test Plan: Ran `make check` and the crash test script.

Reviewed By: jay-zhuang

Differential Revision: D29338207

Pulled By: ltamasi

fbshipit-source-id: 4381c432ac215139439f6d6fb801a6c0e4d8c128
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants