Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a BlobDB-specific table property collector #8316

Closed
wants to merge 29 commits into from

Conversation

ltamasi
Copy link
Contributor

@ltamasi ltamasi commented May 18, 2021

Summary:
The patch adds a BlobDB-specific table property collector that processes
any blob indexes added to an SST and keeps track of the total number and
size of blobs referenced by the SST on a per-blob file basis. This aggregated
information is then persisted as a new table property rocksdb.blob.file.mapping.

This information will be used to calculate the amount of garbage generated by
compactions. Namely, the amount of additional garbage for any given blob file
can be computed by subtracting the "outflow" (total number/size of blobs in the
output SSTs) from the "inflow" (total number/size of blobs in the input SSTs).
Tracking the amount of garbage in blob files in turn will allow us to optimize GC
performance. (Note: one might consider counting blobs e.g. in
CompactionIterator as we process key-values during compaction;
however, this would be very intrusive, and there are actually cases when
CompactionIterator does not process all keys individually, for instance
with deletion/TTL compactions or when a compaction filter returns
kRemoveAndSkipUntil.)

Test Plan:
make check

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@jay-zhuang jay-zhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should such aggregated information be put into a meta block, instead of a table property?

Comment on lines +434 to +441
assert(dynamic_cast<const BlobTablePropertiesCollectorFactory*>(
int_tbl_prop_collector_factories_.front().get()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the restriction that BlobTablePropCollectorFactory has to be the first one is not so obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's added before the user-specified ones (see lines 558-559 in column_family.cc) so it's guaranteed to be at the head of the list. I can add a comment though to point this out.

// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

#include "db/blob/blob_stats_record.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just general code style question, should for example, blob_stats.h, blob_stats_record.h and blob_stats_collection.h be combined into one file? And the same for *.cc files.

I have the impression that our codebase has lots of related structures defined in one file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a hard and fast rule for this; I personally prefer to have separate source files because they can reduce dependencies. (For example, blob_table_properties_collector.h only includes blob_stats.h but not the other two.) As for the pre-existing stuff, there has been some cleanup work done recently to split up large source files to make file sizes more manageable.

Comment on lines +5987 to +5988
ASSERT_NE(user_props.find(TablePropertiesNames::kBlobFileMapping),
user_props.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't also check the value of property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt that would be redundant since we have dedicated unit tests for that. Here, I'm just trying to make sure BlobDB's property collector was active as expected.

@ltamasi
Copy link
Contributor Author

ltamasi commented May 20, 2021

Thanks for the review @jay-zhuang !

@ltamasi
Copy link
Contributor Author

ltamasi commented May 20, 2021

Should such aggregated information be put into a meta block, instead of a table property?

You mean a dedicated metablock of its own? (Table properties are also stored in a metablock.) I feel that would be overkill; it would involve a lot of changes for the sake of a single property. The table property collector framework gives us exactly what we need in this case.

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jay-zhuang
Copy link
Contributor

Should such aggregated information be put into a meta block, instead of a table property?

You mean a dedicated metablock of its own? (Table properties are also stored in a metablock.) I feel that would be overkill; it would involve a lot of changes for the sake of a single property. The table property collector framework gives us exactly what we need in this case.

Yeah, a dedicated metablock, which maybe only for blobDB, I think it will be more customizable for more information in the future. But anyway, a property here would be easier to implement and use I guess.

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Jun 22, 2021
…action (#8426)

Summary:
This is part of an alternative approach to #8316.
Unlike that approach, this one relies on key-values getting processed one by one
during compaction, and does not involve persistence.

Specifically, the patch adds a class `BlobGarbageMeter` that can track the number
and total size of blobs in a (sub)compaction's input and output on a per-blob file
basis. This information can then be used to compute the amount of additional
garbage generated by the compaction for any given blob file by subtracting the
"outflow" from the "inflow."

Note: this patch only adds `BlobGarbageMeter` and associated unit tests. I plan to
hook up this class to the input and output of `CompactionIterator` in a subsequent PR.

Pull Request resolved: #8426

Test Plan: `make check`

Reviewed By: jay-zhuang

Differential Revision: D29242250

Pulled By: ltamasi

fbshipit-source-id: 597e50ad556540e413a50e804ba15bc044d809bb
@ltamasi
Copy link
Contributor Author

ltamasi commented Jun 25, 2021

Closing since we ended up going for a different approach (see #8450 ).

@ltamasi ltamasi closed this Jun 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants