Add a BlobDB-specific table property collector #8316
Conversation
@ltamasi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@ltamasi has updated the pull request. You must reimport the pull request before landing.
Should such aggregated information be put into a meta block, instead of a table property?
assert(dynamic_cast<const BlobTablePropertiesCollectorFactory*>(
    int_tbl_prop_collector_factories_.front().get()));
I feel the restriction that the BlobTablePropertiesCollectorFactory has to be the first one is not so obvious.
It's added before the user-specified ones (see lines 558-559 in column_family.cc), so it's guaranteed to be at the head of the list. I can add a comment to point this out, though.
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

#include "db/blob/blob_stats_record.h"
Just a general code style question: should, for example, blob_stats.h, blob_stats_record.h, and blob_stats_collection.h be combined into one file? And the same for the *.cc files. I have the impression that our codebase has lots of related structures defined in one file.
I don't think we have a hard and fast rule for this; I personally prefer to have separate source files because they can reduce dependencies. (For example, blob_table_properties_collector.h only includes blob_stats.h but not the other two.) As for the pre-existing stuff, there has been some cleanup work done recently to split up large source files to make file sizes more manageable.
ASSERT_NE(user_props.find(TablePropertiesNames::kBlobFileMapping),
          user_props.end());
Why not also check the value of the property?
I felt that would be redundant since we have dedicated unit tests for that. Here, I'm just trying to make sure BlobDB's property collector was active as expected.
Thanks for the review, @jay-zhuang!
You mean a dedicated metablock of its own? (Table properties are also stored in a metablock.) I feel that would be overkill; it would involve a lot of changes for the sake of a single property. The table property collector framework gives us exactly what we need in this case.
Yeah, a dedicated metablock, maybe one just for BlobDB; I think it would be more customizable if we want to store more information in the future. But anyway, a property is easier to implement and use, I guess.
…head of the vector
…action (#8426)

Summary:
This is part of an alternative approach to #8316. Unlike that approach, this one relies on key-values getting processed one by one during compaction, and does not involve persistence.

Specifically, the patch adds a class `BlobGarbageMeter` that can track the number and total size of blobs in a (sub)compaction's input and output on a per-blob file basis. This information can then be used to compute the amount of additional garbage generated by the compaction for any given blob file by subtracting the "outflow" from the "inflow."

Note: this patch only adds `BlobGarbageMeter` and associated unit tests. I plan to hook up this class to the input and output of `CompactionIterator` in a subsequent PR.

Pull Request resolved: #8426

Test Plan: `make check`

Reviewed By: jay-zhuang

Differential Revision: D29242250

Pulled By: ltamasi

fbshipit-source-id: 597e50ad556540e413a50e804ba15bc044d809bb
Closing since we ended up going for a different approach (see #8450).
Summary:
The patch adds a BlobDB-specific table property collector that processes
any blob indexes added to an SST and keeps track of the total number and
size of blobs referenced by the SST on a per-blob file basis. This aggregated
information is then persisted as a new table property,
rocksdb.blob.file.mapping.

This information will be used to calculate the amount of garbage generated by
compactions. Namely, the amount of additional garbage for any given blob file
can be computed by subtracting the "outflow" (total number/size of blobs in the
output SSTs) from the "inflow" (total number/size of blobs in the input SSTs).
Tracking the amount of garbage in blob files in turn will allow us to optimize GC
performance. (Note: one might consider counting blobs e.g. in CompactionIterator
as we process key-values during compaction; however, this would be very
intrusive, and there are actually cases when CompactionIterator does not process
all keys individually, for instance with deletion/TTL compactions or when a
compaction filter returns kRemoveAndSkipUntil.)

Test Plan:
make check