Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor FilterPolicies toward Customizable #9567

Closed

Conversation

pdillinger
Copy link
Contributor

Summary: Some changes to make it easier to make FilterPolicy
customizable. Especially, create distinct classes for the different
testing-only and user-facing built-in FilterPolicy modes.

Test Plan: tests updated, with no intended difference in functionality
tested. No difference in test performance seen as a result of moving to
string-based filter type configuration.

Summary: Some changes to make it easier to make FilterPolicy
customizable. Especially, create distinct classes for the different
testing-only and user-facing built-in FilterPolicy modes.

Test Plan: tests updated, with no intended difference in functionality
tested. No difference in test performance seen as a result of moving to
string-based filter type configuration.
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines 61 to 62
virtual std::string ToString() const = 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with Customizable, please rename this GetId(). Same signature


inline int TEST_GetStartingLevelForB() const { return starting_level_for_b_; }
static constexpr const char* kName = "bloomfilter";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the standard I have used for other Customizable classes is a function, not constexpr (static const char* kName() { return "bloomfilter"; }

Also, for Customizable, this becomes the "public" name of the class. So you will be creating "bloomfilter:4" for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for Customizable, this becomes the "public" name of the class. So you will be creating "bloomfilter:4" for example.

Putting something like "bloomfilter:4:false" into the options file is intentional for forward compatibility. I don't think it's a good idea to deviate from that.

Comment on lines 1440 to 1443
std::string BloomFilterPolicy::ToString() const {
// Including ":false" for better forward-compatibility
return kName + GetBitsPerKeySuffix() + ":false";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tell me what the ":false" is for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need the ":false" for compatibility though? Reading the code in options test, it accepts "bloomfilter:4" as well as "bloomfilter:4:false". See https://github.com/facebook/rocksdb/blob/main/options/options_test.cc#L934

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have linked to 'main'. I have linked to '6.29.fb'. Here's the test in 6.29.fb: https://github.com/facebook/rocksdb/blob/6.29.fb/options/options_test.cc#L931

table/block_based/filter_policy.cc Show resolved Hide resolved
table/block_based/block_based_filter_block.cc Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

db/db_bloom_filter_test.cc Show resolved Hide resolved
db/db_bloom_filter_test.cc Outdated Show resolved Hide resolved
db/db_bloom_filter_test.cc Outdated Show resolved Hide resolved
db/db_bloom_filter_test.cc Outdated Show resolved Hide resolved
util/filter_bench.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. LGTM.

const FilterBuildingContext&) const override;
static constexpr size_t kSecretBitsPerKeyStart = 1234567890U;

static const char* kName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to put the implementation here as well? static const char* kName() { return "x.y.z"; }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions defined in headers slow down compilation and linking, so I avoid them when not performance critical or templated. Is there a good reason for making these functions as requested or just "consistency"?

ajkr added a commit to ajkr/rocksdb that referenced this pull request Feb 17, 2022
This reverts commit 8c68108.

The reason is the crash test began failing on the below assertion
following this commit.

```
db_stress: table/block_based/filter_policy.cc:316: rocksdb::{anonymous}::FastLocalBloomBitsBuilder::FastLocalBloomBitsBuilder(int, std::atomic<long int>*, std::shared_ptr<rocksdb::CacheReservationManager>, bool): Assertion `millibits_per_key >= 1000' failed.
```
facebook-github-bot pushed a commit that referenced this pull request Feb 17, 2022
Summary:
Users can set the priority for file reads associated with their operation by setting `ReadOptions::rate_limiter_priority` to something other than `Env::IO_TOTAL`. Rate limiting `VerifyChecksum()` and `VerifyFileChecksums()` is the motivation for this PR, so it also includes benchmarks and minor bug fixes to get that working.

`RandomAccessFileReader::Read()` already had support for rate limiting compaction reads. I changed that rate limiting to be non-specific to compaction, but rather performed according to the passed in `Env::IOPriority`. Now the compaction read rate limiting is supported by setting `rate_limiter_priority = Env::IO_LOW` on its `ReadOptions`.

There is no default value for the new `Env::IOPriority` parameter to `RandomAccessFileReader::Read()`. That means this PR goes through all callers (in some cases multiple layers up the call stack) to find a `ReadOptions` to provide the priority. There are TODOs for cases I believe it would be good to let user control the priority some day (e.g., file footer reads), and no TODO in cases I believe it doesn't matter (e.g., trace file reads).

The API doc only lists the missing cases where a file read associated with a provided `ReadOptions` cannot be rate limited. For cases like file ingestion checksum calculation, there is no API to provide `ReadOptions` or `Env::IOPriority`, so I didn't count that as missing.

Pull Request resolved: #9424

Test Plan:
- new unit tests
- new benchmarks on ~50MB database with 1MB/s read rate limit and 100ms refill interval; verified with strace reads are chunked (at 0.1MB per chunk) and spaced roughly 100ms apart.
  - setup command: `./db_bench -benchmarks=fillrandom,compact -db=/tmp/testdb -target_file_size_base=1048576 -disable_auto_compactions=true -file_checksum=true`
  - benchmarks command: `strace -ttfe pread64 ./db_bench -benchmarks=verifychecksum,verifyfilechecksums -use_existing_db=true -db=/tmp/testdb -rate_limiter_bytes_per_sec=1048576 -rate_limit_bg_reads=1 -rate_limit_user_ops=true -file_checksum=true`
- crash test using IO_USER priority on non-validation reads with #9567 reverted: `python3 tools/db_crashtest.py blackbox --max_key=1000000 --write_buffer_size=524288 --target_file_size_base=524288 --level_compaction_dynamic_level_bytes=true --duration=3600 --rate_limit_bg_reads=true --rate_limit_user_ops=true --rate_limiter_bytes_per_sec=10485760 --interval=10`

Reviewed By: hx235

Differential Revision: D33747386

Pulled By: ajkr

fbshipit-source-id: a2d985e97912fba8c54763798e04f006ccc56e0c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants