Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a sorted vector instead of a map to store blob file metadata #9526

Closed
wants to merge 16 commits into from

Conversation

ltamasi
Copy link
Contributor

@ltamasi ltamasi commented Feb 8, 2022

Summary:
The patch replaces std::map with a sorted std::vector for
VersionStorageInfo::blob_files_ and preallocates the space
for the vector before saving the BlobFileMetaData into the
new VersionStorageInfo in VersionBuilder::Rep::SaveBlobFilesTo.
These changes reduce the time the DB mutex is held while
saving new Versions, and using a sorted vector also makes
lookups faster thanks to better memory locality.

In addition, the patch introduces helper methods
VersionStorageInfo::GetBlobFileMetaData and
VersionStorageInfo::GetBlobFileMetaDataLB that can be used by
clients to perform lookups in the vector, and does some general
cleanup in the parts of code where blob file metadata are used.

Test Plan:
Ran make check and the crash test script for a while.

Performance was tested using a load-optimized benchmark (fillseq with vector memtable, no WAL) and small file sizes so that a significant number of files are produced:

numactl --interleave=all ./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/ltamasi-dbbench --wal_dir=/data/ltamasi-dbbench --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --allow_concurrent_memtable_write=false --disable_wal=1 --enable_blob_files=1 --blob_file_size=16777216 --min_blob_size=0 --blob_compression_type=lz4 --enable_blob_garbage_collection=1 --seed=<some value>

Final statistics before the patch:

Cumulative writes: 0 writes, 700M keys, 0 commit groups, 0.0 writes per commit group, ingest: 284.62 GB, 121.27 MB/s
Interval writes: 0 writes, 334K keys, 0 commit groups, 0.0 writes per commit group, ingest: 139.28 MB, 72.46 MB/s

With the patch:

Cumulative writes: 0 writes, 760M keys, 0 commit groups, 0.0 writes per commit group, ingest: 308.66 GB, 131.52 MB/s
Interval writes: 0 writes, 445K keys, 0 commit groups, 0.0 writes per commit group, ingest: 185.35 MB, 93.15 MB/s

Total time to complete the benchmark is 2611 seconds with the patch, down from 2986 secs.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @ltamasi for the improvement.

db/version_set.h Outdated
const BlobFiles& GetBlobFiles() const { return blob_files_; }

// REQUIRES: This version has been saved (see VersionSet::SaveTo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realize: seems SaveTo is a method of VersionBuilder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, nice catch :) Will fix this across the board (there are some preexisting occurrences as well)

@facebook-github-bot
Copy link
Contributor

@ltamasi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ltamasi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Feb 10, 2022
…BlobGC (#9542)

Summary:
Fixes a bug introduced in #9526 where we index one position past the
end of a `vector`.

Pull Request resolved: #9542

Test Plan:
`make asan_check`

Will add a unit test in a separate PR.

Reviewed By: akankshamahajan15

Differential Revision: D34145825

Pulled By: ltamasi

fbshipit-source-id: 4e87c948407dee489d669a3e41f59e2fcc1228d8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants