Use a sorted vector instead of a map to store blob file metadata #9526

ltamasi · 2022-02-08T17:19:17Z

Summary:
The patch replaces std::map with a sorted std::vector for
VersionStorageInfo::blob_files_ and preallocates the space
for the vector before saving the BlobFileMetaData into the
new VersionStorageInfo in VersionBuilder::Rep::SaveBlobFilesTo.
These changes reduce the time the DB mutex is held while
saving new Versions, and using a sorted vector also makes
lookups faster thanks to better memory locality.
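
For illustration only, here is a minimal sketch of the idea, not the actual code in VersionBuilder::Rep::SaveBlobFilesTo. `BlobMeta` and `SaveBlobFilesSorted` are hypothetical stand-ins introduced for this example. It assumes the source is iterated in ascending file-number order, so the vector ends up sorted without a separate sort, and it reserves the space up front so that appending never reallocates.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for BlobFileMetaData; only the file number matters here.
struct BlobMeta {
  uint64_t blob_file_number;
};

using BlobFiles = std::vector<std::shared_ptr<BlobMeta>>;

// Copies metadata from an ordered source into a preallocated vector.
// std::map iterates in ascending key order, so the result is sorted by
// file number as long as each key matches its entry's file number.
BlobFiles SaveBlobFilesSorted(
    const std::map<uint64_t, std::shared_ptr<BlobMeta>>& src) {
  BlobFiles out;
  out.reserve(src.size());  // single allocation before the copy loop

  for (const auto& kv : src) {
    // Debug-only check that appending keeps the vector strictly sorted.
    assert(out.empty() || out.back()->blob_file_number < kv.first);
    out.push_back(kv.second);
  }

  return out;
}
```

The elements then sit contiguously in memory, which is what makes a subsequent binary search cache-friendlier than walking the nodes of a tree-based map.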

In addition, the patch introduces helper methods
VersionStorageInfo::GetBlobFileMetaData and
VersionStorageInfo::GetBlobFileMetaDataLB that can be used by
clients to perform lookups in the vector, and does some general
cleanup in the parts of code where blob file metadata are used.
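
As a rough sketch of what such lookups can look like on a sorted vector (the free functions `BlobFileLowerBound` and `FindBlobFile` and the `BlobMeta` type below are hypothetical, and are not the signatures of the actual VersionStorageInfo methods): a lower-bound helper built on `std::lower_bound`, and an exact-match helper layered on top of it.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for BlobFileMetaData.
struct BlobMeta {
  uint64_t blob_file_number;
};

using BlobFiles = std::vector<std::shared_ptr<BlobMeta>>;

// Returns an iterator to the first entry whose file number is >= the given
// number, or files.end() if there is none. Assumes files is sorted by
// file number.
BlobFiles::const_iterator BlobFileLowerBound(const BlobFiles& files,
                                             uint64_t blob_file_number) {
  return std::lower_bound(
      files.begin(), files.end(), blob_file_number,
      [](const std::shared_ptr<BlobMeta>& lhs, uint64_t rhs) {
        return lhs->blob_file_number < rhs;
      });
}

// Exact-match lookup: returns nullptr if the file number is not present.
std::shared_ptr<BlobMeta> FindBlobFile(const BlobFiles& files,
                                       uint64_t blob_file_number) {
  const auto it = BlobFileLowerBound(files, blob_file_number);
  if (it == files.end() || (*it)->blob_file_number != blob_file_number) {
    return nullptr;
  }
  return *it;
}
```

Callers of the lower-bound variant have to compare the result against `end()` before dereferencing it, since a number larger than every stored file number yields the past-the-end iterator.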

Test Plan:
Ran make check and the crash test script for a while.

Performance was tested using a load-optimized benchmark (fillseq with vector memtable, no WAL) and small file sizes so that a significant number of files are produced:

numactl --interleave=all ./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/ltamasi-dbbench --wal_dir=/data/ltamasi-dbbench --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --allow_concurrent_memtable_write=false --disable_wal=1 --enable_blob_files=1 --blob_file_size=16777216 --min_blob_size=0 --blob_compression_type=lz4 --enable_blob_garbage_collection=1 --seed=<some value>

Final statistics before the patch:

Cumulative writes: 0 writes, 700M keys, 0 commit groups, 0.0 writes per commit group, ingest: 284.62 GB, 121.27 MB/s
Interval writes: 0 writes, 334K keys, 0 commit groups, 0.0 writes per commit group, ingest: 139.28 MB, 72.46 MB/s

With the patch:

Cumulative writes: 0 writes, 760M keys, 0 commit groups, 0.0 writes per commit group, ingest: 308.66 GB, 131.52 MB/s
Interval writes: 0 writes, 445K keys, 0 commit groups, 0.0 writes per commit group, ingest: 185.35 MB, 93.15 MB/s

Total time to complete the benchmark is 2611 seconds with the patch, down from 2986 seconds (a reduction of about 12.6%).

facebook-github-bot · 2022-02-08T19:03:26Z

@ltamasi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

riversand963

LGTM. Thanks @ltamasi for the improvement.

riversand963 · 2022-02-09T17:31:16Z

db/version_set.h

  const BlobFiles& GetBlobFiles() const { return blob_files_; }

+  // REQUIRES: This version has been saved (see VersionSet::SaveTo)


Just realized: SaveTo seems to be a method of VersionBuilder?

Oops, nice catch :) Will fix this across the board (there are some preexisting occurrences as well)

facebook-github-bot · 2022-02-09T18:22:47Z

@ltamasi has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-02-09T18:23:08Z

@ltamasi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…BlobGC (#9542)

Summary: Fixes a bug introduced in #9526 where we index one position past the end of a `vector`.

Pull Request resolved: #9542

Test Plan: `make asan_check`. Will add a unit test in a separate PR.

Reviewed By: akankshamahajan15

Differential Revision: D34145825

Pulled By: ltamasi

fbshipit-source-id: 4e87c948407dee489d669a3e41f59e2fcc1228d8
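
The fix itself is not shown in this thread. As a generic illustration of this class of bug (not the actual code from #9542), the sketch below uses a hypothetical helper, `NextAfterPrefix`, that reads the element following the first `count` entries of a sorted vector. When `count` equals the vector's size, `sorted[count]` is one past the end, so the index has to be checked first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: stores the element that follows the first `count`
// entries in *next and returns true, or returns false (leaving *next
// untouched) when no such element exists.
bool NextAfterPrefix(const std::vector<uint64_t>& sorted, size_t count,
                     uint64_t* next) {
  if (count >= sorted.size()) {
    return false;  // sorted[count] would index past the end of the vector
  }

  *next = sorted[count];
  return true;
}
```

Without the bounds check, the read is undefined behavior that may go unnoticed in a regular build, which is why a sanitizer run such as `make asan_check` is a good fit for catching it.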

facebook-github-bot added the CLA Signed label Feb 8, 2022

ltamasi requested a review from riversand963 February 8, 2022 17:19

ltamasi added 14 commits February 8, 2022 11:02

First cut, couple of test failures

a2ba0ab

Fix VersionStorageInfoTest.ForcedBlobGC

0dd3a8e

Fix up VersionBuilderTest.SaveBlobFilesTo

9a05002

Add VersionStorageInfo::GetBlobFileMetaData(Impl)

b22b625

Use new method in listener_test

123715b

Use new method in version_builder_test

e66da3c

Rework GetBlobFileMetaDataImpl into GetBlobFileMetaDataLB; use new method in Version

906a7aa

Use new method in VersionBuilder

4afcad4

Remove unnecessary #include

5858145

Rework ComputeBlobGarbageCollectionCutoffFileNumber a bit

d3bb6f8

Reserve space for the BlobFileMetaData objects up front

9d50e6e

Some cleanup all over the place

2100080

Fix unused variable issue affecting release builds

3588272

A bit more cleanup

b8d05f0

ltamasi force-pushed the blob_files_sorted_vec branch from a66900f to b8d05f0 February 8, 2022 19:02

riversand963 approved these changes Feb 9, 2022

ltamasi added 2 commits February 9, 2022 10:21

Fix method name in comments

0cdade5

Update HISTORY

843425e

facebook-github-bot closed this in 320d9a8 Feb 9, 2022

ltamasi mentioned this pull request Feb 10, 2022

Fix off-by-one bug in VersionStorageInfo::ComputeFilesMarkedForForcedBlobGC #9542

Closed
