Account for DB ID in stress testing block cache keys #10388

pdillinger · 2022-07-19T21:30:57Z

Summary: I recently discovered that block cache keys are slightly lower
quality than previously thought, because my stress testing tool failed
to simulate the effect of DB ID differences. This change updates the
tool and gives us data to guide future developments. (No changes to
production code here and now.)

Nevertheless, the following promise still holds

// In fact, if our SST files are all < 4TB (see
// BlockBasedTable::kMaxFileSizeStandardEncoding), then SST files generated
// in a single process are guaranteed to have unique cache keys, unless/until
// number session ids * max file number = 2**86 ...

because although different DB IDs could cause collision in file number
and offset data, that would have to be using the same DB session (lower)
to cause a block cache key collision, which is not possible in the same
process. (A session is associated with only one DB ID.)

This change fixes cache_bench -stress_cache_key to set and reset DB IDs in
a parameterized way to evaluate the effect. Previous results assumed to
be representative (using -sck_keep_bits=43):

15 collisions after 15 x 90 days, est 90 days between (1.03763e+20 corrected)

or expected collision on a single machine every 104 billion billion
days (see "corrected" value).

After accounting for DB IDs, test never really changing, intermediate, and very
frequently changing (using default -sck_db_count=100):

-sck_newdb_nreopen=1000000000:
15 collisions after 2 x 90 days, est 12 days between (1.38351e+19 corrected)
-sck_newdb_nreopen=10000:
17 collisions after 2 x 90 days, est 10.5882 days between (1.22074e+19 corrected)
-sck_newdb_nreopen=100:
19 collisions after 2 x 90 days, est 9.47368 days between (1.09224e+19 corrected)

or roughly 10x more often than previously thought (still extremely if
not impossibly rare), and better than random base cache keys
(with -sck_randomize), though < 10x better than random:

31 collisions after 1 x 90 days, est 2.90323 days between (3.34719e+18 corrected)

If we simply fixed this by ignoring DB ID for cache keys, we would
potentially have a shortage of entropy for some cases, such as small
file numbers and offsets (e.g. many short-lived processes each using
SstFileWriter to create a small file), because existing DB session IDs
only provide ~103 bits of entropy. We could upgrade the entropy in DB
session IDs to accommodate, but it's not known what all would be
affected by changing from 20 digit session IDs to something larger.

Instead, my plan is to

Move to block cache keys derived from SST unique IDs (so that we can
derive block cache keys from manifest data without reading file on
storage), and show no significant regression in expected collision
rate.
Generate better SST unique IDs in format_version=6 (format_version=6 and context-aware block checksums #9058),
which should have ~100x lower expected/predicted collision rate based
on simulations with this stress test:

./cache_bench -stress_cache_key -sck_keep_bits=39 -sck_newdb_nreopen=100 -sck_footer_unique_id
...
15 collisions after 19 x 90 days, est 114 days between (2.10293e+21 corrected)

Test Plan: no production changes

Summary: I recently discovered that block cache keys are slightly lower quality than previously thought, because my stress testing tool failed to simulate the effect of DB ID differences. This change updates the tool and gives us data to guide future developments. (No changes to production code here and now.) Nevertheless, the following promise still holds ``` // In fact, if our SST files are all < 4TB (see // BlockBasedTable::kMaxFileSizeStandardEncoding), then SST files generated // in a single process are guaranteed to have unique cache keys, unless/until // number session ids * max file number = 2**86 ... ``` because although different DB IDs could cause collision in file number and offset data, that would have to be using the same DB session (lower) to cause a block cache key collision, which is not possible in the same process. (A session is associated with only one DB ID.) This change fixes cache_bench -stress_cache_key to set and reset DB IDs in a parameterized way to evaluate the effect. Previously results assumed to be representative (using -sck_keep_bits=43): ``` 15 collisions after 15 x 90 days, est 90 days between (1.03763e+20 corrected) ``` or expected collision on a single machine every 104 billion billion days. After accounting for DB IDs, never really changing through very frequently changing (using default -sck_db_count=100): ``` -sck_newdb_nreopen=1000000000: 15 collisions after 2 x 90 days, est 12 days between (1.38351e+19 corrected) -sck_newdb_nreopen=10000: 17 collisions after 2 x 90 days, est 10.5882 days between (1.22074e+19 corrected) -sck_newdb_nreopen=100: 19 collisions after 2 x 90 days, est 9.47368 days between (1.09224e+19 corrected) ``` or roughly 10x more often than previously thought (still extremely if not impossibly rare), and < 10x better than random base cache keys (with -sck_randomize): ``` 31 collisions after 1 x 90 days, est 2.90323 days between (3.34719e+18 corrected) ``` If we simply fixed this by ignoring DB ID for cache keys, we would potentially have a shortage of entropy for some cases, such as small file numbers and offsets (e.g. many short-lived processes each using SstFileWriter to create a small file), because existing DB session IDs only provide ~103 bits of entropy. We could upgrade the entropy in DB session IDs to accommodate, but that comes with side effects. Instead, my plan is to 1) Move to block cache keys derived from SST unique IDs (so that we can derive block cache keys from manifest data without reading file on storage), and show no significant regression in expected collision rate. 2) Generate better SST unique IDs in format_version=6, which will have ~100x lower expected/predicted collision rate based on simulations. Test Plan: no production changes

facebook-github-bot · 2022-07-20T02:57:58Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jay-zhuang

LGTM

Summary: ... so that cache keys can be derived from DB manifest data before reading the file from storage--so that every part of the file can potentially go in a persistent cache. See updated comments in cache_key.cc for technical details. Importantly, the new cache key encoding uses some fancy but efficient math to pack data into the cache key without depending on the sizes of the various pieces. This simplifies some existing code creating cache keys, like cache warming before the file size is known. This should provide us an essentially permanent mapping between SST unique IDs and base cache keys, with the ability to "upgrade" SST unique IDs (and thus cache keys) with new SST format_versions. These cache keys are of similar, perhaps indistinguishable quality to the previous generation. Before this change (see "corrected" days between collision): ``` ./cache_bench -stress_cache_key -sck_keep_bits=43 18 collisions after 2 x 90 days, est 10 days between (1.15292e+19 corrected) ``` After this change (keep 43 bits, up through 50, to validate "trajectory" is ok on "corrected" days between collision): ``` 19 collisions after 3 x 90 days, est 14.2105 days between (1.63836e+19 corrected) 16 collisions after 5 x 90 days, est 28.125 days between (1.6213e+19 corrected) 15 collisions after 7 x 90 days, est 42 days between (1.21057e+19 corrected) 15 collisions after 17 x 90 days, est 102 days between (1.46997e+19 corrected) 15 collisions after 49 x 90 days, est 294 days between (2.11849e+19 corrected) 15 collisions after 62 x 90 days, est 372 days between (1.34027e+19 corrected) 15 collisions after 53 x 90 days, est 318 days between (5.72858e+18 corrected) 15 collisions after 309 x 90 days, est 1854 days between (1.66994e+19 corrected) ``` However, the change does modify (probably weaken) the "guaranteed unique" promise from this > SST files generated in a single process are guaranteed to have unique cache keys, unless/until number session ids * max file number = 2**86 to this (see #10388) > With the DB id limitation, we only have nice guaranteed unique cache keys for files generated in a single process until biggest session_id_counter and offset_in_file reach combined 64 bits I don't think this is a practical concern, though. Pull Request resolved: #10394 Test Plan: unit tests updated, see simulation results above Reviewed By: jay-zhuang Differential Revision: D38667529 Pulled By: pdillinger fbshipit-source-id: 49af3fe7f47e5b61162809a78b76c769fd519fba

pdillinger requested a review from jay-zhuang July 19, 2022 21:30

facebook-github-bot added the CLA Signed label Jul 19, 2022

Attempt fix analyze failure

0537de7

pdillinger mentioned this pull request Jul 20, 2022

Derive cache keys from SST unique IDs #10394

Closed

jay-zhuang approved these changes Jul 22, 2022

View reviewed changes

facebook-github-bot closed this in 01a2e20 Jul 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Account for DB ID in stress testing block cache keys #10388

Account for DB ID in stress testing block cache keys #10388

pdillinger commented Jul 19, 2022

facebook-github-bot commented Jul 20, 2022

jay-zhuang left a comment

Account for DB ID in stress testing block cache keys #10388

Account for DB ID in stress testing block cache keys #10388

Conversation

pdillinger commented Jul 19, 2022

facebook-github-bot commented Jul 20, 2022

jay-zhuang left a comment

Choose a reason for hiding this comment