Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache #8912
Conversation
This is an early version of the project; comments and more test cases will be added shortly.
Generally looking good. Some open concerns about the API. If we mark it EXPERIMENTAL, I'll be happy to approve without sorting out all the API details.
}
uint32_t dump_insert = tmp_cache->GetInsertCount() - start_insert;
uint32_t dump_lookup = tmp_cache->GetLookupcount() - start_lookup;
ASSERT_EQ(63,
Btw, I think that ASSERT_EQ(63U, dump_insert) would work. (Not a problem.)
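For what it's worth, a self-contained sketch of the suggested form; the counter value here is a placeholder for illustration, not taken from the actual test:

```cpp
#include <cstdint>
#include <gtest/gtest.h>

TEST(CacheDumpSketch, UnsignedLiteralComparison) {
  // 63U is an unsigned literal, so both sides of ASSERT_EQ are unsigned
  // and no signed/unsigned comparison warning is emitted for uint32_t.
  uint32_t dump_insert = 63;  // placeholder value for illustration
  ASSERT_EQ(63U, dump_insert);
}
```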
LGTM once the CircleCI issues are resolved. Thanks!
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties, which computes a universally unique identifier based on table properties of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are guaranteed unique in the lifetime of a process running RocksDB. (SemiStructuredUniqueIdGen, new test included.) Along with file numbers, this enables SST unique IDs to be guaranteed unique among SSTs generated in a single process, and "better than random" between processes. See https://github.com/pdillinger/unique_id
* TODO: explain internal vs. external ID

Intended follow-up (avoid conflicts with facebook#8912): use the internal unique IDs in cache keys. (The file offset can be XORed into the third 64-bit value of the unique ID.)

Test Plan: Unit tests added. TODO: stress test support
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@zhichao-cao has updated the pull request. You must reimport the pull request before landing.
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties, which computes a universally unique identifier based on table properties of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are guaranteed unique in the lifetime of a process running RocksDB. (SemiStructuredUniqueIdGen, new test included.) Along with file numbers, this enables SST unique IDs to be guaranteed unique among SSTs generated in a single process, and "better than random" between processes. See https://github.com/pdillinger/unique_id
* In addition to the public API producing 'external' unique IDs, there is a function for producing 'internal' unique IDs, with functions for converting between the two. In short, the external ID is "safe" for things people might do with it, and the internal ID enables more "power user" features for the future. Specifically, the external ID goes through a hashing layer so that any subset of bits in the external ID can be used as a hash of the full ID, while also preserving uniqueness guarantees in the first 128 bits (bijective both on the first 128 bits and on the full 192 bits).

Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with #8912.) (The file offset can be XORed into the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (#8968)

Pull Request resolved: #8990

Test Plan: Unit tests added, and checking of unique IDs in stress test. NOTE: in the stress test we do not generate nearly enough files to thoroughly stress uniqueness, but the test trims off pieces of the ID to check for uniqueness, so that we can infer (with some assumptions) stronger properties in the aggregate.

Reviewed By: zhichao-cao, mrambacher

Differential Revision: D31582865

Pulled By: pdillinger

fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
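To make the intended cache-key follow-up concrete, here is a minimal sketch, assuming the internal unique ID is modeled as three 64-bit words; the type and function names are illustrative, not RocksDB's actual API:

```cpp
#include <array>
#include <cstdint>

// Illustrative 192-bit internal unique ID: three 64-bit words.
// The first 128 bits (words 0 and 1) already uniquely identify the file.
using InternalUniqueId = std::array<uint64_t, 3>;

// Derive a per-block cache key by XORing the block's file offset into the
// third word, as the follow-up suggests. Distinct (file, offset) pairs map
// to distinct keys because the offset only perturbs the third word.
InternalUniqueId MakeCacheKey(const InternalUniqueId& file_id,
                              uint64_t block_offset) {
  InternalUniqueId key = file_id;
  key[2] ^= block_offset;  // offset folded into the third 64-bit value
  return key;
}
```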
Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache (#8912)

Summary:

Background: Cache warm-up causes potential read performance degradation due to reading blocks from storage into the block cache. Since, in production, the workload and access pattern to a certain DB are stable, a potential solution is to dump out the blocks belonging to that DB to persistent storage (e.g., to a file) and bulk-load the blocks into the secondary cache before the DB is relaunched. For example, migrating a DB from host A to host B takes a short period of time, during which the access pattern to blocks in the block cache will not change much. It is efficient to dump out the blocks of the DB, migrate them to the destination host, and insert them into the secondary cache before we relaunch the DB.

Design: We introduce the CacheDumpWriter and CacheDumpReader interfaces for users to store the blocks dumped out from the block cache. RocksDB encodes all the information and sends the string to the writer. Users can implement their own writer if they want. CacheDumper and CacheDumpedLoader are introduced to save and load the blocks, respectively.

Pull Request resolved: facebook/rocksdb#8912

Test Plan: add new tests to lru_cache_test and pass make check.

Reviewed By: pdillinger

Differential Revision: D31452871

Pulled By: zhichao-cao

fbshipit-source-id: 11ab4f5d03e383f476947116361d54188d36ec48
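As a rough illustration of the writer side of this design, here is a minimal sketch of a file-backed writer. The method names and signatures below are assumptions for illustration, not the merged RocksDB API (which may use different types such as Slice and IOStatus):

```cpp
#include <cstdio>
#include <string>

// Hypothetical dump-writer contract: RocksDB hands the writer pre-encoded
// strings (metadata first, then one packet per dumped block).
class DumpWriterSketch {
 public:
  virtual ~DumpWriterSketch() = default;
  virtual bool WriteMetadata(const std::string& metadata) = 0;
  virtual bool WritePacket(const std::string& data) = 0;
  virtual bool Close() = 0;
};

// A minimal file-backed implementation: append everything to one file,
// which can later be shipped to the destination host and read back by a
// matching reader to reload blocks into the secondary cache.
class FileDumpWriterSketch : public DumpWriterSketch {
 public:
  explicit FileDumpWriterSketch(const std::string& path)
      : file_(std::fopen(path.c_str(), "wb")) {}
  ~FileDumpWriterSketch() override { Close(); }

  bool WriteMetadata(const std::string& metadata) override {
    return Write(metadata);
  }
  bool WritePacket(const std::string& data) override { return Write(data); }
  bool Close() override {
    if (file_ == nullptr) return true;
    bool ok = std::fclose(file_) == 0;
    file_ = nullptr;
    return ok;
  }

 private:
  bool Write(const std::string& s) {
    return file_ != nullptr &&
           std::fwrite(s.data(), 1, s.size(), file_) == s.size();
  }
  std::FILE* file_;
};
```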