
Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache #8912

Closed
zhichao-cao wants to merge 4 commits

Conversation

zhichao-cao (Contributor)

Background: Cache warm-up can cause read performance degradation while blocks are read from storage back into the block cache. Since the workload and access pattern to a given DB are usually stable in production, a potential solution is to dump the blocks belonging to that DB to persistent storage (e.g., a file) and bulk-load them into the secondary cache before the DB is relaunched. For example, when migrating a DB from host A to host B, the migration takes only a short period of time, so the access pattern to blocks in the block cache will not change much. It is efficient to dump out the blocks of the DB, migrate them to the destination host, and insert them into the secondary cache before we relaunch the DB.

Design: we introduce the CacheDumpWriter and CacheDumpRead interfaces so that users control where the blocks dumped out of the block cache are stored. RocksDB encodes all the information and passes the resulting string to the writer; users can implement their own writer if they want. CacheDumper and CacheLoad are introduced to save the blocks and load the blocks, respectively.
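For illustration, here is a minimal sketch of the intended dump-then-load flow. The factory helpers, option struct, and method names below are assumptions inferred from the description above (only the writer/reader interfaces and the dumper/loader classes are named in this PR), so treat them as placeholders rather than the final API:

```cpp
#include <memory>
#include <string>

#include "rocksdb/cache.h"
#include "rocksdb/file_system.h"
#include "rocksdb/secondary_cache.h"
#include "rocksdb/table.h"
#include "rocksdb/utilities/cache_dump_load.h"  // new header added by this PR

using namespace ROCKSDB_NAMESPACE;

// Hypothetical flow. Names such as NewToFileCacheDumpWriter,
// NewDefaultCacheDumper, and DumpCacheEntriesToWriter are placeholders
// inferred from the PR description, not a confirmed API.
Status DumpThenLoad(const std::shared_ptr<FileSystem>& fs,
                    const std::shared_ptr<Cache>& block_cache,
                    const std::shared_ptr<SecondaryCache>& secondary_cache,
                    const std::string& dump_file) {
  CacheDumpOptions dump_opts;  // hypothetical options struct

  // 1. On the source host: dump the blocks currently in the block cache
  //    to a file, using a file-backed writer implementation.
  std::unique_ptr<CacheDumpWriter> writer;
  Status s = NewToFileCacheDumpWriter(fs, FileOptions(), dump_file, &writer);
  if (!s.ok()) return s;
  std::unique_ptr<CacheDumper> dumper;
  s = NewDefaultCacheDumper(dump_opts, block_cache, std::move(writer), &dumper);
  if (!s.ok()) return s;
  s = dumper->DumpCacheEntriesToWriter();
  if (!s.ok()) return s;

  // 2. On the destination host, before relaunching the DB: read the dump
  //    file back and insert the blocks into the secondary cache.
  std::unique_ptr<CacheDumpReader> reader;
  s = NewFromFileCacheDumpReader(fs, FileOptions(), dump_file, &reader);
  if (!s.ok()) return s;
  std::unique_ptr<CacheDumpedLoader> loader;
  s = NewDefaultCacheDumpedLoader(dump_opts, BlockBasedTableOptions(),
                                  secondary_cache, std::move(reader), &loader);
  if (!s.ok()) return s;
  return loader->RestoreCacheEntriesToSecondaryCache();
}
```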

Test plan: add new tests to lru_cache_test and pass make check.

@zhichao-cao (Contributor, Author)

This is an early version of the project; I will add comments and more test cases shortly.

@pdillinger (Contributor) left a comment

Generally looking good. Some open concerns about the API. If we mark it EXPERIMENTAL, I'll be happy to approve without sorting out all the API details.

}
uint32_t dump_insert = tmp_cache->GetInsertCount() - start_insert;
uint32_t dump_lookup = tmp_cache->GetLookupcount() - start_lookup;
ASSERT_EQ(63,

Btw, I think that ASSERT_EQ(63U, dump_insert) would work. (Not a problem.)
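For reference, a hedged illustration of that suggestion, reusing the variables from the truncated test snippet above:

```cpp
// 63U is an unsigned literal, so the expected value matches the
// signedness of the uint32_t counter in the gtest comparison.
uint32_t dump_insert = tmp_cache->GetInsertCount() - start_insert;
ASSERT_EQ(63U, dump_insert);
```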

@pdillinger (Contributor) left a comment

LGTM once the CircleCI issues are resolved. Thanks!

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Oct 6, 2021
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* TODO: explain internal vs. external ID

Intended follow-up (avoid conflicts with facebook#8912): use the internal unique IDs
in cache keys. (The file offset can be XORed into the third 64-bit value of
the unique ID.)

Test Plan: Unit tests added
TODO: stress test support
@facebook-github-bot (Contributor)

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@zhichao-cao has updated the pull request. You must reimport the pull request before landing.

5 similar comments

@facebook-github-bot (Contributor)

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Oct 19, 2021
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).

Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with #8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (#8968)

Pull Request resolved: #8990

Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.

Reviewed By: zhichao-cao, mrambacher

Differential Revision: D31582865

Pulled By: pdillinger

fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
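As an illustration of the new API described in the commit message above, here is a minimal sketch of listing per-file unique IDs. The include path and the exact signature of GetUniqueIdFromTableProperties are assumed from the summary rather than taken from this thread:

```cpp
#include <iostream>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/unique_id.h"  // new public header described above (path assumed)

using namespace ROCKSDB_NAMESPACE;

// Print an external unique ID for every live SST file of `db`.
Status PrintTableUniqueIds(DB* db) {
  TablePropertiesCollection props;  // file path -> TableProperties
  Status s = db->GetPropertiesOfAllTables(&props);
  if (!s.ok()) return s;
  for (const auto& entry : props) {
    std::string id;
    // Assumed signature: table properties in, binary ID string out.
    s = GetUniqueIdFromTableProperties(*entry.second, &id);
    if (!s.ok()) return s;
    // Hex-encode the binary ID for display.
    std::cout << entry.first << " -> " << Slice(id).ToString(true) << "\n";
  }
  return Status::OK();
}
```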
yoori pushed a commit to yoori/rocksdb that referenced this pull request Nov 26, 2023
…rt to secondary cache (#8912)

Pull Request resolved: facebook/rocksdb#8912

Reviewed By: pdillinger

Differential Revision: D31452871

Pulled By: zhichao-cao

fbshipit-source-id: 11ab4f5d03e383f476947116361d54188d36ec48