-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Refactor block cache tracing APIs #10811
Conversation
Referring PR #9326 |
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
69f2310
to
8b8e3ca
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Update HISTORY.md.
Summary: Refactor the classes, APIs and data structures for block cache tracing to allow a user provided trace writer to be used. Currently, only a TraceWriter is supported, with a default built-in implementation of FileTraceWriter. The TraceWriter, however, takes a flat trace record and is thus only suitable for file tracing. This PR introduces an abstract BlockCacheTraceWriter class that takes a structured BlockCacheTraceRecord. The BlockCacheTraceWriter implementation can then format and log the record in whatever way it sees fit. The default BlockCacheTraceWriterImpl does file tracing using a user provided TraceWriter. This same philosophy can be applied to KV and IO tracing as well. Test Plan: existing unit tests Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
da65306
to
f6e275e
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
||
Status StartBlockCacheTrace( | ||
const BlockCacheTraceOptions& options, | ||
std::unique_ptr<BlockCacheTraceWriter>&& trace_writer) override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if StartBlockCacheTrace()
accepts a TraceRecord::Handler
for users who want to do custom record tracing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about that approach as I am not familiar with the query tracing. Let me take a look at it. cc @anand1976 Do you have any thoughts on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it would be pretty different. I can try that approach for KV tracing and see how it looks. Feel free to land this now if you need it in 7.8 release as I think it'll take a while to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the code and it seems its more involved with Handler as the process for query tracing is quite different from block cache tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if
StartBlockCacheTrace()
accepts aTraceRecord::Handler
for users who want to do custom record tracing?
It's probably more general than needed for this PR's stated purpose. The downside to generalizing the write side so much is the read side can't make any assumptions, e.g., that a user can provide a TraceReader
to read a trace. This PR doesn't handle the read side either but I think the design should allow it. However, you would need to introduce yet another independent interface (BlockCacheTraceReader
).
What if instead we made a TraceCodec
interface that defines the string
<->TraceRecord
conversion? It could be provided together with TraceWriter
to all the write functions, and provided together with TraceReader
to all the read functions. That way there is just one new interface for the combinations of (block cache, KV, I/O) x (read, write).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the string<->TraceRecord
conversion? How it would be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, a TraceCodec
would include the following functions:
class TraceCodec {
...
virtual std::string Encode(const BlockCacheTraceRecord&) = 0;
virtual BlockCacheTraceRecord Decode(const std::string&) = 0;
...
};
Then BlockCacheTraceWriter
can be internal again, but needs to be configured with a TraceCodec
that comes from StartBlockCacheTrace()
. Say we store it in an instance variable, trace_codec_
. Then BlockCacheTraceWriter
can use it like this:
Status BlockCacheTraceWriter::WriteBlockAccess(
const BlockCacheTraceRecord& record) {
return trace_writer_->Write(trace_codec_->Encode(record));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. That's something that can be done. I will give it a shot and see how it goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, is your client use case OK with using TraceWriter
for writing the physical trace records? If not, something like the BlockCacheTraceWriter
seems necessary to customize writing logical trace records.
edit: giving users the ability to customize writing logical trace records (what you did here) could be a good approach in any case. I am mostly interested in unifying the class hierarchies so will think more how that would look for this approach.
namespace ROCKSDB_NAMESPACE { | ||
// A record for block cache lookups/inserts. This is passed by the table | ||
// reader to the BlockCacheTraceWriter for every block cache op. | ||
struct BlockCacheTraceRecord { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be in the TraceRecord
class hierarchy?
Summary: Refactor the classes, APIs and data structures for block cache tracing to allow a user provided trace writer to be used. Currently, only a TraceWriter is supported, with a default built-in implementation of FileTraceWriter. The TraceWriter, however, takes a flat trace record and is thus only suitable for file tracing. This PR introduces an abstract BlockCacheTraceWriter class that takes a structured BlockCacheTraceRecord. The BlockCacheTraceWriter implementation can then format and log the record in whatever way it sees fit. The default BlockCacheTraceWriterImpl does file tracing using a user provided TraceWriter. `DB::StartBlockTrace` will internally redirect to changed `BlockCacheTrace::StartBlockCacheTrace`. New API `DB::StartBlockTrace` is also added that directly takes `BlockCacheTraceWriter` pointer. This same philosophy can be applied to KV and IO tracing as well. Pull Request resolved: facebook#10811 Test Plan: existing unit tests Old API DB::StartBlockTrace checked with db_bench tool create database ``` ./db_bench --benchmarks="fillseq" \ --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \ --cache_index_and_filter_blocks --cache_size=1048576 \ --disable_auto_compactions=1 --disable_wal=1 --compression_type=none \ --min_level_to_compress=-1 --compression_ratio=1 --num=10000000 ``` To trace block cache accesses when running readrandom benchmark: ``` ./db_bench --benchmarks="readrandom" --use_existing_db --duration=60 \ --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 \ --cache_index_and_filter_blocks --cache_size=1048576 \ --disable_auto_compactions=1 --disable_wal=1 --compression_type=none \ --min_level_to_compress=-1 --compression_ratio=1 --num=10000000 \ --threads=16 \ -block_cache_trace_file="/tmp/binary_trace_test_example" \ -block_cache_trace_max_trace_file_size_in_bytes=1073741824 \ -block_cache_trace_sampling_frequency=1 ``` Reviewed By: anand1976 Differential Revision: D40435289 Pulled By: akankshamahajan15 fbshipit-source-id: fa2755f4788185e19f4605e731641cfd21ab3282
Summary: Refactor the classes, APIs and data structures for block cache tracing to allow a user provided trace writer to be used. Currently, only a TraceWriter is supported, with a default built-in implementation of FileTraceWriter. The TraceWriter, however, takes a flat trace record and is thus only suitable for file tracing. This PR introduces an abstract BlockCacheTraceWriter class that takes a structured BlockCacheTraceRecord. The BlockCacheTraceWriter implementation can then format and log the record in whatever way it sees fit. The default BlockCacheTraceWriterImpl does file tracing using a user provided TraceWriter.
DB::StartBlockTrace
will internally redirect to changedBlockCacheTrace::StartBlockCacheTrace
.New API
DB::StartBlockTrace
is also added that directly takesBlockCacheTraceWriter
pointer.This same philosophy can be applied to KV and IO tracing as well.
Test Plan: existing unit tests
Old API DB::StartBlockTrace checked with db_bench tool
create database
To trace block cache accesses when running readrandom benchmark:
Reviewers:
Subscribers:
Tasks:
Tags: