Avoid unnecessary big for-loop when reporting ticker stats stored in GetContext #3490

Closed

@miasantreble miasantreble commented Feb 12, 2018

Currently, when Version::Get reports the ticker stats stored in GetContext, it runs a big for-loop over every Tickers value, which adds unnecessary CPU cost. Only a small fraction of all tickers are used in Get() calls, so we can optimize by storing just those ticker values in a new struct, GetContextStats. For comparison, the new approach visits only 17 values, while the old approach visits 100+ tickers.
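A minimal sketch of the idea in the description, not the code from this PR: the counters a Get() can touch become named fields, and reporting visits only those fields. The enum is cut down to four tickers taken from the diff in this thread; `ToyStatistics`, the field names, and the `int` return value of `ReportCounters` are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Cut-down stand-in for RocksDB's Tickers enum (the real one has 100+ entries).
enum Tickers : uint32_t {
  BLOCK_CACHE_HIT,
  BLOCK_CACHE_DATA_ADD,
  BLOCK_CACHE_DATA_BYTES_INSERT,
  BLOCK_CACHE_FILTER_ADD,
  TICKER_ENUM_MAX
};

// Toy statistics sink; the real Statistics class is not modelled here.
struct ToyStatistics {
  std::array<uint64_t, TICKER_ENUM_MAX> totals{};
};

void RecordTick(ToyStatistics* stats, uint32_t ticker, uint64_t count) {
  assert(ticker < TICKER_ENUM_MAX);
  if (stats != nullptr) stats->totals[ticker] += count;
}

// Only the counters a Get() can bump, one named field each (names assumed).
struct GetContextStats {
  uint64_t num_cache_hit = 0;
  uint64_t num_cache_data_add = 0;
  uint64_t num_cache_data_bytes_insert = 0;
  uint64_t num_cache_filter_add = 0;
};

// Visits a small, fixed set of fields instead of every ticker.
// Returns the number of RecordTick calls made, for illustration only.
int ReportCounters(ToyStatistics* stats, const GetContextStats& s) {
  if (stats == nullptr) return 0;
  int reported = 0;
  auto report = [&](Tickers t, uint64_t v) {
    if (v > 0) {
      RecordTick(stats, t, v);
      ++reported;
    }
  };
  report(BLOCK_CACHE_HIT, s.num_cache_hit);
  report(BLOCK_CACHE_DATA_ADD, s.num_cache_data_add);
  report(BLOCK_CACHE_DATA_BYTES_INSERT, s.num_cache_data_bytes_insert);
  report(BLOCK_CACHE_FILTER_ADD, s.num_cache_filter_add);
  return reported;
}
```

The work done per Get() then scales with the number of fields in the struct rather than with TICKER_ENUM_MAX.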

@facebook-github-bot facebook-github-bot left a comment

@miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr commented Feb 12, 2018

Can you include the benchmark command and amount of CPU savings in the PR description?

-    for (uint32_t t = 0; t < Tickers::TICKER_ENUM_MAX; t++) {
-      if (get_context.tickers_value[t] > 0) {
-        RecordTick(db_statistics_, t, get_context.tickers_value[t]);
+    for (auto ticker_pair : get_context.tickers_pairs) {

Wrap this in an if (db_statistics_ != nullptr) check.

-    for (uint32_t t = 0; t < Tickers::TICKER_ENUM_MAX; t++) {
-      if (get_context.tickers_value[t] > 0) {
-        RecordTick(db_statistics_, t, get_context.tickers_value[t]);
+    for (auto ticker_pair : get_context.tickers_pairs) {

Same here. No need to do this for loop if db_statistics_ is null.
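The guard requested in these two comments could look roughly like the sketch below. It assumes `std::vector` as a stand-in for RocksDB's autovector and a toy `ToyStatistics` type; `ReportPairs` and its visited-entry return value are hypothetical and exist only to make the effect of the guard observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Cut-down stand-in for the Tickers enum.
enum Tickers : uint32_t { BLOCK_CACHE_HIT, BLOCK_CACHE_DATA_ADD, TICKER_ENUM_MAX };

// Toy statistics sink, not the real Statistics class.
struct ToyStatistics {
  uint64_t totals[TICKER_ENUM_MAX] = {0};
};

void RecordTick(ToyStatistics* stats, uint32_t ticker, uint64_t count) {
  assert(stats != nullptr && ticker < TICKER_ENUM_MAX);
  stats->totals[ticker] += count;
}

// Returns how many entries were visited (hypothetical, for illustration).
size_t ReportPairs(
    ToyStatistics* db_statistics,
    const std::vector<std::pair<Tickers, uint64_t>>& tickers_pairs) {
  size_t visited = 0;
  // With statistics disabled, the loop is skipped entirely.
  if (db_statistics != nullptr) {
    for (const auto& ticker_pair : tickers_pairs) {
      ++visited;
      if (ticker_pair.second > 0) {
        RecordTick(db_statistics, ticker_pair.first, ticker_pair.second);
      }
    }
  }
  return visited;
}
```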

@@ -27,7 +27,7 @@ class GetContext {
     kMerge,  // saver contains the current merge result (the operands)
     kBlobIndex,
   };
-  uint64_t tickers_value[Tickers::TICKER_ENUM_MAX] = {0};
+  autovector<std::pair<Tickers, uint64_t>> tickers_pairs;

Can you figure out how many entries we need and set it up in autovector?

@facebook-github-bot

@miasantreble has updated the pull request. View: changes, changes since last import

}

void GetContext::InitTickers() {
  ticker_pairs_.push_back(std::make_pair<Tickers, uint64_t>(BLOCK_CACHE_HIT, 0));

This looks kind of confusing and hard to maintain. Does it give any benefit?

@@ -88,10 +89,33 @@ void GetContext::SaveValue(const Slice& value, SequenceNumber seq) {
}

void GetContext::RecordCounters(Tickers ticker, size_t val) {
  if (ticker == Tickers::TICKER_ENUM_MAX) {
    return;
  for (auto it = ticker_pairs_.begin(); it != ticker_pairs_.end(); ++it) {

I don't have intuition why this PR is faster overall -- before we had an array lookup here, and now it's a for-loop. Have you measured it? Make sure to use a fairly large DB as then a single Get() will have to execute this for-loop many times.
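To make the concern concrete, here is a hedged side-by-side of the two recording paths. `ArrayCounters` mirrors the array-indexed version; `PairCounters` assumes the loop body under review searches the pair list for a matching ticker and adds to it (the body is not shown in the snippet, so that part is a guess). `std::vector` stands in for autovector, and the `comparisons` member is instrumentation added for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Cut-down stand-in for the Tickers enum.
enum Tickers : uint32_t {
  BLOCK_CACHE_HIT,
  BLOCK_CACHE_DATA_ADD,
  BLOCK_CACHE_FILTER_ADD,
  TICKER_ENUM_MAX
};

// Before: one slot per ticker, so recording is a single indexed add.
struct ArrayCounters {
  uint64_t tickers_value[TICKER_ENUM_MAX] = {0};

  void RecordCounters(Tickers ticker, size_t val) {
    if (ticker == TICKER_ENUM_MAX) return;
    assert(ticker < TICKER_ENUM_MAX);
    tickers_value[ticker] += static_cast<uint64_t>(val);
  }
};

// After (assumed loop body): scan the pair list until the ticker matches.
struct PairCounters {
  std::vector<std::pair<Tickers, uint64_t>> ticker_pairs_;
  size_t comparisons = 0;  // sketch-only instrumentation

  void RecordCounters(Tickers ticker, size_t val) {
    if (ticker == TICKER_ENUM_MAX) return;
    for (auto it = ticker_pairs_.begin(); it != ticker_pairs_.end(); ++it) {
      ++comparisons;
      if (it->first == ticker) {
        it->second += static_cast<uint64_t>(val);
        return;
      }
    }
  }
};
```

Each RecordCounters call on the pair list costs up to one comparison per entry, while the array version is constant-time; the saving from the shorter reporting loop is paid once per Get(), but the search cost is paid on every recorded tick.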

  ticker_pairs_.push_back(std::make_pair<Tickers, uint64_t>(BLOCK_CACHE_DATA_ADD, 0));
  ticker_pairs_.push_back(std::make_pair<Tickers, uint64_t>(BLOCK_CACHE_DATA_BYTES_INSERT, 0));
  ticker_pairs_.push_back(std::make_pair<Tickers, uint64_t>(BLOCK_CACHE_FILTER_ADD, 0));
  ticker_pairs_.push_back(std::make_pair<Tickers, uint64_t>(BLOCK_CACHE_FILTER_BYTES_INSERT, 0));

Wow, I didn't realize the list is that long. How about we make a GetStats object and make them all variables? We already have things like this: https://github.com/facebook/rocksdb/blob/5.11.fb/db/compaction_iteration_stats.h and https://github.com/facebook/rocksdb/blob/5.11.fb/db/internal_stats.h#L133-L161 which don't feel very hard to maintain.

miasantreble commented Feb 15, 2018

Block Size  # of Entries  Children  Self   Symbol
1M(master)  1M            95.89%    0.88%  rocksdb::Version::Get
                          5.09%     0.25%  rocksdb::(anonymous namespace)::GetEntryFromCache
                          0.41%     0.31%  rocksdb::GetContext::SaveValue
                          0.34%     0.33%  rocksdb::GetContext::RecordCounters
                          0.07%     0.07%  rocksdb::GetContext::GetContext
1M(branch)  1M            96.60%    0.72%  rocksdb::Version::Get
                          5.06%     0.34%  rocksdb::(anonymous namespace)::GetEntryFromCache
                          0.49%     0.38%  rocksdb::GetContext::SaveValue
                          0.25%     0.05%  rocksdb::GetContext::ReportCounters
                          0.04%     0.04%  rocksdb::GetContext::GetContext
1M(master)  10M           95.39%    0.90%  rocksdb::Version::Get
                          4.87%     0.22%  rocksdb::(anonymous namespace)::GetEntryFromCache
                          0.29%     0.29%  rocksdb::GetContext::RecordCounters
                          0.39%     0.29%  rocksdb::GetContext::SaveValue
                          0.08%     0.08%  rocksdb::GetContext::GetContext
1M(branch)  10M           90.94%    0.75%  rocksdb::Version::Get
                          4.55%     0.31%  rocksdb::(anonymous namespace)::GetEntryFromCache
                          0.81%     0.13%  rocksdb::GetContext::ReportCounters
                          0.50%     0.38%  rocksdb::GetContext::SaveValue
                          0.12%     0.12%  rocksdb::GetContext::GetContext
1G(master)  1M            0.54%     0.54%  rocksdb::GetContext::GetContext
1G(branch)  1M            0.76%     0.30%  rocksdb::Version::Get
                          0.20%     0.20%  rocksdb::GetContext::GetContext
                          0.11%     0.10%  rocksdb::GetContext::ReportCounters
1G(master)  10M           35.98%    1.86%  rocksdb::Version::Get
                          5.91%     0.16%  rocksdb::(anonymous namespace)::GetEntryFromCache
                          0.59%     0.59%  rocksdb::GetContext::GetContext
                          0.28%     0.19%  rocksdb::GetContext::SaveValue
                          0.02%     0.02%  rocksdb::GetContext::RecordCounters
1G(branch)  10M           35.49%    0.77%  rocksdb::Version::Get
                          6.25%     0.23%  rocksdb::(anonymous namespace)::GetEntryFromCache
                          1.44%     0.19%  rocksdb::GetContext::ReportCounters
                          0.28%     0.18%  rocksdb::GetContext::SaveValue
                          0.25%     0.24%  rocksdb::GetContext::GetContext

@facebook-github-bot

@miasantreble has updated the pull request.

@facebook-github-bot facebook-github-bot left a comment

@miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr commented Feb 26, 2018

Is the correct conclusion from the numbers that ReportCounters (this branch) is faster than RecordCounters (master) when the database is small, and slower otherwise?

@facebook-github-bot

@miasantreble has updated the pull request. Re-import the pull request

miasantreble commented Jul 19, 2018

Block Size  # of Entries  Children  Self   Symbol
1M(master)  1M
                          0.72%     0.27%  rocksdb::GetContext::ReportCounters_Loop
                          0.34%     0.26%  rocksdb::GetContext::SaveValue
                          0.12%     0.12%  rocksdb::GetContext::RecordCounters
                          0.11%     0.10%  rocksdb::GetContext::GetContext
1M(branch)  1M
                          0.35%     0.26%  rocksdb::GetContext::SaveValue
                          0.35%     0.03%  rocksdb::GetContext::ReportCounters
                          0.04%     0.04%  rocksdb::GetContext::GetContext
1M(master)  5M
                          0.68%     0.25%  rocksdb::GetContext::ReportCounters_Loop
                          0.34%     0.26%  rocksdb::GetContext::SaveValue
                          0.10%     0.10%  rocksdb::GetContext::GetContext
                          0.10%     0.10%  rocksdb::GetContext::RecordCounters
1M(branch)  5M
                          0.58%     0.03%  rocksdb::GetContext::ReportCounters
                          0.36%     0.25%  rocksdb::GetContext::SaveValue
                          0.06%     0.06%  rocksdb::GetContext::GetContext

db_bench command used:

TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 \
  -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 \
  -write_buffer_size=1048576 -compression_type=none && \
TEST_TMPDIR=/dev/shm perf record -g ./db_bench -benchmarks=readrandom \
  -use_existing_db -readonly -statistics -num=1000000 -threads=32 \
  -cache_size=1048576

ReportCounters_Loop was added to master as a wrapper around the big for-loop:

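// Wrapper around the full-enum loop, so it appears as its own symbol in perf.
void GetContext::ReportCounters_Loop() {
  // Visits every ticker slot, including the many a Get() never touches.
  for (uint32_t t = 0; t < Tickers::TICKER_ENUM_MAX; t++) {
    if (tickers_value[t] > 0) {
      RecordTick(statistics_, t, tickers_value[t]);
    }
  }
}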

So it looks like the new approach is faster for the 1M keys + 1M block size case. I tried to run 5M keys + 1M block size, but the perf.data became too large to open (7GB).

siying commented Jul 20, 2018

The approach is good but where is autovector?

@miasantreble miasantreble changed the title from "replace array with autovector to save cpu cost" to "avoid unnecessary big for-loop when reporting counters" Jul 20, 2018
@facebook-github-bot

@miasantreble has updated the pull request. Re-import the pull request

siying commented Jul 20, 2018

Can you improve your title and summary? It's not clear what "counters" you mean.

@miasantreble miasantreble changed the title from "avoid unnecessary big for-loop when reporting counters" to "Avoid unnecessary big for-loop when reporting ticker stats stored in GetContext" Jul 20, 2018

@facebook-github-bot facebook-github-bot left a comment

@miasantreble is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@miasantreble miasantreble deleted the optimize-ticker-reporting branch July 21, 2018 05:07
rcane pushed a commit to rcane/rocksdb that referenced this pull request Sep 13, 2018
…GetContext (facebook#3490)

Summary:
Currently, when `Version::Get` reports the ticker stats stored in `GetContext`, it runs a big for-loop over every `Tickers` value, which adds unnecessary CPU cost. Only a small fraction of all tickers are used in `Get()` calls, so we can optimize by storing just those ticker values in a new struct, `GetContextStats`. For comparison, the new approach visits only 17 values, while the old approach visits 100+ tickers.
Pull Request resolved: facebook#3490

Differential Revision: D6969154

Pulled By: miasantreble

fbshipit-source-id: fc27072965a3a94125a3e6883d20dafcf5b84029