Add PerfContextByLevel to provide per level perf context information #4226

miasantreble · 2018-08-03T21:49:46Z

Current implementation of perf context is level agnostic. Making it hard to do performance evaluation for the LSM tree. This PR adds PerfContextByLevel to decompose the counters by level.
This will be helpful when analyzing point and range query performance as well as tuning bloom filter
Also replaced __thread with thread_local keyword for perf_context

siying · 2018-08-03T22:55:29Z

db/db_impl_open.cc

    }
  }
+  // TODO(Zhongyi): need to initialize this in all threads that might need
+  // access to PerfContextByLevel
+  perf_context.EnablePerLevelPerfContext(max_num_levels);


This is wrong place to do the thing. Open() may not be done in the same thread as query.

And we cannot initialize for all threads. It's bad. People need to enable it their for their thread.

Also be aware that perf_context is not per DB or per CF. It's shared across all DBs and all CFs. Getting max level for one CF and allocate based on it is not a good idea. Maybe we should consider to allocate a fixed number, say 16. All levels 15 or up fall into level 15. Or maybe we can even think of making it std::map from level to struct.

siying · 2018-08-03T22:56:53Z

monitoring/perf_context.cc

  return ss.str();
 #endif
 }

+void PerfContext::EnablePerLevelPerfContext(int levels){
+  num_levels = levels;
+  perf_context_by_level = new PerfContextByLevel[levels];


We should support that people call EnablePerLevelPerfContext() multiple times. So if perf_context_by_level is not null, we shouldn't allocate again.

siying · 2018-08-03T22:57:21Z

monitoring/perf_context.cc

+
+void PerfContext::DisablePerLevelPerfContext(){
+  per_level_perf_context_enabled = false;
+  delete perf_context_by_level;


This maybe expensive. Consider to have disable and clear up as two separate functions.

siying · 2018-08-13T18:53:24Z

db/compaction_job.cc

@@ -811,6 +811,7 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
  if (measure_io_stats_) {
    prev_perf_level = GetPerfLevel();
    SetPerfLevel(PerfLevel::kEnableTime);
+    perf_context.EnablePerLevelPerfContext();


I don't understand why we need it. Is there per-level perf context counters we report for compactions.

siying · 2018-08-13T18:53:50Z

db/flush_job.cc

@@ -207,6 +207,7 @@ Status FlushJob::Run(LogsWithPrepTracker* prep_tracker,
  if (measure_io_stats_) {
    prev_perf_level = GetPerfLevel();
    SetPerfLevel(PerfLevel::kEnableTime);
+    perf_context.EnablePerLevelPerfContext();


Same here. Why do we need this?

siying · 2018-08-13T18:54:21Z

include/rocksdb/perf_context.h

@@ -5,6 +5,7 @@

 #ifndef STORAGE_ROCKSDB_INCLUDE_PERF_CONTEXT_H
 #define STORAGE_ROCKSDB_INCLUDE_PERF_CONTEXT_H
+#define MAX_PERF_CONTEXT_LEVELS 16


This is public API. We should avoid define like this.

siying · 2018-08-13T18:55:46Z

include/rocksdb/perf_context.h

@@ -169,6 +189,8 @@ struct PerfContext {
  uint64_t env_lock_file_nanos;
  uint64_t env_unlock_file_nanos;
  uint64_t env_new_logger_nanos;
+  PerfContextByLevel* perf_context_by_level;


We mentioned that we can consider to use std::map to support all levels. What's your thoughts on this approach?

siying · 2018-08-13T18:56:35Z

monitoring/perf_context.cc

@@ -186,8 +204,30 @@ std::string PerfContext::ToString(bool exclude_zero_counters) const {
  PERF_CONTEXT_OUTPUT(env_lock_file_nanos);
  PERF_CONTEXT_OUTPUT(env_unlock_file_nanos);
  PERF_CONTEXT_OUTPUT(env_new_logger_nanos);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT(bloom_filter_useful);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT(bloom_filter_full_positive);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT(bloom_filter_full_true_positive);


The printing format feels better if all the counters for one level is printed together.

siying · 2018-10-12T15:59:45Z

monitoring/perf_context.cc

@@ -104,6 +104,14 @@ void PerfContext::Reset() {
  env_lock_file_nanos = 0;
  env_unlock_file_nanos = 0;
  env_new_logger_nanos = 0;
+  if (per_level_perf_context_enabled && level_to_perf_context) {
+    for (auto iter = level_to_perf_context->begin(); iter != level_to_perf_context->end(); ++iter) {


This line is over 80-char.

siying · 2018-10-12T16:00:52Z

monitoring/perf_context.cc

@@ -104,6 +104,14 @@ void PerfContext::Reset() {
  env_lock_file_nanos = 0;
  env_unlock_file_nanos = 0;
  env_new_logger_nanos = 0;
+  if (per_level_perf_context_enabled && level_to_perf_context) {
+    for (auto iter = level_to_perf_context->begin(); iter != level_to_perf_context->end(); ++iter) {


Can you do for (auto iter : level_to_perf_context)?

siying · 2018-10-12T16:01:51Z

monitoring/perf_context.cc

+      level_to_perf_context) {                                    \
+    ss << #counter << " = ";                                      \
+    for (auto iter = level_to_perf_context->begin();              \
+         iter != level_to_perf_context->end(); ++iter) {          \


Same question, does for (auto iter : level_to_perf_context) work?

siying · 2018-10-12T16:02:17Z

monitoring/perf_context.cc

+    for (auto iter = level_to_perf_context->begin();              \
+         iter != level_to_perf_context->end(); ++iter) {          \
+      if (iter->second) {                                         \
+        ss << iter->second->counter << ", ";                      \


There may be gap for level numbers. I think we should create level number too.

siying · 2018-10-12T16:03:39Z

monitoring/perf_context.cc

+    << "bloom_filter_full_positive = " \
+    << (*level_to_perf_context)[level]->bloom_filter_full_positive << "," \
+    << "bloom_filter_full_true_positive = " \
+    << (*level_to_perf_context)[level]->bloom_filter_full_true_positive << ",";\


If we add more counters per level, this macro will grow. It doesn't seem to be a scalable solution to me.
I don't see the necessarily of this macro. Can you use a helper function instead?

siying · 2018-10-12T16:06:04Z

include/rocksdb/perf_context.h

@@ -169,6 +189,8 @@ struct PerfContext {
  uint64_t env_lock_file_nanos;
  uint64_t env_unlock_file_nanos;
  uint64_t env_new_logger_nanos;
+  std::map<uint32_t, PerfContextByLevel*>* level_to_perf_context;


You chose PerfContextByLevel* instead of PerfContextByLevel. What's the reason behind it? Is it to save memory waste? One clear downside is that, there will be memory leak after a thread is terminated.

siying · 2018-10-12T16:11:24Z

monitoring/perf_context_imp.h

+  if (perf_level >= PerfLevel::kEnableCount &&                               \
+      perf_context.per_level_perf_context_enabled &&                         \
+      perf_context.level_to_perf_context) {                                  \
+    if ((*(perf_context.level_to_perf_context))[level]) {                    \


I read std::map doc, but determining non existing element to be null and set it if it is still confusing to me: https://en.cppreference.com/w/cpp/container/map/operator_at

I suggest we write it in a more explicit way: find() and if we can't find it, set a new one.

siying · 2018-10-12T16:12:03Z

table/block_based_table_reader.cc

@@ -2307,6 +2307,7 @@ bool BlockBasedTable::FullFilterKeyMayMatch(
  }
  if (may_match) {
    RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_FULL_POSITIVE);
+    PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_full_positive, 1, 0/*level*/);


Why level 0?

siying · 2018-10-12T16:12:10Z

table/block_based_table_reader.cc

@@ -2331,6 +2332,8 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key,
  if (!FullFilterKeyMayMatch(read_options, filter, key, no_io,
                             prefix_extractor)) {
    RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_USEFUL);
+    // TODO(Zhongyi): use the correct level here
+    PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_useful, 1, 0/*level*/);


Why level 0?

siying · 2018-10-12T16:12:22Z

table/block_based_table_reader.cc

@@ -2415,6 +2420,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key,
    }
    if (matched && filter != nullptr && !filter->IsBlockBased()) {
      RecordTick(rep_->ioptions.statistics, BLOOM_FILTER_FULL_TRUE_POSITIVE);
+      PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_full_true_positive, 1, 0/*level*/);


All the same question. Why level 0?

…e more insight into query performance

miasantreble · 2018-10-12T23:16:50Z

thanks @siying for reviewing the PR, all comments have been addressed

facebook-github-bot

miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

siying · 2018-10-15T16:56:48Z

include/rocksdb/perf_context.h

@@ -168,6 +188,8 @@ struct PerfContext {
  uint64_t env_lock_file_nanos;
  uint64_t env_unlock_file_nanos;
  uint64_t env_new_logger_nanos;
+  std::map<uint32_t, PerfContextByLevel>* level_to_perf_context;


I think about it more. Not just PerfContextByLevel, but the map itself is leaked after the thread exits. Does C++ allow a constructor of PerfContext? If it allows, we should destruct the map. If not, we should clearly comment in the EnablePerLevelPerfContext() that the user is responsible to call ClearPerLevelPerfContext() before the thread exits. Otherwise, something will be leaked.

I tried to add a destructor for PerfContext class but got compilation error:

monitoring/perf_context.cc:18:22: error: non-local variable ‘rocksdb::perf_context’ declared ‘__thread’ has a non-trivial destructor
__thread PerfContext perf_context;
monitoring/perf_context.cc:18:22: note: C++11 ‘thread_local’ allows dynamic initialization and destruction

I will add comments to advise users to always pair EnablePerLevelPerfContext() with ClearPerLevelPerfContext()

I know it will make the diff's scope large, but we didn't use "thread_local" key-word because we wanted to support GCC 4.7. Now 4.7 doesn't work anyway, and 4.8 is the minimum version we support. How about change to use thread_local keyword instead of __thread now?

tested with thread_local and make check passed, I will replace __thread with thread_local in a separate PR

facebook-github-bot · 2018-10-15T17:30:25Z

@miasantreble has updated the pull request. Re-import the pull request

facebook-github-bot

miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

siying · 2018-10-15T22:20:47Z

db/db_bloom_filter_test.cc

@@ -171,20 +172,30 @@ TEST_F(DBBloomFilterTest, GetFilterByPrefixBloomCustomPrefixExtractor) {

    ASSERT_EQ("foo", Get("barbarbar"));
    ASSERT_EQ(TestGetTickerCount(options, BLOOM_FILTER_USEFUL), 0);
+    // TODO (Zhongyi): uncomment the asserts involving level_to_perf_context when per
+    // level perf context is enabled in block based table reader
+    // ASSERT_EQ(0, (*(get_perf_context()->level_to_perf_context))[0].bloom_filter_useful);


With all those commented out, the unit tests won't be able to catch any regression bug of per level perf context. I don't think it is good.

the per level perf context is not used in this PR anyway (because level info is missing in current implementation of block based table reader), so there is really nothing much to test at this point

You can manually call the MACRO to set the counters and verify them.
Also, it's not a good idea to have counters that are never set. It confuses people. I suggest the follow-up PR that correctly set those counters be committed as soon as possible.

added test coverage in PerfContextTest.PerfContextByLevelGetSet to do set and get of per level perf context

siying · 2018-10-15T22:22:53Z

monitoring/perf_context.cc

@@ -186,8 +205,29 @@ std::string PerfContext::ToString(bool exclude_zero_counters) const {
  PERF_CONTEXT_OUTPUT(env_lock_file_nanos);
  PERF_CONTEXT_OUTPUT(env_unlock_file_nanos);
  PERF_CONTEXT_OUTPUT(env_new_logger_nanos);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_useful);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_full_positive);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_full_true_positive);


Just curious how did you choose between printing all counters for a level, and under every counter, show every level?

added PerfContextByLevel::ToString to print all counters at a particular level, user can call by (*(get_perf_context()->level_to_perf_context))[level].ToString(true);

I would avoid to add one extra ToString() unless we are convinced that it is needed. My initial question is that, how did you choose between this format:

bloom_filter_useful: 5@0, 4@1, 0@2 bloom_filter_full_positive: 0@0, 0:1, 0:2

and

Level 0: bloom_filter_useful: 5 bloom_filter_full_positive: 0 Level 1: bloom_filter_useful: 4 bloom_filter_full_positive: 0 Level 2: bloom_filter_useful: 0 bloom_filter_full_positive: 0

I'm OK with either way. I'm just curious on how it was decided.

I chose the former as default since I think in most cases we want to know the distribution of a particular counter at different levels, but also added ToString() to support the latter because the latter might also be useful when user want to browse through all counters and get a bigger picture

I suggest we remove this ToString() for now. I'm not sure that people will use it.

facebook-github-bot · 2018-10-16T19:55:07Z

@miasantreble has updated the pull request. Re-import the pull request

facebook-github-bot

miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

siying · 2018-10-16T20:50:57Z

include/rocksdb/perf_context.h

@@ -7,6 +7,7 @@

 #include <stdint.h>
 #include <string>
+#include <map>


m is before s.

siying · 2018-10-16T20:52:28Z

monitoring/perf_context.cc

@@ -186,8 +205,29 @@ std::string PerfContext::ToString(bool exclude_zero_counters) const {
  PERF_CONTEXT_OUTPUT(env_lock_file_nanos);
  PERF_CONTEXT_OUTPUT(env_unlock_file_nanos);
  PERF_CONTEXT_OUTPUT(env_new_logger_nanos);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_useful);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_full_positive);
+  PERF_CONTEXT_BY_LEVEL_OUTPUT_ONE_COUNTER(bloom_filter_full_true_positive);


I would avoid to add one extra ToString() unless we are convinced that it is needed. My initial question is that, how did you choose between this format:

bloom_filter_useful: 5@0, 4@1, 0@2 bloom_filter_full_positive: 0@0, 0:1, 0:2

and

Level 0: bloom_filter_useful: 5 bloom_filter_full_positive: 0 Level 1: bloom_filter_useful: 4 bloom_filter_full_positive: 0 Level 2: bloom_filter_useful: 0 bloom_filter_full_positive: 0

I'm OK with either way. I'm just curious on how it was decided.

facebook-github-bot · 2018-10-16T21:09:05Z

@miasantreble has updated the pull request. Re-import the pull request

facebook-github-bot · 2018-10-16T21:13:21Z

@miasantreble has updated the pull request. Re-import the pull request

facebook-github-bot

miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2018-10-16T22:12:40Z

@miasantreble has updated the pull request. Re-import the pull request

facebook-github-bot

miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

siying · 2018-10-16T23:45:54Z

monitoring/perf_context.cc

@@ -13,9 +13,9 @@ namespace rocksdb {
 PerfContext perf_context;
 #else
 #if defined(OS_SOLARIS)
-__thread PerfContext perf_context_;
+thread_local PerfContext perf_context_;


Is it supported in OS_SOLARIS?

siying · 2018-10-16T23:47:13Z

include/rocksdb/perf_context.h

 struct PerfContext {

  void Reset(); // reset all performance counters to zero

  std::string ToString(bool exclude_zero_counters = false) const;

+  // enable per level perf context and allocate storage for PerfContextByLevel
+  // Always pair calls to `EnablePerLevelPerfContext()` with
+  // `ClearPerLevelPerfContext`, otherwise there will be memory leak


Where is the destructor that can prevent this leak?

facebook-github-bot · 2018-10-17T00:50:15Z

@miasantreble has updated the pull request. Re-import the pull request

facebook-github-bot

miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

siying

Please mention this feature and the thread_local change in HISTORY.md.

siying · 2018-10-17T17:09:13Z

monitoring/perf_context.cc

+
+void PerfContext::ClearPerLevelPerfContext(){
+  if (level_to_perf_context) {
+    level_to_perf_context->clear();


Why do you need to clear before delete it?

facebook-github-bot · 2018-10-17T17:19:57Z

@miasantreble has updated the pull request. Re-import the pull request

facebook-github-bot

miasantreble has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: PR #4226 introduced per-level perf context which allows breaking down perf context by levels. This PR takes advantage of the feature to populate a few counters related to bloom filters Pull Request resolved: #4581 Differential Revision: D10518010 Pulled By: miasantreble fbshipit-source-id: 011244561783ec860d32d5b0fa6bce6e78d70ef8

miasantreble requested a review from siying August 3, 2018 21:49

facebook-github-bot added the CLA Signed label Aug 3, 2018

siying reviewed Aug 3, 2018

View reviewed changes

siying reviewed Aug 13, 2018

View reviewed changes

miasantreble force-pushed the perfcontext-per-level branch from 6325c0e to 09442d0 Compare September 10, 2018 18:44

miasantreble changed the title ~~[WIP] Add PerfContextByLevel to provide by level perf context information~~ Add PerfContextByLevel to provide per level perf context information Sep 10, 2018

siying reviewed Oct 12, 2018

View reviewed changes

miasantreble added 7 commits October 12, 2018 15:04

per-level perfcontext: break down certain counters by level to provid…

819167c

…e more insight into query performance

add PerfContextByLevel

bf4d11b

address review comments

bd6330c

allocate per level perf context whenever necessary

cfe2556

address review comments

954f89c

use std::map to store per level perf context

e0ccaf2

address review comments

2373d26

miasantreble force-pushed the perfcontext-per-level branch from 09442d0 to 2373d26 Compare October 12, 2018 23:13

facebook-github-bot reviewed Oct 12, 2018

View reviewed changes

siying reviewed Oct 15, 2018

View reviewed changes

improve comments about ClearPerLevelPerfContext

a0a002c

facebook-github-bot reviewed Oct 15, 2018

View reviewed changes

siying reviewed Oct 15, 2018

View reviewed changes

add test

268629f

facebook-github-bot reviewed Oct 16, 2018

View reviewed changes

siying reviewed Oct 16, 2018

View reviewed changes

fix uninitialized parameter bug

45a1562

miasantreble force-pushed the perfcontext-per-level branch from 0de04ba to 45a1562 Compare October 16, 2018 21:13

facebook-github-bot reviewed Oct 16, 2018

View reviewed changes

remove PerfContextByLevel::ToString, replace __thread with thread_local

39400fb

facebook-github-bot reviewed Oct 16, 2018

View reviewed changes

siying reviewed Oct 16, 2018

View reviewed changes

address review comments

e86d4d0

facebook-github-bot reviewed Oct 17, 2018

View reviewed changes

siying approved these changes Oct 17, 2018

View reviewed changes

update history

d1f53d7

facebook-github-bot reviewed Oct 17, 2018

View reviewed changes

facebook-github-bot closed this in d6ec288 Oct 17, 2018

miasantreble deleted the perfcontext-per-level branch October 17, 2018 18:25

miasantreble mentioned this pull request Oct 23, 2018

use per-level perf context for bloom filter related counters #4581

Closed

thatsafunnyname mentioned this pull request Aug 13, 2020

Memory leak reported out of __cxa_thread_atexit #5931

Open

Add PerfContextByLevel to provide per level perf context information #4226

Add PerfContextByLevel to provide per level perf context information #4226

Conversation

miasantreble commented Aug 3, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

miasantreble commented Oct 12, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

facebook-github-bot commented Oct 15, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

facebook-github-bot commented Oct 16, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

facebook-github-bot commented Oct 16, 2018

facebook-github-bot commented Oct 16, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

facebook-github-bot commented Oct 16, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

facebook-github-bot commented Oct 17, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

siying left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

facebook-github-bot commented Oct 17, 2018

facebook-github-bot left a comment

Choose a reason for hiding this comment

miasantreble commented Aug 3, 2018 •

edited

Loading