Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a global StatsDumpScheduler for stats dumping #7223

Closed
wants to merge 7 commits into from

Conversation

jay-zhuang
Copy link
Contributor

@jay-zhuang jay-zhuang commented Aug 6, 2020

Have a global StatsDumpScheduler for all DB instance stats dumping, including DumpStats() and PersistStats(). Before this, there're 2 dedicate threads for every DB instance, one for DumpStats() one for PersistStats(), which could create lots of threads if there're hundreds DB instances.

Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's awesome to see this progress!
Just two random thoughts:

  1. we are assuming all DBs' status dumping is done by one thread. It might work but in case one dumping thread is stalling (e.g. DB is stalling), then all DBs are blocked. Even this is much better than what we have today, so I have no problem with going with this. It is worth considering that these threads are executed in thread pools instead. But it won't block this diff.
  2. we would need timer for periodic compaction and TTL based compaction. It's worth considering what the code should look like after we have multiple of the tasks. Of course, we don't have to do over engineering in this first timer use case. Just some random thoughts.

monitoring/stats_dump_scheduler.cc Show resolved Hide resolved
Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jay-zhuang for working on this task. Left a few comments.

TARGETS Show resolved Hide resolved
db/db_impl/db_impl.cc Outdated Show resolved Hide resolved
util/timer.h Outdated
@@ -119,11 +138,13 @@ class Timer {
continue;
}

FunctionInfo* current_fn = heap_.top();
auto current_fn = heap_.top();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to use auto if the type can be easily told, which is not the case here. Maybe this is a matter of taste.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I feel it's better to explicitly specify the type by saying FunctionInfo* current_fn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'm wondering why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We follow https://google.github.io/styleguide/cppguide.html#Type_deduction. To sum up, use type deduction only when it is clearer or safer, which is not the case here.

monitoring/stats_dump_scheduler.h Outdated Show resolved Hide resolved
monitoring/stats_dump_scheduler.h Outdated Show resolved Hide resolved
monitoring/stats_dump_scheduler.cc Outdated Show resolved Hide resolved
monitoring/stats_dump_scheduler.cc Outdated Show resolved Hide resolved
util/timer.h Outdated Show resolved Hide resolved
util/timer.h Outdated Show resolved Hide resolved
monitoring/stats_dump_scheduler.cc Show resolved Hide resolved
@jay-zhuang jay-zhuang changed the title [WIP] Add stats dump scheduler Introduce a global StatsDumpScheduler for stats dumping Aug 11, 2020
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jay-zhuang for the PR. Took a pass and left some comments.

db/db_impl/db_impl_open.cc Show resolved Hide resolved
monitoring/stats_dump_scheduler.cc Outdated Show resolved Hide resolved
monitoring/stats_dump_scheduler.cc Outdated Show resolved Hide resolved
monitoring/stats_dump_scheduler.h Outdated Show resolved Hide resolved
monitoring/stats_dump_scheduler.h Show resolved Hide resolved
monitoring/stats_history_test.cc Outdated Show resolved Hide resolved
monitoring/stats_history_test.cc Outdated Show resolved Hide resolved
util/timer.h Outdated Show resolved Hide resolved
util/timer.h Show resolved Hide resolved
util/timer.h Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

util/timer_test.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two more comments, more to come. Can you also rebase? @jay-zhuang

db/db_impl/db_impl.cc Show resolved Hide resolved
db/db_impl/db_impl.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM with a few more comments. We are close to getting this done.

db/db_impl/db_impl.h Show resolved Hide resolved
#ifndef NDEBUG
bool StatsDumpScheduler::TEST_SetEnv(Env* env) {
MutexLock l(&test_mutex_);
auto scheduler = Default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little weird to have a non-static setter method in which we get a pointer to a singleton and set the value for the singleton object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored it a little bit to create a StatsDumpTestScheduler, which takes Env for initializing new scheduler object, and contains all the test functions. It will be override with the Callback you mentioned below.

monitoring/stats_dump_scheduler.h Show resolved Hide resolved
Close();
}

TEST_F(StatsDumpSchedulerTest, MultiInstancesTest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/MultiInstancesTest/MultiInstances/ to save some characters for the test name. This can apply to other tests.

monitoring/stats_dump_scheduler_test.cc Outdated Show resolved Hide resolved

std::string dbname = test::PerThreadDBPath("multi_env_test");
DB* db;
DB::Open(options2, dbname, &db);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should check for the return value of Open by doing ASSERT_OK.

util/timer.h Outdated Show resolved Hide resolved
util/timer.h Outdated
@@ -119,11 +138,13 @@ class Timer {
continue;
}

FunctionInfo* current_fn = heap_.top();
auto current_fn = heap_.top();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I feel it's better to explicitly specify the type by saying FunctionInfo* current_fn.

util/timer.h Show resolved Hide resolved
}
stats_dump_scheduler_ = StatsDumpScheduler::Default();
#ifndef NDEBUG
stats_dump_scheduler_->TEST_SetEnv(env_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is about the difference in behavior for production and test code. Env::Default() is used in production, while env_ is used in tests. I am thinking about the following.
Add a sync point in this function.

TEST_SYNC_POINT_CALLBACK("DBImpl::StartStatsDumpScheduler:SetEnv", stats_dump_scheduler_);

In your unit test,

SyncPoint::GetInstance()->SetCallBack("DBImpl::StartStatsDumpScheduler:SetEnv", [&](void* arg) {
    auto* stats_dump_scheduler = reinterpret_cast<StatsDumpScheduler*>(arg);
    ASSERT_NE(nullptr, stats_dump_scheduler);
    stats_dump_scheduler->TEST_SetEnv(env_);
  });

This way, most of the unit tests will have the same behavior as production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be better.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few minor comments. Thanks @jay-zhuang for the PR.

@@ -42,23 +42,30 @@ class Timer {
running_(false),
executing_task_(false) {}

~Timer() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to know the reason for the removal of destructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an empty non-virtual destructor, I think it would be better to be removed. Or should we?

util/timer.h Outdated
Comment on lines 60 to 67
cond_var_.SignalAll();
} else {
// If it already exists, overriding it.
it->second->fn = std::move(fn_info->fn);
it->second->valid = true;
it->second->next_run_time_us = env_->NowMicros() + start_after_us;
it->second->repeat_every_us = repeat_every_us;
cond_var_.SignalAll();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need only one cond_var_.SignalAll() in the end, right?

Have a global StatsDumpScheduler for all DB instance stats dumping,
including DumpStats() and PersistStats(). Before this, there're 2
dedicate threads for every DB instance, one for DumpStats() one for
PersistStats(), which could create lots of threads if there're hundreds
DB instances.
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang merged this pull request in 69760b4.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
Have a global StatsDumpScheduler for all DB instance stats dumping, including `DumpStats()` and `PersistStats()`. Before this, there're 2 dedicate threads for every DB instance, one for DumpStats() one for PersistStats(), which could create lots of threads if there're hundreds DB instances.

Pull Request resolved: facebook#7223

Reviewed By: riversand963

Differential Revision: D23056737

Pulled By: jay-zhuang

fbshipit-source-id: 0faa2311142a73433ebb3317361db7cbf43faeba
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants