-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a global StatsDumpScheduler for stats dumping #7223
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's awesome to see this progress!
Just two random thoughts:
- we are assuming all DBs' status dumping is done by one thread. It might work but in case one dumping thread is stalling (e.g. DB is stalling), then all DBs are blocked. Even this is much better than what we have today, so I have no problem with going with this. It is worth considering that these threads are executed in thread pools instead. But it won't block this diff.
- we would need timer for periodic compaction and TTL based compaction. It's worth considering what the code should look like after we have multiple of the tasks. Of course, we don't have to do over engineering in this first timer use case. Just some random thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jay-zhuang for working on this task. Left a few comments.
util/timer.h
Outdated
@@ -119,11 +138,13 @@ class Timer { | |||
continue; | |||
} | |||
|
|||
FunctionInfo* current_fn = heap_.top(); | |||
auto current_fn = heap_.top(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to use auto
if the type can be easily told, which is not the case here. Maybe this is a matter of taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I feel it's better to explicitly specify the type by saying FunctionInfo* current_fn
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I'm wondering why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We follow https://google.github.io/styleguide/cppguide.html#Type_deduction. To sum up, use type deduction only when it is clearer or safer, which is not the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jay-zhuang for the PR. Took a pass and left some comments.
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two more comments, more to come. Can you also rebase? @jay-zhuang
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM with a few more comments. We are close to getting this done.
monitoring/stats_dump_scheduler.cc
Outdated
#ifndef NDEBUG | ||
bool StatsDumpScheduler::TEST_SetEnv(Env* env) { | ||
MutexLock l(&test_mutex_); | ||
auto scheduler = Default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a little weird to have a non-static setter method in which we get a pointer to a singleton and set the value for the singleton object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored it a little bit to create a StatsDumpTestScheduler
, which takes Env for initializing new scheduler object, and contains all the test functions. It will be override with the Callback you mentioned below.
Close(); | ||
} | ||
|
||
TEST_F(StatsDumpSchedulerTest, MultiInstancesTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/MultiInstancesTest/MultiInstances/
to save some characters for the test name. This can apply to other tests.
|
||
std::string dbname = test::PerThreadDBPath("multi_env_test"); | ||
DB* db; | ||
DB::Open(options2, dbname, &db); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we should check for the return value of Open
by doing ASSERT_OK
.
util/timer.h
Outdated
@@ -119,11 +138,13 @@ class Timer { | |||
continue; | |||
} | |||
|
|||
FunctionInfo* current_fn = heap_.top(); | |||
auto current_fn = heap_.top(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I feel it's better to explicitly specify the type by saying FunctionInfo* current_fn
.
db/db_impl/db_impl.cc
Outdated
} | ||
stats_dump_scheduler_ = StatsDumpScheduler::Default(); | ||
#ifndef NDEBUG | ||
stats_dump_scheduler_->TEST_SetEnv(env_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is about the difference in behavior for production and test code. Env::Default()
is used in production, while env_
is used in tests. I am thinking about the following.
Add a sync point in this function.
TEST_SYNC_POINT_CALLBACK("DBImpl::StartStatsDumpScheduler:SetEnv", stats_dump_scheduler_);
In your unit test,
SyncPoint::GetInstance()->SetCallBack("DBImpl::StartStatsDumpScheduler:SetEnv", [&](void* arg) {
auto* stats_dump_scheduler = reinterpret_cast<StatsDumpScheduler*>(arg);
ASSERT_NE(nullptr, stats_dump_scheduler);
stats_dump_scheduler->TEST_SetEnv(env_);
});
This way, most of the unit tests will have the same behavior as production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be better.
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few minor comments. Thanks @jay-zhuang for the PR.
@@ -42,23 +42,30 @@ class Timer { | |||
running_(false), | |||
executing_task_(false) {} | |||
|
|||
~Timer() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious to know the reason for the removal of destructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an empty non-virtual destructor, I think it would be better to be removed. Or should we?
util/timer.h
Outdated
cond_var_.SignalAll(); | ||
} else { | ||
// If it already exists, overriding it. | ||
it->second->fn = std::move(fn_info->fn); | ||
it->second->valid = true; | ||
it->second->next_run_time_us = env_->NowMicros() + start_after_us; | ||
it->second->repeat_every_us = repeat_every_us; | ||
cond_var_.SignalAll(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need only one cond_var_.SignalAll()
in the end, right?
Have a global StatsDumpScheduler for all DB instance stats dumping, including DumpStats() and PersistStats(). Before this, there're 2 dedicate threads for every DB instance, one for DumpStats() one for PersistStats(), which could create lots of threads if there're hundreds DB instances.
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@jay-zhuang merged this pull request in 69760b4. |
Summary: Have a global StatsDumpScheduler for all DB instance stats dumping, including `DumpStats()` and `PersistStats()`. Before this, there're 2 dedicate threads for every DB instance, one for DumpStats() one for PersistStats(), which could create lots of threads if there're hundreds DB instances. Pull Request resolved: facebook#7223 Reviewed By: riversand963 Differential Revision: D23056737 Pulled By: jay-zhuang fbshipit-source-id: 0faa2311142a73433ebb3317361db7cbf43faeba
Have a global StatsDumpScheduler for all DB instance stats dumping, including
DumpStats()
andPersistStats()
. Before this, there're 2 dedicate threads for every DB instance, one for DumpStats() one for PersistStats(), which could create lots of threads if there're hundreds DB instances.