Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a timer crash caused by invalid memory management #9656

Closed
wants to merge 6 commits into from

Conversation

jay-zhuang
Copy link
Contributor

@jay-zhuang jay-zhuang commented Mar 3, 2022

Timer crash when multiple DB instances doing heavy DB open and close
operations concurrently. Which is caused by adding a timer task with
smaller timestamp than the current running task. Fix it by moving the
getting new task timestamp part within timer mutex protection.
And other fixes:

  • Disallow adding duplicated function name to timer
  • Fix a minor memory leak in timer when a running task is cancelled

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jay-zhuang jay-zhuang linked an issue Mar 4, 2022 that may be closed by this pull request
Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and thanks @jay-zhuang for the fix! Overall LGTM except a few minor comments/questions.

util/timer.h Outdated
auto it = map_.find(fn_name);
if (it == map_.end()) {
heap_.push(fn_info.get());
map_.emplace(std::make_pair(fn_name, std::move(fn_info)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just do map_.emplace(fn_name, std::move(fn_info)).

const std::string& fn_name,
uint64_t start_after_us,
uint64_t repeat_every_us) {
bool Add(std::function<void()> fn, const std::string& fn_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should update the comments in line 51. Specifically, fn_name has to be unique, and we disallow adding function with the same name to timer.

Comment on lines +70 to +75
auto it = map_.find(fn_name);
if (it == map_.end()) {
heap_.push(fn_info.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use try_emplace()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you mean map_.try_emplace(), updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I mean something like

auto res = map_.try_emplace(fn_name, std::move(fn_info));
if (!res.second) {
  return false;
}
heap_.push(res.first.second.get());

uint64_t repeat_every_us) {
bool Add(std::function<void()> fn, const std::string& fn_name,
uint64_t start_after_us, uint64_t repeat_every_us) {
InstrumentedMutexLock l(&mutex_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can avoid dynamic memory allocation inside this mutex by calling NowMicros() and setting next run time after mutex acquision.

std::unique_ptr<> fn_info(new FunctionInfo(..., dummy_time,...));
InstrumentedMutexLock l(&mutex_);
now = NowMicros();
fn_info->SetNextRunTime(now + start_after_us);
...

util/timer.h Outdated
Comment on lines 68 to 69
assert(!executing_task_ ||
fn_info->next_run_time_us >= heap_.top()->next_run_time_us);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we should return false here instead of aborting the process if we hit this condition?

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jay-zhuang. LGTM except a minor comment and one of the CIs complaining about unused var.

Comment on lines +70 to +75
auto it = map_.find(fn_name);
if (it == map_.end()) {
heap_.push(fn_info.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I mean something like

auto res = map_.try_emplace(fn_name, std::move(fn_info));
if (!res.second) {
  return false;
}
heap_.push(res.first.second.get());

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to confirm my understanding: Is the problem we remove the same name twice from map_ and the second time fails? If so, does the duplicate entry come from popping the wrong thing from heap_ due to something added with earlier running time than the executing function? Then we re-add an existing thing to heap_ which would cause two map_.erase()s after it's Cancel()d?

Anyways the new code to keep heap_ and map_ in sync looks great!

@jay-zhuang
Copy link
Contributor Author

Wanted to confirm my understanding: Is the problem we remove the same name twice from map_ and the second time fails?

Yes, but it fails before we try to remove the item the second time as it's memory already be released including the string current_fn->name. So it failed to get the fn->name. not fail to map_.erase().

If so, does the duplicate entry come from popping the wrong thing from heap_ due to something added with earlier running time than the executing function? Then we re-add an existing thing to heap_ which would cause two map_.erase()s after it's Cancel()d?

Yes, the duplicated entry come from popping the wrong item (it suppose to be popping the current running one then update the next_run_time and re-add it back. but in this case, it's popping the new added one as it's timestamp is newer than the current running one, so after that the current running one is duplicated in the queue).

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Timer crash when multiple DB instances doing heavy DB open and close
operations concurrently. Which is caused by adding a timer task with
smaller timestamp than the current running task. Fix it by moving the
getting new task timestamp part within timer mutex protection.
And other fixes:
- Disallow adding duplicated function number to timer
- Fix a memory leak in timer when current running task is cancelled
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jay-zhuang added a commit to jay-zhuang/rocksdb that referenced this pull request Mar 13, 2022
ajkr pushed a commit that referenced this pull request Mar 13, 2022
Summary:
Timer crash when multiple DB instances doing heavy DB open and close
operations concurrently. Which is caused by adding a timer task with
smaller timestamp than the current running task. Fix it by moving the
getting new task timestamp part within timer mutex protection.
And other fixes:
- Disallow adding duplicated function name to timer
- Fix a minor memory leak in timer when a running task is cancelled

Pull Request resolved: #9656

Reviewed By: ajkr

Differential Revision: D34626296

Pulled By: jay-zhuang

fbshipit-source-id: 6b6d96a5149746bf503546244912a9e41a0c5f6b
@ajkr ajkr mentioned this pull request Mar 29, 2022
@@ -372,6 +372,7 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {
const int kLevel0 = 0;
score_ = vstorage_->CompactionScore(kLevel0);
sorted_runs_ = CalculateSortedRuns(*vstorage_);
fprintf(stdout, "JJJ1\n");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete test code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rocks DB crash when used via JNI. Version: 6.20.3
5 participants