Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dedicate cacheline for DB mutex #9637

Closed
wants to merge 4 commits into from

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Feb 26, 2022

We found a case of cacheline bouncing due to writers locking/unlocking mutex_ and readers accessing block_cache_tracer_. We discovered it only after the issue was fixed by #9462 shifting the DBImpl members such that mutex_ and block_cache_tracer_ were naturally placed in separate cachelines in our regression testing setup. This PR forces the cacheline alignment of mutex_ so we don't accidentally reintroduce the problem.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@ajkr ajkr force-pushed the db-mutex-dedicated-cachelines branch from 44cb1fe to 8538f8b Compare February 26, 2022 20:32
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

//
// `mutex_` can be a hot lock in some workloads, so it deserves dedicated
// cachelines.
ALIGN_AS(CACHE_LINE_SIZE) mutable InstrumentedMutex mutex_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or can we make the following default_cf_handle_ also aligned on cache line size.
Both the approach in the PR and the above proposal require that the members are in the same access block, e.g. private or public or protected. I hope we can enforce this and not have to worry about (accidental and unlikely) following change:

ALIGN_AS(CACHE_LINE_SIZE) mutable InstrumentedMutex mutex_;
private:
 char padding[(CACHE_LINE_SIZE -
                (sizeof(InstrumentedMutex) % CACHE_LINE_SIZE)) %
               CACHE_LINE_SIZE] ROCKSDB_FIELD_UNUSED;

How about wrap the mutex_ and padding in another struct? This may incur more changes, so I would not strongly advocate it.

Copy link
Contributor Author

@ajkr ajkr Feb 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check if it's possible to make a CacheAlignedInstrumentedMutex structure without too much changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. C++17 made this much easier. Thanks for the review!

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix!

ajkr added a commit that referenced this pull request Mar 3, 2022
Summary:
We found a case of cacheline bouncing due to writers locking/unlocking `mutex_` and readers accessing `block_cache_tracer_`. We discovered it only after the issue was fixed by #9462 shifting the `DBImpl` members such that `mutex_` and `block_cache_tracer_` were naturally placed in separate cachelines in our regression testing setup. This PR forces the cacheline alignment of `mutex_` so we don't accidentally reintroduce the problem.

Pull Request resolved: #9637

Reviewed By: riversand963

Differential Revision: D34502233

Pulled By: ajkr

fbshipit-source-id: 46aa313b7fe83e80c3de254e332b6fb242434c07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants