fix(storage): fix race condition between pin version and new shared-buffer #3651
Given that we only have one thread updating the pinned version, can we avoid all the complexity and inefficiency of first acquiring a read lock and then a write lock by:
- Not checking the version id in `try_update_pinned_version`, and renaming it to `update_pinned_version`.
- Letting the caller ensure the updated version is newer. It is easier to do so because the caller needs to get the local pinned version id anyway before sending the pin_version RPC.
But it is dangerous to depend on another module to guarantee that the version id passed in as a parameter is always increasing.
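To make the trade-off in this exchange concrete, here is a minimal sketch of a callee that enforces monotonic version ids itself, so a stale caller cannot move the version backwards. It uses `std::sync::RwLock` rather than the lock type in the PR, and `PinnedVersion` and the function body are hypothetical simplifications, not the actual `LocalVersionManager` code.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the locally pinned version; only the id matters here.
#[derive(Debug, Clone, PartialEq)]
struct PinnedVersion {
    id: u64,
}

/// Returns true if `new` replaced the current version, false if it was stale.
fn try_update_pinned_version(current: &RwLock<PinnedVersion>, new: PinnedVersion) -> bool {
    // Cheap check under the read lock: stale versions never take the write lock.
    let is_stale = current.read().unwrap().id >= new.id;
    if is_stale {
        return false;
    }
    // Re-check under the write lock, because another writer could have run
    // between releasing the read lock and acquiring the write lock.
    let mut guard = current.write().unwrap();
    if guard.id >= new.id {
        return false;
    }
    *guard = new;
    true
}
```

With a single updater thread the second check never fails, which is the basis of the suggestion to drop it; keeping it means correctness does not depend on the caller.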
@@ -230,10 +230,10 @@ impl LocalVersionManager {
             conflict_detector.set_watermark(newly_pinned_version.max_committed_epoch);
         }

-        let mut new_version = old_version;
+        let mut new_version = old_version.clone();
We only need to update the `pinned_version` in the local version here, so we don't need to clone and replace the whole local version.
    {
        let mut guard = RwLockUpgradableReadGuard::upgrade(old_version);
        guard.set_pinned_version(newly_pinned_version);
        RwLockWriteGuard::unlock_fair(guard);
    }
How about this?
It's no different from the code before #3620....
We do not want to run `set_pinned_version` under a write lock because it will also change the `BTreeMap`.
The differences are:
1. We don't need to acquire a write lock if the version id is not larger.
2. We release the write lock before `version_update_notifier_tx.send`.
3. We ensure a fair unlock.

Given that the version id check is very lightweight, I don't think we benefit from 1). I am not sure about 2) either. I doubt the necessity of #3620, but maybe 3) is the key point?
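The second point above, releasing the write lock before notifying, can be sketched as follows. This is only an illustration under assumptions: it uses `std::sync::RwLock` and a `std::sync::mpsc` channel as stand-ins for the lock and notifier in the PR, and `update_and_notify` is a hypothetical name. Fair unlocking is not shown, since the standard library lock has no equivalent of `unlock_fair`.

```rust
use std::sync::mpsc::Sender;
use std::sync::RwLock;

/// Updates the pinned version id, then notifies listeners after unlocking.
/// Returns false when `new_id` is not newer than the current id.
fn update_and_notify(pinned_id: &RwLock<u64>, notifier_tx: &Sender<u64>, new_id: u64) -> bool {
    {
        let mut guard = pinned_id.write().unwrap();
        if *guard >= new_id {
            return false;
        }
        *guard = new_id;
    } // The write guard is dropped here, before any notification is sent.

    // Sending happens outside the critical section, so readers are not
    // blocked while listeners are woken. A closed receiver is ignored here.
    let _ = notifier_tx.send(new_id);
    true
}
```

Scoping the guard in its own block keeps the time spent holding the write lock limited to the comparison and the assignment.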
> We do not want to run `set_pinned_version` under a write lock because it will also change the `BTreeMap`.

I see your point now. LGTM.
No... It is not enough. Our goal in #3620 is to reduce the time spent holding the write mutex.
Codecov Report
@@ Coverage Diff @@
## main #3651 +/- ##
==========================================
- Coverage 74.40% 74.40% -0.01%
==========================================
Files 781 781
Lines 110788 110788
==========================================
- Hits 82432 82430 -2
- Misses 28356 28358 +2
Can we get this merged first so that our release version won't contain the race condition? We can solve the other issues later; it's okay to have some regression for now. 🤣
…uffer (risingwavelabs#3651)
* fix
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
* fix warn
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace bupt2013211450@gmail.com
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
close #3639
This bug may cause lost writes: the shared buffer was created in the `BTreeMap` of the previous version, but another thread then replaces the whole `LocalVersion`, including both the `BTreeMap` of shared buffers and the `HummockVersion`.
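The interleaving described here can be reproduced deterministically in a single thread. The sketch below is a simplified model under assumptions: this `LocalVersion` struct, its fields, and both helper functions are hypothetical and only mirror the shape of the problem, not the real types.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified model of the local version: a pinned version id
/// plus per-epoch shared buffers.
#[derive(Debug, Clone, Default)]
struct LocalVersion {
    pinned_version_id: u64,
    shared_buffer: BTreeMap<u64, Vec<String>>, // epoch -> buffered writes
}

/// Racy variant: the updater works on an earlier clone and swaps in the whole struct.
fn replace_whole(current: &mut LocalVersion, mut stale_clone: LocalVersion, new_id: u64) {
    stale_clone.pinned_version_id = new_id;
    // Any shared buffer created after the clone was taken is discarded here.
    *current = stale_clone;
}

/// Safer variant: only the pinned version field is touched.
fn update_in_place(current: &mut LocalVersion, new_id: u64) {
    current.pinned_version_id = new_id;
}
```

If a writer inserts a shared buffer between the clone and the call to `replace_whole`, that buffer disappears, while `update_in_place` leaves the map intact.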
Checklist
- `./risedev check` (or alias, `./risedev c`)
Documentation
If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.
Types of user-facing changes
Please keep the types that apply to your changes, and remove those that do not apply.
Release note
Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.
Refer to a related PR or issue link (optional)