heal: Persist MRF queue in the disk during shutdown #19410
base: master
Conversation
Force-pushed from f65e284 to 569ff87
Force-pushed from 569ff87 to d71a6b0
Force-pushed from adb9bef to 06f9b16
Looks like reordering caused some issues in the healing tests. PTAL @vadmeste
@vadmeste please rebase this PR with master.
@vadmeste PTAL when you get a chance.
Force-pushed from 69b5a57 to 034b199
Force-pushed from b9aee10 to a14ecc0
Force-pushed from a14ecc0 to a916975
Force-pushed from a916975 to 15c4029
    func (m *mrfState) shutdown() {
        // Flag the state as closing, then wait for in-flight senders
        // before m.opCh is closed later in shutdown.
        atomic.StoreInt32(&m.closing, 1)

        m.wg.Wait()
The wait group approach to synchronize between the multiple senders (i.e. addPartialOp) and the shutdown method won't work, since all calls to wg.Add must happen before we call wg.Wait, and this isn't guaranteed:

"Note that calls with a positive delta that occur when the counter is zero must happen before a Wait."

ref: https://pkg.go.dev/sync#WaitGroup.Add
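To make the failure mode concrete, here is a minimal sketch (simplified, self-contained names; not the PR's actual code) of how a sender can slip past both the closing flag and the Wait:

    package main

    import (
        "sync"
        "sync/atomic"
    )

    type state struct {
        closing int32
        wg      sync.WaitGroup
        opCh    chan int
    }

    func (s *state) addOp(v int) {
        if atomic.LoadInt32(&s.closing) == 1 {
            return
        }
        // RACE: shutdown may run wg.Wait() right here, observe a zero
        // counter, and close opCh before this Add registers the sender.
        s.wg.Add(1)
        defer s.wg.Done()
        s.opCh <- v // may panic: send on closed channel
    }

    func (s *state) shutdown() {
        atomic.StoreInt32(&s.closing, 1)
        s.wg.Wait() // does not cover senders that passed the flag check
        close(s.opCh)
    }

    func main() {
        s := &state{opCh: make(chan int, 8)}
        go func() {
            for range s.opCh { // drain until closed
            }
        }()
        for i := 0; i < 4; i++ {
            go s.addOp(i)
        }
        s.shutdown() // failure is timing-dependent; it may not reproduce on every run
    }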
I understand that closing m.opCh simplifies processing of outstanding PartialOp values onto local drives. But closing a channel which has multiple senders is error prone and complicates the code.

As an alternative approach, we could use a done/quit channel, which would be closed at the beginning of the shutdown method. addPartialOp calls would select on this channel and return immediately if they find it closed. This way we stop more tasks from being sent on m.opCh. We could then drain at most len(m.opCh) entries from m.opCh. Of course, this is only a point-in-time count of the outstanding PartialOp values, and more could be sent after we signal on the quit channel (see above). The downside is that we may drop the 'last' few values. Overall, the implementation will be easier to reason about.
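A minimal sketch of that alternative might look like this; everything beyond m.opCh, addPartialOp, and shutdown (the op type, the done field, persistToDisk) is a hypothetical stand-in:

    package mrf

    type partialOp struct{} // placeholder for the real PartialOp type

    type mrfState struct {
        opCh chan partialOp
        done chan struct{} // closed at the start of shutdown
    }

    func (m *mrfState) addPartialOp(op partialOp) {
        select {
        case <-m.done:
            // Shutdown has begun: drop the op instead of queueing it.
        case m.opCh <- op:
        }
    }

    func (m *mrfState) shutdown() {
        close(m.done) // stop (most) senders; a few may still race in

        // Drain a point-in-time count of queued entries. Anything sent
        // after len() is read may be dropped; that is the accepted
        // trade-off.
        for n := len(m.opCh); n > 0; n-- {
            persistToDisk(<-m.opCh)
        }
    }

    func persistToDisk(op partialOp) {
        // Hypothetical: serialize the pending op to local disk so it
        // can be replayed after restart.
    }

Because select chooses pseudo-randomly when both cases are ready, a sender can still enqueue after done is closed; that is exactly the "point in time" caveat above, and those stragglers are the values that may be dropped.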
> The wait group approach to synchronize between the multiple senders, i.e. addPartialOp and the shutdown method won't work since all calls to wg.Add must happen before we call wg.Wait. This isn't guaranteed.

I do not need a guarantee here. I just wanted to avoid sending new entries to m.opCh during shutdown, because m.opCh will be closed by then.
    m.wg.Add(1)
    defer m.wg.Done()
Can you check if it is shut down prior to adding to wg as well? Otherwise m.wg.Wait() will need to wait unnecessarily on any ops being queued.
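A sketch of that early check (the remainder of addPartialOp's body and the sync/atomic import are assumed):

    func (m *mrfState) addPartialOp(op partialOp) {
        // Bail out before touching the wait group so a shutdown in
        // progress is not forced to wait on ops that will be dropped.
        if atomic.LoadInt32(&m.closing) == 1 {
            return
        }
        m.wg.Add(1)
        defer m.wg.Done()

        // ... queue op on m.opCh as before ...
    }

Note that the early check only narrows the window: as discussed above, an Add can still race with Wait, so this is a mitigation rather than a complete fix.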
Community Contribution License
All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.
Description
Motivation and Context
How to test this PR?
Types of changes
Checklist:
Fixes a regression (If yes, please add commit-id or PR # here)