-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factor out election Runner #1177
Conversation
This would be a lot easier to review if it didn't mix together factoring out |
Codecov Report
@@ Coverage Diff @@
## master #1177 +/- ##
==========================================
+ Coverage 58.34% 58.52% +0.18%
==========================================
Files 109 110 +1
Lines 9424 9415 -9
==========================================
+ Hits 5498 5510 +12
+ Misses 3363 3344 -19
+ Partials 563 561 -2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the new TTL
behavior, and tried to make the resigning delay behavior somewhat compatible with the old one. Do you think there are other behavioral changes I should remove from the PR?
MasterCheckInterval: *masterCheckInterval, | ||
}, | ||
MasterHoldInterval: *masterHoldInterval, | ||
// TODO(pavelkalinnikov): ResignOdds initially set a geometric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving the old flag name for compatibility reasons. Could change it in this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd leave it for now and we can do smaller PRs to migrate it.
glog.Errorf("%d: failed to resign mastership: %v", r.er.logID, err) | ||
// getResignDelay computes master resignation delay based on operation | ||
// parameters. | ||
func getResignDelay(info *LogOperationInfo) time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not exactly equivalent to the previous ResignOdds
behavior, although could be simpler?
The old one was a geometric distribution: the probability of resigning on k
-th master check iteration was (1-p)^(k-1) * p
, where p=1/ResignOdds
. Expected number of iterations was ResignOdds
. So the code here could look like:
p := 1.0 / float64(info.ResignOdds)
x := rand.Float64()
checks := int64(math.Log2(1.0-x) / math.Log2(1.0-p))
return delay + time.Duration(checks)*info.ElectionConfig.MasterCheckInterval
The new one is just uniform with expected number of extra waiting iterations equal to ResignSpread/2
(could make it ResignSpread
to be even more compatible with the old behaviour).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the exact behaviour is important at this level. We want to be able to transfer mastership to allow things to balance out across multiple tasks but not too often and (probabilistically likely) not for all the logs at the same time.
server/log_operation_manager.go
Outdated
delay := getResignDelay(&l.info) | ||
// Note: The loop is to allow blocking by a mocked TimeSource. | ||
for until := start.Add(delay); l.info.TimeSource.Now().Before(until); { | ||
if !util.Sleep(run.Done, delay) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be util.SleepContext(run.Ctx, delay) != nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
util/election/runner.go
Outdated
glog.Warningf("%sIsMaster: false", r.prefix) | ||
return | ||
} else if !isMaster { | ||
glog.Warningf("%sIsMaster: %v", r.prefix, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be swapped with line 130.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
To give some more context on why I'm doing this, here is how I'm going to use |
glog.Errorf("%d: failed to resign mastership: %v", r.er.logID, err) | ||
// getResignDelay computes master resignation delay based on operation | ||
// parameters. | ||
func getResignDelay(info *LogOperationInfo) time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the exact behaviour is important at this level. We want to be able to transfer mastership to allow things to balance out across multiple tasks but not too often and (probabilistically likely) not for all the logs at the same time.
glog.Errorf("%d: failed to resign mastership: %v", r.er.logID, err) | ||
// getResignDelay computes master resignation delay based on operation | ||
// parameters. | ||
func getResignDelay(info *LogOperationInfo) time.Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this to something not starting with 'get', just resignDelay
for example.
func fixupElectionInfo(info LogOperationInfo) LogOperationInfo { | ||
if info.PreElectionPause < minPreElectionPause { | ||
info.PreElectionPause = minPreElectionPause | ||
cfg := &info.ElectionConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the other two parameters (hold interval and spread) were moved into ElectionConfig
then this could just be passed an election config.
I haven't fully examined the impact of this change though.
runner := election.NewRunner(&l.info.ElectionConfig, el, l.info.TimeSource, label+": ") | ||
l.electionRunner[logID] = runner | ||
|
||
// beMaster := func(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be here? Duplicates one in runner.go. If intended as documentation it should maybe be moved outside to the function doc.
MasterCheckInterval: *masterCheckInterval, | ||
}, | ||
MasterHoldInterval: *masterHoldInterval, | ||
// TODO(pavelkalinnikov): ResignOdds initially set a geometric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd leave it for now and we can do smaller PRs to migrate it.
return r.me.Resign(ctx) | ||
} | ||
|
||
// Close permanently stops this Runner from participating in election. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should probably be "elections".
// expectNotClosed checks that done is not closed within some timeout. | ||
func expectNotClosed(t *testing.T, done <-chan struct{}, msg string) { | ||
t.Helper() | ||
if !util.Sleep(done, decentDur) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sort of construct can be flaky but maybe can't be avoided here?
Oh, don't look at this. I was just backing up my local branches :) |
OK, didn't look too closely and reviewed it. |
There's a "convert to draft" feature now! |
Distilled version of #1144.