Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factor out election Runner #1177

Closed
wants to merge 6 commits into from
Closed

Conversation

pav-kv
Copy link
Contributor

@pav-kv pav-kv commented May 31, 2018

Distilled version of #1144.

@daviddrysdale
Copy link
Contributor

This would be a lot easier to review if it didn't mix together factoring out electionRunner with a bunch of behavioural changes -- is it possible to split things up?

@codecov-io
Copy link

codecov-io commented May 31, 2018

Codecov Report

Merging #1177 into master will increase coverage by 0.18%.
The diff coverage is 88.37%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1177      +/-   ##
==========================================
+ Coverage   58.34%   58.52%   +0.18%     
==========================================
  Files         109      110       +1     
  Lines        9424     9415       -9     
==========================================
+ Hits         5498     5510      +12     
+ Misses       3363     3344      -19     
+ Partials      563      561       -2
Impacted Files Coverage Δ
server/log_operation_manager.go 76.16% <86.79%> (+5.46%) ⬆️
util/election/runner.go 90.9% <90.9%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b736197...f53a5ab. Read the comment docs.

Copy link
Contributor Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the new TTL behavior, and tried to make the resigning delay behavior somewhat compatible with the old one. Do you think there are other behavioral changes I should remove from the PR?

MasterCheckInterval: *masterCheckInterval,
},
MasterHoldInterval: *masterHoldInterval,
// TODO(pavelkalinnikov): ResignOdds initially set a geometric
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving the old flag name for compatibility reasons. Could change it in this PR though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd leave it for now and we can do smaller PRs to migrate it.

glog.Errorf("%d: failed to resign mastership: %v", r.er.logID, err)
// getResignDelay computes master resignation delay based on operation
// parameters.
func getResignDelay(info *LogOperationInfo) time.Duration {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not exactly equivalent to the previous ResignOdds behavior, although could be simpler?

The old one was a geometric distribution: the probability of resigning on k-th master check iteration was (1-p)^(k-1) * p, where p=1/ResignOdds. Expected number of iterations was ResignOdds. So the code here could look like:

	p := 1.0 / float64(info.ResignOdds)
	x := rand.Float64()
	checks := int64(math.Log2(1.0-x) / math.Log2(1.0-p))
	return delay + time.Duration(checks)*info.ElectionConfig.MasterCheckInterval

The new one is just uniform with expected number of extra waiting iterations equal to ResignSpread/2 (could make it ResignSpread to be even more compatible with the old behaviour).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the exact behaviour is important at this level. We want to be able to transfer mastership to allow things to balance out across multiple tasks but not too often and (probabilistically likely) not for all the logs at the same time.

delay := getResignDelay(&l.info)
// Note: The loop is to allow blocking by a mocked TimeSource.
for until := start.Add(delay); l.info.TimeSource.Now().Before(until); {
if !util.Sleep(run.Done, delay) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be util.SleepContext(run.Ctx, delay) != nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

glog.Warningf("%sIsMaster: false", r.prefix)
return
} else if !isMaster {
glog.Warningf("%sIsMaster: %v", r.prefix, err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be swapped with line 130.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pav-kv
Copy link
Contributor Author

pav-kv commented Jun 1, 2018

To give some more context on why I'm doing this, here is how I'm going to use Runner in CT's Migrillian: ct-go#264, see RunWithElection method.

@daviddrysdale daviddrysdale removed their request for review May 30, 2019 09:03
@pav-kv pav-kv requested a review from a team as a code owner November 2, 2020 13:37
glog.Errorf("%d: failed to resign mastership: %v", r.er.logID, err)
// getResignDelay computes master resignation delay based on operation
// parameters.
func getResignDelay(info *LogOperationInfo) time.Duration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the exact behaviour is important at this level. We want to be able to transfer mastership to allow things to balance out across multiple tasks but not too often and (probabilistically likely) not for all the logs at the same time.

glog.Errorf("%d: failed to resign mastership: %v", r.er.logID, err)
// getResignDelay computes master resignation delay based on operation
// parameters.
func getResignDelay(info *LogOperationInfo) time.Duration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this to something not starting with 'get', just resignDelay for example.

func fixupElectionInfo(info LogOperationInfo) LogOperationInfo {
if info.PreElectionPause < minPreElectionPause {
info.PreElectionPause = minPreElectionPause
cfg := &info.ElectionConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the other two parameters (hold interval and spread) were moved into ElectionConfig then this could just be passed an election config.

I haven't fully examined the impact of this change though.

runner := election.NewRunner(&l.info.ElectionConfig, el, l.info.TimeSource, label+": ")
l.electionRunner[logID] = runner

// beMaster := func(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be here? Duplicates one in runner.go. If intended as documentation it should maybe be moved outside to the function doc.

MasterCheckInterval: *masterCheckInterval,
},
MasterHoldInterval: *masterHoldInterval,
// TODO(pavelkalinnikov): ResignOdds initially set a geometric
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd leave it for now and we can do smaller PRs to migrate it.

return r.me.Resign(ctx)
}

// Close permanently stops this Runner from participating in election.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should probably be "elections".

// expectNotClosed checks that done is not closed within some timeout.
func expectNotClosed(t *testing.T, done <-chan struct{}, msg string) {
t.Helper()
if !util.Sleep(done, decentDur) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of construct can be flaky but maybe can't be avoided here?

@pav-kv
Copy link
Contributor Author

pav-kv commented Nov 2, 2020

Oh, don't look at this. I was just backing up my local branches :)

@Martin2112
Copy link
Contributor

OK, didn't look too closely and reviewed it.

@pphaneuf
Copy link
Contributor

There's a "convert to draft" feature now!

@pav-kv pav-kv marked this pull request as draft January 15, 2021 11:28
@pav-kv pav-kv closed this Jan 22, 2021
@pav-kv pav-kv deleted the election_runner branch January 22, 2021 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants