support multiple master candidates #352

Merged
merged 6 commits into from
Aug 24, 2018

huahuiyang
Contributor

@huahuiyang huahuiyang commented Jun 20, 2018

In this PR, if the Mesos framework specifies a candidate selector (which may be a dynamic func), the framework gains the ability to fail over across multiple Mesos masters. Otherwise, it relies only on the URL given as the initial configuration.

@coveralls

coveralls commented Jun 20, 2018

Coverage Status

Coverage decreased (-0.3%) to 57.547% when pulling 12603e0 on huahuiyang:master into 5a67a24 on mesos:master.

@huahuiyang
Contributor Author

huahuiyang commented Jun 20, 2018

This adds support for multiple master API URLs when starting the framework, so that connection attempts can fail over across several Mesos API URLs. It does not break the current c.url logic.

@huahuiyang
Contributor Author

related #339

@huahuiyang
Contributor Author

@jdef would you like to take a look when you get a chance?

@huahuiyang
Contributor Author

@tsenart @vladimirvivien @jdef any ideas? Is this repo still maintained? It has been a long time since this PR was created.

@jdef
Contributor

jdef commented Jul 18, 2018

Thanks for the PR! This repo is maintained, and I'm basically the only maintainer at this point, so sometimes it takes me a while to get around to reviewing PRs. Thanks for being patient.

It looks like the use case you're trying to address is this:

  1. A cluster has multiple master candidates
  2. The default HTTP client is expected to always hit some "primary" master candidate (the first one in the list); if this master is down then calls will fail.
  3. The scheduler HTTP client will round-robin across all candidates if an API call results in a non-redirection error; for redirects, continue to use the candidate suggested by the result returned by Mesos.

Does that sound about right?

These candidates are more like "bootstrap" endpoints, right? Because in a cloudy environment where servers come and go, it's possible that the initial nodes might cycle out (unless you were using floating IPs that remain the same across recycled instances).

I'm interested in the specifics of your use case. Please elaborate in the PR description.

@huahuiyang
Contributor Author

huahuiyang commented Jul 23, 2018

@jdef Thanks for looking at this PR. The items you listed in the comment above are pretty much correct!
Setting up a load balancer in front of the Mesos masters to provide a stable endpoint is one solution, but it has some drawbacks:

  1. A load balancer becomes a required component in that scenario (DNS-based service discovery takes time to sync, so we assume people use nginx/haproxy/lvs etc. as the load balancer). Without this PR, developers would be frustrated if their organization has no load balancer.
  2. A load balancer that proxies to the active master introduces an extra hop, which is better avoided when performance matters a lot.

In our case, we chose to use the Mesos masters' raw IP/DNS addresses directly (a.k.a. candidates in this PR). We are aware that the Mesos master addresses could change entirely, but in a highly available, fault-tolerant Mesos scheduler, the active/standby schedulers can easily do a rolling restart to load the latest master candidates. So this PR proposes a separate candidates option for initializing the Mesos framework. It does not break the original single-URL registration, and it lets developers choose according to their use case.

@jdef
Contributor

jdef commented Jul 24, 2018

OK. Let me then suggest the following changes to this PR:

  1. don't change the httpcli package at all, since what you really want are changes to the httpsched client; create an Option func in httpsched that handles candidate selection
  2. loosen up the candidate specification: instead of some encoded string, maybe a candidate selector option is some kind of a func() string. then you could define a "static candidate selector" that cycles over some fixed slice of candidate strings. but a user could invent a new implementation that was more dynamic and not limited to an initial, fixed set of strings. let me know if you need/want me to elaborate on this idea more.

@jdef
Contributor

jdef commented Jul 24, 2018

e.g.

type CandidateSelector func() string

func FixedCandidateSelector(s []string) CandidateSelector { ... } // cycles over s
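
One possible way to fill in the elided body, as a minimal sketch rather than the code that was merged: the selector keeps a private copy of the slice and an index that wraps around. The mutex and the defensive copy are assumptions of this sketch (they are not part of the suggestion above), added so that one selector can be shared across goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// CandidateSelector returns the next endpoint to try, or "" if there is none.
type CandidateSelector func() string

// FixedCandidateSelector cycles over a private copy of s in round-robin order.
// The mutex is an assumption of this sketch, not something the thread requires.
func FixedCandidateSelector(s []string) CandidateSelector {
	candidates := append([]string(nil), s...) // copy so later caller edits have no effect
	var (
		mu   sync.Mutex
		next int
	)
	return func() string {
		if len(candidates) == 0 {
			return ""
		}
		mu.Lock()
		defer mu.Unlock()
		c := candidates[next]
		next = (next + 1) % len(candidates) // wrap around to the first candidate
		return c
	}
}

func main() {
	sel := FixedCandidateSelector([]string{
		"http://master1:5050/api/v1/scheduler",
		"http://master2:5050/api/v1/scheduler",
	})
	// Three calls against two candidates: the third call wraps around.
	for i := 0; i < 3; i++ {
		fmt.Println(sel())
	}
}
```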

@huahuiyang
Contributor Author

huahuiyang commented Jul 25, 2018

@jdef fair enough, I've changed the PR according to your comments.
In this PR, if the Mesos framework specifies a candidate selector (which may be a dynamic func), the framework gains the ability to fail over across multiple Mesos masters. Otherwise, it relies only on the URL given as the initial configuration.

an example might be as follows:

// Candidate scheduler API endpoints, one per Mesos master.
masters := []string{
	"http://master1:5050/api/v1/scheduler",
	"http://master2:5050/api/v1/scheduler",
	"http://master3:5050/api/v1/scheduler",
}
candidateIndex := 0
// Returns the next candidate in round-robin order, or "" if there are none.
// Not safe for concurrent use.
candidatesRoundRobinSelector := func() string {
	if len(masters) == 0 {
		return ""
	}
	if candidateIndex >= len(masters) {
		candidateIndex = 0
	}
	res := masters[candidateIndex]
	candidateIndex++
	return res
}

httpsched.NewCaller(cli,
	httpsched.AllowReconnection(true),
	httpsched.MasterCandidates(candidatesRoundRobinSelector))
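
For the "could be dynamic func" case, a selector does not have to close over a fixed list. The sketch below is only an illustration: `refreshingSelector` and its `lookup` parameter are hypothetical names, not part of mesos-go. It asks `lookup` for a fresh candidate list each time the previous list has been used up, so a restart is not needed when the set of masters changes. It is not safe for concurrent use.

```go
package main

import "fmt"

// refreshingSelector is a hypothetical helper: it calls lookup for a fresh
// candidate list whenever the previous list has been exhausted.
func refreshingSelector(lookup func() []string) func() string {
	var (
		current []string
		next    int
	)
	return func() string {
		if next >= len(current) {
			current, next = lookup(), 0 // start a new cycle with fresh candidates
		}
		if len(current) == 0 {
			return "" // nothing to try right now
		}
		c := current[next]
		next++
		return c
	}
}

func main() {
	// Stand-in for a real lookup (DNS, ZooKeeper, a config file, ...).
	generation := 0
	lookup := func() []string {
		generation++
		return []string{
			fmt.Sprintf("http://master1.gen%d:5050/api/v1/scheduler", generation),
			fmt.Sprintf("http://master2.gen%d:5050/api/v1/scheduler", generation),
		}
	}
	sel := refreshingSelector(lookup)
	for i := 0; i < 4; i++ {
		fmt.Println(sel())
	}
}
```

Assuming the option keeps the `func() string` shape shown in this thread, the returned func could be passed to it in place of `candidatesRoundRobinSelector`.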

Contributor

@jdef jdef left a comment

Thanks for the changes, left a few more comments.

var candidate string
if !ok {
if cli.candidateSelector == nil {
log.Printf("not found candidate selector, using url when initilize framework")
Contributor

please wrap calls to log.XXX with an if debug check

candidate = cli.candidateSelector()
}
if candidate == "" {
log.Printf("not found candidate url, return directly")
Contributor

ditto

return
}
} else {
candidate = redirectErr.newURL
Contributor

maybe make this the default value of candidate when it's initially declared and get rid of this else statement

Contributor Author

@huahuiyang huahuiyang Jul 27, 2018

Getting newURL from the redirectErr object is only safe when ok is true (otherwise it can cause a nil pointer error).
So this else block seems necessary here.

@@ -74,6 +75,8 @@ type (
requestOpts []httpcli.RequestOpt // requestOpts are temporary per-request options
opt httpcli.Opt // opt is a temporary client option
}

CandidateSelector func() string
Contributor

yes, please add at least a one-line doc string for this. something like:

// CandidateSelector returns the next endpoint to try if there are errors reaching the mesos master, or else an empty string if there are no such candidates.

@@ -134,6 +137,14 @@ func AllowReconnection(v bool) Option {
}
}

func MasterCandidates(cs CandidateSelector) Option {
Contributor

nit: maybe rename to EndpointCandidates for clarity (since it's actually selecting the next endpoint for the http client)

Contributor

@jdef jdef left a comment

LGTM

My last wish is that there would be a unit test for this logic, which would probably be easiest if most of the new code was moved to a helper func that produced the next endpoint to use (because the unit test could focus on testing the complicated logic that's being introduced by this PR).

If needed, the unit test could be added in a follow-up PR.

@huahuiyang
Contributor Author

Great. Once this change is upstream, our prod system no longer needs to maintain a forked mesos-go repo for multiple candidates in our organization. I will think about the unit test part in another PR.

@jdef
Contributor

jdef commented Jul 27, 2018

@huahuiyang please rebase so that I can merge this

@huahuiyang
Contributor Author

@jdef rebased.

@huahuiyang
Contributor Author

ping @jdef

@huahuiyang
Contributor Author

@jdef any chance you could take a look and get this PR merged? Thanks.

@jdef
Contributor

jdef commented Aug 24, 2018

LGTM, thanks

@jdef jdef merged commit 1558d48 into mesos:master Aug 24, 2018