Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler improvements and API tweaks #286

Merged
merged 15 commits into from
Apr 24, 2017
Merged

Conversation

jdef
Copy link
Contributor

@jdef jdef commented Apr 16, 2017

  • httpsched transitions to disconnected upon errors generated by the Decoder
  • demo scheduler properly handles Mesos master failover
  • httpcli: deprecate ProtocolError and Err variables in favor of APIError
  • minor doc tweaks

@jdef jdef added the WIP label Apr 16, 2017
@coveralls
Copy link

coveralls commented Apr 16, 2017

Coverage Status

Coverage increased (+0.01%) to 36.489% when pulling 70084cf on jdef_scheduler_improvements into 43f0fa5 on master.

@coveralls
Copy link

coveralls commented Apr 16, 2017

Coverage Status

Coverage increased (+0.02%) to 36.496% when pulling 299e0ef on jdef_scheduler_improvements into 43f0fa5 on master.

@coveralls
Copy link

coveralls commented Apr 16, 2017

Coverage Status

Coverage decreased (-0.03%) to 36.442% when pulling e98a3fb on jdef_scheduler_improvements into 43f0fa5 on master.

frameworkIDStore = NewInMemoryIDStore()
controlContext = &controller.ContextAdapter{
DoneFunc: func() bool { return state.done },
// not called concurrently w/ subscription events, don't worry about using the atomic
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad comment

@@ -131,13 +131,19 @@ func (ca *ContextAdapter) Error(err error) {
}
}

type SchedulerError string
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pretty bad name. we can do better

frameworkID := e.GetSubscribed().GetFrameworkID().GetValue()
if state.frameworkID == "" || state.frameworkID != frameworkID {
if state.frameworkID != "" && state.frameworkID != frameworkID && state.config.checkpoint {
state.done = true // TODO(jdef) not goroutine safe
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these are really panic worthy conditions instead

@coveralls
Copy link

coveralls commented Apr 16, 2017

Coverage Status

Coverage increased (+0.04%) to 36.514% when pulling 7c4c8c7 on jdef_scheduler_improvements into 43f0fa5 on master.

@@ -85,6 +101,19 @@ func FrameworkCaller(frameworkID string) Decorator {
}
}

// DynamicFrameworkCaller generates and returns a Decorator that applies the given frameworkID to all calls (except SUBSCRIBE).
func DynamicFrameworkCaller(frameworkID func() string) Decorator {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still dislike this name. maybe SubscribedCaller instead?

@@ -85,6 +101,19 @@ func FrameworkCaller(frameworkID string) Decorator {
}
}

// DynamicFrameworkCaller generates and returns a Decorator that applies the given frameworkID to all calls (except SUBSCRIBE).
func DynamicFrameworkCaller(frameworkID func() string) Decorator {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still dislike this name. maybe SubscribedCaller instead?

@coveralls
Copy link

coveralls commented Apr 17, 2017

Coverage Status

Coverage decreased (-0.002%) to 36.475% when pulling 60818df on jdef_scheduler_improvements into 43f0fa5 on master.

state.frameworkID = e.GetSubscribed().GetFrameworkID().GetValue()
frameworkID := e.GetSubscribed().GetFrameworkID().GetValue()
if state.frameworkID == "" || state.frameworkID != frameworkID {
if state.frameworkID != "" && state.frameworkID != frameworkID && state.config.checkpoint {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required to be nested, pull outside of containing if

@@ -84,6 +88,7 @@ func NewConfig() Config {
url: env("MESOS_MASTER_HTTP", "http://:5050/api/v1/scheduler"),
codec: codec{Codec: &encoding.ProtobufCodec},
timeout: envDuration("MESOS_CONNECT_TIMEOUT", "20s"),
failoverTimeout: envDuration("FRAMEWORK_FAILOVER_TIMEOUT", "1000h"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCHEDULER_FAILOVER_TIMEOUT

@@ -78,7 +76,9 @@ func (_ *controllerImpl) Run(config Config) (lastErr error) {
for !config.Context.Done() {
frameworkID := config.Context.FrameworkID()
if config.Framework.GetFailoverTimeout() > 0 && frameworkID != "" {
subscribe.Subscribe.FrameworkInfo.ID = &mesos.FrameworkID{Value: frameworkID}
frameworkProto := &mesos.FrameworkID{Value: frameworkID}
subscribe.Subscribe.FrameworkInfo.ID = frameworkProto
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to consolidate these lines and say something like subscribe.WithFrameworkID(frameworkID)

@coveralls
Copy link

coveralls commented Apr 17, 2017

Coverage Status

Coverage decreased (-0.07%) to 36.405% when pulling a8ba6d2 on jdef_scheduler_improvements into 43f0fa5 on master.

})
)

// ProtocolError is a generic error type returned for expected status codes
// received from Mesos.
// Deprecated: no longer used in favor of APIError.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to just break the API now and remove this, otherwise downstream consumers will wonder why they're not seeing ProtocolError's anymore. Better to break the build and force change.

Maybe tackle this in a follow-up PR? This one is getting a bit long already.

@coveralls
Copy link

coveralls commented Apr 17, 2017

Coverage Status

Coverage increased (+0.04%) to 36.521% when pulling 3db5f2c on jdef_scheduler_improvements into 43f0fa5 on master.

@coveralls
Copy link

coveralls commented Apr 17, 2017

Coverage Status

Coverage increased (+0.01%) to 36.487% when pulling 17a0ad3 on jdef_scheduler_improvements into 43f0fa5 on master.

@jdef
Copy link
Contributor Author

jdef commented Apr 17, 2017

upon the scheduler receiving an ERROR event, it should probably re-try subscription. unless Mesos terminates the current subscription connection after sending an ERROR (which is not in the contract) then we're still in trouble because the httpsched state doesn't know anything about these ERROR messages so it will stay "connected" (and reject SUBSCRIBE attempts)

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.3%) to 36.195% when pulling a4cc120 on jdef_scheduler_improvements into 43f0fa5 on master.

@coveralls
Copy link

coveralls commented Apr 17, 2017

Coverage Status

Coverage increased (+0.03%) to 36.504% when pulling a4cc120 on jdef_scheduler_improvements into 43f0fa5 on master.

@coveralls
Copy link

coveralls commented Apr 17, 2017

Coverage Status

Coverage increased (+0.04%) to 36.519% when pulling 99f3370 on jdef_scheduler_improvements into 43f0fa5 on master.

@jdef
Copy link
Contributor Author

jdef commented Apr 17, 2017

This PR is massive enough and I'm mostly happy (enough) with it. Will keep it open for a few days for comments and merge if there are no complaints.

@jdef jdef added PTAL and removed WIP labels Apr 17, 2017
@jdef
Copy link
Contributor Author

jdef commented Apr 17, 2017

ideally there would be tests shipping with these changes

@jdef
Copy link
Contributor Author

jdef commented Apr 24, 2017

fixes #277

@jdef jdef force-pushed the jdef_scheduler_improvements branch from 99f3370 to 7fd41ca Compare April 24, 2017 18:24
@coveralls
Copy link

coveralls commented Apr 24, 2017

Coverage Status

Coverage decreased (-0.01%) to 36.446% when pulling 7fd41ca on jdef_scheduler_improvements into c00d0c1 on master.

@jdef
Copy link
Contributor Author

jdef commented Apr 24, 2017

filed ticket to track tech debt for better unit tests in httpcli and httpsched: #291

@jdef jdef merged commit d09f663 into master Apr 24, 2017
@jdef jdef deleted the jdef_scheduler_improvements branch April 24, 2017 18:37
@jdef jdef removed the PTAL label Apr 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants