[presence] Always sync before responding to requests. #6629

Merged
merged 2 commits into from Nov 29, 2016

Conversation

Projects
None yet
3 participants
Owner

howbazaar commented Nov 29, 2016

When initiating a migration, sometimes the precheck would report an agent as down when status would show it as up. The cause of this was a timing issue in the state workers. It was possible for a request for agent alive to be processed by the presence worker before it had synced its state with the database, which would cause it to report the agent as down.

This change moves the initial call to sync with the database outside of the infinite for loop in the loop function. It appears that the intention was always to start with an initial sync as the next sync time was time.After(0). However since it was in a select, sometimes if a request came in very quickly, the request would be handled first.

A drive-by fix for a state leak in the migration pre-check code. The state returned for checking the controller machines was never closed.

Owner

howbazaar commented Nov 29, 2016

QA

I patched the initiate migration function so it didn't actually start the migration, just executed the prechecks, and would then run the migrate command repeatedly in a bash loop.

Without this change, the command would error ~5 - 10% of the time. With this change I was unable to get it to fail after ~50 executions.

mjs approved these changes Nov 29, 2016

I'm so glad you managed to track this down. This fix will likely tighten up other presence related weirdness that we sometimes see.

@@ -106,6 +113,7 @@ func SourcePrecheck(backend PrecheckBackend) error {
if err != nil {
return errors.Trace(err)
}
+ defer controllerBackend.Close()
@mjs

mjs Nov 29, 2016

Contributor

Please also mention this drive-by fix in the PR description

@@ -313,7 +320,7 @@ func (w *Watcher) flush() {
// handle deals with requests delivered by the public API
// onto the background watcher goroutine.
func (w *Watcher) handle(req interface{}) {
- logger.Tracef("got request: %#v", req)
+ logger.Tracef("[%s] got request: %#v for model", w.modelUUID[:6], req)
@mjs

mjs Nov 29, 2016

Contributor

how about adding a field with the model UUID prefix instead of slicing at each call site? Better yet, what about a logging method which logs at trace and automatically adds the model UUID prefix?

@howbazaar

howbazaar Nov 29, 2016

Owner

Perhaps not just now given that we want to delete all this code.

Owner

howbazaar commented Nov 29, 2016

$$merge$$

Contributor

jujubot commented Nov 29, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 72ef6c9 into juju:develop Nov 29, 2016

1 check failed

github-check-merge-juju Built PR, ran unit tests, and tested LXD deploy. Use !!.*!! to request another build. IE, !!build!!, !!retry!!
Details

@howbazaar howbazaar deleted the howbazaar:migration-machine-down branch Nov 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment