Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
[presence] Always sync before responding to requests. #6629
Conversation
howbazaar
added some commits
Nov 29, 2016
QAI patched the initiate migration function so it didn't actually start the migration, just executed the prechecks, and would then run the migrate command repeatedly in a bash loop. Without this change, the command would error ~5 - 10% of the time. With this change I was unable to get it to fail after ~50 executions. |
mjs
approved these changes
Nov 29, 2016
I'm so glad you managed to track this down. This fix will likely tighten up other presence related weirdness that we sometimes see.
| @@ -106,6 +113,7 @@ func SourcePrecheck(backend PrecheckBackend) error { | ||
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
| + defer controllerBackend.Close() |
| @@ -313,7 +320,7 @@ func (w *Watcher) flush() { | ||
| // handle deals with requests delivered by the public API | ||
| // onto the background watcher goroutine. | ||
| func (w *Watcher) handle(req interface{}) { | ||
| - logger.Tracef("got request: %#v", req) | ||
| + logger.Tracef("[%s] got request: %#v for model", w.modelUUID[:6], req) |
mjs
Nov 29, 2016
Contributor
how about adding a field with the model UUID prefix instead of slicing at each call site? Better yet, what about a logging method which logs at trace and automatically adds the model UUID prefix?
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
howbazaar commentedNov 29, 2016
•
Edited 1 time
-
howbazaar
Nov 29, 2016
When initiating a migration, sometimes the precheck would report an agent as down when status would show it as up. The cause of this was a timing issue in the state workers. It was possible for a request for agent alive to be processed by the presence worker before it had synced its state with the database, which would cause it to report the agent as down.
This change moves the initial call to sync with the database outside of the infinite for loop in the loop function. It appears that the intention was always to start with an initial sync as the next sync time was time.After(0). However since it was in a select, sometimes if a request came in very quickly, the request would be handled first.
A drive-by fix for a state leak in the migration pre-check code. The state returned for checking the controller machines was never closed.