implement Runner.Worker method #2
Conversation
+ // run goroutine.
+
+ // start holds the function to create the worker.
+ // If this is nil,

+ // killWorkerUnlocked is like killWorker except that it expects
+ // the runner.mu mutex to be held already.
+ func (runner *Runner) killWorkerUnlocked(id string) {

+
+ // finalError holds the error that will be returned
+ // when the runner finally exits.
+ finalError error
mjs
Mar 7, 2017
Would you add some comments noting that isDying and finalError should only be touched by the run goroutine and/or things it calls?
rogpeppe
Mar 7, 2017
Owner
Added a comment to finalError. isDying already says "isDying is maintained by the run goroutine", and since it's not protected by a mutex, I think it should be clear that "maintained by" implies ownership, and hence that it shouldn't be modified outside that goroutine.
I'm open to alternative comment wordings.
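For concreteness, a sketch of how such ownership comments might read (this assumes both values are fields of Runner; in the diff, finalError may instead live in a var block inside run):

type Runner struct {
	// ...

	// isDying is maintained by the run goroutine: only that
	// goroutine (and functions it calls synchronously) may read
	// or write it.
	isDying bool

	// finalError holds the error that will be returned when the
	// runner finally exits. Like isDying, it is owned by the run
	// goroutine and must not be touched from anywhere else.
	finalError error
}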
  select {
- case runner.startc <- startReq{id, startFunc}:
+ case runner.startc <- startReq{id, startFunc, reply}:
+     <-reply
mjs
Mar 7, 2017
This naked channel read makes me nervous. Are you really sure there will always be a reply? Can it be inside a select which also checks the Tomb?
rogpeppe
Mar 7, 2017
Owner
It's OK because the startc channel is synchronous: if we succeed in sending on it, we know the run goroutine has entered the startc arm of its select. That arm calls startWorker, which never blocks (trivially verifiable), and then closes the reply channel, so we can be completely sure we'll always get a reply.
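A sketch of the handshake being described (the startReq fields follow the diff; the Dying case, ErrDead, and the run-goroutine side are assumptions from context, not quoted from the change):

// Caller side. reply carries no data; it exists only to signal
// that the request has been processed.
reply := make(chan struct{})
select {
case runner.startc <- startReq{id, startFunc, reply}:
	// startc is unbuffered, so a successful send means the run
	// goroutine has entered its startc case; it calls startWorker
	// (which never blocks) and then closes reply, so this receive
	// cannot hang.
	<-reply
case <-runner.tomb.Dying():
	return ErrDead
}

// Run-goroutine side, inside its select loop:
//
//	case req := <-runner.startc:
//		runner.startWorker(req)
//		close(req.reply)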
+ // Worker should not block almost all of the time.
+ runner.mu.Lock()
+ stopped = true
+ runner.mu.Unlock()
mjs
Mar 7, 2017
The handling of runner.mu and stopped in this func is somewhat hard to follow (it feels bug-prone). Can it be structured more clearly?
+ stopped = true
+ runner.mu.Unlock()
+ runner.workersChangedCond.Broadcast()
+ return nil, ErrStopped
mjs
Mar 7, 2017
As discussed IRL, please explore options for making the synchronisation and handling of stopped clearer. Also consider making the second half a separate method/func.
rogpeppe
Mar 7, 2017
Owner
I just looked into factoring out the second half into its own function, but it doesn't work out well: we'd have to call it with the lock held (so asymmetrical locking), and the getWorker function would either need to be duplicated or passed as a parameter, neither of which I'm keen on.
Can you remind me of the suggestions/conversation around stopped, please?
mjs
Mar 7, 2017
The issue with the handling of mu and stopped in this method is that it's quite hard to reason about when the lock is held and how/when it's released. I feel like there has to be a better way to structure the code.
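For illustration, one way to make each lock's extent obvious is to confine the lock/unlock pair to a small helper. A sketch only, mirroring the diff lines above (it assumes stopped is a Runner field; in the diff it may be a local variable):

// markStopped records that the runner has stopped and wakes any
// goroutines waiting for the worker map to change.
func (runner *Runner) markStopped() {
	runner.mu.Lock()
	runner.stopped = true
	runner.mu.Unlock()
	runner.workersChangedCond.Broadcast()
}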
- info.start = req.start
- info.restartDelay = 0
+ logger.Debugf("start %q", req.id)
+ runner.startWorker(req)
+
+ // Wait long enough to be reasonably confident that the
+ // worker has been started and registered.
+ time.Sleep(10 * time.Millisecond)
mjs
Mar 7, 2017
No! This kind of thing always results in flaky tests.
A more robust approach is to poll for up to some much longer time (many seconds) until the desired result is seen.
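A sketch of that polling pattern (waitFor is a hypothetical helper, not from this change; it assumes the "testing" and "time" imports and a caller-supplied condition):

// waitFor polls cond every 10ms until it returns true, failing the
// test if the condition isn't met within the deadline.
func waitFor(t *testing.T, cond func() bool) {
	deadline := time.Now().Add(10 * time.Second)
	for time.Now().Before(deadline) {
		if cond() {
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
	t.Fatal("timed out waiting for condition")
}

The fixed 10ms sleep above would then become something like waitFor(t, func() bool { /* worker registered? */ }).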
rogpeppe
Mar 7, 2017
Owner
The result is the same whether we sleep for a long time or not - there's no externally visible difference and the test will pass either way.
We could insert test synchronisation points into the production code, I guess, but I'd prefer not.
If occasionally the code takes longer than 10ms to add the worker to the map, we'll just go through the slow path (the path that TestWorkerWithWorkerNotImmediatelyAvailable tests).
rogpeppe
Mar 7, 2017
Owner
Because we test a different path through the code (verifiable with go test -cover).
+ }()
+ // Sleep long enough that we're pretty sure that Worker
+ // will be blocked.
+ time.Sleep(10 * time.Millisecond)
rogpeppe
Mar 7, 2017
Owner
Same deal - there's nothing externally visible that tells us whether Worker has blocked or not.
mjs
approved these changes
Mar 7, 2017
On a re-read of the change I'm ok with it, as long as there's a comment explaining the relationship between the mutex and the cond variable.
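For reference, a sketch of the kind of comment being asked for (this assumes the cond variable is built over runner.mu with sync.NewCond; the actual field layout may differ):

type Runner struct {
	// mu guards the worker map and the stopped flag.
	// workersChangedCond's locker is mu (sync.NewCond(&runner.mu)):
	// it is broadcast whenever a worker is added to or removed
	// from the map, and Wait must be called with mu held.
	mu                 sync.Mutex
	workersChangedCond *sync.Cond
	// ...
}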
rogpeppe commented Mar 3, 2017
There are places in Juju that implement reasonably complicated wrappers to gain access to running workers. The Worker method provides access to the currently running worker within a Runner, which should reduce the need for such things.
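A hedged sketch of how a caller might use it (the exact signature, the abort parameter, and doSomethingWith are assumptions for illustration, not taken from the diff):

// abort is a caller-supplied channel that can cancel the wait
// (assumed); a nil channel would mean wait indefinitely.
w, err := runner.Worker("api", abort)
if err != nil {
	// For example, ErrStopped if the runner is no longer running.
	return err
}
// Use the running worker directly instead of a bespoke wrapper.
doSomethingWith(w)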