implement Runner.Worker method #2
Conversation
Compare 7448e46 to a3a99a5
LGTM
runner.go
Outdated
	// run goroutine.

	// start holds the function to create the worker.
	// If this is nil,
Please add a then clause.
runner.go
Outdated
	// killWorkerUnlocked is like killWorker except that it expects
	// the runner.mu mutex to be held already.
	func (runner *Runner) killWorkerUnlocked(id string) {
this looks locked
runner.go
Outdated
	// finalError holds the error that will be returned
	// when the runner finally exits.
	finalError error
Would you add some comments noting that isDying and finalError should only be touched by the run goroutine and/or things it calls?
Added a comment to finalError - isDying already says "isDying is maintained by the run goroutine", and I think that as it's not inside a mutex, it should be clear that "maintained by" implies ownership and hence that it shouldn't be modified outside of that goroutine.
I'm open to alternative comment wordings.
runner.go
Outdated
	select {
-	case runner.startc <- startReq{id, startFunc}:
+	case runner.startc <- startReq{id, startFunc, reply}:
+		<-reply
This naked channel read makes me nervous. Are you really sure there will always be a reply? Can it be inside a select which also checks the Tomb?
It's OK because the startc channel is synchronous so if we succeed in sending on it, we know that the run goroutine has entered the startc arm of the select, and that calls startWorker which never blocks (trivially verifiable) and then closes the reply channel, so we can be completely sure we'll always get a reply.
I added a comment to that effect.
runner.go
Outdated
	// Worker should not block almost all of the time.
	runner.mu.Lock()
	stopped = true
	runner.mu.Unlock()
The handling of runner.mu and stopped in this func is somewhat hard to follow (feels bug prone). Can it be structured to be clearer?
	info.start = req.start
	info.restartDelay = 0
	logger.Debugf("start %q", req.id)
	runner.startWorker(req)
Nice idea to extract these out. run is now much more readable.
	stopped = true
	runner.mu.Unlock()
	runner.workersChangedCond.Broadcast()
	return nil, ErrStopped
As discussed IRL, please explore options for making the synchronising/handling of stopped clearer. Also consider making the second half a separate method/func.
I just looked into factoring out the second half into its own function, but it doesn't work out so well - we'd have to call it with the lock held (so asymmetrical locking) and the getWorker function would either need to be duplicated or passed as a parameter, neither of which I'm that keen on.
Can you remind me of the suggestions/conversation around stopped, please?
The issue with the handling of the mu and stopped in this method is that it's quite hard to reason about when the lock is locked and how/when it's unlocked. I feel like there's got to be a better way to structure the code.
	// Wait long enough to be reasonably confident that the
	// worker has been started and registered.
	time.Sleep(10 * time.Millisecond)
No! This kind of thing always results in flaky tests.
A more robust approach is to poll for up to some much longer time (many seconds) until the desired result is seen.
The result is the same whether we sleep for a long time or not - there's no externally visible difference and the test will pass either way.
We could insert test synchronisation points into the production code, I guess, but I'd prefer not.
If occasionally the code takes longer than 10ms to add the worker to the map, we'll just go through the slow path (the path that TestWorkerWithWorkerNotImmediatelyAvailable tests).
Hmmm, OK. If the test will pass without the sleep, why bother with it?
Because we test a different path through the code (verifiable with go test -cover).
	}()
	// Sleep long enough that we're pretty sure that Worker
	// will be blocked.
	time.Sleep(10 * time.Millisecond)
Again, polling here would be good (and elsewhere).
Same deal - there's nothing externally visible that tells us whether Worker has blocked or not.
On a re-read of the change I'm ok with it, as long as there's a comment explaining the relationship between the mutex and the cond variable.
There are places in Juju that implement reasonably complicated wrappers to gain access to running workers. The Worker method provides access to the currently running worker within a Runner, which should reduce the need for such things.