implement Runner.Worker method #2

Merged
rogpeppe merged 1 commit into juju:v1 on Mar 8, 2017

Conversation

rogpeppe (Owner) commented Mar 3, 2017

There are places in Juju that implement reasonably
complicated wrappers to gain access to running
workers. The Worker method provides access
to the currently running worker within a Runner,
which should reduce the need for such things.
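
A minimal usage sketch of what this enables (illustrative only: the import path and the Worker signature, an id plus a stop channel, are assumed from context rather than taken from the diff):

package example

import (
	worker "gopkg.in/juju/worker.v1" // import path assumed for illustration
)

// lookupAPIWorker shows the intended call pattern: instead of keeping a
// custom wrapper that tracks the running worker, callers ask the Runner
// for it directly and block until it is available or the runner stops.
func lookupAPIWorker(runner *worker.Runner, stop <-chan struct{}) (worker.Worker, error) {
	return runner.Worker("api", stop)
}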

mhilton approved these changes Mar 7, 2017

LGTM

runner.go
+ // run goroutine.
+
+ // start holds the function to create the worker.
+ // If this is nil,
mhilton (Member) commented Mar 7, 2017

please add a then clause.

runner.go
+
+// killWorkerUnlocked is like killWorker except that it expects
+// the runner.mu mutex to be held already.
+func (runner *Runner) killWorkerUnlocked(id string) {
mhilton (Member) commented Mar 7, 2017

this looks locked
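
For context (not something stated in this thread): the more common Go convention is the opposite suffix, where "Locked" marks the variant that expects the caller to already hold the mutex, which is presumably what prompted the quip. A sketch of that convention, illustrative only:

// killWorker takes the lock itself and delegates to the locked variant.
func (runner *Runner) killWorker(id string) {
	runner.mu.Lock()
	defer runner.mu.Unlock()
	runner.killWorkerLocked(id)
}

// killWorkerLocked expects runner.mu to be held by the caller.
func (runner *Runner) killWorkerLocked(id string) {
	// ... mutate state guarded by runner.mu ...
}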

runner.go
+
+ // finalError holds the error that will be returned
+ // when the runner finally exits.
+ finalError error
mjs commented Mar 7, 2017

Would you add some comments noting that isDying and finalError should only be touched by the run goroutine and/or things it calls?

rogpeppe (Owner) commented Mar 7, 2017

Added a comment to finalError. isDying already says "isDying is maintained by the run goroutine", and since it's not protected by a mutex, I think it should be clear that "maintained by" implies ownership and hence that it shouldn't be modified outside of that goroutine.

I'm open to alternative comment wordings.
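
One possible wording along those lines (illustrative only; not necessarily what was committed):

type Runner struct {
	// ...

	// finalError holds the error that will be returned when the
	// runner finally exits. Like isDying, it is maintained by the
	// run goroutine and must only be touched by that goroutine (or
	// functions it calls), which is why no mutex guards it.
	finalError error
}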

runner.go
select {
- case runner.startc <- startReq{id, startFunc}:
+ case runner.startc <- startReq{id, startFunc, reply}:
+ <-reply
mjs commented Mar 7, 2017

This naked channel read makes me nervous. Are you really sure there will always be a reply? Can it be inside a select which also checks the Tomb?

rogpeppe (Owner) commented Mar 7, 2017

It's OK because the startc channel is synchronous: if we succeed in sending on it, we know that the run goroutine has entered the startc arm of the select. That arm calls startWorker, which never blocks (trivially verifiable) and then closes the reply channel, so we can be completely sure we'll always get a reply.
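
A self-contained toy version of that argument (simplified names, not the PR's code): because the request channel is unbuffered, a successful send proves the receiver is already in the matching select arm, and that arm closes the reply channel without blocking, so the subsequent receive always completes.

package main

import "fmt"

type startReq struct {
	id    string
	reply chan struct{}
}

func main() {
	startc := make(chan startReq) // unbuffered: a send only succeeds while the run goroutine is receiving
	done := make(chan struct{})

	// The "run" goroutine: handles each request without blocking and
	// then closes reply, so a sender that got through always gets an answer.
	go func() {
		defer close(done)
		for req := range startc {
			// register the worker here (never blocks), then signal the caller
			close(req.reply)
		}
	}()

	reply := make(chan struct{})
	startc <- startReq{id: "w1", reply: reply}
	<-reply // guaranteed to return: the receiver holds the request and closes reply
	fmt.Println("worker registered")

	close(startc)
	<-done
}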

rogpeppe (Owner) commented Mar 7, 2017

I added a comment to that effect.

runner.go
+ // Worker should not block almost all of the time.
+ runner.mu.Lock()
+ stopped = true
+ runner.mu.Unlock()
mjs commented Mar 7, 2017

The handling of runner.mu and stopped in this func is somewhat hard to follow (it feels bug-prone). Can it be structured more clearly?

+ stopped = true
+ runner.mu.Unlock()
+ runner.workersChangedCond.Broadcast()
+ return nil, ErrStopped
mjs commented Mar 7, 2017

As discussed IRL, please explore options for making the synchronisation/handling of stopped clearer. Also consider making the second half a separate method/func.

rogpeppe (Owner) commented Mar 7, 2017

I just looked into factoring out the second half into its own function, but it doesn't work out so well - we'd have to call it with the lock held (so asymmetrical locking) and the getWorker function would either need to be duplicated or passed as a parameter, neither of which I'm that keen on.

Can you remind me of the suggestions/conversation around stopped, please?

mjs commented Mar 7, 2017

The issue with the handling of mu and stopped in this method is that it's quite hard to reason about when the lock is held and how/when it's released. I feel like there's got to be a better way to structure the code.
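
For what it's worth, one shape that tends to keep this sort of code easy to follow is pairing Lock/Unlock with defer and doing all the waiting in a single condition-variable loop. A sketch only, with field names taken or guessed from the diff above and the abort/tomb handling omitted:

// lookup is a hypothetical illustration, not the PR's implementation.
func (runner *Runner) lookup(id string) (Worker, error) {
	runner.mu.Lock()
	defer runner.mu.Unlock()
	for {
		if runner.stopped {
			return nil, ErrStopped
		}
		if w, ok := runner.workers[id]; ok { // workers map assumed for illustration
			return w, nil
		}
		// Wait atomically releases runner.mu and re-acquires it before
		// returning, so the checks above always run with the lock held.
		runner.workersChangedCond.Wait()
	}
}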

- info.start = req.start
- info.restartDelay = 0
+ logger.Debugf("start %q", req.id)
+ runner.startWorker(req)
mjs commented Mar 7, 2017

Nice idea to extract these out. run is now much more readable.

+
+ // Wait long enough to be reasonably confident that the
+ // worker has been started and registered.
+ time.Sleep(10 * time.Millisecond)
mjs commented Mar 7, 2017

No! This kind of thing always results in flaky tests.

A more robust approach is to poll for up to some much longer time (many seconds) until the desired result is seen.
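
A sketch of that style of test helper (illustrative; not part of the PR):

package runner_test

import (
	"testing"
	"time"
)

// waitFor polls cond until it returns true or the timeout expires,
// failing the test if the condition is never met. Polling with a long
// deadline avoids the flakiness of a single fixed sleep.
func waitFor(t *testing.T, timeout time.Duration, cond func() bool) {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if cond() {
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
	t.Fatalf("condition not met within %v", timeout)
}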

rogpeppe (Owner) commented Mar 7, 2017

The result is the same whether we sleep for a long time or not - there's no externally visible difference and the test will pass either way.

We could insert test synchronisation points into the production code, I guess, but I'd prefer not.

If occasionally the code takes longer than 10ms to add the worker to the map, we'll just go through the slow path (the path that TestWorkerWithWorkerNotImmediatelyAvailable tests).

mjs commented Mar 7, 2017

Hmmm, ok. If the test will pass without the sleep, why bother with it?

rogpeppe (Owner) commented Mar 7, 2017

Because we test a different path through the code (verifiable with go test -cover).

+ }()
+ // Sleep long enough that we're pretty sure that Worker
+ // will be blocked.
+ time.Sleep(10 * time.Millisecond)
mjs commented Mar 7, 2017

Again, polling here would be good (and elsewhere)

rogpeppe (Owner) commented Mar 7, 2017

Same deal - there's nothing externally visible that tells us whether Worker has blocked or not.

mjs approved these changes Mar 7, 2017

On a re-read of the change I'm ok with it, as long as there's a comment explaining the relationship between the mutex and the cond variable.
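
One way such a comment might read, together with the way a sync.Cond is usually tied to its mutex (illustrative wording only, not the committed text):

// workersChangedCond is used to wait for changes to the set of running
// workers. It shares runner.mu as its Locker: Wait releases runner.mu
// while blocked and re-acquires it before returning, and Broadcast is
// called after any change made while holding runner.mu.
runner.workersChangedCond.L = &runner.mu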

implement Runner.Worker method
There are places in Juju that implement reasonably
complicated wrappers to gain access to running
workers. The Worker method provides access
to the currently running worker within a Runner,
which should reduce the need for such things.

@rogpeppe rogpeppe merged commit c279cb0 into juju:v1 Mar 8, 2017
