
implement Runner.Worker method #2

Merged: 1 commit into juju:v1 from 002-get-worker on Mar 8, 2017

Conversation

@rogpeppe (Contributor) commented Mar 3, 2017

There are places in Juju that implement reasonably
complicated wrappers to gain access to running
workers. The Worker method provides access
to the currently running worker within a Runner,
which should reduce the need for such things.

@rogpeppe rogpeppe force-pushed the 002-get-worker branch 4 times, most recently from 7448e46 to a3a99a5 on March 7, 2017 14:50

@mhilton mhilton left a comment


LGTM

runner.go Outdated
// run goroutine.

// start holds the function to create the worker.
// If this is nil,

please add a then clause.

runner.go Outdated

// killWorkerUnlocked is like killWorker except that it expects
// the runner.mu mutex to be held already.
func (runner *Runner) killWorkerUnlocked(id string) {

this looks locked

runner.go Outdated

// finalError holds the error that will be returned
// when the runner finally exits.
finalError error

Would you add some comments noting that isDying and finalError should only be touched by the run goroutine and/or things it calls?

rogpeppe (author) replied:

Added a comment to finalError - isDying already says "isDying is maintained by the run goroutine", and I think that as it's not inside a mutex, it should be clear that "maintained by" implies ownership and hence that it shouldn't be modified outside of that goroutine.

I'm open to alternative comment wordings.

runner.go Outdated
 select {
-case runner.startc <- startReq{id, startFunc}:
+case runner.startc <- startReq{id, startFunc, reply}:
+	<-reply

This naked channel read makes me nervous. Are you really sure there will always be a reply? Can it be inside a select which also checks the Tomb?

rogpeppe (author) replied:

It's OK because the startc channel is synchronous so if we succeed in sending on it, we know that the run goroutine has entered the startc arm of the select, and that calls startWorker which never blocks (trivially verifiable) and then closes the reply channel, so we can be completely sure we'll always get a reply.

rogpeppe (author) replied:

I added a comment to that effect.

runner.go Outdated
// Worker should not block almost all of the time.
runner.mu.Lock()
stopped = true
runner.mu.Unlock()

The handling of runner.mu and stopped in this func is somewhat hard to follow (it feels bug-prone). Can it be structured to be clearer?

info.start = req.start
info.restartDelay = 0
logger.Debugf("start %q", req.id)
runner.startWorker(req)

Nice idea to extract these out. run is now much more readable.

stopped = true
runner.mu.Unlock()
runner.workersChangedCond.Broadcast()
return nil, ErrStopped

As discussed IRL, please explore options for making the synchronising/handling stopped clearer. Also consider making the second half a separate method/func.

rogpeppe (author) replied:

I just looked into factoring out the second half into its own function, but it doesn't work out so well - we'd have to call it with the lock held (so asymmetrical locking) and the getWorker function would either need to be duplicated or passed as a parameter, neither of which I'm that keen on.

Can you remind me of the suggestions/conversation around stopped, please?

Reviewer replied:

The issue with the handling of mu and stopped in this method is that it's quite hard to reason about when the lock is held and how/when it's released. I feel like there's got to be a better way to structure the code.


// Wait long enough to be reasonably confident that the
// worker has been started and registered.
time.Sleep(10 * time.Millisecond)

No! This kind of thing always results in flaky tests.

A more robust approach is to poll for up to some much longer time (many seconds) until the desired result is seen.

rogpeppe (author) replied:

The result is the same whether we sleep for a long time or not - there's no externally visible difference and the test will pass either way.

We could insert test synchronisation points into the production code, I guess, but I'd prefer not.

If occasionally the code takes longer than 10ms to add the worker to the map, we'll just go through the slow path (the path that TestWorkerWithWorkerNotImmediatelyAvailable tests).

Reviewer replied:

Hmm, OK. If the test will pass without the sleep, why bother with it?

rogpeppe (author) replied:

Because we test a different path through the code (verifiable with go test -cover).

}()
// Sleep long enough that we're pretty sure that Worker
// will be blocked.
time.Sleep(10 * time.Millisecond)

Again, polling here would be good (and elsewhere)

rogpeppe (author) replied:

Same deal - there's nothing externally visible that tells us whether Worker has blocked or not.


@mjs mjs left a comment


On a re-read of the change I'm ok with it, as long as there's a comment explaining the relationship between the mutex and the cond variable.

@rogpeppe rogpeppe merged commit c279cb0 into juju:v1 Mar 8, 2017