Remove and close pooled states when a model is destroyed #6408
perrito666
approved these changes
Oct 7, 2016
I like this PR a lot. Please get a second pair of eyes on it just to be sure, since it is a significant change.
| @@ -538,9 +544,33 @@ func (srv *Server) serveConn(wsConn *websocket.Conn, modelUUID string, apiObserv | ||
| conn := rpc.NewConn(codec, apiObserver) | ||
| - h, err := srv.newAPIHandler(conn, modelUUID, host) | ||
| + // Note that we don't overwrite modelUUID here because |
babbageclunk
Oct 10, 2016
Member
So should I change resolvedModelUUID to modelUUID and pull that section out of validateModelUUID?
| + st, err = srv.statePool.Get(resolvedModelUUID) | ||
| + } | ||
| + | ||
| + if err == nil { |
wallyworld
Oct 10, 2016
Owner
I think this would have been a little cleaner if it retained the newAPIHandler() func and had it return st as well as apiHandler.
babbageclunk
Oct 10, 2016
Member
The reason I merged it back in was because it felt weird to have the Get and Put in different scopes. If newAPIHandler returned an error for some reason (even though the Get succeeded) the Put would never be done, and that state would be leaked.
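The scoping argument above can be illustrated with a minimal sketch. Everything in it is hypothetical: `countingPool`, `serve`, and `errHandlerSetup` are stand-ins invented for illustration, not the real `StatePool` or `serveConn` code, and the sketch is not safe for concurrent use. It only shows that pairing the Get with a deferred Put in one function keeps the reference count balanced even when handler setup fails.

```go
package main

import (
	"errors"
	"fmt"
)

// errHandlerSetup is a hypothetical error used to simulate
// newAPIHandler failing after the Get has already succeeded.
var errHandlerSetup = errors.New("handler setup failed")

// countingPool is a hypothetical stand-in for the state pool. It only
// tracks outstanding references per model UUID.
type countingPool struct {
	refs map[string]int
}

// Get records a new reference and returns a placeholder "state".
func (p *countingPool) Get(modelUUID string) (string, error) {
	p.refs[modelUUID]++
	return "state-" + modelUUID, nil
}

// Put drops one reference, failing if none are outstanding.
func (p *countingPool) Put(modelUUID string) error {
	if p.refs[modelUUID] == 0 {
		return fmt.Errorf("no outstanding references for model %s", modelUUID)
	}
	p.refs[modelUUID]--
	return nil
}

// serve keeps Get and Put in the same scope: once Get succeeds, the
// deferred Put runs on every return path, including handler failure.
func serve(p *countingPool, modelUUID string, newHandler func(st string) error) error {
	st, err := p.Get(modelUUID)
	if err != nil {
		return err
	}
	defer p.Put(modelUUID)

	if err := newHandler(st); err != nil {
		return err
	}
	// ... serve the connection here ...
	return nil
}

func main() {
	p := &countingPool{refs: make(map[string]int)}
	err := serve(p, "deadbeef", func(string) error { return errHandlerSetup })
	fmt.Println("error:", err)
	fmt.Println("outstanding references:", p.refs["deadbeef"])
}
```

If the Get lived inside a helper and the Put in its caller, an error returned by the helper after a successful Get would skip the Put and leave the reference outstanding.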
| + case <-srv.tomb.Dying(): | ||
| + return tomb.ErrDying | ||
| + case modelIDs := <-w.Changes(): | ||
| + for _, modelID := range modelIDs { |
wallyworld
Oct 10, 2016
Owner
These are modelUUIDs. Can we rename the vars to avoid the notion that there may be such a thing as a modelID?
| + c.Assert(err, jc.ErrorIsNil) | ||
| + | ||
| + // Make a request for the model API to check it puts | ||
| + // state back into the pool once the connection is closed. |
wallyworld
Oct 10, 2016
Owner
Where is the check that it puts state back into the pool done? Is that expected to be part of this test?
babbageclunk
Oct 10, 2016
Member
Well, it couldn't be closed unless it had been put/released, so the test checks it indirectly - the refcount isn't exposed so it can't test it directly (although I could do that if you think it's worthwhile). Originally I had two tests, one that didn't hit the API and one that did, but the former was a subset of the latter so I deleted it.
| +// Put indicates that the client has finished using the State. If the | ||
| +// state has been marked for removal, it will be closed and removed | ||
| +// when the final Put is done. | ||
| +func (p *StatePool) Put(modelUUID string) error { |
wallyworld
Oct 10, 2016
Owner
Put() seems like a poor choice of name for this? It implies it should be the opposite of Get(), which returns a State.
Could we call it Release() ?
mjs
Oct 10, 2016
Contributor
-1 to Release. That implies to me that the call to Get granted exclusive access to the returned State which isn't the case.
Put confused me at first, but on reflection I don't think it's too bad.
wallyworld
Oct 10, 2016
Owner
I disagree :-)
What's wrong with Release() ? That is what is being done.
Put() really should be the opposite to Get() and in the PR it is not. It needs a different name to better reflect the semantics. Release() does that IMHO.
wallyworld
Oct 10, 2016
Owner
Actually, I liken Release() to operating on a wait group or something like that. Multiple processes that have previously got a resource each call Release() when done, and each Release() decrements the ref count. In the theme of wait groups, maybe Done() would be better? i.e. "I am done with this model UUID"
babbageclunk
Oct 10, 2016
Member
The reason I used put was to match the method names on sync.Pool. I don't mind Release (I think I toyed with that) - I can see your point that Put should really take the State. Happy to change it to Release if you guys agree. I don't think it implies exclusive access.
| + item, ok := p.pool[modelUUID] | ||
| + if !ok { | ||
| + // Don't require the client to keep track of what we've seen - | ||
| + // ignore unknown model ids. |
babbageclunk
Oct 10, 2016
Member
I mean, they are identifiers (which happen to be universally unique). I'm not sure every mention of them needs to highlight the universal part, given that people aren't digging around inside them. But I don't feel strongly about this, so I don't mind changing it if other people feel differently.
| c.Assert(err, jc.ErrorIsNil) | ||
| c.Assert(st2_, gc.Equals, st2) | ||
| } | ||
| func (s *statePoolSuite) TestGetWithControllerEnv(c *gc.C) { | ||
| - p := state.NewStatePool(s.State) | ||
| - defer p.Close() | ||
| - | ||
| // When a State for the controller env is requested, the same |
babbageclunk
Oct 10, 2016
Member
Do you mean the name of the test? Will change (and the comment too).
| c.Assert(err, jc.ErrorIsNil) | ||
| c.Assert(st2_, gc.Not(gc.Equals), st2) | ||
| } | ||
| + | ||
| +func (s *statePoolSuite) TestPutSystemState(c *gc.C) { | ||
| + // Doesn't maintain a refcount for the system state. |
babbageclunk
Oct 10, 2016
Member
I guess I'm not - really this is testing that it doesn't blow up when someone Puts the system state. I'll add a check that it's not closed.
| + c.Assert(err, gc.ErrorMatches, "unable to return unknown model deadbeef to the pool") | ||
| +} | ||
| + | ||
| +func (s *statePoolSuite) TestTooManyPuts(c *gc.C) { |
| + go func() { | ||
| + defer srv.wg.Done() | ||
| + srv.tomb.Kill(srv.processModelRemovals()) | ||
| + }() |
| + // time slices that the watcher uses for coalescing | ||
| + // events. Without it the model appears and disappears quickly | ||
| + // enough that it never generates a change from WatchModels. | ||
| + // Many Bothans died to bring us this information. |
| + time.Sleep(coretesting.ShortWait) |
mjs
Oct 10, 2016
Contributor
Would it be more reliable for the test to have its own model watcher from the same State instance that the apiserver is using? It could then watch for the new model to appear itself. I'm worried that on a heavily loaded test machine, the sleep might not be enough.
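A rough sketch of that suggestion follows, assuming the watcher delivers batches of model UUIDs on a channel the way `w.Changes()` does in the hunk under review. The helper `waitForModel` and the constants `shortWait` and `longWait` are hypothetical names for illustration, not part of juju's test helpers. The point is that the test blocks until the event it cares about arrives, with a generous timeout as a safety net, instead of sleeping for a fixed interval and hoping that was long enough.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical timeouts, loosely modelled on the idea of a short and a
// long test wait. The values are assumptions.
const (
	shortWait = 50 * time.Millisecond
	longWait  = 10 * time.Second
)

// waitForModel reads batches of model UUIDs from changes until want
// appears, the channel is closed, or the timeout expires.
func waitForModel(changes <-chan []string, want string, timeout time.Duration) error {
	deadline := time.After(timeout)
	for {
		select {
		case modelUUIDs, ok := <-changes:
			if !ok {
				return fmt.Errorf("watcher closed before model %s appeared", want)
			}
			for _, modelUUID := range modelUUIDs {
				if modelUUID == want {
					return nil
				}
			}
		case <-deadline:
			return fmt.Errorf("timed out waiting for model %s", want)
		}
	}
}

func main() {
	changes := make(chan []string, 2)
	changes <- []string{"other-model"}
	changes <- []string{"deadbeef"}
	// Returns as soon as the UUID is seen, however slow the machine is.
	fmt.Println("wait result:", waitForModel(changes, "deadbeef", longWait))
}
```

On a fast machine this returns almost immediately, and on a loaded one it simply waits longer, so the timeout only matters when something is actually broken.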
babbageclunk
added some commits
Oct 5, 2016
$$merge$$
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju
babbageclunk commented Oct 7, 2016
Investigating http://pad.lv/1625774 revealed that there were a lot of state instances (with their attendant goroutines) lingering after a model had been destroyed. They turned out to be held by the API server's StatePool.
Add StatePool.Remove and change the API server to watch for model removals and call it. This also required adding reference counting to the items in the pool - the State instances are designed to be shared between multiple requests, but we need to be sure that they aren't in use when they are closed. StatePool.Put was added so the API server could indicate that it's finished with a State.
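The reference counting described here can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the PR: `resource` stands in for a `*state.State`, and `Pool`, `poolItem`, and `maybeRemove` are hypothetical names. Refusing a Get for a model already marked for removal, and having Remove ignore unknown UUIDs, are also assumptions about the behaviour. The error text for an unknown model follows the message asserted in the test hunk quoted in the review.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical stand-in for a *state.State.
type resource struct {
	modelUUID string
	closed    bool
}

func (r *resource) Close() error {
	r.closed = true
	return nil
}

// poolItem tracks how many clients currently hold the resource and
// whether it should be closed once the last one is done.
type poolItem struct {
	res      *resource
	refs     int
	removing bool
}

// Pool shares one resource per model UUID between callers.
type Pool struct {
	mu    sync.Mutex
	items map[string]*poolItem
}

func NewPool() *Pool {
	return &Pool{items: make(map[string]*poolItem)}
}

// Get returns the shared resource for the model and adds a reference.
func (p *Pool) Get(modelUUID string) (*resource, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	item, ok := p.items[modelUUID]
	if ok && item.removing {
		// Assumption: no new references once removal is pending.
		return nil, fmt.Errorf("model %s is being removed", modelUUID)
	}
	if !ok {
		item = &poolItem{res: &resource{modelUUID: modelUUID}}
		p.items[modelUUID] = item
	}
	item.refs++
	return item.res, nil
}

// Put indicates that the caller has finished with the resource. If it
// has been marked for removal, the final Put closes and removes it.
func (p *Pool) Put(modelUUID string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	item, ok := p.items[modelUUID]
	if !ok {
		return fmt.Errorf("unable to return unknown model %s to the pool", modelUUID)
	}
	if item.refs == 0 {
		return fmt.Errorf("state pool refcount for model %s is already 0", modelUUID)
	}
	item.refs--
	return p.maybeRemove(modelUUID, item)
}

// Remove marks the resource for removal. It is closed immediately if
// unused, otherwise when the last reference is returned.
func (p *Pool) Remove(modelUUID string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	item, ok := p.items[modelUUID]
	if !ok {
		return nil // Assumption: unknown model UUIDs are ignored.
	}
	item.removing = true
	return p.maybeRemove(modelUUID, item)
}

// maybeRemove must be called with p.mu held.
func (p *Pool) maybeRemove(modelUUID string, item *poolItem) error {
	if item.removing && item.refs == 0 {
		delete(p.items, modelUUID)
		return item.res.Close()
	}
	return nil
}

func main() {
	p := NewPool()
	st, _ := p.Get("deadbeef")
	_ = p.Remove("deadbeef") // model destroyed while a request is in flight
	fmt.Println("closed after Remove:", st.closed)
	_ = p.Put("deadbeef") // request finishes
	fmt.Println("closed after final Put:", st.closed)
}
```

In this sketch Remove never closes a resource that is still referenced. It only sets a flag, and whichever of Remove or the last Put happens second performs the actual close.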
QA done: