Terminating instances cause some status queries to hang #102

priteau opened this Issue Jun 22, 2012 · 6 comments



Nimbus member

While an instance is being terminated, any global status query and any status query on that instance hang.

This is caused by the WorkspaceHomeImpl:destroy method taking a per-instance lock for the duration of the whole termination. This includes a lengthy call to the workspace control agent. The WorkspaceHomeImpl:find method also tries to take this lock and hangs while the termination is in progress.
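The blocking pattern can be reproduced in isolation. The following is a minimal sketch, not the Nimbus code: `BlockingDestroySketch`, its method signatures, and the sleep that stands in for the workspace control call are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the locking pattern described above.
class BlockingDestroySketch {
    private final ReentrantLock instanceLock = new ReentrantLock();

    // Holds the per-instance lock for the whole termination,
    // including the slow call (simulated here with a sleep).
    void destroy(long agentCallMillis, CountDownLatch lockHeld) throws InterruptedException {
        instanceLock.lock();
        try {
            lockHeld.countDown();          // signal that the lock is now held
            Thread.sleep(agentCallMillis); // stands in for the workspace control call
        } finally {
            instanceLock.unlock();
        }
    }

    // A status query needs the same lock. A timeout is used here so the
    // demo terminates; a plain lock() would simply hang.
    boolean find(long timeoutMillis) throws InterruptedException {
        if (!instanceLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        instanceLock.unlock();
        return true;
    }

    // Returns {answered during destroy, answered after destroy}.
    static boolean[] demo() {
        BlockingDestroySketch home = new BlockingDestroySketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread destroyer = new Thread(() -> {
            try {
                home.destroy(500, lockHeld);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            destroyer.start();
            lockHeld.await();               // wait until destroy holds the lock
            boolean during = home.find(50); // times out while destroy is running
            destroyer.join();
            boolean after = home.find(50);  // succeeds once the lock is free
            return new boolean[] { during, after };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the status query only gets an answer once the simulated termination has finished, which matches the hang seen with `--status`.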

@priteau priteau was assigned Jun 22, 2012
Nimbus member

WSRF destroy semantics originally prohibited destroy from being asynchronous. Converting it to asynchronous (WSRF semantics are probably not important at this point) would probably be the best overall solution, but it is also the most work; consider the client half of that, too. Perhaps make it still appear synchronous to the caller while releasing the lock? Or allow read-only, lock-free access for status queries?

Nimbus member

Tim, thank you very much for taking the time to comment on this issue!

I've taken a simple approach where I release the lock during the call to workspace control. Would you mind reviewing it? It is commit dd41e55173542e654e8dfa92763e1975794d36b6.

Nimbus member

I think if you remove the lock, other actions could proceed during the destroy. I can't dig into the code but to protect against this there would at least need to be a per-instance lock around a state change to "destroying" (if there is such an intermediate state, I can't remember) and then launch the long running task. That way, other attempts to alter the instance would see it as destroying and nothing could happen. Something like that, sorry I can't research it.
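A rough sketch of that idea, assuming an intermediate state exists: the state names, the class `DestroyingStateSketch`, and the `Runnable` standing in for the long-running task are hypothetical and not taken from the Nimbus code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: mark the instance as DESTROYING under the
// per-instance lock, then run the long task without holding the lock.
class DestroyingStateSketch {
    enum State { RUNNING, DESTROYING, DESTROYED }

    private final ReentrantLock instanceLock = new ReentrantLock();
    private State state = State.RUNNING;

    // Returns false if the instance is already being destroyed or is gone.
    boolean destroy(Runnable slowAgentCall) {
        instanceLock.lock();
        try {
            if (state != State.RUNNING) {
                return false;           // another action already claimed it
            }
            state = State.DESTROYING;   // short critical section
        } finally {
            instanceLock.unlock();
        }

        slowAgentCall.run();            // long-running task, lock not held

        instanceLock.lock();
        try {
            state = State.DESTROYED;
        } finally {
            instanceLock.unlock();
        }
        return true;
    }

    // Status queries only need the lock briefly, so they are not
    // held up by the long-running task.
    State status() {
        instanceLock.lock();
        try {
            return state;
        } finally {
            instanceLock.unlock();
        }
    }
}
```

With this shape, a status query issued during the long task reports `DESTROYING`, and a second attempt to alter the instance is rejected rather than racing the first.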

Nimbus member

I think that is what I am doing. I keep the lock for almost everything during the destroy, except what I think is calling workspace-control.

Nimbus member

Oh.. then that sounds OK.

Nimbus member

The aforementioned approach had some issues, so I reworked it by introducing an extra lock dedicated to destroy.

@priteau priteau added a commit that closed this issue Jul 9, 2012
@priteau priteau Prevent destroy queries from blocking status queries
The destroy method in WorkspaceHomeImpl was taking a per-instance lock
for the whole duration of an instance termination. This blocked the find
method (called by --status queries) which tries to take the same lock.

This commit changes the locking code of destroy so that it is released
while making the lengthy call to the workspace control agent.  We also
add an additional instance-specific lock for destroy. This way, a second
call to destroy will block at the beginning. When this second call
eventually proceeds, it will not find the instance because it has been
removed (which is the current behavior). It also prevents the remove
handler from being called concurrently with a destroy from another workspace
action (for instance at the end of a start).

Closes #102.
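The locking scheme described in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the contents of the commit: the class `TwoLockDestroySketch`, the integer ids, the map of instances, and the `Runnable` standing in for the workspace control call are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical two-lock layout: destroyLock serializes destroy calls,
// instanceLock is shared with find and released during the slow call.
class TwoLockDestroySketch {
    static final class Instance {
        final ReentrantLock instanceLock = new ReentrantLock();
        final ReentrantLock destroyLock = new ReentrantLock();
    }

    private final Map<Integer, Instance> instances = new ConcurrentHashMap<>();

    void add(int id) {
        instances.put(id, new Instance());
    }

    // Status query: takes only the per-instance lock, and only briefly.
    boolean find(int id) {
        Instance inst = instances.get(id);
        if (inst == null) {
            return false;
        }
        inst.instanceLock.lock();
        try {
            return instances.containsKey(id);
        } finally {
            inst.instanceLock.unlock();
        }
    }

    // Returns false when the instance is not found (already removed).
    boolean destroy(int id, Runnable slowAgentCall) {
        Instance inst = instances.get(id);
        if (inst == null) {
            return false;
        }
        inst.destroyLock.lock();            // a second destroy blocks here
        try {
            inst.instanceLock.lock();
            try {
                if (!instances.containsKey(id)) {
                    return false;           // removed by an earlier destroy
                }
            } finally {
                inst.instanceLock.unlock(); // released before the slow call
            }

            slowAgentCall.run();            // lengthy call, find is not blocked

            inst.instanceLock.lock();
            try {
                instances.remove(id);
            } finally {
                inst.instanceLock.unlock();
            }
            return true;
        } finally {
            inst.destroyLock.unlock();
        }
    }
}
```

In this sketch a `find` issued from another thread during the slow call returns immediately, while a second `destroy` waits on `destroyLock` and then reports the instance as not found.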
@priteau priteau closed this in d2cd8cc Jul 9, 2012