
Terminating instances cause some status queries to hang #102

Closed
priteau opened this Issue · 6 comments

2 participants

Pierre Riteau Tim Freeman
Pierre Riteau
Owner

While an instance is being terminated, any global status query and any status query on that instance hang.

This is caused by the WorkspaceHomeImpl:destroy method taking a per-instance lock for the duration of the whole termination. This includes a lengthy call to the workspace control agent. The WorkspaceHomeImpl:find method also tries to take this lock and hangs while the termination is in progress.

Tim Freeman
Collaborator
timf commented

WSRF destroy semantics originally prohibited destroy from being asynchronous. Converting it to asynchronous (WSRF semantics are probably not important at this point) would probably be the best overall solution, but also the most work; consider the client half of that, too. Perhaps make it still appear synchronous to the caller whilst releasing the lock? Or allow read-only, lock-free access for status queries?

Pierre Riteau
Owner

Tim, thank you very much for taking the time to comment on this issue!

I've taken a simple approach where I release the lock during the call to workspace control. Would you mind reviewing it? It is commit dd41e55.

Tim Freeman
Collaborator
timf commented

I think if you remove the lock, other actions could proceed during the destroy. I can't dig into the code but to protect against this there would at least need to be a per-instance lock around a state change to "destroying" (if there is such an intermediate state, I can't remember) and then launch the long running task. That way, other attempts to alter the instance would see it as destroying and nothing could happen. Something like that, sorry I can't research it.
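A minimal sketch of the idea described above, assuming an intermediate "destroying" state exists. All names here (`InstanceSketch`, `State`, `tryAlter`, `slowAgentCall`) are hypothetical and not taken from the Nimbus code; the point is only that the per-instance lock guards the state change, not the long-running call.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; none of these names come from WorkspaceHomeImpl. */
class InstanceSketch {
    enum State { RUNNING, DESTROYING, DESTROYED }

    private final ReentrantLock lock = new ReentrantLock();
    private State state = State.RUNNING;

    /** Status queries hold the lock only long enough to read the state. */
    State status() {
        lock.lock();
        try {
            return state;
        } finally {
            lock.unlock();
        }
    }

    /** Stand-in for any other mutating action: refused once a destroy has begun. */
    boolean tryAlter() {
        lock.lock();
        try {
            return state == State.RUNNING;
        } finally {
            lock.unlock();
        }
    }

    /** Returns false if a destroy is already in progress or has finished. */
    boolean destroy(Runnable slowAgentCall) {
        lock.lock();
        try {
            if (state != State.RUNNING) {
                return false;
            }
            state = State.DESTROYING; // the state change happens under the lock
        } finally {
            lock.unlock();
        }

        // The lengthy call runs without the lock, so status() is not blocked.
        // (Error handling for a failing call is left out of this sketch.)
        slowAgentCall.run();

        lock.lock();
        try {
            state = State.DESTROYED;
        } finally {
            lock.unlock();
        }
        return true;
    }
}
```

With this shape, a status query issued during the long call sees `DESTROYING` immediately, and a concurrent attempt to alter or destroy the instance is turned away rather than queued.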

Pierre Riteau
Owner

I think that is what I am doing. I keep the lock for almost everything during the destroy, except what I think is calling workspace-control.

Tim Freeman
Collaborator
timf commented

Oh.. then that sounds OK.

Pierre Riteau
Owner

The aforementioned approach had some issues, so I reworked it by introducing an extra lock dedicated to destroy.

priteau closed this issue from a commit
Prevent destroy queries from blocking status queries
The destroy method in WorkspaceHomeImpl was taking a per-instance lock
for the whole duration of an instance termination. This blocked the find
method (called by --status queries) which tries to take the same lock.

This commit changes the locking code of destroy so that it is released
while making the lengthy call to the workspace control agent.  We also
add an additional instance-specific lock for destroy. This way, a second
call to destroy will block at the beginning. When this second call
eventually proceeds, it will not find the instance because it has been
removed (which is the current behavior).  It also prevents the remove
handler from being called concurrently with a destroy from another
workspace action (for instance at the end of a start).

Closes #102.
d2cd8cc
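The locking scheme described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from d2cd8cc: `HomeSketch`, `Entry`, `instanceLock`, `destroyLock` and `slowAgentCall` are hypothetical names, and the real WorkspaceHomeImpl is certainly more involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the two-lock approach; not the actual Nimbus code. */
class HomeSketch {
    static final class Entry {
        final ReentrantLock instanceLock = new ReentrantLock(); // shared with find
        final ReentrantLock destroyLock = new ReentrantLock();  // destroy only
    }

    private final Map<String, Entry> instances = new ConcurrentHashMap<>();

    void add(String id) {
        instances.put(id, new Entry());
    }

    /** Status path: needs only the instance lock, and only briefly. */
    boolean find(String id) {
        Entry e = instances.get(id);
        if (e == null) {
            return false;
        }
        e.instanceLock.lock();
        try {
            return instances.containsKey(id);
        } finally {
            e.instanceLock.unlock();
        }
    }

    /** Returns false when the instance is not (or no longer) present. */
    boolean destroy(String id, Runnable slowAgentCall) {
        Entry e = instances.get(id);
        if (e == null) {
            return false;
        }
        e.destroyLock.lock(); // a second destroy blocks here
        try {
            e.instanceLock.lock();
            try {
                if (!instances.containsKey(id)) {
                    return false; // removed by an earlier destroy
                }
                // short bookkeeping would happen here, under the instance lock
            } finally {
                e.instanceLock.unlock();
            }

            // Instance lock released: find() can proceed during the long call.
            slowAgentCall.run();

            e.instanceLock.lock();
            try {
                instances.remove(id);
            } finally {
                e.instanceLock.unlock();
            }
            return true;
        } finally {
            e.destroyLock.unlock();
        }
    }
}
```

The destroy-specific lock is what serializes concurrent destroys; the instance lock is held only around the short sections, which is why a status query from another thread no longer waits for the whole termination.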
priteau closed this in d2cd8cc