Fix bug lp#1694734 #7552
|
!!build!! |
| @@ -42,26 +42,47 @@ func (r *actionsResolver) NextOp( | ||
| opFactory operation.Factory, | ||
| ) (operation.Operation, error) { | ||
| nextAction, err := nextAction(remoteState.Actions, localState.CompletedActions) | ||
| - if err != nil { | ||
| + if err != nil && err != resolver.ErrNoOperation { |
ExternalReality
Jun 26, 2017
•
Member
If there are no operations left to be run, we cannot return the error signaling that here; we must first check whether an action is already running (one that has been interrupted) before we declare that there is nothing to do.
wallyworld
Jun 26, 2017
Owner
I think we need to add a code comment with something like the above text
wallyworld
approved these changes
Jun 26, 2017
Just to check - the newly added tests fail without the code modifications being in place? And performing the QA steps produces the bad behaviour without the code modifications? And all other aspects of action execution work as expected with the code mods?
| @@ -42,26 +42,47 @@ func (r *actionsResolver) NextOp( | ||
| opFactory operation.Factory, | ||
| ) (operation.Operation, error) { | ||
| nextAction, err := nextAction(remoteState.Actions, localState.CompletedActions) | ||
| - if err != nil { | ||
| + if err != nil && err != resolver.ErrNoOperation { |
| return nil, err | ||
| } | ||
| switch localState.Kind { | ||
| case operation.RunHook: | ||
| // We can still run actions if the unit is in a hook error state. | ||
| - if localState.Step == operation.Pending { | ||
| + if localState.Step == operation.Pending && err != resolver.ErrNoOperation { |
wallyworld
Jun 26, 2017
Owner
If we are here, then either err == nil or err == ErrNoOperation
I think it would read much nicer to say
if localState.Step == operation.Pending && err == nil
|
!!build!! |
|
$$MERGE$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
ExternalReality commented Jun 25, 2017
Description of change
Why is this change needed?
Fixes the referenced bug.
In a nutshell, if the uniter were to crash while running an action, upon restart it would attempt to fail that action, regardless of what the controller thought. This was bad because the controller does not allow the failure of arbitrary actions (only pending ones), so the controller would deny the uniter's failure request for actions that it considered to be finished. This would cause the uniter to spin in a never-ending loop of failure request retries. A juju run "reboot" was a good way to potentially jam up the uniter in this fashion.
How do we verify that the change works?
It is difficult to verify this bug from the CLI since invoking its cause does not always lead to the error; it is quite non-deterministic. Concretely, running juju run "reboot" may or may not cause the issue to arise. However, changing the code so that the uniter "crashes" at specific points in its operation is a good way to exercise the error and thus test the solution. Read on...
Add the following code immediately before the uniter writes to its local state after executing an operation (action). At this PR's proposed commit point the line should be https://github.com/juju/juju/blob/develop/worker/uniter/operation/executor.go#L103
This will crash the uniter when running an action. The uniter should restart, realize that it shut down while running an action, and move back into a normal state of operation, no longer wanting to run the action that it was running when it crashed.
You may want to crash the uniter after it begins running an action but before it updates the controller's state. To do this you can put a panic in the appropriate spot in the code. Upon restart, the uniter should recover from this too, without attempting to run the action in question again, and move into a normal waiting state.
In summary, deploy a simple service:
make one of the code changes above (which help to reproduce the bug's error condition) and then run an action (try the other code change afterwards):
This will simulate a hard stop of the uniter in places that will cause the referenced bug's error. It will show that the uniter is now able to recover from such conditions.
Does it affect current user workflow? CLI? API?
No
Bug reference
lp#1694734