Job's references are reported as Killed #821
Hi @schicky
There are about 3000 lines of errors, but there seem to be only two distinct ones ("Cannot change from ABORTED to SUCCEEDED" and "Cannot change from SUCCEEDED to RUNNING"). Here is a sample of each:
2014-06-04 18:49:33,072 ERROR WorkflowService - updateStateForStep([identifier:1/1, index:0, stepStateChange:StepStateChangeImpl{stepState=StepStateImpl{executionState=SUCCEEDED, metadata=null, errorMessage='null'}, nodeName='vqas2128a_tomcat_applyweb-alpha-a3', nodeState=true}, timestamp:Wed Jun 04 18:49:33 GMT 2014]): Cannot change from ABORTED to SUCCEEDED
014-06-04 18:49:47,892 ERROR WorkflowService - updateStateForStep([identifier:5/1, index:0, stepStateChange:StepStateChangeImpl{stepState=StepStateImpl{executionState=RUNNING, metadata=null, errorMessage='null'}, nodeName='mpdm0823a_tomcat', nodeState=true}, timestamp:Wed Jun 04 18:49:47 GMT 2014]): Cannot change from SUCCEEDED to RUNNING
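Both messages look like the output of a strict transition guard on a per-node step state: once a state is terminal (ABORTED, SUCCEEDED), any later update is rejected with an exception. The sketch below is a hypothetical model, not Rundeck's actual WorkflowService code; the names `ExecState`, `GuardedStepState`, `GuardDemo` and the transition table are assumptions chosen for illustration. It replays, deterministically, the kind of interleaving that could occur if two parallel executions update the same state object.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical step states; names mirror the log, the model itself is assumed.
enum ExecState { WAITING, RUNNING, SUCCEEDED, FAILED, ABORTED }

class GuardedStepState {
    // Assumed transition table: terminal states allow no further changes.
    private static final Map<ExecState, EnumSet<ExecState>> ALLOWED =
            new EnumMap<>(ExecState.class);

    static {
        ALLOWED.put(ExecState.WAITING, EnumSet.of(ExecState.RUNNING, ExecState.ABORTED));
        ALLOWED.put(ExecState.RUNNING,
                EnumSet.of(ExecState.SUCCEEDED, ExecState.FAILED, ExecState.ABORTED));
        ALLOWED.put(ExecState.SUCCEEDED, EnumSet.noneOf(ExecState.class));
        ALLOWED.put(ExecState.FAILED, EnumSet.noneOf(ExecState.class));
        ALLOWED.put(ExecState.ABORTED, EnumSet.noneOf(ExecState.class));
    }

    private ExecState current = ExecState.WAITING;

    ExecState current() { return current; }

    // Throws on any transition not in the table, with the same message
    // shape as the log lines above.
    void change(ExecState next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException(
                    "Cannot change from " + current + " to " + next);
        }
        current = next;
    }
}

class GuardDemo {
    // Applies updates in order to ONE shared state object, standing in for
    // updates arriving from different threads. Returns "ok" or the error text.
    static String replay(ExecState... updates) {
        GuardedStepState shared = new GuardedStepState();
        try {
            for (ExecState s : updates) {
                shared.change(s);
            }
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Execution A finishes the step, then execution B reports RUNNING.
        System.out.println(replay(ExecState.RUNNING, ExecState.SUCCEEDED, ExecState.RUNNING));
        // The step is marked ABORTED, then a late SUCCEEDED arrives.
        System.out.println(replay(ExecState.RUNNING, ExecState.ABORTED, ExecState.SUCCEEDED));
    }
}
```

Running `GuardDemo` prints `Cannot change from SUCCEEDED to RUNNING` followed by `Cannot change from ABORTED to SUCCEEDED`, matching the two error shapes in the log under the assumed model.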
Multiple threads might change the value of the same state object when running a node-step job reference in parallel. Don't throw an IllegalStateException.
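One way to read the commit message above is: serialize updates to the shared state object, and treat an update that is no longer valid as a late or duplicate report to be recorded, not as a fatal error. The sketch below is a hypothetical illustration of that idea, not the actual Rundeck patch; `RunState`, `TolerantStepState`, `TolerantDemo` and the transition rules are assumptions. It uses `synchronized` so that concurrent check-then-set updates cannot interleave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical step states (assumed, not Rundeck's own enum).
enum RunState {
    WAITING, RUNNING, SUCCEEDED, FAILED, ABORTED;

    boolean isTerminal() {
        return this == SUCCEEDED || this == FAILED || this == ABORTED;
    }
}

class TolerantStepState {
    private RunState current = RunState.WAITING;
    private final List<String> rejected = new ArrayList<>();

    // Returns true if the transition was applied, false if it was ignored.
    // synchronized: the check and the write happen atomically per object.
    synchronized boolean change(RunState next) {
        boolean allowed =
                (current == RunState.WAITING
                        && (next == RunState.RUNNING || next == RunState.ABORTED))
                || (current == RunState.RUNNING && next.isTerminal());
        if (!allowed) {
            // Late or duplicate update from another thread: record it, don't throw.
            rejected.add("Ignored change from " + current + " to " + next);
            return false;
        }
        current = next;
        return true;
    }

    synchronized RunState current() { return current; }

    synchronized List<String> rejected() { return new ArrayList<>(rejected); }
}

class TolerantDemo {
    public static void main(String[] args) throws InterruptedException {
        TolerantStepState shared = new TolerantStepState();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Eight parallel executions report progress on the same state object.
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> {
                shared.change(RunState.RUNNING);
                shared.change(RunState.SUCCEEDED);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Exactly one RUNNING and one SUCCEEDED are applied; the other 14 are ignored.
        System.out.println(shared.current());
        System.out.println(shared.rejected().size() + " late updates ignored");
    }
}
```

Whatever the thread interleaving, the final state here is `SUCCEEDED` and no exception escapes. The trade-off is that genuinely invalid updates are now only recorded, so whether to log them (and at what level) is a separate design decision.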
Jobs are being reported as "Killed" although they complete successfully. The same steps run on multiple nodes, and which nodes are affected appears random: it happens on about 5-10% of the nodes, and the parent job is then marked "Failed" even though all the desired actions completed. It is also not always the same job reference that has the problem. I believe the report is false, because the workflow continues even though it is configured to stop on any failure, and the actions are performed successfully on the node. So far the only correlation I've found is with job references; I haven't seen this behavior on jobs without job references. I'm also receiving a java.util.ConcurrentModificationException, but not on the same node that is reported as killed, so it may not be related.
I'm running Rundeck 2.1.0-1 on Tomcat with Java 1.7.0_45 on RHEL 6.4.