Random subset orchestrator doesn't work as expected #2472

Closed
pgressa opened this issue Apr 28, 2017 · 4 comments

pgressa (Contributor) commented Apr 28, 2017

Issue type: Bug report
My Rundeck details

  • Rundeck version: 2.7.3
  • Install type: war
  • OS name/version: CentOS 7
  • DB type/version: PostgreSQL

Expected Behavior
I have a job with two node steps and a node filter that matches 4 nodes. When using orchestrator type 'Random subset' with count parameter 1, I would expect that the job is executed on one randomly chosen node.

Actual Behavior
The first step is executed on one randomly chosen node, but the second step is executed on all four nodes.

How to reproduce Behavior
Create a job with two node steps and a node filter that matches four nodes, set the orchestrator to Random subset with the count parameter set to 1, and run it.
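To make the expected and actual behavior concrete, here is a toy Python model of the job described above. It is not Rundeck code: the node names, step names, and the `random_subset`, `expected_run`, and `reported_run` helpers are all hypothetical. `expected_run` picks the subset once and reuses it for every node step; `reported_run` reproduces the symptom, where only the first step is orchestrated and the second step runs everywhere.

```python
import random

# Hypothetical toy model of this report: two node steps and a node filter
# matching four nodes. None of these names are Rundeck APIs.
NODES = ["node1", "node2", "node3", "node4"]
STEPS = ["step1", "step2"]


def random_subset(nodes, count, rng):
    """Pick `count` distinct nodes at random (models the Random subset orchestrator)."""
    return rng.sample(nodes, count)


def expected_run(nodes, steps, count, rng):
    """Expected: the subset is chosen once and every node step uses it."""
    chosen = random_subset(nodes, count, rng)
    return {step: list(chosen) for step in steps}


def reported_run(nodes, steps, count, rng):
    """Reported: only the first step is orchestrated; later steps hit every node."""
    plan = {}
    for i, step in enumerate(steps):
        plan[step] = random_subset(nodes, count, rng) if i == 0 else list(nodes)
    return plan


rng = random.Random()
print(expected_run(NODES, STEPS, 1, rng))
print(reported_run(NODES, STEPS, 1, rng))
```

With count 1, `expected_run` maps both steps to the same single node, while `reported_run` maps the second step to all four nodes.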

calebcall commented Apr 28, 2017

We've noticed the same behavior recently. We are on the current release (2.8.2), also on CentOS 7, with a MySQL backend. Does the job where you see this have a workflow step as one of its steps? Our jobs that only have command steps behave as expected. When a job has a workflow step (we use one to check whether another job is running), all steps before the workflow step fail on the second slave, but all steps after the workflow step succeed.

calebcall commented Apr 28, 2017

We just set up a test job and could replicate this; the issue goes away when we remove the workflow step.

@gschueler gschueler added the bug label May 1, 2017

@gschueler gschueler added this to the 2.8.3 milestone May 1, 2017

gschueler (Member) commented Jun 30, 2017

Note: it seems there is a problem with the way the Orchestrator Node Dispatcher system interacts with the Workflow Strategy that is in use:

  • Node-first strategy: if the workflow contains any workflow steps, the orchestrator is not applied to any node steps that follow them.
  • Step-first strategy: the orchestrator is applied only to the first step.
  • Parallel/ruleset: the orchestrator is applied to every step, but not in a repeatable way (i.e. the Random Subset orchestrator selects a different subset each time).

So this is a deeper issue than I thought: right now, the only reliable use of the Orchestrator is a node-first strategy workflow that contains only node steps. 😞
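The "not repeatable" problem in the Parallel/ruleset case can be illustrated with a small Python sketch. This is a hypothetical model, not Rundeck's implementation: one random generator is created for the run and never re-seeded, and the orchestrator is re-applied for each of 50 steps. Because each application draws from the continuing random stream, different steps end up on different nodes even though the configuration (count 1) never changes.

```python
import random

# Hypothetical node set; not taken from Rundeck.
NODES = ["node1", "node2", "node3", "node4"]


def random_subset(nodes, count, rng):
    """Models an orchestrator that draws a fresh random subset on every call."""
    return rng.sample(nodes, count)


# One generator for the whole run, never re-seeded between applications,
# so each per-step application continues the same random stream.
rng = random.Random()
chosen_per_step = [random_subset(NODES, 1, rng)[0] for _ in range(50)]
print(sorted(set(chosen_per_step)))
```

Across 50 steps it is overwhelmingly likely that more than one distinct node is chosen, which is exactly the inconsistency described for the Random Subset orchestrator.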

gschueler (Member) commented Aug 2, 2017

Added a fix:

  • The orchestrator is now applied correctly to all node-dispatched steps. For the step-first strategy, this means it is applied multiple times in the workflow (once per step); for a node-first strategy that contains one or more workflow steps, it is applied to each sequence of node steps.
  • Because the orchestrator is applied multiple times, an orchestrator plugin SHOULD return the same sequence of nodes each time it is applied with the same configuration.
    • I modified the RandomSubset orchestrator to work this way: when it is applied multiple times within a workflow, the same random seed is used, so the same random sequence of nodes is produced.
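The seeded approach described in this fix can be sketched in Python. This is an illustration under assumptions, not the actual RandomSubset plugin: the comment does not say where Rundeck's seed comes from, so `execution_id` here is a hypothetical value that stays fixed for one execution. Sorting the nodes first is also my own choice, made so that the result does not depend on input order. `step_first` is a toy dispatch loop that re-applies the orchestrator once per step, as the step-first strategy now does.

```python
import random


def seeded_random_subset(nodes, count, seed):
    """Pick `count` nodes; the same seed always yields the same nodes."""
    # Sort so the result does not depend on the order the nodes arrive in.
    ordered = sorted(nodes)
    # A new generator per call, seeded identically, replays the same sequence.
    return random.Random(seed).sample(ordered, min(count, len(ordered)))


def step_first(steps, nodes, count, seed):
    """Toy step-first dispatch: the orchestrator is applied once per step."""
    plan = []
    for step in steps:
        for node in seeded_random_subset(nodes, count, seed):
            plan.append((step, node))
    return plan


# Hypothetical seed source: one value fixed for the whole execution.
execution_id = 12345
nodes = ["node1", "node2", "node3", "node4"]
print(step_first(["step1", "step2"], nodes, 1, execution_id))
```

Because every application rebuilds the generator from the same seed, each step in the plan targets the same node, which is the repeatability the fix requires.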

@gschueler gschueler closed this Aug 2, 2017
