
Feature request: support parallel fork join for remote jobs #1710

Closed · updong opened this issue Aug 6, 2019 · 8 comments

updong commented Aug 6, 2019

What happened:
I found out that I cannot use AND conditions on remote jobs:

"sd@223215:qa-minos-functional-test-us" fails to match the required pattern: /^~(sd@\d+:[\w-]+|(pr|commit|release|tag)(:(.+))?)$/,

and found out that SD doesn't support parallel fork-join for remote jobs.

What you expected to happen:
Something like this:

A job that can wait for multiple remote jobs with an AND condition:

  requires:
      - sd@123:A
      - sd@123:B
      - sd@456:C
      - sd@456:D
      - E

This job will wait for A, B, C, D, and E, which run in parallel, and it gets triggered when they are all successful.

How to reproduce it:
Try adding a remote job in the requires field without the ~ prefix.

@jithine added the "workflow" and "feature" labels Aug 6, 2019
vnugopal commented Aug 8, 2019

@DekusDenial

@jithine added this to Backlog in Active Work Sep 6, 2019
@jithine moved this from Backlog to Doing in Active Work Sep 25, 2019
@d2lam self-assigned this Sep 25, 2019

d2lam commented Sep 27, 2019

This can be implemented only if sd@123:A, sd@123:B, sd@456:C, sd@456:D, E all start from a single point, which will look like this:
[Attached image: "Image from iOS (2)"]

Without this constraint, it will run into race conditions and inconsistencies. For example, say sd@123:A ran a week ago and B, C, D, E run now: should F start or not? It's also nearly impossible to implement, since there is no way for F to track whether A, B, C, D, E finished in the same run, because they can belong to different events, and those events' end times can be minutes or hours apart. How do we know which ones are supposed to be grouped together? This is probably why Jenkins' Join plugin also relies on this constraint.

I assume this was your intention as well, since you mentioned "A B C D and E which runs in parallel" @updong

I can start on a design doc for how we would implement this, and people can review it.

updong commented Sep 27, 2019

@d2lam Yes, I forgot to mention that constraint. I was assuming that constraint would be enforced: they need to be triggered by the same event.

d2lam commented Oct 1, 2019

DESIGN

  • Store any relationship that is external in the Trigger table.
  • For every finished job, check both places (the workflow graph and triggerFactory) to figure out the next jobs.
  • For every next job, figure out whether it can run by building a joinList, which contains the upstream jobs that need to finish before this job can run.
  • To build this list, look at its own workflow graph and triggerFactory.
  • If joinList is empty and the next job is external, set upstreamEventId to the current eventId.
  • If joinList is not empty:
    • If the current job is the FIRST job in the joinList to finish:
      • If the next job belongs to the same pipeline, create a build with the same eventId (similar to before).
      • If the next job is external, create a build with the same eventId as upstreamEventId.
    • If it's the LAST job in the joinList to finish, start the build.
    • Otherwise, do nothing. (A rough sketch of this decision logic follows below.)
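
A rough sketch of that decision in TypeScript. The helper names (graphNext, triggerNext, joinListFor, findBuild, createBuild, startBuild) are hypothetical stand-ins, not the actual Screwdriver API, and external-event handling is simplified to reuse the finished build's eventId:

type JobRef = string; // e.g. 'D' (internal) or 'sd@123:B' (external)

interface BuildInfo {
  job: JobRef;
  eventId: number;
  status: 'CREATED' | 'RUNNING' | 'SUCCESS' | 'FAILURE';
}

interface Sources {
  graphNext(job: JobRef): JobRef[];    // next jobs from the own workflow graph
  triggerNext(job: JobRef): JobRef[];  // next jobs from triggerFactory
  joinListFor(job: JobRef): JobRef[];  // upstream jobs the next job must wait for
  findBuild(job: JobRef, eventId: number): BuildInfo | undefined;
  createBuild(job: JobRef, eventId: number): BuildInfo;
  startBuild(build: BuildInfo): void;
}

function onJobFinished(finished: BuildInfo, src: Sources): void {
  // Check both places to figure out the next jobs.
  const nextJobs = [...src.graphNext(finished.job), ...src.triggerNext(finished.job)];

  for (const next of nextJobs) {
    const joinList = src.joinListFor(next);

    if (joinList.length === 0) {
      // No join: create and start right away (for an external next job, the
      // new event would record upstreamEventId = finished.eventId).
      src.startBuild(src.createBuild(next, finished.eventId));
      continue;
    }

    // Join: the FIRST finisher creates the build, the LAST finisher starts it.
    const build =
      src.findBuild(next, finished.eventId) ?? src.createBuild(next, finished.eventId);
    const allDone = joinList.every(
      (j) => src.findBuild(j, finished.eventId)?.status === 'SUCCESS'
    );

    if (allDone) {
      src.startBuild(build);
    }
    // Otherwise: do nothing and wait for the remaining joinList builds.
  }
}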

Example

Let's simplify the above example and use: A -> [sd@123:B, sd@456:C, D] -> E
A and E belong to sd@789

  • From sd@789's workflow graph it will capture A -> D -> E
  • Trigger table will need to store these relationships, since they are not captured by sd@789's own workflow graph:

| src | dest |
| --- | --- |
| sd@789:A | sd@123:B |
| sd@789:A | sd@456:C |
| sd@123:B | sd@789:E |
| sd@456:C | sd@789:E |

  • When A is done, it figures out the next jobs to run:
    • From its workflow graph: D
    • From triggerFactory: [sd@123:B, sd@456:C]
  • Assume [sd@123:B, sd@456:C, D] don't need any other jobs besides A, so their joinList is empty. They can go ahead and start.
  • Let's say A has eventId = 1; sd@123:B and sd@456:C will have upstreamEventId = 1 since they are external to A.
  • Let's say sd@123:B finishes first. It checks both places to figure out the next job:
    • From its workflow graph: nothing
    • From triggerFactory: sd@789:E
  • Check if sd@789:E requires a join. Build the joinList for sd@789:E:
    • From its workflow graph: D
    • From triggerFactory: sd@123:B, sd@456:C
  • It will then look at the builds in the joinList with upstreamEventId = 1 and see if they have all finished.
  • Since they haven't, it creates sd@789:E with the same eventId as upstreamEventId. So now E and A belong to the same event.
  • When D finishes, it does not do anything, since sd@789:E is already created and D is not the last build in the joinList.
  • When sd@456:C finishes, it is the last build in the joinList to finish, so it starts sd@789:E.

TASKS

  • Data-schema changes the external trigger regex to accept sd@pipelineId:jobName (see the regex sketch below)
  • Config-parser is updated with the new data-schema
  • Workflow-parser needs to handle getNextJobs and getSrcFromJoin accordingly
  • The API's build plugin handles the new join logic
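
For the data-schema item, the error message in the original report shows the current pattern; below is an illustrative relaxation (an assumption, the real data-schema change may differ) that also accepts a bare sd@pipelineId:jobName without the ~ prefix:

// Current trigger pattern, taken from the error message above.
const CURRENT_TRIGGER = /^~(sd@\d+:[\w-]+|(pr|commit|release|tag)(:(.+))?)$/;

// One possible relaxed pattern: remote jobs may appear with or without ~,
// everything else keeps requiring ~.
const RELAXED_TRIGGER = /^(~?sd@\d+:[\w-]+|~(pr|commit|release|tag)(:(.+))?)$/;

console.log(CURRENT_TRIGGER.test('sd@223215:qa-minos-functional-test-us')); // false
console.log(RELAXED_TRIGGER.test('sd@223215:qa-minos-functional-test-us')); // true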

d2lam commented Oct 4, 2019

DESIGN (REVISED 10/4/2019)

The above implementation will work for the backend. However, the UI will need to query the trigger table to show each external node. Even though that's similar to what we are doing now, the current UI turns OFF displaying external triggers by default; with this feature implemented, displaying external triggers would need to be always turned ON, otherwise it will be very confusing to users. Therefore, I have revised the implementation design so that the load on the UI is not too high.

  • workflow-parser now generates the relationships to downstream jobs, in addition to the relationships in the yaml.
  • When each job finishes, it will figure out the next jobs. Since the workflow graph now includes both internal & external edges, we can figure out the next jobs by looking at the workflow graph.
  • Before each job starts, it needs to check the joinList, which is the list of builds that the next job depends on.

Added 10/7:

  • To compute the status of the builds in the joinList, look at the upstream field.
  • When a build starts, it should store an additional field, upstream (roughly { pipelineId: upstreamEventId }), in the following form:

{
    ${pipelineId}: {
        "eventId": ${eventId},
        ${jobName}: ${buildId}
    }
}

  • The field is stored in this format to allow faster lookups (a small sketch of these lookups follows this list):
    • To check if a job in the joinList has started: check if upstream.pipelineId.jobName exists (e.g. if sd@123:deploy is in the joinList, check if upstream.123.deploy exists).
    • To check status: make an API call to check the build status of upstream.pipelineId.jobName (e.g. upstream.123.deploy).
    • To merge back into the original event: check if upstream.currentPipelineId exists. If yes, create the build under the event upstream.currentPipelineId.eventId. If not, create a new event.
      - In other words, the build's eventId can be computed by checking whether the current pipelineId is in the upstream list; if it is, use that eventId.
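
A minimal sketch of those lookups, assuming the upstream shape above; the helper names and values are hypothetical:

type Upstream = Record<string, { eventId: number } & Record<string, number>>;

// Has a joinList member sd@<pipelineId>:<jobName> already started?
function hasStarted(upstream: Upstream, pipelineId: number, jobName: string): boolean {
  return upstream[pipelineId]?.[jobName] !== undefined;
}

// Merging back into the original event: reuse its eventId if the current
// pipeline appears in upstream, otherwise a new event is needed (null).
function eventIdForNextBuild(upstream: Upstream, currentPipelineId: number): number | null {
  return upstream[currentPipelineId]?.eventId ?? null;
}

// Illustrative values only: sd@123:deploy is in the joinList and has build 42.
const upstream: Upstream = { 123: { eventId: 7, deploy: 42 } };
console.log(hasStarted(upstream, 123, 'deploy'));  // true
console.log(eventIdForNextBuild(upstream, 123));   // 7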

EXAMPLE & IMPLEMENTATION:

For example (current pipeline ID = 999):
[Attached image: "Image from iOS (3)"]

Previously, the workflow graph would be:

{
    "nodes": [
        { "name": "~pr" },
        { "name": "~commit" },
        { "name": "A" },
        { "name": "D" },
        { "name": "G" }
    ],
    "edges": [
        { "src": "A", "dest": "D" },
        { "src": "D", "dest": "G" }
    ]
}

Now, it should be:

{
    "nodes": [
        { "name": "~pr" },
        { "name": "~commit" },
        { "name": "A" },
        { "name": "D" },
        { "name": "G" },
        { "name": "sd@111:B" },
        { "name": "sd@222:C" },
        { "name": "sd@333:E" },
        { "name": "sd@444:F" }
    ],
    "edges": [
        { "src": "A", "dest": "sd@111:B" },
        { "src": "A", "dest": "sd@222:C" },
        { "src": "A", "dest": "D" },
        { "src": "sd@111:B", "dest": "sd@333:E" },
        { "src": "sd@222:C", "dest": "sd@444:F" },
        { "src": "sd@333:E", "dest": "G", "join": true },
        { "src": "sd@444:F", "dest": "G", "join": true },
        { "src": "D", "dest": "G", "join": true }
    ]
}

To do this, we need to pass the Trigger Factory from the API -> config-parser -> workflow-parser. Inside workflow-parser, we compute the external nodes & edges by doing a DFS traversal through each job's downstream jobs (looking this information up in the trigger table).
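
A rough sketch of that traversal; the externalDownstream lookup stands in for the trigger-table query and is an assumption, not the actual workflow-parser interface:

interface Graph {
  nodes: { name: string }[];
  edges: { src: string; dest: string; join?: boolean }[];
}

function addExternalTriggers(
  graph: Graph,
  externalDownstream: (job: string) => string[] // e.g. 'A' -> ['sd@111:B', 'sd@222:C']
): Graph {
  const seen = new Set(graph.nodes.map((n) => n.name));
  const stack = [...seen]; // start the DFS from every job already in the graph

  while (stack.length > 0) {
    const job = stack.pop()!;
    for (const dest of externalDownstream(job)) {
      graph.edges.push({ src: job, dest });
      if (!seen.has(dest)) {
        seen.add(dest);
        graph.nodes.push({ name: dest });
        stack.push(dest); // keep following downstream jobs of downstream jobs
      }
    }
  }
  return graph;
}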

getNextJobs will now return both internal and external triggers (see the sketch after these examples):

  • When A finishes, it will trigger [sd@111:B, sd@222:C, D].
  • When sd@111:B finishes, it will trigger [sd@333:E]
    ....
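
Illustrative only (the real workflow-parser getNextJobs signature differs): once the graph carries external edges, the next jobs fall out of a simple edge scan:

const edges = [
  { src: 'A', dest: 'sd@111:B' },
  { src: 'A', dest: 'sd@222:C' },
  { src: 'A', dest: 'D' },
  { src: 'sd@111:B', dest: 'sd@333:E' },
];

const getNextJobs = (job: string): string[] =>
  edges.filter((e) => e.src === job).map((e) => e.dest);

console.log(getNextJobs('A'));        // [ 'sd@111:B', 'sd@222:C', 'D' ]
console.log(getNextJobs('sd@111:B')); // [ 'sd@333:E' ]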

Store an upstream field on each build. It will look like this:

| job | eventID | upstream |
|---| ---| --- |
| A | 1 | {} |
| D | 1 | {999:1} |
| sd@111:B | 2 | {999:1} |
| sd@222:C | 3 | {999:1} |
| sd@333:E | 4 | {999:1, 111:2} |
| sd@444:F | 5 | {999:1, 222:3} |
| G | 1 | {999:1, 111:2, 222:3, 333:4, 444:5} |

Modified (10/7):

| job | eventID | buildID | upstream |
| --- | --- | --- | --- |
| A | 1 | 11 | {} |
| D | 1 | 12 | {999: {eventId: 1, A: 11}} |
| sd@111:B | 2 | 13 | {999: {eventId: 1, A: 11}} |
| sd@222:C | 3 | 14 | {999: {eventId: 1, A: 11}} |
| sd@333:E | 4 | 15 | {999: {eventId: 1, A: 11}, 111: {eventId: 2, B: 13}} |
| sd@444:F | 5 | 16 | {999: {eventId: 1, A: 11}, 222: {eventId: 3, C: 14}} |
| G | 1 | 17 | {999: {eventId: 1, A: 11, D: 12}, 111: {eventId: 2, B: 13}, 222: {eventId: 3, C: 14}, 333: {eventId: 4, E: 15}, 444: {eventId: 5, F: 16}} |

Before starting the next jobs, it should build the joinList. The joinList can now be derived directly from the new workflow graph (instead of fetching the trigger table).

  • Check the status of each build in the joinList. For example, G needs to check the status of upstream.333.E and upstream.444.F.
    • G will have pipelineId = 999, so it will check if upstream.999 exists and set its eventID to upstream.999.eventId.

Notes:
We need to be careful since multiple things could be updating the upstream field. For example, D finishes and updates G's upstream; shortly after, E finishes and also updates G's upstream. I think we will need to implement some locking mechanism to avoid race conditions.
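
One possible way to guard that update (an assumption, not a chosen design): optimistic locking with a version check, retrying the merge when another finisher updated the row first. A sketch with hypothetical db helpers:

interface BuildRow {
  upstream: Record<string, unknown>;
  version: number;
}

interface Db {
  get(buildId: number): Promise<BuildRow>;
  // Persist only if the row's version is unchanged; returns false if someone else won the race.
  updateIfVersion(buildId: number, upstream: Record<string, unknown>, version: number): Promise<boolean>;
}

async function mergeUpstream(db: Db, buildId: number, patch: Record<string, unknown>): Promise<void> {
  for (;;) {
    const { upstream, version } = await db.get(buildId);
    const merged = { ...upstream, ...patch }; // e.g. add { 333: { eventId: 4, E: 15 } }
    if (await db.updateIfVersion(buildId, merged, version)) {
      return; // write accepted
    }
    // Lost the race (D and E updated concurrently); re-read and retry.
  }
}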

d2lam commented Oct 9, 2019

Changes required:

UPDATED 10/18

To support external join, we need to rewrite the whole trigger logic.

  1. The workflow graph now includes both internal & external relationships (logic in https://github.com/screwdriver-cd/workflow-parser/pull/27/files)
  2. We can get the joinList from the workflow graph (logic in https://github.com/screwdriver-cd/workflow-parser/pull/27/files)
  3. After a job is done, find all the next jobs it triggers. For each next job (a sketch of this step follows the list):
    • find its joined builds (for example, the next job is D and it requires [A, sd@111:B, sd@222:C])
    • construct a parentBuilds field = current build's parentBuilds + all jobs in the joinList + current build info. It looks like this: { 999: {eventId: 1, A: 1}, 111: {eventId: 2, B: 2}, 222: {eventId: null, C: null} }
    • if this list is empty (this job only requires OR), create/start build D (with the parentBuilds info)
    • if this list is not empty (this job requires AND):
      • check if build D already exists
        • if internal, check by looking at the builds inside the current event/parent event
        • if external, check the latest build of D and see if its status is CREATED
      • if build D doesn't exist, create build D (with the parentBuilds info), but don't start D yet
      • once build D exists, check if [A, sd@111:B, sd@222:C] are done by looking at its parentBuilds field
      • if there is a failure, delete build D
      • if some build in the joinList is not done, do nothing
      • if all finished successfully, start build D
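
A minimal sketch of step 3, with hypothetical helper names standing in for the API build plugin logic:

type Status = 'SUCCESS' | 'FAILURE' | undefined; // undefined = not finished yet

interface Ops {
  findCreatedBuild(job: string): { id: number } | undefined; // current/parent event, or latest CREATED if external
  createBuild(job: string, parentBuilds: object): { id: number }; // created, but not started
  startBuild(buildId: number): void;
  deleteBuild(buildId: number): void;
  statusOf(joinJob: string): Status; // read via the parentBuilds / upstream build info
}

function handleNextJob(nextJob: string, joinList: string[], parentBuilds: object, ops: Ops): void {
  if (joinList.length === 0) {
    // Plain OR trigger: create and start immediately, carrying the parentBuilds info.
    ops.startBuild(ops.createBuild(nextJob, parentBuilds).id);
    return;
  }

  // AND join: make sure the build exists (CREATED) but do not start it yet.
  const build = ops.findCreatedBuild(nextJob) ?? ops.createBuild(nextJob, parentBuilds);

  const statuses = joinList.map((j) => ops.statusOf(j));
  if (statuses.some((s) => s === 'FAILURE')) {
    ops.deleteBuild(build.id);                     // one upstream failed
  } else if (statuses.every((s) => s === 'SUCCESS')) {
    ops.startBuild(build.id);                      // last finisher starts the build
  }
  // Otherwise some joinList build is still running: do nothing.
}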

jithine commented Feb 28, 2020

tkyi commented Mar 9, 2020
