New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Pipeline linear execution with graph based execution #168

Open
tejal29 opened this Issue Oct 19, 2018 · 19 comments

Comments

Projects
5 participants
@tejal29
Copy link
Contributor

tejal29 commented Oct 19, 2018

Expected Behavior

Tasks in pipeline can run in parallel if:

Actual Behavior

Right now, all Tasks execute linearly in the order they are defined in in the Pipeline

Steps to Reproduce the Problem

Parallel case:

  1. Create Pipeline with 2 tasks which do not depend on each other
  2. Create a PipelineRun for this task
  3. Make sure they are executed in parallel

from case:

  1. Create Pipeline with 2 tasks , one which needs a resource from another, and declare them in the opposite order they should run in (i.e. put the Task the resource is from last)
  2. Create a PipelineRun for this task
  3. Make sure they are executed in the right order

runAfter case

  1. Create Pipeline with 2 tasks , one which is supposed to runAfter another and declare them in the opposite order they should run in (i.e. put the Task that should berunAfter last)
  2. Create a PipelineRun for this task
  3. Make sure they are executed in the right order

And combining these cases!

Additional Info

@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Oct 22, 2018

@tejal29 could you add a bit more detail about what the expected behaviour is after this task is implemented when you DO need tasks to execute linearly? e.g. Task A needs to execute before Task B (I'm assuming this is covered by the functionality @pivotal-nader-ziada added in #160?)

@tejal29

This comment has been minimized.

Copy link
Contributor Author

tejal29 commented Oct 22, 2018

@bobcatfish, in #160, we did not get rid of linear execution. I had a chat with @pivotal-nader-ziada and we decided to remove the logic in another PR.
#160, does address task dependency by waiting on Task A to complete if Task B is depending on it. will add more on that in description.

@tejal29 tejal29 assigned tejal29 and unassigned tejal29 Oct 22, 2018

@bobcatfish bobcatfish moved this from To do to Ice box in Pipeline CRD Oct 22, 2018

@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Oct 22, 2018

@bobcatfish, in #160, we did not get rid of linear execution

I just want to make sure that when we implement this task, we still support linear execution. This task as it's currently worded sounds like we will only be able to run tasks in parallel, which would be a step backward from where we are now.

We don't want to get rid of linear execution, we want to support more use cases

@pivotal-nader-ziada

This comment has been minimized.

Copy link
Contributor

pivotal-nader-ziada commented Oct 23, 2018

The plan is to use passedConstraint to set dependency (order) of tasks, if there is no dependency, why would you want them to run in sequence?

@tejal29

This comment has been minimized.

Copy link
Contributor Author

tejal29 commented Oct 23, 2018

+1 to what @pivotal-nader-ziada said. It respects passedConstraint to determine if a task can run.

@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Oct 23, 2018

The plan is to use passedConstraint to set dependency (order) of tasks, if there is no dependency, why would you want them to run in sequence?

Sometimes you want to run Tasks in sequence, and you would do this with passedConstraints.

I'm just saying that saying we are "removing" linear execution is very confusing if you don't already know intimately how the controller is implemented. But this is a moot point since @tejal29 is the one implementing it and knows what this means :)

@pivotal-nader-ziada

This comment has been minimized.

Copy link
Contributor

pivotal-nader-ziada commented Oct 23, 2018

@bobcatfish if the tasks are not related at all, is there a need to run them in sequence? if yes, we might need a task to identify that in the yaml

@pivotal-nader-ziada

This comment has been minimized.

Copy link
Contributor

pivotal-nader-ziada commented Oct 23, 2018

concourse has an optional serial flag to indicate you want the tasks in the pipeline to run in sequence, maybe we can add something similar to the pipeline type?

@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Oct 23, 2018

I think we are talking about 2 different things:

  1. Being able to express the order your Tasks execute in via the order they appear in a Pipeline
  2. Having the ability to express a sequence of Tasks that execute in a desired order

I'm not saying we need (1), as far as I'm concerned this is just a syntax thing, not really a feature.

After this Task, we'll still have (2), which is the only thing I'm saying is important. I want to add features, not remove them.

@tejal29

This comment has been minimized.

Copy link
Contributor Author

tejal29 commented Oct 23, 2018

@bobcatfish i think you got confused by the title "Remove linear Execution". I will change it to "Remove Default Linear Pipeline Execution"

@bobcatfish bobcatfish changed the title Remove Linear pipeline Execution. Remove Default Linear Pipeline Execution Oct 24, 2018

@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Oct 24, 2018

Thanks @tejal29 I think that clears it up! I'm going to move some of this discussion into #137 re. passedConstraints in general.

@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Nov 20, 2018

The initial implementation of the DAG is available in https://github.com/knative/build-pipeline/blob/master/pkg/reconciler/v1alpha1/pipeline/resources/dag.go - the next step will be to update the Pipeline to actually use this.

@bobcatfish bobcatfish self-assigned this Dec 4, 2018

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Dec 19, 2018

Validate that extra params + resources aren't provided
While working on creating an end to end test for knative#168, @dlorenc and I
had a very hard time determining why our pipeline wasn't working as
expected - it turned out that we were using the wrong resource name in
our `providedBy` clauses.

This is the first of a couple follow up commits to add more validation:
this one makes sure that extra resources are not provided (e.g. if you
typo-ed a param with a default value, you may never know your param was
being ignored).

(Shout out to @vdemeester - I updated the reconciler tests both before
and after his test refactoring, and it was nearly impossible to
understand the tests before his builders were added: the builders made
it waaaaay easier! 🙏

Follow up on knative#124

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 22, 2019

On Pipeline creation, verify order of `providedBy` constraints
Tasks should only rely on outputs of tasks that execute before them,
they shouldn't be able to rely on outputs of future tasks. This also
means we no longer need the `canRunTask` function: since Tasks currently
execute linearly (see knative#168), the only way `canRunTask` could return
false would be if `Tasks` in `providedBy` actually executed _after_ the
one currently being evaluated, which would result in an unrunnable and
invalid Pipeline. Now we will catch this on Pipeline creation.

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 22, 2019

On Pipeline creation, verify order of `providedBy` constraints
Tasks should only rely on outputs of tasks that execute before them,
they shouldn't be able to rely on outputs of future tasks. This also
means we no longer need the `canRunTask` function: since Tasks currently
execute linearly (see knative#168), the only way `canRunTask` could return
false would be if `Tasks` in `providedBy` actually executed _after_ the
one currently being evaluated, which would result in an unrunnable and
invalid Pipeline. Now we will catch this on Pipeline creation.

Part of knative#320

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 22, 2019

On Pipeline creation, verify order of `providedBy` constraints
Tasks should only rely on outputs of tasks that execute before them,
they shouldn't be able to rely on outputs of future tasks. This also
means we no longer need the `canRunTask` function: since Tasks currently
execute linearly (see knative#168), the only way `canRunTask` could return
false would be if `Tasks` in `providedBy` actually executed _after_ the
one currently being evaluated, which would result in an unrunnable and
invalid Pipeline. Now we will catch this on Pipeline creation.

Part of knative#320

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 23, 2019

On Pipeline creation, verify order of `providedBy` constraints
Tasks should only rely on outputs of tasks that execute before them,
they shouldn't be able to rely on outputs of future tasks. This also
means we no longer need the `canRunTask` function: since Tasks currently
execute linearly (see knative#168), the only way `canRunTask` could return
false would be if `Tasks` in `providedBy` actually executed _after_ the
one currently being evaluated, which would result in an unrunnable and
invalid Pipeline. Now we will catch this on Pipeline creation.

Part of knative#320

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 23, 2019

On Pipeline creation, verify order of `providedBy` constraints
Tasks should only rely on outputs of tasks that execute before them,
they shouldn't be able to rely on outputs of future tasks. This also
means we no longer need the `canRunTask` function: since Tasks currently
execute linearly (see knative#168), the only way `canRunTask` could return
false would be if `Tasks` in `providedBy` actually executed _after_ the
one currently being evaluated, which would result in an unrunnable and
invalid Pipeline. Now we will catch this on Pipeline creation.

Part of knative#320

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 23, 2019

On Pipeline creation, verify order of `providedBy` constraints
Tasks should only rely on outputs of tasks that execute before them,
they shouldn't be able to rely on outputs of future tasks. This also
means we no longer need the `canRunTask` function: since Tasks currently
execute linearly (see knative#168), the only way `canRunTask` could return
false would be if `Tasks` in `providedBy` actually executed _after_ the
one currently being evaluated, which would result in an unrunnable and
invalid Pipeline. Now we will catch this on Pipeline creation.

Part of knative#320

knative-prow-robot added a commit that referenced this issue Jan 24, 2019

On Pipeline creation, verify order of `providedBy` constraints
Tasks should only rely on outputs of tasks that execute before them,
they shouldn't be able to rely on outputs of future tasks. This also
means we no longer need the `canRunTask` function: since Tasks currently
execute linearly (see #168), the only way `canRunTask` could return
false would be if `Tasks` in `providedBy` actually executed _after_ the
one currently being evaluated, which would result in an unrunnable and
invalid Pipeline. Now we will catch this on Pipeline creation.

Part of #320

zhyuri added a commit to zhyuri/build-pipeline that referenced this issue Jan 24, 2019

On Pipeline creation, verify order of `providedBy` constraints
Tasks should only rely on outputs of tasks that execute before them,
they shouldn't be able to rely on outputs of future tasks. This also
means we no longer need the `canRunTask` function: since Tasks currently
execute linearly (see knative#168), the only way `canRunTask` could return
false would be if `Tasks` in `providedBy` actually executed _after_ the
one currently being evaluated, which would result in an unrunnable and
invalid Pipeline. Now we will catch this on Pipeline creation.

Part of knative#320

@bobcatfish bobcatfish removed their assignment Jan 24, 2019

@bobcatfish bobcatfish changed the title Remove Default Linear Pipeline Execution Replace Pipeline linear execution with graph based execution Jan 24, 2019

@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Jan 24, 2019

I haven't gotten to this in ages so I'm taking myself off of it - the one thing I did tho was create an end to end test for the functionality which I'll try to push in a branch if I ever get around to it.

If someone reading this wants to work on it, ping me if you'd like to see the end to end test (np if you want to start fresh).

Note that in #320 we talked about also adding runAfter functionality in addition to providedBy, we might want to implement that here too.

@vdemeester

This comment has been minimized.

Copy link
Member

vdemeester commented Jan 25, 2019

@bobcatfish 👋 #always-interested

@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Jan 25, 2019

hahaha o really @vdemeester , v exciting!!!! ❤️

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 28, 2019

Add initial integration test for dag knative#168
Three caveats to this integration test:
- It was created before knative#320, so the resource binding is not correct
- It was created before knative#387, so it relies on the log PVC which will no
  longer exist (can work around this by mounting a PVC explicitly in the
  test and writing directly to it instead of echoing?)
- It doesn't exercise `runAfter` functionality
@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Jan 28, 2019

Updated description of Task to include runAfter from #320 :D

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 28, 2019

Add initial integration test for dag knative#168
Three caveats to this integration test:
- It was created before knative#320, so the resource binding is not correct
- It was created before knative#387, so it relies on the log PVC which will no
longer exist (can work around this by mounting a PVC explicitly in the
test and writing directly to it instead of echoing?)
- It doesn't exercise `runAfter` functionality

bobcatfish added a commit to bobcatfish/build-pipeline that referenced this issue Jan 28, 2019

Add initial integration test for dag knative#168 🤓
This tests DAG functionality by defining a pipeline with both fan in and
fan out. The idea is that each Task echoes the current time, so after
the pipeline compeletes, we can look at which Task echoes which which
time to make sure they run in order. The tasks are also declared in the
Pipeline in the wrong order on purpose, to make sure that the order of
declaration doesn't affect how they are run (the opposite of the current
functionality)

Three caveats to this integration test:
- It was created before knative#320, so the resource binding is not correct
- It was created before knative#387, so it relies on the log PVC which will no
longer exist (can work around this by mounting a PVC explicitly in the
test and writing directly to it instead of echoing?)
- It doesn't exercise `runAfter` functionality
@bobcatfish

This comment has been minimized.

Copy link
Contributor

bobcatfish commented Jan 28, 2019

Okay @vdemeester my test is at bobcatfish@39f7664 but it's pretty rough and there are a few caveats :D

(ps should we assign this one to you? 😜 )

@vdemeester

This comment has been minimized.

Copy link
Member

vdemeester commented Jan 28, 2019

/assign

@afrittoli

This comment has been minimized.

Copy link
Contributor

afrittoli commented Jan 28, 2019

It might make sense to limit the maximum number of parallel tasks to be executed at any point in time for one pipeline.

@vdemeester vdemeester referenced a pull request that will close this issue Feb 4, 2019

Open

Use dag execution instead of linear one #473

8 of 10 tasks complete

@bobcatfish bobcatfish added this to the Moar features for 0.1 milestone Feb 6, 2019

@bobcatfish bobcatfish self-assigned this Feb 13, 2019

bobcatfish added a commit to vdemeester/build-pipeline that referenced this issue Feb 13, 2019

Add DAG to docs 📎👣
Update the documentation such that Tasks no longer execute in the order
they are declared in the Pipeline, order is now controlled by `from` AND
`runAfter`.

We need to add `runAfter` as part of knative#168 because without it, all
ordering must be expressed with `from`, which would make our examples
really gross temporarily (not to mention be a lot of work and slow down
test execution) just to remove it as soon as we add `runAfter`, so we
might as well do it all at once.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment