New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: JENKINS-41334: Parallel stage execution #126
Conversation
abayer
commented
Feb 27, 2017
- JENKINS issue(s):
- JENKINS-41334
- Description:
- RFC for parallel stages in Declarative. [JENKINS-41334] Very very preliminary stage parallelism #98 is pertinent.
- Documentation changes:
- n/a - this is just an RFC.
- Users/aliases to notify:
- @rsandell
- @HRMPW
- @kzantow
- @michaelneale
- @i386
- @bitwiseman
|
||
#### Pro/Con | ||
|
||
* Stage status/metadata (i.e., skipped due to failure/`when`) would |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be fixed though -- in all honesty, probably simpler than full nested stages support. Would require some work with pipeline-graph-analysis and extensions of the TagsAction for the status coding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not simpler from the Declarative perspective, given that we'd still have to have different code for executing nested stages-but-actually-just-parallel-branches and executing non-nested actual-stages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And worth mentioning that I'm putting a healthy emphasis on keeping a common code path for all stage
executions - if we have to differentiate at runtime between a parallel stage
and a non-parallel stage
, that's just going to create problems down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vivek - Also, I don't know how BO does status visualization for parallel branches currently. If we had equivalent metadata (i.e., the TagsAction
stuff) on a parallel branch, would that be visualized in Blue Ocean as things are now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abayer I am using pipeline-graph-analysis to compute status of parallel branch. To be specific https://github.com/jenkinsci/pipeline-graph-analysis-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/pipelinegraphanalysis/StatusAndTiming.java#L152. So maybe some work is needed there @svanoort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, we're just talking about the metadata from https://github.com/jenkinsci/pipeline-model-definition-plugin/blob/master/pipeline-stage-tags-metadata/src/main/java/org/jenkinsci/plugins/pipeline/StageStatus.java. That is to say - nothing new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abayer I was unaware you'd added that. Congratulations! You have now broken status handling, exactly as I described, and ensured that declarative is no longer truly compatible with pipeline because it uses a completely different status-coding mechanism that is declarative-specific. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
99% sure you 👍 ed it at the time... =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abayer Pretty sure I didn't +1 anything like this in its current form ;-) -- I would have given this a hard 🐛 and "do not do this, not ever" if I'd seen it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abayer So, here's the scoop, after looking closer at the history and context: using these tags is okay as long as you are also generating appropriate ErrorActions, etc from running the pipeline.
I think unless we missed something critical at the time that the old way of using these was to assist on top of that mechanism by adding some additional context.
What it sounds like is that there's a possibility of replacing the original flow structure and action information with the Metadata, for functional purposes. And that is very incompatible.
The flow needs to either execute correctly, or we need to make an upstream change in workflow-api to support use of status tags on nodes/blocks as a superset of just the current mechanisms.
* Nested stages would allow for identical code paths for execution of | ||
a stage regardless of whether the stage is a nested stage or not, so | ||
the code would be simpler than for parallel branches. | ||
* Blue Ocean supports visualization of parallel branches already, so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinging @i386 is this ever coming? My understanding is there was some discussion of never supporting this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically the use of parallels inside stages, I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parallel inside stage is supported. So just one level deep, nested stages/parallels within parallel branches within a stage are not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So parallel(foo: { stage('first nested') { ... } }, bar: { stage('second nested') { ... } }
is not supported, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abayer that is not supported today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@i386 I'm not suggesting you should support it necessarily here, so you can relax -- but I think Andrew's proposal would require it 100%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm the one saying we need it. =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm... I can't remember those mockups... do you happen to remember what medium we stored them in @abayer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I see this is a great improvement to parallel, but without radically changing how a stage graph may behave - fair to say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup!
cc @vivek as well, now that I think about it. |
|
||
#### Details | ||
|
||
* A `stage` has to contain one and only one of `steps` or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth mentioning that we can definitely opt to block multiple levels of nested stage
s if we so choose. I'm starting with the default assumption that we allow arbitrary levels of nesting, but there's no technical reason keeping us from limiting it to one level. Now, limiting to 2 or 3 or any other >1 level of nesting, that'd be annoying, but would also be goofy, so...
defer to the Blue Ocean team for the final decision on visualization | ||
of parallel stages. | ||
|
||
At least initially, we would recommend only visualizing the first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to be clear that I strongly believe that the language only supports levels of nesting that the visualization can handle. I personally have to triage many tickets relating to combinatorial stage nesting and users are frequently perplexed to why the script dialect supports N stage/parallel nesting but none of the visualizations do. Evidence for this can be found ad infinitum in JENKINS-38442.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hence my comment above in re: limiting the nesting level supported by Declarative. =)
I have the following stages A, B, C & D. B depends on A, C can be done in parallel to both A & B and finally D depends on A, B & C. How would this above support this?
I think the embedded approach limits the possibility because sometimes the visual aspect (embed) doesn't apply well to advanced pattern such as the above.. |
@ykoehler what is C doing in this case? @i386 and I just talked about this use-case earlier today but would really like more information on the problem it solves. There are various ways we could do something like that. One of which is captured here: https://issues.jenkins-ci.org/browse/JENKINS-26052 |
Given that I hear and see the need for some type of matrix support _(e.g. defining one set of steps that will be executed in parallel with just a difference in environment and agent)_I think we need to prepare for that coming quite soon, and we should at least think about how it would be impacted by this so that the matrix support doesn't get squished in with some half measures because we designed this part the wrong way. |
@ykoehler @HRMPW Let's say that my project build requires libs B and C, and in turn lib B requires lib A, and before any of that we get code from SCM.
Furthermore, for the sake of argument let's give those libs estimated build times:
In the current declarative pipeline syntax the most efficient thing we can do is make an educated guess about which
This results in a pipeline flow graph that looks like this:
Which at least parallelizes the longest running pieces of the process, but it still has C arbitrarily waiting for A to finish. The desired pipeline flow graph would look exactly like the first dependency graph, where A --> B runs 100% parallel to C.
Or, taking a more pseudo-declarative-pipeline stab at it myself where
What I and many coworkers want is a [reasonably] arbitrary number of stages in each parallel path, with proper This type of logic is particularly useful for:
Aaaand that was a novel. |
@joshsleeper good "novel" - I enjoyed reading it. In the case of:
You know the dependencies, so you lay them out clearly. YOu can effectively do this now but without stages by using steps. This still lets you run A->B but with C concurrently, just there won't be a discrete circle going green for A or B, but A&B together: It would look like this:
There is no functional difference to the functionaltiy, it is just that build A&B are not separated out into their own grouping (stage) yet. But it will work. You just see one circle for A&B, but it doesn't stop C from taking its own time. Your proposal for using nested stages looks logically correct to, IF/When it can be visuall supported (in the stage graph, and visual editor - takes time. Know SVG and reactjs and html5?). This PR is more about the first change - ie getting stage into parallel so it looks consistent and then opens the way to the type of thing you suggest, which I like the look of. As for your final points:
There was an attempt in the past to do a more "declarative" dependency chain which could calculate the graph for you - but it was viewed as more confusing for many to use. But who knows, it could happen. But your suggestion of path - that is functionally the same as your earlier suggestion right? I don't think that is needed as the outer parallel can easily work that out from the nested stages in it. For the final point: There is planned support specifically for matrix - which is what most people want here. Ie specifying some dimensions and data - and letting the pipeline how to work it out. It also changes the visualisation to be more "matrix like" as you don't want some explotion of parallel lines here anyway. I have now added to your novel |
@michaelneale Good points all around, thanks for the feedback!
The problem, from my standpoint, is quickly determining where something failed. That said, extend that logic and say that one parallel path has 10 things and another has 2. Another example was something my co-worker brought up after switching to declarative pipeline syntax for his jobs. He was doing a
When what he wanted to see was this:
Not saying that has to be in the first revision here, but I am saying that that type of visualization is desired for parallelism and I see a lot of value there.
At what point does this become an omnibus? |
@joshsleeper yeah I think we are on same page. That last graph of yours is clearly a type of matrix, so that is good. And yes, I can totally see why you need that fast feedback of where in the graph things have failed. https://issues.jenkins-ci.org/browse/JENKINS-38442 is for nested stages etc - from a front end perspective. |
Just throwing a completely different "half baked/half stupid" idea out there;
Any non "grouped" stage means waiting for the previous declared "grouped" stages to finish. |
@rsandell that looks pretty confusing (and visualising it, kidn of breaks the idea of having stages at all). I am not sure what that does that others can't funcitonally do, unless we want to give up the idea of stages and just have a graph of nodes (which could be better done with dependencies anyway). |
@vivek @i386 @michaelneale I've updated the RFC with the decision to simply use |
What happens when Blue Ocean implements support for nested stages inside |
If Blue Ocean's eventual implementation for visualizing something like parallel(a: { stage('a') { echo 'Hi, this is a' } },
b: { stage('b') { echo 'Hi, this is b' } }) ends up being different than how it visualizes parallel(a: { echo 'Hi, this is a' },
b: { echo 'Hi, this is b' }) then we'll want to change Declarative's runtime for parallel "stages" accordingly. But that's a bridge we can cross if/when that happens. |
@abayer To be clear, this would enable support for things like |
@joshsleeper Yes, exactly. =) @michaelneale So this means no change at all from the proposed Declarative syntax. It's just a change to how we execute that syntax behind the scenes, using something like: parallel(a: { echo 'Hi, this is a' },
b: { echo 'Hi, this is b' }) ...which Blue Ocean already visualizes. |
@michaelneale that is why I was asking about future. Right now BO would ignore |
@abayer Mmk, great. To further clarify, this also does not facilitate displaying multiple parallel stages, correct? That is, I still can't get a visualization of something like this (where paren sets denote a visualized "stage"):
But I could get the same functional parallelism by doing this instead:
|
@joshsleeper So this is for executing a list of stages in parallel with each other - i.e., one What you're talking about is what I've been calling "matrix equivalence" - that's most definitely a longer-term goal, both for the Declarative syntax/editor and Blue Ocean visualization, but there is a lot of work (both design and implementation) to get there, so I don't yet have a timeframe for when we will hopefully be able to do that. =) |
@abayer ok - so the syntax changes to accommodate per branch agent config (for example) but really it is the same thing in terms of what the result are (sorry I am way out of the loop here). |
@michaelneale Basically, yes. There's some work needed on the Declarative runtime side to make sure that the supplementary metadata we add to stages now (like I said, to mark a stage as "skipped due to REASON" and the like) can be added to parallel branches as well. As far as I can tell from quick looks at |
My input has been requested into the implementation here, so after taking another look: This feels like it's more likely to make declarative more confusing rather than more "lovable" because of the added use confusion and code complexity that will result. A couple points:
My strong recommendation would be to force a different syntax for multi-component parallel branches that creates a simpler structure for when you need to run through a matrix of options. Maybe something like a for-each-combination which uses the same stages for each branch but allows you to apply a different input variable to each. This ensures consistent structure and removes some of the variables, but still lets you accommodate the base use cases. But at this point, I honestly don't understand Declarative -- it has clearly become its own little world so who knows? I only know the challenges that came with throwing the world wide-open in normal pipeline. |
* `stage` configuration can be done on both a `stage` containing | ||
`parallel`, or `stage`s within `parallel`, i.e., `agent`, `when`, | ||
`post`, etc. | ||
* `agent`, `environment`, and `tools` specified on the "parent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need inheritance here? Weigh this carefully -- it greatly increases the potential for user confusion and bugs. Would it make more sense ensure all branches follow a common set of parameters (or possibly take an input for each branch to use in the given property)?
`parallel`, or `stage`s within `parallel`, i.e., `agent`, `when`, | ||
`post`, etc. | ||
* `agent`, `environment`, and `tools` specified on the "parent" | ||
`stage` will apply to child `stage`s in the same way that top-level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use case for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm thinking at the moment is to allow environment
to be specified on the parent stage
(since that's something I can see applying, with overrides, across all the child stages). But for the others, yeah, inheritance actually seems like a bad idea (beyond the always there default inheritance to the top-level). In which case, we'd probably reject agent
and tools
fields from a parent stage entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps. I think we may want to run take a closer look specifically at the options allowed and when inheritance does/does not make sense, and where it might make sense to use something other than inheritance, potentially (leaving that one open here for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough - I'd say environment
should always inherit, but I can see an argument for not even having the top-level agent
and tools
be inherited by parallel stages - or perhaps inheriting the configuration for agent
as a default config for the parallel stages but not by default running on the same node
block as the top-level...will ponder.
* `agent`, `environment`, and `tools` specified on the "parent" | ||
`stage` will apply to child `stage`s in the same way that top-level | ||
configuration applies to `stage`s currently. | ||
* Arbitrarily deep nesting of parallel `stage`s would not be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is smart.
* Arbitrarily deep nesting of parallel `stage`s would not be | ||
allowed. Only `stage`s that themselves are not within a `parallel` | ||
would be able to contain further `parallel` `stage`s. | ||
* `stage`s within a `parallel` would not allow use of the `parallel` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Smart, good common sense.
would be able to contain further `parallel` `stage`s. | ||
* `stage`s within a `parallel` would not allow use of the `parallel` | ||
step either, so as to prevent visualization confusion. | ||
* Naming of `parallel` is up for debate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend something like "foreach" or "branches" to create a distinction between this implementation and the different vanilla pipeline step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 on names like branch or foreach - would want to pick something more declarative (and branch creates confusion already..)
changes in Blue Ocean visualization or Bismuth in order to work, meaning only | ||
changes to the editor will be needed outside of Declarative itself. | ||
|
||
Each `parallel` branch will be given the `stage` name specified, with `agent`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👎 on this. If you want to declare a branch, call it a branch, not a stage. Pre/post can be done, but I do not think we could or should effectively support using metadata with branches, and especially conditional execution at this point. This is a hack and a potentially dangerous one. There's a right way to do this, and that's to start looking at a richer execution & status model for pipeline -- but that needs to be provided from a lower level API and then bubble up to the Declarative which takes advantage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to wait for such functionality to be introduced, assuming we can get the ball rolling on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But there are already nested stages (they aren't very useful, but they are there). FWIW a lot of people use the term "stage" when talking about parallel branches of execution (doesn't make it right, but there is some intuition in there going on I guess).
@svanoort Thanks for the feedback! In re a couple things you brought up:
|
A couple quick replies before I dive back into other stuff to wrap up today:
Think carefully about the implications of this for comparing one parallel vs. another -- Vivek and I spent the better part of 2 hours talking through how to match pipelines with different parallel & stage structures & names for purposes of gathering averages & and metrics. The conclusion was a set of heuristics but knowing it would always have problems. My gut is that "when" should only apply to the overall parallel block, never specific branches (to simplify), making a parallel "all or none." If someone wants more, that's a perfect case for vanilla pipeline.
That's actually a very useful part of parallel IMO, but I can't think of a sensible way to manage it along with pre/post either. Agree it's probably best to say "if you want this, you need full pipeline." |
@svanoort So you mean only allowing |
I'm going to mark this as approved - we can revise it later if needed, but I think we've got a general consensus. |
Though I am moving the target release to 1.3 - 1.2 may well happen too soon for Blue Ocean to be ready for this. |