Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-43995] Per stage/step status #192

Open
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@abayer
Copy link
Member

abayer commented Dec 7, 2017

JENKINS-43995

Downstream of jenkinsci/workflow-api-plugin#63

When creating a body StepEndNode for success, looks to the immediate
children to see if any have a FlowNodeStatusAction. If so, add a
FlowNodeStatusAction with the worst status of any of the children's
FlowNodeStatusAction to the body StepEndNode.

When creating a non-body StepEndNode, get the worst status from any of
its parents with BodyInvocationActions, and if non-null, add a
FlowNodeStatusAction to the non-body StepEndNode.

Note that we don't add a FlowNodeStatusAction for an abnormal outcome
in either case. The exception propagates to the body StepEndNode in
FailureAdapter, and onward from the body StepEndNode to its enclosing
non-body StepEndNode. getStatus() on one of these StepEndNodes will
return failure, as appropriate, while caught exceptions will not alter
the status of the enclosing StepEndNodes.

[JENKINS-43995] Per stage/step status
When creating a body StepEndNode for success, looks to the immediate
children to see if any have a FlowNodeStatusAction. If so, add a
FlowNodeStatusAction with the worst status of any of the children's
FlowNodeStatusAction to the body StepEndNode.

When creating a non-body StepEndNode, get the worst status from any of
its parents with BodyInvocationActions, and if non-null, add a
FlowNodeStatusAction to the non-body StepEndNode.

Note that we don't add a FlowNodeStatusAction for an abnormal outcome
in either case. The exception propagates to the body StepEndNode in
FailureAdapter, and onward from the body StepEndNode to its enclosing
non-body StepEndNode. getStatus() on one of these StepEndNodes will
return failure, as appropriate, while caught exceptions will not alter
the status of the enclosing StepEndNodes.

@abayer abayer requested a review from svanoort Dec 7, 2017

@svanoort

This comment has been minimized.

Copy link
Member

svanoort commented Dec 8, 2017

All kinds of test failures here...

@abayer

This comment has been minimized.

Copy link
Member Author

abayer commented Dec 8, 2017

Yeah, looks like I missed something.

@Override protected Void run() throws Exception {
FlowNode node = getContext().get(FlowNode.class);
if (node != null) {
node.replaceAction(new FlowNodeStatusAction(Result.fromString(result)));

This comment has been minimized.

Copy link
@svanoort

svanoort Dec 8, 2017

Member

Why replaceAction?

This comment has been minimized.

Copy link
@abayer

abayer Dec 9, 2017

Author Member

Because I blanked for a moment and forgot addOrReplaceAction existed. =)

@svanoort

This comment has been minimized.

Copy link
Member

svanoort commented Dec 8, 2017

So, first the good: I think you have a start at propagating status in a reasonablish way, and it seems like it's heading towards the right track.

When creating a body StepEndNode for success, looks to the immediate
children to see if any have a FlowNodeStatusAction. If so, add a
FlowNodeStatusAction with the worst status of any of the children's
FlowNodeStatusAction to the body StepEndNode.

Won't that be O(n) for every block? And in fact, in the face of try/catch or mechanisms to re-set status and handle/ignore unstable or error cases, it's guaranteed wrong unless you're doing something new here that I've missed while trying to finish initial review.

I believe we've discussed why that approach is not the way to go before (similar reasons).

When creating a non-body StepEndNode, get the worst status from any of
its parents with BodyInvocationActions, and if non-null, add a
FlowNodeStatusAction to the non-body StepEndNode.

That seems reasonably okay to me, AFAICT.

Note that we don't add a FlowNodeStatusAction for an abnormal outcome
in either case. The exception propagates to the body StepEndNode in
FailureAdapter, and onward from the body StepEndNode to its enclosing
non-body StepEndNode. getStatus() on one of these StepEndNodes will
return failure, as appropriate, while caught exceptions will not alter
the status of the enclosing StepEndNodes.

I think this sounds like we're heading towards the right approach but not quite there yet -- what about Unstable or other custom status rather than errors? Feels like we need to almost have a ResultAdapter that takes the block result and figures out what to do, not just error passing -- but we might actually be able to subsume a lot of the existing logic into that.

Missing things we probably need:

  • Step or groovy that can read the "current" status within a block and so we can do some logic on the basis of this -- this lets us convert UNSTABLE to FAIL or ignore UNSTABLE for some flaky tests for example (similar logic exists in freestyle). Might be able to get away with Groovy that fetches current status just like build.Result or something.
  • A flying squirrel. I mean, not for this PR or anything, just because they're cool. 😜

So yeah, this is a solid start -- some issues and some pieces that need to be approached differently but we're getting there. I think with a few rounds of discussion and modification this will arrive at something mergeable that does what it says and really improves the situation.

WorkflowRun b = r.assertBuildStatus(Result.FAILURE, r.waitForCompletion(j.scheduleBuild2(0).waitForStart()));
/*
Node dump follows, format:
[ID]{parent,ids}(millisSinceStartOfRun) flowNodeClassName stepDisplayName [st=startId if a block end node]

This comment has been minimized.

Copy link
@svanoort

svanoort Dec 8, 2017

Member

For the love of Azathoth, no - it's like reading a 🌵 . Many, small testcases, please! It's impossible to debug status problems with mega testcases, because the status issues tend to be complex.

You may wish to have a more complex one that is only used for things like nested blocks/parallels.

This comment has been minimized.

Copy link
@abayer

abayer Dec 9, 2017

Author Member

Well, that’s basically what these two tests are for - checking the more complex use cases. I do need to add simpler use case tests, though.

This comment has been minimized.

Copy link
@abayer

abayer Dec 11, 2017

Author Member

Kept these two tests but renamed them to be more specific, along with adding a simple test, a test for missing a body, and a test for an unknown Result string.

This comment has been minimized.

Copy link
@svanoort

svanoort Dec 15, 2017

Member

So, generally, what I've found is that these cases can be really complex to debug so it's extra important to keep basic tests simple and limit the intent of complex cases to the specific behavior you're trying to exercise -- see https://github.com/svanoort/pipeline-graph-analysis-plugin/blob/master/src/test/java/org/jenkinsci/plugins/workflow/pipelinegraphanalysis/StatusAndTimingTest.java -- the simpler cases are the only thing that made it sane.

Bismuth in wf-api was a bit nightmarish because of how nasty the complex parallel cases were.

I think your adding smaller tests will help -- might be good to split these up too though.

@@ -355,6 +356,19 @@ public Next receive(Object o) {
@Override
public Next receive(Object o) {
StepEndNode en = addBodyEndFlowNode();
Result r = null;

This comment has been minimized.

Copy link
@svanoort

svanoort Dec 8, 2017

Member

This block needs to be stripped out and replaced with a smarter mechanism for handling status that is able to handle the try/catch and similar contortions (or use of the Status step) correctly though.

I strongly suggest carrying a status object in the StepContext and doing something cleverish to deal with try/catch (maybe something integrating with the continuations/closure or whatnot shrug - that kind of magic is up your alley).

This comment has been minimized.

Copy link
@abayer

abayer Dec 9, 2017

Author Member

As I mentioned, we don’t need to worry about try/catch.

This comment has been minimized.

Copy link
@abayer

abayer Dec 10, 2017

Author Member

Oh, and I don’t think the StepContext idea works on a couple levels. First, a status object in StepContext would be “global” to everything in the block, so we’d have trouble, say, recording different statuses for different subblocks or steps, as far as I can tell. Second, that approach would need to deal with try/catch, unlike my approach, since a step inside a try/catch would rightfully want to set its status as an error but the enclosing block would have no way of knowing that the status objec that had now been changed to error only applied to that one step, not the whole block.

See my other comments for me attempting to explain the thinking better.

import java.util.Collections;
import java.util.Set;

public final class SetStatusStep extends Step {

This comment has been minimized.

Copy link
@svanoort

svanoort Dec 8, 2017

Member

Yeah, I wanted something like this but don't think it'll work as implemented due to the issue above with status within a block -- I think it needs to touch some sort of common status/Result within the running block.

@abayer

This comment has been minimized.

Copy link
Member Author

abayer commented Dec 8, 2017

@svanoort We don’t have to worry about try/catch - those cases aren’t setting FlowNodeStatusAction, so have no status (unless something was manually setting status, which we want to take heed of regardless). If there’s no FlowNodeStatusAction and no ErrorAction (as there would be for an uncaught exception), then the node/block in question would just present as successful.

@abayer

This comment has been minimized.

Copy link
Member Author

abayer commented Dec 9, 2017

So, trying to write up the reasoning here in a more coherent manner than I’ve done previously... =)

Currently, there are two possible “statuses” for a FlowNode: success or failure. Failure means there’s an exception thrown and uncaught in the step in question (or, if on a BlockEndNode, somewhere within the block), success means there isn’t. What I’m doing here is adding more granularity of status within what’s currently the success case. FlowNodeStatusAction is only added to an AtomNode deliberately - i.e., if a step explicitly wants to set the status of its FlowNode, like the status step here or the downstream junit step. Thrown exceptions, whether uncaught or caught by try/catch or catchError, do not add a FlowNodeStatusAction.

It is entirely possible that a step could choose to explicitly add a failure status action immediately before throwing an exception, and yes, if that exception is caught, the failure status action will still propagate outwards. That, I believe, is the right behavior.

...I’m still not coming across very coherent, am I?

@svanoort

This comment has been minimized.

Copy link
Member

svanoort commented Dec 15, 2017

I think I get you a bit more -- still not clear on the logic behind the try/catch and why you want to avoid a status that is modified as needed by running steps (sees simpler than trying to infer status). That approach really does seem simpler to me.

What I’m doing here is adding more granularity of status within what’s currently the success case. FlowNodeStatusAction is only added to an AtomNode deliberately - i.e., if a step explicitly wants to set the status of its FlowNode, like the status step here or the downstream junit step. Thrown exceptions, whether uncaught or caught by try/catch or catchError, do not add a FlowNodeStatusAction.

Here's the thing: that sounds like it'll make our life simpler but it actually can do the opposite, because there's some complexity to inferring the correct status. I really want to be able to just fetch an attached status action (or Tag or whatever), because then we can sidestep all the nasty chains of logic we're using in Pipeline Graph Analysis library. There's no ambiguity about the "right" way to combine/interpret statuses as a FlowGraph consumer -- you just fetch the attached one and go.

Basically my hope is an either-or -- legacy pipelines have to do the nastiness (no status action), but if there's one attached to a flownode, we use that directly.

@abayer

This comment has been minimized.

Copy link
Member Author

abayer commented Dec 15, 2017

We’ll talk more tomorrow, but FlowNode#getStatus() in this suite of PRs is exactly what you’re looking for. It’ll tell you the status of that step/block, without any further logic or complication needed. It does abstract over a combination of isActive(), ErrorAction, and FlowNodeStatusAction to get there, but that’s all self contained, and avoids redundancy (I.e., isActive()==true means building, ErrorAction means failure (with further disambiguation via the action’s exception), FlowNodeStatusAction means an explicitly set status, and none of the above means success.

Might need to get a little more complicated due to deal with paused for input, though. But just a little.

@abayer

This comment has been minimized.

Copy link
Member Author

abayer commented Dec 15, 2017

And my reason for not doing a status object that gets modified down the block - first, I don’t think we can actually do it easily or maybe not at all, and second, if we set the enclosing block status from within a step in that block due to a failure, we don’t know if that failure was in a try/catch, so while it may be the right status for the step, if it’s caught, it’s not the right status for the block.

@jglick

This comment has been minimized.

Copy link
Member

jglick commented Dec 15, 2017

When creating a body StepEndNode for success, looks to the immediate children to see if any have a FlowNodeStatusAction. If so, add a FlowNodeStatusAction with the worst status of any of the children's FlowNodeStatusAction to the body StepEndNode.

I have little context here but this sounds wrong. DRY and keep things immutable. If a step has some result, its FlowNode and only that node should record that new information. If you want some higher-level convenience API to yield the effective status of a block, write that, and apply whatever in-memory caching you like. If the calculation of the initial value is too slow because graph traversal and/or storage has not yet been optimized, well optimize it.

not doing a status object that gets modified down the block - first, I don’t think we can actually do it easily or maybe not at all

Correct, absolutely do not do that. It might happen to work (in a non-debuggable way) in CpsFlowExecution, but it cannot be guaranteed in alternate implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.