feat(orchestration): Allow sibling stages to continue on FAILED_CONTINUE #3252

Merged

Conversation

marchello2000
Contributor

Normally, if a synthetic child stage fails with `FAILED_CONTINUE`, the `CompleteStageHandler` will propagate the `FAILED_CONTINUE` status to the parent. This prevents all subsequent sibling stages from executing.

This change adds an option (similar to the one Tasks already have) to continue execution even when a child stage ends with `status=FAILED_CONTINUE`.

This option can only be set on a stage that is a synthetic child. It is needed for Monitored Deploy, where we want the deploy to proceed even if the monitoring fails.
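As a rough illustration of how this option might be used, here is a hypothetical Kotlin sketch of a parent stage builder opting a synthetic child into this behavior. It is not code from this PR: the stage type is invented, and it assumes the `allowSiblingStagesToContinueOnFailure` flag seen in the diff below is settable on the synthetic child.

```kotlin
import com.netflix.spinnaker.orca.pipeline.StageDefinitionBuilder
import com.netflix.spinnaker.orca.pipeline.graph.StageGraphBuilder
import com.netflix.spinnaker.orca.pipeline.model.Stage

class MonitoredDeployLikeStage : StageDefinitionBuilder {
  override fun beforeStages(parent: Stage, graph: StageGraphBuilder) {
    // Hypothetical synthetic child that evaluates deployment health before the deploy continues.
    graph.add {
      it.type = "evaluateDeploymentHealth"      // made-up stage type, for illustration only
      it.name = "Evaluate deployment health"
      // Assumption: the flag introduced in this PR is settable on the child stage.
      // With it set, a FAILED_CONTINUE here no longer blocks the remaining sibling stages.
      it.allowSiblingStagesToContinueOnFailure = true
    }
  }
}
```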

@@ -142,7 +142,7 @@ class CompleteStageHandler(

     // When a synthetic stage ends with FAILED_CONTINUE, propagate that status up to the stage's
     // parent so that no more of the parent's synthetic children will run.
-    if (stage.status == FAILED_CONTINUE && stage.syntheticStageOwner != null) {
+    if (stage.status == FAILED_CONTINUE && stage.syntheticStageOwner != null && stage.allowSiblingStagesToContinueOnFailure) {
Contributor

Shouldn't this be `!stage.allow...`? (based on the comment about not wanting to propagate up to the parent if siblings are allowed to run)

Contributor Author

Yes, thank you! I changed the name of the variable at the last minute, which inverted the meaning, and forgot to update this. I should find and update the tests for this.

Contributor Author

Oh, look, the existing tests caught it... fixed it up and added a new test.
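For reference, the corrected check reads along these lines (a sketch of the fixed condition, not a verbatim quote of the merged code):

```kotlin
// Only propagate FAILED_CONTINUE from a synthetic child up to its parent when the
// child does NOT allow its sibling stages to keep running.
if (stage.status == FAILED_CONTINUE &&
  stage.syntheticStageOwner != null &&
  !stage.allowSiblingStagesToContinueOnFailure
) {
  // ...propagate FAILED_CONTINUE to the parent stage
}
```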

* @param defaultValue default value to return if key is not present
* @return value, or defaultValue if the key is not present
*/
Object getCurrentOnly(@Nullable Object key, Object defaultValue) {
Contributor

Not sure how I feel about `getCurrentOnly` vs overloading `get()` -- will let someone else weigh in.

Contributor Author

you mean something like

public Object get(@Nullable Object key, Object defaultValue, boolean localOnly)

?

Contributor

+1 on overloading get(...)

Contributor Author

Just curious, why?

I think

stage.context.getCurrentOnly("someValue", false);

is more descriptive than:

stage.context.get("someValue", false, true);

Contributor

@jeyrschabu Oct 25, 2019

get() because the context is a map. Matter of preference, no strong opinions.

Contributor Author

I hear you. This area has caused some confusion in the past (about where something in the context comes from), so I am more inclined to make this very explicit.
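To make the trade-off concrete, here is a small self-contained sketch (plain Kotlin, not orca's actual `StageContext`) contrasting the two shapes discussed above: the explicitly named lookup that this PR keeps, and the hypothetical overloaded `get()` with a boolean flag.

```kotlin
// Toy context: "current" holds keys set on this stage, "inherited" holds keys
// visible from ancestor stages. Not the real StageContext implementation.
class ContextSketch(
  private val current: Map<String, Any?>,
  private val inherited: Map<String, Any?>
) {
  // Shape kept in the PR: the method name says it skips ancestor contexts.
  fun getCurrentOnly(key: String, defaultValue: Any?): Any? =
    if (current.containsKey(key)) current[key] else defaultValue

  // Alternative raised in review (hypothetical): overload get() with a localOnly flag.
  fun get(key: String, defaultValue: Any?, localOnly: Boolean): Any? = when {
    current.containsKey(key) -> current[key]
    !localOnly && inherited.containsKey(key) -> inherited[key]
    else -> defaultValue
  }
}

fun main() {
  val ctx = ContextSketch(
    current = mapOf("dryRun" to true),
    inherited = mapOf("region" to "us-east-1")
  )
  println(ctx.getCurrentOnly("region", "none")) // "none" -- intent is clear at the call site
  println(ctx.get("region", "none", true))      // "none" -- the bare `true` hides the intent
}
```

The explicit name carries the "don't walk ancestor contexts" intent to every call site, which is the author's argument for keeping it.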

marchello2000 added a commit to marchello2000/orca that referenced this pull request Oct 25, 2019
This change depends on spinnaker#3252
Sometimes, `DisableClusterStage` can fail. We don't want failures in `DisableClusterStage`
preventing the subsequent `ShrinkCluster` from trying to run and clean up the canary+baseline clusters.
@marchello2000
Contributor Author

Going to merge this because there are teams blocked on it. If you have feedback, please give it to me and I will incorporate it.

@marchello2000 marchello2000 merged commit ae0e2cf into spinnaker:master Oct 25, 2019
@marchello2000 marchello2000 deleted the mark/allowsSiblingStages branch October 25, 2019 21:55
mergify bot pushed a commit that referenced this pull request Oct 25, 2019
This change depends on #3252
Sometimes, `DisableClusterStage` can fail. We don't want failures in `DisableClusterStage`
preventing the subsequent `ShrinkCluster` from trying to run and clean up the canary+baseline clusters.
marchello2000 added a commit to marchello2000/orca that referenced this pull request Oct 27, 2019
This is a follow up to spinnaker#3259 because I rebased my local changes incorrectly.

Sometimes, `DisableClusterStage` can fail. We don't want failures in `DisableClusterStage`
preventing the subsequent `ShrinkCluster` from trying to run and clean up the canary+baseline clusters

This change builds on top of the functionality introduced in spinnaker#3252
to allow sibling stages to run even if a stage fails (the correct error code will be propagated to the execution if this happens, so it is clear to the user).
mergify bot pushed a commit that referenced this pull request Nov 1, 2019
This is a follow up to #3259 because I rebased my local changes incorrectly.

Sometimes, `DisableClusterStage` can fail. We don't want failures in `DisableClusterStage`
preventing the subsequent `ShrinkCluster` from trying to run and clean up the canary+baseline clusters

This change builds on top of the functionality introduced in #3252
to allow sibling stages to run even if a stage fails (the correct error code will be propagated to the execution if this happens, so it is clear to the user).
sergiopena pushed a commit to sergiopena/orca that referenced this pull request Dec 13, 2019
This is a follow up to spinnaker#3259 because I rebased my local changes incorrectly.

Sometimes, `DisableClusterStage` can fail. We don't want failures in `DisableClusterStage`
preventing the subsequent `ShrinkCluster` from trying to run and clean up the canary+baseline clusters

This change builds on top of the functionality introduced in spinnaker#3252
to allow sibling stages to run even if a stage fails (the correct error code will be propagated to the execution if this happens, so it is clear to the user).