add finalizing webhook #1194

cdlliuy · 2022-05-11T09:42:17Z

Fixes issue #1193.
With these changes, the canary status goes through the transitions shown below.

Success case:

flagger-helloworld-demo-rollout   Waiting             0        2022-05-16T08:01:31Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T08:02:31Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T08:03:31Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T08:04:31Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T08:05:31Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T08:06:31Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T08:07:31Z
flagger-helloworld-demo-rollout   WaitingPromotion    0        2022-05-16T08:08:31Z
flagger-helloworld-demo-rollout   Promoting           0        2022-05-16T08:15:31Z
flagger-helloworld-demo-rollout   WaitingFinalising   0        2022-05-16T08:16:31Z
flagger-helloworld-demo-rollout   Finalising          0        2022-05-16T08:27:31Z
flagger-helloworld-demo-rollout   Succeeded           0        2022-05-16T08:27:31Z

Failed case:

NAME                              STATUS              WEIGHT   LASTTRANSITIONTIME
flagger-helloworld-demo-rollout   Succeeded           0        2022-05-16T10:07:43Z
flagger-helloworld-demo-rollout   Succeeded           0        2022-05-16T10:07:43Z
flagger-helloworld-demo-rollout   Waiting             0        2022-05-16T11:12:04Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T11:13:04Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T11:14:04Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T11:15:04Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T11:16:04Z
flagger-helloworld-demo-rollout   Progressing         0        2022-05-16T11:17:04Z
flagger-helloworld-demo-rollout   WaitingFinalising   0        2022-05-16T11:18:04Z
flagger-helloworld-demo-rollout   Failed              0        2022-05-16T11:20:04Z
flagger-helloworld-demo-rollout   Failed              0        2022-05-16T11:20:04Z

The reason for the "WaitingFinalising" phase is to give us time to clean up anything left over outside k8s.
For example, we use an external traffic management provider on Azure (Azure ATM). When the canary starts processing, we change the ATM policy in the Waiting stage to route some traffic to the canary. Then, before the canary is scaled down to zero in the "Finalising" stage, we need time to route the traffic back to the primary. This has to apply to both the success and the failure case.

If the idea is acceptable, I will continue by fixing the unit test failures (mostly about the expected status changes).

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>

aryan9600 · 2022-05-11T16:55:16Z

pkg/controller/scheduler.go

+		if cd.SkipAnalysis() {
+			c.recordEventInfof(cd, "Promotion completed! Canary analysis was skipped for %s.%s", cd.Spec.TargetRef.Name, cd.Namespace)
+			c.alert(cd, "Canary analysis was skipped, promotion finished.",
+				false, flaggerv1.SeverityInfo)
+		} else {
+			c.recordEventInfof(cd, "Promotion completed! Scaling down %s.%s", cd.Spec.TargetRef.Name, cd.Namespace)
+			c.alert(cd, "Canary analysis completed successfully, promotion finished.",
+				false, flaggerv1.SeverityInfo)
+		}
+		return
+	}
+
+	// check canary status
+	var retriable = true
+	retriable, err = canaryController.IsCanaryReady(cd)
+	if err != nil && retriable {
+		c.recordEventWarningf(cd, "%v", err)
+		return
+	}
+
+	// check if analysis should be skipped
+	if skip := c.shouldSkipAnalysis(cd, canaryController, meshRouter, err, retriable); skip {


What is the reason behind this? IMO, we should check whether analysis should be skipped, and skip it if required, before promoting and/or finalizing.

cdlliuy · 2022-05-16

Reverted this change.

pkg/apis/flagger/v1beta1/canary.go

aryan9600 · 2022-05-11T17:03:41Z

pkg/controller/scheduler.go

@@ -868,6 +868,10 @@ func (c *Controller) rollback(canary *flaggerv1.Canary, canaryController canary.

 	c.recorder.SetWeight(canary, primaryWeight, canaryWeight)

+	if ok := c.runConfirmFinalizingHook(canary, flaggerv1.CanaryPhaseFailed); !ok {


rollback() is meant to roll a canary analysis back. Why would it call the webhooks meant to confirm canary finalization, which is supposed to indicate that the canary has been promoted and is thus being finalized?

cdlliuy · 2022-05-16

Updated the reason in the description.

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>

cdlliuy · 2022-05-19T03:56:26Z

@aryan9600 @stefanprodan could you review this again and let me know whether the idea of adding a finalizing webhook is acceptable? Thanks!

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>

aryan9600 · 2022-05-26T14:04:47Z

pkg/controller/scheduler.go

-			return
-		}
+	if (cd.Status.Phase == flaggerv1.CanaryPhaseFinalising ||
+		cd.Status.Phase == flaggerv1.CanaryPhaseWaitingFinalising) &&


Why are we checking for CanaryPhaseFinalising? From what I understand, runFinalizing runs the finalizing webhooks, sets the status to Finalizing, scales the canary deployment down to zero, and then sets the status to Succeeded. If runFinalizing fails to scale down the canary deployment, the canary stays in the Finalizing status and runFinalizing() is called on the next run, which calls the finalizing webhooks again.

aryan9600 · 2022-05-26T14:09:06Z

pkg/controller/scheduler.go

+	// check if the number of failed checks reached the threshold for rollback
+	if (cd.Status.Phase == flaggerv1.CanaryPhaseProgressing ||
+		cd.Status.Phase == flaggerv1.CanaryPhaseWaitingPromotion ||
+		cd.Status.Phase == flaggerv1.CanaryPhaseWaitingFinalising) &&


WaitingFinalizing comes between Promoting and Finalizing; we would not want to roll back after the canary has been promoted.
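
A minimal Go sketch of the check this comment asks for, assuming rollback is only allowed in pre-promotion phases. The `Phase` type and `rollbackAllowed` helper are hypothetical and only mirror the phase names from the status output; this is not Flagger's scheduler code.

```go
package main

import "fmt"

// Phase uses the phase names shown in the PR's status output.
type Phase string

const (
	Progressing       Phase = "Progressing"
	WaitingPromotion  Phase = "WaitingPromotion"
	Promoting         Phase = "Promoting"
	WaitingFinalising Phase = "WaitingFinalising"
	Finalising        Phase = "Finalising"
)

// rollbackAllowed reports whether a canary may still be rolled back
// once the failed checks have reached the threshold.
func rollbackAllowed(p Phase, failedChecks, threshold int) bool {
	if failedChecks < threshold {
		return false
	}
	switch p {
	case Progressing, WaitingPromotion:
		return true // the canary has not been promoted yet
	default:
		return false // promoted or finalizing: too late to roll back
	}
}

func main() {
	// prints: true false
	fmt.Println(rollbackAllowed(Progressing, 5, 5), rollbackAllowed(WaitingFinalising, 5, 5))
}
```

In this PR the failed path also passes through WaitingFinalising, so a real implementation would need some way to tell the two cases apart; the sketch does not model that.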

aryan9600 · 2022-05-26T14:16:09Z

pkg/controller/scheduler.go

+		if err := canaryController.IsPrimaryReady(cd); err != nil {
+			c.recordEventWarningf(cd, "%v", err)
+			return
+		}


We already check for that in the beginning:

flagger/pkg/controller/scheduler.go

Lines 250 to 256 in 560f884

	// check primary status
	if !cd.SkipAnalysis() {
		if err := canaryController.IsPrimaryReady(cd); err != nil {
			c.recordEventWarningf(cd, "%v", err)
			return
		}
	}

aryan9600 · 2022-05-26T14:29:04Z

pkg/controller/scheduler.go

-	if err := canaryController.Promote(canary); err != nil {
-		c.recordEventWarningf(canary, "%v", err)
-		return true
+	if canary.Status.Phase != flaggerv1.CanaryPhasePromoting {


When skipAnalysis is enabled, we halt the entire state machine, promote the canary, scale down the canary deployment, and mark it as Succeeded. This change alters that and relies on the state machine to continue and mark the canary as succeeded, which is not what we want to do.

aryan9600 · 2022-05-26T14:41:04Z

If the intention is to have a webhook that runs before a canary is finalized/succeeds OR a canary is rolled back, then ConfirmFinalizing isn't the correct name for that, since it implies that the webhook confirms a canary's finalization.

add finalizing webhook

fbd7eef

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>

cdlliuy requested a review from stefanprodan as a code owner May 11, 2022 09:42

aryan9600 reviewed May 11, 2022


add phase waitingfinalising

ed4b720

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>

Ying Liu added 2 commits May 26, 2022 20:34

fix the problem when skipAnalysi

cb917cb

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>

fix typo

df8d838

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>

aryan9600 requested changes May 26, 2022
