Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add finalizing webhook #1194

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

cdlliuy
Copy link
Contributor

@cdlliuy cdlliuy commented May 11, 2022

Fix of issue #1193
With the changes, we can see below "canary status"

  • succeed case
flagger-helloworld-demo-rollout   Waiting     0        2022-05-16T08:01:31Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T08:02:31Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T08:03:31Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T08:04:31Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T08:05:31Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T08:06:31Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T08:07:31Z
flagger-helloworld-demo-rollout   WaitingPromotion   0        2022-05-16T08:08:31Z
flagger-helloworld-demo-rollout   Promoting          0        2022-05-16T08:15:31Z
flagger-helloworld-demo-rollout   WaitingFinalising   0        2022-05-16T08:16:31Z
flagger-helloworld-demo-rollout   Finalising          0        2022-05-16T08:27:31Z
flagger-helloworld-demo-rollout   Succeeded           0        2022-05-16T08:27:31Z
  • falied case:
NAME                              STATUS      WEIGHT   LASTTRANSITIONTIME
flagger-helloworld-demo-rollout   Succeeded   0        2022-05-16T10:07:43Z
flagger-helloworld-demo-rollout   Succeeded   0        2022-05-16T10:07:43Z
flagger-helloworld-demo-rollout   Waiting     0        2022-05-16T11:12:04Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T11:13:04Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T11:14:04Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T11:15:04Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T11:16:04Z
flagger-helloworld-demo-rollout   Progressing   0        2022-05-16T11:17:04Z
flagger-helloworld-demo-rollout   WaitingFinalising   0        2022-05-16T11:18:04Z
flagger-helloworld-demo-rollout   Failed              0        2022-05-16T11:20:04Z
flagger-helloworld-demo-rollout   Failed              0        2022-05-16T11:20:04Z

The reason to have a 'WaitingFinalising" is to have a time to handle something left-over outside k8s.
i.e. we are using an external traffic management providers on azure (azure ATM). When canary starts to process, we changed the ATM policy to route some traffic to canary in waiting stage. Then before the canary scale down to zero in "Finalizing' stage, we need a time to route the traffic back to primary, so this need to apply to both succeed and failed cases.

If the idea is acceptable, I will continue to fix the unit test failures (mostly about the expected status changes)

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>
Comment on lines 336 to 357
if cd.SkipAnalysis() {
c.recordEventInfof(cd, "Promotion completed! Canary analysis was skipped for %s.%s", cd.Spec.TargetRef.Name, cd.Namespace)
c.alert(cd, "Canary analysis was skipped, promotion finished.",
false, flaggerv1.SeverityInfo)
} else {
c.recordEventInfof(cd, "Promotion completed! Scaling down %s.%s", cd.Spec.TargetRef.Name, cd.Namespace)
c.alert(cd, "Canary analysis completed successfully, promotion finished.",
false, flaggerv1.SeverityInfo)
}
return
}

// check canary status
var retriable = true
retriable, err = canaryController.IsCanaryReady(cd)
if err != nil && retriable {
c.recordEventWarningf(cd, "%v", err)
return
}

// check if analysis should be skipped
if skip := c.shouldSkipAnalysis(cd, canaryController, meshRouter, err, retriable); skip {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason behind this? IMO, we should check and if required skip analysis, before promoting and/or finalizing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted this change

pkg/apis/flagger/v1beta1/canary.go Outdated Show resolved Hide resolved
@@ -868,6 +868,10 @@ func (c *Controller) rollback(canary *flaggerv1.Canary, canaryController canary.

c.recorder.SetWeight(canary, primaryWeight, canaryWeight)

if ok := c.runConfirmFinalizingHook(canary, flaggerv1.CanaryPhaseFailed); !ok {
Copy link
Member

@aryan9600 aryan9600 May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollback() is meant to roll a Canary analysis back, why would it call the the webhooks meant to confirm canary finalization, which is supposed to indicate that the canary has been promoted and thus being finalized?

Copy link
Contributor Author

@cdlliuy cdlliuy May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the reason in the description.

Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>
@cdlliuy
Copy link
Contributor Author

cdlliuy commented May 19, 2022

@aryan9600 @stefanprodan can you help to review again and provide your feedback to see whether this idea to add a finalizing webhook is acceptable? Thanks!

Ying Liu added 2 commits May 26, 2022 20:34
Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>
Signed-off-by: Ying Liu <ying.liu.lying@gmail.com>
return
}
if (cd.Status.Phase == flaggerv1.CanaryPhaseFinalising ||
cd.Status.Phase == flaggerv1.CanaryPhaseWaitingFinalising) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we checking for CanaryPhaseFinalising? From what I understand runFinalizing, runs the finalizing webhooks, sets the status to Finalizing, scales the canary deployment down to zero and then sets the status to Succeeded. If runFinalizing fails to scale down the canary deployment, then the canary would be in Finalizing status, and runFinalizing() would be called on the next run, which would subsequently call the finalizing webhooks as well.

// check if the number of failed checks reached the threshold for rollback
if (cd.Status.Phase == flaggerv1.CanaryPhaseProgressing ||
cd.Status.Phase == flaggerv1.CanaryPhaseWaitingPromotion ||
cd.Status.Phase == flaggerv1.CanaryPhaseWaitingFinalising) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WaitingFinalizing comes between Promoting and Finalizing, we would not want to rollback after the canary has been promoted.

Comment on lines +329 to +332
if err := canaryController.IsPrimaryReady(cd); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already check for that in the beginning:

// check primary status
if !cd.SkipAnalysis() {
if err := canaryController.IsPrimaryReady(cd); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
}

if err := canaryController.Promote(canary); err != nil {
c.recordEventWarningf(canary, "%v", err)
return true
if canary.Status.Phase != flaggerv1.CanaryPhasePromoting {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When skipAnalysis is enabled, it means that we halt the entire state machine, promote the canary, scale down the canary deployment and mark it as Succeeded, this changes that and relies on the state machine to continue and mark it as succeeded, which is not we want to do.

@aryan9600
Copy link
Member

If the intention is to have a webhook that runs before a canary is finalized/succeeds OR a canary is rolled back, then ConfirmFinalizing isn't the correct name for that, since it implies that the webhook confirms a canary's finalization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants