Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: gc failure cause workflow restart not working properly #5240

Conversation

Somefive
Copy link
Collaborator

@Somefive Somefive commented Dec 29, 2022

Signed-off-by: Somefive yd219913@alibaba-inc.com

Description of your changes

In the past, when application needs to rerun workflow, it will clear the steps status first. However, since steps are array attribute, it cannot be updated through patch request, so it needs update request.

In the case when garbage collection could fail due to the potential problems like cluster disconnection or other things, it will redirect the steps clear to use the patch request, which will cause workflow restart but with outdated step status. This will let the workflow directly finished without any step executed. Then it will further lead to all resources under the application being recycled.

This PR enforces using update request for clear workflow step status, even if the garbage collection fails. I also recommend to add additional check for step finish in workflow. /cc @FogDong

The test added in this PR reproduce this issue and would fail in v1.4.11 if no controller fix is used.

  1. First create an application deploying a resource to cluster X.
  2. Let X down.
  3. Update the application to deploy a resource to cluster Y. This update will succeed but the following gc will fail.
  4. Update the application again and now in old versions, the application update will not succeed due to the improper status update for workflow steps.
  5. Delete cluster X and all gc should go back to normal.

I have:

  • Read and followed KubeVela's contribution process.
  • Related Docs updated properly. In a new feature or configuration option, an update to the documentation is necessary.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Special notes for your reviewer

@Somefive Somefive added priority/critical Highest priority. Must be actively worked on as someone's top priority right now. backport release-1.4 add this label will automatically backport this PR to release-1.4 branch backport release-1.5 add this label will automatically backport this PR to release-1.5 branch backport release-1.6 add this label will automatically backport this PR to release-1.6 branch labels Dec 29, 2022
@Somefive Somefive force-pushed the fix/gc-failure-case-workflow-restart-not-working branch from 47040d1 to 18cf8ec Compare December 29, 2022 08:12
@Somefive Somefive marked this pull request as draft December 29, 2022 08:16
@codecov
Copy link

codecov bot commented Dec 29, 2022

Codecov Report

Base: 49.69% // Head: 61.26% // Increases project coverage by +11.56% 🎉

Coverage data is based on head (43e1f4a) compared to base (7cc7ea2).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5240       +/-   ##
===========================================
+ Coverage   49.69%   61.26%   +11.56%     
===========================================
  Files         305      306        +1     
  Lines       45700    45760       +60     
===========================================
+ Hits        22712    28035     +5323     
+ Misses      20611    14843     -5768     
- Partials     2377     2882      +505     
Flag Coverage Δ
apiserver-e2etests 35.24% <ø> (?)
apiserver-unittests 36.86% <ø> (+0.01%) ⬆️
core-unittests 55.21% <100.00%> (+0.05%) ⬆️
e2e-multicluster-test 19.26% <100.00%> (+0.06%) ⬆️
e2e-rollout-tests 20.64% <100.00%> (-0.02%) ⬇️
e2etests 25.97% <100.00%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...dev/v1alpha2/application/application_controller.go 85.81% <100.00%> (+0.09%) ⬆️
...e.oam.dev/v1alpha2/application/mutating_handler.go 83.33% <0.00%> (-7.58%) ⬇️
...es/policydefinition/policydefinition_controller.go 73.17% <0.00%> (-3.66%) ⬇️
pkg/oam/util/helper.go 62.43% <0.00%> (-1.57%) ⬇️
pkg/cue/definition/template.go 68.26% <0.00%> (-1.50%) ⬇️
pkg/addon/push.go 75.14% <0.00%> (-1.13%) ⬇️
pkg/workflow/operation/operation.go 44.53% <0.00%> (-0.97%) ⬇️
...ller/core.oam.dev/v1alpha2/application/revision.go 70.92% <0.00%> (-0.92%) ⬇️
.../core/scopes/healthscope/healthscope_controller.go 72.14% <0.00%> (-0.66%) ⬇️
... and 116 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@Somefive Somefive force-pushed the fix/gc-failure-case-workflow-restart-not-working branch 4 times, most recently from f1bc5f8 to 9ec2255 Compare December 29, 2022 10:18
@Somefive Somefive marked this pull request as ready for review December 29, 2022 10:26
@Somefive Somefive force-pushed the fix/gc-failure-case-workflow-restart-not-working branch from 9ec2255 to 872a5ff Compare December 29, 2022 10:36
Signed-off-by: Somefive <yd219913@alibaba-inc.com>
@Somefive Somefive force-pushed the fix/gc-failure-case-workflow-restart-not-working branch from 872a5ff to 43e1f4a Compare December 29, 2022 11:00
@FogDong FogDong merged commit 18f778a into kubevela:master Dec 29, 2022
@github-actions
Copy link

Backport failed for release-1.4, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-1.4
git worktree add -d .worktree/backport-5240-to-release-1.4 origin/release-1.4
cd .worktree/backport-5240-to-release-1.4
git checkout -b backport-5240-to-release-1.4
ancref=$(git merge-base 38aa5220168ed7b1526b99e6c9a81f93d7b30509 43e1f4a38f7d74d6fd84ad01756c15bd7496ba61)
git cherry-pick -x $ancref..43e1f4a38f7d74d6fd84ad01756c15bd7496ba61

@github-actions
Copy link

Backport failed for release-1.5, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-1.5
git worktree add -d .worktree/backport-5240-to-release-1.5 origin/release-1.5
cd .worktree/backport-5240-to-release-1.5
git checkout -b backport-5240-to-release-1.5
ancref=$(git merge-base 38aa5220168ed7b1526b99e6c9a81f93d7b30509 43e1f4a38f7d74d6fd84ad01756c15bd7496ba61)
git cherry-pick -x $ancref..43e1f4a38f7d74d6fd84ad01756c15bd7496ba61

@github-actions
Copy link

Backport failed for release-1.6, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin release-1.6
git worktree add -d .worktree/backport-5240-to-release-1.6 origin/release-1.6
cd .worktree/backport-5240-to-release-1.6
git checkout -b backport-5240-to-release-1.6
ancref=$(git merge-base 38aa5220168ed7b1526b99e6c9a81f93d7b30509 43e1f4a38f7d74d6fd84ad01756c15bd7496ba61)
git cherry-pick -x $ancref..43e1f4a38f7d74d6fd84ad01756c15bd7496ba61

zhaohuiweixiao pushed a commit to zhaohuiweixiao/kubevela that referenced this pull request Mar 7, 2023
…#5240)

Signed-off-by: Somefive <yd219913@alibaba-inc.com>

Signed-off-by: Somefive <yd219913@alibaba-inc.com>
@Somefive Somefive deleted the fix/gc-failure-case-workflow-restart-not-working branch June 20, 2023 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport release-1.4 add this label will automatically backport this PR to release-1.4 branch backport release-1.5 add this label will automatically backport this PR to release-1.5 branch backport release-1.6 add this label will automatically backport this PR to release-1.6 branch priority/critical Highest priority. Must be actively worked on as someone's top priority right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants