Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

applications: limits the number of retries only if deployOptions.Helm.atomic=true #11927

Merged

Conversation

vgramer
Copy link
Contributor

@vgramer vgramer commented Feb 17, 2023

What this PR does / why we need it:
If the worker node takes time to join the cluster (eg due to quota), we may reach the max number of retries before the CNI application may be stuck in status failed even, if the workload is finally running (ref #11929)

To avoid this behavior, we limit the number of retries only is atomic�is true.

Which issue(s) this PR fixes:

Fixes #11856

Manual tests:
Deploy apache application to force failure during installation, the image repository is overridden to a non-exisinting
repository.

Action On applicationInstallation atomic wait repo test result status.failure status.Ready
create false false invalid OK 0 (unlimited re OK
edit true true invalid OK increased until it reaches max retries KO
edit to add useless values true true invalid OK reset and then increased until it reaches max retries KO
edit true true valid OK reset to 0 OK
edit false true invalid OK 0 (unlimited retries) KO

I also test that:

  • if applicationInstallation has no deployOption, then one from applicationDefintion is taken.
  • created a cluster with no worker nodes (machine deployment nodes scaled to 0) and atomic=false , wait=true should. the installation retry indefinitely, and once a worker node is added status is resolved to OK

What type of PR is this?

Special notes for your reviewer:

Does this PR introduce a user-facing change? Then add your Release Note here:

applications: limits the number of retries only if deployOptions.Helm.atomic=true 

Documentation:

https://github.com/kubermatic/kubermatic/pull/11927

@kubermatic-bot kubermatic-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. do-not-merge/docs-needed Indicates that a PR should not merge because it's missing one of the documentation labels. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/code-freeze Indicates that a PR should not merge because it has not been approved for code freeze yet. sig/app-management Denotes a PR or issue as being assigned to SIG App Management. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 17, 2023
….atomic=true

Signed-off-by: Vincent Gramer <vincent@kubermatic.com>
@vgramer vgramer force-pushed the application-max-retries-if-atomic-true branch from 628fb1f to 6b8a22a Compare February 17, 2023 10:51
@kubermatic-bot kubermatic-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. docs/none Denotes a PR that doesn't need documentation (changes). and removed do-not-merge/docs-needed Indicates that a PR should not merge because it's missing one of the documentation labels. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 17, 2023
@vgramer vgramer changed the title [WIP ]applications: limits the number of retries only if deployOptions.Helm.atomic=true applications: limits the number of retries only if deployOptions.Helm.atomic=true Feb 17, 2023
@kubermatic-bot kubermatic-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 17, 2023
@rastislavs
Copy link
Contributor

/lgtm
/approve

@kubermatic-bot kubermatic-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 17, 2023
@kubermatic-bot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 3b17d727e535a5a529c1077c1a57cb5a6ddcebf9

@kubermatic-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rastislavs, vgramer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rastislavs rastislavs added this to the KKP 2.22 milestone Feb 17, 2023
@vgramer
Copy link
Contributor Author

vgramer commented Feb 17, 2023

cc @kubermatic/sig-release-managers

@embik embik added the code-freeze-approved Indicates a PR has been approved by release managers during code freeze. label Feb 17, 2023
@kubermatic-bot kubermatic-bot removed the do-not-merge/code-freeze Indicates that a PR should not merge because it has not been approved for code freeze yet. label Feb 17, 2023
@kubermatic-triage-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs

Review the full test history

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubermatic-bot kubermatic-bot merged commit ae55c52 into kubermatic:main Feb 17, 2023
@vgramer vgramer deleted the application-max-retries-if-atomic-true branch February 20, 2023 08:32
@kubermatic-bot kubermatic-bot added docs/provided Denotes a PR that has a valid documentation reference. and removed docs/none Denotes a PR that doesn't need documentation (changes). labels Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. code-freeze-approved Indicates a PR has been approved by release managers during code freeze. dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. docs/provided Denotes a PR that has a valid documentation reference. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/app-management Denotes a PR or issue as being assigned to SIG App Management. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fixed amount of retries in application-controller may cause issues for system applications
5 participants