Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1926724: UPSTREAM: 98028: add auto update for priority & fairness bootstrap configuration objects #563

Merged
merged 3 commits into from Nov 15, 2021

Conversation

tkashem
Copy link

@tkashem tkashem commented Feb 10, 2021

What type of PR is this?

/kind bug

What this PR does / why we need it:
apf post startup config provider should ensure that the bootstrap spec of the suggested priority level config and flow schema objects is saved when a given object already exists.
if a new version of the apiserver has a change in the spec of any of the suggested objects in bootstrap configuration, the updated spec does not get stored, because if a given object already exists in the cluster we don't update it to match the spec of the bootstrap object.

We recently merged kubernetes#95259 that changed the spec of the suggested configuration in bootstrap. It went into 1.20. If a cluster upgrades from 1.19 to 1.20, the updated spec in the bootstrap configuration won't be reflected in the stored objects.

This PR takes the following approach:
On a fresh install, all (both mandatory and suggested) configuration objects will have auto update enabled via the following annotation
apf.kubernetes.io/autoupdate-spec: 'true'

  • the post start hook for APF tries for 30s to initialize the bootstrap configuration, the apiserver will fail to start if it can't finish the initialization within 30s
  • after the initialization is complete, the kube-apiserver periodically checks the bootstrap configuration objects on the cluster and applies update if necessary.

kube-apiserver enforces an 'always auto-update' policy for the mandatory configuration object(s). This implies:

  • the auto-update annotation key is added with a value of 'true' if it is missing.
  • the auto-update annotation key is set to 'true' if its current value is a boolean false or has an invalid boolean representation (if the cluster operator sets it to 'false' it will be stomped)
  • any changes to the spec made by the cluster operator will be stomped.

The kube-apiserver will apply update on the suggested configuration if:

  • the cluster operator has enabled auto-update by setting the annotation (apf.kubernetes.io/autoupdate-spec: 'true') or
  • the annotation key is missing but the generation is 1

If the suggested configuration object is missing the annotation key, kube-apiserver will update the annotation appropriately:

  • it is set to 'true' if generation of the object is '1' which usually indicates that the spec of the object has not been changed.
  • it is set to 'false' if generation of the object is greater than 1.

The above approach for suggested configuration ensures that we don't squash changes made by an operator. Please note, we can't protect the changes made by the operator in the following scenario:

  • the user changes the spec and then deletes and recreates the same object. (generation resets to 1)

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@openshift-ci-robot
Copy link

@tkashem: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

[WIP] UPSTREAM: 98028: fix apf post startup ensure logic works on upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. labels Feb 10, 2021
@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tkashem
To complete the pull request process, please assign marun after the PR has been reviewed.
You can assign the PR to them by writing /assign @marun in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tkashem
Copy link
Author

tkashem commented Feb 10, 2021

The upstream PR is still in review - kubernetes#98028.
I have picked it in o/k so qe can do an early test on upgrade cc @xingxingxia @wangke19

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 10, 2021
@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits are valid:

The following commits could not be validated and must be approved by a top-level approver:

@openshift-ci
Copy link

openshift-ci bot commented May 7, 2021

@tkashem: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

[WIP] UPSTREAM: 98028: fix apf post startup ensure logic works on upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tkashem tkashem changed the title [WIP] UPSTREAM: 98028: fix apf post startup ensure logic works on upgrade Bug 1926724: UPSTREAM: 98028: fix apf post startup ensure logic works on upgrade May 7, 2021
@tkashem tkashem changed the title Bug 1926724: UPSTREAM: 98028: fix apf post startup ensure logic works on upgrade Bug 1926724: UPSTREAM: 98028: add auto update for priority & fairness bootstrap configuration objects May 7, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 7, 2021
@openshift-ci
Copy link

openshift-ci bot commented May 7, 2021

@tkashem: This pull request references Bugzilla bug 1926724, which is invalid:

  • expected dependent Bugzilla bug 1927397 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE), but it is NEW instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1926724: UPSTREAM: 98028: fix apf post startup ensure logic works on upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link

@tkashem: the contents of this pull request could not be automatically validated.

The following commits are valid:

The following commits could not be validated and must be approved by a top-level approver:

@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 23, 2021
@tkashem
Copy link
Author

tkashem commented Sep 20, 2021

/remove-lifecycle stale

@openshift-ci openshift-ci bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 20, 2021
Take the following approach:
On a fresh install, all bootstrap configuration objects will
have auto update enabled via the following annotation :
`apf.kubernetes.io/autoupdate: 'true'`

The kube-apiserver periodically checks the bootstrap configuration
objects on the cluster and applies update if necessary.

We enforce an 'always auto-update' policy for the mandatory
configuration object(s).

We update the suggested configuration objects when:
- auto update is enabled (`apf.kubernetes.io/autoupdate: 'true'`) or
- auto update annotation key is missing but `generation` is `1`

If the configuration object is missing the annotation key, we add
it appropriately:
it is set to `true` if `generation` is `1`, `false` otherwise.

The above approach ensures that we don't squash changes made by an
operator. Please note, we can't protect the changes made by the
operator in the following scenario:
- the user changes the spec and then deletes and recreates
  the same object. (generation resets to 1)

remove using a marker
@openshift-ci-robot openshift-ci-robot added the backports/validated-commits Indicates that all commits come to merged upstream PRs. label Sep 20, 2021
@wangke19
Copy link

/lgtm

@wangke19
Copy link

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Sep 22, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 22, 2021
@wangke19
Copy link

/bugzilla refresh

@openshift-ci
Copy link

openshift-ci bot commented Sep 22, 2021

@wangke19: This pull request references Bugzilla bug 1926724, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.z) matches configured target release for branch (4.7.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1927397 is in the state CLOSED (ERRATA), which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 1927397 targets the "4.8.0" release, which is one of the valid target releases: 4.8.0, 4.8.z
  • bug has dependents

Requesting review from QA contact:
/cc @wangke19

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tkashem
Copy link
Author

tkashem commented Nov 9, 2021

/retest

@tkashem
Copy link
Author

tkashem commented Nov 9, 2021

/assign @sttts

@tkashem
Copy link
Author

tkashem commented Nov 9, 2021

  1. If the Bugzilla associated with the PR has the "FastFix" keyword, the subjective assessment on the issue has already been done and a customer is impacted. These PRs should be prioritized for merge.
    • verified
    • does not apply
  2. The bug has significant impact either through severity, reduction in supportability, or number of users affected.
    • verified
    • does not apply
  3. For branches that are in the Maintenance lifecycle phase:
    • The bug is a critical fix, no reasonable workaround exists, and a recommendation for upgrade has been ruled out, or
    • The bug is a security related bug
    • Branch not in maintenance mode yet (current release + previous release for 90 days after current GA; everything older is in maintenance)
  4. The severity field of the bug must be set to accurately reflect criticality.
    • verified
  5. The PR was created with the cherry-pick bot OR the PR’s description is well formed with user-focused release notes that state the bug number, impact, cause, and resolution. Where appropriate, it should also contain information about how a user can identify whether a particular cluster is affected.
    • verified

@tkashem
Copy link
Author

tkashem commented Nov 9, 2021

/retest

@sttts sttts added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Nov 10, 2021
@tkashem
Copy link
Author

tkashem commented Nov 10, 2021

upstream PR has merged a while ago, canceling my hold

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 10, 2021
@wangke19
Copy link

wangke19 commented Nov 12, 2021

@sttts Need one approved label

@sttts
Copy link

sttts commented Nov 15, 2021

/approve

@openshift-ci
Copy link

openshift-ci bot commented Nov 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sttts, tkashem, wangke19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2021
@wangke19
Copy link

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Nov 15, 2021
@openshift-bot
Copy link

/retest-required

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Copy link

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link

openshift-ci bot commented Nov 15, 2021

@tkashem: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-selfupgrade f2e817f305e3caa1fa9b03f47e2028cfc8502087 link /test e2e-aws-selfupgrade
ci/prow/e2e-aws-disruptive e2dfab7f5681af211e1b4dca4b38c4e5922ec91b link /test e2e-aws-disruptive

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 5be9a07 into openshift:release-4.7 Nov 15, 2021
@openshift-ci
Copy link

openshift-ci bot commented Nov 15, 2021

@tkashem: All pull requests linked via external trackers have merged:

Bugzilla bug 1926724 has been moved to the MODIFIED state.

In response to this:

Bug 1926724: UPSTREAM: 98028: add auto update for priority & fairness bootstrap configuration objects

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. backports/validated-commits Indicates that all commits come to merged upstream PRs. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants