Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPNODE-1717: Update config node spec while explicitly updating the cgroups mode #3793

Merged

Conversation

sairameshv
Copy link
Member

@sairameshv sairameshv commented Jul 12, 2023

  • Future OCP releases will default to cgroupsv2
  • During the upgrades, in order to honor the cgroupsv1 on the existing cluster, there should be a reference in the node spec for the future code to depend on deciding the cgroup version.
  • Hence, this code updates the config node spec whenever the cgroupsv1 is explicitly set

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jul 12, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 12, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@sairameshv
Copy link
Member Author

/test all

@sairameshv sairameshv marked this pull request as ready for review July 13, 2023 00:55
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 13, 2023
@sairameshv
Copy link
Member Author

/cc @yuqi-zhang @harche

@sairameshv sairameshv changed the title [DNM] Update config node spec while explicitly updating the cgroups mode Update config node spec while explicitly updating the cgroups mode Jul 13, 2023
@sairameshv
Copy link
Member Author

/hold cancel
This PR has been tested for upgrade to the current code base, ready for the review and hence cancelling the hold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 13, 2023
Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this path should work, but we would need to be a bit careful of how we update the nodes config.

So the expectation here is for upgraded clusters, the user would need to manually set the cgroups to v2 via nodes.config, or delete the nodes.config object in cluster (thereby defaulting it back to the RHCOS default (of v2)?

We should definitely document the behaviour change and how users should do this carefully. Let's also talk to the release team about edge blocking, just so they are ok with this solution.

Future OCP releases will default to cgroupsv2
During the upgrades, in order to honor the cgroupsv1 on the existing cluster,
there should be a reference in the node spec for the future code to depend on deciding
the cgroup version.
Hence, this code updates the config node spec whenever the cgroupsv1 is explicitly set

Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
@sairameshv sairameshv changed the title Update config node spec while explicitly updating the cgroups mode OCPNODE-1717: Update config node spec while explicitly updating the cgroups mode Jul 13, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 13, 2023
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 13, 2023

@sairameshv: This pull request references OCPNODE-1717 which is a valid jira issue.

In response to this:

  • Future OCP releases will default to cgroupsv2
  • During the upgrades, in order to honor the cgroupsv1 on the existing cluster, there should be a reference in the node spec for the future code to depend on deciding the cgroup version.
  • Hence, this code updates the config node spec whenever the cgroupsv1 is explicitly set

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, based on the CI run I think the behaviours are as expected. We end up with duplicated kargs for now, but that shouldn't be an issue once we upgrade to 4.14.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 14, 2023
@sunilcio
Copy link

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Jul 14, 2023
@rphillips
Copy link
Contributor

/lgtm

@rphillips
Copy link
Contributor

/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Jul 14, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 14, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rphillips, sairameshv, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rphillips
Copy link
Contributor

/hold

@haircommander brought up a great point... Does this PR upgrade successfully to? #3782

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 14, 2023
@yuqi-zhang
Copy link
Contributor

Since this is a bit of an exception to the regular process (4.13 only) we may need to either:

  1. ask for someone to help merge this PR by adding the valid bug label, or
  2. create a dummy 4.14 bug, mark as verified directly, then doing a /jira cherry-pick on here

@yuqi-zhang
Copy link
Contributor

Does this PR upgrade successfully to? #3782

In theory it should, the MCO upon upgrade will generate a new config without the kargs in the base, but since the kargs still exist in the update node.configs object generated 97-generated-kubeletconfig, it will end up in the render and just be a no-op change by itself. @sairameshv did you verify that path? I seem to recall you did

@sairameshv
Copy link
Member Author

Does this PR upgrade successfully to? #3782

In theory it should, the MCO upon upgrade will generate a new config without the kargs in the base, but since the kargs still exist in the update node.configs object generated 97-generated-kubeletconfig, it will end up in the render and just be a no-op change by itself. @sairameshv did you verify that path? I seem to recall you did

Yes, this upgrade has been tested.

@rphillips
Copy link
Contributor

/hold cancel

ok. thanks!

@sairameshv
Copy link
Member Author

/hold

@haircommander brought up a great point... Does this PR upgrade successfully to? #3782

I created a payload using this PR(to 4.14) and performed oc adm upgrade --to-image
It worked as expected honoring the existing cgroupsMode

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 14, 2023
@mrunalp mrunalp added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Jul 14, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 14, 2023

@sairameshv: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 56c0dc0 into openshift:release-4.13 Jul 14, 2023
15 checks passed
sairameshv added a commit to sairameshv/cincinnati-graph-data that referenced this pull request Jul 25, 2023
Following PR includes changes to save the cgroups context in the node config object
openshift/machine-config-operator#3793
OCP 4.14+ releases default the cgroups version to "v2"
The above changes help in restoring the cgroups context during the upgrade of the clusters
Hence, it is required for the clusters to upgrade to 4.13.6 before upgrading to 4.14

Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
sairameshv added a commit to sairameshv/cincinnati-graph-data that referenced this pull request Jul 25, 2023
Following PR includes changes to save the cgroups context in the node config object
openshift/machine-config-operator#3793
OCP 4.14+ releases default the cgroups version to "v2"
The above changes help in restoring the cgroups context during the upgrade of the clusters
Hence, it is required for the clusters to upgrade to 4.13.6 before upgrading to 4.14

Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet