Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add systemd as cgroup driver #27943

Merged

Conversation

akhilerm
Copy link
Member

@akhilerm akhilerm commented Nov 9, 2022

The containerd test jobs are now using cos-stable image now; which has cgroupv2 enabled by default. Therefore we need to add systemd as cgroup driver for all the tests which makes use of cos-stable.

Ref : #27912 (comment)

Signed-off-by: Akhil Mohan makhil@vmware.com

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 9, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @akhilerm. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/config Issues or PRs related to code in /config area/jobs sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Nov 9, 2022
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 9, 2022
@akhilerm
Copy link
Member Author

akhilerm commented Nov 9, 2022

@endocrimes @adisky Can you take a look at this PR. Its related to setting cgroup driver to systemd, for all jobs for which image has been updated to cos-stable.

@SergeyKanzhelev
Copy link
Member

add systemd as cgroup driver for all containerd jobs that make use of cos-stable image.

Can you please add more details on the motivation for this change.

@SergeyKanzhelev
Copy link
Member

/assign @bobbypage

@@ -17,7 +17,7 @@ presets:
- name: NON_MASQUERADE_CIDR
value: 0.0.0.0/0
- name: KUBELET_TEST_ARGS
value: --runtime-cgroups=/system.slice/containerd.service
value: "--runtime-cgroups=/system.slice/containerd.service --cgroup-driver=systemd"
Copy link
Member

@bobbypage bobbypage Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that all the jobs with preset-e2e-containerd should use systemd cgroup driver, I don't think this is accuarete.

For example the job below ci-containerd-e2e-ubuntu-gce uses the preset preset-e2e-containerd, but it's running on ubuntu-2004-lts which does not enable cgroupv2, so we shouldn't switch to use systemd cgroup driver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not need this -- for cluster e2e tests, the GCE node bootstrap scripts already configure --cgroup-driver=systemd on cgroupv2 OS images (see https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L1630-L1633). So we only need to configure node e2e tests, not cluster e2e tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a separate label preset for those jobs then?

Just thinking out loud. . We can have a new preset for tests that use cos image and preset-e2e-containerd will be as is without any changes. Will it impact any other tests? Or that would be a huge change to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a separate label preset for those jobs then?

That's one option, we don't have too many cluster e2e tests in config/jobs/kubernetes/sig-node/containerd.yaml, so I think we can just also just update each them or setup a present and annotate with preset, up to you.

The important part is we only update the ones for systemd cgroup driver on the tests which are using cgroupv2 images.

Copy link
Member Author

@akhilerm akhilerm Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except these 4 jobs, other have either systemd driver set or is using an older cos image

The above jobs have multiple images (ubuntu and cos stable) specified in the image config. So I am not sure how to update the --cgroup-driver value for these jobs.

Copy link
Member

@bobbypage bobbypage Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general rule we can follow is:

  • If the job is pointing to an image config file that is using something older than COS M97 (or != cos-stable), then let's keep it as is for now
  • Otherwise, for jobs pointing to image config file with cos-stable, let's also update it to ubuntu 22.04 and also update cgroup driver to systemd.

@akhilerm akhilerm marked this pull request as ready for review November 11, 2022 11:43
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2022
@SergeyKanzhelev
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 15, 2022
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/release-eng Issues or PRs related to the Release Engineering subproject sig/release Categorizes an issue or PR as relevant to SIG Release. labels Nov 16, 2022
@@ -998,7 +998,7 @@ presubmits:
- --gcp-zone=us-west1-b
- '--node-test-args=--container-runtime-endpoint=unix:///run/containerd/containerd.sock
--container-runtime-process-name=/usr/bin/containerd --container-runtime-pid-file=
--kubelet-flags="--cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/containerd.service"
--kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/containerd.service"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing previous release jobs sounds wrong. Let's have a separate copy of image config file for those locked on non cgroupv2 images

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change to the release jobs since the referring image file had changed.
But it seems like the jobs itself on those branches are not failing after the update to cos-stable. So I think this change wont be needed.

Also a bit confused that, ideally if the image is using cos-stable, and we are not setting cgroup driver to systemd (on both kubelet as well as containerd), it shouldnt be working correctly right?

cc: @bobbypage

Signed-off-by: Akhil Mohan <makhil@vmware.com>
update VM images to use cos-stable for testing containerd branches 1.5
and 1.6

Signed-off-by: Akhil Mohan <makhil@vmware.com>
Signed-off-by: Akhil Mohan <makhil@vmware.com>
Signed-off-by: Akhil Mohan <makhil@vmware.com>
@akhilerm akhilerm force-pushed the containerd-upgrade-systemd-cgroup branch from adc1455 to 98f2595 Compare November 24, 2022 06:26
Signed-off-by: Akhil Mohan <makhil@vmware.com>
@akhilerm
Copy link
Member Author

akhilerm commented Dec 1, 2022

/cc @bobbypage @SergeyKanzhelev

@bobbypage
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2022
@bobbypage
Copy link
Member

Thanks for the updates!

@bobbypage
Copy link
Member

/assign @dims

@dims
Copy link
Member

dims commented Dec 6, 2022

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akhilerm, dims

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 6, 2022
@k8s-ci-robot k8s-ci-robot merged commit afa6a41 into kubernetes:master Dec 6, 2022
SIG Node CI/Test Board automation moved this from PRs - Needs Reviewer to Done Dec 6, 2022
SIG Node PR Triage automation moved this from Triage to Done Dec 6, 2022
@k8s-ci-robot
Copy link
Contributor

@akhilerm: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key containerd.yaml using file config/jobs/kubernetes/sig-node/containerd.yaml
  • key sig-node-presubmit.yaml using file config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml

In response to this:

The containerd test jobs are now using cos-stable image now; which has cgroupv2 enabled by default. Therefore we need to add systemd as cgroup driver for all the tests which makes use of cos-stable.

Ref : #27912 (comment)

Signed-off-by: Akhil Mohan makhil@vmware.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@akhilerm akhilerm deleted the containerd-upgrade-systemd-cgroup branch December 6, 2022 03:25
@liggitt
Copy link
Member

liggitt commented Dec 6, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/config Issues or PRs related to code in /config area/jobs area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

7 participants