
add KEP for cgroups v2 support #1370

Merged: 1 commit merged into kubernetes:master on Feb 26, 2020

Conversation

giuseppe (Member) opened the pull request:

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA) label on Nov 18, 2019

k8s-ci-robot (Contributor) commented:

Welcome @giuseppe!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot added the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test) label on Nov 18, 2019

k8s-ci-robot (Contributor) commented:

Hi @giuseppe. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the size/L (Denotes a PR that changes 100-499 lines, ignoring generated files), kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory), and sig/node (Categorizes an issue or PR as relevant to SIG Node) labels on Nov 18, 2019

AkihiroSuda (Member) left a review comment:

Let's unshare cgroupns by default.

AkihiroSuda (Member) commented:

Do you think we should open a separate KEP for rootless?

giuseppe (Member, Author) replied:

> Do you think we should open a separate KEP for rootless?

Yes, let's keep it separate from cgroups v2, which is already touching enough components :-)

derekwaynecarr (Member) left a review comment:

We need to add more detail on kubelet changes where it interacts with cgroups directly.

From keps/sig-node/20191118-cgroups-v2.md:

## User Stories

* The Kubelet can run on a cgroups v2 host

A reviewer (Member) commented:

Clarify that kubelet will support v1 or v2.


From keps/sig-node/20191118-cgroups-v2.md:

Kubelet PR: [https://github.com/kubernetes/kubernetes/pull/85218](https://github.com/kubernetes/kubernetes/pull/85218)

### Option 2: Use cgroups v2 throughout the stack

A reviewer (Member) commented:

For option 1, the kubelet still has to write cgroup v2 files for its own interactions with cgroups. Please add some detail around the kubelet cm package and the eviction manager.
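
For concreteness, the following is a minimal, hedged sketch of how a component can detect whether the host exposes the unified cgroup v2 hierarchy; it mirrors the statfs check done by runc's libcontainer, and the helper name here is illustrative rather than an upstream API.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// isCgroup2UnifiedMode reports whether /sys/fs/cgroup is mounted as the
// cgroup v2 unified hierarchy, by comparing the filesystem magic number
// against CGROUP2_SUPER_MAGIC. The helper name is illustrative.
func isCgroup2UnifiedMode() (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false, err
	}
	return st.Type == unix.CGROUP2_SUPER_MAGIC, nil
}

func main() {
	unified, err := isCgroup2UnifiedMode()
	if err != nil {
		fmt.Println("cannot detect cgroup version:", err)
		return
	}
	if unified {
		fmt.Println("host uses cgroup v2 (unified hierarchy)")
	} else {
		fmt.Println("host uses cgroup v1 (legacy or hybrid hierarchy)")
	}
}
```

This is essentially the "introspection of sys/fs on startup" that the later discussion attributes to the kubelet's cgroup manager.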


From keps/sig-node/20191118-cgroups-v2.md:

## Graduation Criteria

- Alpha: Basic support for running Kubernetes on a cgroups v2 host.

A reviewer (Member) commented:

Worth discussing whether we should add test coverage on Fedora as a good candidate upstream.

derekwaynecarr (Member) commented:

/assign

giuseppe force-pushed the cgroupsv2 branch 2 times, most recently from 83faa61 to c0c871d on November 20, 2019

From keps/sig-node/20191118-cgroups-v2.md:

_net_prio_ are not available with the new version. The alternative to
these controllers is to use eBPF.

The lack of the network controllers probably affects some CNI

A commenter noted:

I don't currently know of any CNI plugins that use cgroups for network management. (My list is not exhaustive, of course...) Bandwidth limiting, if implemented, is done via a tc filter / qdisc for all of the plugins I'm aware of.



From keps/sig-node/20191118-cgroups-v2.md:

With this approach cAdvisor would have to read back values from
cgroups v2 files (already done).

yujuhong (Contributor) commented:

So, with this option, CRI implementations will need to map the cgroup v1 fields to cgroup v2 values, right?

Could you list what each layer needs to do for each option to make it clear?

  • user (pod spec)
  • kubelet
  • CRI implementation
  • OCI runtime

giuseppe (Member, Author) replied:

The changes in the CRI implementation and the OCI runtime might be implemented in different ways, depending on where we decide to draw the line.
For CRI-O+crun, most of the logic is in the OCI runtime itself, but that is not the only way to achieve it.
For the kubelet, the changes in this patch should be enough (at least until hugetlb support becomes available): kubernetes/kubernetes#85218

In the second phase though, when cgroup v2 is fully supported throughout the stack, both the pod spec and the CRI will need to change.

A reviewer (Member) asked:

Can you clarify why you feel the pod spec needs to change? I see no major reason to change the pod spec or the resource representation.

giuseppe (Member, Author) replied:

I was assuming we would like to expose all or most of the cgroup v2 features.

For example, the memory controller on cgroup v1 allows configuring:

  • memory.soft_limit_in_bytes
  • memory.limit_in_bytes

while on cgroup v2 we have:

  • memory.high
  • memory.low
  • memory.max
  • memory.min

But that is probably out of scope, as each new feature (if needed) must go through its own KEP?
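
To make the extra v2 knobs concrete, here is a hedged sketch of what writing them looks like; the helper and the cgroup path are illustrative, not part of any upstream API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeMemoryLimits shows which cgroup v2 memory interface files a runtime
// could write for a container. Illustrative helper, not an upstream API.
// Not written here: memory.low (best-effort protection) and memory.min
// (guaranteed protection), which have no direct cgroup v1 equivalent.
func writeMemoryLimits(cgroupPath string, limit, high int64) error {
	files := map[string]int64{
		"memory.max":  limit, // hard limit; the OOM killer runs if usage cannot be reclaimed below it
		"memory.high": high,  // throttling limit; usage above it is aggressively reclaimed
	}
	for name, value := range files {
		path := filepath.Join(cgroupPath, name)
		if err := os.WriteFile(path, []byte(fmt.Sprintf("%d", value)), 0o644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Hypothetical path; a real runtime derives it from the pod/container cgroup.
	if err := writeMemoryLimits("/sys/fs/cgroup/example.scope", 512<<20, 448<<20); err != nil {
		fmt.Println("write failed:", err)
	}
}
```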

giuseppe (Member, Author) added:

@derekwaynecarr, are future improvements based on what cgroup v2 offers out of scope for the current KEP?

derekwaynecarr (Member) replied:

@giuseppe, future improvements for what cgroup v2 offers should be out of scope for this KEP. I would keep this KEP focused on the kubelet being tolerant of a cgroup v2 host. Adding new cgroup v2 specific features to the resource model would be a separate enhancement.

To address @yujuhong's question, I think we are saying the following:

  • no change required to the pod spec
  • the kubelet cgroup manager uses v1 or v2 mode based on its introspection of sys/fs on startup
  • kubelet QoS and pod-level cgroup creation on a v2 host uses the mapping table specified
  • CRI implementers that support cgroup v2 hosts must use a similar mapping (a sketch follows this comment) to ensure that pod bounding cgroup values are consistent with the container cgroup values.
  • the OCI runtime spec is not required to change, as the CRI implementer provides the mapping in the transitional period.

@giuseppe, agree with the above?
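
As an illustration of the mapping referenced in the bullets above, here is a hedged sketch of the linear cpu.shares to cpu.weight conversion commonly used by OCI runtimes during the transition; the function name is illustrative and the authoritative table lives in the KEP.

```go
package main

import "fmt"

// cpuSharesToWeight converts a cgroup v1 cpu.shares value (range [2, 262144])
// into a cgroup v2 cpu.weight value (range [1, 10000]) using the linear
// mapping commonly applied by OCI runtimes in the transitional period.
func cpuSharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0 // unset
	}
	if shares < 2 {
		shares = 2 // clamp to the minimum valid v1 value
	}
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	fmt.Println(cpuSharesToWeight(2))      // 1     (v1 minimum -> v2 minimum)
	fmt.Println(cpuSharesToWeight(1024))   // 39    (the v1 default)
	fmt.Println(cpuSharesToWeight(262144)) // 10000 (v1 maximum -> v2 maximum)
}
```

In the same spirit, the v1 memory limit typically maps one-to-one onto memory.max, which is why neither the pod spec nor the CRI needs to change in this transitional phase.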

yujuhong (Contributor) commented on Dec 3, 2019:

/cc @Random-Liu for containerd

k8s-ci-robot (Contributor) replied:

@yujuhong: GitHub didn't allow me to request PR reviews from the following users: for, containerd.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @Random-Liu for containerd

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

AkihiroSuda added commits to AkihiroSuda/usernetes that referenced this pull request on Feb 18, 2020 (three similar commits), with the following message:

The rootless patches have been merged into upstream Docker since early 2019 and no longer need to be incubated in this repo.

Another reason to remove Docker from this repo is that dockershim might not support cgroup v2:
kubernetes/enhancements#1370 (comment)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

AkihiroSuda (Member) commented:

@derekwaynecarr @yujuhong @dchen1107 PTAL?

sftim (Contributor) commented on Feb 22, 2020:

/retest

derekwaynecarr (Member) commented:

Per Dawn's comment and the discussion in SIG Node, we can merge this KEP and update it as we find issues with the implementation in 1.19. There is no disagreement on the need to iterate here.

See: #1370 (comment)

/approve
/lgtm

k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged) and approved (Indicates a PR has been approved by an approver from all required OWNERS files) labels on Feb 26, 2020

giuseppe pushed an updated commit: Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

k8s-ci-robot removed the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label on Feb 26, 2020

mrunalp (Contributor) commented on Feb 26, 2020:

/lgtm

k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label on Feb 26, 2020

k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, giuseppe, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of the relevant OWNERS files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot merged commit 4abcc67 into kubernetes:master on Feb 26, 2020
k8s-ci-robot added this to the v1.18 milestone on Feb 26, 2020

hectorj2f commented:

Awesome work!

crosbymichael commented:

LGTM :)

giuseppe mentioned this pull request on Jan 12, 2021.

From keps/sig-node/20191118-cgroups-v2.md (cpu controller mapping: v1 value, cgroup v2 file, conversion, comment):

| period | cpu.max | y = x | period and quota are written together |
| quota  | cpu.max | y = x | period and quota are written together |

**blkio controller**

A commenter asked:

Are there plans (and target Kubernetes versions) to support the blkio controller?

Reply:

kubernetes/enhancements#1907; the KEP is WIP.
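
For context on the cpu.max rows quoted above: cgroup v2 takes quota and period together through a single file, as one "quota period" line, with "max" standing in for an unlimited quota. A minimal, hedged sketch follows; the path and helper name are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeCPUMax writes quota and period together, since the cgroup v2 cpu.max
// file expects a single "quota period" line; a non-positive quota is written
// as "max" (unlimited). Illustrative helper, not an upstream API.
func writeCPUMax(cgroupPath string, quota int64, period uint64) error {
	value := "max"
	if quota > 0 {
		value = fmt.Sprintf("%d", quota)
	}
	line := fmt.Sprintf("%s %d", value, period)
	return os.WriteFile(filepath.Join(cgroupPath, "cpu.max"), []byte(line), 0o644)
}

func main() {
	// Hypothetical cgroup path; a real runtime derives it from the container's cgroup.
	if err := writeCPUMax("/sys/fs/cgroup/example.scope", 200000, 100000); err != nil {
		fmt.Println("write failed:", err)
	}
}
```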
