Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KEP-2400] Mount tmpfs memory-backed volumes with a noswap option if supported #124060

Merged
merged 7 commits into from
May 23, 2024

Conversation

iholder101
Copy link
Contributor

@iholder101 iholder101 commented Mar 26, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR addresses #105978 which is about preventing memory-backed volumes from being swapped.

This PR brings the following behavior:

  • When kubelet starts running with the fail-swap-on=false argument, it will check if the tmpfs noswap option is supported. If it's not - a warning log will be raised.
    • To understand if tmpfs noswap is supported or not, kubelet would try to mount a dummy tmpfs with the option and check for success. The directory is obviously cleaned up afterwards.
  • If the tmpfs noswap option [1] is supported, use it by default for every tmpfs volume mount. With this options, memory-backed volumes will not be able to swap.

[1] https://www.kernel.org/doc/html/latest/filesystems/tmpfs.html

Example

After running kubelet with the --fail-swap-on=false argument, I've created the following pod and secret:

apiVersion: v1
kind: Secret
metadata:
  name: mysecret
data:
  username: dXNlcm5hbWU=  # Base64 encoded value of 'username'
  password: cGFzc3dvcmQ=  # Base64 encoded value of 'password'
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  volumes:
  - name: my-volume
    emptyDir:
      medium: "Memory"
  - name: secret-volume
    secret:
      secretName: mysecret
  containers:
  - image: k8s.gcr.io/e2e-test-images/agnhost:2.9
    volumeMounts:
    - name: my-volume
      mountPath: /tmpfsdir
    - name: secret-volume
      mountPath: "/etc/secrets"
    command:
      - sleep
      - "36000"
    name: mycontainer

Then on the node we can see tmpfs noswap being used:

> grep noswap /proc/mounts
tmpfs /var/lib/kubelet/pods/a79290c3-8565-4d05-b065-e6fe3d471beb/volumes/kubernetes.io~projected/kube-api-access-dfw72 tmpfs rw,seclabel,relatime,size=51200k,inode64,noswap 0 0
tmpfs /var/lib/kubelet/pods/1b756da6-531f-40d0-aa44-df79de14ad04/volumes/kubernetes.io~projected/kube-api-access-gh9tk tmpfs rw,seclabel,relatime,size=394492600k,inode64,noswap 0 0
tmpfs /var/lib/kubelet/pods/1fe60428-4258-481a-99b6-bdc4ae671c71/volumes/kubernetes.io~empty-dir/my-volume tmpfs rw,seclabel,relatime,size=524288k,inode64,noswap 0 0
tmpfs /var/lib/kubelet/pods/1fe60428-4258-481a-99b6-bdc4ae671c71/volumes/kubernetes.io~secret/secret-volume tmpfs rw,seclabel,relatime,size=524288k,inode64,noswap 0 0
tmpfs /var/lib/kubelet/pods/1fe60428-4258-481a-99b6-bdc4ae671c71/volumes/kubernetes.io~projected/kube-api-access-sfr4d tmpfs rw,seclabel,relatime,size=524288k,inode64,noswap 0 0

Which issue(s) this PR fixes:

Fixes #105978

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md

Note: I keep the functest and its revert commit for future reference, as in the future it would be a great candidate for a functional test (when noswap is widely supported enough)

/sig node

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 26, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 26, 2024
@iholder101 iholder101 changed the title Always mount tmpfs memory-backed volumes with a noswap option [KEP-2400] Always mount tmpfs memory-backed volumes with a noswap option Mar 26, 2024
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 26, 2024
@iholder101
Copy link
Contributor Author

/cc @liggitt @kannon92

@bart0sh bart0sh moved this from Triage to Archive-it in SIG Node CI/Test Board Mar 31, 2024
@bart0sh bart0sh moved this from Triage to Waiting on Author in SIG Node PR Triage Apr 3, 2024
@bart0sh
Copy link
Contributor

bart0sh commented Apr 3, 2024

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 3, 2024
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2024
@kannon92
Copy link
Contributor

Nice work @iholder101!

can you update the swap KEP with some of our discoveries around this?

@iholder101
Copy link
Contributor Author

Nice work @iholder101!

can you update the swap KEP with some of our discoveries around this?

Absolutely, will do!

@liggitt
Copy link
Member

liggitt commented May 20, 2024

( for explicit /lgtm from either @liggitt or @mrunalp )

one comment about avoiding the global rootDir in the swap utility package, I'll defer to @mrunalp for lgtm after that's addressed

@mrunalp
Copy link
Contributor

mrunalp commented May 20, 2024

@iholder101 Can we make the commits purely additive?

use the tmpfs noswap option in order
to mount memory-backed volumes if it's supported.

Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
When --fail-swap-on=false kubelet CLI argument
is provided, but tmpfs noswap is not supported
by the kernel, warn about the risks of memory-backed
volumes being swapped into disk

Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
@iholder101
Copy link
Contributor Author

@iholder101 Can we make the commits purely additive?

@mrunalp Definitely.
I've now folded two commits (regarding error message improvement + address dim's request to organize imports) into previous ones.

PTAL

@k8s-ci-robot
Copy link
Contributor

@iholder101: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-gce-providerless 9f7631f link false /test pull-kubernetes-e2e-gce-providerless
pull-kubernetes-conformance-kind-ipv6-parallel 9f7631f link false /test pull-kubernetes-conformance-kind-ipv6-parallel
pull-kubernetes-e2e-gce-storage-snapshot a6b971f link false /test pull-kubernetes-e2e-gce-storage-snapshot

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims, iholder101, kwilczynski, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@haircommander
Copy link
Contributor

/lgtm

great work here!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 23, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 7ee51bea411226326f5247a0081b6e8681284089

@iholder101
Copy link
Contributor Author

Thank you everyone for your fruitful reviews!

Unholding based on #124060 (comment)
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 23, 2024
@k8s-ci-robot k8s-ci-robot merged commit b42bb8f into kubernetes:master May 23, 2024
17 of 18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone May 23, 2024
@AkihiroSuda
Copy link
Member

AkihiroSuda commented May 27, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
SIG Node PR Triage
Needs Approver
Development

Successfully merging this pull request may close these issues.

[KEP-2400] Address swap impact on memory-backed volumes