
image-gc-high-threshold should be lower than value causing hard eviction nodefs.available or imagefs.available #124868

Open
jhrcz-ls opened this issue May 14, 2024 · 7 comments · May be fixed by #125038
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

jhrcz-ls commented May 14, 2024

What happened?

When the disk fills up, it hits the hard eviction threshold, so the node goes into disk pressure at the same moment image GC notices it should prune something and starts acting. This causes the node to report disk pressure and evict pods, instead of image GC simply starting early enough.

What did you expect to happen?

I expect image GC to notice the filling disk early enough to start garbage collection and free disk space before the node hits disk pressure.

From the current documentation:

https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

--image-gc-high-threshold int32     Default: 85
--image-gc-low-threshold int32     Default: 80

and

https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/

    "imageGCHighThresholdPercent": 85,
    "imageGCLowThresholdPercent": 80,

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds

vs.

nodefs.available<10%
imagefs.available<15%

... it does not make sense to have the defaults meet at the same percentage for imagefs.available (100 - 15 = 85, so image GC only starts at exactly the point where hard eviction fires :-)

A better approach would be to ship default values shifted down a little, equivalent to setting

  - "--image-gc-high-threshold=80"  
  - "--image-gc-low-threshold=75"   

... after setting this, I almost never see node disk pressure, because garbage collection prunes the disk early enough.
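
For illustration, a minimal kubelet config fragment in the same style as the documented config file above (a sketch only; the evictionHard values shown are simply the documented disk-related defaults made explicit):

    "imageGCHighThresholdPercent": 80,
    "imageGCLowThresholdPercent": 75,
    "evictionHard": {
        "nodefs.available": "10%",
        "imagefs.available": "15%"
    }

With these values, image GC starts pruning at 80% usage, before the imagefs hard eviction point at 85% (100 - 15) is reached.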

How can we reproduce it (as minimally and precisely as possible)?

Fill the disk and observe that the node enters the disk pressure state at the same moment garbage collection starts.

Anything else we need to know?

No response

Kubernetes version

This is not version specific; as checked in the documentation right now, the defaults have been the same for years.

$ kubectl version
# paste output here

Cloud provider

none - kubeadm installation

OS version

not relevant

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@jhrcz-ls jhrcz-ls added the kind/bug Categorizes issue or PR as related to a bug. label May 14, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 14, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jhrcz-ls jhrcz-ls changed the title image-gc-high-threshold should be lower than value causing hard eviction of nodefs.available or imagefs.available image-gc-high-threshold should be lower than value causing hard eviction nodefs.available or imagefs.available May 14, 2024
@vaibhav2107 (Member)

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 15, 2024
@olderTaoist (Contributor)

"imageGCHighThresholdPercent" used by realImageGCManager that just delete no used images. imagefs.available used by Eviction Manager that generate unused images by evicting pods. even though 100 - imagefs.available < imageGCHighThresholdPercent It doesn't matter.

jhrcz-ls (Author) commented May 21, 2024

@olderTaoist maybe I see the idea you have in mind ... you mean that without eviction running, there would be no unused images to garbage collect. But there can be lots of images that are no longer used, because of cronjobs, previous image versions, etc.

In reality, with this setting I eliminated lots of situations where the node was marked with disk pressure for a short time.

And just moving workload between nodes so that images become unused here, only to re-download them on another node, is not ideal. It is better to proactively clean up some space and keep the workload going. By the way, in the worst case, if image GC does not prune enough, eviction still happens and it works the way it works now.

olderTaoist (Contributor) commented May 21, 2024

I get it. It is necessary to modify the default value of imageGCHighThresholdPercent, and meanwhile ensure that the value of imageGCHighThresholdPercent stays smaller than 100 - imagefs.available.
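
A hypothetical validation along these lines (a sketch under assumed names; not necessarily what #125038 implements) could reject configurations where image GC can never act before imagefs hard eviction:

    package main

    import (
        "errors"
        "fmt"
    )

    // validateImageGCHigh is a hypothetical check: the image GC high
    // threshold must be reached strictly before the imagefs.available
    // hard eviction threshold fires (pass 15 for "imagefs.available<15%").
    func validateImageGCHigh(imageGCHighPct, imagefsAvailablePct int32) error {
        if imageGCHighPct >= 100-imagefsAvailablePct {
            return errors.New("imageGCHighThresholdPercent must be lower than 100 minus the imagefs.available eviction threshold")
        }
        return nil
    }

    func main() {
        fmt.Println(validateImageGCHigh(85, 15)) // current defaults: rejected
        fmt.Println(validateImageGCHigh(80, 15)) // shifted defaults: <nil>
    }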

@olderTaoist (Contributor)

/assign

@haircommander (Contributor)

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label May 22, 2024