
image-gc-high-threshold should be lower than value causing hard eviction nodefs.available or imagefs.available #124868

Open
jhrcz-ls opened this issue May 14, 2024 · 7 comments · May be fixed by #125038
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

jhrcz-ls commented May 14, 2024

What happened?

When the disk fills up, it hits the hard eviction threshold, so the node goes into disk pressure at the same moment image GC notices it should prune something and starts acting. This causes the node to report disk pressure and evict pods, instead of image GC simply starting early enough.

What did you expect to happen?

I expect image GC to notice the filling disk early enough to start garbage collection and free disk space before the node hits disk pressure.

From the current documentation:

https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

--image-gc-high-threshold int32     Default: 85
--image-gc-low-threshold int32     Default: 80

and

https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/

    "imageGCHighThresholdPercent": 85,
    "imageGCLowThresholdPercent": 80,

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds

vs.

nodefs.available<10%
imagefs.available<15%

... it does not make sense to have the defaults meet at the same percentage for imagefs.available (100 - 15 = 85, so image GC only starts at exactly the point where hard eviction fires :-)

A better approach would be to ship default values shifted down a little, equivalent to setting

  - "--image-gc-high-threshold=80"  
  - "--image-gc-low-threshold=75"   

... after setting this, I almost never see node disk pressure, because garbage collection prunes the disk early enough.
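
For illustration, a minimal kubelet config fragment in the same style as the documented config file above (a sketch only; the evictionHard values shown are simply the documented disk-related defaults made explicit):

    "imageGCHighThresholdPercent": 80,
    "imageGCLowThresholdPercent": 75,
    "evictionHard": {
        "nodefs.available": "10%",
        "imagefs.available": "15%"
    }

With these values, image GC starts pruning at 80% usage, before the imagefs hard eviction point at 85% (100 - 15) is reached.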

How can we reproduce it (as minimally and precisely as possible)?

Fill the disk and observe that the node enters the disk pressure state at the same moment garbage collection starts.

Anything else we need to know?

No response

Kubernetes version

This is not version specific; as checked in the documentation right now, the defaults have been the same for years.

$ kubectl version
# paste output here

Cloud provider

none - kubeadm installation

OS version

not relevant

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@jhrcz-ls jhrcz-ls added the kind/bug Categorizes issue or PR as related to a bug. label May 14, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 14, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jhrcz-ls jhrcz-ls changed the title image-gc-high-threshold should be lower than value causing hard eviction of nodefs.available or imagefs.available image-gc-high-threshold should be lower than value causing hard eviction nodefs.available or imagefs.available May 14, 2024
@vaibhav2107 (Member)

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 15, 2024
@olderTaoist (Contributor)

"imageGCHighThresholdPercent" used by realImageGCManager that just delete no used images. imagefs.available used by Eviction Manager that generate unused images by evicting pods. even though 100 - imagefs.available < imageGCHighThresholdPercent It doesn't matter.

jhrcz-ls (Author) commented May 21, 2024

@olderTaoist maybe I see the idea you have in mind ... you mean that without eviction running, there would be no unused images to garbage collect. But there can be lots of images that are no longer used, because of cronjobs, previous image versions, etc.

In reality, with this setting I eliminated lots of situations where the node was marked with disk pressure for a short time.

And just moving workload between nodes so that images become unused here, only to re-download them on another node, is not ideal. It is better to proactively clean up some space and keep the workload going. By the way, in the worst case, if image GC does not prune enough, eviction still happens and it works the way it works now.

olderTaoist (Contributor) commented May 21, 2024

I get it. It is necessary to modify the default value of imageGCHighThresholdPercent, and meanwhile ensure that the value of imageGCHighThresholdPercent stays smaller than 100 - imagefs.available.
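
A hypothetical validation along these lines (a sketch under assumed names; not necessarily what #125038 implements) could reject configurations where image GC can never act before imagefs hard eviction:

    package main

    import (
        "errors"
        "fmt"
    )

    // validateImageGCHigh is a hypothetical check: the image GC high
    // threshold must be reached strictly before the imagefs.available
    // hard eviction threshold fires (pass 15 for "imagefs.available<15%").
    func validateImageGCHigh(imageGCHighPct, imagefsAvailablePct int32) error {
        if imageGCHighPct >= 100-imagefsAvailablePct {
            return errors.New("imageGCHighThresholdPercent must be lower than 100 minus the imagefs.available eviction threshold")
        }
        return nil
    }

    func main() {
        fmt.Println(validateImageGCHigh(85, 15)) // current defaults: rejected
        fmt.Println(validateImageGCHigh(80, 15)) // shifted defaults: <nil>
    }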

@olderTaoist (Contributor)

/assign

@haircommander (Contributor)

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label May 22, 2024