stdout logs should not be accounted in ephemeral storage usage #124333
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig node
/cc @haircommander
I personally disagree with this, as it's the container that is doing the logging in the first place. I understand the lifecycle point about the kubelet being the entity that chooses when to clean logs, but allowing a container to define its own limits functionally opens the node up to DoS attacks from a container spamming the node while defining its own arbitrarily high limit. The kubelet is responsible for protecting the other workloads on the node, and if the container is being constrained by the admin-defined limits, it should log less, IMO.
I'm not sure that we're in sync regarding the context: I'm not asking for a feature that lets the container define its own log retention. On the contrary, I think the current design (where kubelet / CRI handles the lifecycle of container-produced logs) is sufficient to cover that. By default, the configuration allows 50 MiB of (uncompressed) logs to be stored:
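(For reference, a minimal KubeletConfiguration sketch showing the two settings involved; the 10Mi / 5 values are the documented upstream defaults, which is where the 50 MiB figure comes from.)

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate a container's log file once it reaches this size (default: 10Mi).
containerLogMaxSize: "10Mi"
# Keep at most this many log files per container, including the active one
# (default: 5). Worst case: 5 x 10Mi = 50 MiB of retained logs per container.
containerLogMaxFiles: 5
```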
Needless to say, cluster admins might choose to configure these values differently (allowing more or fewer logs to be stored on a worker node), so the maximum amount of logs allowed on each worker node is completely up to the cluster admin to decide. No, what I'm highlighting in this issue is kind of the opposite: if the container's ephemeral-storage limit is stricter than the amount of logs kubelet maintains on the worker node, the container could be evicted purely due to the way ephemeral storage usage is calculated. One might argue that containers should not produce that much log data to begin with, but this is just something that can't be guaranteed over the lifetime of the container (for example, a container that starts crash-looping or temporarily runs with debug logging enabled can produce far more logs than anticipated).
Examples:
NOTE: The default ingress-nginx / coredns manifests don't come with any ephemeral-storage limit definition, which is why they're not impacted by this issue. Unfortunately, the documentation doesn't do a good job of explaining how ephemeral-storage limits are used either. Because of this, it's probably safer to follow the example of ingress-nginx / coredns and not set any limit for ephemeral-storage at all to avoid "mysterious" Pod evictions, but doesn't that make this feature pointless?
Marking this as a feature since the cluster admin will have to account for storage used by container logs separately while planning.
/kind feature
I would argue that this is already the case: when the cluster admin designs / dimensions a cluster node, they already have to account for the number of Pods the system will have to support, and the maximum amount of logs those Pods (and the containers belonging to those Pods) could generate (and thus to configure the --container-log-max-files / --container-log-max-size settings accordingly).
Again, the issue I'm trying to highlight here is the disconnect between two mechanisms:
[1] kubelet - controls the amount of logs a container could have on a given worker node
[2] the container's ephemeral-storage limit - controls when the Pod gets evicted for local storage usage
Configurations / settings related to these two are completely disconnected from each other, however the Pod eviction logic connects them indirectly (by accounting for logs that are maintained by kubelet on the worker node). Because of this I don't see the …
I brought this up at the SIG Node meeting today, and @SergeyKanzhelev brought up a good point. The design of logs being charged against the pod's ephemeral storage may feel like an anti-pattern, but that's because it's tricky to fully charge the pod for all of its behavior. We discussed pushing towards a future where the cost of pods on the infrastructure (kubelet, CRI) is not linear with the number of pods. That's an ambitious goal because they both do a lot on a per-pod basis, but a good reach goal would be to have the infrastructure not be affected by an increasing number of pods, and instead have each pod charged for all of its behavior. The way ephemeral storage accounts for logs fulfills that goal.
What happened?
This question was originally reported under https://discuss.kubernetes.io/t/why-stdout-logs-are-accounted-in-ephemeral-storage-usage/27815.
However, the more I think about it, the more I'm convinced that this is an actual issue.
When kubelet calculates ephemeral storage usage, it incorporates the usage of the CRI-managed logs in the calculation:
Code snippet from pkg/kubelet/eviction/eviction_manager.go:
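(The snippet itself was lost in formatting; below is a close paraphrase of the relevant eviction logic, assuming recent upstream code in pkg/kubelet/eviction, so exact identifiers may differ between releases.)

```go
// Paraphrased from pkg/kubelet/eviction (eviction_manager.go / helpers.go).
// fsStatsLogs is included in the set of filesystem stats measured against
// the pod's ephemeral-storage limit, so CRI-managed container logs count
// towards it:
fsStatsSet := []fsStatsType{fsStatsRoot, fsStatsLogs, fsStatsLocalVolumeSource}
podEphemeralUsage, err := podLocalEphemeralStorageUsage(podStats, pod, fsStatsSet, etcHostsPath)
if err != nil {
	return false
}
podEphemeralStorageTotalUsage.Add(podEphemeralUsage[v1.ResourceEphemeralStorage])
if podEphemeralStorageTotalUsage.Cmp(podEphemeralStorageLimit) > 0 {
	// Evict the pod for exceeding its ephemeral-storage limit, even if
	// the overage comes entirely from kubelet-retained log files.
	return m.evictPod(pod, 0, fmt.Sprintf(podEphemeralStorageMessageFmt, podEphemeralStorageLimit.String()), nil, nil)
}
```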
However, those log messages are completely out of the container's own control, as it has no influence over their retention / lifecycle once they leave the container: instead, it's kubelet's responsibility to define how long to retain those logs, based on the --container-log-max-files and --container-log-max-size settings.
Because of this, if container-log-max-size * container-log-max-files is larger than the container's ephemeral-storage limit, the container could be evicted at some point over its lifetime without ever writing a single byte to any of its mounted volumes.
What did you expect to happen?
The Pod / container should not be evicted without reaching a container-defined limit on the ephemeral storage that is actually under the container's control.
How can we reproduce it (as minimally and precisely as possible)?
PoC example:
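(The original PoC was lost in formatting; the following is a minimal hypothetical reproduction, assuming the default kubelet log retention of 10Mi x 5 rotated files = 50 MiB per container. The Pod name, image, and the 20Mi limit are illustrative.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-eviction-poc
spec:
  restartPolicy: Never
  containers:
  - name: spammer
    image: busybox
    # Continuously write roughly 1 MiB/s of text to stdout. Nothing is
    # written to any volume or to the container's writable layer.
    command:
    - sh
    - -c
    - "while true; do head -c 1048576 /dev/urandom | base64; sleep 1; done"
    resources:
      limits:
        # Lower than the ~50 MiB of logs kubelet may retain for this
        # container, so the Pod can be evicted by its own stdout logs alone.
        ephemeral-storage: "20Mi"
```

Once the retained (rotated) log files for the container grow past the 20Mi limit, kubelet's eviction manager should evict the Pod for exceeding its ephemeral-storage limit, even though the container never touched a volume.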
Anything else we need to know?
No response
Kubernetes version
Cloud provider
N/A
OS version
No response
Install tools
No response
Container runtime (CRI) and version (if applicable)
No response
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response