
Copy logs from dead containers to local files to facilitate immediate GC of dead containers #26923

Closed
vishh opened this issue Jun 7, 2016 · 22 comments
Labels
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

@vishh (Contributor) commented Jun 7, 2016

One of the reasons for keeping dead containers around in the kubelet today is to retain access to logs from previous instances. Dead container instances can take up disk space via their root filesystems and cause disk pressure on the nodes. To alleviate disk pressure and improve the logging experience in Kubernetes, the kubelet can retrieve logs from dead containers and GC those containers right away, once it no longer depends on metadata associated with old containers.
Specifically,

  • Kubelet can retrieve logs from the runtime and store them in a per-pod, per-container directory under /var/log/. For the docker runtime, this can be a simple move operation; for rkt, we will have to retrieve the logs remotely. (A sketch of the move operation follows this list.)
  • The directory structure can be /var/log/<podUID>/<ContainerName>/<InstanceNumber>_stdout.log
  • Update kubectl logs -p to return logs from these files instead of relying on the runtime for previous logs.
  • These logs will be kept around on a best-effort basis and will be deleted whenever there is disk pressure.
  • Kubelet can prefer keeping the first and most recent instances of a container around and aggressively delete other log files.
  • All these logs will be accessible initially via the /logs REST endpoint. In the future, we can consider expanding the kubectl logs interface to support an instance number, or add support for the first attempt specifically.
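
A minimal sketch of what the docker-runtime "move" could look like, assuming the kubelet already knows the pod UID, container name, instance (restart) number, and the on-disk path of the container's log file. All names here are hypothetical, not actual kubelet code:

```go
// Package logrelocation is a hypothetical sketch, not kubelet code.
package logrelocation

import (
	"fmt"
	"os"
	"path/filepath"
)

// relocateDeadContainerLog moves a dead container's log into the proposed
// layout: /var/log/<podUID>/<ContainerName>/<InstanceNumber>_stdout.log.
// For the docker runtime the source is already a file on local disk, so
// this can be a cheap rename; the container itself can then be GC'd
// immediately without losing its logs.
func relocateDeadContainerLog(runtimeLogPath, podUID, containerName string, instance int) (string, error) {
	dir := filepath.Join("/var/log", podUID, containerName)
	if err := os.MkdirAll(dir, 0755); err != nil {
		return "", err
	}
	dst := filepath.Join(dir, fmt.Sprintf("%d_stdout.log", instance))
	// os.Rename only works within one filesystem; a real implementation
	// would fall back to copy-and-unlink across mounts, and would fetch
	// logs remotely for runtimes like rkt instead of renaming.
	if err := os.Rename(runtimeLogPath, dst); err != nil {
		return "", err
	}
	return dst, nil
}
```

kubectl logs -p would then read the previous instance's file from this directory instead of asking the runtime.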
vishh added the priority/backlog and sig/node labels on Jun 7, 2016
vishh added this to the next-candidate milestone on Jun 7, 2016
@resouer (Contributor) commented Jun 7, 2016

Will we do this for dead containers only, or does this apply to all /logs cases?

@vishh (Contributor, Author) commented Jun 7, 2016

As of now, logs of running containers are controlled by the runtimes. To avoid duplicated storage of log files, it's better to avoid managing logs of running instances for now.

@fejta-bot commented

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Dec 15, 2017
@ankon (Contributor) commented Jan 11, 2018

This issue is still very much relevant for us: we need access to the logs of a dead container, but the rest of the dead container is just consuming disk space that should be reclaimed and put to more productive use.

@fejta-bot commented

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 11, 2018
@fejta-bot commented

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@hzxuzhonghu (Member) commented

/reopen

@hzxuzhonghu (Member) commented

We also have a case that depends on getting container logs after the job has completed successfully.

@krancour (Member) commented

This issue is relevant for me. I use fluentd to get logs from pods and send them elsewhere, but two conditions can make me lose valuable logs:

  1. Due to log volume, the agent falls behind, and the pod (for whatever reason) is deleted before the agent catches up. With the log file ripped out from under us, the remaining logs go missing.

  2. If a pod's total lifetime is very short, the log files may be deleted before the agent even notices there is a new file to tail (a toy sketch of this race follows below).

This seems to me to be a big problem. How many folks are using the likes of fluentd specifically to preserve their logs, without realizing that in some cases the kubelet isn't even giving them a fair opportunity to do so?
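
To make case 2 concrete, here is a toy model (a stand-in polling agent, not fluentd itself; all names and intervals are made up) showing how a log file whose entire lifetime falls between two discovery scans is never seen at all:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

func main() {
	dir, err := os.MkdirTemp("", "logs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	refreshInterval := 200 * time.Millisecond // how often the agent rescans for new files

	// A "pod" whose container lives and dies between two agent scans.
	go func() {
		f := filepath.Join(dir, "short-lived-pod_stdout.log")
		os.WriteFile(f, []byte("crucial last words\n"), 0644)
		time.Sleep(50 * time.Millisecond) // pod lifetime << refreshInterval
		os.Remove(f)                      // container and its log GC'd
	}()

	// The agent's discovery loop: every scan sees zero files, so the
	// log was never even a candidate for shipping.
	for i := 1; i <= 3; i++ {
		time.Sleep(refreshInterval)
		entries, _ := os.ReadDir(dir)
		fmt.Printf("scan %d: %d file(s) found\n", i, len(entries))
	}
}
```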

@krancour (Member) commented

/reopen

@k8s-ci-robot (Contributor) commented

@krancour: Reopened this issue.

In response to this:

/reopen


k8s-ci-robot reopened this on Jan 17, 2020
@fejta-bot commented

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor) commented

@fejta-bot: Closing this issue.

In response to this:

/close


@krancour (Member) commented

/reopen

k8s-ci-robot reopened this on Feb 16, 2020
@k8s-ci-robot (Contributor) commented

@krancour: Reopened this issue.

In response to this:

/reopen


@fejta-bot commented

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor) commented

@fejta-bot: Closing this issue.

In response to this:

/close

@krancour (Member) commented

/reopen

@k8s-ci-robot (Contributor) commented

@krancour: Reopened this issue.

In response to this:

/reopen


k8s-ci-robot reopened this on Mar 17, 2020
@colltoaction commented Apr 3, 2020

This is relevant for me in the same way as @krancour's first case. Would it be fair to say that having a minimum-container-ttl-duration aligned with our fluentd configuration would be enough to preserve logs? I understand that if fluentd is stuck we would still lose logs, but I think we can take that risk. E.g., we could set minimum-container-ttl-duration to 10 minutes.

Also, I see that --minimum-container-ttl-duration is deprecated, with the rationale "deprecated once old logs are stored outside of container's context", but I can't find a new flag covering this use case. Does this mean the flag is deprecated because a new solution is coming ("store logs outside of container's context"), but for now this is the only way?

Thanks!


EDIT: If I read correctly, the property is compared against the container creation time, so it wouldn't help save logs if the container was long-lived. I'll keep reading the code to try to find something useful.
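
For illustration, a minimal sketch of the comparison the EDIT describes (hypothetical names, not the kubelet's actual GC code): when "age" is measured from creation time, a minimum TTL gives no grace period at all to a long-lived container that just died.

```go
package main

import (
	"fmt"
	"time"
)

// isEvictable sketches a GC check where a dead container's age is
// measured from its creation time rather than its death time.
func isEvictable(createdAt time.Time, minAge time.Duration) bool {
	return time.Since(createdAt) >= minAge
}

func main() {
	minAge := 10 * time.Minute // e.g. minimum-container-ttl-duration of 10m

	// Died a moment ago after running for two hours: already far "older"
	// than minAge, so evictable immediately. The log agent gets no
	// window to catch up on this container's logs.
	longLived := time.Now().Add(-2 * time.Hour)
	fmt.Println(isEvictable(longLived, minAge)) // true

	// Only containers created within the last ten minutes are protected.
	justCreated := time.Now().Add(-30 * time.Second)
	fmt.Println(isEvictable(justCreated, minAge)) // false
}
```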

@fejta-bot commented

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor) commented

@fejta-bot: Closing this issue.

In response to this:

/close
