
Capture application termination messages/output #139

Closed
bgrant0607 opened this issue Jun 17, 2014 · 17 comments · Fixed by #2225

Labels
area/app-lifecycle area/docker area/kubelet sig/node

@bgrant0607
Member

When applications terminate, they may write out important information about the reason, such as assertion failure messages, uncaught exception messages, stack traces, etc. We should establish an interface for capturing such information in a first-class way for termination reporting, in addition to whatever is logged.

I suggest we pull the deathrattle message from /dev/final-log or something similar.
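For concreteness, a minimal sketch of the application side, assuming the placeholder path /dev/final-log (nothing here is standardized yet):

```go
package main

import (
	"fmt"
	"os"
)

// writeFinalLog writes a short termination message to the agreed-upon
// path before the process exits. The path /dev/final-log is only a
// placeholder; no name has been standardized yet.
func writeFinalLog(msg string) {
	f, err := os.OpenFile("/dev/final-log", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		return // best effort: the runtime may not provide the file
	}
	defer f.Close()
	fmt.Fprintln(f, msg)
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			writeFinalLog(fmt.Sprintf("panic: %v", r))
			os.Exit(1)
		}
	}()
	// ... application logic ...
}
```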

@vishh
Contributor

vishh commented Jun 23, 2014

Is /run a tmpfs? There is an outstanding PR for this in libcontainer.

@bgrant0607
Member Author

Good call. Looks like not.

Here's df from a google/nodejs container:
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 10188088 1639712 8007808 17% /
none 10188088 1639712 8007808 17% /
tmpfs 304556 0 304556 0% /dev
shm 65536 0 65536 0% /dev/shm
/dev/disk/by-uuid/485b0b37-5e5f-4878-85a4-2d8653315786 10188088 1639712 8007808 17% /.dockerinit
/dev/disk/by-uuid/485b0b37-5e5f-4878-85a4-2d8653315786 10188088 1639712 8007808 17% /etc/resolv.conf
/dev/disk/by-uuid/485b0b37-5e5f-4878-85a4-2d8653315786 10188088 1639712 8007808 17% /etc/hostname
/dev/disk/by-uuid/485b0b37-5e5f-4878-85a4-2d8653315786 10188088 1639712 8007808 17% /etc/hosts
/dev/disk/by-uuid/485b0b37-5e5f-4878-85a4-2d8653315786 10188088 1639712 8007808 17% /data
tmpfs 304556 0 304556 0% /proc/kcore

@thockin
Member

thockin commented Jun 24, 2014

Is /run LSB compliant?

@bgrant0607
Member Author

Re. /run: The point of tmpfs was to avoid pathological disk latency and failure problems. However, we'd need the filesystem to remain live after termination of the main process. We want that for other reasons (e.g., hooks), but it doesn't exist yet.

Solomon expressed some interest in this on #docker-dev:
https://botbot.me/freenode/docker-dev/2014-07-18/?msg=18236306&page=2

@bgrant0607
Member Author

One could also view this as simple container output. I could imagine using this for simple data-in/data-out functions, such as config generators.

One question would be whether we should make the path configurable and, if so, should we provide a means to tell the container what that path is? I could imagine allowing the user/client to specify the path and environment variable name.

However, I could also envision standardizing it for containers, potentially even beyond just Docker containers. For instance, could we use /dev/console, similar to VM console output in GCE? Or maybe another file in /dev.

Note also that /dev/stdout is linked to /dev/fd/1, /dev/stderr is linked to /dev/fd/2, and /dev/ptmx is linked to /dev/pts/ptmx.

@bgrant0607 bgrant0607 changed the title Capture application termination messages Capture application termination messages/output Sep 30, 2014
@bgrant0607 bgrant0607 added this to the v0.9 milestone Oct 4, 2014
@bgrant0607
Member Author

/dev/console is used by some images/distributions, so it probably needs to be /dev/somethingelse.

@bgrant0607
Member Author

Possible file names: /dev/stopmsg, /dev/finalstatus, /dev/deathrattle, ...

/cc @rjnagal

@bgrant0607
Member Author

/dev/log is used by syslog.

How about /dev/final-log?

@vishh
Contributor

vishh commented Nov 4, 2014

@bgrant0607 I assume we want to have a structured logging format for the death reason. If we were to define and provide a new interface, how can we promote adoption of this interface? Requiring application changes might hinder adoption.
Just capturing the last few log lines from 'docker logs' would be useful for users at this point.

@bgrant0607
Member Author

@vishh

No, I don't want a structured logging format. I want the raw output. We can capture other termination information (e.g., time and reason) separately.

With respect to usage: We should ensure that it is easy for a user to add a PreStop hook to populate it.

As for automatic extraction from Docker logs: I'd want to strip the cruft and display the raw output. But how many lines? Fatal log messages are typically one line, but stack traces and uncaught language exceptions may span many lines. Rather than building this functionality into Kubernetes, we could provide a script or program that the user can mount into their container and run as a hook, with a configurable number of lines.
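Such a hook program could be as small as the sketch below; the flag names, default paths, and line count are assumptions for illustration, not an existing tool:

```go
package main

import (
	"bufio"
	"flag"
	"os"
	"strings"
)

// Copy the last -n lines of -src into -dst. This is a sketch of the
// mountable hook program described above; all defaults are assumptions.
func main() {
	src := flag.String("src", "/var/log/app.log", "application log file")
	dst := flag.String("dst", "/dev/termination-log", "termination message path")
	n := flag.Int("n", 20, "number of trailing lines to keep")
	flag.Parse()

	f, err := os.Open(*src)
	if err != nil {
		os.Exit(1)
	}
	defer f.Close()

	// Keep a sliding window of the last n lines; fine for modest logs.
	lines := make([]string, 0, *n)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if len(lines) == *n {
			lines = lines[1:]
		}
		lines = append(lines, scanner.Text())
	}
	os.WriteFile(*dst, []byte(strings.Join(lines, "\n")+"\n"), 0644)
}
```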

In terms of promotion: I'd like to see Docker, libcontainer, and the container community more broadly adopt a mechanism like this. The "container RFC" should have proposed something like this.

@bgrant0607
Member Author

A PostStop hook would probably work better than PreStop.

@vishh
Contributor

vishh commented Nov 4, 2014

Structured logging might provide the ability to make restart decisions that the infrastructure cannot make on its own: disk-full errors versus some internal application error, for example.

Docker logs: I get that it is difficult to ascertain the number of log lines that are critical to each individual application. But I feel this feature will be very useful to users because it doesn't require any changes to their containers. We can come up with a sane default and provide an option to store the entire log file if required.

If applications were to dump their death reason to a location that is not on tmpfs, we could scrape that today, without having to rely on hooks.


@dchen1107
Member

@bgrant0607 how about /dev/termination_log? I put some thought into this issue this afternoon; here is a rough design/proposal (a minimal sketch of what it implies in code follows below):

  1. At the API level:
    i) Introduce a TerminationMessagePath field on Container. If the user doesn't specify one, it defaults to /dev/termination_log.
    ii) Introduce a string field called Message on ContainerStateTerminated to capture the application's termination reason.
  2. When the kubelet comes up, it creates /var/lib/kubelet/termination_logs on the node.
  3. When a PodSpec reaches the kubelet, the kubelet creates an empty file named $containerName_$restart_count under /var/lib/kubelet/termination_logs/$podUUID for each new container.
  4. When running the docker container, the kubelet tells docker to bind-mount the file created in (3) to container:$TerminationMessagePath.
  5. During garbage collection, we should remove the files created for such pods or containers.
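A minimal sketch of the field and host-path layout the steps above imply (names are illustrative, drawn from the proposal, not merged code):

```go
package api

import "fmt"

type Container struct {
	// ... existing fields ...

	// TerminationMessagePath is the in-container path the application
	// writes its termination message to. Per the proposal, it would
	// default to /dev/termination_log when unspecified.
	TerminationMessagePath string
}

// hostTerminationLogPath builds the per-container host file the kubelet
// creates (step 3) and bind-mounts into the container (step 4).
func hostTerminationLogPath(podUID, containerName string, restartCount int) string {
	return fmt.Sprintf("/var/lib/kubelet/termination_logs/%s/%s_%d",
		podUID, containerName, restartCount)
}
```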

@bgrant0607
Member Author

@vishh @dchen1107 and I discussed this in person.

First of all, structure: Different kinds of information need to be communicated:

  1. A concise reason string, similar to Reason in type Status, that can be used for analytics, customized behavior by ecosystem extensions, etc., similar to what is described in "More comprehensive reporting of termination reasons" #137 and "Support reason parameter on pod delete" #1462, but provided upwards from the container.
  2. A brief, arbitrary application-specific string, such as a fatal log message, assertion failure message, stack trace, or language exception message. Not structured. We'd log it, return it in Status in the API, display it in the UI, etc. Similar to Message in type Status.
  3. An arbitrary structured payload for use by ecosystem extensions, similar to Details in type Status.
  4. An explicit override of the default restart behavior: for example, don't restart and kill the pod; kill the pod and reschedule it to a different node; delay the restart; or restart when the container normally would not be restarted.

We can definitely leave affordances in the API (in ContainerStateTerminated) for returning all of this information.

We're going to punt on at least (4) and probably (3) for now. I feel (2) is most important, but (1) is also widely useful.
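For illustration, those affordances could look roughly like this; this is a sketch only, with field names mirroring type Status:

```go
package api

// Sketch of affordances in ContainerStateTerminated for items (1)-(3);
// (4) is omitted since we're punting on restart overrides for now.
type ContainerStateTerminated struct {
	ExitCode int

	// (1) Concise, machine-usable reason string, e.g. "FatalLog".
	Reason string

	// (2) Brief, unstructured application-specific text: a fatal log
	// line, assertion failure, stack trace, etc.
	Message string

	// (3) Arbitrary structured payload for ecosystem extensions.
	Details map[string]string
}
```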

Since I believe there is no clear line between infrastructure and user control components, I don't feel we should differentiate Kubelet-originated (e.g., SystemOOM) and other reasons (e.g., WatchDogTimeout) by using separate fields.

We should also generate events for eventualities the system should respond to, such as system OOM.

On the number of lines to pull from logs by default: Docker is at least planning to move to a write-oriented log model rather than a line-oriented model, which may solve this problem.

Regarding the file location:

Using a /dev location would mean that we'd need to manage the bind mounts in the user containers. Using a standard location would require application changes or adapter hooks. Regarding the specific name, termination_log is a bit long, but I agree it's more consistent with the other terminology. I'd use a hyphen rather than an underscore (/dev/termination-log), however.

Using a configurable path would require that we check whether it's in a volume or in the container's writable layer until Docker decouples the mount namespace lifetime from the main process lifetime. We'd copy from the host filesystem for the former and docker cp for the latter.

Setting the configurable path to, say, a glog Fatal log location would provide the data for (2) only. We'd have to synthesize the reason, which could, for instance, be "FatalLog". However, I don't see a good way to accurately characterize the failure reason without running some characterization code. We could provide a default characterization program for common formats, such as glog, Java exceptions, etc.
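A hypothetical characterization program might start from simple pattern heuristics like these (the reason names and patterns are invented for illustration, not a proposed standard):

```go
package main

import "strings"

// characterizeFailure maps raw termination output to a concise reason
// string, as described above. These heuristics cover glog fatal lines
// and Java/Go crash output; they are examples only.
func characterizeFailure(terminationLog string) string {
	switch {
	case strings.HasPrefix(terminationLog, "F"):
		// glog fatal lines begin with severity letter 'F', e.g. "F1104 ..."
		return "FatalLog"
	case strings.Contains(terminationLog, "Exception in thread"):
		return "UncaughtJavaException"
	case strings.Contains(terminationLog, "panic:"):
		return "GoPanic"
	default:
		return "Unknown"
	}
}
```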

Finally, we might as well capture the termination-log even for successful termination. The application might emit a brief execution summary of some sort.

Full-blown application output should be handled via a different mechanism.

@dchen1107
Member

@bgrant0607 what you commented above is aligned with my initial proposal. The only change I made is taking @vishh's suggestion to allow the user to configure the path.

@dchen1107 dchen1107 modified the milestones: v0.5, v0.9 Nov 10, 2014
@dchen1107 dchen1107 added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Feb 4, 2015