
Cant open pod logs (probably problem with ids) #2640

Closed
pskrzyns opened this issue May 16, 2016 · 27 comments

@pskrzyns
Contributor

pskrzyns commented May 16, 2016

When running Kubernetes with rkt, I can't get logs via kubectl logs.
After some debugging I found:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/rkt/log.go

Environment

rkt Version: 1.6.0+git5d8f302
appc Version: 0.8.1
Go Version: go1.6.1
Go OS/Arch: linux/amd64
Features: +TPM
--
Linux 4.5.2-coreos x86_64
--
NAME=CoreOS
ID=coreos
VERSION=1032.1.0
VERSION_ID=1032.1.0
BUILD_ID=2016-05-05-1941
PRETTY_NAME="CoreOS 1032.1.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
--
systemd 229
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS -ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN

What did you do?

/opt/bin/rkt list --full
UUID                                    APP     IMAGE NAME                                      IMAGE ID                STATE   CREATED        STARTED                          NETWORKS
fc032abf-7622-4f18-b732-dc73f943c036    nginx   registry-1.docker.io/library/nginx:latest       sha512-d1a304bb63e8     running 2016-05-16 21:38:19 +0200 CEST  2016-05-16 21:38:19 +0200 CEST  rkt.kubernetes.io:ip4=10.1.38.2, default-restricted:ip4=172.16.28.2

 journalctl -m _MACHINE_ID=fc032abf76224f18b732dc73f943c036 -u nginx -a
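
Aside: the _MACHINE_ID value is just the pod UUID from rkt list with the dashes stripped. A minimal sketch of the mapping:

# The journal machine ID is the pod UUID without its dashes (bash):
uuid=fc032abf-7622-4f18-b732-dc73f943c036
echo "${uuid//-/}"    # -> fc032abf76224f18b732dc73f943c036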

What did you expect to see?
Logs from the pod, such as:

journalctl -u k8s_fc032abf-7622-4f18-b732-dc73f943c036
-- Logs begin at Sat 2016-05-14 11:01:39 CEST, end at Mon 2016-05-16 22:02:37 CEST. --
May 16 21:38:19 localhost systemd[1]: Stopped k8s_fc032abf-7622-4f18-b732-dc73f943c036.service.
May 16 21:38:19 localhost systemd[1]: Started k8s_fc032abf-7622-4f18-b732-dc73f943c036.service.
May 16 21:38:19 localhost rkt[25629]: networking: loading networks from /etc/rkt/net.d
May 16 21:38:19 localhost rkt[25629]: networking: loading network rkt.kubernetes.io with type flannel
May 16 21:38:19 localhost rkt[25629]: networking: loading network default-restricted with type ptp
May 16 21:52:17 localhost rkt[25629]: 10.1.38.1 - - [16/May/2016:19:52:17 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.1" "-"
May 16 21:52:20 localhost rkt[25629]: 10.1.38.1 - - [16/May/2016:19:52:20 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.1" "-"
May 16 21:52:21 localhost rkt[25629]: 10.1.38.1 - - [16/May/2016:19:52:21 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.1" "-"
May 16 21:52:22 localhost rkt[25629]: 10.1.38.1 - - [16/May/2016:19:52:22 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.1" "-"
May 16 21:52:22 localhost rkt[25629]: 10.1.38.1 - - [16/May/2016:19:52:22 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.1" "-"
May 16 21:52:23 localhost rkt[25629]: 10.1.38.1 - - [16/May/2016:19:52:23 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.47.1" "-"

What did you see instead?

An error was encountered while opening journal file /var/log/journal/271b753befe94448bdd8f18d393acc45, ignoring file.
-- No entries --

Additional info:

ls -l /var/log/journal/ | grep rkt
lrwxrwxrwx. 1 root                   systemd-journal         121 May 12 19:10 271b753befe94448bdd8f18d393acc45 -> /var/lib/rkt/pods/run/271b753b-efe9-4448-bdd8-f18d393acc45/stage1/rootfs/var/log/journal/271b753befe94448bdd8f18d393acc45
lrwxrwxrwx. 1 root                   systemd-journal         121 May 16 21:38 fc032abf76224f18b732dc73f943c036 -> /var/lib/rkt/pods/run/fc032abf-7622-4f18-b732-dc73f943c036/stage1/rootfs/var/log/journal/fc032abf76224f18b732dc73f943c036

ls -l /var/lib/rkt/pods/run/
total 8
drwxr-s---. 7 root rkt 4096 May 16 21:38 fc032abf-7622-4f18-b732-dc73f943c036

It looks like the container ID is messed up somewhere.
Additionally, rkt gc doesn't clean up the logs of this nonexistent container.
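
To confirm which journal symlink is stale, something like this should work (a sketch; journalctl ignores exactly these dangling entries):

# Find journal symlinks whose target no longer exists:
find /var/log/journal -maxdepth 1 -type l ! -exec test -e {} \; -print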

@pskrzyns
Contributor Author

cc @yifan-gu, @woodbor

@yifan-gu
Contributor

I don't know why it was opening /var/log/journal/271b753befe94448bdd8f18d393acc45 while you were running journalctl -m _MACHINE_ID=fc032abf76224f18b732dc73f943c036 -u nginx -a.

@pskrzyns
Contributor Author

Don't know. After I manually removed it, I still can't see the logs.

@yifan-gu
Contributor

cc @alban @iaguis

@jellonek
Contributor

Probably connected with #947
Maybe also connected with #1463

@iaguis
Member

iaguis commented May 17, 2016

I don't know why it was opening /var/log/journal/271b753befe94448bdd8f18d393acc45 while you were running journalctl -m _MACHINE_ID=fc032abf76224f18b732dc73f943c036 -u nginx -a.

journalctl -m gets all the logs and merges them together; then you can filter (e.g. by _MACHINE_ID).
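
To inspect a single pod's journal in isolation, something like this should work (a sketch, using plain journalctl --file instead of the merged view):

# Bypass the merged (-m) view and read one pod's journal file directly:
journalctl -u nginx --file=/var/lib/rkt/pods/run/fc032abf-7622-4f18-b732-dc73f943c036/stage1/rootfs/var/log/journal/fc032abf76224f18b732dc73f943c036/system.journal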

@iaguis
Member

iaguis commented May 17, 2016

@pskrzyns Does the target of the log link exist? Do you know if rkt was run with --interactive?

@pskrzyns
Contributor Author

@iaguis yes, the target existed:

lrwxrwxrwx. 1 root                   systemd-journal         121 May 16 21:38 fc032abf76224f18b732dc73f943c036 -> /var/lib/rkt/pods/run/fc032abf-7622-4f18-b732-dc73f943c036/stage1/rootfs/var/log/journal/fc032abf76224f18b732dc73f943c036

I'm 99% sure it wasn't run with --interactive, because the kubelet was creating the pod; I will check in a few minutes.

@pskrzyns
Contributor Author

It's not using --interactive:

/opt/bin/rkt cat-manifest a68e9b84
{"acVersion":"0.7.4+git","acKind":"PodManifest","apps":[{"name":"nginx","image":{"id":"sha512-d1a304bb63e81a43a04e676bb531c9cea239a2dbaa5ec6a2708f87a8e65ab709"},"app":{"exec":["nginx","-g","daemon off;"],"user":"0","group":"0","environment":[{"name":"NGINX_VERSION","value":"1.9.15-1~jessie"},{"name":"KUBERNETES_SERVICE_PORT_HTTPS","value":"443"},{"name":"KUBERNETES_PORT_443_TCP_PROTO","value":"tcp"},{"name":"KUBERNETES_PORT_443_TCP_ADDR","value":"10.2.0.1"},{"name":"KUBERNETES_SERVICE_PORT","value":"443"},{"name":"PATH","value":"/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"},{"name":"KUBERNETES_PORT","value":"tcp://10.2.0.1:443"},{"name":"KUBERNETES_PORT_443_TCP","value":"tcp://10.2.0.1:443"},{"name":"KUBERNETES_PORT_443_TCP_PORT","value":"443"},{"name":"KUBERNETES_SERVICE_HOST","value":"10.2.0.1"}],"mountPoints":[{"name":"termination-message-675a5b8e-1c46-11e6-a624-da5cfcca6b40","path":"/dev/termination-log"},{"name":"default-token-05w6t","path":"/var/run/secrets/kubernetes.io/serviceaccount","readOnly":true}],"ports":[{"name":"443-tcp","protocol":"tcp","port":443,"count":1,"socketActivated":false},{"name":"80-tcp","protocol":"tcp","port":80,"count":1,"socketActivated":false},{"name":"nginx-tcp-80","protocol":"TCP","port":80,"count":1,"socketActivated":false}]},"annotations":[{"name":"rkt.kubernetes.io/container-hash","value":"3678416530"},{"name":"rkt.kubernetes.io/termination-message-path","value":"/var/lib/kubelet/pods/6a20eb39-1c46-11e6-806f-001e6776a094/containers/nginx/675a5b8e-1c46-11e6-a624-da5cfcca6b40"}]}],"volumes":[{"name":"termination-message-675a5b8e-1c46-11e6-a624-da5cfcca6b40","kind":"host","source":"/var/lib/kubelet/pods/6a20eb39-1c46-11e6-806f-001e6776a094/containers/nginx/675a5b8e-1c46-11e6-a624-da5cfcca6b40"},{"name":"default-token-05w6t","kind":"host","source":"/var/lib/kubelet/pods/6a20eb39-1c46-11e6-806f-001e6776a094/volumes/kubernetes.io~secret/default-token-05w6t"}],"isolators":null,"annotations":[{"name":"rkt.kubernetes.io/managed-by-kubelet","value":"true"},{"name":"rkt.kubernetes.io/uid","value":"6a20eb39-1c46-11e6-806f-001e6776a094"},{"name":"rkt.kubernetes.io/name","value":"nginx"},{"name":"rkt.kubernetes.io/namespace","value":"default"},{"name":"rkt.kubernetes.io/restart-count","value":"0"}],"ports":null}
/opt/bin/rkt status a68e9b84
state=running
created=2016-05-17 17:45:40 +0200 CEST
started=2016-05-17 17:45:40 +0200 CEST
networks=rkt.kubernetes.io:ip4=10.1.38.3, default-restricted:ip4=172.16.28.3
pid=44029
exited=false
machinectl 
MACHINE                                  CLASS     SERVICE
rkt-a68e9b84-70a4-4ffb-8f9a-9bed58ed833a container rkt    

1 machines listed.
ls -l /var/log/journal/
total 24
drwxr-sr-x. 2 root                   systemd-journal        4096 May 17 17:51 98bd53d93a0443a5b7c5b9cba2737482
lrwxrwxrwx. 1 root                   systemd-journal         121 May 17 17:45 a68e9b8470a44ffb8f9a9bed58ed833a -> /var/lib/rkt/pods/run/a68e9b84-70a4-4ffb-8f9a-9bed58ed833a/stage1/rootfs/var/log/journal/a68e9b8470a44ffb8f9a9bed58ed833a
drwxr-sr-x. 2 systemd-journal-remote systemd-journal-remote 4096 May  5 21:56 remote
ls -l /var/lib/rkt/pods/run/a68e9b84-70a4-4ffb-8f9a-9bed58ed833a/stage1/rootfs/var/log/journal/a68e9b8470a44ffb8f9a9bed58ed833a/
total 8196
-rw-r-----. 1 root root 8388608 May 17 17:50 system.journa

@iaguis
Member

iaguis commented May 18, 2016

When I run rkt in a systemd unit file, my logs look like:

> journalctl -u run-r0fb934a16c79446198a54813cfba77b0.service
-- Logs begin at Sun 2016-01-10 03:53:46 CET, end at Wed 2016-05-18 17:53:46 CEST. --
May 18 17:53:42 locke-xps13 systemd[1]: Started /home/iaguis/work/coreos/go/src/github.com/coreos/rkt/build-rkt/bin/rkt run coreos.com/etcd:v2.0.10.
May 18 17:53:43 locke-xps13 rkt[9218]: image: using image from local store for image name coreos.com/rkt/stage1-coreos:1.6.0+git68f072c
May 18 17:53:43 locke-xps13 rkt[9218]: image: using image from local store for image name coreos.com/etcd:v2.0.10
May 18 17:53:43 locke-xps13 rkt[9218]: networking: loading networks from /etc/rkt/net.d
May 18 17:53:43 locke-xps13 rkt[9218]: networking: loading network default with type ptp
May 18 17:53:43 locke-xps13 rkt[9218]: [27913.931563] etcd[5]: 2016/05/18 15:53:43 etcd: no data-dir provided, using default data-dir ./default.etcd
May 18 17:53:43 locke-xps13 rkt[9218]: [27913.932052] etcd[5]: 2016/05/18 15:53:43 etcd: listening for peers on http://localhost:2380
May 18 17:53:43 locke-xps13 rkt[9218]: [27913.932380] etcd[5]: 2016/05/18 15:53:43 etcd: listening for peers on http://localhost:7001

In the logs you pasted before, you don't see the container process output after the rkt[...]: prefix. That's weird.

@pskrzyns
Contributor Author

@iaguis could you show me your unit file?

@iaguis
Member

iaguis commented May 19, 2016

$ systemctl cat rkt-test
# /run/systemd/system/rkt-test.service
[Unit]
Description=Test rkt
[Service]
ExecStart=/home/iaguis/work/coreos/go/src/github.com/coreos/rkt/build-rkt/bin/rkt run coreos.com/etcd:v2.0.10

@pskrzyns
Contributor Author

In my case it looks like:

Run command:[prepare --quiet --pod-manifest /tmp/manifest-nginx-825906943 --stage1-path /opt/rkt/stage1-coreos.aci]
# /run/systemd/system/k8s_ec54daa6-379f-4756-9b86-bae152de64c8.service
[Service]
ExecStart=/opt/bin/rkt --insecure-options=image,ondisk --local-config=/etc/rkt --system-config=/usr/lib/rkt --dir=/var/lib/rkt run-prepared --net=rkt.kubernetes.io --dns=10.2.0.10 --dns-search=default.svc.cluster.local --dns-search=svc.cluster.local --dns-search=cluster.local --dns-opt=ndots:5 --hostname=nginx ec54daa6-379f-4756-9b86-bae152de64c8
ExecStopPost=/usr/bin/touch /var/lib/kubelet/pods/1cf4cbbb-1daf-11e6-a069-001e6776a094/finished-ec54daa6-379f-4756-9b86-bae152de64c8
KillMode=mixed

@pskrzyns
Contributor Author

pskrzyns commented May 19, 2016

For etcd I have logs similar to yours:

journalctl -u k8s_3cf80863-e690-4b78-b06e-27b67196db09.service
-- Logs begin at Tue 2016-05-17 04:50:06 CEST, end at Thu 2016-05-19 13:04:50 CEST. --
May 19 13:02:27 localhost systemd[1]: Stopped k8s_3cf80863-e690-4b78-b06e-27b67196db09.service.
May 19 13:02:27 localhost systemd[1]: Started k8s_3cf80863-e690-4b78-b06e-27b67196db09.service.
May 19 13:02:27 localhost rkt[39834]: networking: loading networks from /etc/rkt/net.d
May 19 13:02:27 localhost rkt[39834]: networking: loading network rkt.kubernetes.io with type flannel
May 19 13:02:27 localhost rkt[39834]: networking: loading network default-restricted with type ptp
May 19 13:02:27 localhost rkt[39834]: [259175.378021] etcd[5]: 2016/05/19 11:02:27 etcd: no data-dir provided, using default 
May 19 13:02:27 localhost rkt[39834]: [259175.378262] etcd[5]: 2016/05/19 11:02:27 etcd: listening for peers on http://loc

and I can see the etcd logs:

kubectl logs etcd 
2016/05/19 11:02:27 etcd: no data-dir provided, using default data-dir ./default.etcd
2016/05/19 11:02:27 etcd: listening for peers on http://localhost:2380
2016/05/19 11:02:27 etcd: listening for peers on http://localhost:7001
2016/05/19 11:02:27 etcd: listening for client requests on http://localhost:2379
2016/05/19 11:02:27 etcd: listening for client requests on http://localhost:4001
2016/05/19 11:02:27 etcdserver: datadir is valid for the 2.0.1 format

@iaguis
Member

iaguis commented May 20, 2016

The issue is basically https://bugzilla.redhat.com/show_bug.cgi?id=1212756

The nginx image writes its logs to /var/log/nginx/access.log, which is a symlink to /dev/stdout. We run apps in a systemd unit and, if we don't pass --interactive, we set the StandardOutput option to journal+console. This means the standard output of the process is sent to both the journal and the system console, and everything works fine.

However, apps that write directly to /dev/stdout won't work with this configuration, because the kernel doesn't support opening sockets with the open() syscall (and the journal listens on a socket).

Since Docker images expect to be able to write to /dev/stdout, we added a workaround when converting Docker images to ACI where we symlink /dev/stdout to /dev/console. This way, images that write to /dev/stdout don't fail, but they write their messages to the system console, where the user can see their output.

Unfortunately, that also means the output doesn't go to the journal, which is the behavior we're seeing here.
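
The limitation can be reproduced outside rkt with something like this (a sketch; assumes systemd-run is available on the host):

# With StandardOutput=journal, fd 1 is an AF_UNIX socket to journald.
# Writing to the inherited fd works, but the redirection forces the shell
# to open("/dev/stdout"), which fails with ENXIO on a socket:
systemd-run -p StandardOutput=journal \
  sh -c 'echo "write to fd 1 works"; echo "open() fails" > /dev/stdout'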

Related: #1617

@iaguis
Member

iaguis commented May 20, 2016

Setting these values in /etc/nginx/nginx.conf in the image makes the error log go to both stderr and the journal, and the access log go to the journal only:

error_log stderr;

http {
    access_log syslog:server=unix:/dev/log main;
    [...]
}
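
To check that the access log then actually reaches the journal (a sketch; nginx tags its syslog messages with the nginx identifier by default):

journalctl -m SYSLOG_IDENTIFIER=nginx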

@alban
Member

alban commented May 20, 2016

Implementing an access_log stdout option in nginx and getting the Docker image to use it would be my preferred option (#1617 (comment)). I don't know how to fix this otherwise.

Alternatively, if we ever add an additional process for managing stdout/stderr as per #1799 and systemd/systemd#2069, that could fix this at the same time.

@pskrzyns
Contributor Author

@yifan-gu: do we have another method for log gathering in Kubernetes than the one currently implemented?
nginx was just an example of an app writing to stdout.

@yifan-gu
Contributor

@pskrzyns I don't think so, unless users redirect the logs to a volume themselves.
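
For reference, such a redirect could look like this (a sketch; assumes a host volume mounted at /mnt/logs inside the pod):

# Send the app's output to a file on a mounted volume instead of
# relying on /dev/stdout:
sh -c 'exec nginx -g "daemon off;" > /mnt/logs/nginx.log 2>&1'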

@philips
Contributor

philips commented May 25, 2016

@iaguis can't we fix up the nginx base images to do this? And we should write a doc explaining this.

@iaguis
Member

iaguis commented May 27, 2016

Let's try to figure this out for the next release.

@iaguis iaguis added this to the v1.8.0 milestone May 27, 2016
@iaguis iaguis removed this from the v1.7.0 milestone May 27, 2016
@borqosky
Contributor

@iaguis: I can update the docs, but what is the direction here:

  • are we going to just fix the docs and the image,
  • or plan some deeper research for a future release?

@lucab lucab changed the title Cant open pod logs (propably problem with ids) Cant open pod logs (probably problem with ids) Jun 9, 2016
@lucab lucab modified the milestones: v1.9.0, v1.8.0 Jun 9, 2016
@lucab
Member

lucab commented Jun 9, 2016

@woodbor I think documenting the logging behavior is a good start. Same for #2417. We have some not-a-bug behaviors that work as intended but could surprise users. As they are related, can you please tackle the two tickets together?

@iaguis
Member

iaguis commented Jun 9, 2016

@iaguis can't we fix up the nginx base images to do this? And we should write a doc explaining this.

I don't think there's a way to fix the image so that it keeps the current Docker behavior and also works on rkt, unless the access_log stdout option is implemented in upstream nginx.

@tmrts tmrts modified the milestones: v1.11.0, v1.9.0 Jun 23, 2016
@tmrts
Contributor

tmrts commented Jun 23, 2016

@woodbor what's the status of this PR? I'm bumping the milestone.

@pskrzyns
Contributor Author

@tmrts please assign it to me.

@iaguis
Member

iaguis commented Jul 20, 2016

Moving to v1+, please reassess

@iaguis iaguis modified the milestones: v1+, v1.11.0 Jul 20, 2016
pskrzyns pushed a commit to intelsdi-x/rkt that referenced this issue Aug 10, 2016
Description of known issues with logging and instructions how to handle
them.

related to rkt#2640 and rkt#2417
pskrzyns pushed a commit to intelsdi-x/rkt that referenced this issue Aug 30, 2016
Description of known issues with logging and instructions how to handle
them.

related to rkt#2640 and rkt#2417
antoni pushed a commit to intelsdi-x/rkt that referenced this issue Sep 9, 2016
Description of known issues with logging and instructions how to handle
them.

related to rkt#2640 and rkt#2417