
Improve and/or document an easy way to collect information about a Kubernetes pod. #1473

Open
jcvenegas opened this issue Feb 26, 2021 · 1 comment
Labels: area/debug (Debug support and debuggability), enhancement (Improvement to an existing feature)

jcvenegas commented Feb 26, 2021

Debugging Kata in an environment like K8s increases the complexity of finding good initial data to identify what is broken. Today, Kata is working on solutions to improve its observability:

  1. Tracing: from the runtime/shim to the agent
  2. Logging: Kata today provides different log levels that go to syslog.
    Included: runtime/shim/hypervisor

While tracing may be helpful in some cases, especially to identify where the time goes in Kata, logging is the more general starting point.

I would like to have a new script, similar to the Kata collect script, but for a Pod.

That is, given a K8s pod, the script gets information about it:

./get-pod-logs.sh

  1. Find the associated Kata container IDs for that pod
  2. Get the chronologically ordered logs for the pod (running on the host)

The general idea would be something like:

journalctl -t kata -t kata-vmm -t virtiofsd | grep kata-pod-id

so the output is chronologically ordered logs:
from the initial container
VMM logs
VMM tty logs (so agent logs as well)
virtiofsd logs
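A minimal sketch of such a script, assuming crictl is available on the host and that the Kata components log to the journal under the syslog identifiers used above (the script name and identifiers come from the idea above, not from an existing tool):

#!/usr/bin/env bash
# Hypothetical get-pod-logs.sh: collect host-side Kata logs for one K8s pod.
set -euo pipefail

pod_id="$1"

# 1. Find the Kata container IDs associated with the pod (assumes crictl
#    is configured to talk to the right CRI runtime).
container_ids=$(crictl ps -a -q --pod "$pod_id")

# 2. Build a pattern matching the pod ID or any of its container IDs.
pattern="$pod_id"
for cid in $container_ids; do
    pattern="$pattern|$cid"
done

# 3. journalctl already interleaves the sources in chronological order,
#    so a single grep yields the ordered per-pod log.
journalctl -t kata -t kata-vmm -t virtiofsd | grep -E "$pattern"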

Solutions:

In all the solutions, the script first tracks from the k8s pod ID to the Kata container IDs in order to get the Kata information. Then it just checks the journal for the output logs of the containers and the assets associated with them.

1) The script searches for the PIDs of the processes

Once it has the Kata container IDs, do:

shim_pid=$(ps aux | grep shimv2 | grep containerd | filter_ps_pid)
vmm_pid=$(get_childs "$shim_pid" | grep vmm | filter_ps_pid)
virtiofsd_pid=$(get_childs "$shim_pid" | grep virtiofsd | filter_ps_pid)

then get the logs:

journalctl -t kata -t virtiofsd -t kata-vmm | grep -E "kata\[$shim_pid\]|virtiofsd\[$virtiofsd_pid\]|kata-vmm\[$vmm_pid\]"
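For reference, get_childs and filter_ps_pid above are hypothetical helpers; a minimal sketch of them (using standard procps options) could be:

# Print "PID COMMAND" for every direct child of the given PID.
get_childs() {
    ps --ppid "$1" -o pid=,args=
}

# Print the first all-digit field of each line: the PID column in both
# "ps aux" output and the get_childs output above. Note that the
# "ps aux | grep ..." pipeline may also match the grep processes
# themselves; pgrep -f would be a more robust alternative.
filter_ps_pid() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { print $i; break } }'
}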

Pros

Not many modifications to the Kata stack.

Cons

If the container process or some of the components are gone, it may be more difficult to find the PIDs.
In the case of virtiofsd, the main process is not the one that produces the logs; the logs come from a child of that process. This is implementation specific, and there is no guarantee it will stay that way in the future.

2) The Kata runtime provides information about the PIDs of:

shim, vmm, virtiofsd (main process and fork)

kata-runtime kata-info <container-id>
{
  "id": "<id>",
  "shim_pid": <pid>,
  "virtiofsd_pid": <pid>,
  "vmm_pid": <pid>,
  "pod_id": "<id>",
  "is_pod": true|false
}
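Assuming such a subcommand existed and emitted valid JSON, a collection script could consume it with jq, for example:

# Hypothetical: the subcommand and field names follow the sketch above.
cid="<container-id>"
shim_pid=$(kata-runtime kata-info "$cid" | jq -r '.shim_pid')
vmm_pid=$(kata-runtime kata-info "$cid" | jq -r '.vmm_pid')
virtiofsd_pid=$(kata-runtime kata-info "$cid" | jq -r '.virtiofsd_pid')

# journald prints entries as "identifier[pid]: message", so the saved PIDs
# still select the right lines even after the processes are gone.
journalctl -t kata -t kata-vmm -t virtiofsd \
    | grep -E "\[($shim_pid|$vmm_pid|$virtiofsd_pid)\]"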

Pros

A cleaner interface to query Kata-specific information.
If some of the processes associated with the container are gone, we still have the PIDs and can still filter the logs by PID.

Cons

Filtering virtiofsd logs by PID is still not enough, as the real logs come from a forked process.

Other changes for external components

For some external components it would be nice if they added extra metadata provided by Kata.

virtiofsd:

virtiofsd -o debug_prefix "container-id", so that regardless of the PID or the internal implementation of virtiofsd we can just filter the data with:

journalctl -t virtiofsd | grep container-id

Or, instead of asking virtiofsd to log to syslog, we could use a systemd-cat redirection and save the PID of systemd-cat (as we do for VMMs).
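For example (systemd-cat and its -t flag are real; the virtiofsd arguments are placeholders):

# Run virtiofsd under systemd-cat so all of its output, including the
# forked child's, lands in the journal under a fixed identifier; the PID
# to filter on is then the systemd-cat PID saved here.
systemd-cat -t virtiofsd virtiofsd <virtiofsd-args> &
systemd_cat_pid=$!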

@dagrh

For other components it may be a nice-to-have, but today log collection for components like cloud-hypervisor is good enough, as their output is sent to syslog via systemd-cat, so the PID to filter on is the systemd-cat PID.

\cc @egernst as he has some interest in getting better debug/observability
\cc @jodh-intel who is working on tracing and wrote the kata collect script
\cc @cmaf who is working on tracing
\cc @fidencio @chavafg @GabyCT who may have faced some integration issues in the past.

Please ping anyone else who may be interested in getting a nice sequence of events to debug specific Kata issues using k8s (which is the main way Kata is used today).
@bergwolf in case he has a nice way to track this when debugging complex Kata issues.

jcvenegas added the enhancement (Improvement to an existing feature), needs-review (Needs to be assessed by the team) and area/debug (Debug support and debuggability) labels Feb 26, 2021

fidencio commented Mar 2, 2021

@jcvenegas, just a few thoughts here.

We should also get the CRI-O and containerd logs, as when using shimv2 a whole bunch of Kata info ends up being logged in the CRI-O / containerd journal.

One thing we have to keep in mind is that even to get kata-containers logs we do what I see as a not-so-elegant hack on the CRI-O (and I guess also the containerd) side.
Take a look here:
https://github.com/cri-o/cri-o/blob/9ce5dd6bb2f98d1bfbc273fa53a9f6397bb84a47/internal/oci/runtime_vm.go#L188-L191

There we check whether CRI-O's logLevel is set to debug, and if so, we pass -debug to the containerd-shim-kata-v2 process.

Isn't this weird? In my mind, we'd always pass "-debug" to the containerd-shim-kata-v2 process, and then set the logLevel we need in Kata Containers' configuration file. Does this sound reasonable?

Then we'd need to change Kata Containers' configuration file to actually carry such information and to allow us to distinguish between all the different log levels, as nowadays we have "true|false" and nothing else, which ends up being waaaaay too verbose (or not verbose at all).
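For illustration only, such a knob might look like this in the configuration file (the option name is hypothetical; today there are only boolean enable_debug switches):

[runtime]
# instead of: enable_debug = true|false
log_level = "info"   # one of: error, warning, info, debug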

Dunno, this is what @marcel-apf started working on some time ago, and we didn't reach an agreement. I guess this is really worth revisiting, to try to get some things moving in order to improve the debuggability of the project.

Sorry if I deviated too much here, @jcvenegas. /o\

ariel-adam removed the needs-review (Needs to be assessed by the team) label Mar 2, 2021
ariel-adam moved this from To do to area CI/testing/debug in Issue backlog Mar 9, 2021