New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRI: Support CRI log stats #55905

Closed
Random-Liu opened this Issue Nov 16, 2017 · 8 comments

Comments

Projects
None yet
8 participants
@Random-Liu
Member

Random-Liu commented Nov 16, 2017

Who is using container log stats?

  1. Summary API provides container log disk usage. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/stats/v1alpha1/types.go#L121
  2. Ephemeral storage based eviction is using container log disk usage.

Who is collecting container log stats?

Cadvisor is collecting per-container log stats for docker today. See https://github.com/google/cadvisor/blob/master/container/docker/handler.go#L182.
However, we could not do that for CRI container log, because all CRI container logs are in a same pod directory, we don't have a per-container directory for cadvisor to collect container log stats.

As a temporary work around, we could point cadvisor to the container log file, as what cri-o is doing today. https://github.com/google/cadvisor/blob/master/container/crio/handler.go#L131-L138
However, this won't work after we add log rotation for CRI container logs.

Things need discussion

  1. Do we need per-container log stats in core metrics?
  • Option 1: Still support per-container log stats in core metrics. Then we should figure out a way to collect per-container log stats for CRI container runtime. Should we create a sub directory for each container in pod log directory?
  • Option 2: Only support per-pod log stats in core metrics. This is more straight forward based on today's CRI log directory format. However, we still need to resolve:
    • How does separate monitoring pipeline get per-container log stats?
    • How to deal with the regression / behavior change introduced by this change? (summary api, eviction)
  1. Who should collect container log stats in the future?
  • Option 1: Cadvisor + Container runtime? It seems redundant to me that CRI container runtimes still need to integrate with cadvisor independently, when their log path and format are the same.
  • Option 2: Kubelet + Cadvisor? Kubelet could call cadvisor library and get CRI log stats. Or kubelet could configure cadvisor properly to let cadvisor collect CRI log stats.
    If it's still cadvisor, should different container runtime integrate with cadvisor

@dchen1107 @yujuhong @dashpole @jingxu97 @abhi @mrunalp
@kubernetes/sig-node-feature-requests @kubernetes/sig-instrumentation-misc

@abhi

This comment has been minimized.

Member

abhi commented Jan 24, 2018

Today In sig-node https://docs.google.com/presentation/d/1BKbTa7RBVMTjZlk_6CV5SZfV4fen3bzk5PoqiXYIUK4/edit#slide=id.g3262a28663_0_0 was discussed. With the follow 4 options:

Option 1: Point Cadvisor to the container log file to monitor container stats.
Pros: Easier implementation.
Cons: With log rotation feature coming soon , this will only be a temporary workaround. How do we handle log rotation in this case ?

Option 2: Create a subdirectory per container and kubelet can inform cadvisor to monitor the the subdirectory to collect container stats.
Pros: Simpler to support log rotation.
Cons: All CRI runtimes will need to integrate with cadvisor integration to monitor the container log stats and support log rotation as well.This will break CRI backward compatibility and how long do we plan to support Cadvisor integration ?

Option 3: Container runtimes to support per container log stat collection
Pros: Cadvisor dependency may or may not be used. Upto the runtime to decide.
Cons: May need CRI stats API changes. Container runtime still doesnt know how the log is rotated as well. Redundant work.

Option 4: Let kubelet monitor container log stats
Pros: Container log directory may not be needed. Kubelet has information regarding log rotation based on current design.

Based on the discussion Option 2 and 4 are considered candidate. Since Option 2 will still per runt time change, we are looking at pursuing Option 4.
cc @Random-Liu @mikebrow

Any thoughts ? @mrunalp @runcom

@yujuhong yujuhong modified the milestones: next-candidate, v1.10 Jan 24, 2018

@brancz

This comment has been minimized.

Member

brancz commented Jan 24, 2018

I understand that these statistics are interesting to know, but I'm wondering if they need to be part of core/resource metrics at all. It feels to me this falls under normal host monitoring. I'm not sure I understand why this would need to be exposed by cAdvisor or CRI.

@yujuhong

This comment has been minimized.

Contributor

yujuhong commented Jan 24, 2018

I understand that these statistics are interesting to know, but I'm wondering if they need to be part of core/resource metrics at all. It feels to me this falls under normal host monitoring. I'm not sure I understand why this would need to be exposed by cAdvisor or CRI.

I don't want to get into the argument of what's core and the difference between core and Summary API, so I'll set that aside :-)
kubelet still needs this to enforce disk policy and eviction, and it can't do that without the log stats.

@Random-Liu

This comment has been minimized.

Member

Random-Liu commented Jan 25, 2018

Some clarification:

  • For option 2. It's not possible that "kubelet can inform cadvisor to monitor the the subdirectory to collect container stats". Today Kubelet is talking to cadvisor via cadvisor API, and cadvisor API doesn't support any configuration. It means that if we go with option 2, different container runtime still need to integrate with cadvisor, and let cadvisor provide container log stats to Kubelet.
  • For option 4. Kubelet is already monitoring disk stats for volumes, and we have existing util functions for that, such as https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/metrics_du.go. We can directly reuse them.

Given so, I also agree that option 4 seems to be a better option now.

@mrunalp

This comment has been minimized.

Contributor

mrunalp commented Jan 25, 2018

👍 to option 4. Thanks!

@mikebrow

This comment has been minimized.

Member

mikebrow commented Jan 26, 2018

Option 4 seems the best solution.

@jberkus

This comment has been minimized.

jberkus commented Feb 3, 2018

Hello, if this is planned for 1.10, please do two things: 1) add a priority/ label 2) indicate which Feature issue it's related to. Thanks!

abhi added a commit to abhi/kubernetes that referenced this issue Feb 15, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of
kubernetes#55905
The change includes change the log path from
/var/pod/<pod uid>/containername_attempt.log to
/var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>

abhi added a commit to abhi/kubernetes that referenced this issue Feb 15, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of
kubernetes#55905
The change includes change the log path from
/var/pod/<pod uid>/containername_attempt.log to
/var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>

abhi added a commit to abhi/kubernetes that referenced this issue Feb 15, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of
kubernetes#55905
The change includes change the log path from
/var/pod/<pod uid>/containername_attempt.log to
/var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>

abhi added a commit to abhi/kubernetes that referenced this issue Feb 15, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of
kubernetes#55905
The change includes change the log path from
/var/pod/<pod uid>/containername_attempt.log to
/var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>

abhi added a commit to abhi/kubernetes that referenced this issue Feb 17, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of kubernetes#55905. The change includes
change of the log path from /var/pod/<pod uid>/containername_attempt.log
to /var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>

abhi added a commit to abhi/kubernetes that referenced this issue Feb 17, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of kubernetes#55905. The change includes
change of the log path from /var/pod/<pod uid>/containername_attempt.log
to /var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>

abhi added a commit to abhi/kubernetes that referenced this issue Feb 21, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of kubernetes#55905. The change includes
change of the log path from /var/pod/<pod uid>/containername_attempt.log
to /var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>
@jberkus

This comment has been minimized.

jberkus commented Feb 21, 2018

Reminder: code freeze is in 5 days. Please add approved-for-milestone if this feature is staying in the milestone. Thanks!

k8s-merge-robot added a commit that referenced this issue Feb 22, 2018

Merge pull request #59906 from abhi/log_stats
Automatic merge from submit-queue (batch tested with PRs 54191, 59374, 59824, 55032, 59906). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Adding per container stats for CRI runtimes

**What this PR does / why we need it**

This commit aims to collect per container log stats. The change was proposed as a part of #55905. The change includes change the log path from /var/pod/<pod uid>/containername_attempt.log to /var/pod/<pod uid>/containername/containername_attempt.log. The logs are collected by reusing volume package to collect metrics from the log path.
Fixes #55905

**Special notes for your reviewer:**
cc @Random-Liu

**Release note:**

```
Adding container log stats for CRI runtimes.
```

dmlambea added a commit to dmlambea/kubernetes that referenced this issue Feb 22, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of kubernetes#55905. The change includes
change of the log path from /var/pod/<pod uid>/containername_attempt.log
to /var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>

jingxu97 added a commit to jingxu97/kubernetes that referenced this issue Mar 13, 2018

Adding per container stats for CRI runtimes
This commit aims to collect per container log stats. The
change was proposed as a part of kubernetes#55905. The change includes
change of the log path from /var/pod/<pod uid>/containername_attempt.log
to /var/pod/<pod uid>/containername/containername_attempt.log.
The logs are collected by reusing volume package to collect
metrics from the log path.

Signed-off-by: abhi <abhi@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment