Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRI: Clearly define the expectation of Summary API and CRI Stats #53514

Closed
Random-Liu opened this issue Oct 5, 2017 · 6 comments · Fixed by #54606
Closed

CRI: Clearly define the expectation of Summary API and CRI Stats #53514

Random-Liu opened this issue Oct 5, 2017 · 6 comments · Fixed by #54606
Labels
area/kubelet area/monitoring sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@Random-Liu
Copy link
Member

Random-Liu commented Oct 5, 2017

In Kubernetes 1.8, we added the support to let kubelet get container stats from CRI.

However, it's not clearly defined whether stats should be returned for non-running container.

Today's situation:

  1. Cadvisor integration: it removes stats for terminated containers (containers without cpu/memory usage) https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/stats/cadvisor_stats_provider.go#L93,
    and our summary api node e2e test makes the same assumption https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/summary_test.go#L51;
  2. CRI stats integration: it shows all container stats got from container runtime. It does include stats for terminated container if container runtime returns terminated container stats.

We should:

  1. Clearly define the behavior of Summary API in this case.
  2. If stats for terminated containers are not required, we should either update kubelet to filter them out, or define in CRI that CRI stats should not include stats with zero memory/cpu usage;
  3. If stats for terminated containers are required, we should update Kubelet to properly consume it.

For now, I'll just skip stats with zero memory or cpu usage in cri-containerd.

@timstclair @yujuhong @yguo0905 @dashpole @feiskyer
/cc @kubernetes/sig-node-api-reviews

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Oct 5, 2017
@Random-Liu Random-Liu added area/kubelet area/monitoring and removed kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Oct 5, 2017
@yguo0905
Copy link
Contributor

yguo0905 commented Oct 6, 2017

To be precise, we only exclude the duplicated terminated container in the cadvisor case.

// A ContainerInfo is considered to be of a terminated container if it has an
// older CreationTime and zero CPU instantaneous and memory RSS usage.

@Random-Liu
Copy link
Member Author

@yguo0905 If it only excludes the duplicated terminated container, I feel like this logic should be in kubelet instead of container runtime.

Container runtime should just return stats for all containers.

@yguo0905
Copy link
Contributor

yguo0905 commented Oct 6, 2017

Yeah, this logic was added only for fixing the issue #47853, where cadvisor reported metrics of uncleaned container cgroups.

@yujuhong
Copy link
Contributor

yujuhong commented Oct 9, 2017

I think providing stats for all containers are good. kubelet can filter them out if necessary. cadvisor simply doesn't support that since it only tracks cgroups (for running containers).

@yguo0905 If it only excludes the duplicated terminated container, I feel like this logic should be in kubelet instead of container runtime.

Container runtime should just return stats for all containers.

Excluding duplicated terminated containers is just a workaround for bugs in the existing integration. It was hard to fix for cadvisor+docker because the concept of container in the two components are different.
I don't expect we see duplicated from runtimes integrated via CRI. If we did, it'd be a bug that we need to fix in the specific CRI shim.

@Random-Liu
Copy link
Member Author

@yujuhong Thanks. Then based on the above discussion I'll close containerd/cri#330 and update kubelet instead. :)

Thanks!

@Random-Liu
Copy link
Member Author

I don't expect we see duplicated from runtimes integrated via CRI. If we did, it'd be a bug that we need to fix in the specific CRI shim.

What do you mean "duplicated"? Say we have a container which has been restarted once.
Currently, CRI stats will return stats for both - the newly created one and the terminated one.
Is this duplication in our opinion? /cc @yguo0905

k8s-github-robot pushed a commit that referenced this issue Nov 18, 2017
…-stats

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Filter out duplicated container stats

**What this PR does / why we need it**:

**Which issue this PR fixes** *
fixes #53514

**Special notes for your reviewer**:

/cc @Random-Liu 

Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kubelet area/monitoring sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants