Integration with cAdvisor #160

Closed
monnand opened this issue Jun 18, 2014 · 14 comments
Labels
area/introspection area/isolation sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@monnand
Contributor

monnand commented Jun 18, 2014

cAdvisor already collects some basic statistical information about the containers running on a machine. We currently hold the most recent stats in memory, but we have a framework for dumping stats into a backend storage. google/cadvisor#39 has a discussion about supporting InfluxDB, and I think @erikh is working on that now.

I think the information collected by cAdvisor may be useful for Kubernetes. I'm currently considering adding some code to the kubelet so that it can pull information from cAdvisor periodically.

Before getting started, I would like to discuss the approach we should take and other issues.

Currently, cAdvisor collects stats and stores them in memory. It only remembers recent stats and provides resource usage percentiles (currently only for CPU and memory). I think the resource usage percentiles would be useful for the Kubernetes master to do scheduling.

There are two possible ways to retrieve such information from cAdvisor:

Solution 1: the kubelet pulls information from cAdvisor through its REST API and exposes another REST API for the master. The master periodically checks the containers' stats through the kubelet's REST API. In this case, all information is sent over REST. Currently, the kubelet communicates with the master through etcd only, so this approach adds one more communication channel between the kubelet and the master.
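
A minimal sketch of what Solution 1 could look like on the kubelet side, assuming cAdvisor listens on localhost:8080; the ports and endpoint paths here are placeholders, not the real APIs:

    package main

    import (
        "io"
        "log"
        "net/http"
    )

    // statsHandler forwards a container-stats request to the local cAdvisor
    // instance and relays the JSON response to the caller (e.g. the master).
    // The URL paths are illustrative placeholders.
    func statsHandler(w http.ResponseWriter, r *http.Request) {
        containerName := r.URL.Query().Get("container")

        resp, err := http.Get("http://localhost:8080/api/v1.0/containers/" + containerName)
        if err != nil {
            http.Error(w, "cAdvisor unreachable: "+err.Error(), http.StatusBadGateway)
            return
        }
        defer resp.Body.Close()

        w.Header().Set("Content-Type", "application/json")
        io.Copy(w, resp.Body) // relay cAdvisor's JSON stats as-is
    }

    func main() {
        // The master would poll this endpoint periodically over the new channel.
        http.HandleFunc("/containerStats", statsHandler)
        log.Fatal(http.ListenAndServe(":10250", nil))
    }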

Solution 2: the percentile information (e.g. the 90th percentile memory usage of a container) is not too big and may be small enough to fit into etcd. We could let cAdvisor update the containers' information in etcd and let the master retrieve it whenever it wants. Or we could let the kubelet pull such information from cAdvisor and update the corresponding etcd key. In both cases, the communication goes through etcd. The disadvantage of this approach is that the information the master needs may grow into messages too big to put into etcd.
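
And a rough sketch of Solution 2, assuming the coreos/go-etcd client of that era; the key layout and the summary fields are made up purely for illustration:

    package main

    import (
        "encoding/json"
        "log"

        "github.com/coreos/go-etcd/etcd"
    )

    // UsagePercentiles is an illustrative summary small enough to fit in etcd;
    // the field names are hypothetical, not an existing cAdvisor type.
    type UsagePercentiles struct {
        CPU90thPct    float64 `json:"cpu90thPct"`    // 90th percentile CPU usage (cores)
        Memory90thPct uint64  `json:"memory90thPct"` // 90th percentile memory usage (bytes)
    }

    func main() {
        client := etcd.NewClient([]string{"http://127.0.0.1:4001"})

        summary := UsagePercentiles{CPU90thPct: 0.42, Memory90thPct: 128 << 20}
        value, err := json.Marshal(summary)
        if err != nil {
            log.Fatal(err)
        }

        // Write the summary under an illustrative per-container key with a TTL,
        // so stale entries expire if the writer stops refreshing them.
        if _, err := client.Set("/stats/containers/mycontainer", string(value), 60); err != nil {
            log.Fatal(err)
        }
    }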

I would like to see other approaches, or discussion of the ones proposed here.

@brendandburns
Contributor

I think I prefer option 1. There is already a communication channel from the master to the kubelet (see kubelet_server.go), and I think we can merge the info in there.

Let me know if you want/need more details.

Let's not store this in etcd for now, I don't see too much utility in storing this kind of transient info in etcd.

@monnand
Contributor Author

monnand commented Jun 18, 2014

@brendandburns Oh, I didn't notice that the kubelet already exposes a REST API. OK, I can work on it this week. Thank you @brendandburns.

@proppy
Contributor

proppy commented Jun 27, 2014

It'd be great if the scheduler also leveraged this data for pod placement; should I file a separate bug?

@vmarmol
Contributor

vmarmol commented Jun 27, 2014

I believe that is planned, but we can file an issue to track it.

@monnand
Contributor Author

monnand commented Jul 7, 2014

Currently, #328 and #174 let the kubelet retrieve some basic stats from cAdvisor. On the cAdvisor side, google/cadvisor#74 can put all stats into InfluxDB.

If we want to display the data, we could write a separate UI program that retrieves stats from the backend storage (currently only InfluxDB) either directly (by reading InfluxDB) or indirectly (by retrieving them from cAdvisor). Or we could put this feature into the apiserver.
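
The "indirect" route could be as small as an HTTP GET against cAdvisor; a minimal sketch, assuming cAdvisor is on localhost:8080 and treating the endpoint path and container name as placeholders:

    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // Ask cAdvisor for one container's stats; the path is a placeholder
        // for whatever the cAdvisor REST API actually exposes.
        resp, err := http.Get("http://localhost:8080/api/v1.0/containers/mycontainer")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // Decode generically and print; a real UI would render this instead.
        var stats map[string]interface{}
        if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%+v\n", stats)
    }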

@timothysc
Member

Why can't the kubelet just regularly send an 'nvp thunk' including cAdvisor data to a store? The scheduler can then use well-established variables for the initial algorithm, then apply extra discrimination via constraints.

@monnand
Contributor Author

monnand commented Jul 9, 2014

@timothysc cAdvisor can now dump all stats into InfluxDB. The kubelet retrieves a summary of those stats from cAdvisor, which is in turn pulled by the master to make scheduling decisions. I think that's similar to what you described?

@brendandburns brendandburns added this to the v0.5 milestone Sep 9, 2014
@bgrant0607 bgrant0607 added area/introspection sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. area/isolation labels Oct 4, 2014
@dchen1107 dchen1107 assigned dchen1107 and unassigned vmarmol Oct 7, 2014
@dchen1107
Member

Re-assigning this back to me since @vmarmol is out this week. I will update the status by the end of today, then decide the next step.

@dchen1107
Member

Here is a quick summary of the status:

  1. cAdvisor is running on all minions by default.

Status: Done.

  2. The kubelet does have a REST API through which the master can retrieve per-container stats.

Status: the API is there and the implementation is done, but I failed to retrieve any container stats due to "Internal Error: couldn't find container". I believe this is just a bug. Will fix next.

  3. The kubelet does have a REST API through which the master can retrieve per-pod stats.

Status: the API is there, but not implemented yet. I think I can add simple support easily by returning the stats of all containers within a pod (a rough sketch of that aggregation follows this list).

  4. Currently the master never retrieves or consumes those stats; there isn't even a REST API for the apiserver to get them. The closest thing we have is a TODO I added recently:
    type ContainerStatus struct {
        State        ContainerState `json:"state,omitempty" yaml:"state,omitempty"`
        RestartCount int            `json:"restartCount" yaml:"restartCount"`
        ...
        // TODO(dchen1107): Once we have done with integration with cadvisor, resource
        // usage should be included.
    }

    // PodInfo contains one entry for every container with available info.
    type PodInfo map[string]ContainerStatus

But I guess this belongs in a separate issue, especially the downward API? @bgrant0607

  5. On the stats side, currently only CPU stats are available. Issues "Enable memory cgroup on GCE VMs" #1548 and "Using latest containervm image when bringing up a Kubernetes cluster" #738 were filed.
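
For item 3, the "stats of all containers within a pod" approach could be a simple fan-out over the existing per-container lookup; a sketch with hypothetical types and a placeholder lookup function:

    package main

    import "fmt"

    // ContainerStats is an illustrative stand-in for whatever stats type the
    // kubelet gets back from cAdvisor; the fields are hypothetical.
    type ContainerStats struct {
        CPUUsageCores float64
        MemoryBytes   uint64
    }

    // getContainerStats is a placeholder for the per-container stats lookup
    // the kubelet already exposes.
    func getContainerStats(containerName string) ContainerStats {
        return ContainerStats{CPUUsageCores: 0.1, MemoryBytes: 64 << 20}
    }

    // podStats returns the stats of every container in the pod, keyed by
    // container name.
    func podStats(containerNames []string) map[string]ContainerStats {
        stats := make(map[string]ContainerStats, len(containerNames))
        for _, name := range containerNames {
            stats[name] = getContainerStats(name)
        }
        return stats
    }

    func main() {
        fmt.Printf("%+v\n", podStats([]string{"web", "sidecar"}))
    }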

@dchen1107
Member

Sent too soon; there is more to update:

  6. Machine capacity discovery
    Status: The kubelet has a REST API to retrieve the machine capacity from cAdvisor, but the master doesn't retrieve those stats from the kubelet. This also belongs to the downward API.

  7. Root stats
    Status: The kubelet has a REST API to retrieve root usage stats from cAdvisor, but there is no way to propagate them to the master either. Root stats should include the usage of kernel threads, the kubelet, and other daemons that today are not running in a docker container. Those stats should be reported back to the master for proper scheduling.
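
To illustrate why the master needs both 6) and 7): the capacity usable for pods is roughly the machine capacity minus what the non-containerized daemons are already consuming. A sketch with hypothetical types:

    package main

    import "fmt"

    // MachineCapacity and RootUsage are illustrative stand-ins for what the
    // kubelet could report from cAdvisor for 6) and 7); the fields are hypothetical.
    type MachineCapacity struct {
        Cores       int
        MemoryBytes uint64
    }

    type RootUsage struct {
        CPUCores    float64
        MemoryBytes uint64
    }

    // schedulableMemory subtracts the root (kernel threads, kubelet, other
    // daemons) usage from the machine capacity.
    func schedulableMemory(machine MachineCapacity, root RootUsage) uint64 {
        if root.MemoryBytes >= machine.MemoryBytes {
            return 0
        }
        return machine.MemoryBytes - root.MemoryBytes
    }

    func main() {
        machine := MachineCapacity{Cores: 4, MemoryBytes: 8 << 30}
        root := RootUsage{CPUCores: 0.3, MemoryBytes: 512 << 20}
        fmt.Println("schedulable memory bytes:", schedulableMemory(machine, root))
    }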

cc/ @bgrant0607 @thockin @brendandburns @smarterclayton
For the 0.5 milestone, I think we should only fix 2) and maybe 3), then fix 4) and 7) separately along with v1beta3 support. 5) should be fixed through #738, since we want that available to track instance hours and other metrics anyway.

@thockin
Member

thockin commented Oct 7, 2014

We'll need (6) for scheduling, won't we?


@dchen1107
Member

@thockin, we do need 6) for scheduling, and it should be fixed together with 4) and 7) along with v1beta3 support. Thanks for pointing it out.

Also, I will file a separate issue to discuss whether to include the namespace in the kubelet's REST API.

dchen1107 added a commit to dchen1107/kubernetes-1 that referenced this issue Oct 8, 2014
I believe the issue was first introduced when we appended the namespace to the docker container's name. This is a temporary fix for kubernetes#160.
@bgrant0607
Member

Terminology-wise the "downward API" relates to APIs exposed within containers and to the container environment (filesystem, env. vars., etc.). A downward API is needed, but that's a separate discussion, happening on a Docker issue right now.

For the v0.5 milestone, bug fixes seem sufficient.

v1beta3 is on v0.7. Once we have that, the Kubelet and apiserver APIs will be distinct, which will give us the freedom to decide what to expose via the general apiserver /pods API. Resource data changes continuously and will become more and more voluminous, with more resources represented (e.g., I/O), histograms, avg/max, load/latency, effective soft/hard limits, etc. We won't want to expose all of that via /pods.

I'm ok with the scheduling-related bits being moved to milestone v0.8, which is where I put the other scheduling and isolation issues.

@dchen1107
Member

Based on the above discussion: the fix is in, the v1beta3-related work is already on v0.7, and the scheduling-related work is already on v0.8. I plan to make the ContainerVM image the default for GCE in v0.7, which automatically resolves the issue of enabling the memory cgroup. Closing the issue.

xingzhou pushed a commit to xingzhou/kubernetes that referenced this issue Dec 15, 2016
feiskyer added a commit to feiskyer/kubernetes that referenced this issue Feb 13, 2017
seans3 pushed a commit to seans3/kubernetes that referenced this issue Apr 10, 2019
marun added a commit to marun/kubernetes that referenced this issue Jun 26, 2020
b3atlesfan pushed a commit to b3atlesfan/kubernetes that referenced this issue Feb 5, 2021
pjh pushed a commit to pjh/kubernetes that referenced this issue Jan 31, 2022
linxiulei pushed a commit to linxiulei/kubernetes that referenced this issue Jan 18, 2024
Use debian-base image from kubernetes repository as base for NPD.