Measure kubernetes addons resource usage #10335

Closed
dchen1107 opened this Issue Jun 25, 2015 · 15 comments

@dchen1107 (Member) commented Jun 25, 2015

In #5880, we measured the resource usage of Heapster and the other monitoring containers in a 100-node cluster, with and without load. Based on those measurements, the containers' manifests (#10260, #10334) were updated with proper resource limits. But we don't have measurements for DNS and Kibana yet.

cc/ @davidopp @brendandburns
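
For anyone reproducing these numbers, per-container memory can be sampled with a simple cgroup poller. The sketch below is illustrative only and assumes a cgroup v1 memory controller with Docker-managed cgroups under /sys/fs/cgroup/memory/docker/; it is not necessarily how the measurements in this thread were collected (cAdvisor/Heapster expose the same per-container figures).

```python
#!/usr/bin/env python3
"""Minimal memory-usage poller (sketch). Assumes cgroup v1 with the memory
controller mounted at /sys/fs/cgroup/memory and Docker-managed cgroups;
adjust the path for your cgroup driver. The container ID is a placeholder."""
import sys
import time

def read_usage_mb(container_id):
    # memory.usage_in_bytes is the current memory charged to the container's cgroup.
    path = "/sys/fs/cgroup/memory/docker/%s/memory.usage_in_bytes" % container_id
    with open(path) as f:
        return int(f.read().strip()) / (1024.0 * 1024.0)

def poll(container_id, interval_s=60, duration_s=24 * 3600):
    # One sample per minute for 24 hours, matching the test durations below.
    end = time.time() + duration_s
    while time.time() < end:
        print("%s %.1f MB" % (time.strftime("%Y-%m-%dT%H:%M:%S"),
                              read_usage_mb(container_id)))
        time.sleep(interval_s)

if __name__ == "__main__":
    poll(sys.argv[1])  # pass the Docker container ID of the addon container
```

Plotting those samples gives charts like the ones attached in the comments below.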

@dchen1107 (Member) commented Jun 25, 2015

cc/ @satnam6502 on bringing up 100-node scalability tests.

@wojtek-t (Member) commented Jun 25, 2015

@dchen1107 - we are already running 100-node scalability tests on Jenkins:
http://kubekins.dls.corp.google.com/job/kubernetes-e2e-gce-scalability/
Does it make sense to run tests you mentioned as part of it?

@saad-ali (Member) commented Jun 30, 2015

Here are the results from a 4-node cluster, 200-pod run.

Test Setup

  • Cluster characteristics:
    • 4 nodes
    • n1-standard-1 (1 vCPU, 3.75 GB memory)
  • Test duration: 24 hours total
    • 200 pods (in addition to the default pods), with 2 containers each (400 containers total)
    • The container used was a small, statically-linked C program that just sleeps for 28 days, with an image size of 877.6 kB (one way to generate a comparable load is sketched right after this list).
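
One way to generate a comparable load is a single replication controller whose 200 pod replicas each run two sleep containers. The sketch below is an illustration under assumptions: the image name example.com/sleep:latest is a placeholder for a small statically-linked sleep binary, not the image used for these measurements.

```python
#!/usr/bin/env python3
"""Sketch: emit a v1 ReplicationController manifest (as JSON) that creates
200 pods with two sleep containers each. The image name is a placeholder."""
import json

SLEEP_IMAGE = "example.com/sleep:latest"  # placeholder image

def sleep_container(name):
    # A container that only sleeps, so it adds essentially no load of its own.
    return {"name": name, "image": SLEEP_IMAGE}

rc = {
    "apiVersion": "v1",
    "kind": "ReplicationController",
    "metadata": {"name": "addon-load"},
    "spec": {
        "replicas": 200,
        "selector": {"app": "addon-load"},
        "template": {
            "metadata": {"labels": {"app": "addon-load"}},
            "spec": {"containers": [sleep_container("sleep-1"),
                                    sleep_container("sleep-2")]},
        },
    },
}

print(json.dumps(rc, indent=2))
```

The output can be piped to `kubectl create -f -`; since the pods do nothing but sleep, they exercise the addons (log collection, monitoring, and so on) without adding meaningful CPU or memory load of their own.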

Results

Memory

SkyDNS Memory Usage

  • Memory usage rose over 24 hours but remained below 3.1 MB.
    skydns_memusage_4node_200pod

kube2sky Memory Usage

  • Memory usage rose over 24 hours but remained below 3.0 MB.
    kube2sky_memusage_4node_200pod

dns etcd Memory Usage

  • Memory usage remained stable, between 8 and 14 MB.
    dnsetcd_memusage_4node_200pod

Kibana-Logging Memory Usage

  • Memory usage rose steadily over 24 hours but remained below 100 MB.
    kibanalogging_memusage_4node_200pod

Heapster Memory Usage

  • Memory usage remained more or less stable below 150 MB.
    heapster_memusage_4node_200pod

InfluxDB Memory Usage

  • This was rather surprising: InfluxDB memory usage continued to grow over the course of 24 hours, hitting 1.1 GB. This needs to be investigated further (is InfluxDB trying to hold everything in memory and not flushing to disk periodically?).
    influxdb_memusage_4node_200pod

Fluentd/ElasticSearch Memory Usage

  • Also rose steadily over 24 hours but remained below 100 MB.
    fluentd-elasticsearch_memusage_4node_200pod

CPU

SkyDNS CPU Usage
skydns_cpuusage_4node_200pods

kube2sky CPU Usage
kube2sky_cpuusage_4node_200pod

dns etcd CPU Usage
dnsetcd_cpuusage_4node_200pod

Kibana-Logging CPU Usage
kibanalogging_cpuusage_4node_200pod

Heapster CPU Usage
heapster_cpuusage_4node_200pod

InfluxDB CPU Usage
influxdb_cpuusage_4node_200pod

Fluentd/ElasticSearch CPU Usage
fluentdelasticsearch_cupusage_4node_200pod

@saad-ali (Member) commented Jun 30, 2015

Here are the results from a 4-node cluster with 0 additional pods ("no load").

Test Setup

  • Cluster characteristics:
    • 4 nodes
    • n1-standard-1 (1 vCPU, 3.75 GB memory)
  • Test duration: 24 hours total
    • Only default kubernetes pods (no additional pods)
    • The container used was a small, statically-linked C program that just sleeps for 28 days, with an image size of 877.6 kB.

Results

Memory

SkyDNS Memory Usage

  • Memory usage rose over 24 hours to around 3.1 MB, with spikes up to 4.3 MB.
    skydns_memusage_4nodes_noload

kube2sky Memory Usage

  • Memory usage rose over 24 hours but remained below 3.2 MB.
    kube2sky_memusage_4node_noload

dns etcd Memory Usage

  • Memory usage remained stable, between 8 and 14 MB.
    dnsetcd_memusage_4node_noload

Kibana-Logging Memory Usage

  • Memory usage rose steadily over 24 hours to nearly 110 MB.
    kibana_memusage_4node_noload

CPU

SkyDNS CPU Usage
skydns_cpuusage_4node_noload

kube2sky CPU Usage
kube2sky_cpuusage_4node_noload

dns etcd CPU Usage
dnsetcd_cpuusage_4node_noload

Kibana-Logging CPU Usage
kibana_cpuusage_4nodes_noload

@dchen1107 (Member) commented Jul 1, 2015

@saad-ali and @vishh I did measurements last night on Heapster and found a big increase in memory usage from 0.14.3 to 0.15.0.

dchen1107 added a commit to dchen1107/kubernetes-1 that referenced this issue Jul 1, 2015

Set resource limit for both heapster and influxdb container based on data collected by #10335. Please note that both influxdb and heapster could be oom-killed due to memory leakage here.
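
For context, a limit like the one referenced in this commit is set per container under resources.limits in the addon's manifest. A minimal sketch of the shape of that stanza, with purely illustrative numbers (not the values chosen in the commit):

```python
# Sketch of a v1 container spec carrying a resource limit. The image tag and
# the numeric limits are illustrative placeholders, not the committed values.
heapster_container = {
    "name": "heapster",
    "image": "example.com/heapster:v0.15.0",  # placeholder image reference
    "resources": {
        "limits": {
            "cpu": "100m",      # illustrative: one tenth of a core
            "memory": "300Mi",  # illustrative: headroom above the ~150 MB observed above
        }
    },
}
```

With a memory limit in place, a container that leaks (the InfluxDB growth reported above, for example) is OOM-killed and restarted rather than being allowed to starve the node, which is the trade-off the commit message calls out.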

dchen1107 added a commit to dchen1107/kubernetes-1 that referenced this issue Jul 1, 2015

dchen1107 added a commit to dchen1107/kubernetes-1 that referenced this issue Jul 1, 2015

dchen1107 added a commit to dchen1107/kubernetes-1 that referenced this issue Jul 1, 2015

@saad-ali (Member) commented Jul 2, 2015

This is a follow-up to the 4-node cluster, 200-pod run, after 2 days.

Test Setup

  • Cluster characteristics:
    • 4 nodes
    • n1-standard-1 (1 vCPU, 3.75 GB memory)
  • Test duration: 48 hours total
    • 200 pods (in addition to the default pods), with 2 containers each (400 containers total)
    • The container used was a small, statically-linked C program that just sleeps for 28 days, with an image size of 877.6 kB.

Results

4node_200pods_2days_1of2

@dchen1107 (Member) commented Jul 2, 2015

@saad-ali Could you please measure the UI addon too?

@saad-ali (Member) commented Jul 2, 2015

@dchen1107 Yep, will do

@saad-ali (Member) commented Jul 3, 2015

Here are the initial results for the kube-ui container with no load.

Test Setup

  • Cluster characteristics:
    • 4 nodes
    • n1-standard-1 (1 vCPU, 3.75 GB memory)
  • Test duration: 48 hours total
    • Default with no additional pods

Results

kubeui_memusage_4node_noload

kubeui_cupusage_4node_noload

Seems pretty lightweight when not in use. Next, I will add load to the cluster and leave the kube-ui web interface open.
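
To keep the UI "in use" during a soak like this, one option is to poll it through a local kubectl proxy. A sketch, assuming the UI is reachable at the proxied service path shown below (treat the exact path as an assumption for your cluster version):

```python
#!/usr/bin/env python3
"""Sketch: repeatedly fetch the kube-ui page through `kubectl proxy` so the
UI sees steady traffic during a long-running test. The URL path is an
assumption; adjust it to wherever kube-ui is exposed in your cluster."""
import time
import urllib.request

URL = "http://127.0.0.1:8001/api/v1/proxy/namespaces/kube-system/services/kube-ui/"

while True:
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            resp.read()  # fetch and discard the page body
    except OSError as err:
        print("request failed:", err)
    time.sleep(1)
```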

@wojtek-t (Member) commented Jul 6, 2015

Since #10653 is merged, can we close this issue?

@dchen1107 (Member) commented Jul 6, 2015

One last thing: kube-ui, which also requires a resource limit, was added recently. Once that is done, we can close this one.

@thockin (Member) commented Jul 6, 2015

Who is doing this last one? Saad?

jayunit100 added a commit to jayunit100/kubernetes that referenced this issue Jul 6, 2015

Set resource limit for both heapster and influxdb container based on data collected by #10335. Please note that both influxdb and heapster could be oom-killed due to memory leakage here.

jayunit100 added a commit to jayunit100/kubernetes that referenced this issue Jul 6, 2015

jayunit100 added a commit to jayunit100/kubernetes that referenced this issue Jul 6, 2015

jayunit100 added a commit to jayunit100/kubernetes that referenced this issue Jul 6, 2015

Set resource limit for both heapster and influxdb container based on data collected by #10335. Please note that both influxdb and heapster could be oom-killed due to memory leakage here.

@saad-ali (Member) commented Jul 6, 2015

Yep, I had a cluster running over the long weekend. I'll publish the remaining results for kube-ui with load shortly.

@saad-ali (Member) commented Jul 6, 2015

Here are the results for the kube-ui container with load.

Test Setup

  • Cluster characteristics:
    • 4 nodes
    • n1-standard-1 (1 vCPU, 3.75 GB memory)
  • Test duration: 72 hours total
    • 200 pods with 2 containers each in addition to the default pods

Results

kubeui_memusage_4nodes_200pods

kubeui_cpuusage_4node_200pods

@dchen1107 (Member) commented Jul 8, 2015

All addons now have default resource limits associated with them. Documentation is addressed through #10779, and updating resource limits for addons is addressed through #7046. I am closing this issue now.
