Support addon Deployments, make heapster a deployment with a nanny. #22893

Merged on Mar 22, 2016 (4 commits).
71 changes: 65 additions & 6 deletions cluster/addons/cluster-monitoring/google/heapster-controller.yaml
@@ -1,23 +1,26 @@
 {% set metrics_memory = "200Mi" -%}
 {% set eventer_memory = "200Mi" -%}
+{% set metrics_memory_per_node = 4 -%}
[Contributor] Add unit to the name.

[Contributor Author] I can't, because I use this constant in the math below. Unless you mean something like "metrics_memory_per_node_unit"?
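
For concreteness, here is how the sizing math below evaluates — illustrative arithmetic only, assuming a hypothetical 1000-node cluster:

    num_nodes      = 1000
    metrics_memory = (200 + 1000 * 4)|string + "Mi"          -> "4200Mi"
    eventer_memory = (200 * 1024 + 1000 * 500)|string + "Ki" -> "704800Ki" (~688Mi)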

+{% set eventer_memory_per_node = 500 -%}
 {% set num_nodes = pillar.get('num_nodes', -1) -%}
 {% if num_nodes >= 0 -%}
-  {% set metrics_memory = (200 + num_nodes * 4)|string + "Mi" -%}
-  {% set eventer_memory = (200 * 1024 + num_nodes * 500)|string + "Ki" -%}
+  {% set metrics_memory = (200 + num_nodes * metrics_memory_per_node)|string + "Mi" -%}
+  {% set eventer_memory = (200 * 1024 + num_nodes * eventer_memory_per_node)|string + "Ki" -%}
 {% endif -%}

-apiVersion: v1
-kind: ReplicationController
+apiVersion: extensions/v1beta1
[Contributor] Can you please reuse the constants above and in the nanny?

[Contributor Author] Done.

+kind: Deployment
 metadata:
-  name: heapster-v1.0.0
+  name: heapster
   namespace: kube-system
   labels:
     k8s-app: heapster
     kubernetes.io/cluster-service: "true"
 spec:
   replicas: 1
   selector:
-    k8s-app: heapster
+    matchLabels:
+      k8s-app: heapster
   template:
     metadata:
       labels:
@@ -62,6 +65,62 @@ spec:
             - name: ssl-certs
               mountPath: /etc/ssl/certs
               readOnly: true
+        - image: gcr.io/google_containers/addon-resizer:1.0
+          name: heapster-nanny
+          resources:
+            limits:
+              cpu: 50m
+              memory: 100Mi
+            requests:
+              cpu: 50m
+              memory: 100Mi
+          env:
+            - name: MY_POD_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.name
+            - name: MY_POD_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          command:
+            - /pod_nanny
+            - --cpu=100m
[Contributor] @mwielgus Please verify these values based on your perf tests. AFAIR we need more than 100m CPU in large clusters and 4MB of memory for each node.

[Contributor Author] The 1.2 cut gives 4MB per node, but leaves CPU at 100m. I've changed this to use the same per-node value as static sizing.

[Contributor] On a 1000-node cluster we are using 50 millicores on average, with occasional spikes to 500 (every minute, for a couple of seconds).

[Contributor Author] Where is this defined?

+            - --extra-cpu=0m
+            - --memory={{ metrics_memory }}
+            - --extra-memory={{metrics_memory_per_node}}Mi
+            - --threshold=5
+            - --deployment=heapster
+            - --container=heapster
+            - --poll-period=300000
+        - image: gcr.io/google_containers/addon-resizer:1.0
+          name: eventer-nanny
+          resources:
+            limits:
+              cpu: 50m
+              memory: 100Mi
+            requests:
+              cpu: 50m
+              memory: 100Mi
+          env:
+            - name: MY_POD_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.name
+            - name: MY_POD_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          command:
+            - /pod_nanny
+            - --cpu=100m
+            - --extra-cpu=0m
+            - --memory={{eventer_memory}}
+            - --extra-memory={{eventer_memory_per_node}}Ki
+            - --threshold=5
+            - --deployment=heapster
+            - --container=eventer
+            - --poll-period=300000
       volumes:
         - name: ssl-certs
           hostPath:
@@ -1,23 +1,26 @@
 {% set metrics_memory = "200Mi" -%}
 {% set eventer_memory = "200Mi" -%}
+{% set metrics_memory_per_node = 4 -%}
+{% set eventer_memory_per_node = 500 -%}
 {% set num_nodes = pillar.get('num_nodes', -1) -%}
 {% if num_nodes >= 0 -%}
-  {% set metrics_memory = (200 + num_nodes * 4)|string + "Mi" -%}
-  {% set eventer_memory = (200 * 1024 + num_nodes * 500)|string + "Ki" -%}
+  {% set metrics_memory = (200 + num_nodes * metrics_memory_per_node)|string + "Mi" -%}
+  {% set eventer_memory = (200 * 1024 + num_nodes * eventer_memory_per_node)|string + "Ki" -%}
 {% endif -%}

-apiVersion: v1
-kind: ReplicationController
+apiVersion: extensions/v1beta1
+kind: Deployment
 metadata:
-  name: heapster-v1.0.0
+  name: heapster
   namespace: kube-system
   labels:
     k8s-app: heapster
     kubernetes.io/cluster-service: "true"
 spec:
   replicas: 1
[Member] Since this only has 1 replica, you might consider the Recreate update policy. The default is RollingUpdate.

[Contributor Author] Interesting idea. The code for the container in question lives in contrib/addon-resizer, where that would be an easy (1-line) change. I'll definitely consider it.
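
For reference, the suggested change would look roughly like this — a sketch only, not part of this diff:

    # Hypothetical: pin the single-replica Deployment to the Recreate
    # strategy instead of the default RollingUpdate.
    spec:
      replicas: 1
      strategy:
        type: Recreate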

   selector:
-    k8s-app: heapster
+    matchLabels:
+      k8s-app: heapster
   template:
     metadata:
       labels:
@@ -63,6 +66,62 @@ spec:
             - name: ssl-certs
               mountPath: /etc/ssl/certs
               readOnly: true
+        - image: gcr.io/google_containers/addon-resizer:1.0
+          name: heapster-nanny
+          resources:
+            limits:
[Member] If you just specify limits, requests should be set from limits.

[Contributor] #18216 added requests to all addons.

[Member] One of the config best practices is:

    Don't specify default values unnecessarily, in order to simplify and minimize configs, and to reduce error.

I don't really understand the argument behind #18216, but it seems to violate that best practice.

[Contributor Author] Then why don't we do this for other singletons (e.g., heapster itself)?

[Member] Because of #18216. Can you explain why this is important, @gmarek? Here's the defaulting code: https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/defaults.go#L99-L116

[Contributor] When we were designing QoS we decided to have this defaulting as a backward-compatibility feature. I didn't have the impression that it was there to stay forever, so I think we should have requests explicit. Generally, all this QoS stuff is really confusing as it is today. @piosz

[Contributor Author] Interesting: I didn't know about that default behavior. It actually conflicts with the addon-resizer, which explicitly sets requests=limits to keep its dependents in the guaranteed class. The addon-resizer also performs an update on any qualitative difference (e.g., requests being expected but not found). This would manifest as a single deployment update at startup.

It feels weird to add a flag to disregard requests, but we'll need to if we want to utilize this default behavior. But since literally all of our addons specify both requests and limits, why don't I cut an issue for 1.3 to reconcile both the pod_nanny and all of our addon yamls with best practices?

[Member] SGTM

[Contributor] This is further proof that the current state is confusing. We should work on that at some point.

[Contributor Author] Opened #23229
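
To illustrate the defaulting behavior under discussion — a minimal sketch, not taken from this PR — a container that specifies only limits gets matching requests filled in by the API server, which keeps the pod in the Guaranteed QoS class:

    # Hypothetical container resources with limits only; the defaulting code
    # linked above copies these values into requests, yielding Guaranteed QoS.
    resources:
      limits:
        cpu: 50m
        memory: 100Mi
      # requests: defaulted to cpu: 50m, memory: 100Mi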

+              cpu: 50m
+              memory: 100Mi
+            requests:
+              cpu: 50m
+              memory: 100Mi
+          env:
+            - name: MY_POD_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.name
+            - name: MY_POD_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          command:
+            - /pod_nanny
+            - --cpu=100m
+            - --extra-cpu=0m
+            - --memory={{ metrics_memory }}
+            - --extra-memory={{ metrics_memory_per_node }}Mi
+            - --threshold=5
+            - --deployment=heapster
+            - --container=heapster
+            - --poll-period=300000
+        - image: gcr.io/google_containers/addon-resizer:1.0
+          name: eventer-nanny
+          resources:
+            limits:
+              cpu: 50m
+              memory: 100Mi
+            requests:
+              cpu: 50m
+              memory: 100Mi
+          env:
+            - name: MY_POD_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.name
+            - name: MY_POD_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          command:
+            - /pod_nanny
+            - --cpu=100m
+            - --extra-cpu=0m
+            - --memory={{ eventer_memory }}
+            - --extra-memory={{ eventer_memory_per_node }}Ki
+            - --threshold=5
+            - --deployment=heapster
+            - --container=eventer
+            - --poll-period=300000
       volumes:
         - name: ssl-certs
           hostPath:
@@ -1,23 +1,26 @@
 {% set metrics_memory = "200Mi" -%}
 {% set eventer_memory = "200Mi" -%}
+{% set metrics_memory_per_node = 4 -%}
+{% set eventer_memory_per_node = 500 -%}
 {% set num_nodes = pillar.get('num_nodes', -1) -%}
 {% if num_nodes >= 0 -%}
-  {% set metrics_memory = (200 + num_nodes * 4)|string + "Mi" -%}
-  {% set eventer_memory = (200 * 1024 + num_nodes * 500)|string + "Ki" -%}
+  {% set metrics_memory = (200 + num_nodes * metrics_memory_per_node)|string + "Mi" -%}
+  {% set eventer_memory = (200 * 1024 + num_nodes * eventer_memory_per_node)|string + "Ki" -%}
 {% endif -%}

-apiVersion: v1
-kind: ReplicationController
+apiVersion: extensions/v1beta1
+kind: Deployment
 metadata:
-  name: heapster-v1.0.0
+  name: heapster
   namespace: kube-system
   labels:
     k8s-app: heapster
     kubernetes.io/cluster-service: "true"
 spec:
   replicas: 1
   selector:
-    k8s-app: heapster
+    matchLabels:
+      k8s-app: heapster
   template:
     metadata:
       labels:
@@ -54,3 +57,60 @@ spec:
             - /eventer
             - --source=kubernetes:''
             - --sink=influxdb:http://monitoring-influxdb:8086
+        - image: gcr.io/google_containers/addon-resizer:1.0
+          name: heapster-nanny
+          resources:
+            limits:
+              cpu: 50m
+              memory: 100Mi
+            requests:
+              cpu: 50m
+              memory: 100Mi
+          env:
+            - name: MY_POD_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.name
+            - name: MY_POD_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          command:
+            - /pod_nanny
+            - --cpu=100m
+            - --extra-cpu=0m
+            - --memory={{ metrics_memory }}
+            - --extra-memory={{ metrics_memory_per_node }}Mi
+            - --threshold=5
+            - --deployment=heapster
+            - --container=heapster
+            - --poll-period=300000
+        - image: gcr.io/google_containers/addon-resizer:1.0
+          name: eventer-nanny
+          resources:
+            limits:
+              cpu: 50m
+              memory: 100Mi
+            requests:
+              cpu: 50m
+              memory: 100Mi
+          env:
+            - name: MY_POD_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.name
+            - name: MY_POD_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          command:
+            - /pod_nanny
+            - --cpu=100m
+            - --extra-cpu=0m
+            - --memory={{ eventer_memory }}
+            - --extra-memory={{ eventer_memory_per_node }}Ki
+            - --threshold=5
+            - --deployment=heapster
+            - --container=eventer
+            - --poll-period=300000

@@ -1,21 +1,23 @@
 {% set metrics_memory = "200Mi" -%}
+{% set metrics_memory_per_node = 4 -%}
 {% set num_nodes = pillar.get('num_nodes', -1) -%}
 {% if num_nodes >= 0 -%}
-  {% set metrics_memory = (200 + num_nodes * 4)|string + "Mi" -%}
+  {% set metrics_memory = (200 + num_nodes * metrics_memory_per_node)|string + "Mi" -%}
 {% endif -%}

-apiVersion: v1
-kind: ReplicationController
+apiVersion: extensions/v1beta1
+kind: Deployment
 metadata:
-  name: heapster-v1.0.0
+  name: heapster
   namespace: kube-system
   labels:
     k8s-app: heapster
     kubernetes.io/cluster-service: "true"
 spec:
   replicas: 1
   selector:
-    k8s-app: heapster
+    matchLabels:
+      k8s-app: heapster
   template:
     metadata:
       labels:
@@ -37,3 +39,31 @@ spec:
             - /heapster
             - --source=kubernetes.summary_api:''
             - --metric_resolution=60s
+        - image: gcr.io/google_containers/addon-resizer:1.0
+          name: heapster-nanny
+          resources:
+            limits:
+              cpu: 50m
+              memory: 100Mi
+            requests:
+              cpu: 50m
+              memory: 100Mi
+          env:
+            - name: MY_POD_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.name
+            - name: MY_POD_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          command:
+            - /pod_nanny
[Member] What does pod_nanny do?

[Contributor Author] It vertically scales a container based on the number of nodes.
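
Read against the flags below, the nanny's target size works out as follows — an illustrative reading only, assuming the default --memory=200Mi (i.e., the num_nodes pillar was unavailable) and a hypothetical 1000-node cluster:

    target memory = --memory + --extra-memory * (number of nodes)
                  = 200Mi + 4Mi * 1000 = 4200Mi   (lines up with the static metrics_memory formula above)
    target cpu    = --cpu + --extra-cpu * (number of nodes)
                  = 100m + 0m * 1000   = 100m

The --threshold=5 flag presumably allows up to 5% deviation from these targets before the nanny updates the Deployment.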

+            - --cpu=100m
+            - --extra-cpu=0m
+            - --memory={{ metrics_memory }}
+            - --extra-memory={{ metrics_memory_per_node }}Mi
+            - --threshold=5
+            - --deployment=heapster
+            - --container=heapster
+            - --poll-period=300000
1 change: 1 addition & 0 deletions cluster/saltbase/salt/kube-addons/kube-addon-update.sh
@@ -481,6 +481,7 @@ function update-addons() {
   # That's why we pass an empty string as the version separator.
   # If the description differs on disk, the object should be recreated.
   # This is not implemented in this version.
+  reconcile-objects ${addon_path} Deployment "" &
[Contributor] How is this going to update Deployments? AFAIU the name of an object will not change, so how will the addon-updater realise it has to do an update? Has this been tested?

[Member] ping @Q-Lee — are you sure this won't break the addons update procedure?
   reconcile-objects ${addon_path} Service "" &
   reconcile-objects ${addon_path} PersistentVolume "" &
   reconcile-objects ${addon_path} PersistentVolumeClaim "" &
24 changes: 22 additions & 2 deletions test/e2e/monitoring.go
@@ -102,14 +102,19 @@ func verifyExpectedRcsExistAndGetExpectedPods(c *client.Client) ([]string, error)
 	for _, rcLabel := range rcLabels {
 		selector := labels.Set{"k8s-app": rcLabel}.AsSelector()
 		options := api.ListOptions{LabelSelector: selector}
+		deploymentList, err := c.Deployments(api.NamespaceSystem).List(options)
+		if err != nil {
+			return nil, err
+		}
 		rcList, err := c.ReplicationControllers(api.NamespaceSystem).List(options)
 		if err != nil {
 			return nil, err
 		}
-		if len(rcList.Items) != 1 {
-			return nil, fmt.Errorf("expected to find one replica for RC with label %s but got %d",
+		if (len(rcList.Items) + len(deploymentList.Items)) != 1 {
+			return nil, fmt.Errorf("expected to find one replica for RC or deployment with label %s but got %d",
 				rcLabel, len(rcList.Items))
 		}
+		// Check all the replication controllers.
 		for _, rc := range rcList.Items {
 			selector := labels.Set(rc.Spec.Selector).AsSelector()
 			options := api.ListOptions{LabelSelector: selector}
@@ -124,6 +129,21 @@ func verifyExpectedRcsExistAndGetExpectedPods(c *client.Client) ([]string, error)
 				expectedPods = append(expectedPods, string(pod.UID))
 			}
 		}
+		// Do the same for all deployments.
+		for _, rc := range deploymentList.Items {
+			selector := labels.Set(rc.Spec.Selector.MatchLabels).AsSelector()
+			options := api.ListOptions{LabelSelector: selector}
+			podList, err := c.Pods(api.NamespaceSystem).List(options)
+			if err != nil {
+				return nil, err
+			}
+			for _, pod := range podList.Items {
+				if pod.DeletionTimestamp != nil {
+					continue
+				}
+				expectedPods = append(expectedPods, string(pod.UID))
+			}
+		}
 	}
 	return expectedPods, nil
 }