
Set requests on operator and on daemonset #73

Conversation

@Miciah (Contributor) commented Feb 5, 2019

Set resources.requests.cpu=10m on the operator's container; set resources.requests.cpu=100m, resources.requests.memory=70Mi, and resources.limits.memory=512Mi on the CoreDNS container; and set resources.requests.cpu=10m on the node resolver sidecar container.

The setting for the operator is the standard setting for operators that run on masters. The requests for CoreDNS are based on the upstream defaults, whereas the limit is based on upstream discussions (see comments on this PR), which indicate that the upstream default limit of 170Mi may be too low at scale. The setting for the sidecar is based on guidance that "it's ok for them to be low to start" and on the absence of data.

This commit resolves NE-141.

  • assets/dns/daemonset.yaml: Set resources.requests.cpu=100m, resources.requests.memory=70Mi, and resources.limits.memory=512Mi on the CoreDNS container. Set resources.requests.cpu=10m on the sidecar.
  • manifests/0000_08_cluster-dns-operator_02-deployment.yaml: Set resources.requests.cpu=10m on the operator container.
  • pkg/manifests/bindata.go: Regenerate.
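
For illustration, the relevant fragments of the two manifests would look roughly like the following. This is a hand-written sketch rather than the literal file contents: the dns and dns-node-resolver container names match the daemonset pods shown later in this thread, while the dns-operator container name is assumed.

    # Sketch of the CoreDNS container and node-resolver sidecar in
    # assets/dns/daemonset.yaml (surrounding fields elided)
    containers:
    - name: dns
      resources:
        requests:
          cpu: 100m
          memory: 70Mi
        limits:
          memory: 512Mi
    - name: dns-node-resolver
      resources:
        requests:
          cpu: 10m

    # Sketch of the operator container in
    # manifests/0000_08_cluster-dns-operator_02-deployment.yaml
    # (container name assumed)
    containers:
    - name: dns-operator
      resources:
        requests:
          cpu: 10m

The requests only reserve capacity for the scheduler; the single memory limit is the one hard cap, so the CoreDNS container can be OOM-killed above 512Mi, while its CPU is never throttled because no CPU limit is set.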

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 5, 2019
@Miciah (Contributor, Author) commented Feb 5, 2019

One difference between the way upstream Kubernetes deploys CoreDNS and the way we do is that we run CoreDNS on every node, which means we're requesting 70Mi on every node in the cluster. Given that we recommend a minimum of 8 GB of RAM per node on OpenShift Container Platform 3.11, I figure 70 MiB is negligible (roughly 70/8192 ≈ 0.9% of node memory).

@Miciah (Contributor, Author) commented Feb 5, 2019

pods.json from the e2e-aws-operator test run has

    {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "creationTimestamp": "2019-02-05T02:58:39Z",
            "generateName": "dns-operator-7f68578dc7-",
            "labels": {
                "name": "dns-operator",
                "pod-template-hash": "7f68578dc7"
            },
            "name": "dns-operator-7f68578dc7-b92nw",
            "namespace": "openshift-dns-operator",
            ...
        },
        "spec": {
            "containers": [
                {
                    ...
                    "resources": {
                        "requests": {
                            "cpu": "10m"
                        }
                    },
                    ...

and

    {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "annotations": {
                "k8s.v1.cni.cncf.io/networks-status": "[{\n    \"name\": \"openshift-sdn\",\n    \"ips\": [\n        \"10.130.0.3\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n}]"
            },
            "creationTimestamp": "2019-02-05T03:00:09Z",
            "generateName": "dns-default-",
            "labels": {
                "controller-revision-hash": "665b977c65",
                "dns": "dns-default",
                "openshift-app": "dns",
                "pod-template-generation": "1"
            },
            "name": "dns-default-5kjts",
            "namespace": "openshift-dns",
            ...
        },
        "spec": {
            ...
            "containers": [
                {
                    ...
                    "name": "dns",
                    ...
                    "resources": {
                        "limits": {
                            "memory": "170Mi"
                        },
                        "requests": {
                            "cpu": "100m",
                            "memory": "70Mi"
                        }
                    },
                    ...
                },
                {
                    ...
                    "name": "dns-node-resolver",
                    "resources": {
                        "requests": {
                            "cpu": "10m"
                        }
                    },
                    ...
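
(Note: the 170Mi limit in this dump comes from the revision of the PR that was running in that test; following the discussion below, the daemonset was updated and force-pushed with a 512Mi limit.)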

@pravisankar

@Miciah @ironcladlou We had these requests/limits set on DNS before, but we decided to remove them until we have a better understanding of the workload: #7 (comment)

@Miciah (Contributor, Author) commented Feb 5, 2019

> @Miciah @ironcladlou We had these requests/limits set on DNS before, but we decided to remove them until we have a better understanding of the workload: #7 (comment)

Was there a reason to remove requests as well as limits? I see you removed both here: https://github.com/openshift/cluster-dns-operator/compare/2c143fd6221cdda16f95a1090de514e2866e71d9..2d83ff295c96e0a905462095e3491fb6a7ee1c08

On further review of related upstream issues, I agree that 170Mi might be too low. kubernetes/kubernetes#68613 (comment) shows that CoreDNS is well within the 170Mi limit at 2000 nodes but may exceed it at 5000. kubernetes/kubernetes#67392 was opened to gather actual data to determine an appropriate limit, but the Kubernetes 1.12 release interfered, and it appears that effort went nowhere. kubernetes/kubernetes#67392 (comment) and kubernetes/kubernetes#67392 (comment) offhandedly propose setting the limit at 512Mi. Confusingly, kubernetes/kubernetes#67392 (comment) reports, "This issue has been fixed in CoreDNS 1.2.2 and no longer needs a memory increase", but that comment offers no further explanation and contradicts newer comments. kubernetes/kubernetes#68613 (comment) reports on follow-up work that was anticipated to allow CoreDNS to scale to 5000 nodes while staying within 170Mi, but coredns/coredns#2095 (comment) indicates that that work did not have the anticipated results and was abandoned, and kubernetes/kubernetes#68629 reverted to kube-dns for Kubernetes 1.12.

After all that, I think that omitting resources.limits.memory or setting it to 512Mi would be reasonable for now.

@openshift-merge-robot (Contributor)

/retest

1 similar comment
@openshift-merge-robot (Contributor)

/retest

@sjug commented Feb 7, 2019

We don't have our own suggested performance limits at this time, but I think it would be better to ship with some to begin with, as a safety precaution.

@Miciah (Contributor, Author) commented Feb 7, 2019

Given that (1) we want some limit, (2) upstream discussions indicate that 170Mi is too low, (3) CoreDNS is a critical service, (4) we recommend a minimum of 8 GB of RAM per node, and (5) nobody has raised objections to 512Mi, I'd like to go ahead with 512Mi.

Set resources.requests.cpu=10m on the operator's container; set
resources.requests.cpu=100m, resources.requests.memory=70Mi, and
resources.limits.memory=512Mi on the CoreDNS container; and set
resources.requests.cpu=10m on the node resolver sidecar container.

The setting for the operator is the standard setting for operators that run
on masters.  The requests setting for CoreDNS is based on the upstream
default whereas the limit is based on upstream discussions, which indicate
that the upstream default limit of 170Mi may be too low at scale.  The
setting for the sidecar is based on guidance that "it's ok for them to be
low to start" and absence of data.

This commit resolves NE-141.

https://jira.coreos.com/browse/NE-141

* assets/dns/daemonset.yaml: Set resources.requests.cpu=100m,
resources.requests.memory=70Mi, and resources.limits.memory=512Mi on the
CoreDNS container.  Set resources.requests.cpu=10m on the sidecar.
* manifests/0000_08_cluster-dns-operator_02-deployment.yaml: Set
resources.requests.cpu=10m on the operator container.
* pkg/manifests/bindata.go: Regenerate.
@Miciah force-pushed the NE-141-set-requests-on-operator-and-on-daemonset branch from b4772a3 to c105084 on February 7, 2019 at 17:18
@Miciah (Contributor, Author) commented Feb 7, 2019

/test e2e-aws

@ironcladlou (Contributor)

/lgtm
/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 7, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 2cdf14b into openshift:master Feb 8, 2019