
Set requests on operator and on daemonset #73

Conversation

@Miciah (Contributor) commented Feb 5, 2019

Set resources.requests.cpu=10m on the operator's container; set resources.requests.cpu=100m, resources.requests.memory=70Mi, and resources.limits.memory=512Mi on the CoreDNS container; and set resources.requests.cpu=10m on the node resolver sidecar container.

The setting for the operator is the standard setting for operators that run on masters. The requests for CoreDNS are based on the upstream defaults, whereas the limit is based on upstream discussions (see comments on this PR), which indicate that the upstream default limit of 170Mi may be too low at scale. The setting for the sidecar is based on guidance that "it's ok for them to be low to start" and on the absence of data.

This commit resolves NE-141.

  • assets/dns/daemonset.yaml: Set resources.requests.cpu=100m, resources.requests.memory=70Mi, and resources.limits.memory=512Mi on the CoreDNS container. Set resources.requests.cpu=10m on the sidecar.
  • manifests/0000_08_cluster-dns-operator_02-deployment.yaml: Set resources.requests.cpu=10m on the operator container.
  • pkg/manifests/bindata.go: Regenerate.
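
For illustration, the relevant fragments of the two manifests would look roughly like the following. This is a hand-written sketch rather than the literal file contents: the dns and dns-node-resolver container names match the daemonset pods shown later in this thread, while the dns-operator container name is assumed.

    # Sketch of the CoreDNS container and node-resolver sidecar in
    # assets/dns/daemonset.yaml (surrounding fields elided)
    containers:
    - name: dns
      resources:
        requests:
          cpu: 100m
          memory: 70Mi
        limits:
          memory: 512Mi
    - name: dns-node-resolver
      resources:
        requests:
          cpu: 10m

    # Sketch of the operator container in
    # manifests/0000_08_cluster-dns-operator_02-deployment.yaml
    # (container name assumed)
    containers:
    - name: dns-operator
      resources:
        requests:
          cpu: 10m

The requests only reserve capacity for the scheduler; the single memory limit is the one hard cap, so the CoreDNS container can be OOM-killed above 512Mi, while its CPU is never throttled because no CPU limit is set.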

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 5, 2019
@Miciah (Contributor, Author) commented Feb 5, 2019

One difference between the way upstream Kubernetes deploys CoreDNS and the way we do is that we run CoreDNS on every node, which means we're requesting 70Mi on every node in the cluster. Given that we recommend a minimum of 8 GB of RAM per node on OpenShift Container Platform 3.11, I figure 70 MiB is negligible (roughly 70/8192 ≈ 0.9% of node memory).

@Miciah (Contributor, Author) commented Feb 5, 2019

pods.json from the e2e-aws-operator test run has

    {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "creationTimestamp": "2019-02-05T02:58:39Z",
            "generateName": "dns-operator-7f68578dc7-",
            "labels": {
                "name": "dns-operator",
                "pod-template-hash": "7f68578dc7"
            },
            "name": "dns-operator-7f68578dc7-b92nw",
            "namespace": "openshift-dns-operator",
            ...
        },
        "spec": {
            "containers": [
                {
                    ...
                    "resources": {
                        "requests": {
                            "cpu": "10m"
                        }
                    },
                    ...

and

    {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "annotations": {
                "k8s.v1.cni.cncf.io/networks-status": "[{\n    \"name\": \"openshift-sdn\",\n    \"ips\": [\n        \"10.130.0.3\"\n    ],\n    \"default\": true,\n    \"dns\": {}\n}]"
            },
            "creationTimestamp": "2019-02-05T03:00:09Z",
            "generateName": "dns-default-",
            "labels": {
                "controller-revision-hash": "665b977c65",
                "dns": "dns-default",
                "openshift-app": "dns",
                "pod-template-generation": "1"
            },
            "name": "dns-default-5kjts",
            "namespace": "openshift-dns",
            ...
        },
        "spec": {
            ...
            "containers": [
                {
                    ...
                    "name": "dns",
                    ...
                    "resources": {
                        "limits": {
                            "memory": "170Mi"
                        },
                        "requests": {
                            "cpu": "100m",
                            "memory": "70Mi"
                        }
                    },
                    ...
                },
                {
                    ...
                    "name": "dns-node-resolver",
                    "resources": {
                        "requests": {
                            "cpu": "10m"
                        }
                    },
                    ...
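
(Note: the 170Mi limit in this dump comes from the revision of the PR that was running in that test; following the discussion below, the daemonset was updated and force-pushed with a 512Mi limit.)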

@pravisankar

@Miciah @ironcladlou We had these requests/limits set on DNS before, but we decided to remove them until we have a better understanding of the workload: #7 (comment)

@Miciah (Contributor, Author) commented Feb 5, 2019

> @Miciah @ironcladlou We had these requests/limits set on DNS before, but we decided to remove them until we have a better understanding of the workload: #7 (comment)

Was there a reason to remove requests as well as limits? I see you removed both here: https://github.com/openshift/cluster-dns-operator/compare/2c143fd6221cdda16f95a1090de514e2866e71d9..2d83ff295c96e0a905462095e3491fb6a7ee1c08

On further review of related upstream issues, I agree that 170Mi might be too low. kubernetes/kubernetes#68613 (comment) shows that CoreDNS is well within the 170Mi limit at 2000 nodes but may exceed it at 5000. kubernetes/kubernetes#67392 was opened to gather actual data to determine an appropriate limit, but the Kubernetes 1.12 release interfered, and it appears that effort went nowhere. kubernetes/kubernetes#67392 (comment) and kubernetes/kubernetes#67392 (comment) offhandedly propose setting the limit at 512Mi. Confusingly, kubernetes/kubernetes#67392 (comment) reports, "This issue has been fixed in CoreDNS 1.2.2 and no longer needs a memory increase", but that comment offers no further explanation and contradicts newer comments. kubernetes/kubernetes#68613 (comment) reports on follow-up work that was anticipated to allow CoreDNS to scale to 5000 nodes while staying within 170Mi, but coredns/coredns#2095 (comment) indicates that that work did not have the anticipated results and was abandoned, and kubernetes/kubernetes#68629 reverted to kube-dns for Kubernetes 1.12.

After all that, I think that omitting resources.limits.memory or setting it to 512Mi would be reasonable for now.

@openshift-merge-robot (Contributor)

/retest

1 similar comment
@openshift-merge-robot (Contributor)

/retest

@sjug commented Feb 7, 2019

We don't have our own suggested performance limits at this time, but I think it would be better to ship with some to begin with, as a safety precaution.

@Miciah (Contributor, Author) commented Feb 7, 2019

Given that (1) we want some limit, (2) upstream discussions indicate that 170Mi is too low, (3) CoreDNS is a critical service, (4) we recommend a minimum of 8 GB of RAM per node, and (5) nobody has raised objections to 512Mi, I'd like to go ahead with 512Mi.

Set resources.requests.cpu=10m on the operator's container; set
resources.requests.cpu=100m, resources.requests.memory=70Mi, and
resources.limits.memory=512Mi on the CoreDNS container; and set
resources.requests.cpu=10m on the node resolver sidecar container.

The setting for the operator is the standard setting for operators that run
on masters.  The requests setting for CoreDNS is based on the upstream
default whereas the limit is based on upstream discussions, which indicate
that the upstream default limit of 170Mi may be too low at scale.  The
setting for the sidecar is based on guidance that "it's ok for them to be
low to start" and absence of data.

This commit resolves NE-141.

https://jira.coreos.com/browse/NE-141

* assets/dns/daemonset.yaml: Set resources.requests.cpu=100m,
resources.requests.memory=70Mi, and resources.limits.memory=512Mi on the
CoreDNS container.  Set resources.requests.cpu=10m on the sidecar.
* manifests/0000_08_cluster-dns-operator_02-deployment.yaml: Set
resources.requests.cpu=10m on the operator container.
* pkg/manifests/bindata.go: Regenerate.
@Miciah force-pushed the NE-141-set-requests-on-operator-and-on-daemonset branch from b4772a3 to c105084 on February 7, 2019 at 17:18
@Miciah (Contributor, Author) commented Feb 7, 2019

/test e2e-aws

@ironcladlou (Contributor)

/lgtm
/retest

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 7, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 2cdf14b into openshift:master Feb 8, 2019