templates: add priority class system-node-critical to etcd pod #353
Conversation
```diff
@@ -120,6 +120,9 @@ contents:
         containerPort: 2379
         protocol: TCP
       hostNetwork: true
+      priorityClassName: system-node-critical
```
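For context, the field lands in the static-pod spec consumed by the kubelet. A minimal sketch of such a manifest (pod name, namespace, and image here are illustrative, not the actual template contents):

```yaml
# Sketch only: names and image are illustrative, not the real template.
apiVersion: v1
kind: Pod
metadata:
  name: etcd-member        # illustrative name
  namespace: kube-system
spec:
  priorityClassName: system-node-critical
  hostNetwork: true
  containers:
  - name: etcd
    image: quay.io/coreos/etcd:v3.3   # illustrative image
    ports:
    - containerPort: 2379
      protocol: TCP
```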
@sjenning are there any other magic incantations missing from etcd here?
really should be system-cluster-critical
> system-cluster-critical - This priority class has a value of 2000000000 (two billion) and is used with pods that are important for the cluster. Pods with this priority class can be evicted from a node in certain circumstances. For example, pods configured with the system-node-critical priority class can take priority. However, this priority class does ensure guaranteed scheduling. Examples of pods that can have this priority class are fluentd, add-on components like descheduler, and so forth.
seems like cluster-critical can be evicted...
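For comparison, both classes ship as built-in PriorityClass objects. A rough sketch of how they look if exported, using the upstream Kubernetes default values (2000001000 for node-critical, 2000000000 for cluster-critical; the API version varies by Kubernetes release):

```yaml
# Sketch of the built-in PriorityClass objects (upstream default values).
apiVersion: scheduling.k8s.io/v1beta1   # v1 in later Kubernetes releases
kind: PriorityClass
metadata:
  name: system-node-critical
value: 2000001000
description: Used for system critical pods that must not be moved from their current node.
---
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: system-cluster-critical
value: 2000000000
description: Used for system critical pods that must run in the cluster, but can be moved to another node if necessary.
```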
Technically on that node the static pod is critical (so I can buy node critical). Seems cluster critical may need to be defined better. Also etcd is special.
Looks like this needs a rebase. (I am still tempted to change the template unit tests to only sanity check one or two generated files, not all of them)
force-pushed from a073a91 to a75a166
I tried testing this locally and I'm still seeing the kubelet preempt etcd-member. Either my test procedure is faulty or this isn't sufficient.
@sjenning on a local cluster that doesn't have this change, the etcd pod was evicted. We edited the etcd static pod on the node to include these changes and restarted the kubelet, but the kubelet still evicted the etcd pod...
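The manual test described above can be sketched roughly as follows (the manifest path and pod name are assumptions; the static-pod manifest location can differ per deployment):

```console
# Sketch of the manual test; manifest path and pod name are assumptions.
$ sudo vi /etc/kubernetes/manifests/etcd-member.yaml   # add priorityClassName: system-node-critical
$ sudo systemctl restart kubelet
$ oc get pods -n kube-system -w                        # watch whether etcd-member still gets evicted
```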
/retest
1 similar comment
/retest |
```console
go test ./pkg/controller/template/... -u
```
force-pushed from a75a166 to 266b6d2
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: abhinavdahiya, sjenning
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
rate limiting errors in e2e-aws:

```console
level=warning msg="Found override for ReleaseImage. Please be warned, this is not advised"
level=info msg="Consuming \"Install Config\" from target directory"
level=info msg="Creating cluster..."
level=error msg="Error: Error applying plan:"
level=error msg="2 errors occurred:"
level=error msg="\t* module.vpc.aws_route_table_association.worker_routing[0]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.worker_routing.0: timeout while waiting for state to become 'success' (timeout: 5m0s)"
level=error msg="\t* module.vpc.aws_route_table_association.route_net[5]: 1 error occurred:"
level=error msg="\t* aws_route_table_association.route_net.5: timeout while waiting for state to become 'success' (timeout: 5m0s)"
level=error msg="Terraform does not automatically rollback in the face of errors."
level=error msg="Instead, your Terraform state file has been partially updated with"
level=error msg="any resources that successfully completed. Please address the error"
level=error msg="above and apply again to incrementally change your infrastructure."
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
```

will retest in a bit.
/retest |
1 similar comment
/retest |
/test e2e-aws-op
using `system-node-critical` from https://docs.okd.io/latest/admin_guide/scheduling/priority_preemption.html#admin-guide-priority-preemption-priority-class