
installer: add controller that watch pending installer pods #550

Merged: openshift-merge-robot merged 1 commit into openshift:master from mfojtik:installer-state-controller on Oct 11, 2019


@mfojtik (Contributor) commented Oct 8, 2019

This adds a controller that watches for installer pods that have been in the Pending state for longer than 5 minutes. If such pods are found, the controller makes the operator go Degraded and reports the reason and message found for that pod/container state.

This should make it easier to debug and triage bugs where slow networking or a slow kubelet prevents rolling updates of static-pod-based operators.
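As a rough illustration of the check described above (not the code from this PR), the sketch below flags an installer pod that has been Pending for longer than 5 minutes and turns its reason and message into a Degraded condition. The `installerPod` and `condition` types, the `degradedCondition` helper, and the precomputed `PendingFor` field are all hypothetical stand-ins chosen so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"time"
)

// pendingThreshold mirrors the 5-minute limit from the PR description.
const pendingThreshold = 5 * time.Minute

// installerPod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod.
type installerPod struct {
	Name       string
	Phase      string        // "Pending", "Running", ...
	PendingFor time.Duration // time spent in the current phase (assumed precomputed)
	Reason     string        // reason taken from the pod/container state
	Message    string        // message taken from the pod/container state
}

// condition is a hypothetical, simplified operator condition.
type condition struct {
	Type, Status, Reason, Message string
}

// degradedCondition returns Degraded=True for the first installer pod that has
// been Pending for longer than pendingThreshold, and Degraded=False otherwise.
func degradedCondition(pods []installerPod) condition {
	for _, p := range pods {
		if p.Phase != "Pending" || p.PendingFor <= pendingThreshold {
			continue
		}
		return condition{
			Type:    "Degraded",
			Status:  "True",
			Reason:  p.Reason,
			Message: fmt.Sprintf("pod %s has been Pending for %s: %s", p.Name, p.PendingFor, p.Message),
		}
	}
	return condition{Type: "Degraded", Status: "False"}
}

func main() {
	pods := []installerPod{
		{Name: "installer-2-node-a", Phase: "Running", PendingFor: time.Hour},
		{Name: "installer-3-node-a", Phase: "Pending", PendingFor: 7 * time.Minute,
			Reason: "ContainerCreating", Message: "failed to create pod network"},
	}
	fmt.Printf("%+v\n", degradedCondition(pods))
}
```

A real controller would presumably derive the pending duration from pod timestamps and write the condition to the operator status; both steps are left out here.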

/cc @deads2k
/cc @sttts

@openshift-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Oct 8, 2019
@mfojtik force-pushed the installer-state-controller branch 2 times, most recently from 7d3aa06 to 9244238, on October 8, 2019 13:25
@mfojtik force-pushed the installer-state-controller branch from 9244238 to 059fec0 on October 8, 2019 13:49
@mfojtik force-pushed the installer-state-controller branch from 059fec0 to 55e91fb on October 8, 2019 15:16
@mfojtik force-pushed the installer-state-controller branch from 55e91fb to f3a5f1c on October 8, 2019 15:20
@mfojtik (Contributor Author) commented Oct 8, 2019

The proof PR is green: openshift/cluster-kube-apiserver-operator#586

@mfojtik (Contributor Author) commented Oct 8, 2019

/cherrypick release-4.2

@openshift-cherrypick-robot

@mfojtik: once the present PR merges, I will cherry-pick it on top of release-4.2 in a new PR and assign it to you.


@soltysh left a comment

One question.
/lgtm

if len(pods) == 0 {
	return conditions, nil
}
namespaceEvents, err := c.eventsGetter.Events(c.targetNamespace).List(metav1.ListOptions{})

Wouldn't it be better to use Search(scheme, obj) here? That way you'd get only the events for the pods. You'd be trading one call for several, but each call would be scoped to a specific object.
I'm not sure whether obj can be a list of objects, though.

Contributor Author


I'm not sure .Search would make it faster or more efficient... this only lists events for a single namespace, and only when there are actually pending pods :-)
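To make the trade-off in this thread concrete, here is a minimal sketch of the single-list approach: fetch the namespace's events once, then narrow them client-side to the pending pods. The `event` and `objectRef` types only mimic the `InvolvedObject.Kind`, `InvolvedObject.Name` and `Message` fields seen in the diff, and `eventsForPods` is a hypothetical helper, not part of library-go or client-go.

```go
package main

import "fmt"

// objectRef and event are hypothetical stand-ins for the Kubernetes types;
// only the fields used by the filter are modelled.
type objectRef struct {
	Kind string
	Name string
}

type event struct {
	InvolvedObject objectRef
	Message        string
}

// eventsForPods groups an already-listed set of namespace events by pod name,
// keeping only events whose involved object is one of the given pods.
func eventsForPods(events []event, podNames []string) map[string][]event {
	wanted := make(map[string]bool, len(podNames))
	for _, name := range podNames {
		wanted[name] = true
	}
	byPod := make(map[string][]event)
	for _, e := range events {
		if e.InvolvedObject.Kind != "Pod" || !wanted[e.InvolvedObject.Name] {
			continue
		}
		byPod[e.InvolvedObject.Name] = append(byPod[e.InvolvedObject.Name], e)
	}
	return byPod
}

func main() {
	events := []event{
		{InvolvedObject: objectRef{Kind: "Pod", Name: "installer-3-node-a"}, Message: "failed to create pod network"},
		{InvolvedObject: objectRef{Kind: "Pod", Name: "unrelated-pod"}, Message: "Started container"},
		{InvolvedObject: objectRef{Kind: "Node", Name: "node-a"}, Message: "NodeReady"},
	}
	byPod := eventsForPods(events, []string{"installer-3-node-a"})
	fmt.Println(len(byPod["installer-3-node-a"]))
}
```

The per-object alternative would replace the one list call with one lookup per pending pod, which is the exchange the reviewer describes.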

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Oct 9, 2019
if event.InvolvedObject.Kind != "Pod" {
	continue
}
if !strings.Contains(event.Message, "failed to create pod network") {
Contributor


I wonder whether we could benefit from the more general approach of https://github.com/openshift/cluster-kube-apiserver-operator/pull/571/files#diff-d4d4aa822fed3489ac3aee1560b501b4R94, where the idea was that you can later easily extend the list of regexes to report a specific reason why the pod is failing, rather than sticking only to network failures in the longer run.

Contributor Author


Yes, a good follow-up, I think.
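One possible shape for that follow-up is a table of patterns, each mapped to a reason, so that new failure classes can be added without touching the matching loop. This is only a sketch: the reason names, the `reasonPattern` type, the `classify` helper and the second pattern are hypothetical; only the "failed to create pod network" message comes from the diff above.

```go
package main

import (
	"fmt"
	"regexp"
)

// reasonPattern maps an event-message pattern to a short reason string.
type reasonPattern struct {
	Reason  string
	Pattern *regexp.Regexp
}

// reasonPatterns is the extensible table; add entries to recognise more failures.
// The reason names and the second entry are illustrative assumptions.
var reasonPatterns = []reasonPattern{
	{Reason: "PodNetworkFailed", Pattern: regexp.MustCompile(`failed to create pod network`)},
	{Reason: "ImagePullFailed", Pattern: regexp.MustCompile(`(?i)failed to pull image`)},
}

// classify returns the reason of the first pattern matching the message.
// The boolean is false when no pattern matches.
func classify(message string) (string, bool) {
	for _, rp := range reasonPatterns {
		if rp.Pattern.MatchString(message) {
			return rp.Reason, true
		}
	}
	return "", false
}

func main() {
	reason, ok := classify("kubelet: failed to create pod network sandbox for installer-3-node-a")
	fmt.Println(reason, ok)
}
```

Because the first matching entry wins, more specific patterns would need to come before broader ones.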

@deads2k (Contributor) commented Oct 11, 2019

This is OK to start, but I would try to write the code so that it works when kubelet events fail to de-dupe and when we have more than one installer pod.

/lgtm
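A minimal sketch of what that hardening could look like, assuming the events have already been reduced to (pod, message) pairs: repeated events are collapsed, and one summary line is produced per installer pod in a stable order. The `podEvent` type and `summarize` helper are hypothetical and are not taken from the merged code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// podEvent is a hypothetical (pod name, event message) pair.
type podEvent struct {
	Pod     string
	Message string
}

// summarize drops exact duplicate events (for example when the kubelet fails
// to de-dupe them) and returns one line per pod, sorted by pod name.
func summarize(events []podEvent) []string {
	seen := make(map[podEvent]bool)
	perPod := make(map[string][]string)
	for _, e := range events {
		if seen[e] {
			continue
		}
		seen[e] = true
		perPod[e.Pod] = append(perPod[e.Pod], e.Message)
	}

	pods := make([]string, 0, len(perPod))
	for pod := range perPod {
		pods = append(pods, pod)
	}
	sort.Strings(pods)

	lines := make([]string, 0, len(pods))
	for _, pod := range pods {
		lines = append(lines, fmt.Sprintf("%s: %s", pod, strings.Join(perPod[pod], "; ")))
	}
	return lines
}

func main() {
	events := []podEvent{
		{Pod: "installer-3-node-b", Message: "failed to create pod network"},
		{Pod: "installer-3-node-a", Message: "failed to create pod network"},
		{Pod: "installer-3-node-a", Message: "failed to create pod network"},
	}
	for _, line := range summarize(events) {
		fmt.Println(line)
	}
}
```

Sorting by pod name keeps the resulting message stable between syncs, which avoids rewriting the condition when nothing has actually changed.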

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, mfojtik, soltysh


@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Oct 11, 2019
@openshift-merge-robot merged commit 1c39da7 into openshift:master on Oct 11, 2019
@openshift-cherrypick-robot

@mfojtik: new pull request created: #555


mfojtik added a commit to mfojtik/cluster-kube-apiserver-operator that referenced this pull request Oct 14, 2019
openshift/library-go#551: Add pkg/crypto:MakeSelfSignedCAConfigForSubject
openshift/library-go#550: installer: add controller that watch pending installer pods
openshift/library-go#546: Emit event when certificate gets updated
bertinatto pushed a commit to bertinatto/library-go that referenced this pull request Jul 2, 2020
installer: add controller that watch pending installer pods