Evicted CatalogSource pod causes failure #1677

Closed
flickerfly opened this issue Jul 23, 2020 · 1 comment · Fixed by #1680
Comments

@flickerfly
Contributor

Bug Report

What did you do?
A gRPC catalog pod set up by a CatalogSource was evicted by the cluster.

What did you expect to see?
A new pod based on the catalog container image would be deployed.

What did you see instead? Under which circumstances?
A new pod did not get deployed until I deleted the evicted pod. Then it immediately deployed a new pod.

Environment
OLM 0.12.0 on OpenShift 3.11

$ oc version
oc v3.11.157
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://awnsercics01.esg.nswc.navy.mil:8443
openshift v3.11.157
kubernetes v1.11.0+d4cacc0

Additional context
I'm not sure how to replicate this issue and am not in a place where I can upgrade OLM. I'm hoping this is a quick "oh we forgot to include that" type thing. If not, I'm comfortable with it simply being closed as WONTFIX. It isn't to a point where it is worth major effort.

@kramvan1
Contributor

I believe this is because of the open toleration:

tolerations:
      - operator: Exists

This toleration has no key, so it tolerates every taint and lets the pod land on any node, but it does not account for nodes that are not schedulable or pods that have been evicted.

I see no reason to have this toleration for the catalog source pod, since there's no specific use case for it.
I will put up a PR to remove this.
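
To illustrate why the open toleration quoted above is so broad, here is a minimal sketch of the Kubernetes taint/toleration matching rules. It is a simplified stand-alone model, not OLM code or the Kubernetes source. The `Taint` and `Toleration` classes and the `tolerates` function are hypothetical names introduced for this example, and the `example.com/dedicated` taint is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key matches every taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified model of whether a toleration matches a taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


# The toleration from the catalog pod spec: no key, operator Exists.
open_toleration = Toleration(operator="Exists")

# A narrower toleration for comparison (hypothetical key).
keyed_toleration = Toleration(key="example.com/dedicated", operator="Exists")

taints = [
    Taint("node.kubernetes.io/unschedulable", effect="NoSchedule"),
    Taint("node.kubernetes.io/not-ready", effect="NoExecute"),
    Taint("example.com/dedicated", "gpu", "NoSchedule"),
]

open_matches = [tolerates(open_toleration, t) for t in taints]
keyed_matches = [tolerates(keyed_toleration, t) for t in taints]

print(open_matches)   # [True, True, True]
print(keyed_matches)  # [False, False, True]
```

In this model the keyless `Exists` toleration matches all three taints, including the unschedulable and not-ready ones, while the keyed toleration only matches the taint it names.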

openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/operator-lifecycle-manager that referenced this issue Aug 5, 2020
This toleration will try to go to any node, but does not handle
nodes that are not schedulable or pods that have been evicted from a
node.

Fixes operator-framework#1677