Catalog Source Pod is not recreated when transitioned to the terminated state #2709

Open
Gentoli opened this issue Mar 25, 2022 · 1 comment
Labels: kind/bug

Gentoli commented Mar 25, 2022

Bug Report

What did you do?

  • Have a catalog source.
  • The catalog source creates its pod on node A.
  • Node A gets restarted or replaced.
  • The pod is not replaced.

What did you expect to see?
The pod is recreated.

What did you see instead? Under which circumstances?
The dead pod is not replaced. Deleting the dead pod manually does trigger recreation, however.

Environment

  • operator-lifecycle-manager version:
OLM version: v0.20.0
git commit: e6428a19b52d2fd7e689577d7be55223b1b2e5f8
  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:58:47Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6-gke.1500", GitCommit:"5595443086b60d8c5c62342fadc2d4fda9c793e8", GitTreeState:"clean", BuildDate:"2022-02-09T09:25:03Z", GoVersion:"go1.16.12b7", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind: GKE

Possible Solution

Check for this pod state (phase: Failed, reason: Terminated; see the pod status under Additional context) and replace the pod.
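A minimal sketch of what such a check could look like, written in Python over plain dictionaries shaped like the pod status in this report. This is not OLM's code (OLM is written in Go), and the function names and the pod names in the example are hypothetical. Treating Succeeded as dead as well is an assumption that goes beyond what this report shows; it is included because a registry pod is expected to run indefinitely.

```python
from typing import Any, Iterable, List, Mapping

# Pod phases after which the containers are never started again.
# The report only shows "Failed"; "Succeeded" is an added assumption.
TERMINAL_PHASES = frozenset({"Failed", "Succeeded"})


def is_dead(pod: Mapping[str, Any]) -> bool:
    """Return True if the pod's status.phase is terminal."""
    status = pod.get("status") or {}
    return status.get("phase") in TERMINAL_PHASES


def pods_to_replace(pods: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the names of pods that should be deleted and recreated."""
    return [pod["metadata"]["name"] for pod in pods if is_dead(pod)]


if __name__ == "__main__":
    # Status fields copied from the report; the pod names are made up.
    terminated = {
        "metadata": {"name": "example-catalog-abcde"},
        "status": {
            "phase": "Failed",
            "reason": "Terminated",
            "message": "Pod was terminated in response to imminent node shutdown.",
        },
    }
    running = {
        "metadata": {"name": "example-catalog-fghij"},
        "status": {"phase": "Running"},
    }
    print(pods_to_replace([terminated, running]))  # prints ['example-catalog-abcde']
```

The check keys on the phase rather than on the Ready condition, since a pod that is merely not ready may still recover on its own, while a pod in a terminal phase will not.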

Or

One of the comments in #2666 suggests putting the CatalogSource pod back under a controller (Deployment or StatefulSet), which should also resolve this.

Additional context

Pod status
```yaml
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-03-24T18:48:24Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-03-24T21:02:39Z"
    message: 'containers with unready status: [registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-03-24T21:02:39Z"
    message: 'containers with unready status: [registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-03-24T18:48:24Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://d327b53da13df65232bfaa19120022a6a96af630350ac43fb51a17b15f0c55bb
    image: quay.io/operatorhubio/catalog:latest
    imageID: quay.io/operatorhubio/catalog@sha256:009ba4d793616312c7a847dd4a64455971b2d7d68a5d2a16e76d6df3ce03eedc
    lastState: {}
    name: registry-server
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://d327b53da13df65232bfaa19120022a6a96af630350ac43fb51a17b15f0c55bb
        exitCode: 0
        finishedAt: "2022-03-24T21:02:38Z"
        reason: Completed
        startedAt: "2022-03-24T18:48:28Z"
  hostIP: 10.100.4.19
  message: Pod was terminated in response to imminent node shutdown.
  phase: Failed
  podIP: 10.100.18.45
  podIPs:
  - ip: 10.100.18.45
  qosClass: Burstable
  reason: Terminated
  startTime: "2022-03-24T18:48:24Z"
```
Gentoli added the kind/bug label Mar 25, 2022
exdx (Member) commented Mar 31, 2022

Hi @Gentoli,

Thanks for bringing this up -- we know this is affecting users and is poor UX (to not have the catalog source pod be managed by a built-in controller). We will open up an RFE on the JIRA board and see that we can get this work prioritized.

exdx self-assigned this Mar 31, 2022
exdx added this to the Backlog milestone Mar 31, 2022