OLM operator installation stuck for v0.5.0 #429

Closed
czunker opened this issue Jul 5, 2022 · 3 comments · Fixed by #432 or #434

czunker commented Jul 5, 2022

Describe the bug
The bundle container is stuck in CrashLoopBackOff.
The logs show this error:

mkdir: can't create directory '/database': Permission denied

To Reproduce
Steps to reproduce the behavior:

  1. Start a fresh minikube cluster.
  2. Install OLM: operator-sdk olm install
  3. Create the namespace: kubectl create ns mondoo-operator
  4. Run the bundle: operator-sdk run bundle ghcr.io/mondoohq/mondoo-operator-bundle:v0.5.0 --namespace mondoo-operator --timeout 3m0s
  5. Check the Pods in the mondoo-operator namespace: kubectl -n mondoo-operator get pod
  6. Get the logs of ghcr-io-mondoohq-mondoo-operator-bundle-v0-5-0: kubectl -n mondoo-operator logs ghcr-io-mondoohq-mondoo-operator-bundle-v0-5-0

Expected behavior
Pod should start.

Screenshots

k get pod -A
NAMESPACE         NAME                                             READY   STATUS    RESTARTS        AGE
kube-system       coredns-64897985d-w4nlc                          1/1     Running   0               2m49s
kube-system       etcd-minikube                                    1/1     Running   0               3m3s
kube-system       kube-apiserver-minikube                          1/1     Running   0               3m4s
kube-system       kube-controller-manager-minikube                 1/1     Running   0               3m3s
kube-system       kube-proxy-r9f4l                                 1/1     Running   0               2m49s
kube-system       kube-scheduler-minikube                          1/1     Running   0               3m3s
kube-system       storage-provisioner                              1/1     Running   1 (2m19s ago)   3m2s
mondoo-operator   ghcr-io-mondoohq-mondoo-operator-bundle-v0-5-0   0/1     Error     3 (33s ago)     61s
olm               catalog-operator-67bcbb4f5d-jjsck                1/1     Running   0               2m5s
olm               olm-operator-d9f76fdc9-2pm47                     1/1     Running   0               2m5s
olm               operatorhubio-catalog-jxwg8                      1/1     Running   0               118s
olm               packageserver-685694fddd-g2hxh                   1/1     Running   0               117s
olm               packageserver-685694fddd-lmnb8                   1/1     Running   0               117s

Additional context
PR #424 has been merged, but the securityContext of the failed Pod is not derived from it (spec.securityContext is empty, and the container sets none):

kubectl -n mondoo-operator get po ghcr-io-mondoohq-mondoo-operator-bundle-v0-5-0 -o yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-07-05T12:04:59Z"
  name: ghcr-io-mondoohq-mondoo-operator-bundle-v0-5-0
  namespace: mondoo-operator
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    name: mondoo-operator-catalog
    uid: 4451e898-54ca-485f-9693-cde291f57daf
  resourceVersion: "957"
  uid: 28aa6f3b-91fa-4993-9a1b-697aaaa5ddef
spec:
  containers:
  - command:
    - sh
    - -c
    - |
      mkdir -p /database && \
      opm registry add -d /database/index.db -b ghcr.io/mondoohq/mondoo-operator-bundle:v0.5.0 --mode=semver --skip-tls-verify=false --use-http=false && \
      opm registry serve -d /database/index.db -p 50051
    image: quay.io/operator-framework/opm:latest
    imagePullPolicy: Always
    name: registry-grpc
    ports:
    - containerPort: 50051
      name: grpc
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xc9kq
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: minikube
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-xc9kq
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-07-05T12:04:59Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-07-05T12:04:59Z"
    message: 'containers with unready status: [registry-grpc]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-07-05T12:04:59Z"
    message: 'containers with unready status: [registry-grpc]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-07-05T12:04:59Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://4a9c7a8bc05d5b87a77eaa43caf3bdbf22c094c54f82d807ffecdac63e393951
    image: quay.io/operator-framework/opm:latest
    imageID: docker-pullable://quay.io/operator-framework/opm@sha256:fda1d54edaf76df5baad31dc01954bb0112d36743cfe5b70d2d22fcc314ab36c
    lastState:
      terminated:
        containerID: docker://4a9c7a8bc05d5b87a77eaa43caf3bdbf22c094c54f82d807ffecdac63e393951
        exitCode: 1
        finishedAt: "2022-07-05T12:06:36Z"
        reason: Error
        startedAt: "2022-07-05T12:06:36Z"
    name: registry-grpc
    ready: false
    restartCount: 4
    started: false
    state:
      waiting:
        message: back-off 1m20s restarting failed container=registry-grpc pod=ghcr-io-mondoohq-mondoo-operator-bundle-v0-5-0_mondoo-operator(28aa6f3b-91fa-4993-9a1b-697aaaa5ddef)
        reason: CrashLoopBackOff
  hostIP: 192.168.49.2
  phase: Running
  podIP: 172.17.0.8
  podIPs:
  - ip: 172.17.0.8
  qosClass: BestEffort
  startTime: "2022-07-05T12:04:59Z"
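
Since the full dump is long, a narrower query may make the missing securityContext easier to spot. The following is a minimal sketch, not part of the original report: the function name and the KUBECTL override variable are hypothetical, and the jsonpath template simply selects the pod-level and per-container securityContext fields shown in the dump above.

```shell
#!/bin/sh
# Print only the security-relevant fields of a Pod:
# the pod-level securityContext, then name, image and securityContext per container.
# KUBECTL is a hypothetical override (e.g. for dry runs); it defaults to kubectl.
show_pod_security_context() {
  ns="$1"
  pod="$2"
  "${KUBECTL:-kubectl}" -n "$ns" get pod "$pod" \
    -o jsonpath='pod: {.spec.securityContext}{"\n"}{range .spec.containers[*]}{.name} ({.image}): {.securityContext}{"\n"}{end}'
}

# Usage (commented out so that sourcing this file has no side effects):
# show_pod_security_context mondoo-operator ghcr-io-mondoohq-mondoo-operator-bundle-v0-5-0
```

If neither the pod-level nor the container-level field sets runAsUser, the container runs as whatever user the image itself declares.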

@czunker czunker self-assigned this Jul 5, 2022
@czunker czunker added the bug Something isn't working label Jul 5, 2022

czunker commented Jul 6, 2022

The container runs as uid 1001, and this user cannot create the /database directory:

/ $ id
uid=1001 gid=0(root)
/ $ ls -l
total 60
drwxr-xr-x    1 root     root          4096 Jul  4 13:58 bin
drwxr-xr-x    2 root     root          4096 Jan  1  1970 boot
drwxr-xr-x    2 root     root         12288 Jan  1  1970 busybox
drwxr-xr-x    5 root     root           360 Jul  6 03:51 dev
drwxr-xr-x    1 root     root          4096 Jul  6 03:51 etc
drwxr-xr-x    3 nonroot  nonroot       4096 Jan  1  1970 home
drwxr-xr-x    2 root     root          4096 Jan  1  1970 lib
dr-xr-xr-x  524 root     root             0 Jul  6 03:51 proc
drwx------    2 root     root          4096 Jan  1  1970 root
drwxr-xr-x    2 root     root          4096 Jan  1  1970 run
drwxr-xr-x    2 root     root          4096 Jan  1  1970 sbin
dr-xr-xr-x   13 root     root             0 Jul  6 03:51 sys
drwxrwxrwt    2 root     root          4096 Jan  1  1970 tmp
drwxr-xr-x    9 root     root          4096 Jan  1  1970 usr
drwxr-xr-x    1 root     root          4096 Jan  1  1970 var
/ $ mkdir -p /database
mkdir: can't create directory '/database': Permission denied
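
Because the Pod is crash-looping, the same check may be easier to repeat in a throwaway container. This is a sketch under assumptions: Docker is available locally, and the function name and DOCKER override variable are hypothetical. It overrides the image entrypoint with sh (which the image ships, as the Pod command above relies on it) and runs id followed by the failing mkdir.

```shell
#!/bin/sh
# Run `id` and the failing `mkdir` inside a throwaway container of the opm image.
# $1 is the image tag (defaults to latest).
# DOCKER is a hypothetical override; it defaults to docker.
probe_opm_image() {
  tag="${1:-latest}"
  "${DOCKER:-docker}" run --rm --entrypoint sh \
    "quay.io/operator-framework/opm:${tag}" \
    -c 'id; mkdir -p /database && echo "mkdir ok"'
}

# Usage (commented out so that sourcing this file has no side effects):
# probe_opm_image latest
```

Passing a different tag as the first argument allows comparing image versions with the same probe.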


czunker commented Jul 6, 2022

The cause seems to be the latest opm image: the latest tag currently references v1.23.2.

When the Pod is started with quay.io/operator-framework/opm:v1.23.1, the container runs as root and can create the directory:

/ # id
uid=0(root) gid=0(root)
/ # mkdir -p /database
/ # ls -l
total 64
drwxr-xr-x    1 root     root          4096 Jul  1 14:08 bin
drwxr-xr-x    2 root     root          4096 Jan  1  1970 boot
drwxr-xr-x    2 root     root         12288 Jan  1  1970 busybox
drwxr-xr-x    2 root     root          4096 Jul  6 04:00 database
drwxr-xr-x    5 root     root           360 Jul  6 03:59 dev
...
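
Given that v1.23.1 still works, one possible stopgap is to pin the index image instead of relying on the latest tag. The sketch below assumes that the --index-image flag of operator-sdk run bundle determines the image used for the registry Pod (check operator-sdk run bundle --help for your version). The function name and the OPERATOR_SDK override variable are hypothetical, and this is not necessarily the workaround that the commits linked to this issue apply.

```shell
#!/bin/sh
# Run a bundle with a pinned opm index image instead of the default `latest`.
# $1 is the bundle image, $2 the opm tag (defaults to v1.23.1, the version
# observed above to still run as root).
# OPERATOR_SDK is a hypothetical override; it defaults to operator-sdk.
run_bundle_pinned() {
  bundle="$1"
  opm_tag="${2:-v1.23.1}"
  "${OPERATOR_SDK:-operator-sdk}" run bundle "$bundle" \
    --index-image "quay.io/operator-framework/opm:${opm_tag}" \
    --namespace mondoo-operator --timeout 3m0s
}

# Usage (commented out so that sourcing this file has no side effects):
# run_bundle_pinned ghcr.io/mondoohq/mondoo-operator-bundle:v0.5.0
```

Pinning trades automatic updates of the opm image for reproducibility, so the pin would need to be revisited once the upstream issue is resolved.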


czunker commented Jul 6, 2022

Upstream issue: operator-framework/operator-registry#984

czunker added a commit that referenced this issue Jul 6, 2022
Workaround for operator-framework/operator-registry#984

Fixes #429

Signed-off-by: Christian Zunker <christian@mondoo.com>
chris-rock pushed a commit that referenced this issue Jul 7, 2022
Workaround for operator-framework/operator-registry#984

Fixes #429

Signed-off-by: Christian Zunker <christian@mondoo.com>
czunker added a commit that referenced this issue Jul 7, 2022
This prevents `permission denied` errors.

Workaround for operator-framework/operator-registry#984

Fixes #429

Signed-off-by: Christian Zunker <christian@mondoo.com>
czunker added a commit that referenced this issue Jul 7, 2022
This prevents `permission denied` errors.

Workaround for operator-framework/operator-registry#984

Fixes #429

Signed-off-by: Christian Zunker <christian@mondoo.com>