CephNFS example fails to start pod with: "ganesha.nfsd": executable file not found in $PATH: unknown #7671

Closed
tibbe opened this issue Apr 17, 2021 · 11 comments

tibbe commented Apr 17, 2021

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

The Ganesha NFS pod didn't start when using the example config rook/cluster/examples/kubernetes/ceph/nfs.yaml. It fails with:

Error: failed to create containerd task: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "ganesha.nfsd": executable file not found in $PATH: unknown

Expected behavior:

Expected ganesha job to start.

How to reproduce it (minimal and precise):

Set up a rook-ceph cluster using the exact configs provided in the quickstart guide. Basically:

git clone --single-branch --branch v1.6.0 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml
kubectl create -f filesystem.yaml

Finally start a ganesha job:

kubectl apply -f nfs.yaml

File(s) to submit:

$ kubectl describe pod -n rook-ceph rook-ceph-nfs-my-nfs-a-5c4694c5bf-b8q99
Name:         rook-ceph-nfs-my-nfs-a-5c4694c5bf-b8q99
Namespace:    rook-ceph
Priority:     0
Node:         pi2/192.168.1.102
Start Time:   Sat, 17 Apr 2021 10:32:35 +0200
Labels:       app=rook-ceph-nfs
              ceph_daemon_id=a
              ceph_daemon_type=nfs
              ceph_nfs=my-nfs
              instance=a
              nfs=a
              pod-template-hash=5c4694c5bf
              rook_cluster=rook-ceph
Annotations:  <none>
Status:       Running
IP:           10.42.2.20
IPs:
  IP:           10.42.2.20
Controlled By:  ReplicaSet/rook-ceph-nfs-my-nfs-a-5c4694c5bf
Init Containers:
  generate-minimal-ceph-conf:
    Container ID:  containerd://d15c0db6e9a59bb3e11fa696a65f3f05fce1c3528b147e52902c8e38fb049ff3
    Image:         ceph/ceph:v15.2.9
    Image ID:      docker.io/ceph/ceph@sha256:ffc5c6c93b4ff584400eab951ae5d41ef0042031c96fb539b50f75cca5c68548
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c

      set -xEeuo pipefail

      cat << EOF > /etc/ceph/ceph.conf
      [global]
      mon_host = $(ROOK_CEPH_MON_HOST)

      [client.nfs-ganesha.my-nfs.a]
      keyring = /etc/ceph/keyring-store/keyring
      EOF

      chmod 444 /etc/ceph/ceph.conf

      cat /etc/ceph/ceph.conf

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 17 Apr 2021 10:32:37 +0200
      Finished:     Sat, 17 Apr 2021 10:32:37 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>             Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
    Mounts:
      /etc/ceph from etc-ceph (rw)
      /etc/ceph/keyring-store/ from rook-ceph-nfs-my-nfs-a-keyring (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r8hvx (ro)
Containers:
  nfs-ganesha:
    Container ID:  containerd://15581b6c9fcb3f4ab87f62893c397c6ff2dd1b6b8774add941315d452a4b15de
    Image:         ceph/ceph:v15.2.9
    Image ID:      docker.io/ceph/ceph@sha256:ffc5c6c93b4ff584400eab951ae5d41ef0042031c96fb539b50f75cca5c68548
    Port:          <none>
    Host Port:     <none>
    Command:
      ganesha.nfsd
    Args:
      -F
      -L
      STDERR
      -p
      /var/run/ganesha/ganesha.pid
      -N
      NIV_INFO
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to create containerd task: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "ganesha.nfsd": executable file not found in $PATH: unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 01:00:00 +0100
      Finished:     Sat, 17 Apr 2021 10:35:38 +0200
    Ready:          False
    Restart Count:  5
    Environment:
      CONTAINER_IMAGE:                ceph/ceph:v15.2.9
      POD_NAME:                       rook-ceph-nfs-my-nfs-a-5c4694c5bf-b8q99 (v1:metadata.name)
      POD_NAMESPACE:                  rook-ceph (v1:metadata.namespace)
      NODE_NAME:                       (v1:spec.nodeName)
      POD_MEMORY_LIMIT:               node allocatable (limits.memory)
      POD_MEMORY_REQUEST:             0 (requests.memory)
      POD_CPU_LIMIT:                  node allocatable (limits.cpu)
      POD_CPU_REQUEST:                0 (requests.cpu)
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>             Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
    Mounts:
      /etc/ceph from etc-ceph (rw)
      /etc/ceph/keyring-store/ from rook-ceph-nfs-my-nfs-a-keyring (ro)
      /etc/ganesha from ganesha-config (rw)
      /run/dbus from run-dbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r8hvx (ro)
  dbus-daemon:
    Container ID:  containerd://e1f55c8ed1d87c4169d0e748f7ed3fc2f8549b86b8a6ad2b05270b110c4b4097
    Image:         ceph/ceph:v15.2.9
    Image ID:      docker.io/ceph/ceph@sha256:ffc5c6c93b4ff584400eab951ae5d41ef0042031c96fb539b50f75cca5c68548
    Port:          <none>
    Host Port:     <none>
    Command:
      dbus-daemon
    Args:
      --nofork
      --system
      --nopidfile
    State:          Running
      Started:      Sat, 17 Apr 2021 10:32:39 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      CONTAINER_IMAGE:     ceph/ceph:v15.2.9
      POD_NAME:            rook-ceph-nfs-my-nfs-a-5c4694c5bf-b8q99 (v1:metadata.name)
      POD_NAMESPACE:       rook-ceph (v1:metadata.namespace)
      NODE_NAME:            (v1:spec.nodeName)
      POD_MEMORY_LIMIT:    node allocatable (limits.memory)
      POD_MEMORY_REQUEST:  0 (requests.memory)
      POD_CPU_LIMIT:       node allocatable (limits.cpu)
      POD_CPU_REQUEST:     0 (requests.cpu)
    Mounts:
      /run/dbus from run-dbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r8hvx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  etc-ceph:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  rook-ceph-nfs-my-nfs-a-keyring:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rook-ceph-nfs-my-nfs-a-keyring
    Optional:    false
  ganesha-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rook-ceph-nfs-my-nfs-a
    Optional:  false
  run-dbus:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-r8hvx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-r8hvx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m20s                  default-scheduler  Successfully assigned rook-ceph/rook-ceph-nfs-my-nfs-a-5c4694c5bf-b8q99 to pi2
  Normal   Pulled     3m19s                  kubelet            Container image "ceph/ceph:v15.2.9" already present on machine
  Normal   Created    3m19s                  kubelet            Created container generate-minimal-ceph-conf
  Normal   Started    3m18s                  kubelet            Started container generate-minimal-ceph-conf
  Normal   Pulled     3m17s                  kubelet            Container image "ceph/ceph:v15.2.9" already present on machine
  Normal   Created    3m16s                  kubelet            Created container dbus-daemon
  Normal   Started    3m16s                  kubelet            Started container dbus-daemon
  Normal   Pulled     2m29s (x4 over 3m17s)  kubelet            Container image "ceph/ceph:v15.2.9" already present on machine
  Normal   Created    2m29s (x4 over 3m17s)  kubelet            Created container nfs-ganesha
  Warning  Failed     2m28s (x4 over 3m17s)  kubelet            Error: failed to create containerd task: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "ganesha.nfsd": executable file not found in $PATH: unknown
  Warning  BackOff    2m3s (x7 over 3m14s)   kubelet            Back-off restarting failed container
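
The last termination message of the failing container can also be read directly, without scanning the full describe output. The following is a minimal sketch, assuming the namespace, pod, and container names shown above; `last_start_error` and `missing_executable` are hypothetical helper names, not part of kubectl or Rook.

```shell
# Hypothetical helpers; the names are illustrative, not part of kubectl or Rook.

# Print the last termination message of one container in a pod.
# $1 = namespace, $2 = pod name, $3 = container name
last_start_error() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath="{.status.containerStatuses[?(@.name==\"$3\")].lastState.terminated.message}"
}

# Read a runtime error message on stdin and print the executable that
# could not be found in $PATH (prints nothing if the message does not match).
missing_executable() {
  sed -n 's/.*exec: "\([^"]*\)": executable file not found in \$PATH.*/\1/p'
}
```

For the pod above, `last_start_error rook-ceph rook-ceph-nfs-my-nfs-a-5c4694c5bf-b8q99 nfs-ganesha | missing_executable` would be the intended usage. Fed the error message from this report, `missing_executable` prints `ganesha.nfsd`.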

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 20.04.2 LTS
  • Kernel (e.g. uname -a): Linux pi0 5.4.0-1034-raspi #37-Ubuntu SMP PREEMPT Mon Apr 12 23:14:49 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
  • Cloud provider or hardware configuration: Bare metal 4x Raspberry Pi 4B
  • Rook version (use rook version inside of a Rook Pod):
    rook: v1.6.0
    go: go1.16.3
    
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", 
    GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", 
    GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4+k3s1", 
    GitCommit:"838a906ab5eba62ff529d6a3a746384eba810758", GitTreeState:"clean", BuildDate:"2021-02-22T19:49:35Z", 
    GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/arm64"}
    
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): k3s
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
tibbe added the bug label Apr 17, 2021
tibbe (Author) commented Apr 19, 2021

I've debugged a little further: /bin/ganesha.nfsd seems to be simply missing in the arm64 image but it's present in the amd64 image. Seems like a broken arm64 release.
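
One way to repeat this comparison is to look for the binary inside each platform variant of the image. This is a rough sketch, assuming a Docker CLI with `--platform` support and, when running a foreign architecture, QEMU binfmt emulation on the host; `image_has_binary` and `compare_platforms` are hypothetical helper names.

```shell
# Hypothetical helper: succeed if the binary is on $PATH inside the image
# for the given platform. A failure can also mean the container could not
# run at all (for example, no emulation for that platform).
# $1 = image, $2 = platform (e.g. linux/arm64), $3 = binary name
image_has_binary() {
  docker run --rm --platform "$2" --entrypoint /bin/sh "$1" \
    -c 'command -v "$1" >/dev/null' sh "$3"
}

# Report the result for the two platforms discussed in this issue.
# $1 = image, $2 = binary name
compare_platforms() {
  for platform in linux/amd64 linux/arm64; do
    if image_has_binary "$1" "$platform" "$2"; then
      echo "$platform: $2 found"
    else
      echo "$platform: $2 missing"
    fi
  done
}
```

Intended usage would be `compare_platforms ceph/ceph:v15.2.9 ganesha.nfsd`.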

leseb (Member) commented Apr 19, 2021

I've debugged a little further: /bin/ganesha.nfsd seems to be simply missing in the arm64 image but it's present in the amd64 image. Seems like a broken arm64 release.

Were things different in 1.5?

tibbe (Author) commented Apr 19, 2021

This is the first image of ceph I've ever tried so I don't know (yet).

leseb (Member) commented Apr 19, 2021

This is the first image of ceph I've ever tried so I don't know (yet).

Ah, so you might also be the first one to try ceph-nfs on arm64 :-). The answer might be as simple as there are no arm64 builds.

leseb (Member) commented Apr 19, 2021

@dsavineau do you know?

tibbe (Author) commented Apr 19, 2021

The issue persists in ceph/ceph:v16.2.0. There's definitely an arm64 image and someone must have gone through the effort to at least create it. I might be the first person to actually test it however. :)

tibbe (Author) commented Apr 19, 2021

It struck me that since this is a generic "ceph" image rather than an NFS-specific one, it could well be that there's just a "build rule" missing for ganesha somewhere in the build system.

tibbe (Author) commented Apr 19, 2021

Also filed ceph/ceph-container#1878 as I'm unclear who's responsible for building these images.

dsavineau commented

@dsavineau do you know?

Yes I do.

nfs-ganesha packages aren't available on arm64 architecture on download.ceph.com, we explicitly disable ganesha on arm64 during the container image build.
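
Given that, a pre-flight check before applying nfs.yaml could flag nodes where the NFS pod is not expected to start. This is a minimal sketch that encodes only what this thread states (ganesha.nfsd present in the amd64 image, disabled on arm64, as of April 2021); `ganesha_available_for` and `check_nodes` are hypothetical names, and any other architecture is treated as unknown.

```shell
# Hypothetical check based only on this thread (state as of April 2021).
# $1 = architecture as reported by Kubernetes (amd64, arm64) or by uname -m
# Returns 0 = available, 1 = not available, 2 = unknown.
ganesha_available_for() {
  case "$1" in
    amd64|x86_64)  return 0 ;;
    arm64|aarch64) return 1 ;;
    *)             return 2 ;;
  esac
}

# List every node with its architecture and flag the ones where the
# ceph image is not known to ship ganesha.nfsd.
check_nodes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.architecture}{"\n"}{end}' |
  while read -r node arch; do
    if ganesha_available_for "$arch"; then
      echo "$node ($arch): ok"
    else
      echo "$node ($arch): ganesha.nfsd not confirmed in the ceph image"
    fi
  done
}
```

Running `check_nodes` against the cluster in this report would be expected to flag all four Raspberry Pi nodes, since they report arm64.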

tibbe (Author) commented Apr 19, 2021

That explains it. I'll close this issue for now and file a feature request against ceph to have them built.

tibbe closed this as completed Apr 19, 2021
tibbe (Author) commented Apr 20, 2021

Filed an issue against ceph: https://tracker.ceph.com/issues/50437
