EdgeFS: All rook discover pods go in crashloopback /error #5815

Closed
babvin opened this issue Jul 13, 2020 · 3 comments
@babvin

babvin commented Jul 13, 2020

  • Bug Report
    `
    ubuntu@k8s-pi-master:~$ kubectl get all -n rook-edgefs-system
    NAME                                        READY   STATUS             RESTARTS   AGE
    pod/rook-discover-25rnq                     0/1     CrashLoopBackOff   32         138m
    pod/rook-discover-2w5ms                     0/1     CrashLoopBackOff   31         138m
    pod/rook-discover-xk222                     0/1     CrashLoopBackOff   31         138m
    pod/rook-edgefs-operator-599c575484-zzhm4   1/1     Running            0          138m

    NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/rook-discover   3         3         0       3            0                           138m

    NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/rook-edgefs-operator   1/1     1            1           138m

    NAME                                              DESIRED   CURRENT   READY   AGE
    replicaset.apps/rook-edgefs-operator-599c575484   1         1         1       138m
    ubuntu@k8s-pi-master:~$

`

How to reproduce it (minimal and precise):

kubectl create -f operator.yaml

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator's logs, if necessary
  • Crashing pod(s) logs, if necessary
    `
    ubuntu@k8s-pi-master:~$ kubectl get all -n rook-edgefs
    No resources found in rook-edgefs namespace.
    ubuntu@k8s-pi-master:~$ kubectl -n rook-edgefs-system logs pod/rook-discover-25rnq
    2020-07-13 15:25:42.216280 I | rookcmd: starting Rook v1.3.7 with arguments '/usr/local/bin/rook discover --discover-interval 60m'
    2020-07-13 15:25:42.216650 I | rookcmd: flag values: --discover-interval=1h0m0s, --help=false, --log-flush-frequency=5s, --log-level=INFO, --operator-image=, --service-account=, --use-ceph-volume=false
    2020-07-13 15:25:42.222668 I | rook-discover: updating device configmap
    panic: interface conversion: error is *os.PathError, not *exec.ExitError

goroutine 1 [running]:
github.com/rook/rook/pkg/util/exec.runCommandWithOutput(0x400037e000, 0x0, 0x400019e000, 0x5, 0x5, 0x400037e000)
/home/rook/go/src/github.com/rook/rook/pkg/util/exec/exec.go:268 +0x210
github.com/rook/rook/pkg/util/exec.(*CommandExecutor).ExecuteCommandWithOutput(0x2fa0918, 0x1a2db37, 0x5, 0x400019e000, 0x5, 0x5, 0x400069d518, 0xe2ea8, 0x400069d799, 0x400069d55a)
/home/rook/go/src/github.com/rook/rook/pkg/util/exec/exec.go:126 +0x84
github.com/rook/rook/pkg/util/sys.ListDevices(0x1ecf320, 0x2fa0918, 0x400069d558, 0x47474, 0x19, 0x4000486240, 0x400069d5a8)
/home/rook/go/src/github.com/rook/rook/pkg/util/sys/device.go:118 +0x88
github.com/rook/rook/pkg/clusterd.DiscoverDevices(0x1ecf320, 0x2fa0918, 0x0, 0x2fa0918, 0x0, 0x400069d900, 0x40000aa120)
/home/rook/go/src/github.com/rook/rook/pkg/clusterd/disk.go:53 +0x38
github.com/rook/rook/pkg/daemon/discover.probeDevices(0x40002d3800, 0x40000b6103, 0x1a52a03, 0x19, 0x0, 0x0)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:400 +0x58
github.com/rook/rook/pkg/daemon/discover.updateDeviceCM(0x40002d3800, 0x400069dc00, 0x1)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:326 +0x68
github.com/rook/rook/pkg/daemon/discover.Run(0x40002d3800, 0x34630b8a000, 0x400069dc00, 0x400069dc88, 0x0)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:89 +0x23c
main.startDiscover(0x2f6a400, 0x40006138e0, 0x0, 0x2, 0x0, 0x0)
/home/rook/go/src/github.com/rook/rook/cmd/rook/discover.go:55 +0x68
github.com/spf13/cobra.(*Command).execute(0x2f6a400, 0x4000613880, 0x2, 0x2, 0x2f6a400, 0x4000613880)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826 +0x314
github.com/spf13/cobra.(*Command).ExecuteC(0x2f69500, 0x400069def8, 0xa, 0xa)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x22c
github.com/spf13/cobra.(*Command).Execute(...)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.main()
/home/rook/go/src/github.com/rook/rook/cmd/rook/main.go:34 +0x110

ubuntu@k8s-pi-master:~$ kubectl -n rook-edgefs-system logs pod/rook-discover-2w5ms
2020-07-13 15:26:02.205303 I | rookcmd: starting Rook v1.3.7 with arguments '/usr/local/bin/rook discover --discover-interval 60m'
2020-07-13 15:26:02.205520 I | rookcmd: flag values: --discover-interval=1h0m0s, --help=false, --log-flush-frequency=5s, --log-level=INFO, --operator-image=, --service-account=, --use-ceph-volume=false
2020-07-13 15:26:02.212126 I | rook-discover: updating device configmap
panic: interface conversion: error is *os.PathError, not *exec.ExitError

goroutine 1 [running]:
github.com/rook/rook/pkg/util/exec.runCommandWithOutput(0x40005ba000, 0x0, 0x400068ee10, 0x5, 0x5, 0x40005ba000)
/home/rook/go/src/github.com/rook/rook/pkg/util/exec/exec.go:268 +0x210
github.com/rook/rook/pkg/util/exec.(*CommandExecutor).ExecuteCommandWithOutput(0x2fa0918, 0x1a2db37, 0x5, 0x400068ee10, 0x5, 0x5, 0x40006eb518, 0xe2ea8, 0x40006eb79a, 0x40006eb55b)
/home/rook/go/src/github.com/rook/rook/pkg/util/exec/exec.go:126 +0x84
github.com/rook/rook/pkg/util/sys.ListDevices(0x1ecf320, 0x2fa0918, 0x40006eb558, 0x47474, 0x19, 0x40005e2860, 0x40006eb5a8)
/home/rook/go/src/github.com/rook/rook/pkg/util/sys/device.go:118 +0x88
github.com/rook/rook/pkg/clusterd.DiscoverDevices(0x1ecf320, 0x2fa0918, 0x0, 0x2fa0918, 0x0, 0x40006eb900, 0x40000d80c0)
/home/rook/go/src/github.com/rook/rook/pkg/clusterd/disk.go:53 +0x38
github.com/rook/rook/pkg/daemon/discover.probeDevices(0x400032a200, 0x400063a503, 0x1a52a03, 0x19, 0x0, 0x0)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:400 +0x58
github.com/rook/rook/pkg/daemon/discover.updateDeviceCM(0x400032a200, 0x40006ebc00, 0x1)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:326 +0x68
github.com/rook/rook/pkg/daemon/discover.Run(0x400032a200, 0x34630b8a000, 0x40006ebc00, 0x40006ebc88, 0x0)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:89 +0x23c
main.startDiscover(0x2f6a400, 0x40006ad780, 0x0, 0x2, 0x0, 0x0)
/home/rook/go/src/github.com/rook/rook/cmd/rook/discover.go:55 +0x68
github.com/spf13/cobra.(*Command).execute(0x2f6a400, 0x40006ad6c0, 0x2, 0x2, 0x2f6a400, 0x40006ad6c0)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826 +0x314
github.com/spf13/cobra.(*Command).ExecuteC(0x2f69500, 0x40006ebef8, 0xa, 0xa)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x22c
github.com/spf13/cobra.(*Command).Execute(...)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.main()
/home/rook/go/src/github.com/rook/rook/cmd/rook/main.go:34 +0x110
ubuntu@k8s-pi-master:~$

ubuntu@k8s-pi-master:~$ kubectl -n rook-edgefs-system logs pod/rook-discover-xk222
2020-07-13 15:26:10.023193 I | rookcmd: starting Rook v1.3.7 with arguments '/usr/local/bin/rook discover --discover-interval 60m'
2020-07-13 15:26:10.023409 I | rookcmd: flag values: --discover-interval=1h0m0s, --help=false, --log-flush-frequency=5s, --log-level=INFO, --operator-image=, --service-account=, --use-ceph-volume=false
2020-07-13 15:26:10.028857 I | rook-discover: updating device configmap
panic: interface conversion: error is *os.PathError, not *exec.ExitError

goroutine 1 [running]:
github.com/rook/rook/pkg/util/exec.runCommandWithOutput(0x40003c6000, 0x0, 0x4000114140, 0x5, 0x5, 0x40003c6000)
/home/rook/go/src/github.com/rook/rook/pkg/util/exec/exec.go:268 +0x210
github.com/rook/rook/pkg/util/exec.(*CommandExecutor).ExecuteCommandWithOutput(0x2fa0918, 0x1a2db37, 0x5, 0x4000114140, 0x5, 0x5, 0x4000633518, 0xe2ea8, 0x4000633799, 0x400063355a)
/home/rook/go/src/github.com/rook/rook/pkg/util/exec/exec.go:126 +0x84
github.com/rook/rook/pkg/util/sys.ListDevices(0x1ecf320, 0x2fa0918, 0x4000633558, 0x47474, 0x19, 0x40004324e0, 0x40006335a8)
/home/rook/go/src/github.com/rook/rook/pkg/util/sys/device.go:118 +0x88
github.com/rook/rook/pkg/clusterd.DiscoverDevices(0x1ecf320, 0x2fa0918, 0x0, 0x2fa0918, 0x0, 0x4000633900, 0x40000aa120)
/home/rook/go/src/github.com/rook/rook/pkg/clusterd/disk.go:53 +0x38
github.com/rook/rook/pkg/daemon/discover.probeDevices(0x400023da00, 0x4000194203, 0x1a52a03, 0x19, 0x0, 0x0)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:400 +0x58
github.com/rook/rook/pkg/daemon/discover.updateDeviceCM(0x400023da00, 0x4000633c00, 0x1)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:326 +0x68
github.com/rook/rook/pkg/daemon/discover.Run(0x400023da00, 0x34630b8a000, 0x4000633c00, 0x4000633c88, 0x0)
/home/rook/go/src/github.com/rook/rook/pkg/daemon/discover/discover.go:89 +0x23c
main.startDiscover(0x2f6a400, 0x4000113460, 0x0, 0x2, 0x0, 0x0)
/home/rook/go/src/github.com/rook/rook/cmd/rook/discover.go:55 +0x68
github.com/spf13/cobra.(*Command).execute(0x2f6a400, 0x4000113320, 0x2, 0x2, 0x2f6a400, 0x4000113320)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826 +0x314
github.com/spf13/cobra.(*Command).ExecuteC(0x2f69500, 0x4000633ef8, 0xa, 0xa)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x22c
github.com/spf13/cobra.(*Command).Execute(...)
/home/rook/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.main()
/home/rook/go/src/github.com/rook/rook/cmd/rook/main.go:34 +0x110
ubuntu@k8s-pi-master:~$

`
Environment:

  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="20.04 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04 LTS"
    VERSION_ID="20.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=focal
    UBUNTU_CODENAME=focal
  • Kernel (e.g. uname -a): Linux k8s-pi-master 5.4.0-1013-raspi #13-Ubuntu SMP Mon Jun 15 03:17:37 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
  • Cloud provider or hardware configuration: Raspberry Pi
  • Rook version (use rook version inside of a Rook Pod): 1.3.7
  • Storage backend version (e.g. for ceph do ceph -v): edgefs
  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-07-07T14:11:32Z", GoVersion:"go1.13.12", Compiler:"gc", Platform:"linux/arm64"}
    Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.4-1+629a74c268c116", GitCommit:"629a74c268c116cd47a12f745e6949ec99a71c3a", GitTreeState:"clean", BuildDate:"2020-06-19T20:33:28Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"linux/arm64"}
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): micro-k8s
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): NFS
@babvin babvin added the bug label Jul 13, 2020
@travisn travisn added the EdgeFS label Jul 13, 2020
@travisn travisn changed the title All rook discover pods go in crashloopback /error EdgeFS: All rook discover pods go in crashloopback /error Jul 13, 2020
@dotmaster

dotmaster commented Aug 15, 2020

I patched the code in the operator image to log the real error (basically, I removed the type assertion that causes the panic here).

The original line was:

output = []byte(fmt.Sprintf("%s. %s", string(output), string(err.(*exec.ExitError).Stderr)))

I replaced it with:

output = []byte(fmt.Sprintf("%s. %s", string(output), string(err.Error())))

When I run the patched operator, it shows the real error:

2020-08-15T17:06:41.600800252+01:00 2020-08-15 16:06:41.599604 I | rookcmd: starting Rook v1.3.0-beta.0.797.ga8381c33-dirty with arguments '/usr/local/bin/rook discover --discover-interval 60m'
2020-08-15T17:06:41.601061988+01:00 2020-08-15 16:06:41.600182 I | rookcmd: flag values: --discover-interval=1h0m0s, --help=false, --log-flush-frequency=5s, --log-level=INFO, --operator-image=, --service-account=, --use-ceph-volume=false
2020-08-15T17:06:41.612264865+01:00 2020-08-15 16:06:41.611725 I | rook-discover: updating device configmap
2020-08-15T17:06:41.626544203+01:00 2020-08-15 16:06:41.625954 I | rook-discover: failed to probe devices: failed initial hardware discovery. failed to list all devices: fork/exec /bin/lsblk: exec format error
2020-08-15T17:06:41.626677904+01:00 2020-08-15 16:06:41.626075 I | rook-discover: failed to update device configmap: failed initial hardware discovery. failed to list all devices: fork/exec /bin/lsblk: exec format error
2020-08-15T17:06:41.626742884+01:00 failed initial hardware discovery. failed to list all devices: fork/exec /bin/lsblk: exec format error

I am fairly sure this is because the EdgeFS image is not built for the arm64 architecture.

Unfortunately, the EdgeFS repository no longer seems to be available at https://github.com/Nexenta/edgefs.

Is EdgeFS still supported?

@dotmaster

Addendum: I tried to compile EdgeFS for arm64 from a fork I found on GitHub.
See my work at https://github.com/dotmaster/edgefs if you are interested.
I managed to fix some x86-specific (SSE4.2) dependencies in

  • libisa-h
  • libblake2
  • crc32c.c

but now I am stuck on some inline assembly in libccowutil:
../../include/ccowutil.h:771:2: error: impossible constraint in 'asm' __asm__ __volatile__("lock cmpxchgl %k2,%1"
From what I have read, this code uses the EAX register, which does not exist on arm64.
I don't know enough assembly to port this code to Arm. Perhaps someone can give me a hint.

@sabbot
Member

sabbot commented Oct 6, 2020

Closing since EdgeFS is deprecated as mentioned in #5823

@sabbot sabbot closed this as completed Oct 6, 2020