Invalid Ceph OSD Configuration for IPv6-only clusters #6266

Closed

lyind opened this issue Sep 16, 2020 · 1 comment

lyind commented Sep 16, 2020

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

OSDs never come up because they try to bind to and send traffic over IPv4 when no IPv4 addresses are available. There is no error message in any log.

Expected behavior:

rook-ceph-operator correctly configures OSDs on IPv4-only, IPv6-only, and IPv4/IPv6 dual-stack clusters.

How to reproduce it (minimal and precise):

  1. Set up an IPv6-only cluster and install rook-ceph via kubernetes/operator.yaml and kubernetes/common.yaml.
  2. Monitor OSD state using the ceph status command in the rook-ceph-toolbox pod (see the sketch after this list).
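
For reference, OSD state can be checked from the toolbox along these lines (a minimal sketch; the rook-ceph namespace and the rook-ceph-tools deployment name are the defaults from the example manifests and may differ in your setup):

# Overall cluster state, including how many OSDs are up/in
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
# Only the OSD up/in counts
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd stat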

File(s) to submit:

  • Log from an affected OSD:
...
debug 2020-09-15T13:17:36.154+0000 7f0b9eeae700  1 osd.0 138 tick checking mon for new map
debug 2020-09-15T13:18:07.039+0000 7f0b9eeae700  1 osd.0 138 tick checking mon for new map
debug 2020-09-15T13:18:37.220+0000 7f0b9eeae700  1 osd.0 138 tick checking mon for new map
...<continues forever, OSD never comes up>...

Environment:

  • OS: Debian Bullseye
  • Kernel: Linux 5.8.7
  • Cloud provider or hardware configuration: bare-metal
  • Rook version: 1.4.4
  • Storage backend version: ceph v15 (15.2.4-0)
  • Kubernetes version: 1.19.1
  • Kubernetes cluster type: kubeadm (custom automation)
  • Storage backend status:
  cluster:
    id:     5a7cf85c-cd1d-4472-9340-78f3759a2151
    health: HEALTH_WARN
            32 osds down
            8 hosts (32 osds) down
            1 root (32 osds) down
            Reduced data availability: 145 pgs inactive

  services:
    mon: 3 daemons, quorum a,b,c (age 71m)
    mgr: a(active, since 71m)
    osd: 32 osds: 0 up, 32 in (since 20s)

  data:
    pools:   10 pools, 145 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             145 unknown

Workaround

  1. Edit the rook-config-override ConfigMap (using kubectl edit configmaps/rook-config-override -n rook-ceph) and add the following (exclusively for IPv6-only clusters):
apiVersion: v1
data:
  config: |
    [global]
    ms_bind_ipv4 = false
    ms_bind_ipv6 = true
  2. Force-restart all OSDs by deleting their pods using kubectl -n rook-ceph delete pod -l app=rook-ceph-osd
  3. Check whether the OSDs come up by repeatedly executing the ceph status command in the rook-ceph-toolbox pod (a scripted sketch of all three steps follows below).

The workaround is also documented here: Proxmox Wiki Ceph Docs
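
For convenience, the workaround can also be applied non-interactively, roughly as follows (a sketch only; it assumes the default rook-ceph namespace, the rook-config-override ConfigMap created by the operator, and the standard rook-ceph-tools toolbox deployment):

# 1. Merge the IPv6-only messenger options into the config override
kubectl -n rook-ceph patch configmap rook-config-override --type merge \
  -p '{"data":{"config":"[global]\nms_bind_ipv4 = false\nms_bind_ipv6 = true\n"}}'
# 2. Restart all OSD pods so they pick up the new ceph.conf
kubectl -n rook-ceph delete pod -l app=rook-ceph-osd
# 3. Watch the OSDs come up and verify they registered IPv6 addresses
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd dump | grep 'osd\.'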

lyind added the bug label Sep 16, 2020

lyind commented Sep 16, 2020

Duplicate of 3850. Closing.

lyind closed this as completed Sep 16, 2020