Invalid Ceph OSD Configuration for IPv6-only clusters #6266

Closed

lyind opened this issue Sep 16, 2020 · 1 comment

lyind commented Sep 16, 2020

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

OSDs never come up because they try to bind to and send traffic over IPv4 when no IPv4 addresses are available. There is no error message in any log.

Expected behavior:

rook-ceph-operator correctly configures OSDs on IPv4-only, IPv6-only, and IPv4/IPv6 dual-stack clusters.

How to reproduce it (minimal and precise):

  1. Set up an IPv6-only cluster and install rook-ceph via kubernetes/operator.yaml and kubernetes/common.yaml.
  2. Monitor OSD state using the ceph status command in the rook-ceph-toolbox pod (see the sketch after this list).
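
For reference, OSD state can be checked from the toolbox along these lines (a minimal sketch; the rook-ceph namespace and the rook-ceph-tools deployment name are the defaults from the example manifests and may differ in your setup):

# Overall cluster state, including how many OSDs are up/in
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
# Only the OSD up/in counts
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd stat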

File(s) to submit:

  • Log from an affected OSD:
...
debug 2020-09-15T13:17:36.154+0000 7f0b9eeae700  1 osd.0 138 tick checking mon for new map
debug 2020-09-15T13:18:07.039+0000 7f0b9eeae700  1 osd.0 138 tick checking mon for new map
debug 2020-09-15T13:18:37.220+0000 7f0b9eeae700  1 osd.0 138 tick checking mon for new map
...<continues forever, OSD never comes up>...

Environment:

  • OS: Debian Bullseye
  • Kernel: Linux 5.8.7
  • Cloud provider or hardware configuration: bare-metal
  • Rook version: 1.4.4
  • Storage backend version: ceph v15 (15.2.4-0)
  • Kubernetes version: 1.19.1
  • Kubernetes cluster type: kubeadm (custom automation)
  • Storage backend status:
  cluster:
    id:     5a7cf85c-cd1d-4472-9340-78f3759a2151
    health: HEALTH_WARN
            32 osds down
            8 hosts (32 osds) down
            1 root (32 osds) down
            Reduced data availability: 145 pgs inactive

  services:
    mon: 3 daemons, quorum a,b,c (age 71m)
    mgr: a(active, since 71m)
    osd: 32 osds: 0 up, 32 in (since 20s)

  data:
    pools:   10 pools, 145 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             145 unknown

Workaround

  1. Edit the rook-config-override ConfigMap (using kubectl edit configmaps/rook-config-override -n rook-ceph) and add the following (exclusively for IPv6-only clusters):
apiVersion: v1
data:
  config: |
    [global]
    ms_bind_ipv4 = false
    ms_bind_ipv6 = true
  2. Force-restart all OSDs by deleting their pods using kubectl -n rook-ceph delete pod -l app=rook-ceph-osd
  3. Check whether the OSDs come up by repeatedly executing the ceph status command in the rook-ceph-toolbox pod (a scripted sketch of all three steps follows below).

The workaround is also documented here: Proxmox Wiki Ceph Docs
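
For convenience, the workaround can also be applied non-interactively, roughly as follows (a sketch only; it assumes the default rook-ceph namespace, the rook-config-override ConfigMap created by the operator, and the standard rook-ceph-tools toolbox deployment):

# 1. Merge the IPv6-only messenger options into the config override
kubectl -n rook-ceph patch configmap rook-config-override --type merge \
  -p '{"data":{"config":"[global]\nms_bind_ipv4 = false\nms_bind_ipv6 = true\n"}}'
# 2. Restart all OSD pods so they pick up the new ceph.conf
kubectl -n rook-ceph delete pod -l app=rook-ceph-osd
# 3. Watch the OSDs come up and verify they registered IPv6 addresses
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd dump | grep 'osd\.'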

lyind added the bug label Sep 16, 2020

lyind commented Sep 16, 2020

Duplicate of 3850. Closing.

lyind closed this as completed Sep 16, 2020