
Not able to upgrade a PVC-backed cluster from Rook 1.1.2 to either 1.1.4 or 1.1.6 #4299

Closed
paalkr opened this issue Nov 12, 2019 · 13 comments

@paalkr commented Nov 12, 2019

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
The OSD prepare jobs/pods fail to complete.

Expected behavior:
The cluster upgrade succeeds.

How to reproduce it (minimal and precise):

Create a PVC-backed 1.1.2 cluster and try to upgrade to any newer version (a minimal sketch of the upgrade step follows).
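For reference, a rough sketch of what I mean by "upgrade" here: bumping the operator image, assuming the deployment name and namespace from the operator config posted below. This is not the full upgrade procedure from the Rook docs, and the osd-prepare label is the standard Rook one as far as I know.

kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.1.6
# watch the osd prepare pods the operator re-creates (label assumed: app=rook-ceph-osd-prepare)
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare --watch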

Environment:

  • OS (e.g. from /etc/os-release):
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2247.5.0
VERSION_ID=2247.5.0
BUILD_ID=2019-10-14-2340
PRETTY_NAME="Container Linux by CoreOS 2247.5.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
  • Kernel (e.g. uname -a):
    SMP Mon Oct 14 22:56:39 -00 2019 x86_64 AMD EPYC 7571 AuthenticAMD GNU/Linux
  • Cloud provider or hardware configuration:
    AWS
  • Rook version (use rook version inside of a Rook Pod):
    rook: v1.1.2
  • Storage backend version (e.g. for ceph do ceph -v):
    ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

1.1.6 operator config

#################################################################################################################
# The deployment for the rook operator
# Contains the common settings for most Kubernetes deployments.
# For example, to create the rook-ceph cluster:
#   kubectl create -f common.yaml
#   kubectl create -f operator.yaml
#   kubectl create -f cluster.yaml
#
# Also see other operator sample files for variations of operator.yaml:
# - operator-openshift.yaml: Common settings for running in OpenShift
#################################################################################################################
# OLM: BEGIN OPERATOR DEPLOYMENT
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-operator
  namespace: rook-ceph
  labels:
    operator: rook
    storage-backend: ceph
spec:
  selector:
    matchLabels:
      app: rook-ceph-operator
  replicas: 1
  template:
    metadata:
      labels:
        app: rook-ceph-operator
    spec:
      serviceAccountName: rook-ceph-system
      containers:
      - name: rook-ceph-operator
        image: rook/ceph:v1.1.6
        args: ["ceph", "operator"]
        volumeMounts:
        - mountPath: /var/lib/rook
          name: rook-config
        - mountPath: /etc/ceph
          name: default-config-dir
        env:
        # If the operator should only watch for cluster CRDs in the same namespace, set this to "true".
        # If this is not set to true, the operator will watch for cluster CRDs in all namespaces.
        - name: ROOK_CURRENT_NAMESPACE_ONLY
          value: "false"
        # To disable RBAC, uncomment the following:
        # - name: RBAC_ENABLED
        #   value: "false"
        # Rook Agent toleration. Will tolerate all taints with all keys.
        # Choose between NoSchedule, PreferNoSchedule and NoExecute:
        # - name: AGENT_TOLERATION
        #   value: "NoSchedule"
        # (Optional) Rook Agent toleration key. Set this to the key of the taint you want to tolerate
        # - name: AGENT_TOLERATION_KEY
        #   value: "storage"
        # (Optional) Rook Agent tolerations list. Put here list of taints you want to tolerate in YAML format.
        - name: AGENT_TOLERATIONS
          value: |
            - key: kube-aws.coreos.com/role
              operator: Equal
              value: storage
              effect: NoSchedule        
        #   value: |
        #     - effect: NoSchedule
        #       key: node-role.kubernetes.io/controlplane
        #       operator: Exists
        #     - effect: NoExecute
        #       key: node-role.kubernetes.io/etcd
        #       operator: Exists
        # (Optional) Rook Agent NodeAffinity.
        # - name: AGENT_NODE_AFFINITY
        #   value: "role=storage-node; storage=rook,ceph"
        # (Optional) Rook Agent mount security mode. Can be `Any` or `Restricted`.
        # `Any` uses Ceph admin credentials by default/fallback.
        # To use `Restricted`, you must have a Ceph secret in each namespace that storage will be consumed from, and
        # set `mountUser` to the Ceph user and `mountSecret` to the Kubernetes secret name.
        # The `mountSecret` must exist in the namespace in which the storage is consumed.
        # - name: AGENT_MOUNT_SECURITY_MODE
        #   value: "Any"
        # Set the path where the Rook agent can find the flex volumes
        - name: FLEXVOLUME_DIR_PATH
          value: /var/lib/kubelet/volumeplugins
        # Set the path where kernel modules can be found
        # - name: LIB_MODULES_DIR_PATH
        #   value: "<PathToLibModules>"
        # Mount any extra directories into the agent container
        # - name: AGENT_MOUNTS
        #   value: "somemount=/host/path:/container/path,someothermount=/host/path2:/container/path2"
        # Rook Discover toleration. Will tolerate all taints with all keys.
        # Choose between NoSchedule, PreferNoSchedule and NoExecute:
        # - name: DISCOVER_TOLERATION
        #   value: "NoSchedule"
        # (Optional) Rook Discover toleration key. Set this to the key of the taint you want to tolerate
        # - name: DISCOVER_TOLERATION_KEY
        #   value: "<KeyOfTheTaintToTolerate>"
        # (Optional) Rook Discover tolerations list. Put here list of taints you want to tolerate in YAML format.
        - name: DISCOVER_TOLERATIONS
          value: |
            - key: kube-aws.coreos.com/role
              operator: Equal
              value: storage
              effect: NoSchedule
        #    - effect: NoSchedule
        #      key: node-role.kubernetes.io/controlplane
        #      operator: Exists
        #    - effect: NoExecute
        #      key: node-role.kubernetes.io/etcd
        #      operator: Exists
        # (Optional) Discover Agent NodeAffinity.
        # - name: DISCOVER_AGENT_NODE_AFFINITY
          # value: "kube-aws.coreos.com/role=storage"
        # Allow rook to create multiple file systems. Note: This is considered
        # an experimental feature in Ceph as described at
        # http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster
        # which might cause mons to crash as seen in https://github.com/rook/rook/issues/1027
        - name: ROOK_ALLOW_MULTIPLE_FILESYSTEMS
          value: "false"
        # The logging level for the operator: INFO | DEBUG
        - name: ROOK_LOG_LEVEL
          value: "INFO"
        # The interval to check the health of the ceph cluster and update the status in the custom resource.
        - name: ROOK_CEPH_STATUS_CHECK_INTERVAL
          value: "30s"
        # The interval to check if every mon is in the quorum.
        - name: ROOK_MON_HEALTHCHECK_INTERVAL
          value: "30s"
        # The duration to wait before trying to failover or remove/replace the
        # current mon with a new mon (useful for compensating flapping network).
        - name: ROOK_MON_OUT_TIMEOUT
          value: "300s"
        # The duration between discovering devices in the rook-discover daemonset.
        - name: ROOK_DISCOVER_DEVICES_INTERVAL
          value: "4m"
        # Whether to start pods as privileged that mount a host path, which includes the Ceph mon and osd pods.
        # This is necessary to workaround the anyuid issues when running on OpenShift.
        # For more details see https://github.com/rook/rook/issues/1314#issuecomment-355799641
        - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
          value: "false"
        # In some situations SELinux relabelling breaks (times out) on large filesystems, and doesn't work with cephfs ReadWriteMany volumes (last relabel wins).
        # Disable it here if you have similar issues.
        # For more details see https://github.com/rook/rook/issues/2417
        - name: ROOK_ENABLE_SELINUX_RELABELING
          value: "true"
        # In large volumes it will take some time to chown all the files. Disable it here if you have performance issues.
        # For more details see https://github.com/rook/rook/issues/2254
        - name: ROOK_ENABLE_FSGROUP
          value: "true"
        # Disable automatic orchestration when new devices are discovered
        - name: ROOK_DISABLE_DEVICE_HOTPLUG
          value: "false"

        # Whether to enable the flex driver. By default it is enabled and is fully supported, but will be deprecated in some future release
        # in favor of the CSI driver.
        - name: ROOK_ENABLE_FLEX_DRIVER
          value: "true"

        # Whether to start the discovery daemon to watch for raw storage devices on nodes in the cluster.
        # This daemon does not need to run if you are only going to create your OSDs based on StorageClassDeviceSets with PVCs.
        - name: ROOK_ENABLE_DISCOVERY_DAEMON
          value: "false"

        # Enable the default version of the CSI driver. To start another version of the CSI driver, see image properties below.
        - name: ROOK_CSI_ENABLE_CEPHFS
          value: "true"
        - name: ROOK_CSI_ENABLE_RBD
          value: "false"
        - name: ROOK_CSI_ENABLE_GRPC_METRICS
          value: "true"
        # The default version of CSI supported by Rook will be started. To change the version
        # of the CSI driver to something other than what is officially supported, change
        # these images to the desired release of the CSI driver.
        #- name: ROOK_CSI_CEPH_IMAGE
        #  value: "quay.io/cephcsi/cephcsi:canary"
        #- name: ROOK_CSI_REGISTRAR_IMAGE
        #  value: "quay.io/k8scsi/csi-node-driver-registrar:v1.1.0"
        #- name: ROOK_CSI_PROVISIONER_IMAGE
        #  value: "quay.io/k8scsi/csi-provisioner:v1.3.0"
        #- name: ROOK_CSI_SNAPSHOTTER_IMAGE
        #  value: "quay.io/k8scsi/csi-snapshotter:v1.2.0"
        #- name: ROOK_CSI_ATTACHER_IMAGE
        #  value: "quay.io/k8scsi/csi-attacher:v1.2.0"

        # The name of the node to pass with the downward API
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        # The pod name to pass with the downward API
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        # The pod namespace to pass with the downward API
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: rook-config
        emptyDir: {}
      - name: default-config-dir
        emptyDir: {}
# OLM: END OPERATOR DEPLOYMENT

1.1.6 cluster config

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    mon_pg_warn_min_per_osd = 10
---
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook/cluster
  mon:
    count: 3
    allowMultiplePerNode: false
    volumeClaimTemplate:
      spec:
        storageClassName: ebs-gp2
        resources:
          requests:
            storage: 10Gi
  mgr:
    modules:
    - name: pg_autoscaler
      enabled: true           
  placement:
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kube-aws.coreos.com/role
              operator: In
              values:
              - storage 
      # podAntiAffinity:
        # requiredDuringSchedulingIgnoredDuringExecution:
        # - labelSelector:
            # matchExpressions:
            # - key: app
              # operator: In
              # values:
              # - rook-ceph-mon
          # topologyKey: failure-domain.beta.kubernetes.io/zone              
      tolerations:
        - key: kube-aws.coreos.com/role
          operator: Equal
          value: storage
          effect: NoSchedule 
    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kube-aws.coreos.com/role
              operator: In
              values:
              - storage 
      tolerations:
        - key: kube-aws.coreos.com/role
          operator: Equal
          value: storage
          effect: NoSchedule               
  cephVersion:
    image: ceph/ceph:v14.2.4-20190917
    allowUnsupported: false
  dashboard:
    enabled: true
    urlPrefix: /ceph-dashboard
    port: 8443
    ssl: false
  monitoring:
    # requires Prometheus to be pre-installed
    enabled: true
    # namespace to deploy prometheusRule in. If empty, namespace of the cluster will be used.
    # Recommended:
    # If you have a single rook-ceph cluster, set the rulesNamespace to the same namespace as the cluster or keep it empty.
    # If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
    # deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
    rulesNamespace: rook-ceph    
  network:
    hostNetwork: false
  storage:
    topologyAware: true
    storageClassDeviceSets:
    - name: zone-a
      count: 1
      portable: true
      resources:
        limits:
          cpu: "1000m"
          memory: "4Gi"
        requests:
          cpu: "500m"
          memory: "4Gi"
      placement:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kube-aws.coreos.com/zone
                operator: In
                values:
                - a 
              - key: kube-aws.coreos.com/role
                operator: In
                values:
                - storage   
        tolerations:
        - key: kube-aws.coreos.com/role
          operator: Equal
          value: storage
          effect: NoSchedule                
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 250Gi
          storageClassName: ebs-gp2
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
    - name: zone-b
      count: 1
      portable: true
      resources:
        limits:
          cpu: "1000m"
          memory: "4Gi"
        requests:
          cpu: "500m"
          memory: "4Gi"
      placement:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kube-aws.coreos.com/zone
                operator: In
                values:
                - b
              - key: kube-aws.coreos.com/role
                operator: In
                values:
                - storage  
        tolerations:
        - key: kube-aws.coreos.com/role
          operator: Equal
          value: storage
          effect: NoSchedule                
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 250Gi
          storageClassName: ebs-gp2
          volumeMode: Block
          accessModes:
            - ReadWriteOnce 
    - name: zone-c
      count: 1
      portable: true
      resources:
        limits:
          cpu: "1000m"
          memory: "4Gi"
        requests:
          cpu: "500m"
          memory: "4Gi"
      placement:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kube-aws.coreos.com/zone
                operator: In
                values:
                - c
              - key: kube-aws.coreos.com/role
                operator: In
                values:
                - storage 
        tolerations:
        - key: kube-aws.coreos.com/role
          operator: Equal
          value: storage
          effect: NoSchedule
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 250Gi
          storageClassName: ebs-gp2
          volumeMode: Block
          accessModes:
            - ReadWriteOnce            

Error snippet from the operator log

2019-11-12 13:48:33.819163 I | op-osd: start running osds in namespace rook-ceph
2019-11-12 13:48:33.819172 I | op-osd: start provisioning the osds on pvcs, if needed
2019-11-12 13:48:33.822946 I | op-osd: successfully provisioned osd for storageClassDeviceSet zone-a of set 0
2019-11-12 13:48:33.826742 I | op-osd: successfully provisioned osd for storageClassDeviceSet zone-b of set 0
2019-11-12 13:48:33.830342 I | op-osd: successfully provisioned osd for storageClassDeviceSet zone-c of set 0
2019-11-12 13:48:33.852783 I | op-osd: osd provision job started for node zone-a-0-data-b9c8m
2019-11-12 13:48:33.880883 I | op-osd: osd provision job started for node zone-b-0-data-hh445
2019-11-12 13:48:33.904001 I | op-osd: osd provision job started for node zone-c-0-data-8rqc2
2019-11-12 13:48:33.904024 I | op-osd: start osds after provisioning is completed, if needed
2019-11-12 13:48:33.955298 I | op-osd: osd orchestration status for node zone-a-0-data-b9c8m is starting
2019-11-12 13:48:33.955358 I | op-osd: osd orchestration status for node zone-b-0-data-hh445 is starting
2019-11-12 13:48:33.955378 I | op-osd: osd orchestration status for node zone-c-0-data-8rqc2 is starting
2019-11-12 13:48:33.955425 I | op-osd: 0/3 node(s) completed osd provisioning, resource version 62024953
2019-11-12 13:48:37.939390 I | op-osd: osd orchestration status for node zone-c-0-data-8rqc2 is computingDiff
2019-11-12 13:48:37.958833 I | op-osd: osd orchestration status for node zone-c-0-data-8rqc2 is orchestrating
2019-11-12 13:48:37.998817 I | op-osd: osd orchestration status for node zone-c-0-data-8rqc2 is failed
2019-11-12 13:48:37.998871 E | op-osd: orchestration for node zone-c-0-data-8rqc2 failed: &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices. failed to get logical volume path. no logical volume path found for device "/mnt/zone-c-0-data-8rqc2"}
2019-11-12 13:48:38.193531 I | op-osd: osd orchestration status for node zone-a-0-data-b9c8m is computingDiff
2019-11-12 13:48:38.209179 I | op-osd: osd orchestration status for node zone-b-0-data-hh445 is computingDiff
2019-11-12 13:48:38.216082 I | op-osd: osd orchestration status for node zone-a-0-data-b9c8m is orchestrating
2019-11-12 13:48:38.254082 I | op-osd: osd orchestration status for node zone-a-0-data-b9c8m is failed
2019-11-12 13:48:38.254120 E | op-osd: orchestration for node zone-a-0-data-b9c8m failed: &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices. failed to get logical volume path. no logical volume path found for device "/mnt/zone-a-0-data-b9c8m"}
2019-11-12 13:48:38.262982 I | op-osd: osd orchestration status for node zone-b-0-data-hh445 is orchestrating
2019-11-12 13:48:38.357399 I | op-osd: osd orchestration status for node zone-b-0-data-hh445 is failed
2019-11-12 13:48:38.357608 E | op-osd: orchestration for node zone-b-0-data-hh445 failed: &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices. failed to get logical volume path. no logical volume path found for device "/mnt/zone-b-0-data-hh445"}
2019-11-12 13:48:38.357636 I | op-osd: 3/3 node(s) completed osd provisioning
2019-11-12 13:48:38.357717 I | op-osd: start provisioning the osds on nodes, if needed
2019-11-12 13:48:38.365121 I | op-osd: 0 of the 0 storage nodes are valid
2019-11-12 13:48:38.365177 W | op-osd: no valid nodes available to run an osd in namespace rook-ceph. Rook will not create any new OSD nodes and will skip checking for removed nodes since removing all OSD nodes without destroying the Rook cluster is unlikely to be intentional
2019-11-12 13:48:38.365209 E | op-cluster: failed to create cluster in namespace rook-ceph. failed to start the osds. 3 failures encountered while running osds in namespace rook-ceph: orchestration for node zone-c-0-data-8rqc2 failed: &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices. failed to get logical volume path. no logical volume path found for device "/mnt/zone-c-0-data-8rqc2"}
orchestration for node zone-a-0-data-b9c8m failed: &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices. failed to get logical volume path. no logical volume path found for device "/mnt/zone-a-0-data-b9c8m"}
orchestration for node zone-b-0-data-hh445 failed: &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices. failed to get logical volume path. no logical volume path found for device "/mnt/zone-b-0-data-hh445"}

Error snippet from a rook-ceph-osd-prepare pod

2019-11-12 13:49:19.059360 I | cephcmd: desired devices to configure osds: [{Name:/mnt/zone-b-0-data-hh445 OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false}]
2019-11-12 13:49:19.070724 I | rookcmd: starting Rook v1.1.6 with arguments '/rook/rook ceph osd provision'
2019-11-12 13:49:19.070851 I | rookcmd: flag values: --cluster-id=2f7be241-e0fd-11e9-9eae-065f969702c8, --data-device-filter=, --data-devices=/mnt/zone-b-0-data-hh445, --data-directories=, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-flush-frequency=5s, --log-level=INFO, --metadata-device=, --node-name=zone-b-0-data-hh445, --operator-image=, --osd-database-size=0, --osd-journal-size=5120, --osd-store=, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=true, --service-account=, --topology-aware=true
2019-11-12 13:49:19.070972 I | op-mon: parsing mon endpoints: b=10.96.176.95:6789,c=10.96.10.84:6789,a=10.96.54.209:6789
2019-11-12 13:49:19.090145 I | cephcmd: CRUSH location=root=default host=zone-b-0-data-hh445 zone=eu-west-1b region=eu-west-1
2019-11-12 13:49:19.090169 I | cephcmd: crush location of osd: root=default host=zone-b-0-data-hh445 zone=eu-west-1b region=eu-west-1
2019-11-12 13:49:19.103712 I | cephconfig: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2019-11-12 13:49:19.103839 I | cephconfig: generated admin config in /var/lib/rook/rook-ceph
2019-11-12 13:49:19.103950 I | cephosd: discovering hardware
2019-11-12 13:49:19.103964 I | exec: Running command: lsblk /mnt/zone-b-0-data-hh445 --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2019-11-12 13:49:19.107161 I | exec: Running command: sgdisk --print /mnt/zone-b-0-data-hh445
2019-11-12 13:49:19.109849 I | cephosd: creating and starting the osds
2019-11-12 13:49:19.109878 I | exec: Running command: lsblk /mnt/zone-b-0-data-hh445 --bytes --pairs --output NAME,SIZE,TYPE,PKNAME
2019-11-12 13:49:19.112156 I | sys: Output: NAME="nvme1n1" SIZE="268435456000" TYPE="disk" PKNAME=""
2019-11-12 13:49:19.112191 I | exec: Running command: lsblk /mnt/zone-b-0-data-hh445 --bytes --nodeps --noheadings --output FSTYPE
2019-11-12 13:49:19.114390 I | cephosd: skipping device /mnt/zone-b-0-data-hh445 that is in use (not by rook). fs: LVM2_member, ownPartitions: true
2019-11-12 13:49:19.123712 I | cephosd: configuring osd devices: {"Entries":{}}
2019-11-12 13:49:19.123742 I | cephosd: no more devices to configure
2019-11-12 13:49:19.123753 I | exec: Running command: pvdisplay -C -o lvpath --noheadings /mnt/zone-b-0-data-hh445
failed to configure devices. failed to get logical volume path. no logical volume path found for device "/mnt/zone-b-0-data-hh445"
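For anyone hitting the same failure, a hedged diagnostic sketch, run from the osd-prepare container (or any pod that has the LVM tools and the block PVC mapped at the same /mnt path). The last command is the exact lookup Rook runs per the log above; the others just confirm that ceph-volume's PV/VG/LV are actually visible.

lsblk /mnt/zone-b-0-data-hh445 --bytes --nodeps --noheadings --output FSTYPE   # reports LVM2_member, so a PV exists on the device
pvs --noheadings -o pv_name,vg_name /mnt/zone-b-0-data-hh445                   # volume group that ceph-volume created on the PV
lvs --noheadings -o lv_path,lv_name                                            # the ceph data LV should appear here
pvdisplay -C -o lvpath --noheadings /mnt/zone-b-0-data-hh445                   # the lookup that returns nothing in the failure above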
@paalkr added the bug label Nov 12, 2019
@sp98 (Contributor) commented Nov 12, 2019

I believe this issue was fixed in #4277

@paalkr (Author) commented Nov 12, 2019

I tried to get help regarding this issue on Slack; here is the thread:
https://app.slack.com/client/T47C56S7Q/C46Q5UC05/thread/C46Q5UC05-1573548852.186400

@paalkr (Author) commented Nov 12, 2019

@sp98 thx for the info, but it looks like the problem still persists.

@paalkr (Author) commented Nov 12, 2019

@sp98, that fix is not included in 1.1.4 or 1.1.6, right? So if I try master, the upgrade should work? Is it generally safe to run Rook from master in production?

@sp98 (Contributor) commented Nov 12, 2019

@paalkr I can see that the fix was included in the Rook v1.1.6 patch release (https://github.com/rook/rook/releases). It's not advisable to run Rook from master in production.

@paalkr (Author) commented Nov 12, 2019

@sp98, upgrading to 1.1.6 still does not work. All logs provided in the OP are from upgrading to 1.1.6, so the issue is still present.
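For completeness, a quick way to double-check which operator image is actually running before re-testing (the prepare-pod label is the standard Rook one and is an assumption on my part):

kubectl -n rook-ceph get deploy rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare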

@sp98 (Contributor) commented Nov 12, 2019

@paalkr The PR is still not in 1.1.6. Sorry for the confusion; I was looking at a different PR.

@travisn (Member) commented Nov 12, 2019

@paalkr The fix is included in the latest release build, but it just isn't officially released yet. Upgrades are expected to work with these interim release builds as well. The tag is rook/ceph:v1.1.6-14.ge69e952 if you want to test it. The v1.1.7 release may not be officially out for another week or two.
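A sketch of switching to that interim tag, reusing the deployment from the operator.yaml posted above (alternatively, edit the image: field there and re-apply):

# bump only the operator image; the operator then reconciles the rest of the cluster
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.1.6-14.ge69e952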

@paalkr (Author) commented Nov 12, 2019

@travisn, thanks. I can test rook/ceph:v1.1.6-14.ge69e952 and report back. Will it be possible to upgrade from rook/ceph:v1.1.6-14.ge69e952 to 1.1.7?

@sp98, I appreciate your help, thanks!

@paalkr (Author) commented Nov 12, 2019

Upgrading to rook/ceph:v1.1.6-14.ge69e952 worked!
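For anyone following along, a rough way I verified the result (assuming the standard Rook labels):

kubectl -n rook-ceph get jobs -l app=rook-ceph-osd-prepare   # prepare jobs now show as completed
kubectl -n rook-ceph get pods -l app=rook-ceph-osd           # one OSD pod per storageClassDeviceSet entry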

@travisn (Member) commented Nov 12, 2019

@paalkr Great to hear it's working for you. Yes, it's expected that it will upgrade to v1.1.7.

@travisn (Member) commented Nov 22, 2019

Fixed in v1.1.7

@travisn closed this as completed Nov 22, 2019
@paalkr (Author) commented Nov 22, 2019

I can confirm that it works in 1.1.7. Thx!
