Describe the bug: A migration failure resulted in data corruption.
Expected behaviour: Even if the migration failed, the pool should have been renamed and the data should not have been corrupted.
Steps to reproduce the bug:
Created an SPC in 1.7.0, then upgraded it to 2.4.0.
mayadata:upgrade$ kubectl get spc,csp
NAME AGE
storagepoolclaim.openebs.io/cstor-pool 82m
NAME ALLOCATED FREE CAPACITY STATUS READONLY TYPE AGE
cstorpool.openebs.io/cstor-pool-3w1a 334K 39.7G 39.8G Healthy false striped 82m
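For reference, the SPC was a plain sparse striped pool claim; a minimal sketch of what was applied (field values are assumed for illustration, not the exact manifest used):
kubectl apply -f - <<'EOF'   # hedged sketch of the SPC, assuming a single sparse striped pool
apiVersion: openebs.io/v1alpha1
kind: StoragePoolClaim
metadata:
  name: cstor-pool
spec:
  name: cstor-pool
  type: sparse
  maxPools: 1
  poolSpec:
    poolType: striped
EOF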
Started the migration (a sketch of the migrate Job is shown after the output below) and made it fail just after the CSPI came online.
mayadata:upgrade$ k get cspc,cspi
NAME HEALTHYINSTANCES PROVISIONEDINSTANCES DESIREDINSTANCES AGE
cstorpoolcluster.cstor.openebs.io/cstor-pool 1 1 1 40m
NAME HOSTNAME FREE CAPACITY READONLY PROVISIONEDREPLICAS HEALTHYREPLICAS STATUS AGE
cstorpoolinstance.cstor.openebs.io/cstor-pool-972g 127.0.0.1 38500M 38500378k false 1 0 ONLINE 40m
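The SPC-to-CSPC migration was triggered with a migrate Job along the lines of the sketch below (image name, tag, service account and args are assumptions based on the 2.4.0 migration flow, not the exact Job used); the failure was then injected manually right after the CSPI turned ONLINE:
kubectl apply -f - <<'EOF'   # hedged sketch of the migration Job; adjust names to your cluster
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-cstor-pool
  namespace: openebs
spec:
  backoffLimit: 4
  template:
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
      - name: migrate
        image: openebs/migrate:2.4.0
        args:
        - "cstor-spc"
        - "--spc-name=cstor-pool"
        - "--v=4"
      restartPolicy: OnFailure
EOF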
Checked the zpool status in the CSPI pool pod:
mayadata:upgrade$ kubectl -n openebs exec -it cstor-pool-972g-7f4cfdd794-z598d -c cstor-pool-mgmt -- bash
root@cstor-pool-972g-7f4cfdd794-z598d:/# zpool status
pool: cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774 ONLINE 0 0 0
/var/openebs/sparse/3-ndm-sparse.img ONLINE 0 0 0
/var/openebs/sparse/0-ndm-sparse.img ONLINE 0 0 0
/var/openebs/sparse/1-ndm-sparse.img ONLINE 0 0 0
/var/openebs/sparse/2-ndm-sparse.img ONLINE 0 0 0
errors: No known data errors
Then scaled the old CSP deployment back up, as sketched below, and checked the pool status (unexpected behaviour).
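The scale-up itself was just a kubectl scale (deployment name inferred from the CSP pod name below; yours will differ):
kubectl -n openebs scale deployment cstor-pool-3w1a --replicas=1   # bring the old CSP pool pod back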
mayadata:openebs$ kubectl -n openebs exec -it cstor-pool-3w1a-56695f78b7-x957h -c cstor-pool-mgmt -- bash
root@cstor-pool-3w1a-56695f78b7-x957h:/# zpool status
pool: cstor-76aad699-4e5f-4bd5-9a1b-16008d0d5c54
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
cstor-76aad699-4e5f-4bd5-9a1b-16008d0d5c54 ONLINE 0 0 0
/var/openebs/sparse/3-ndm-sparse.img ONLINE 0 0 0
/var/openebs/sparse/0-ndm-sparse.img ONLINE 0 0 0
/var/openebs/sparse/1-ndm-sparse.img ONLINE 0 0 0
/var/openebs/sparse/2-ndm-sparse.img ONLINE 0 0 0
errors: No known data errors
Was still able to write data from the application pod.
Restarted the CSPI pod (see the delete command sketched below); the pool was imported again, but zpool status now reports an error.
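The restart was a plain pod delete so the Deployment recreates it (pod name taken from the earlier exec; adjust to your cluster):
kubectl -n openebs delete pod cstor-pool-972g-7f4cfdd794-z598d   # Deployment spawns a fresh pool-manager pod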
mayadata:migrate$ k logs -f cstor-pool-972g-7f4cfdd794-2g8l2 -c cstor-pool-mgmt
+ rm /usr/local/bin/zrepl
+ pool_manager_pid=7
+ /usr/local/bin/pool-manager start
+ trap _sigint INT
+ trap _sigterm SIGTERM
+ wait 7
E0112 10:35:27.740140 7 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 10:35:27.740345 7 pool.go:123] Waiting for pool container to start...
E0112 10:35:30.751974 7 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 10:35:30.752010 7 pool.go:123] Waiting for pool container to start...
E0112 10:35:33.755995 7 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 10:35:33.756057 7 pool.go:123] Waiting for pool container to start...
E0112 10:35:36.770793 7 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 10:35:36.770879 7 pool.go:123] Waiting for pool container to start...
E0112 10:35:39.783352 7 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 10:35:39.783374 7 pool.go:123] Waiting for pool container to start...
E0112 10:35:42.787035 7 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 10:35:42.787113 7 pool.go:123] Waiting for pool container to start...
E0112 10:35:45.797694 7 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 10:35:45.797771 7 pool.go:123] Waiting for pool container to start...
I0112 10:35:45.809247 7 controller.go:109] Setting up event handlers for CSPI
I0112 10:35:45.809704 7 controller.go:115] will set up informer event handlers for cvr
I0112 10:35:45.810120 7 new_restore_controller.go:105] Setting up event handlers for restore
I0112 10:35:45.886391 7 controller.go:110] Setting up event handlers for backup
I0112 10:35:45.893357 7 runner.go:38] Starting CStorPoolInstance controller
I0112 10:35:45.893409 7 runner.go:41] Waiting for informer caches to sync
I0112 10:35:45.909280 7 common.go:262] CStorPool found: [cannot open 'name': no such pool ]
I0112 10:35:45.909483 7 run_restore_controller.go:38] Starting CStorRestore controller
I0112 10:35:45.909525 7 run_restore_controller.go:41] Waiting for informer caches to sync
I0112 10:35:45.909556 7 run_restore_controller.go:53] Started CStorRestore workers
I0112 10:35:45.909674 7 runner.go:39] Starting CStorVolumeReplica controller
I0112 10:35:45.909706 7 runner.go:42] Waiting for informer caches to sync
I0112 10:35:45.909727 7 runner.go:47] Starting CStorVolumeReplica workers
I0112 10:35:45.909749 7 runner.go:54] Started CStorVolumeReplica workers
I0112 10:35:45.909893 7 runner.go:38] Starting CStorBackup controller
I0112 10:35:45.909926 7 runner.go:41] Waiting for informer caches to sync
I0112 10:35:45.993629 7 runner.go:45] Starting CStorPoolInstance workers
I0112 10:35:45.993667 7 runner.go:51] Started CStorPoolInstance workers
I0112 10:35:46.010362 7 runner.go:53] Started CStorBackup workers
I0112 10:35:46.017415 7 import.go:73] Importing pool 764d0038-cb8d-4b34-8ef8-5fb1efa80081 cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774
I0112 10:35:51.166697 7 event.go:281] Event(v1.ObjectReference{Kind:"CStorPoolInstance", Namespace:"openebs", Name:"cstor-pool-972g", UID:"764d0038-cb8d-4b34-8ef8-5fb1efa80081", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"9230", FieldPath:""}): type: 'Normal' reason: 'Pool Imported' Pool Import successful: cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774
^C
mayadata:migrate$ k exec -it cstor-pool-972g-7f4cfdd794-2g8l2 -c cstor-pool-mgmt -- bash
root@cstor-pool-972g-7f4cfdd794-2g8l2:/# zpool status
pool: cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: none requested
config:
NAME STATE READ WRITE CKSUM
cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774 ONLINE 0 0 7
/var/openebs/sparse/3-ndm-sparse.img ONLINE 0 0 14
/var/openebs/sparse/0-ndm-sparse.img ONLINE 0 0 0
/var/openebs/sparse/1-ndm-sparse.img ONLINE 0 0 0
/var/openebs/sparse/2-ndm-sparse.img ONLINE 0 0 0
errors: 1 data errors, use '-v' for a list
Still able to write data from the application pod.
Restarted the CSP pool pod; the import failed (expected behaviour).
mayadata:upgrade$ k logs -f cstor-pool-3w1a-56695f78b7-nb2zp -c cstor-pool-mgmt
+ rm /usr/local/bin/zrepl
+ exec /usr/local/bin/cstor-pool-mgmt start
E0112 11:09:32.888080 7 pool.go:501] zpool status returned error in zrepl startup : exit status 1
I0112 11:09:32.888334 7 pool.go:502] Waiting for zpool replication container to start...
E0112 11:09:35.896036 7 pool.go:501] zpool status returned error in zrepl startup : exit status 1
I0112 11:09:35.896298 7 pool.go:502] Waiting for zpool replication container to start...
E0112 11:09:38.903751 7 pool.go:501] zpool status returned error in zrepl startup : exit status 1
I0112 11:09:38.903805 7 pool.go:502] Waiting for zpool replication container to start...
E0112 11:09:41.912888 7 pool.go:501] zpool status returned error in zrepl startup : exit status 1
I0112 11:09:41.912968 7 pool.go:502] Waiting for zpool replication container to start...
E0112 11:09:44.920051 7 pool.go:501] zpool status returned error in zrepl startup : exit status 1
I0112 11:09:44.920155 7 pool.go:502] Waiting for zpool replication container to start...
E0112 11:09:47.928038 7 pool.go:501] zpool status returned error in zrepl startup : exit status 1
I0112 11:09:47.928138 7 pool.go:502] Waiting for zpool replication container to start...
I0112 11:09:47.983445 7 common.go:218] CStorPool CRD found
I0112 11:09:47.987162 7 common.go:236] CStorVolumeReplica CRD found
I0112 11:09:47.987794 7 new_pool_controller.go:103] Setting up event handlers
I0112 11:09:47.988014 7 new_replica_controller.go:118] will set up informer event handlers for cvr
I0112 11:09:47.988181 7 new_backup_controller.go:104] Setting up event handlers for backup
I0112 11:09:47.990730 7 new_restore_controller.go:103] Setting up event handlers for restore
I0112 11:09:47.993062 7 run_pool_controller.go:43] Starting CStorPool controller
I0112 11:09:47.993095 7 run_pool_controller.go:46] Waiting for informer caches to sync
I0112 11:09:47.996167 7 new_pool_controller.go:125] cStorPool Added event : cstor-pool-3w1a, 76aad699-4e5f-4bd5-9a1b-16008d0d5c54
I0112 11:09:47.997357 7 event.go:281] Event(v1.ObjectReference{Kind:"CStorPool", Namespace:"", Name:"cstor-pool-3w1a", UID:"76aad699-4e5f-4bd5-9a1b-16008d0d5c54", APIVersion:"openebs.io/v1alpha1", ResourceVersion:"13474", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
W0112 11:09:47.997459 7 common.go:271] CStorPool not found. Retrying after 5s, err: <nil>
I0112 11:09:47.997871 7 handler.go:598] cVR 'pvc-cb2f311d-b114-4927-bf1b-ab30738a270d-cstor-pool-3w1a': uid '6109daf9-a239-4049-b255-1aaf9671a7e0': phase 'Healthy': is_empty_status: false
I0112 11:09:47.998211 7 event.go:281] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-cb2f311d-b114-4927-bf1b-ab30738a270d-cstor-pool-3w1a", UID:"6109daf9-a239-4049-b255-1aaf9671a7e0", APIVersion:"openebs.io/v1alpha1", ResourceVersion:"13475", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
I0112 11:09:48.093300 7 run_pool_controller.go:50] Starting CStorPool workers
I0112 11:09:48.093360 7 run_pool_controller.go:56] Started CStorPool workers
I0112 11:09:48.236208 7 new_pool_controller.go:167] cStorPool Modify event : cstor-pool-3w1a, 76aad699-4e5f-4bd5-9a1b-16008d0d5c54
I0112 11:09:48.237655 7 event.go:281] Event(v1.ObjectReference{Kind:"CStorPool", Namespace:"", Name:"cstor-pool-3w1a", UID:"76aad699-4e5f-4bd5-9a1b-16008d0d5c54", APIVersion:"openebs.io/v1alpha1", ResourceVersion:"13490", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource modify event
E0112 11:09:48.574618 7 run_pool_controller.go:117] error syncing 'cstor-pool-3w1a': expected csp object but got
cstorpool {null
}
W0112 11:09:53.005226 7 common.go:271] CStorPool not found. Retrying after 5s, err: <nil>
W0112 11:09:58.013215 7 common.go:271] CStorPool not found. Retrying after 5s, err: <nil>
W0112 11:10:03.021787 7 common.go:271] CStorPool not found. Retrying after 5s, err: <nil>
^C
mayadata:upgrade$ k exec -it cstor-pool-3w1a-56695f78b7-nb2zp -- bash
Defaulting container name to cstor-pool.
Use 'kubectl describe pod/cstor-pool-3w1a-56695f78b7-nb2zp -n openebs' to see all of the containers in this pod.
root@cstor-pool-3w1a-56695f78b7-nb2zp:/# zpool status
no pools available
root@cstor-pool-3w1a-56695f78b7-nb2zp:/# zpool import
2021-01-12/11:10:45.346 Iterating over all the devices to find zfs devices using blkid
2021-01-12/11:10:45.377 Iterated over cache devices to find zfs devices
no pools available to import
root@cstor-pool-3w1a-56695f78b7-nb2zp:/#
Restarted the CSPI pool pod once more and ended up with the issue the user reported.
mayadata:upgrade$ k logs -f cstor-pool-972g-7f4cfdd794-f2lsm -c cstor-pool-mgmt
+ rm /usr/local/bin/zrepl
+ pool_manager_pid=8
+ trap _sigint INT
+ /usr/local/bin/pool-manager start
+ trap _sigterm SIGTERM
+ wait 8
E0112 11:13:02.634184 8 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 11:13:02.634240 8 pool.go:123] Waiting for pool container to start...
E0112 11:13:05.637713 8 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 11:13:05.637805 8 pool.go:123] Waiting for pool container to start...
E0112 11:13:08.653611 8 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 11:13:08.653714 8 pool.go:123] Waiting for pool container to start...
E0112 11:13:11.668001 8 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 11:13:11.668128 8 pool.go:123] Waiting for pool container to start...
E0112 11:13:14.680239 8 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 11:13:14.680294 8 pool.go:123] Waiting for pool container to start...
E0112 11:13:17.690164 8 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 11:13:17.690218 8 pool.go:123] Waiting for pool container to start...
E0112 11:13:20.702640 8 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 11:13:20.702696 8 pool.go:123] Waiting for pool container to start...
E0112 11:13:23.717248 8 pool.go:122] zpool status returned error in zrepl startup : exit status 1
I0112 11:13:23.717277 8 pool.go:123] Waiting for pool container to start...
I0112 11:13:23.723416 8 controller.go:109] Setting up event handlers for CSPI
I0112 11:13:23.723781 8 controller.go:115] will set up informer event handlers for cvr
I0112 11:13:23.724125 8 new_restore_controller.go:105] Setting up event handlers for restore
I0112 11:13:23.733100 8 controller.go:110] Setting up event handlers for backup
I0112 11:13:23.737086 8 runner.go:38] Starting CStorPoolInstance controller
I0112 11:13:23.737111 8 runner.go:41] Waiting for informer caches to sync
I0112 11:13:23.743502 8 common.go:262] CStorPool found: [cannot open 'name': no such pool ]
I0112 11:13:23.743575 8 run_restore_controller.go:38] Starting CStorRestore controller
I0112 11:13:23.743584 8 run_restore_controller.go:41] Waiting for informer caches to sync
I0112 11:13:23.743595 8 run_restore_controller.go:53] Started CStorRestore workers
I0112 11:13:23.743643 8 runner.go:39] Starting CStorVolumeReplica controller
I0112 11:13:23.743655 8 runner.go:42] Waiting for informer caches to sync
I0112 11:13:23.743662 8 runner.go:47] Starting CStorVolumeReplica workers
I0112 11:13:23.743670 8 runner.go:54] Started CStorVolumeReplica workers
I0112 11:13:23.743719 8 runner.go:38] Starting CStorBackup controller
I0112 11:13:23.743732 8 runner.go:41] Waiting for informer caches to sync
I0112 11:13:23.743742 8 runner.go:53] Started CStorBackup workers
I0112 11:13:23.837328 8 runner.go:45] Starting CStorPoolInstance workers
I0112 11:13:23.837409 8 runner.go:51] Started CStorPoolInstance workers
I0112 11:13:23.891344 8 import.go:73] Importing pool 764d0038-cb8d-4b34-8ef8-5fb1efa80081 cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774
E0112 11:13:24.039603 8 import.go:94] Failed to import pool by reading cache file: cannot import 'cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774': I/O error
Recovery is possible, but will result in some data loss.
Returning the pool to its state as of Tue Jan 12 11:13:10 2021
should correct the problem. Approximately 5 seconds of data
must be discarded, irreversibly. Recovery can be attempted
by executing 'zpool import -F cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774'. A scrub of the pool
is strongly recommended after recovery.
: exit status 1
E0112 11:13:25.375807 8 import.go:114] Failed to import pool by scanning directory: 2021-01-12/11:13:24.042 Verifying pool existence on the device /var/openebs/sparse/0-ndm-sparse.img
2021-01-12/11:13:24.042 Verifying pool existence on the device /var/openebs/sparse/3-ndm-sparse.img
2021-01-12/11:13:24.042 Verifying pool existence on the device /var/openebs/sparse/4-ndm-sparse.img
2021-01-12/11:13:24.043 Verifying pool existence on the device /var/openebs/sparse/2-ndm-sparse.img
2021-01-12/11:13:24.043 Skipping /var/openebs/sparse/4-ndm-sparse.img device due to no labels on device
2021-01-12/11:13:24.043 Verifying pool existence on the device /var/openebs/sparse/shared-cstor-pool
2021-01-12/11:13:24.043 ERROR Skipping /var/openebs/sparse/shared-cstor-pool device due to failure in read stats or it is not a regular file/block device
2021-01-12/11:13:24.042 Verifying pool existence on the device /var/openebs/sparse/1-ndm-sparse.img
2021-01-12/11:13:25.069 Verified the device /var/openebs/sparse/1-ndm-sparse.img for pool existence
2021-01-12/11:13:25.081 Verified the device /var/openebs/sparse/3-ndm-sparse.img for pool existence
2021-01-12/11:13:25.092 Verified the device /var/openebs/sparse/2-ndm-sparse.img for pool existence
2021-01-12/11:13:25.107 Verified the device /var/openebs/sparse/0-ndm-sparse.img for pool existence
cannot import 'cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774': I/O error
Recovery is possible, but will result in some data loss.
Returning the pool to its state as of Tue Jan 12 11:13:10 2021
should correct the problem. Approximately 5 seconds of data
must be discarded, irreversibly. Recovery can be attempted
by executing 'zpool import -F cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774'. A scrub of the pool
is strongly recommended after recovery.
: exit status 1
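For completeness, the recovery path the error message itself suggests is a rewind import followed by a scrub; note that it irreversibly discards roughly the last few seconds of transactions:
zpool import -F cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774   # rewind to the last consistent txg, losing recent writes
zpool scrub cstor-7d9da0d6-904b-4310-8d90-3da1aacf4774       # verify remaining data after the rewind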
The user had hundreds of restarts on their pool pods, and their node went down a couple of times.
The suspected reason the lock did not work is that the lock-file path is not the same for the CSP and CSPI deployments.
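A minimal sketch of the guard that seems intended, assuming both the CSP and CSPI pool managers mount the same hostPath and agree on a single lock file (the path below reuses /var/openebs/sparse/shared-cstor-pool seen in the scan logs; this is an illustration of the idea, not the actual implementation):
# hedged sketch: serialize zpool import across pool-manager pods via one shared lock file
LOCKFILE=/var/openebs/sparse/shared-cstor-pool   # must resolve to the same host file in CSP and CSPI pods
exec 9>"$LOCKFILE"
if ! flock -n 9; then
  echo "another pool manager holds the import lock; refusing to import" >&2
  exit 1
fi
zpool import -d /var/openebs/sparse "$POOL_NAME"   # POOL_NAME is a placeholder; import only while holding the lock
If the two deployments resolve the lock to different paths, each manager happily takes its "own" lock, both can import the same sparse vdevs concurrently, and the checksum errors and I/O-error import failures seen above follow.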
To track this, we can follow up on this thread: https://kubernetes.slack.com/archives/CUAKPFU78/p1608665319368100
Environment details:
OpenEBS version (kubectl get po -n openebs --show-labels): 2.4.0
Kubernetes version (kubectl version): 1.18
OS (cat /etc/os-release): CentOS
Kernel (uname -a):