
Brick goes offline when a new PVC is created #24

Closed
ksandha opened this issue Oct 11, 2018 · 9 comments

@ksandha

ksandha commented Oct 11, 2018

  1. Created a GCS cluster
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY     STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-4cvkd   2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-m9z9n   2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-wclwr   2/2       Running   0          1h
csi-provisioner-glusterfsplugin-0      2/2       Running   0          1h
etcd-chvm79wqr4                        1/1       Running   0          1h
etcd-ndccs6pkq7                        1/1       Running   0          1h
etcd-operator-54bbdfc55d-vkxh6         1/1       Running   0          1h
etcd-rrfgwq5xkd                        1/1       Running   0          3m
kube1-0                                1/1       Running   0          1h
kube2-0                                1/1       Running   0          1h
kube3-0                                1/1       Running   0          1h

  2. Created a PVC and mounted it on an app pod (a sketch of the manifests is included after the status output below).
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true   | 49152 |  53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | true   | 49152 |  53 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true   | 49152 |  53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
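For reference, a minimal sketch of the PVC and app pod used in this step. Only the names gcs-pvc1, glusterfs-csi and redis1 appear later in this thread; every other field is an assumption, not the exact manifest that was applied.

# Hypothetical manifests, reconstructed from names seen in this thread.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-pvc1
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 2Gi
  storageClassName: glusterfs-csi
---
apiVersion: v1
kind: Pod
metadata:
  name: redis1
spec:
  containers:
  - name: redis
    image: redis
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: gcs-pvc1
EOF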
  3. Killed a brick process from inside the gd2 pod (a process-check sketch follows this step's output).
[root@kube1-0 /]# kill -9 53
[root@kube1-0 /]# 
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true   | 49152 |  53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | false  |     0 |   0 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true   | 49152 |  53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]# 
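A small sketch of how the brick process can be confirmed inside the gd2 pod before and after the kill. The glusterfsd process name is an assumption about how the brick shows up in the process list; the PID itself comes from the PID column of glustercli volume status.

# Inside the gd2 pod (kube1-0 here): confirm which process backs the brick,
# kill it, then re-check the volume status.
ps -ef | grep glusterfsd | grep -v grep    # assumed brick process name
kill -9 53                                 # PID taken from the PID column above
glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"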

  4. Deleted the app pod and the PVC:
[vagrant@kube1 ~]$ kubectl delete pod redis1
pod "redis1" deleted
[vagrant@kube1 ~]$ kubectl -n gcs -it exec kube1-0 -- /bin/bash
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true   | 49152 |  53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | false  |     0 |   0 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true   | 49152 |  53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]# exit
[vagrant@kube1 ~]$ kubectl get pods
No resources found.
[vagrant@kube1 ~]$ kubectl get pvc
NAME       STATUS    VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS    AGE
gcs-pvc1   Bound     pvc-350277cfcd3111e8   2Gi        RWX            glusterfs-csi   19m
[vagrant@kube1 ~]$ kubectl delete pvc gcs-pvc1
persistentvolumeclaim "gcs-pvc1" deleted
[vagrant@kube1 ~]$ kubectl get pvc
No resources found.
[vagrant@kube1 ~]$ kubectl -n gcs -it exec kube1-0 -- /bin/bash
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
No volumes found
[root@kube1-0 /]#

  5. Deleted the gd2 pod and waited for a new pod to spin up (a watch sketch follows the command below).

[vagrant@kube1 ~]$ kubectl delete -n gcs pods kube1-0 --grace-period=0
pod "kube1-0" deleted

  6. Created a new PVC and checked the volume status in the gd2 pod (a brick-path check is sketched after the output).
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube3-0.glusterd2.gcs:24007"
Volume : pvc-044c90d4cd3411e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 981ca2e3-e282-4073-b540-82a1bd849a5c | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick3/brick | true   | 49152 | 175 |
| 22752a52-4be3-4ec2-beec-ad53290dfd3e | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick1/brick | true   | 49152 | 173 |
| 81d97c58-4f76-4c10-bcb7-1d64c552e515 | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick2/brick | false  |     0 |   0 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]# 
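A quick check that ties into the RCA below: inside the restarted kube1-0 pod, see whether the brick path reported above actually exists and is mounted. The path comes from the PATH column; the expectation that it is missing is only an assumption based on the RCA in the comments.

# Inside the restarted kube1-0 pod:
ls -ld /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick2/brick
mount | grep pvc-044c90d4cd3411e8 || echo "brick path not mounted"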

@Madhu-1
Member

Madhu-1 commented Oct 11, 2018

@ksandha can you paste the kubectl get po -n gcs output after the brick goes offline?

@ksandha
Author

ksandha commented Oct 11, 2018

[vagrant@kube1 ~]$ kubectl get po -n gcs
NAME                                   READY     STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-4cvkd   2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-m9z9n   2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-wclwr   2/2       Running   0          1h
csi-provisioner-glusterfsplugin-0      2/2       Running   0          1h
etcd-chvm79wqr4                        1/1       Running   0          1h
etcd-ndccs6pkq7                        1/1       Running   0          1h
etcd-operator-54bbdfc55d-vkxh6         1/1       Running   0          1h
etcd-rrfgwq5xkd                        1/1       Running   0          29m
kube1-0                                1/1       Running   2          19m
kube2-0                                1/1       Running   0          1h
kube3-0                                1/1       Running   0          1h

@Madhu-1
Member

Madhu-1 commented Oct 11, 2018

RCA:

We are not persisting the run directory in the glusterd2 stateful set, which is where the bricks get created. Once the glusterd2 pod restarts, the contents of this directory vanish; because of this the bricks do not come back up, and we are not even able to delete the volume.
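If persisting the run directory turns out to be the fix here, a rough sketch of what that could look like in the glusterd2 StatefulSet is below. This is a hypothetical fragment, not the actual GCS manifest; the container name, volume name and size are assumptions.

# Hypothetical StatefulSet fragment: back /var/run/glusterd2 with a per-pod
# persistent volume so brick directories/mounts survive a pod restart.
spec:
  template:
    spec:
      containers:
      - name: glusterd2                # assumed container name
        volumeMounts:
        - name: glusterd2-run
          mountPath: /var/run/glusterd2
  volumeClaimTemplates:
  - metadata:
      name: glusterd2-run
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi                 # assumed size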

@JohnStrunk JohnStrunk added the bug Something isn't working label Oct 11, 2018
@aravindavk
Member

We are not persisting the run directory in the glusterd2 stateful set, which is where the bricks get created. Once the glusterd2 pod restarts, the contents of this directory vanish; because of this the bricks do not come back up, and we are not even able to delete the volume.

As per my understanding, bricks are mounted in the run directory, and a glusterd2 restart will remount these bricks on start. Are there any failures while mounting the bricks? We don't need the run dir to be persistent; am I missing something here?
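One way to check the remount behaviour described here (generic commands, not from the original report; the log file path is an assumption about the glusterd2 defaults):

# Inside the restarted gd2 pod: are the brick paths mounted again?
mount | grep /var/run/glusterd2/bricks || echo "no brick mounts present"
# Look for remount errors during glusterd2 startup (assumed log location).
grep -i mount /var/log/glusterd2/glusterd2.log | tail -n 20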

@Madhu-1
Member

Madhu-1 commented Oct 25, 2018

I need to analyze the bug again; this may take time. Moving out of GCS/0.2.

@Madhu-1 Madhu-1 added GCS/0.3 GCS 0.3 release and removed GCS/0.2 GCS 0.2 release labels Oct 25, 2018
@Madhu-1
Member

Madhu-1 commented Nov 13, 2018

Not able to reproduce with the latest build. @ksandha, please verify this bug with the latest build.

@Madhu-1 Madhu-1 added GCS/0.4 GCS 0.4 release and removed bug Something isn't working GCS/0.3 GCS 0.3 release labels Nov 13, 2018
@Madhu-1
Member

Madhu-1 commented Nov 15, 2018

@ksandha PTAL

@ksandha
Author

ksandha commented Nov 20, 2018

Couldn't hit the issue with the latest build. @Madhu-1, please take appropriate action.

@Madhu-1
Member

Madhu-1 commented Nov 20, 2018

Closing as per @ksandha's comment.

@Madhu-1 Madhu-1 closed this as completed Nov 20, 2018