This repository has been archived by the owner on Mar 26, 2020. It is now read-only.

Old brick process is still running after volume reset -> stop -> start #1451

Open
PrasadDesala opened this issue Jan 2, 2019 · 4 comments
Labels
brick-multiplexing-issue (tracker label to capture all issues related to the brick multiplexing feature) · bug · priority: low

Comments

@PrasadDesala

Observed behavior

On a brick-mux enabled setup, old brick process is still running after volume reset -> stop -> start.

Expected/desired behavior

Old brick process should not be running.

Details on how to reproduce (minimal and precise)

  1. Create a 3-node GCS system using Vagrant.
  2. With brick-mux enabled, create 100 PVCs.
  3. Pick a volume and change one volume option so that brick mux spawns a new process for that volume:
    glustercli volume set pvc-520682df-0e6e-11e9-af0b-525400f94cb8 cluster/replicate.self-heal-daemon off --advanced
  4. Stop and start that volume. A new brick process is spawned on that node.
  5. Now reset the volume option and stop/start the volume again:
    glustercli volume reset pvc-520682df-0e6e-11e9-af0b-525400f94cb8 cluster/replicate.self-heal-daemon
  6. Run pgrep glusterfsd; the old brick process is still running.
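The steps above can be sketched as a script. This is a dry-run sketch: the `run` wrapper only prints each command instead of executing it (drop the wrapper to run against a live cluster); the volume name and `glustercli` invocations are taken from this report, but the `repro`/`run` helpers are names chosen here, not gluster tooling.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the reproduction steps above.
repro() {
    local volname="pvc-520682df-0e6e-11e9-af0b-525400f94cb8"
    # Step 3: change an option so brick-mux spawns a separate process
    run glustercli volume set "$volname" cluster/replicate.self-heal-daemon off --advanced
    # Step 4: restart the volume; a new glusterfsd appears on the node
    run glustercli volume stop "$volname"
    run glustercli volume start "$volname"
    # Step 5: reset the option and restart again
    run glustercli volume reset "$volname" cluster/replicate.self-heal-daemon
    run glustercli volume stop "$volname"
    run glustercli volume start "$volname"
    # Step 6: the old brick process should be gone, but still shows up
    run pgrep glusterfsd
}
run() { echo "+ $*"; }   # dry-run stand-in: print the command only
repro
```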

Information about the environment:

Glusterd2 version used (e.g. v4.1.0 or master): v6.0-dev.94.git601ba61
Operating system used: Centos 7.6
Glusterd2 compiled from sources, as a package (rpm/deb), or container:
Using External ETCD: (yes/no, if yes ETCD version): yes; version 3.3.8
If container, which container image:
Using kubernetes, openshift, or direct install:
If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: Kubernetes

@PrasadDesala
Author

[root@gluster-kube1-0 /]# ps -ef | grep -i glusterfsd
root 9425 11692 0 11:16 pts/4 00:00:00 grep --color=auto -i glusterfsd
root 9469 1 0 10:36 ? 00:00:00 /usr/sbin/glusterfsd --volfile-server gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id pvc-520682df-0e6e-11e9-af0b-525400f94cb8.4e752b45-aa0a-4784-83f7-6b487e886b4d.var-run-glusterd2-bricks-pvc-520682df-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick -p /var/run/glusterd2/4e752b45-aa0a-4784-83f7-6b487e886b4d-var-run-glusterd2-bricks-pvc-520682df-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.pid -S /var/run/glusterd2/e7e1ef348943f9ac.socket --brick-name /var/run/glusterd2/bricks/pvc-520682df-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick -l /var/log/glusterd2/glusterfs/bricks/var-run-glusterd2-bricks-pvc-520682df-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=4e752b45-aa0a-4784-83f7-6b487e886b4d
root 11720 1 10 09:29 ? 00:11:08 /usr/sbin/glusterfsd --volfile-server gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8.4e752b45-aa0a-4784-83f7-6b487e886b4d.var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick -p /var/run/glusterd2/4e752b45-aa0a-4784-83f7-6b487e886b4d-var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.pid -S /var/run/glusterd2/3c5c17b3422e2a07.socket --brick-name /var/run/glusterd2/bricks/pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick -l /var/log/glusterd2/glusterfs/bricks/var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=4e752b45-aa0a-4784-83f7-6b487e886b4d
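To see which volume each glusterfsd above was started for, the `--volfile-id` argument can be pulled out of the `ps` output (a small helper sketch; `extract_volfile_ids` is a name chosen here, not a gluster tool):

```shell
# Read `ps -ef` lines on stdin and print "<pid> <volfile-id>" for each
# glusterfsd process, so leftover brick processes can be matched to volumes.
extract_volfile_ids() {
    awk '/glusterfsd/ {
        for (i = 1; i <= NF; i++)
            if ($i == "--volfile-id") { print $2, $(i + 1); break }
    }'
}
```

Usage: `ps -ef | extract_volfile_ids`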

statedump_kube-1.txt

kube3-glusterd2.log.gz
kube2-glusterd2.log.gz
kube1-glusterd2.log.gz

@atinmu
Contributor

atinmu commented Jan 2, 2019

@PrasadDesala the old brick process is serving the other 99 PVCs, isn't it? I fail to understand why this is a bug.

@PrasadDesala
Author

> @PrasadDesala the old brick process is serving the other 99 PVCs, isn't it? I fail to understand why this is a bug.

Initially, brick process p1 is serving all the volumes. Once I changed a volume option on one volume (let's say PVC100) and stopped/started that volume, a new brick process p2 began serving it. At that point p1 is serving 99 PVCs and p2 is serving PVC100, which works as expected.

Now I have reset PVC100 and stopped/started the volume. I see the p2 process is still running; there is no need for this process to run, since all the PVCs now have the same default volume options and p1 is serving all of them.

@atinmu
Contributor

atinmu commented Jan 2, 2019

> Now I have reset PVC100 and stopped/started the volume. I see the p2 process is still running; there is no need for this process to run, since all the PVCs now have the same default volume options and p1 is serving all of them.

Hmm, I think this process was registered with the daemon, which is why it still shows up as a separate process. @vpandey-RH is there an easy way to handle this scenario?

In any case, please note that in a GCS environment volume reset isn't an operation we'd recommend users perform, so the priority of this issue should remain low.
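Until the daemon-side deregistration is sorted out, one way to spot such leftover processes is to compare glusterd2's brick pid files against the pids that are actually alive. A workaround sketch, assuming the pid-file directory seen in the `ps` output above (`/var/run/glusterd2`); the path and the `list_live_brick_pids` helper name are assumptions, not glusterd2 features:

```shell
#!/usr/bin/env bash
# List every brick pid file whose recorded pid is still alive. With brick-mux,
# bricks folded back into the multiplexed process should not still show up
# here with their own live glusterfsd pid.
list_live_brick_pids() {
    local rundir="${1:-/var/run/glusterd2}"
    local pidfile pid
    for pidfile in "$rundir"/*.pid; do
        [ -e "$pidfile" ] || continue          # no pid files at all
        pid=$(cat "$pidfile")
        if kill -0 "$pid" 2>/dev/null; then    # signal 0: existence check only
            echo "$pid $pidfile"
        fi
    done
}
```

Any extra live pid reported here after a reset/stop/start cycle is a candidate leftover process like p2 above.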

@atinmu added the priority: low, brick-multiplexing-issue, and bug labels on Jan 16, 2019