This repository has been archived by the owner on Mar 26, 2020. It is now read-only.

Old brick process is still running after volume reset -> stop -> start #1451

Open
PrasadDesala opened this issue Jan 2, 2019 · 4 comments
Labels
brick-multiplexing-issue (tracker label to capture all issues related to the brick multiplexing feature) · bug · priority: low

Comments

@PrasadDesala

Observed behavior

On a brick-mux enabled setup, old brick process is still running after volume reset -> stop -> start.

Expected/desired behavior

Old brick process should not be running.

Details on how to reproduce (minimal and precise)

  1. Create a 3-node GCS system using Vagrant.
  2. With brick-mux enabled, create 100 PVCs.
  3. Pick a volume and change one volume option so that brick mux spawns a new process for that volume:
    glustercli volume set pvc-520682df-0e6e-11e9-af0b-525400f94cb8 cluster/replicate.self-heal-daemon off --advanced
  4. Stop and start that volume. A new brick process is spawned on that node.
  5. Now reset the volume option and stop/start the volume again:
    glustercli volume reset pvc-520682df-0e6e-11e9-af0b-525400f94cb8 cluster/replicate.self-heal-daemon
  6. Run pgrep glusterfsd; the old brick process is still running.
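The steps above can be sketched as a script. This is a dry-run sketch: the `run` wrapper only prints each command instead of executing it (drop the wrapper to run against a live cluster); the volume name and `glustercli` invocations are taken from this report, but the `repro`/`run` helpers are names chosen here, not gluster tooling.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the reproduction steps above.
repro() {
    local volname="pvc-520682df-0e6e-11e9-af0b-525400f94cb8"
    # Step 3: change an option so brick-mux spawns a separate process
    run glustercli volume set "$volname" cluster/replicate.self-heal-daemon off --advanced
    # Step 4: restart the volume; a new glusterfsd appears on the node
    run glustercli volume stop "$volname"
    run glustercli volume start "$volname"
    # Step 5: reset the option and restart again
    run glustercli volume reset "$volname" cluster/replicate.self-heal-daemon
    run glustercli volume stop "$volname"
    run glustercli volume start "$volname"
    # Step 6: the old brick process should be gone, but still shows up
    run pgrep glusterfsd
}
run() { echo "+ $*"; }   # dry-run stand-in: print the command only
repro
```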

Information about the environment:

Glusterd2 version used (e.g. v4.1.0 or master): v6.0-dev.94.git601ba61
Operating system used: Centos 7.6
Glusterd2 compiled from sources, as a package (rpm/deb), or container:
Using External ETCD: (yes/no, if yes ETCD version): yes; version 3.3.8
If container, which container image:
Using kubernetes, openshift, or direct install:
If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: Kubernetes

@PrasadDesala
Author

[root@gluster-kube1-0 /]# ps -ef | grep -i glusterfsd
root 9425 11692 0 11:16 pts/4 00:00:00 grep --color=auto -i glusterfsd
root 9469 1 0 10:36 ? 00:00:00 /usr/sbin/glusterfsd --volfile-server gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id pvc-520682df-0e6e-11e9-af0b-525400f94cb8.4e752b45-aa0a-4784-83f7-6b487e886b4d.var-run-glusterd2-bricks-pvc-520682df-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick -p /var/run/glusterd2/4e752b45-aa0a-4784-83f7-6b487e886b4d-var-run-glusterd2-bricks-pvc-520682df-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.pid -S /var/run/glusterd2/e7e1ef348943f9ac.socket --brick-name /var/run/glusterd2/bricks/pvc-520682df-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick -l /var/log/glusterd2/glusterfs/bricks/var-run-glusterd2-bricks-pvc-520682df-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=4e752b45-aa0a-4784-83f7-6b487e886b4d
root 11720 1 10 09:29 ? 00:11:08 /usr/sbin/glusterfsd --volfile-server gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8.4e752b45-aa0a-4784-83f7-6b487e886b4d.var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick -p /var/run/glusterd2/4e752b45-aa0a-4784-83f7-6b487e886b4d-var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.pid -S /var/run/glusterd2/3c5c17b3422e2a07.socket --brick-name /var/run/glusterd2/bricks/pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick -l /var/log/glusterd2/glusterfs/bricks/var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=4e752b45-aa0a-4784-83f7-6b487e886b4d
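To see which volume each glusterfsd above was started for, the `--volfile-id` argument can be pulled out of the `ps` output (a small helper sketch; `extract_volfile_ids` is a name chosen here, not a gluster tool):

```shell
# Read `ps -ef` lines on stdin and print "<pid> <volfile-id>" for each
# glusterfsd process, so leftover brick processes can be matched to volumes.
extract_volfile_ids() {
    awk '/glusterfsd/ {
        for (i = 1; i <= NF; i++)
            if ($i == "--volfile-id") { print $2, $(i + 1); break }
    }'
}
```

Usage: `ps -ef | extract_volfile_ids`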

statedump_kube-1.txt

kube3-glusterd2.log.gz
kube2-glusterd2.log.gz
kube1-glusterd2.log.gz

@atinmu
Contributor

atinmu commented Jan 2, 2019

@PrasadDesala the old brick process is serving the other 99 PVCs, isn't it? I fail to understand why this is a bug.

@PrasadDesala
Author

> @PrasadDesala the old brick process is serving the other 99 PVCs, isn't it? I fail to understand why this is a bug.

Initially, brick process p1 is serving all the volumes. Once I changed a volume option on one volume (let's say PVC100) and stopped/started that volume, a new brick process p2 began serving it. At that point p1 is serving 99 PVCs and p2 is serving PVC100, which works as expected.

Now I have reset PVC100 and stopped/started the volume. I see the p2 process is still running; there is no need for this process to run, since all the PVCs now have the same default volume options and p1 is serving all of them.

@atinmu
Contributor

atinmu commented Jan 2, 2019

> Now I have reset PVC100 and stopped/started the volume. I see the p2 process is still running; there is no need for this process to run, since all the PVCs now have the same default volume options and p1 is serving all of them.

Hmm, I think this process was registered with the daemon, which is why it still shows up as a separate process. @vpandey-RH is there an easy way to handle this scenario?

In any case, please note that in a GCS environment volume reset isn't an operation we'd recommend users perform, so the priority of this issue should remain low.
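Until the daemon-side deregistration is sorted out, one way to spot such leftover processes is to compare glusterd2's brick pid files against the pids that are actually alive. A workaround sketch, assuming the pid-file directory seen in the `ps` output above (`/var/run/glusterd2`); the path and the `list_live_brick_pids` helper name are assumptions, not glusterd2 features:

```shell
#!/usr/bin/env bash
# List every brick pid file whose recorded pid is still alive. With brick-mux,
# bricks folded back into the multiplexed process should not still show up
# here with their own live glusterfsd pid.
list_live_brick_pids() {
    local rundir="${1:-/var/run/glusterd2}"
    local pidfile pid
    for pidfile in "$rundir"/*.pid; do
        [ -e "$pidfile" ] || continue          # no pid files at all
        pid=$(cat "$pidfile")
        if kill -0 "$pid" 2>/dev/null; then    # signal 0: existence check only
            echo "$pid $pidfile"
        fi
    done
}
```

Any extra live pid reported here after a reset/stop/start cycle is a candidate leftover process like p2 above.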

@atinmu added the priority: low, brick-multiplexing-issue, and bug labels on Jan 16, 2019