Orphan bricks ? #926
Comments
Hi, we found that heketi reports a full cluster, but in fact the cluster is only 55% used (1TB).
If I run 'gluster volume list' on each node, I get the same list as the topology 'volumes' section (nearly 130 volumes), so this part seems correct. How can I clean the topology to remove bad bricks and restore free space? Does heketi include something like a 'synchronization process' where it calls gluster to keep only real bricks? PS:
This issue is impacting me as well. It seems I got into this mess because I ran low on disk space. I'm in the middle of manually removing the orphaned bricks from the OCP nodes hosting the Gluster pods. Basically I'm comparing the output of "heketi-cli topology info" against each Gluster pod's "lvs"...
+1, I ran into the same problem a month ago, but I reinitialised from scratch so I can't provide details any more :-( If I had a 'synchronization process' I would have been able to keep the bricks in use. (BTW: I would also like a 'synchronization process' to be able to adjust replication on existing volumes. I could do it using gluster, and would be happy if heketi would sync against the changed volume info [Issue #963])
Problem not resolved... and it now prevents me from creating new volumes.
@johnsimcall can you please give me the command you are using to remove empty bricks? I really can't find how to do it manually, and right now I'm in an urgent situation: I cannot create new volumes.
Disclaimer: I'm not a heketi/gluster developer. I can't guarantee this is the right way to do it. Start by collecting all the data:
Then compare what you've got, make a list of what needs to stay (e.g. 2GB heketidb volume and bricks), and start removing the stuff that shouldn't be there. I stole these steps from the output of "oc logs heketi-XXXX"
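The compare step described above can be sketched in Python. This is a minimal sketch, not an official tool: the topology fragment is heavily abbreviated, the ids and paths are made up for illustration, and the assumption that heketi names brick LVs `brick_<id>` should be double-checked against your own nodes.

```python
import json

# Topology fragment shaped like `heketi-cli topology info --json` output
# (abbreviated; ids and paths are illustrative).
topology = json.loads("""
{"clusters": [{"nodes": [{"devices": [{"bricks": [
  {"id": "02fd0a827d4bad85a0b8668297bcd538",
   "path": "/var/lib/heketi/mounts/vg_x/brick_02fd0a827d4bad85a0b8668297bcd538/brick"},
  {"id": "1b85f8b3c9b9188e8b8bb1a6b5905d28",
   "path": ""}
]}]}]}]}
""")

def bricks(topo):
    """Yield (brick id, brick path) for every brick in the topology."""
    for cluster in topo["clusters"]:
        for node in cluster["nodes"]:
            for device in node["devices"]:
                for brick in device["bricks"]:
                    yield brick["id"], brick["path"]

# LV names actually present on the node, e.g. collected with
# `lvs --noheadings -o lv_name` inside each Gluster pod.
lvs_on_disk = {"brick_02fd0a827d4bad85a0b8668297bcd538"}

for bid, path in bricks(topology):
    # A brick with no path, or no matching LV on disk, is a candidate orphan.
    suspect = (not path) or ("brick_" + bid not in lvs_on_disk)
    print(bid, "ORPHAN?" if suspect else "ok")
```

Anything the sketch flags still needs manual verification before you remove it, since a brick can also look orphaned simply because you collected `lvs` from the wrong node.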
@johnsimcall thanks a lot. But there is a problem that is not covered by your explanation (which should work for others, though). To take an example, I've got a brick with id "02fd0a827d4bad85a0b8668297bcd538". I checked: this brick appears in the topology. I guess we should remove it from the heketi db, so... And as for the other bricks found in the topology that have no volume and no path: even if I remove them with lvremove and from fstab, they still appear in the topology info output afterwards. Really, I think the problem comes from the heketi db.
@metal3d it sounds like we have different problems. My problem was that the HeketiDB showed nothing, but there were things on disk (e.g. partitions, LVM, fstab, etc.). It sounds like your problem is that there is nothing on disk, but the HeketiDB still knows about the volumes and their components. In this case, I would hope that a simple...
@johnsimcall we have different problems, I can confirm ;) Other bricks are mounted and have an entry in fstab; I'm checking how to find which volume is associated, so that my script can remove the gluster volume, then the mountpoint, the LV, and the fstab entries. But there will be a lot of bricks that are referenced in heketi with "no volume"; I repeat, heketi doesn't find any volume for those bricks, so I cannot remove them.
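The fstab part of that cleanup can be sketched like this. A minimal sketch under assumptions: the function name, device names, and ids here are all illustrative, and it relies on the brick id appearing in the mount line as `brick_<id>`, which is heketi's usual naming but worth verifying on your own fstab first.

```python
def strip_brick_mounts(fstab_text, brick_ids):
    """Return fstab content with the mount lines for the given
    heketi brick ids removed; all other lines are kept as-is."""
    kept = [line for line in fstab_text.splitlines()
            if not any("brick_" + bid in line for bid in brick_ids)]
    return "\n".join(kept) + "\n"

# Illustrative fstab; device names and brick ids are made up.
fstab = """\
/dev/mapper/rhel-root / xfs defaults 0 0
/dev/mapper/vg_x-brick_aaa /var/lib/heketi/mounts/vg_x/brick_aaa xfs rw,inode64 1 2
/dev/mapper/vg_x-brick_bbb /var/lib/heketi/mounts/vg_x/brick_bbb xfs rw,inode64 1 2
"""
print(strip_brick_mounts(fstab, {"bbb"}))
```

In practice you would run this against a copy of /etc/fstab, diff the result, and only then write it back, after the corresponding LV has actually been unmounted and removed.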
Ok, with the help of "obnox" and "rastar" on the IRC channel, I've made a python script that can work with the #959 PR. I will send the script and the procedure to fix it today or tomorrow (I need time to recheck). We need the latest master source to build a heketi binary that has the "db" commands; then you will be able to fix up the JSON with the python script, reimport, and restart heketi. Orphan bricks disappeared as expected (you need to remove the thin pool and volume manually, but a script can also do that, and you need a backup in case something goes wrong). But, unfortunately, that didn't fix the #996 issue. Right now, I cannot create volumes, despite the fact that I've got space and a fixed heketi database.
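The kind of JSON fix-up such a script performs can be sketched like this. A minimal sketch, not the author's actual script: the "brickentries"/"deviceentries" field names are assumptions based on the `heketi db export` JSON format of that era and may differ between heketi versions, and the sample data is made up. The real workflow would be to export with `heketi db export`, run the fix-up on the JSON, and reimport with `heketi db import`, with a backup of the db first.

```python
def drop_empty_path_bricks(db):
    """Delete brick entries whose path is empty, and unlink those
    brick ids from their device's brick list so the db stays
    consistent. Field names are assumed, not guaranteed."""
    orphans = {bid for bid, entry in db.get("brickentries", {}).items()
               if not entry["Info"].get("path")}
    for bid in orphans:
        del db["brickentries"][bid]
    for dev in db.get("deviceentries", {}).values():
        dev["Bricks"] = [b for b in dev.get("Bricks", []) if b not in orphans]
    return orphans

# Minimal made-up export fragment, just enough to show the shape.
db = {
    "brickentries": {
        "aaa": {"Info": {"path": "/var/lib/heketi/mounts/vg_x/brick_aaa/brick"}},
        "bbb": {"Info": {"path": ""}},
    },
    "deviceentries": {
        "dev1": {"Bricks": ["aaa", "bbb"]},
    },
}
removed = drop_empty_path_bricks(db)
print(sorted(removed))  # → ['bbb']
```

Note that this only cleans the db side; as the comment above says, the thin pools and LVs on disk still have to be removed separately.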
@metal3d did you fix your issue already? It looks like I'm having similar issues. I had a problem on the underlying glusterfs which had me restarting glusterd on 3 of the 5 nodes. Now after this restart I'm getting errors like:
When I check my heketi-cli topology info I see the same as you describe: a lot of bricks without paths.
With heketi v6, these orphan bricks can no longer get created. And there is a tool to clean them up: heketi db delete-bricks-with-empty-path.
Note that this is the heketi server binary, not the client!
It is possible that you need to do some cleanup on the gluster side in addition to this. And there may be more problems in the DB than just the orphan bricks, such as incomplete volume entries. As said, such inconsistencies can no longer be created starting from heketi v6, but the task here is to clean up the old system.
@obnoxxx Your solution didn't work, and I also didn't find the documentation for such a command, db delete-bricks-with-empty-path... Did I misunderstand the usage?
I have this same problem.

```
[root@heketi-storage-1-dv98n heketi]# echo $PWD
/var/lib/heketi
[root@heketi-storage-1-dv98n heketi]# heketi db delete-bricks-with-empty-path --dbfile=/var/lib/heketi.db --all
failed to delete bricks with empty path: Unable to access list
```

Edit:

```
$ oc scale dc/heketi-storage --replicas=0
$ wget https://github.com/heketi/heketi/releases/download/v7.0.0/heketi-v7.0.0.linux.amd64.tar.gz
$ tar -xzvf heketi-v7.0.0.linux.amd64.tar.gz
$ cd heketi
$ sudo mount -t glusterfs 192.168.10.161:heketidbstorage /mnt
$ sudo ./heketi db delete-bricks-with-empty-path --dbfile=/mnt/heketi.db --all
$ sudo umount /mnt
$ oc scale dc/heketi-storage --replicas=1
```
I also have this same problem. After removing the orphaned bricks, you need to execute... Then the modified db is active. You can check the free size with the command...
I tried to run it and checked node info. But after a few minutes, checking node info again, no free space remains.
Does anyone have advice? Thanks,
Thanks @dientm, device resync worked for me as well.
Yes, see my previous comment, just above yours. But that was for an old version of heketi; I'm not sure it's ok for new releases.
This issue is fairly old and heketi has changed a lot since most of these items were posted. Rather than leave this issue open I'm going to close it and encourage anyone who has similar problems to file a new issue as it is likely caused by something else. |
Hi,
Using heketi on OpenShift 3.5 with the heketi:latest image, I get a strange report: a lot of bricks have no path, and I don't know why or how to remove them.
The problem is that I think that space is not freed.
I don't want to break the heketi database (by reimporting the topology), but nothing in the documentation explains how to remove bricks.
Note that one node was down this morning and I restarted the device after the reboot.
Is that a bug?