This repository has been archived by the owner on Jul 6, 2023. It is now read-only.

Orphan bricks ? #926

Closed

metal3d opened this issue Dec 4, 2017 · 20 comments

Comments

@metal3d

metal3d commented Dec 4, 2017

Hi,
Using heketi on OpenShift 3.5 with the heketi:latest image, I get a strange report: a lot of bricks have no path, and I don't know why or how to remove them.

$ heketi-cli topology info
...
Node Id: 4d00220cd5ef691dc5574f7de72c2bce
	State: online
	Cluster Id: 7bae657979f9625bb4ae386b44b0c381
	Zone: 1
	Management Hostname: XXXXXX
	Storage Hostname: 172.16.135.15
	Devices:
		Id:269d5022426278f091001f0f11db3a78   Name:/dev/sdb            State:online    Size (GiB):308     Used (GiB):153     Free (GiB):155     
			Bricks:
				Id:020d4d2cd5bbe43d1cd551df86cdc06a   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_269d5022426278f091001f0f11db3a78/brick_020d4d2cd5bbe43d1cd551df86cdc06a/brick
				Id:02c5dcb1539b44995b3e79e8917fb5da   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_269d5022426278f091001f0f11db3a78/brick_02c5dcb1539b44995b3e79e8917fb5da/brick
				Id:02f68f3c9e5854cd576bfad8dfc997de   Size (GiB):1       Path: 
				Id:03341c06dd8689ab04c138ca952d8820   Size (GiB):1       Path: 
				Id:062ff256832391558388b126eb8f6ecf   Size (GiB):1       Path: 
				Id:0a0bb8a856ccd7e14b87c8a2916ed934   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_269d5022426278f091001f0f11db3a78/brick_0a0bb8a856ccd7e14b87c8a2916ed934/brick
				Id:0a28812119ec0d6995964863892c3de0   Size (GiB):1       Path: 
				Id:0abbd9ccac92f63f03c9e4867ab9136e   Size (GiB):1       Path: 
				Id:0bd7939a5ca53bee9a59ac5d8297def8   Size (GiB):1       Path: 
				Id:0d16712413a46b530f1c431dc97bfe68   Size (GiB):1       Path: 
				Id:0f83bacda9041dfb03819d4cf1d4698a   Size (GiB):1       Path: 
...

The problem is that I think this space is not actually freed.

I don't want to break the heketi database (by reimporting the topology), but the documentation doesn't explain how to remove bricks.

Note that one node was down this morning and I restarted the device after the reboot.

Is that a bug?

@dquagebeur

dquagebeur commented Jan 16, 2018

Hi,
I think we are in a similar case.

We found that heketi reports a full cluster, but in fact, the cluster is only 55% used (1TB).
In our 'heketi-cli topology info', there are 3 cases:

  • Normal: bricks with id and path, and id referenced in a volume (Volumes section)
  • Mounted but unreferenced: bricks with id and path, but id not referenced in any volume
  • Unmounted: like metal3d, bricks with id but no path, and no reference in any volume

If I run 'gluster volume list' on each node, I get the same list as the topology 'Volumes' section (nearly 130 volumes), so this part seems correct.

How can I clean the topology to remove the bad bricks and restore free space? Does heketi include something like a 'synchronization process' where it calls gluster to keep only the real bricks?

PS:
sh-4.4# heketi-cli -v
heketi-cli v5.0.1
heketi is running containerized in an OpenShift cluster

@johnsimcall

This issue is impacting me as well. It seems I got into this mess because I ran low on disk space. I'm in the middle of manually removing the orphaned bricks from the OCP nodes hosting the Gluster pods. Basically I'm comparing the output of "heketi-cli topology info" against each Gluster pod's "lvs" (roughly along the lines of the sketch below)...
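
Roughly like this (an untested sketch; it assumes containerized Gluster pods reachable via "oc rsh" and the brick naming shown in the topology output above):

# brick IDs that heketi still references (only bricks that have a path)
heketi-cli topology info | grep -oE 'brick_[0-9a-f]+' | sort -u > /tmp/heketi_bricks
# brick LVs that actually exist inside one Gluster pod
oc rsh gluster-XXXX lvs --noheadings -o lv_name | grep -oE 'brick_[0-9a-f]+' | sort -u > /tmp/lvm_bricks
# LVs present on disk but unknown to heketi (candidates for manual cleanup)
comm -13 /tmp/heketi_bricks /tmp/lvm_bricks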

@robdewit

+1, I ran into the same problem a month ago, but I reinitialised from scratch so I can't provide details any more :-(

If there had been a 'synchronization process', I would have been able to keep the used bricks.

(BTW: I would like a 'synchronisation process' also to be able to adjust replication on existing volumes. I could do it using gluster and would be happy if heketi would sync against the changed volume info [Issue #963])

@metal3d
Author

metal3d commented Feb 1, 2018

Problem not resolved... and it now prevents me from creating new volumes.

@metal3d
Author

metal3d commented Feb 1, 2018

@johnsimcall can you please give me the commands you are using to remove empty bricks? I really can't find how to do it manually, and right now I'm in an urgent situation: I cannot create new volumes.
Thanks

@johnsimcall

@metal3d

Disclaimer: I'm not a heketi/gluster developer. I can't guarantee this is the right way to do it.

Start by collecting all the data:

# oc get pv
# oc get pvc --all-namespaces
# heketi-cli topology info
# oc rsh gluster-XXXX gluster volume info
# oc rsh gluster-XXXX cat /var/lib/heketi/fstab
# oc rsh gluster-XXXX lvs  # you may also need node-level access where these containers live

Then compare what you've got, make a list of what needs to stay (e.g. 2GB heketidb volume and bricks), and start removing the stuff that shouldn't be there. I stole these steps from the output of "oc logs heketi-XXXX"

# Remove unused PVC and PV from openshift
# ...

# Remove unused volumes from Heketi
# heketi-cli volume delete xxxxxxxx

# Remove orphaned Gluster volumes
# gluster vol stop xxxxxxxx
# gluster vol delete xxxxxxxx

# Remove orphaned bricks and cleanup brick references
# umount /var/lib/heketi/mounts/vg_f067a6d1192e10332ef54923357f5d31/brick_9dff504c8f34d706c1a718f7b3f768da
# lvremove -f vg_f067a6d1192e10332ef54923357f5d31/tp_9dff504c8f34d706c1a718f7b3f768da
# sed -i.save "/brick_9dff504c8f34d706c1a718f7b3f768da/d" /var/lib/heketi/fstab

@metal3d
Author

metal3d commented Feb 2, 2018

@johnsimcall thanks a lot. But there is a problem that is not covered by your explanation (which should work for people other than me).

Take an example: I've got a brick with id "02fd0a827d4bad85a0b8668297bcd538". I checked /var/lib/heketi/fstab, the lvs output and so on... and there is no mount point, no entry in fstab, and no entry in lvs on any of the three nodes.
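
For example, something like this per-node check (a sketch, with placeholder pod names):

for pod in glusterfs-aaaa glusterfs-bbbb glusterfs-cccc; do
  echo "== $pod =="
  oc rsh $pod grep 02fd0a827d4bad85a0b8668297bcd538 /var/lib/heketi/fstab
  oc rsh $pod lvs | grep 02fd0a827d4bad85a0b8668297bcd538
done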

This brick still appears in the topology. I guess we should remove it from the heketi db, so...

And for other bricks found in the topology that have no volume and no path: even if I remove them with lvremove and from fstab, they still appear in the topology info output afterwards.

Really, I think the problem comes from the heketi db.

@johnsimcall

@metal3d it sounds like we have different problems. My problem was that the HeketiDB showed nothing, but there were things on disk (e.g. partitions, lvm, fstab, etc...)

It sounds like your problem is that there is nothing on disk, but HeketiDB still knows about the volumes and their components. In this case, I would hope that a simple heketi-cli volume delete ... would do the cleanup. I would recommend watching the heketi logs when you issue this command and looking for failures (e.g. failed to remove lvm, because it doesn't exist).
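
Something like this, for example (a sketch; the volume ID and heketi pod name are placeholders):

# in one terminal, try to delete the stale volume via heketi
heketi-cli volume delete <VOLUME_ID>
# in another terminal, watch the heketi pod logs for failures (e.g. lvremove errors)
oc logs -f heketi-XXXX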

@metal3d
Author

metal3d commented Feb 5, 2018

@johnsimcall we have different problems, I can confirm ;)
I made a script to list the bricks with no path; the script tries to find the corresponding lvs and fstab entries. 50% of them are not there.
These bricks have no associated volume, so I cannot remove anything (no volume to remove, no mount point, no fstab entry, and heketi does not let me remove the entry from its DB).

Other bricks are mounted and have an entry in fstab; I'm checking how to find which volume is associated (see the sketch below) so that my script can remove the gluster volume, then the mount point, LV and fstab entries.

But there will still be a lot of bricks that are referenced in heketi with "no volume"; I repeat, heketi doesn't find any volume for those bricks, so I cannot remove them.
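
One way to map a brick back to its gluster volume is something like this (a sketch; the pod name and brick ID are placeholders):

# print the name of the gluster volume (if any) whose brick list contains the given brick ID
oc rsh gluster-XXXX gluster volume info | \
  awk -v id="brick_<BRICK_ID>" '/^Volume Name:/{vol=$3} $0 ~ id {print vol}'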

@metal3d
Author

metal3d commented Feb 7, 2018

OK, with the help of "obnox" and "rastar" on the IRC channel, I've made a Python script that works with the #959 PR. I will post the script and the procedure to fix it today or tomorrow (I need time to recheck).

We need the latest master source to build a heketi binary that has the "db" commands; then you will be able to fix up the exported JSON with the Python script, reimport it, and restart heketi (roughly as sketched below).
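
In outline (a sketch; the file names and the cleanup script are placeholders, and you should back up heketi.db first):

# with the heketi server stopped and a heketi binary built from master:
./heketi db export --dbfile heketi.db --jsonfile heketi.json
python fix_orphan_bricks.py heketi.json > heketi.fixed.json   # the Python script mentioned above
./heketi db import --dbfile heketi.new.db --jsonfile heketi.fixed.json
# replace the old heketi.db with heketi.new.db (keeping a backup) and restart heketi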

The orphan bricks disappeared as expected (you need to remove the thin pool and volume manually, but a script can also do that, and you need a backup in case something goes wrong).

But, unfortunately, that didn't fix the #996 issue.

Right now, I cannot create volumes, despite the fact that I've got space and a fixed heketi database.

@john-bakker

@metal3d did you fix your issue already? It looks like I'm having similar issues. I had a problem on the underlying glusterfs which had me restarting glusterd on 3 of the 5 nodes. Now, after this restart, I'm getting errors like:

Warning ProvisioningFailed 38m persistentvolume-controller Failed to provision volume with StorageClass "filesystem-storage": glusterfs: create volume err: error creating volume Unable to execute command on glusterfs-mzgsz: volume create: vol_f873a52e2236d5db767ae2c5f7e04130: failed: Staging failed on 829k8storage1. Error: Brick: 10.165.210.97:/var/lib/heketi/mounts/vg_a3be9f9fb24ee1b0933ac74c48cda955/brick_a375ca3d1e7901a954c7ab5eb657a6ed/brick not available. Brick may be containing or be contained by an existing brick.

When I check my heketi-cli topology info, I see the same as you describe: a lot of bricks without paths.

@obnoxxx
Contributor

obnoxxx commented May 24, 2018

With heketi v6, these orphan bricks can no longer be created. And there is a tool to clean them up:

heketi db delete-bricks-with-empty-path --dbfile=/db/file/path/

Note that this is the heketi server binary, not the client!
It is operating in a new db-maintenance mode.
This is to be run while the server is not running as a daemon.
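
For example (a sketch; the db file path depends on your deployment, and either --all or an explicit list of clusters/nodes/devices is required):

# stop the heketi daemon (or scale the heketi pod down to 0 replicas)
heketi db delete-bricks-with-empty-path --dbfile=/var/lib/heketi/heketi.db --all
# start the heketi daemon / scale the pod back up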

@obnoxxx
Contributor

obnoxxx commented May 24, 2018

It is possible that you need to do some cleanup on the gluster side in addition to this. And there may be more problems in the DB than just the orphan bricks, such as incomplete volume entries, etc. As said, such inconsistencies can no longer be created starting from heketi v6, but the task here is to clean up the old system.

@nelsonfassis

@obnoxxx Your solution didn't work. I also didn't find the documentation for that command, db delete-bricks-with-empty-path... Did I misunderstand the usage?

[root@nelson]# heketi-cli --version
heketi-cli 6.0.0
[root@nelson]# heketi --version
Heketi 6.0.0
[root@nelson]# heketi db delete-bricks-with-empty-path --dbfile=/var/lib/heketi/heketi.db 
neither --all flag nor list of clusters/nodes/devices is given

@rblaine95

rblaine95 commented Jun 25, 2018

I have this same problem.
I tried the command mentioned above.

[root@heketi-storage-1-dv98n heketi]# echo $PWD
/var/lib/heketi
[root@heketi-storage-1-dv98n heketi]# heketi db delete-bricks-with-empty-path --dbfile=/var/lib/heketi.db --all
failed to delete bricks with empty path: Unable to access list

Edit:
Solved orphaned bricks

$ oc scale dc/heketi-storage --replicas=0

$ wget https://github.com/heketi/heketi/releases/download/v7.0.0/heketi-v7.0.0.linux.amd64.tar.gz
$ tar -xzvf heketi-v7.0.0.linux.amd64.tar.gz
$ cd heketi

$ sudo mount -t glusterfs 192.168.10.161:heketidbstorage /mnt
$ sudo ./heketi db delete-bricks-with-empty-path --dbfile=/mnt/heketi.db --all
$ sudo umount /mnt

$ oc scale dc/heketi-storage --replicas=1

@l0v2

l0v2 commented Aug 2, 2018

I also have this same problem.

After removing the orphaned bricks, you need to execute:
heketi-cli device resync <DEVICE_ID>

Then the modified db is active.

You can check the free size with the command heketi-cli node info <NODE_ID>.

@dientm

dientm commented Dec 11, 2018

I tried to run:
heketi-cli device resync 280692e88aa5b7e7986a199d8d72ec9e

Device 280692e88aa5b7e7986a199d8d72ec9e updated

and checked the node info:
Id:280692e88aa5b7e7986a199d8d72ec9e Name:/dev/sdb State:online Size (GiB):399 Used (GiB):358 Free (GiB):41 Bricks:54

But after a few minutes, checking the node info again, there is no free space remaining:

heketi-cli node info 2df062ba573a1669928de3ea2f

Node Id: 2df062ba573a1669928de3ea2fc2b812
State: online
Cluster Id: 105f727814efa9de41da3e3b72158f2f
Zone: 1
Management Hostname: 192.168.104.102
Storage Hostname: 192.168.104.102
Devices:
Id:280692e88aa5b7e7986a199d8d72ec9e Name:/dev/sdb State:online Size (GiB):399 Used (GiB):398 Free (GiB):1 Bricks:54

Does anyone have advice? Thanks.

@flashdumper

Thanks, @dientm device resync worked for me as well.

@metal3d
Author

metal3d commented Jan 25, 2019

@metal3d did you fix your issue already?

Yes, see my previous comment, just above yours. But that was for an old version of Heketi, I'm not sure it's ok for new releases.

@phlogistonjohn
Contributor

This issue is fairly old and heketi has changed a lot since most of these items were posted. Rather than leave this issue open I'm going to close it and encourage anyone who has similar problems to file a new issue as it is likely caused by something else.
