virtual images in replicated volume are not healed? #1456

Closed
digidax opened this issue Aug 30, 2020 · 4 comments
digidax commented Aug 30, 2020

Description of problem:
volume1 is replicated across three nodes. A Proxmox server mounts 192.168.110.221 as the primary and 192.168.110.222 as the secondary (backup) server via a FUSE mount:
192.168.110.221:volume1 on /mnt/pve/gluster1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
The heal info command shows items to heal on servers .222 and .223, all of them virtual image files.
The situation has been like this for a week, and executing # gluster volume heal volume1 full has no effect. Is this normal because the files are always open and therefore in use, since they are the images of virtual machines (LXC containers)?
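
For reference, a minimal sketch of the checks that go with this (standard gluster CLI assumed, volume name taken from above), to confirm the self-heal daemon is actually online before triggering the full heal:

```sh
# Confirm all bricks and the "Self-heal Daemon" entries are online on the three nodes.
gluster volume status volume1

# Per-brick summary count of entries still pending heal.
gluster volume heal volume1 statistics heal-count

# Full crawl of the volume; this is the command that had no effect here.
gluster volume heal volume1 full
```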

The exact command to reproduce the issue:
gluster volume heal volume1 info

The full output of the command that failed:
Brick 192.168.110.221:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.222:/bricks/raid5array1/brick1
/pve_VMroot/images/191/vm-191-disk-2.raw
/pve_VMroot/images/172/vm-172-disk-0.raw
/pve_VMroot/images/183/vm-183-disk-0.raw
/pve_VMroot/images/184/vm-184-disk-0.raw
Status: Connected
Number of entries: 4

Brick 192.168.110.223:/bricks/raid5array1/brick1
gfid:22bcf75d-c04d-4a1f-94a7-437df3064584
gfid:e750f635-dade-497f-ab5b-fe2a3b6ea4cc
gfid:8385af8c-eefc-422d-acd6-b1579ca93662
gfid:9dc457dd-b2d3-4097-b5ec-c7b0b42d297e
Status: Connected
Number of entries: 4
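
For reference, the gfid-only entries on brick .223 can usually be mapped back to file paths through the .glusterfs hardlink tree on the brick. A minimal sketch, assuming the brick path shown above:

```sh
# The first two pairs of hex digits of the gfid name the subdirectories under
# .glusterfs that hold a hardlink to the real file, so -samefile finds its path.
BRICK=/bricks/raid5array1/brick1
GFID=22bcf75d-c04d-4a1f-94a7-437df3064584
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
     -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -print
```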

Expected results:
Brick 192.168.110.221:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.222:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.223:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Additional Infos:
# gluster volume heal volume1 info split-brain
Brick 192.168.110.221:/bricks/raid5array1/brick1
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.110.222:/bricks/raid5array1/brick1
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.110.223:/bricks/raid5array1/brick1
Status: Connected
Number of entries in split-brain: 0

- The output of the gluster volume info command:
Volume Name: volume1
Type: Replicate
Volume ID: 8cb5d44b-a370-49d2-a0ed-ea5e4a9f6443
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.110.221:/bricks/raid5array1/brick1
Brick2: 192.168.110.222:/bricks/raid5array1/brick1
Brick3: 192.168.110.223:/bricks/raid5array1/brick1
Options Reconfigured:
auth.allow: 192.168.110.*
performance.cache-size: 512MB
nfs.disable: on
transport.address-family: inet

- The operating system / glusterfs version:
CentOS release 6.6 (Final)
glusterfs 3.12.1
2.6.32-504.1.3.el6.x86_64 #1 SMP

pranithk commented Sep 8, 2020

@digidax Sorry for the delay. Could you get the extended attributes of all of the files appearing in heal info on all the bricks and post them on this issue?
Example:
On all the bricks, execute the following command and post the output. I am giving the command here for one of the files; you need to run the same for each file.
getfattr -d -m. -e hex /pve_VMroot/images/191/vm-191-disk-2.raw
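
A sketch of running this for all four entries on each node (brick-local path assumed from the heal info output above):

```sh
# Run on each of the three nodes: prints the xattrs of every file listed
# in heal info, using the brick path rather than the FUSE mount.
BRICK=/bricks/raid5array1/brick1
for f in /pve_VMroot/images/191/vm-191-disk-2.raw \
         /pve_VMroot/images/172/vm-172-disk-0.raw \
         /pve_VMroot/images/183/vm-183-disk-0.raw \
         /pve_VMroot/images/184/vm-184-disk-0.raw; do
    getfattr -d -m . -e hex "$BRICK$f"
done
```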

digidax commented Sep 8, 2020

@pranithk I have moved all the LXC container images from the GlusterFS storage to the ZFS pool of the nodes and set up cross-replication for safety, in addition to the backups.

I will set up a test container and place the disk image on GlusterFS storage again. If the problem comes up again, I will check the attributes and report here a.s.a.p.

digidax commented Sep 8, 2020

I have created a test container on GlusterFS storage "volume1". Here is the status after creation:

gluster volume heal volume1 info
Brick 192.168.110.221:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.222:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.223:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

ls -la /bricks/raid5array1/brick1/pve_VMroot/images/1268/
total 1165564
drwxr----- 2 root root          31 Sep  8 13:40 .
drwxr----- 3 root root          17 Sep  8 13:40 ..
-rw-r----- 2 root root 21474836480 Sep  8 13:43 vm-1268-disk-0.raw

Everything is fine at the moment.
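
A baseline of the xattrs on the fresh test image could be captured now for comparison in case heal entries reappear (a sketch; brick path taken from the ls output above):

```sh
# Baseline extended attributes of the new test image on this brick.
getfattr -d -m . -e hex \
    /bricks/raid5array1/brick1/pve_VMroot/images/1268/vm-1268-disk-0.raw
```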

pranithk commented Sep 8, 2020

Since we can't debug the bug until you observe it again, I would like to close this GitHub issue. As soon as you hit the gluster bug again, please reopen this issue with the xattrs requested. Thanks a lot for your help. Sorry again for the delay.
