virtual images in replicated volume are not healed? #1456

Closed
digidax opened this issue Aug 30, 2020 · 4 comments
digidax commented Aug 30, 2020

Description of problem:
volume1 is replicated across three nodes. A Proxmox server mounts 192.168.110.221 as the primary and 192.168.110.222 as the secondary (backup) server via a FUSE mount:
192.168.110.221:volume1 on /mnt/pve/gluster1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
The heal info command shows items to heal on servers .222 and .223, all of them virtual image files.
The situation has been like this for a week, and executing # gluster volume heal volume1 full has no effect. Is this normal because the files are always open and therefore in use, since they are the images of virtual machines (LXC containers)?
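
For reference, a minimal sketch of the checks that go with this (standard gluster CLI assumed, volume name taken from above), to confirm the self-heal daemon is actually online before triggering the full heal:

```sh
# Confirm all bricks and the "Self-heal Daemon" entries are online on the three nodes.
gluster volume status volume1

# Per-brick summary count of entries still pending heal.
gluster volume heal volume1 statistics heal-count

# Full crawl of the volume; this is the command that had no effect here.
gluster volume heal volume1 full
```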

The exact command to reproduce the issue:
gluster volume heal volume1 info

The full output of the command that failed:
Brick 192.168.110.221:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.222:/bricks/raid5array1/brick1
/pve_VMroot/images/191/vm-191-disk-2.raw
/pve_VMroot/images/172/vm-172-disk-0.raw
/pve_VMroot/images/183/vm-183-disk-0.raw
/pve_VMroot/images/184/vm-184-disk-0.raw
Status: Connected
Number of entries: 4

Brick 192.168.110.223:/bricks/raid5array1/brick1
gfid:22bcf75d-c04d-4a1f-94a7-437df3064584
gfid:e750f635-dade-497f-ab5b-fe2a3b6ea4cc
gfid:8385af8c-eefc-422d-acd6-b1579ca93662
gfid:9dc457dd-b2d3-4097-b5ec-c7b0b42d297e
Status: Connected
Number of entries: 4
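
For reference, the gfid-only entries on brick .223 can usually be mapped back to file paths through the .glusterfs hardlink tree on the brick. A minimal sketch, assuming the brick path shown above:

```sh
# The first two pairs of hex digits of the gfid name the subdirectories under
# .glusterfs that hold a hardlink to the real file, so -samefile finds its path.
BRICK=/bricks/raid5array1/brick1
GFID=22bcf75d-c04d-4a1f-94a7-437df3064584
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
     -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -print
```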

Expected results:
Brick 192.168.110.221:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.222:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.223:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Additional Infos:
# gluster volume heal volume1 info split-brain
Brick 192.168.110.221:/bricks/raid5array1/brick1
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.110.222:/bricks/raid5array1/brick1
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.110.223:/bricks/raid5array1/brick1
Status: Connected
Number of entries in split-brain: 0

- The output of the gluster volume info command:
Volume Name: volume1
Type: Replicate
Volume ID: 8cb5d44b-a370-49d2-a0ed-ea5e4a9f6443
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.110.221:/bricks/raid5array1/brick1
Brick2: 192.168.110.222:/bricks/raid5array1/brick1
Brick3: 192.168.110.223:/bricks/raid5array1/brick1
Options Reconfigured:
auth.allow: 192.168.110.*
performance.cache-size: 512MB
nfs.disable: on
transport.address-family: inet

- The operating system / glusterfs version:
CentOS release 6.6 (Final)
glusterfs 3.12.1
2.6.32-504.1.3.el6.x86_64 #1 SMP

pranithk commented Sep 8, 2020

@digidax Sorry for the delay. Could you get the extended attributes of all of the files appearing in heal info on all the bricks and post them on this issue?
Example:
On all the bricks, execute the following command and post the output. I am giving the command here for one of the files; you need to run the same for each file.
getfattr -d -m. -e hex /pve_VMroot/images/191/vm-191-disk-2.raw
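
A sketch of running this for all four entries on each node (brick-local path assumed from the heal info output above):

```sh
# Run on each of the three nodes: prints the xattrs of every file listed
# in heal info, using the brick path rather than the FUSE mount.
BRICK=/bricks/raid5array1/brick1
for f in /pve_VMroot/images/191/vm-191-disk-2.raw \
         /pve_VMroot/images/172/vm-172-disk-0.raw \
         /pve_VMroot/images/183/vm-183-disk-0.raw \
         /pve_VMroot/images/184/vm-184-disk-0.raw; do
    getfattr -d -m . -e hex "$BRICK$f"
done
```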

digidax commented Sep 8, 2020

@pranithk I have moved all the LXC container images from the GlusterFS storage to the ZFS pool of the nodes and set up cross-replication for safety, in addition to the backups.

I will set up a test container and place the disk image on GlusterFS storage again. If the problem comes up again, I will check the attributes and report here a.s.a.p.

digidax commented Sep 8, 2020

I have created a test container on GlusterFS storage "volume1". Here is the status after creation:

gluster volume heal volume1 info
Brick 192.168.110.221:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.222:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

Brick 192.168.110.223:/bricks/raid5array1/brick1
Status: Connected
Number of entries: 0

ls -la /bricks/raid5array1/brick1/pve_VMroot/images/1268/
total 1165564
drwxr----- 2 root root          31 Sep  8 13:40 .
drwxr----- 3 root root          17 Sep  8 13:40 ..
-rw-r----- 2 root root 21474836480 Sep  8 13:43 vm-1268-disk-0.raw

Everything is fine at the moment.
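
A baseline of the xattrs on the fresh test image could be captured now for comparison in case heal entries reappear (a sketch; brick path taken from the ls output above):

```sh
# Baseline extended attributes of the new test image on this brick.
getfattr -d -m . -e hex \
    /bricks/raid5array1/brick1/pve_VMroot/images/1268/vm-1268-disk-0.raw
```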

pranithk commented Sep 8, 2020

Since we can't debug the bug until you observe it again, I would like to close this GitHub issue. As soon as you hit the gluster bug again, please reopen this issue with the xattrs requested. Thanks a lot for your help. Sorry again for the delay.
