[bug:1652548] Error reading some files - Stale file handle - distribute 2 - replica 3 volume with sharding #937
Comments
Time: 20181126T05:04:34
Is this a hyperconverged setup?
Time: 20181126T12:43:40
Reproduced by reading the whole disk from inside the guest with dd if=/dev/vda of=/dev/null bs=1M status=progress; the VM paused. On the host, /var/log/glusterfs/rhev-data-center-mnt-glusterSD-s20gfs.ovirt.prisma:_VOL__VMDATA.log shows:
[2018-11-26 12:26:04.176267] I [MSGID: 109069] [dht-common.c:1474:dht_lookup_unlink_stale_linkto_cbk] 0-VOL_VMDATA-dht: Returned with op_ret 0 and op_errno 0 for /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27
Time: 20181127T20:09:22
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
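For reference, that hex value decodes to the stock unlabeled SELinux context. A quick way to verify, assuming xxd is available (the value is the xattr dump above, minus the 0x prefix; the trailing 00 is a NUL terminator):
echo 73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 | xxd -r -p
# prints: system_u:object_r:unlabeled_t:s0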
Time: 20200206T11:42:09
Just wanted to check whether you are still seeing this issue. From the logs, it seems shard is merely logging the error it got from the layers below; the problem doesn't appear to be in the shard translator.
Thank you for your contributions.
Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.
URL: https://bugzilla.redhat.com/1652548
Creator: marcoc at prismatelecomtesting
Time: 20181122T10:52:38
Description of problem:
Error reading some files.
I'm trying to export a VM from the gluster volume because oVirt paused it after a storage error, but the export fails with "Stale file handle" errors.
I mounted the volume on another server:
s23gfs.ovirt:VOL_VMDATA on /mnt/VOL_VMDATA type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)
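A mount like the one above would have been created along these lines (the server and mountpoint match the fstab-style entry shown; exact options may differ):
mount -t glusterfs s23gfs.ovirt:VOL_VMDATA /mnt/VOL_VMDATA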
Reading the file with cp, rsync, or qemu-img convert gives the same result:
qemu-img convert -p -t none -T none -f qcow2 /mnt/VOL_VMDATA/d4f82517-5ce0-4705-a89f-5d3c81adf764/images/dbb038ee-2794-40e8-877a-a4806c47f11f/f81e0be9-db3e-48ac-876f-57b6f7cb3fe8 -O raw PLONE_active-raw.img
qemu-img: error while reading sector 2448441344: Stale file handle
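Assuming qemu-img reports offsets in the usual 512-byte sector units, the failing sector can be mapped to a shard index given the volume's 512 MB shard-block-size (see the volume info below); a rough sketch of the arithmetic:
# byte offset = sector * 512; shard index = byte offset / shard size (512 MiB)
echo $(( 2448441344 * 512 / (512 * 1024 * 1024) ))
# -> 2335; shard 0 is the base file itself, so for N >= 1 the read lands
# in /.shard/<base-gfid>.N on the bricks, here <base-gfid>.2335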
Version-Release number of selected component (if applicable):
Gluster 3.12.15-1.el7
In the mount log file I see many errors like:
[2018-11-20 03:20:24.471344] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 3558 failed. Base file gfid = 4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14 [Stale file handle]
[2018-11-20 08:56:21.110258] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 541 failed. Base file gfid = 2c1b6402-87b0-45cd-bd81-2cd3f38dd530 [Stale file handle]
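Given the base gfid and shard number from such a log line, the corresponding shard file can be inspected directly on each brick host that holds it. A sketch, using the brick path from the volume info below and the first log line's values:
# stat the shard and dump its xattrs in hex; a missing file on one replica or a
# mismatched trusted.gfid across replicas would be consistent with the ESTALE
stat /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558
getfattr -d -m . -e hex /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558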
Is there a way to fix this? It's a distributed 2 x replica 3 volume with sharding.
Thanks,
Marco
Additional info:
gluster volume info VOL_VMDATA
Volume Name: VOL_VMDATA
Type: Distributed-Replicate
Volume ID: 7bd4e050-47dd-481e-8862-cd6b76badddc
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick2: s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick3: s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick4: s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick5: s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick6: s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Options Reconfigured:
auth.allow: 192.168.50.,172.16.4.,192.168.56.203
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: enable
features.shard-block-size: 512MB
cluster.data-self-heal-algorithm: full
nfs.disable: on
transport.address-family: inet
gluster volume heal VOL_VMDATA info
Brick s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
gluster volume status VOL_VMDATA
Status of volume: VOL_VMDATA
Gluster process                                       TCP Port  RDMA Port  Online  Pid
Brick s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3186
Brick s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       5148
Brick s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3792
Brick s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3257
Brick s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       4402
Brick s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3231
Self-heal Daemon on localhost                         N/A       N/A        Y       4192
Self-heal Daemon on s25gfs.ovirt.prisma               N/A       N/A        Y       63185
Self-heal Daemon on s24gfs.ovirt.prisma               N/A       N/A        Y       39535
Self-heal Daemon on s20gfs.ovirt.prisma               N/A       N/A        Y       2785
Self-heal Daemon on s23gfs.ovirt.prisma               N/A       N/A        Y       765
Self-heal Daemon on s22.ovirt.prisma                  N/A       N/A        Y       5828
Task Status of Volume VOL_VMDATA
There are no active volume tasks