[bug:1652548] Error reading some files - Stale file handle - distribute 2 - replica 3 volume with sharding #937
Comments
Time: 20181126T05:04:34
Is this a hyperconverged setup?
Time: 20181126T12:43:40
Reproduced by reading the whole disk from inside the guest with dd if=/dev/vda of=/dev/null bs=1M status=progress; the VM paused. On the host, /var/log/glusterfs/rhev-data-center-mnt-glusterSD-s20gfs.ovirt.prisma:_VOL__VMDATA.log shows:
[2018-11-26 12:26:04.176267] I [MSGID: 109069] [dht-common.c:1474:dht_lookup_unlink_stale_linkto_cbk] 0-VOL_VMDATA-dht: Returned with op_ret 0 and op_errno 0 for /.shard/62e1c5d8-8533-4e6b-826e-030680043011.27
Time: 20181127T20:09:22
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
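For reference, that hex value decodes to the stock unlabeled SELinux context. A quick way to verify, assuming xxd is available (the value is the xattr dump above, minus the 0x prefix; the trailing 00 is a NUL terminator):
echo 73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 | xxd -r -p
# prints: system_u:object_r:unlabeled_t:s0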
Time: 20200206T11:42:09
Just wanted to check whether you are still seeing this issue. From the logs, it seems shard is merely logging the error it got from the layers below; the problem doesn't appear to be in the shard translator.
Thank you for your contributions.
Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.
URL: https://bugzilla.redhat.com/1652548
Creator: marcoc at prismatelecomtesting
Time: 20181122T10:52:38
Description of problem:
Error reading some files.
I'm trying to export a VM from the gluster volume because oVirt paused it after a storage error, but the export fails with "Stale file handle" errors.
I mounted the volume on another server:
s23gfs.ovirt:VOL_VMDATA on /mnt/VOL_VMDATA type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)
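A mount like the one above would have been created along these lines (the server and mountpoint match the fstab-style entry shown; exact options may differ):
mount -t glusterfs s23gfs.ovirt:VOL_VMDATA /mnt/VOL_VMDATA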
Reading the file with cp, rsync, or qemu-img convert gives the same result:
qemu-img convert -p -t none -T none -f qcow2 /mnt/VOL_VMDATA/d4f82517-5ce0-4705-a89f-5d3c81adf764/images/dbb038ee-2794-40e8-877a-a4806c47f11f/f81e0be9-db3e-48ac-876f-57b6f7cb3fe8 -O raw PLONE_active-raw.img
qemu-img: error while reading sector 2448441344: Stale file handle
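Assuming qemu-img reports offsets in the usual 512-byte sector units, the failing sector can be mapped to a shard index given the volume's 512 MB shard-block-size (see the volume info below); a rough sketch of the arithmetic:
# byte offset = sector * 512; shard index = byte offset / shard size (512 MiB)
echo $(( 2448441344 * 512 / (512 * 1024 * 1024) ))
# -> 2335; shard 0 is the base file itself, so for N >= 1 the read lands
# in /.shard/<base-gfid>.N on the bricks, here <base-gfid>.2335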
Version-Release number of selected component (if applicable):
Gluster 3.12.15-1.el7
In the mount log file I see many errors like:
[2018-11-20 03:20:24.471344] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 3558 failed. Base file gfid = 4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14 [Stale file handle]
[2018-11-20 08:56:21.110258] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk] 0-VOL_VMDATA-shard: Lookup on shard 541 failed. Base file gfid = 2c1b6402-87b0-45cd-bd81-2cd3f38dd530 [Stale file handle]
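Given the base gfid and shard number from such a log line, the corresponding shard file can be inspected directly on each brick host that holds it. A sketch, using the brick path from the volume info below and the first log line's values:
# stat the shard and dump its xattrs in hex; a missing file on one replica or a
# mismatched trusted.gfid across replicas would be consistent with the ESTALE
stat /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558
getfattr -d -m . -e hex /gluster/VOL_VMDATA/brick/.shard/4feb4a7e-e1a3-4fa3-8d38-3b929bf52d14.3558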
Is there a way to fix this? It's a distributed 2 x replica 3 volume with sharding.
Thanks,
Marco
Additional info:
gluster volume info VOL_VMDATA
Volume Name: VOL_VMDATA
Type: Distributed-Replicate
Volume ID: 7bd4e050-47dd-481e-8862-cd6b76badddc
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick2: s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick3: s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick4: s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick5: s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Brick6: s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Options Reconfigured:
auth.allow: 192.168.50.,172.16.4.,192.168.56.203
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: enable
features.shard-block-size: 512MB
cluster.data-self-heal-algorithm: full
nfs.disable: on
transport.address-family: inet
gluster volume heal VOL_VMDATA info
Brick s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
Brick s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick
Status: Connected
Number of entries: 0
gluster volume status VOL_VMDATA
Status of volume: VOL_VMDATA
Gluster process                                       TCP Port  RDMA Port  Online  Pid
Brick s20gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3186
Brick s21gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       5148
Brick s22gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3792
Brick s23gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3257
Brick s24gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       4402
Brick s25gfs.ovirt.prisma:/gluster/VOL_VMDATA/brick   49153     0          Y       3231
Self-heal Daemon on localhost                         N/A       N/A        Y       4192
Self-heal Daemon on s25gfs.ovirt.prisma               N/A       N/A        Y       63185
Self-heal Daemon on s24gfs.ovirt.prisma               N/A       N/A        Y       39535
Self-heal Daemon on s20gfs.ovirt.prisma               N/A       N/A        Y       2785
Self-heal Daemon on s23gfs.ovirt.prisma               N/A       N/A        Y       765
Self-heal Daemon on s22.ovirt.prisma                  N/A       N/A        Y       5828
Task Status of Volume VOL_VMDATA
There are no active volume tasks