
Completely filling up a Disperse volume results in unreadable/unhealable files that must be deleted. #2021

Closed
JeffByers-SF opened this issue Jan 19, 2021 · 7 comments
Labels
wontfix Managed by stale[bot]

Comments

@JeffByers-SF

Completely filling up a Disperse volume results in unreadable
(EIO) and unhealable files that must be deleted. This is
unfortunate, because although the files had append write
failures, the first parts of the files could still be intact
and usable.

The problem first occurred on a volume being used as storage
for video surveillance cameras. The application failed to
clean up space and completely filled the GlusterFS Disperse volume.

An earlier GlusterFS version was in use there, but the problem
was reproduced in a lab environment on the latest version using
the procedure below.

glusterfs 8.3
# uname -a
Linux Centos8x 4.18.0-240.1.1.el8_3.x86_64 #1 SMP Thu Nov 19 17:20:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# gluster volume info nas-disp-vol
Volume Name: nas-disp-vol
Type: Disperse
Volume ID: 350f7f4b-ef91-4ef2-bf5a-644c44d883f8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 127.1.0.1:/exports/nas-seg-sde
Brick2: 127.1.0.1:/exports/nas-seg-sdf
Brick3: 127.1.0.1:/exports/nas-seg-sdg
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

# /bin/mount -t glusterfs -o acl,log-level=WARNING,noatime 127.1.0.1:/nas-disp-vol /gluster_vol/nas-disp-vol/

# df -h /gluster_vol/nas-disp-vol/
Filesystem               Size  Used Avail Use% Mounted on
127.1.0.1:/nas-disp-vol  512G   12G  501G   3% /gluster_vol/nas-disp-vol
# df /gluster_vol/nas-disp-vol/
Filesystem              1K-blocks     Used Available Use% Mounted on
127.1.0.1:/nas-disp-vol 536608768 12294176 524314592   3% /gluster_vol/nas-disp-vol

# df -h /exports/nas-seg-sd[efg]/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sde        256G  3.4G  253G   2% /exports/nas-seg-sde
/dev/sdf        256G  3.4G  253G   2% /exports/nas-seg-sdf
/dev/sdg        256G  3.4G  253G   2% /exports/nas-seg-sdg
# df /exports/nas-seg-sd[efg]/
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sde       268304384 3462876 264841508   2% /exports/nas-seg-sde
/dev/sdf       268304384 3463448 264840936   2% /exports/nas-seg-sdf
/dev/sdg       268304384 3464024 264840360   2% /exports/nas-seg-sdg

## Initially allocate most of the space to save time:
# fallocate -l 495g /gluster_vol/nas-disp-vol/bigfiles/bigfile.1

## Use up all of the remaining space:
# export DIR=/gluster_vol/nas-disp-vol/eio_files
# for ((dir = 1; dir < 10; dir++)); do \
    mkdir -p $DIR/dir.$dir; \
    for ((try = 1; try < 30; try++)); do \
      (sleep 1; while cat /var/log/messages-20210118 >> $DIR/dir.$dir/file.$try; do true; done) & \
    done; \
  done

[1] 432039
...
[261] 432553

cat: write error: Input/output error
cat: write error: Input/output error
cat: write error: Input/output error
...
cat: write error: Input/output error
cat: write error: No space left on device
cat: write error: No space left on device
cat: write error: No space left on device
cat: write error: Input/output error
...
cat: write error: No space left on device

[1]   Done                    ( sleep 1; while cat /var/log/messages-20210118 >> $DIR/dir.$dir/file.$try; do
    true;
done )
...
[261]+  Done                    ( sleep 1; while cat /var/log/messages-20210118 >> $DIR/dir.$dir/file.$try; do
    true;
done )

## Confirm that all of the space is now in use:
# df -h /gluster_vol/nas-disp-vol/
Filesystem               Size  Used Avail Use% Mounted on
127.1.0.1:/nas-disp-vol  512G  512G     0 100% /gluster_vol/nas-disp-vol

# df /gluster_vol/nas-disp-vol/
Filesystem              1K-blocks      Used Available Use% Mounted on
127.1.0.1:/nas-disp-vol 536608768 536608768         0 100% /gluster_vol/nas-disp-vol

# df -h /exports/nas-seg-sd[efg]/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sde        256G  254G  2.6G 100% /exports/nas-seg-sde
/dev/sdf        256G  254G  2.6G 100% /exports/nas-seg-sdf
/dev/sdg        256G  254G  2.6G 100% /exports/nas-seg-sdg
# df /exports/nas-seg-sd[efg]/
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sde       268304384 265650408   2653976 100% /exports/nas-seg-sde
/dev/sdf       268304384 265642100   2662284 100% /exports/nas-seg-sdf
/dev/sdg       268304384 265656388   2647996 100% /exports/nas-seg-sdg

## Free up some space so Gluster has something to work with:
# rm -f /gluster_vol/nas-disp-vol/bigfiles/bigfile.1
# df -h /gluster_vol/nas-disp-vol/
Filesystem               Size  Used Avail Use% Mounted on
127.1.0.1:/nas-disp-vol  512G   17G  495G   4% /gluster_vol/nas-disp-vol
# df -h /exports/nas-seg-sd[efg]/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sde        256G  5.9G  251G   3% /exports/nas-seg-sde
/dev/sdf        256G  5.9G  251G   3% /exports/nas-seg-sdf
/dev/sdg        256G  5.9G  251G   3% /exports/nas-seg-sdg

## Find the files that are inaccessible due to EIO errors:
# file /gluster_vol/nas-disp-vol/eio_files/dir.*/file.* 2>&1 |fgrep ERROR:
/gluster_vol/nas-disp-vol/eio_files/dir.1/file.11: ERROR: cannot read `/gluster_vol/nas-disp-vol/eio_files/dir.1/file.11' (Input/output error)
/gluster_vol/nas-disp-vol/eio_files/dir.1/file.16: ERROR: cannot open `/gluster_vol/nas-disp-vol/eio_files/dir.1/file.16' (Input/output error) 
/gluster_vol/nas-disp-vol/eio_files/dir.1/file.25: ERROR: cannot read `/gluster_vol/nas-disp-vol/eio_files/dir.1/file.25' (Input/output error)
/gluster_vol/nas-disp-vol/eio_files/dir.1/file.29: ERROR: cannot read `/gluster_vol/nas-disp-vol/eio_files/dir.1/file.29' (Input/output error)
/gluster_vol/nas-disp-vol/eio_files/dir.1/file.7:  ERROR: cannot read `/gluster_vol/nas-disp-vol/eio_files/dir.1/file.7' (Input/output error)
/gluster_vol/nas-disp-vol/eio_files/dir.2/file.10: ERROR: cannot read `/gluster_vol/nas-disp-vol/eio_files/dir.2/file.10' (Input/output error)
...
/gluster_vol/nas-disp-vol/eio_files/dir.9/file.28: ERROR: cannot read `/gluster_vol/nas-disp-vol/eio_files/dir.9/file.28' (Input/output error)
/gluster_vol/nas-disp-vol/eio_files/dir.9/file.6:  ERROR: cannot read `/gluster_vol/nas-disp-vol/eio_files/dir.9/file.6' (Input/output error)
/gluster_vol/nas-disp-vol/eio_files/dir.9/file.7:  ERROR: cannot read `/gluster_vol/nas-disp-vol/eio_files/dir.9/file.7' (Input/output error)

## Try to heal things:
# gluster volume heal nas-disp-vol full
# gluster volume heal nas-disp-vol info
Brick 127.1.0.1:/exports/nas-seg-sde
<gfid:34a8dcc1-cccb-473d-bcf5-9503b3df7b10>/file.2
/eio_files/dir.7/file.7
/eio_files/dir.4/file.22
/eio_files/dir.9/file.20
/eio_files/dir.4/file.6
/eio_files/dir.5/file.29
/eio_files/dir.5/file.11
<gfid:740c2c9e-b763-4f79-9d6c-7cbeaee5af8d>/file.8
/eio_files/dir.7/file.21
<gfid:5b8f4fa4-d671-414c-b9f4-aac47b6a4fed>/file.23
/eio_files/dir.7/file.8
/eio_files/dir.8/file.3
/eio_files/dir.5/file.12
/eio_files/dir.5/file.21
/eio_files/dir.4/file.28
/eio_files/dir.7/file.6
/eio_files/dir.6/file.1
/eio_files/dir.6/file.5
/eio_files/dir.3/file.8
/eio_files/dir.9/file.28
/eio_files/dir.1/file.29
/eio_files/dir.7/file.1
/eio_files/dir.8/file.29
/eio_files/dir.7/file.22
/eio_files/dir.6/file.9
<gfid:1d90609b-4151-4ab3-aa94-90dcbce494ca>/file.24
/eio_files/dir.9/file.21
/eio_files/dir.9/file.19
/eio_files/dir.4/file.24
/eio_files/dir.3/file.15
/eio_files/dir.8/file.13
<gfid:5b8f4fa4-d671-414c-b9f4-aac47b6a4fed>/file.22
/eio_files/dir.9/file.27
/eio_files/dir.6/file.13
/eio_files/dir.9/file.6
/eio_files/dir.3/file.22
/eio_files/dir.7/file.20
/eio_files/dir.1/file.16
/eio_files/dir.4/file.9
/eio_files/dir.7/file.27
/eio_files/dir.3/file.12
/eio_files/dir.6/file.19
/eio_files/dir.8/file.9
/eio_files/dir.8/file.5
/eio_files/dir.6/file.14
/eio_files/dir.4/file.1
<gfid:740c2c9e-b763-4f79-9d6c-7cbeaee5af8d>/file.9
/eio_files/dir.9/file.1
/eio_files/dir.7/file.10
/eio_files/dir.7/file.11
/eio_files/dir.2/file.10
/eio_files/dir.4/file.4
/eio_files/dir.1/file.11
/eio_files/dir.5/file.20
/eio_files/dir.2/file.29
<gfid:8bd5d08f-4c04-47ad-b170-e1893467bc3e>/file.26
/eio_files/dir.6/file.26
/eio_files/dir.9/file.7
/eio_files/dir.6/file.7
/eio_files/dir.6/file.21
/eio_files/dir.1/file.25
/eio_files/dir.4/file.29
/eio_files/dir.5/file.26
/eio_files/dir.1/file.7
/eio_files/dir.8/file.27
<gfid:1d90609b-4151-4ab3-aa94-90dcbce494ca>/file.26
/eio_files/dir.4/file.14
/eio_files/dir.7/file.2
/eio_files/dir.6/file.22
/eio_files/dir.8/file.14
<gfid:4ffe3203-5ca5-4419-9b32-9d5cb5d748c7>/file.29
/eio_files/dir.4/file.23
/eio_files/dir.5/file.24
/eio_files/dir.4/file.7
/eio_files/dir.5/file.19
/eio_files/dir.5/file.7
Status: Connected
Number of entries: 76

Brick 127.1.0.1:/exports/nas-seg-sdf
/eio_files/dir.6/file.9
/eio_files/dir.9/file.7
<gfid:34a8dcc1-cccb-473d-bcf5-9503b3df7b10>/file.2
/eio_files/dir.7/file.22
/eio_files/dir.8/file.3
/eio_files/dir.7/file.1
/eio_files/dir.7/file.8
/eio_files/dir.9/file.20
/eio_files/dir.7/file.27
/eio_files/dir.4/file.28
<gfid:5b8f4fa4-d671-414c-b9f4-aac47b6a4fed>/file.22
/eio_files/dir.5/file.12
/eio_files/dir.6/file.1
/eio_files/dir.9/file.21
/eio_files/dir.4/file.6
/eio_files/dir.9/file.19
/eio_files/dir.4/file.7
/eio_files/dir.2/file.29
/eio_files/dir.4/file.22
<gfid:4ffe3203-5ca5-4419-9b32-9d5cb5d748c7>/file.29
/eio_files/dir.5/file.11
/eio_files/dir.1/file.29
/eio_files/dir.5/file.21
/eio_files/dir.3/file.22
/eio_files/dir.4/file.24
/eio_files/dir.9/file.6
/eio_files/dir.8/file.13
/eio_files/dir.9/file.27
/eio_files/dir.4/file.4
/eio_files/dir.2/file.10
/eio_files/dir.7/file.10
/eio_files/dir.7/file.21
/eio_files/dir.6/file.19
/eio_files/dir.6/file.13
/eio_files/dir.3/file.15
/eio_files/dir.7/file.20
/eio_files/dir.5/file.29
/eio_files/dir.4/file.9
<gfid:1d90609b-4151-4ab3-aa94-90dcbce494ca>/file.26
/eio_files/dir.4/file.14
/eio_files/dir.7/file.11
<gfid:740c2c9e-b763-4f79-9d6c-7cbeaee5af8d>/file.8
/eio_files/dir.4/file.1
/eio_files/dir.9/file.1
/eio_files/dir.6/file.26
/eio_files/dir.6/file.5
/eio_files/dir.7/file.7
/eio_files/dir.8/file.9
/eio_files/dir.6/file.14
<gfid:740c2c9e-b763-4f79-9d6c-7cbeaee5af8d>/file.9
/eio_files/dir.3/file.8
/eio_files/dir.9/file.28
<gfid:5b8f4fa4-d671-414c-b9f4-aac47b6a4fed>/file.23
/eio_files/dir.6/file.22
/eio_files/dir.7/file.6
/eio_files/dir.7/file.2
/eio_files/dir.6/file.21
/eio_files/dir.3/file.12
/eio_files/dir.1/file.11
/eio_files/dir.8/file.14
/eio_files/dir.5/file.7
/eio_files/dir.4/file.23
/eio_files/dir.5/file.26
/eio_files/dir.6/file.7
/eio_files/dir.5/file.19
/eio_files/dir.8/file.5
/eio_files/dir.8/file.27
<gfid:1d90609b-4151-4ab3-aa94-90dcbce494ca>/file.24
<gfid:8bd5d08f-4c04-47ad-b170-e1893467bc3e>/file.26
/eio_files/dir.5/file.24
/eio_files/dir.8/file.29
/eio_files/dir.4/file.29
/eio_files/dir.5/file.20
/eio_files/dir.1/file.16
/eio_files/dir.1/file.25
/eio_files/dir.1/file.7
Status: Connected
Number of entries: 76

Brick 127.1.0.1:/exports/nas-seg-sdg
/eio_files/dir.4/file.1
/eio_files/dir.7/file.7
/eio_files/dir.9/file.21
/eio_files/dir.6/file.5
/eio_files/dir.6/file.9
/eio_files/dir.4/file.22
/eio_files/dir.5/file.21
/eio_files/dir.9/file.20
/eio_files/dir.5/file.11
<gfid:740c2c9e-b763-4f79-9d6c-7cbeaee5af8d>/file.8
/eio_files/dir.9/file.6
/eio_files/dir.7/file.1
/eio_files/dir.7/file.8
/eio_files/dir.1/file.29
/eio_files/dir.7/file.10
/eio_files/dir.2/file.29
/eio_files/dir.8/file.13
/eio_files/dir.4/file.6
/eio_files/dir.4/file.24
/eio_files/dir.3/file.15
/eio_files/dir.4/file.23
/eio_files/dir.9/file.27
/eio_files/dir.8/file.9
/eio_files/dir.5/file.12
/eio_files/dir.9/file.28
/eio_files/dir.3/file.22
/eio_files/dir.2/file.10
/eio_files/dir.9/file.19
/eio_files/dir.6/file.13
/eio_files/dir.8/file.14
/eio_files/dir.7/file.27
/eio_files/dir.7/file.22
<gfid:34a8dcc1-cccb-473d-bcf5-9503b3df7b10>/file.2
/eio_files/dir.5/file.29
/eio_files/dir.7/file.21
/eio_files/dir.4/file.4
/eio_files/dir.6/file.19
/eio_files/dir.7/file.11
<gfid:5b8f4fa4-d671-414c-b9f4-aac47b6a4fed>/file.22
/eio_files/dir.7/file.2
/eio_files/dir.7/file.20
/eio_files/dir.6/file.26
/eio_files/dir.4/file.9
<gfid:8bd5d08f-4c04-47ad-b170-e1893467bc3e>/file.26
/eio_files/dir.3/file.12
<gfid:4ffe3203-5ca5-4419-9b32-9d5cb5d748c7>/file.29
/eio_files/dir.6/file.1
/eio_files/dir.1/file.11
/eio_files/dir.8/file.3
/eio_files/dir.6/file.14
/eio_files/dir.4/file.28
<gfid:1d90609b-4151-4ab3-aa94-90dcbce494ca>/file.24
<gfid:5b8f4fa4-d671-414c-b9f4-aac47b6a4fed>/file.23
/eio_files/dir.4/file.29
/eio_files/dir.1/file.7
/eio_files/dir.9/file.7
/eio_files/dir.6/file.21
/eio_files/dir.5/file.7
<gfid:740c2c9e-b763-4f79-9d6c-7cbeaee5af8d>/file.9
/eio_files/dir.4/file.14
/eio_files/dir.9/file.1
/eio_files/dir.5/file.24
/eio_files/dir.6/file.7
/eio_files/dir.5/file.26
/eio_files/dir.6/file.22
/eio_files/dir.3/file.8
/eio_files/dir.1/file.25
<gfid:1d90609b-4151-4ab3-aa94-90dcbce494ca>/file.26
/eio_files/dir.5/file.19
/eio_files/dir.5/file.20
/eio_files/dir.4/file.7
/eio_files/dir.8/file.29
/eio_files/dir.8/file.5
/eio_files/dir.1/file.16
/eio_files/dir.7/file.6
/eio_files/dir.8/file.27
Status: Connected
Number of entries: 76

## Healing didn't help:
# file /gluster_vol/nas-disp-vol/eio_files/dir.*/file.* 2>&1 |fgrep ERROR: | wc -l
67

# gluster volume heal nas-disp-vol info|tail
/eio_files/dir.5/file.20
/eio_files/dir.4/file.7
/eio_files/dir.8/file.29
/eio_files/dir.8/file.5
/eio_files/dir.1/file.16
/eio_files/dir.7/file.6
/eio_files/dir.8/file.27
Status: Connected
Number of entries: 76
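
## Note (not run in this reproduction): "gluster volume heal <vol> info summary"
## should give per-brick pending-heal counts without listing every entry,
## which is easier to watch while retrying heals:
# gluster volume heal nas-disp-vol info summary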

## See if stopping/starting the volume will help:
# gluster volume stop nas-disp-vol
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: nas-disp-vol: success
# gluster volume start nas-disp-vol
volume start: nas-disp-vol: success

## Nope!
# file /gluster_vol/nas-disp-vol/eio_files/dir.*/file.* 2>&1 |fgrep ERROR: | wc -l
67

# gluster volume heal nas-disp-vol info|head
Brick 127.1.0.1:/exports/nas-seg-sde
<gfid:f6942289-b181-426d-86cb-a1be0735de99>
/eio_files/dir.7/file.7
/eio_files/dir.4/file.22
/eio_files/dir.9/file.20
/eio_files/dir.4/file.6
/eio_files/dir.5/file.29
/eio_files/dir.5/file.11
<gfid:c1483db1-c7f3-42be-a1eb-6833dcdd2552>
/eio_files/dir.7/file.21

## Check a bad file:
# getfattr -m. -d -e hex /exports/nas-seg-sd*/eio_files/dir.7/file.7
getfattr: Removing leading '/' from absolute path names
# file: exports/nas-seg-sde/eio_files/dir.7/file.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x00000000013785d8
trusted.ec.version=0x00000000000000b400000000000000b4
trusted.gfid=0x76c7cf4c1b1a4448ae49e006d70e636b
trusted.gfid2path.aa639e8db8d5c96f=0x38373334393537622d333133302d346637652d613039652d3666623936636464613337362f66696c652e37
trusted.glusterfs.mdata=0x010000000000000000000000006006200d0000000024e8ac5f000000006006200d0000000024e8ac5f0000000060061f68000000002e12c736

# file: exports/nas-seg-sdf/eio_files/dir.7/file.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x00000000013785d8
trusted.ec.version=0x00000000000000b400000000000000b4
trusted.gfid=0x76c7cf4c1b1a4448ae49e006d70e636b
trusted.gfid2path.aa639e8db8d5c96f=0x38373334393537622d333133302d346637652d613039652d3666623936636464613337362f66696c652e37
trusted.glusterfs.mdata=0x010000000000000000000000006006200d00000000027308ba000000006006200d00000000027308ba0000000060061f68000000002e12c736

# file: exports/nas-seg-sdg/eio_files/dir.7/file.7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000010000000000000001
trusted.ec.size=0x00000000013785d8
trusted.ec.version=0x00000000000000b400000000000000b4
trusted.gfid=0x76c7cf4c1b1a4448ae49e006d70e636b
trusted.gfid2path.aa639e8db8d5c96f=0x38373334393537622d333133302d346637652d613039652d3666623936636464613337362f66696c652e37
trusted.glusterfs.mdata=0x010000000000000000000000006006200e0000000002789574000000006006200e00000000027895740000000060061f68000000002e12c736

# ls -l /exports/nas-seg-sd*/eio_files/dir.7/file.7
-rw-r--r--. 2 root root 10469376 Jan 18 15:55 /exports/nas-seg-sde/eio_files/dir.7/file.7
-rw-r--r--. 2 root root 10403840 Jan 18 15:55 /exports/nas-seg-sdf/eio_files/dir.7/file.7
-rw-r--r--. 2 root root 10491904 Jan 18 15:55 /exports/nas-seg-sdg/eio_files/dir.7/file.7
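
## Note (not run in this reproduction): the trusted.ec.* values above are plain
## hex and can be decoded for comparison, e.g. with printf:
# printf '%d\n' 0x00000000013785d8    ## trusted.ec.size = logical file size in bytes
20415960
## With 2 data bricks, each fragment should hold roughly ec.size / 2 (~10.2 MB);
## the fragment sizes in the ls -l output above differ from each other, which
## fits the out-of-sync appends suspected here.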

## Check another bad file:
# getfattr -m. -d -e hex /exports/nas-seg-sd*/eio_files/dir.4/file.6
getfattr: Removing leading '/' from absolute path names
# file: exports/nas-seg-sde/eio_files/dir.4/file.6
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.size=0x00000000013785d8
trusted.ec.version=0x00000000000000b400000000000000b4
trusted.gfid=0x69222b676ae640bcb9c4c52d0f428a6d
trusted.gfid2path.23fcf9f3291da27a=0x31366130303133332d383465312d346236372d613266382d6139613063376434303335392f66696c652e36
trusted.glusterfs.mdata=0x010000000000000000000000006006200d0000000025626e17000000006006200d0000000025626e170000000060061f6900000000023340e6

# file: exports/nas-seg-sdf/eio_files/dir.4/file.6
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.size=0x00000000013785d8
trusted.ec.version=0x00000000000000b400000000000000b4
trusted.gfid=0x69222b676ae640bcb9c4c52d0f428a6d
trusted.gfid2path.23fcf9f3291da27a=0x31366130303133332d383465312d346236372d613266382d6139613063376434303335392f66696c652e36
trusted.glusterfs.mdata=0x010000000000000000000000006006200c000000001854df6b000000006006200c000000001854df6b0000000060061f6900000000023340e6

# file: exports/nas-seg-sdg/eio_files/dir.4/file.6
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000020000000000000002
trusted.ec.size=0x00000000013785d8
trusted.ec.version=0x00000000000000b400000000000000b4
trusted.gfid=0x69222b676ae640bcb9c4c52d0f428a6d
trusted.gfid2path.23fcf9f3291da27a=0x31366130303133332d383465312d346236372d613266382d6139613063376434303335392f66696c652e36
trusted.glusterfs.mdata=0x010000000000000000000000006006200e00000000028a082a000000006006200e00000000028a082a0000000060061f6900000000023340e6

# ls -l /exports/nas-seg-sd*/eio_files/dir.4/file.6
-rw-r--r--. 2 root root 10338304 Jan 18 15:55 /exports/nas-seg-sde/eio_files/dir.4/file.6
-rw-r--r--. 2 root root 10272768 Jan 18 15:55 /exports/nas-seg-sdf/eio_files/dir.4/file.6
-rw-r--r--. 2 root root 10403840 Jan 18 15:55 /exports/nas-seg-sdg/eio_files/dir.4/file.6

## Check a good file:
# file /gluster_vol/nas-disp-vol/eio_files/dir.1/file.1
/gluster_vol/nas-disp-vol/eio_files/dir.1/file.1: ASCII text, with very long lines

# getfattr -m. -d -e hex /exports/nas-seg-sd*/eio_files/dir.1/file.1
getfattr: Removing leading '/' from absolute path names
# file: exports/nas-seg-sde/eio_files/dir.1/file.1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000001557000
trusted.ec.version=0x00000000000000c700000000000000c7
trusted.gfid=0x5c903feeaf8049388bd6781ed9d5cbca
trusted.gfid2path.4c6a57034b963fb3=0x33303262656331312d393730372d343337312d383839632d3033613030333663393862302f66696c652e31
trusted.glusterfs.mdata=0x010000000000000000000000006006200d00000000357cbb92000000006006200d00000000357cbb920000000060061f670000000031f957a9

# file: exports/nas-seg-sdf/eio_files/dir.1/file.1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000001557000
trusted.ec.version=0x00000000000000c700000000000000c7
trusted.gfid=0x5c903feeaf8049388bd6781ed9d5cbca
trusted.gfid2path.4c6a57034b963fb3=0x33303262656331312d393730372d343337312d383839632d3033613030333663393862302f66696c652e31
trusted.glusterfs.mdata=0x010000000000000000000000006006200d00000000357cbb92000000006006200d00000000357cbb920000000060061f670000000031f957a9

# file: exports/nas-seg-sdg/eio_files/dir.1/file.1
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x0000000001557000
trusted.ec.version=0x00000000000000c700000000000000c7
trusted.gfid=0x5c903feeaf8049388bd6781ed9d5cbca
trusted.gfid2path.4c6a57034b963fb3=0x33303262656331312d393730372d343337312d383839632d3033613030333663393862302f66696c652e31
trusted.glusterfs.mdata=0x010000000000000000000000006006200d00000000357cbb92000000006006200d00000000357cbb920000000060061f670000000031f957a9

# ls -l /exports/nas-seg-sd*/eio_files/dir.1/file.1
-rw-r--r--. 2 root root 11188224 Jan 18 15:55 /exports/nas-seg-sde/eio_files/dir.1/file.1
-rw-r--r--. 2 root root 11188224 Jan 18 15:55 /exports/nas-seg-sdf/eio_files/dir.1/file.1
-rw-r--r--. 2 root root 11188224 Jan 18 15:55 /exports/nas-seg-sdg/eio_files/dir.1/file.1
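
## Note (not run in this reproduction): a quick way to spot which files are out
## of sync on the backend is to compare fragment sizes across the bricks; the
## healthy file above has identical fragment sizes on all three bricks, while
## the two EIO files checked above do not. A sketch, assuming the brick paths
## used in this report:
# for f in /exports/nas-seg-sde/eio_files/dir.*/file.*; do \
    rel=${f#/exports/nas-seg-sde/}; \
    n=$(stat -c %s /exports/nas-seg-sd[efg]/"$rel" 2>/dev/null | sort -u | wc -l); \
    [ "$n" -gt 1 ] && echo "fragments differ: $rel"; \
  done
## A non-zero trusted.ec.dirty on the fragments is another hint that a file
## still needs heal.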

## For healing to succeed, the EIO files need to be deleted. This is
## unfortunate, as although the files had append write failures,
## the first part of the files would still be intact and usable.
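
## (Not tried here.) Where the open still succeeds ("cannot read" rather than
## "cannot open" in the file(1) output above), the readable prefix could
## perhaps be salvaged before deleting, e.g.:
# dd if=/gluster_vol/nas-disp-vol/eio_files/dir.7/file.7 \
     of=/tmp/file.7.partial bs=1M conv=noerror
## conv=noerror makes dd continue past read errors, so whatever it copied
## before the first EIO is the recoverable prefix. Only a sketch, not verified.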

## But in some cases, even after deleting all of the files,
## healing never becomes successful:

# rm -rf /gluster_vol/nas-disp-vol/eio_files/dir.*
# gluster volume heal nas-disp-vol full
# gluster volume heal nas-disp-vol info
Brick 127.1.0.1:/exports/nas-seg-sde
<gfid:f6942289-b181-426d-86cb-a1be0735de99>
<gfid:c1483db1-c7f3-42be-a1eb-6833dcdd2552>
<gfid:0696beae-dff3-4abe-9b84-7c31194b3a24>
<gfid:ee4e25c7-19a0-44d6-9610-b5c7008b2f4e>
<gfid:58f0024e-932e-41c6-b1cc-b29ea97be730>
<gfid:a58fff89-bdc7-4017-a32d-26a0bc13df4c>
<gfid:950e0b66-f683-4a72-a8db-e08e3b85f448>
<gfid:8091aa0a-4b6e-45b2-ba42-223a9ddcdc93>
<gfid:2b4e10c2-174c-469b-9fee-a984f73a810f>
Status: Connected
Number of entries: 9

Brick 127.1.0.1:/exports/nas-seg-sdf
<gfid:f6942289-b181-426d-86cb-a1be0735de99>
<gfid:58f0024e-932e-41c6-b1cc-b29ea97be730>
<gfid:2b4e10c2-174c-469b-9fee-a984f73a810f>
<gfid:8091aa0a-4b6e-45b2-ba42-223a9ddcdc93>
<gfid:c1483db1-c7f3-42be-a1eb-6833dcdd2552>
<gfid:a58fff89-bdc7-4017-a32d-26a0bc13df4c>
<gfid:0696beae-dff3-4abe-9b84-7c31194b3a24>
<gfid:ee4e25c7-19a0-44d6-9610-b5c7008b2f4e>
<gfid:950e0b66-f683-4a72-a8db-e08e3b85f448>
Status: Connected
Number of entries: 9

Brick 127.1.0.1:/exports/nas-seg-sdg
<gfid:c1483db1-c7f3-42be-a1eb-6833dcdd2552>
<gfid:f6942289-b181-426d-86cb-a1be0735de99>
<gfid:58f0024e-932e-41c6-b1cc-b29ea97be730>
<gfid:950e0b66-f683-4a72-a8db-e08e3b85f448>
<gfid:2b4e10c2-174c-469b-9fee-a984f73a810f>
<gfid:ee4e25c7-19a0-44d6-9610-b5c7008b2f4e>
<gfid:0696beae-dff3-4abe-9b84-7c31194b3a24>
<gfid:a58fff89-bdc7-4017-a32d-26a0bc13df4c>
<gfid:8091aa0a-4b6e-45b2-ba42-223a9ddcdc93>
Status: Connected
Number of entries: 9

## Apparently when this happens, the volume needs to be
## deleted and recreated to ever be healthy again?
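
## (Not tried here.) The leftover <gfid:...> entries can usually be traced on
## the bricks under .glusterfs/<aa>/<bb>/<gfid>; if nothing exists there any
## more, they are probably just stale heal-index entries rather than real data.
## A sketch for the first gfid listed above:
# g=f6942289-b181-426d-86cb-a1be0735de99
# ls -l /exports/nas-seg-sd[efg]/.glusterfs/${g:0:2}/${g:2:2}/$g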

**Mandatory info:**
# gluster volume info nas-disp-vol
Volume Name: nas-disp-vol
Type: Disperse
Volume ID: 350f7f4b-ef91-4ef2-bf5a-644c44d883f8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 127.1.0.1:/exports/nas-seg-sde
Brick2: 127.1.0.1:/exports/nas-seg-sdf
Brick3: 127.1.0.1:/exports/nas-seg-sdg
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

# gluster volume status nas-disp-vol
Status of volume: nas-disp-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 127.1.0.1:/exports/nas-seg-sde        49152     0          Y       442620
Brick 127.1.0.1:/exports/nas-seg-sdf        49153     0          Y       442636
Brick 127.1.0.1:/exports/nas-seg-sdg        49154     0          Y       442652
Self-heal Daemon on localhost               N/A       N/A        Y       442669

Task Status of Volume nas-disp-vol
------------------------------------------------------------------------------
There are no active volume tasks

No core files.

glusterfs 8.3
# uname -a
Linux Centos8x 4.18.0-240.1.1.el8_3.x86_64 #1 SMP Thu Nov 19 17:20:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

@stale

stale bot commented Aug 17, 2021

Thank you for your contributions.
We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added the wontfix Managed by stale[bot] label Aug 17, 2021
@JeffByers-SF
Author

Just because nobody is able to work on this does not mean that this is not a valid issue. This should not be summarily closed by a bot!

@mohit84
Contributor

mohit84 commented Aug 17, 2021

I will try to reproduce and update.

@stale stale bot removed the wontfix Managed by stale[bot] label Aug 17, 2021
@aspandey
Member

This is expected behavior.
While I/O was in progress and space was about to run out, some write fops failed on one brick, while for other files they failed on a different brick. As this is a 2+1 EC volume, redundancy is very low as well (though in this case it would not have helped anyway).
So at this point there can be files with different xattrs (mainly trusted.ec.size and trusted.ec.version) on the three bricks, and that is what causes the EIO error.
Even for the few files that could be healed, SHD is not able to do so because there is no space left; it also gets an ENOSPC error.

This is not the right way of configuring bricks; you should set a limit on the bricks so that you get a warning before they fill up completely.
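
For example, something along these lines might work (not set on this volume, and untested here; check the exact option names and values against the docs for the release in use):

# gluster volume set nas-disp-vol storage.reserve 5
# gluster volume quota nas-disp-vol enable
# gluster volume quota nas-disp-vol limit-usage / 480GB

storage.reserve keeps a percentage of each brick free so writes start failing with ENOSPC before the bricks are completely full, and the quota limit can be set below the raw 512G capacity so the application hits the limit while there is still headroom.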

Now, to resolve this you would need to create space on all the bricks (disks), which does not look possible here. So you may have to delete some data manually from the backend; I would delete the files that are causing EIO.
This certainly means data loss, but it is all we can do at this point, to my understanding.

@mohit84
Contributor

mohit84 commented Aug 19, 2021

Thanks, Ashish, for your response.

@stale

stale bot commented Mar 17, 2022

Thank you for your contributions.
We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added the wontfix Managed by stale[bot] label Mar 17, 2022
@stale

stale bot commented Apr 3, 2022

Closing this issue as there has been no update since my last update on the issue. If this issue is still valid, feel free to reopen it.

@stale stale bot closed this as completed Apr 3, 2022