
[bug:1784402] storage.reserve ignored by self-heal so that bricks are 100% full #869

Closed
gluster-ant opened this issue Mar 12, 2020 · 8 comments
Labels: Migrated, Type:Bug, wontfix (Managed by stale[bot])

Comments

@gluster-ant
Collaborator

URL: https://bugzilla.redhat.com/1784402
Creator: david.spisla at iternity
Time: 20191217T11:16:35

Created attachment 1645849
Gluster vol info and status, df -hT, heal info, logs of glfsheal and all related bricks

Description of problem:
Setup: 3-node VMware cluster (2 storage nodes and 1 arbiter node), distributed-replicate 2 volume with 1 arbiter brick per replica pair (see the attached file for the detailed configuration).

Version-Release number of selected component (if applicable):
GlusterFS v5.10

How reproducible:
Steps to Reproduce:

  1. Mount volume from a dedicated client machine
  2. Disable network of node 2
  3. Write via node 1 into the volume until it is full. The storage.reserve limit of the local bricks should take effect, so the bricks should end up roughly 1% empty (the reserved space stays free).
  4. Disable network of node 1
  5. Enable network of node 2
  6. Write via node 2 into the same volume, but write the data into another subfolder or use completely different data; otherwise one would get a split-brain error, which is not the issue here. Again, write data until the bricks reach the storage.reserve limit.
  7. Now the volume is filled up with twice the amount of data
  8. Enable network of node 1

Actual results:
storage.reserve was ignored and all bricks were 100% full within a few seconds. All brick processes died. The volume cannot be mounted and a heal cannot be triggered.

Expected results:
The self-heal process should be blocked by storage.reserve, the brick processes should keep running, and the volume should remain accessible.

Additional info:
See attached file

The above scenario was not only reproduced on a VM cluster; we could also observe it on a real hardware cluster.

@gluster-ant
Collaborator Author

Time: 20191223T04:35:51
sankarshan.mukhopadhyay at gmail commented:
Questions for the assigned maintainer/developer: (1) can this be reproduced in a newer release? (2) is this something that was known for this specific release as reported? Please review (2) in terms of how, if at all, a recovery sequence can be made available so as not to cause this space exhaustion issue.

@gluster-ant
Collaborator Author

Time: 20191223T11:09:17
ravishankar at redhat commented:
I think this behaviour is peculiar to arbiter volumes (as opposed to replica 3), as the arbiter does not store file data. If it had been a normal replica 3, then step 6 in the description would have failed because node 3 would have been full. Mohit, what is your take on the bug?

@gluster-ant
Collaborator Author

Time: 20191223T12:15:28
moagrawa at redhat commented:
The storage.reserve restriction check is applicable only to external clients, not to internal clients.
I think it is the internal client's responsibility to check disk space before writing the data.
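
To make the distinction concrete, here is a minimal, self-contained C sketch of a brick-side reserve check that exempts internal clients. It is illustrative only and is not Gluster's actual source: the struct, field names, and the specific PID values are assumptions made for this example; the only idea carried over from the thread is that internal daemons (self-heal, rebalance) are treated differently from external clients, e.g. via a negative client PID.

```c
/* Illustrative sketch only -- not GlusterFS source code.
 * Shows a reserve check that rejects writes from external clients
 * once free space drops below the reserve, while internal daemons
 * (self-heal, rebalance) bypass the check entirely. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption for illustration: internal daemons are identified by a
 * negative client PID. */
struct fop_ctx {
    int  client_pid;       /* >= 0: external client, < 0: internal daemon   */
    bool disk_space_full;  /* set when free space < the storage.reserve limit */
};

/* Return 0 to let the write proceed, -ENOSPC to reject it. */
static int reserve_check(const struct fop_ctx *ctx)
{
    if (ctx->client_pid >= 0 && ctx->disk_space_full)
        return -ENOSPC;    /* external client hits the reserve limit */
    return 0;              /* internal clients bypass the check      */
}

int main(void)
{
    struct fop_ctx external = { .client_pid = 1234, .disk_space_full = true };
    struct fop_ctx selfheal = { .client_pid = -6,   .disk_space_full = true };

    printf("external write: %d\n", reserve_check(&external)); /* -ENOSPC       */
    printf("self-heal write: %d\n", reserve_check(&selfheal)); /* 0: proceeds   */
    return 0;
}
```

Under a check of this shape, a self-heal write goes through even when the brick is already at its reserve limit, which is consistent with the behaviour reported above.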

@gluster-ant
Collaborator Author

Time: 20191224T04:46:03
ravishankar at redhat commented:
That would be a leaky abstraction for an option that is brick-specific. It looks like you added the check for internal clients via BZ 1506083, but I can't find any specific problem described in the BZ. One problem is that if we also subject writes from self-heal to the same check, then in the case described in this bug, heals would never be able to complete. But that is no different from the case where this option is not enabled and I/O is pumped until the disk is full. So maybe we should allow internal clients as well?

@gluster-ant
Collaborator Author

Time: 20191224T04:49:24
ravishankar at redhat commented:
(In reply to Ravishankar N from comment #4)

> So maybe we should allow internal clients as well?

Sorry, I meant we should not allow internal clients either.

@gluster-ant
Collaborator Author

Time: 20191224T05:12:13
moagrawa at redhat commented:
We can't reject fops from internal clients; otherwise there would have been no need to implement this feature.
We exempted internal clients from the check because the feature was primarily implemented for the rebalance daemon:
at the time of adding a brick, the rebalance daemon needs some space at the backend for rebalancing the data, so
we put in a check to skip the restriction for internal clients.

@stale

stale bot commented Oct 8, 2020

Thank you for your contributions.
We noticed that this issue has not had any activity in the last ~6 months, so we are marking it as stale.
It will be closed in 2 weeks if no one responds with a comment here.

stale bot added the wontfix (Managed by stale[bot]) label on Oct 8, 2020
@stale

stale bot commented Oct 23, 2020

Closing this issue as there has been no update since my last update on the issue. If this issue is still valid, feel free to reopen it.

stale bot closed this as completed on Oct 23, 2020