mount: add retry for read only case #4416
Merged
What problem are we solving?
When the free disk space on a volume server drops below "minfreespace" (default 1%), "isDiskSpaceLow" is set to true and all volumes on that node become read only.
However, it takes 5 seconds from the moment a volume becomes read only until the master perceives "read only" through the heartbeat information. If the master assigns a fid to a volume on this node during that window, the write fails.
According to my analysis, the write failures in #4381, #3628, and #3345 were caused by this.
How are we solving the problem?
We add a retry to mount for this situation. On retry, the master can assign fids on other nodes, which avoids the write failure.
How is the PR tested?
env: more than 4 volume server nodes, ReplicaPlacement=002
While writing data through mount, use large files to fill up the disk of one volume server, so that all volumes on that node become read only. The mount then retries, and the write succeeds once the master assigns the fid to another node.
Checks