Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mount: add retry for read only case #4416

Merged
merged 3 commits into from
Apr 21, 2023

Conversation

wusongANKANG
Copy link
Contributor

What problem are we solving?

When the disk space corresponding to a volume server is less than "minfreespace" (default 1%), "isDiskSpaceLow" will be set to true, and all volumes corresponding to the node are read only.

However, it takes 5 seconds for the volume to become read only until the master perceives "read only" through the heartbeat information. During this period, if the master allocates fid to the volume on this node, the write will fail.

According to my analysis, #4381#3628 , #3345 failed to write because of this reason.

How are we solving the problem?

We add retry to mount for this situation, so that the master can assign fids to other nodes to avoid write failures.

How is the PR tested?

env: more than 4 volume server nodes, ReplicaPlacement=002

When writing data through mount, use large files to fill up the disk corresponding to a volume server. At this time, all volumes on the node are read only. At this time, the mount will retry, and the write will succeed after the master assigns the fid to other nodes.

Checks

  • I have added unit tests if possible.
  • I will add related wiki document changes and link to this PR after merging.

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>
@chrislusf
Copy link
Collaborator

Nice finding!

For this fix, the application logic is leaked into the util function. Is there any less hacky way to fix this?

@wusongANKANG
Copy link
Contributor Author

@chrislusf At present it seems that the only way to solve this problem is by retrying.

In order to avoid affecting the original util function logic, would it be better to write a new util retry function for retrying mount upload data?

@chrislusf
Copy link
Collaborator

you can create a retry function with a list of errors that should be retried.

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>
@wusongANKANG
Copy link
Contributor Author

@chrislusf added

@chrislusf
Copy link
Collaborator

the retry function can take a list of errors as input parameter. If these errors are encountered, the operation should retry.

the recent change hard coded the errors, which is not so nice.

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>
@wusongANKANG
Copy link
Contributor Author

@chrislusf error list added

@chrislusf chrislusf merged commit 19245dd into seaweedfs:master Apr 21, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants