mount: add retry for read only case #4416

wusongANKANG · 2023-04-21T04:21:27Z

What problem are we solving?

When the disk space corresponding to a volume server is less than "minfreespace" (default 1%), "isDiskSpaceLow" will be set to true, and all volumes corresponding to the node are read only.

However, it takes 5 seconds for the volume to become read only until the master perceives "read only" through the heartbeat information. During this period, if the master allocates fid to the volume on this node, the write will fail.

According to my analysis, #4381 ， #3628 , #3345 failed to write because of this reason.

How are we solving the problem?

We add retry to mount for this situation, so that the master can assign fids to other nodes to avoid write failures.

How is the PR tested?

env: more than 4 volume server nodes, ReplicaPlacement=002

When writing data through mount, use large files to fill up the disk corresponding to a volume server. At this time, all volumes on the node are read only. At this time, the mount will retry, and the write will succeed after the master assigns the fid to other nodes.

Checks

I have added unit tests if possible.
I will add related wiki document changes and link to this PR after merging.

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>

chrislusf · 2023-04-21T05:11:45Z

Nice finding!

For this fix, the application logic is leaked into the util function. Is there any less hacky way to fix this?

wusongANKANG · 2023-04-21T05:29:51Z

@chrislusf At present it seems that the only way to solve this problem is by retrying.

In order to avoid affecting the original util function logic, would it be better to write a new util retry function for retrying mount upload data?

chrislusf · 2023-04-21T05:34:21Z

you can create a retry function with a list of errors that should be retried.

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>

wusongANKANG · 2023-04-21T05:44:23Z

@chrislusf added

chrislusf · 2023-04-21T05:54:36Z

the retry function can take a list of errors as input parameter. If these errors are encountered, the operation should retry.

the recent change hard coded the errors, which is not so nice.

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>

wusongANKANG · 2023-04-21T06:51:06Z

@chrislusf error list added

mount: add retry for read only case

834dcca

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>

add new util retry function for mount

eb8e0d4

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>

change error list param

c46df76

Signed-off-by: Wusong Wang <wangwusong@virtaitech.com>

chrislusf merged commit 19245dd into seaweedfs:master Apr 21, 2023
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mount: add retry for read only case #4416

mount: add retry for read only case #4416

wusongANKANG commented Apr 21, 2023

chrislusf commented Apr 21, 2023

wusongANKANG commented Apr 21, 2023

chrislusf commented Apr 21, 2023

wusongANKANG commented Apr 21, 2023

chrislusf commented Apr 21, 2023

wusongANKANG commented Apr 21, 2023

mount: add retry for read only case #4416

mount: add retry for read only case #4416

Conversation

wusongANKANG commented Apr 21, 2023

What problem are we solving?

How are we solving the problem?

How is the PR tested?

Checks

chrislusf commented Apr 21, 2023

wusongANKANG commented Apr 21, 2023

chrislusf commented Apr 21, 2023

wusongANKANG commented Apr 21, 2023

chrislusf commented Apr 21, 2023

wusongANKANG commented Apr 21, 2023