storageServer: Shorten NFS timeouts
Change the default timeout to 10 seconds, and the number of retransmissions
to 3. This should result in a 60 second timeout before an NFS request
fails, similar to the behaviour in multipath.

According to nfs(5), NFS will retry a request after timeo deciseconds
(timeo / 10 seconds). After each retransmission, the timeout is increased
by timeo (up to a maximum of 600 seconds). After retrans retries, the NFS
client fails with a "server not responding" message.

This is the expected failure flow:

00:00   retry 1 (10 seconds timeout)
00:10   retry 2 (20 seconds timeout)
00:30   retry 3 (30 seconds timeout)
01:00   request fail
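The failure flow above follows directly from the retry model. The sketch below reproduces it, assuming the simplified linear-backoff model described in this commit message (each retry waits one more timeo, capped at 600 seconds). It is an illustration of that model, not the kernel's implementation, and the names `failure_timeline` and `fmt` are hypothetical.

```python
# Sketch of the simplified NFS retry model from the commit message:
# retry k waits k * timeo, capped at 600 seconds. Not the kernel code.

MAX_TIMEOUT = 600.0  # seconds, the cap mentioned in nfs(5)


def failure_timeline(timeo, retrans):
    """Return ([(start_seconds, timeout_seconds), ...], fail_seconds).

    timeo is in deciseconds, as in the NFS mount option.
    """
    step = timeo / 10.0  # deciseconds -> seconds
    timeline = []
    elapsed = 0.0
    for attempt in range(1, retrans + 1):
        # Linear backoff: each retry waits one more timeo than the last.
        timeout = min(step * attempt, MAX_TIMEOUT)
        timeline.append((elapsed, timeout))
        elapsed += timeout
    return timeline, elapsed


def fmt(seconds):
    """Format seconds as MM:SS, matching the tables in this message."""
    minutes, secs = divmod(int(seconds), 60)
    return "%02d:%02d" % (minutes, secs)


if __name__ == "__main__":
    timeline, fail_at = failure_timeline(timeo=100, retrans=3)
    for i, (start, timeout) in enumerate(timeline, 1):
        print("%s   retry %d (%d seconds timeout)" % (fmt(start), i, timeout))
    print("%s   request fail" % fmt(fail_at))
```

Calling `failure_timeline(timeo=600, retrans=6)` with the same function gives the old 21 minute flow.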

Previously we used timeout=600, retrans=6, which resulted in a 21
minute timeout:

00:00   retry 1 (60 seconds timeout)
01:00   retry 2 (120 seconds timeout)
03:00   retry 3 (180 seconds timeout)
06:00   retry 4 (240 seconds timeout)
10:00   retry 5 (300 seconds timeout)
15:00   retry 6 (360 seconds timeout)
21:00   request fail
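Because the timeout grows linearly, the total time until failure is an arithmetic series. With $t$ the timeo value in seconds and $r$ the retrans value, and assuming no single retry reaches the 600 second cap (true for both settings here):

```latex
T = \sum_{k=1}^{r} k\,t = t\,\frac{r(r+1)}{2},
\qquad
T_{\text{new}} = 10 \cdot \frac{3 \cdot 4}{2} = 60\ \text{s},
\qquad
T_{\text{old}} = 60 \cdot \frac{6 \cdot 7}{2} = 1260\ \text{s} = 21\ \text{min}
```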

Testing shows that the storage monitor was blocked on storage for 270
seconds instead of the expected 60 seconds. This is still an improvement
compared with the 20-30 minutes seen with the previous settings.

A VM running on the blocked NFS storage failed to resume after the
storage was unblocked. Resuming typically works with block storage.

So this looks like an improvement, but more work may be needed.

Change-Id: I29ad896e3e14e8a00edcdbec53226388281fec46
Bug-Url: https://bugzilla.redhat.com/1569926
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
nirs committed Apr 23, 2020
1 parent 3ff3049 commit 672a98b
Showing 1 changed file with 20 additions and 1 deletion:
lib/vdsm/storage/storageServer.py
@@ -378,8 +378,27 @@ def version(self):
         # Return -1 to signify the version has not been negotiated yet
         return -1
 
-    def __init__(self, id, export, timeout=600, retrans=6, version=None,
+    def __init__(self, id, export, timeout=100, retrans=3, version=None,
                  extraOptions=""):
+        """
+        According to nfs(5), NFS will retry a request after 100 deciseconds (10
+        seconds). After each retransmission, the timeout is increased by timeo
+        value (up to a maximum of 600 seconds). After retrans retries, the NFS
+        client will fail with "server not responding" message.
+
+        With the default configuration we expect failures in 60 seconds, which
+        is about 3 times longer than multipath timeout (20 seconds) for block
+        storage.
+
+        00:00 retry 1 (10 seconds timeout)
+        00:10 retry 2 (20 seconds timeout)
+        00:30 retry 3 (30 seconds timeout)
+        01:00 request fail
+
+        WARNING: timeout value must not be smaller than sanlock iotimeout (10
+        seconds). Using a smaller value may cause sanlock to fail to renew
+        leases.
+        """
         self._remotePath = normpath(export)
         options = self.DEFAULT_OPTIONS[:]
         self._timeout = timeout
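The WARNING in the new docstring implies a lower bound on the timeout parameter: 10 seconds, i.e. 100 deciseconds. The sketch below is a hypothetical illustration of mapping these parameters to nfs(5) mount options while enforcing that bound. The function name `nfs_mount_options`, the leading `soft` option, and the validation are assumptions, not vdsm's actual code. `soft` is included because, per nfs(5), only soft mounts fail a request after retrans retransmissions; hard mounts keep retrying.

```python
# Hypothetical sketch, not vdsm code: build NFS mount options from the
# constructor parameters and reject a timeo below sanlock's io timeout.

SANLOCK_IO_TIMEOUT = 10  # seconds, taken from the docstring's WARNING


def nfs_mount_options(timeout=100, retrans=3, version=None, extra_options=""):
    """Return a comma separated NFS mount option string.

    timeout is in deciseconds (the NFS "timeo" option).
    """
    # The docstring warns that a smaller timeout may make sanlock fail
    # to renew leases, so refuse it here (assumed policy).
    if timeout < SANLOCK_IO_TIMEOUT * 10:
        raise ValueError(
            "timeo=%d (%.1f seconds) is smaller than sanlock io timeout "
            "(%d seconds)" % (timeout, timeout / 10.0, SANLOCK_IO_TIMEOUT))
    # "soft" is assumed: only soft mounts fail after retrans retries.
    options = ["soft", "timeo=%d" % timeout, "retrans=%d" % retrans]
    if version is not None:
        options.append("vers=%s" % version)
    if extra_options:
        options.append(extra_options)
    return ",".join(options)
```

With the new defaults, `nfs_mount_options()` returns `"soft,timeo=100,retrans=3"`, while `nfs_mount_options(timeout=50)` raises `ValueError`.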
