Refactor client RPC timeouts #14965

kisunji · 2022-10-12T18:26:51Z

Description

This PR has two objectives:

- Fix the issue identified in #14732 ("rpc_hold_timeout has been reused for client RPC call timeout"), where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout, so this PR adds a new configuration option, rpc_client_timeout.
- Refactor parts of the implementation from the original PR, #11500 ("Add timeout to Client RPC calls"), to remove the misleading linkage between RPCInfo's timeout (used to retry on certain failure modes) and the client RPC timeouts.

Testing & Reproduction steps

Added test cases
Manually tested with a client-server setup where I ran the following queries against the client concurrently and confirmed deadlines were respected:
- Long non-blocking query (KVS.Get modified to sleep on a certain key)
- Short non-blocking query (KVS.Get)
- Blocking query (KVS.Get)

PR Checklist

updated test coverage
external facing docs updated
not a security concern

kisunji · 2022-10-13T20:58:41Z

agent/pool/pool.go

@@ -364,7 +364,7 @@ func (p *ConnPool) dial(
 	tlsRPCType RPCType,
 ) (net.Conn, HalfCloser, error) {
 	// Try to dial the conn
-	d := &net.Dialer{LocalAddr: p.SrcAddr, Timeout: p.Timeout}
+	d := &net.Dialer{LocalAddr: p.SrcAddr, Timeout: DefaultDialTimeout}


This was changed in #11500, but I reverted it to DefaultDialTimeout to be consistent with other usages.

kisunji · 2022-10-13T21:27:39Z

agent/pool/pool.go

-func (c *TimeoutConn) Read(b []byte) (int, error) {
-	timeout := c.DefaultTimeout
-	// Apply timeout to first read then zero it out
-	if c.FirstReadTimeout > 0 {
-		timeout = c.FirstReadTimeout
-		c.FirstReadTimeout = 0
-	}
-	var deadline time.Time
-	if timeout > 0 {
-		deadline = time.Now().Add(timeout)
-	}
-	if err := c.Conn.SetReadDeadline(deadline); err != nil {
-		return 0, err
-	}
-	return c.Conn.Read(b)
-}


This type was introduced in #11500, but while debugging timeout issues I found it difficult to understand.

Now, instead of wrapping net.Conn in a type that sets a deadline on every Read call, we simply set the deadline during the RPC call.
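To illustrate the difference, here is a minimal sketch of setting a deadline around a whole call instead of per Read. It is not the code from this PR: `callWithDeadline`, `readOne`, and `silentPeerTimesOut` are hypothetical names, and the use of `SetReadDeadline` with `net.Pipe` standing in for the stream connection is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// callWithDeadline is a hypothetical helper (not from the PR). It sets a
// single read deadline covering the whole call, runs the call, and clears
// the deadline afterwards so a reused connection does not inherit it.
func callWithDeadline(conn net.Conn, timeout time.Duration, call func(net.Conn) error) error {
	if timeout > 0 {
		if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
			return err
		}
		// A zero time.Time means "no deadline".
		defer conn.SetReadDeadline(time.Time{})
	}
	return call(conn)
}

// readOne stands in for an RPC call: it blocks until one byte arrives
// or the read deadline passes.
func readOne(conn net.Conn) error {
	buf := make([]byte, 1)
	_, err := conn.Read(buf)
	return err
}

// silentPeerTimesOut reports whether a call against a peer that never
// answers fails with a deadline error.
func silentPeerTimesOut() bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	err := callWithDeadline(client, 50*time.Millisecond, readOne)
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println("deadline hit:", silentPeerTimesOut())
}
```

The point of the sketch is that the deadline lives next to the call that needs it, so there is no wrapper type carrying first-read state.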

kisunji · 2022-10-13T21:29:01Z

agent/pool/pool.go

+type BlockableQuery interface {
+	// BlockingTimeout returns duration > 0 if the query is blocking.
+	// Otherwise returns 0 for non-blocking queries.
+	BlockingTimeout(maxQueryTime, defaultQueryTime time.Duration) time.Duration


I found that narrowly defining this interface was better than giving RPCInfo a Timeout method and forcing every type to return a timeout (which in most cases is rpc_hold_timeout).
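As a rough sketch of how a narrow interface like this can be consumed, the caller can type-assert the request and only compute a timeout for types that opt in. Only the `BlockableQuery` method signature comes from the diff above; `blockingReq`, `plainReq`, `callTimeout`, and the rule of adding the blocking wait to the client timeout are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// BlockableQuery mirrors the method signature shown in the diff above.
type BlockableQuery interface {
	BlockingTimeout(maxQueryTime, defaultQueryTime time.Duration) time.Duration
}

// blockingReq is a hypothetical request type that opts in.
type blockingReq struct {
	minIndex uint64        // > 0 marks the query as blocking (assumed rule)
	maxWait  time.Duration // requested wait; 0 means "use the default"
}

func (r blockingReq) BlockingTimeout(maxQueryTime, defaultQueryTime time.Duration) time.Duration {
	if r.minIndex == 0 {
		return 0 // non-blocking query
	}
	wait := r.maxWait
	if wait == 0 {
		wait = defaultQueryTime
	}
	if wait > maxQueryTime {
		wait = maxQueryTime
	}
	return wait
}

// plainReq is a hypothetical request type that does not implement
// BlockableQuery.
type plainReq struct{}

// callTimeout is a hypothetical helper. Types that do not implement
// BlockableQuery get no timeout (0). Non-blocking queries get the client
// timeout. Blocking queries get the client timeout plus their wait; that
// combination is an assumption for this sketch.
func callTimeout(req interface{}, clientTimeout, maxQueryTime, defaultQueryTime time.Duration) time.Duration {
	bq, ok := req.(BlockableQuery)
	if !ok {
		return 0
	}
	return clientTimeout + bq.BlockingTimeout(maxQueryTime, defaultQueryTime)
}

func main() {
	client, maxQ, defQ := 60*time.Second, 600*time.Second, 300*time.Second
	fmt.Println(callTimeout(plainReq{}, client, maxQ, defQ))
	fmt.Println(callTimeout(blockingReq{}, client, maxQ, defQ))
	fmt.Println(callTimeout(blockingReq{minIndex: 1}, client, maxQ, defQ))
}
```

With this shape, request types that have no notion of blocking never need a placeholder Timeout method.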

kisunji · 2022-10-13T21:36:29Z

agent/consul/client_test.go

+		var out struct{}
+		err := c1.RPC("Long.Wait", &structs.NodeSpecificRequest{}, &out)
+		require.Error(t, err)
+		require.Contains(t, err.Error(), "rpc error making call: i/o deadline reached")
+		// We use structs.KVSRequest, which does not implement pool.BlockableQuery
+		// and should have no timeouts defined.
+		require.NoError(t, c1.RPC("Long.Wait", &structs.KVSRequest{}, &out))


I wanted to sanity-check that setting the deadline on the stream connection did not persist when the client was cached in the connection pool.
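The concern can be reproduced in isolation. The following is a sketch under assumptions, not the test from this PR: `reuseAfterDeadline` is a hypothetical function, and `net.Pipe` stands in for a pooled stream. The first read times out, the deadline is cleared, and the second read on the same connection succeeds.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// reuseAfterDeadline (hypothetical) returns the error from a read that hits
// its deadline and the error from a later read on the same connection after
// the deadline has been cleared.
func reuseAfterDeadline() (first, second error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	buf := make([]byte, 1)

	// First call: the peer stays silent, so the short deadline fires.
	// SetReadDeadline errors are ignored here to keep the sketch short.
	client.SetReadDeadline(time.Now().Add(20 * time.Millisecond))
	_, first = client.Read(buf)

	// Clear the deadline before the connection is reused. Without this,
	// the expired deadline would fail the next read immediately.
	client.SetReadDeadline(time.Time{})

	// Second call: the peer answers and no deadline applies.
	go server.Write([]byte{1})
	_, second = client.Read(buf)
	return first, second
}

func main() {
	first, second := reuseAfterDeadline()
	fmt.Println("first read error: ", first)
	fmt.Println("second read error:", second)
}
```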

huikang

Do we need to add the new field to api?

kisunji · 2022-10-14T12:51:58Z

> Do we need to add the new field to api?

No, the new fields are config and don't need to be in api.

kisunji · 2022-10-14T12:57:14Z

agent/pool/pool.go

@@ -31,7 +32,7 @@ type muxSession interface {

 // streamClient is used to wrap a stream with an RPC client
 type StreamClient struct {
-	stream *TimeoutConn
+	stream net.Conn


This simply reverts a change from #11500.

banks

Overall this looks good.

Couple of small fixes noted inline but should be quick!

agent/consul/client.go

agent/pool/pool.go

banks

Amazing job @kisunji 🥳

Thanks for patiently going through those extra bits to add reloading etc. This seems solid now!

agent/pool/pool.go

agent/consul/server.go

Co-authored-by: Paul Banks <banks@banksco.de>

Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created. Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts. (cherry picked from commit 29a297d)

Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created. Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts. (cherry picked from commit 29a297d) Co-authored-by: Chris S. Kim <ckim@hashicorp.com>

freddygv · 2022-10-18T21:33:55Z

@kisunji can #14732 be closed now?

kisunji · 2022-10-18T21:36:23Z

> @kisunji can #14732 be closed now?

Yeah, I'm handling the backporting. Will close when everything is completely merged.

kisunji force-pushed the kisunji/NET-1092 branch 7 times, most recently from 76d85fc to 03c53f9 Compare October 13, 2022 17:59

kisunji force-pushed the kisunji/NET-1092 branch from 03c53f9 to ed4c9ca Compare October 13, 2022 18:43

kisunji force-pushed the kisunji/NET-1092 branch from ed4c9ca to 41a7c7a Compare October 13, 2022 19:00

kisunji marked this pull request as ready for review October 13, 2022 20:30

kisunji requested a review from a team as a code owner October 13, 2022 20:30

Refactor client RPC timeouts

40e5e49

kisunji force-pushed the kisunji/NET-1092 branch from 41a7c7a to 40e5e49 Compare October 13, 2022 20:45

Add changelog

93800b6

kisunji commented Oct 13, 2022

Remove unused type

22ca463

kisunji commented Oct 13, 2022

kisunji requested review from a team, DanStough and dhiaayachi and removed request for a team October 13, 2022 21:30

kisunji commented Oct 13, 2022

huikang reviewed Oct 13, 2022

kisunji commented Oct 14, 2022

banks requested changes Oct 18, 2022

PR feedback

e061482

kisunji requested a review from banks October 18, 2022 15:54

banks approved these changes Oct 18, 2022

kisunji and others added 3 commits October 18, 2022 13:36

Update agent/pool/pool.go

9325aae

Co-authored-by: Paul Banks <banks@banksco.de>

Add server reload test

ea02dd6

gofmt

b4190ab

kisunji merged commit 29a297d into main Oct 18, 2022

kisunji deleted the kisunji/NET-1092 branch October 18, 2022 19:05

This was referenced Oct 18, 2022

Backport of Refactor client RPC timeouts into release/1.11.x #15038

Merged

Backport of Refactor client RPC timeouts into release/1.12.x #15039

Merged

Backport of Refactor client RPC timeouts into release/1.13.x #15040

Merged

kisunji mentioned this pull request Oct 18, 2022

Remove unused methods from template #15046

Merged

quinndiggitypolymath mentioned this pull request Oct 21, 2022

Consul connection error on port 8300 #14464

Closed

blockmar mentioned this pull request Nov 3, 2022

Large number of error message RPC deadlines after the introduction of rpc_client_timeout #15246

Closed

mbrulatout mentioned this pull request Apr 3, 2023

Apply 1.12.5-criteo8 changes onto 1.12.9 criteo-forks/consul#151

Merged
