XDS - client connection failure and retries #5594

Closed
k-raval opened this issue Aug 18, 2022 · 0 comments · Fixed by #5601

k-raval commented Aug 18, 2022

What version of gRPC are you using? - v1.48.0

What version of Go are you using (go version)? - 1.18.3

What operating system (Linux, Windows, …) and version? - Linux

What did you do?

My setup details:
Istio xDS with a traffic policy: the load balancer uses consistent hashing to route requests to the server.
The client and server run as separate pods on a Kubernetes (k8s) cluster.
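The consistent-hash traffic policy corresponds to the ring_hash balancer that appears in the client logs (ring-hash-lb). As a rough illustration of how consistent hashing maps requests to backends, here is a minimal stdlib-only sketch. It is not grpc-go's implementation: FNV-1a is a stand-in hash (we believe gRPC's ring_hash uses xxHash), and buildRing, pick, the replica count, and the 10.0.0.x addresses are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// ringEntry is one virtual node on the hash ring, owned by a backend address.
type ringEntry struct {
	hash uint64
	addr string
}

// hash64 hashes a string with FNV-1a (a stand-in for the hash gRPC uses).
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error.
	return h.Sum64()
}

// buildRing places `replicas` virtual nodes per address and sorts them by hash.
func buildRing(addrs []string, replicas int) []ringEntry {
	ring := make([]ringEntry, 0, len(addrs)*replicas)
	for _, a := range addrs {
		for i := 0; i < replicas; i++ {
			ring = append(ring, ringEntry{hash: hash64(a + "_" + strconv.Itoa(i)), addr: a})
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].hash < ring[j].hash })
	return ring
}

// pick walks clockwise from the request hash to the first virtual node,
// wrapping around to the start of the ring. The ring must be non-empty.
func pick(ring []ringEntry, requestHash uint64) string {
	i := sort.Search(len(ring), func(i int) bool { return ring[i].hash >= requestHash })
	if i == len(ring) {
		i = 0
	}
	return ring[i].addr
}

func main() {
	// Hypothetical backend addresses.
	ring := buildRing([]string{"10.0.0.1:16000", "10.0.0.2:16000"}, 4)
	key := hash64("user-42") // e.g. a hash of a request header value
	fmt.Println(pick(ring, key) == pick(ring, key)) // same key, same backend

	// A single backend, as in the logs below: every key maps to it.
	single := buildRing([]string{"172.20.0.170:16000"}, 4)
	fmt.Println(pick(single, hash64("any-key")))
}
```

With only one backend (only 172.20.0.170:16000 appears in the client logs), every key lands on that backend, so while it is down there is no other endpoint to fail over to.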

Issue:

  1. The server side creates its server with xds.NewGRPCServer().
  2. The client side connects using the URI xds:///chatservice:
  3. The client and server are able to exchange messages.
  4. The server is terminated manually.
  5. The client loses the connection and retries once; that attempt fails, and the client never retries again.
  6. The server is restarted and is available again.
  7. The client does not retry and never reconnects.
  8. If the client calls Dial again, it is able to reconnect, but it never reconnects automatically.
  9. With xDS turned off, the gRPC client reconnects to the server automatically.
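The target in step 2 uses the xds scheme with an empty authority. The sketch below shows how such a target splits into scheme, authority, and endpoint using only net/url. splitTarget is a hypothetical helper that approximates how gRPC-Go parses dial targets rather than reproducing its code, and the part after the final colon, which is cut off in step 2, is left as-is.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitTarget is a hypothetical helper that breaks a gRPC dial target into
// scheme, authority, and endpoint, roughly the way gRPC-Go parses targets.
func splitTarget(target string) (scheme, authority, endpoint string, err error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", "", "", err
	}
	// The endpoint is the URL path without its leading slash.
	return u.Scheme, u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	// Target copied from step 2; the text after the final ":" is cut off there.
	scheme, authority, endpoint, err := splitTarget("xds:///chatservice:")
	if err != nil {
		panic(err)
	}
	// "xds" selects the xDS resolver; an empty authority is, as we understand it,
	// resolved against the default management server in the xDS bootstrap file.
	fmt.Printf("scheme=%q authority=%q endpoint=%q\n", scheme, authority, endpoint)
}
```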

Am I missing any configuration that would let the client retry? Without any configuration, when xDS is disabled, the plain gRPC client retries on its own and reconnects successfully.

Thanks.
-kartik.

Client-side logs for reference:
2022/08/17 12:04:38 INFO: [core] [Channel #1 SubChannel #5] Subchannel Connectivity change to IDLE
2022/08/17 12:04:38 INFO: [transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2022/08/17 12:04:38 INFO: [xds] [ring-hash-lb 0xc000839880] handle SubConn state change: 0xc00007fb20, IDLE
2022/08/17 12:04:38 INFO: [xds] [priority-lb 0xc000495490] Balancer state update from locality priority-0-0, new state: {ConnectivityState:IDLE Picker:0xc000840960}
2022/08/17 12:04:38 WARNING: [xds] ciu, cn, cu: priority-0-0, priority-0-0, priority-0-0
2022/08/17 12:04:38 INFO: [xds] [xds-cluster-manager-lb 0xc0003efb00] Balancer state update from locality cluster:outbound|16000||chatservice.default.svc.cluster.local, new state: {ConnectivityState:IDLE Picker:0xc000840960}
2022/08/17 12:04:38 INFO: [xds] [xds-cluster-manager-lb 0xc0003efb00] Child pickers: map[cluster:outbound|16000||chatservice.default.svc.cluster.local:picker:0xc000840960,state:IDLE,stateToAggregate:IDLE]
2022/08/17 12:04:38 INFO: [core] [Channel #1] Channel Connectivity change to IDLE
2022/08/17 12:04:38 INFO: [xds] [priority-lb 0xc000495490] switching to ("priority-0-0", 0) in syncPriority
2022/08/17 12:04:38 Send - Msg-3
2022/08/17 12:04:38 INFO: [core] [Channel #1 SubChannel #5] Subchannel Connectivity change to CONNECTING
2022/08/17 12:04:38 INFO: [core] [Channel #1 SubChannel #5] Subchannel picks a new address "172.20.0.170:16000" to connect
2022/08/17 12:04:38 INFO: [xds] [ring-hash-lb 0xc000839880] handle SubConn state change: 0xc00007fb20, CONNECTING
2022/08/17 12:04:38 INFO: [xds] [priority-lb 0xc000495490] Balancer state update from locality priority-0-0, new state: {ConnectivityState:CONNECTING Picker:0xc00004f1d0}
2022/08/17 12:04:38 WARNING: [xds] ciu, cn, cu: priority-0-0, priority-0-0, priority-0-0
2022/08/17 12:04:38 INFO: [xds] [xds-cluster-manager-lb 0xc0003efb00] Balancer state update from locality cluster:outbound|16000||chatservice.default.svc.cluster.local, new state: {ConnectivityState:CONNECTING Picker:0xc00004f1d0}
2022/08/17 12:04:38 INFO: [xds] [xds-cluster-manager-lb 0xc0003efb00] Child pickers: map[cluster:outbound|16000||chatservice.default.svc.cluster.local:picker:0xc00004f1d0,state:CONNECTING,stateToAggregate:CONNECTING]
2022/08/17 12:04:38 INFO: [core] [Channel #1] Channel Connectivity change to CONNECTING
2022/08/17 12:04:38 INFO: [xds] [priority-lb 0xc000495490] switching to ("priority-0-0", 0) in syncPriority
2022/08/17 12:04:38 WARNING: [core] [Channel #1 SubChannel #5] grpc: addrConn.createTransport failed to connect to {
  "Addr": "172.20.0.170:16000",
  "ServerName": "chatservice:16000",
  "Attributes": {},
  "BalancerAttributes": {},
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 172.20.0.170:16000: connect: connection refused"
2022/08/17 12:04:38 INFO: [core] [Channel #1 SubChannel #5] Subchannel Connectivity change to TRANSIENT_FAILURE
2022/08/17 12:04:38 INFO: [xds] [ring-hash-lb 0xc000839880] handle SubConn state change: 0xc00007fb20, TRANSIENT_FAILURE
2022/08/17 12:04:38 INFO: [xds] [priority-lb 0xc000495490] Balancer state update from locality priority-0-0, new state: {ConnectivityState:TRANSIENT_FAILURE Picker:0xc00004f3b0}
2022/08/17 12:04:38 WARNING: [xds] ciu, cn, cu: priority-0-0, priority-0-0, priority-0-0
2022/08/17 12:04:38 INFO: [xds] [xds-cluster-manager-lb 0xc0003efb00] Balancer state update from locality cluster:outbound|16000||chatservice.default.svc.cluster.local, new state: {ConnectivityState:TRANSIENT_FAILURE Picker:0xc00004f3b0}
2022/08/17 12:04:38 INFO: [xds] [xds-cluster-manager-lb 0xc0003efb00] Child pickers: map[cluster:outbound|16000||chatservice.default.svc.cluster.local:picker:0xc00004f3b0,state:TRANSIENT_FAILURE,stateToAggregate:TRANSIENT_FAILURE]
2022/08/17 12:04:38 INFO: [core] [Channel #1] Channel Connectivity change to TRANSIENT_FAILURE
2022/08/17 12:04:38 INFO: [xds] [priority-lb 0xc000495490] switching to ("priority-0-0", 0) in syncPriority
2022/08/17 12:04:38 Error when calling SayHello: rpc error: code = Unavailable desc = last connection error: connection error: desc = "transport: Error while dialing dial tcp 172.20.0.170:16000: connect: connection refused"
2022/08/17 12:04:39 INFO: [core] [Channel #1 SubChannel #5] Subchannel Connectivity change to IDLE
2022/08/17 12:04:39 INFO: [xds] [ring-hash-lb 0xc000839880] handle SubConn state change: 0xc00007fb20, IDLE
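With a single backend, the channel state in these logs simply follows the one subchannel: IDLE, CONNECTING, TRANSIENT_FAILURE, then back to IDLE at 12:04:39. A minimal sketch of how a ring_hash balancer combines subchannel states into one channel state, assuming the aggregation rules as we read them in gRFC A42; State and aggregate are hypothetical names, and this is not grpc-go's code.

```go
package main

import "fmt"

// State mirrors the connectivity states that appear in the logs.
type State int

const (
	Idle State = iota
	Connecting
	Ready
	TransientFailure
)

func (s State) String() string {
	return [...]string{"IDLE", "CONNECTING", "READY", "TRANSIENT_FAILURE"}[s]
}

// aggregate combines subchannel states into one channel state, applying
// the ring_hash rules (as we read gRFC A42) in priority order.
func aggregate(subs []State) State {
	var ready, connecting, idle, failing int
	for _, s := range subs {
		switch s {
		case Ready:
			ready++
		case Connecting:
			connecting++
		case Idle:
			idle++
		case TransientFailure:
			failing++
		}
	}
	switch {
	case ready > 0:
		return Ready
	case failing >= 2:
		return TransientFailure
	case connecting > 0:
		return Connecting
	case failing == 1 && len(subs) > 1:
		return Connecting
	case idle > 0:
		return Idle
	default:
		return TransientFailure
	}
}

func main() {
	// Replay the single-subchannel sequence from the logs above.
	for _, s := range []State{Idle, Connecting, TransientFailure, Idle} {
		fmt.Println(s, "->", aggregate([]State{s}))
	}
}
```

Counting the states first and then checking the rules in a fixed order keeps the precedence explicit; with one subchannel, every rule that depends on other subchannels drops out.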

zasweq added the P2 label Aug 23, 2022
github-actions bot locked as resolved and limited conversation to collaborators Feb 23, 2023