Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused #4728

Closed

moschlar opened this issue Dec 13, 2021 · 5 comments

@moschlar

Environmental Info:
K3s Version:
k3s version v1.21.7+k3s1 (ac70570)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:

Linux rancher-02 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64 GNU/Linux

Cluster Configuration:

2 servers, embedded etcd

Describe the bug:
Regularly seeing the following in the k3s log:

{"level":"warn","ts":"2021-12-13T13:33:11.054+0100","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2399 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused\". Reconnecting..."}

@brandond
Contributor

brandond commented Dec 13, 2021

Is this error preventing the node from starting up, or is it just filling the logs on an otherwise functional cluster member?

Note that 2 servers isn't a supported configuration when using etcd - you must have an odd number in order to meet quorum requirements: https://etcd.io/docs/v3.5/faq/#why-an-odd-number-of-cluster-members
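To put numbers on it: etcd needs quorum = n/2 + 1 healthy members (integer division), so a 2-member cluster needs both members up and tolerates zero failures. A quick sketch of the arithmetic (not k3s code):

package main

import "fmt"

// quorum returns how many healthy members an etcd cluster of size n
// needs in order to commit writes.
func quorum(n int) int { return n/2 + 1 }

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum(n), n-quorum(n))
	}
}

So going from 2 servers to 3 actually gains you failure tolerance, while going from 3 to 4 does not.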

Can you confirm that you still run into this when using a supported number of servers?

cc @briandowns - it looks like we might be leaking an etcd client somewhere? Assuming the cluster is working despite this message, it suggests that there's still a grpc client around trying to keepalive to the temporary etcd instance that gets set up to extract the bootstrap data.
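For anyone following along, here's a minimal sketch of the suspected pattern, assuming the bootstrap path uses go.etcd.io/etcd/client/v3 (the endpoint, key, and function name below are illustrative, not the actual k3s code):

package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func extractBootstrapData(ctx context.Context) error {
	// Client pointed at the temporary etcd instance on 127.0.0.1:2399.
	client, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2399"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	// Without this Close, the underlying grpc connection keeps trying to
	// reconnect (and logs the "connection refused" warning) long after the
	// temporary etcd instance has been torn down.
	defer client.Close()

	_, err = client.Get(ctx, "/bootstrap", clientv3.WithPrefix())
	return err
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := extractBootstrapData(ctx); err != nil {
		log.Println("bootstrap extraction failed:", err)
	}
}

If the client created for that temporary instance is never closed, grpc's reconnect loop would keep emitting exactly this warning.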

@moschlar
Author

Hi @brandond - thanks for your response!
(I know about the requirement for an odd number of nodes; this error just occurred while I was setting the cluster up, and I noted it at that stage.)

The cluster is otherwise functional!

Now I have a 3-node cluster, and the error still occurs regularly (roughly every two minutes) on the node I bootstrapped first.

If you want me to run any diagnostics on that node, I'd be happy to - just tell me which ones ;-)

Regards,
Moritz

@brandond
Contributor

Thanks!

Can you include the output of kubectl get nodes -o yaml? I'm curious whether there's anything in your configuration that contributes to this.

@brandond
Contributor

I can confirm that I see this message repeating on only the first node of a two-node etcd cluster after stopping and restarting both nodes.

{"level":"warn","ts":"2021-12-14T22:20:05.591Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2399  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused\". Reconnecting..."}
{"level":"warn","ts":"2021-12-14T22:20:33.093Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2399  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused\". Reconnecting..."}
{"level":"warn","ts":"2021-12-14T22:21:08.413Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2399  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused\". Reconnecting..."}

@brandond brandond added this to the v1.23.0+k3s1 milestone Dec 14, 2021
@brandond brandond added this to To Triage in Development [DEPRECATED] via automation Dec 14, 2021
@brandond brandond self-assigned this Dec 14, 2021
@brandond brandond moved this from To Triage to Peer Review in Development [DEPRECATED] Dec 14, 2021
@brandond brandond moved this from Peer Review to To Test in Development [DEPRECATED] Dec 18, 2021
@rancher-max rancher-max self-assigned this Jan 3, 2022
@bguzman-3pillar

Steps to reproduce:
Using version v1.23.1-rc1+k3s1
Create an HA cluster with 3 server nodes.
Once all are ready, stop all of them, then restart the first (main) node followed by the other 2 nodes.
$ journalctl -eu k3s
Getting this output now; the above error message is no longer displayed:

Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.915642    6631 event.go:294] "Event occurred" object="ip-172-31-6-106" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node ip-172-31-6-106 event: Re>
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.915780    6631 event.go:294] "Event occurred" object="ip-172-31-12-106" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node ip-172-31-12-106 event: >
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.916147    6631 event.go:294] "Event occurred" object="ip-172-31-13-19" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node ip-172-31-13-19 event: Re>
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.925461    6631 shared_informer.go:247] Caches are synced for GC
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.951420    6631 shared_informer.go:247] Caches are synced for daemon sets
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.958982    6631 shared_informer.go:247] Caches are synced for TTL
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.964016    6631 shared_informer.go:247] Caches are synced for service account
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.975717    6631 shared_informer.go:247] Caches are synced for namespace
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.027923    6631 shared_informer.go:247] Caches are synced for resource quota
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.042458    6631 shared_informer.go:247] Caches are synced for resource quota
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.089135    6631 shared_informer.go:247] Caches are synced for endpoint
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.094479    6631 shared_informer.go:247] Caches are synced for endpoint_slice_mirroring
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.108823    6631 shared_informer.go:247] Caches are synced for endpoint_slice
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.442547    6631 shared_informer.go:247] Caches are synced for garbage collector
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.442578    6631 garbagecollector.go:155] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.485066    6631 shared_informer.go:247] Caches are synced for garbage collector
Jan 03 21:44:55 ip-172-31-12-106 k3s[6631]: I0103 21:44:55.778748    6631 scope.go:110] "RemoveContainer" containerID="8e8bec50452dae72cf88717162249ee86e83c76dd4ac0402ecf8d2c31eb27061"
Jan 03 21:44:55 ip-172-31-12-106 k3s[6631]: I0103 21:44:55.786589    6631 scope.go:110] "RemoveContainer" containerID="13ffe125438c05a46bda60e67787695cc5c8c618e316b5dce3abc6a378683600"

Development [DEPRECATED] automation moved this from To Test to Done Issue / Merged PR Jan 3, 2022