Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused #4728

Closed

moschlar opened this issue Dec 13, 2021 · 5 comments

@moschlar

Environmental Info:
K3s Version:
k3s version v1.21.7+k3s1 (ac70570)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:

Linux rancher-02 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64 GNU/Linux

Cluster Configuration:

2 servers, embedded etcd

Describe the bug:
Regularly seeing the following in the k3s log:

{"level":"warn","ts":"2021-12-13T13:33:11.054+0100","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2399 <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused\". Reconnecting..."}

@brandond
Contributor

brandond commented Dec 13, 2021

Is this error preventing the node from starting up, or is it just filling the logs on an otherwise functional cluster member?

Note that 2 servers isn't a supported configuration when using etcd - you must have an odd number in order to meet quorum requirements: https://etcd.io/docs/v3.5/faq/#why-an-odd-number-of-cluster-members
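To put numbers on it: etcd needs quorum = n/2 + 1 healthy members (integer division), so a 2-member cluster needs both members up and tolerates zero failures. A quick sketch of the arithmetic (not k3s code):

package main

import "fmt"

// quorum returns how many healthy members an etcd cluster of size n
// needs in order to commit writes.
func quorum(n int) int { return n/2 + 1 }

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum(n), n-quorum(n))
	}
}

So going from 2 servers to 3 actually gains you failure tolerance, while going from 3 to 4 does not.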

Can you confirm that you still run into this when using a supported number of servers?

cc @briandowns - it looks like we might be leaking an etcd client somewhere? Assuming the cluster is working despite this message, it suggests that there's still a grpc client around trying to keepalive to the temporary etcd instance that gets set up to extract the bootstrap data.
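For anyone following along, here's a minimal sketch of the suspected pattern, assuming the bootstrap path uses go.etcd.io/etcd/client/v3 (the endpoint, key, and function name below are illustrative, not the actual k3s code):

package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func extractBootstrapData(ctx context.Context) error {
	// Client pointed at the temporary etcd instance on 127.0.0.1:2399.
	client, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2399"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	// Without this Close, the underlying grpc connection keeps trying to
	// reconnect (and logs the "connection refused" warning) long after the
	// temporary etcd instance has been torn down.
	defer client.Close()

	_, err = client.Get(ctx, "/bootstrap", clientv3.WithPrefix())
	return err
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := extractBootstrapData(ctx); err != nil {
		log.Println("bootstrap extraction failed:", err)
	}
}

If the client created for that temporary instance is never closed, grpc's reconnect loop would keep emitting exactly this warning.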

@moschlar
Author

Hi @brandond - thanks for your response!
(I know about the requirement for an odd number of nodes; this error just occurred while I was setting the cluster up, and I noted it at that stage.)

The cluster is otherwise functional!

Now I have a 3-node cluster, and the error still occurs regularly (roughly every two minutes) on the node I bootstrapped first.

If you want me to run any diagnostics on that node, I'd be happy to - just tell me which ones ;-)

Regards,
Moritz

@brandond
Contributor

Thanks!

Can you include the output of kubectl get nodes -o yaml? I'm curious whether there's anything in your configuration that contributes to this.

@brandond
Contributor

I can confirm that I see this message repeating on only the first node of a two-node etcd cluster after stopping and restarting both nodes.

{"level":"warn","ts":"2021-12-14T22:20:05.591Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2399  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused\". Reconnecting..."}
{"level":"warn","ts":"2021-12-14T22:20:33.093Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2399  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused\". Reconnecting..."}
{"level":"warn","ts":"2021-12-14T22:21:08.413Z","caller":"grpclog/grpclog.go:60","msg":"grpc: addrConn.createTransport failed to connect to {http://127.0.0.1:2399  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2399: connect: connection refused\". Reconnecting..."}

@brandond brandond added this to the v1.23.0+k3s1 milestone Dec 14, 2021
@brandond brandond added this to To Triage in Development [DEPRECATED] via automation Dec 14, 2021
@brandond brandond self-assigned this Dec 14, 2021
@brandond brandond moved this from To Triage to Peer Review in Development [DEPRECATED] Dec 14, 2021
@brandond brandond moved this from Peer Review to To Test in Development [DEPRECATED] Dec 18, 2021
@rancher-max rancher-max self-assigned this Jan 3, 2022
@bguzman-3pillar

Steps to reproduce:
Using version v1.23.1-rc1+k3s1
Create an HA cluster with 3 server nodes.
Once all are ready, stop all of them, then restart the first (main) node followed by the other 2 nodes.
$ journalctl -eu k3s
Getting this output now; the above error message is no longer displayed:

Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.915642    6631 event.go:294] "Event occurred" object="ip-172-31-6-106" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node ip-172-31-6-106 event: Re>
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.915780    6631 event.go:294] "Event occurred" object="ip-172-31-12-106" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node ip-172-31-12-106 event: >
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.916147    6631 event.go:294] "Event occurred" object="ip-172-31-13-19" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node ip-172-31-13-19 event: Re>
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.925461    6631 shared_informer.go:247] Caches are synced for GC
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.951420    6631 shared_informer.go:247] Caches are synced for daemon sets
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.958982    6631 shared_informer.go:247] Caches are synced for TTL
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.964016    6631 shared_informer.go:247] Caches are synced for service account
Jan 03 21:44:21 ip-172-31-12-106 k3s[6631]: I0103 21:44:21.975717    6631 shared_informer.go:247] Caches are synced for namespace
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.027923    6631 shared_informer.go:247] Caches are synced for resource quota
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.042458    6631 shared_informer.go:247] Caches are synced for resource quota
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.089135    6631 shared_informer.go:247] Caches are synced for endpoint
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.094479    6631 shared_informer.go:247] Caches are synced for endpoint_slice_mirroring
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.108823    6631 shared_informer.go:247] Caches are synced for endpoint_slice
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.442547    6631 shared_informer.go:247] Caches are synced for garbage collector
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.442578    6631 garbagecollector.go:155] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
Jan 03 21:44:22 ip-172-31-12-106 k3s[6631]: I0103 21:44:22.485066    6631 shared_informer.go:247] Caches are synced for garbage collector
Jan 03 21:44:55 ip-172-31-12-106 k3s[6631]: I0103 21:44:55.778748    6631 scope.go:110] "RemoveContainer" containerID="8e8bec50452dae72cf88717162249ee86e83c76dd4ac0402ecf8d2c31eb27061"
Jan 03 21:44:55 ip-172-31-12-106 k3s[6631]: I0103 21:44:55.786589    6631 scope.go:110] "RemoveContainer" containerID="13ffe125438c05a46bda60e67787695cc5c8c618e316b5dce3abc6a378683600"

Development [DEPRECATED] automation moved this from To Test to Done Issue / Merged PR Jan 3, 2022