Timeout Fetching CACerts from Master Node (Linux 5.15.84-v8+) #206

Closed
ztnel opened this issue Jan 8, 2023 · 3 comments

ztnel commented Jan 8, 2023

Context

I have a Turing Pi cluster running 7 RPi3+ Compute Modules. Each one has a fresh install of 64-bit Raspberry Pi OS (kernel 5.15.84-v8+):

$ ansible -a 'uname -a' all
node2 | CHANGED | rc=0 >>
Linux node2 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
node4 | CHANGED | rc=0 >>
Linux node4 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
node1 | CHANGED | rc=0 >>
Linux node1 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
node3 | CHANGED | rc=0 >>
Linux node3 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
node5 | CHANGED | rc=0 >>
Linux node5 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
node6 | CHANGED | rc=0 >>
Linux node6 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
master | CHANGED | rc=0 >>
Linux master 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux

Here is my hosts.ini for my deployment:

[master]
192.168.2.93

[node]
192.168.2.[94:99]

[k3s_cluster:children]
master
node

Issue

After running the site.yml playbook, the task that starts the services on the nodes hangs indefinitely:

TASK [k3s/node : Enable and check K3s service] **************************************************************************************************************************************************************************************************************************************************
Sunday 08 January 2023  11:27:39 -0500 (0:00:04.668)       0:03:45.715 ******** 

After checking the k3s-node service logs on the nodes, I found continuous timeouts in the request that fetches the CA certs from the master node (via the load balancer):

node@node1:~ $ journalctl -fu k3s-node.service
...
Jan 08 11:39:45 node1 k3s[2662]: time="2023-01-08T11:39:45-05:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

Diagnostics

I then tried to curl the cacerts endpoint from node1 and found that the command stalls right after sending the TLS Client Hello:

node@node1:~ $ curl -vvv -k https://127.0.0.1:6444/cacerts
*   Trying 127.0.0.1:6444...
* Connected to 127.0.0.1 (127.0.0.1) port 6444 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
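
To make the stall above easier to reproduce without curl, here is a minimal Python sketch that reports how far a TLS connection gets: whether the TCP connect fails, or the connect succeeds but the handshake never completes (which is what the curl output suggests). The function name `probe_tls` and its return labels are my own invention for illustration and are not part of k3s. Like `curl -k`, it skips certificate verification.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Report the stage a TLS connection reaches.

    Returns one of: 'tcp_failed', 'tls_stalled', 'tls_failed', 'ok'.
    These labels are hypothetical, chosen only for this sketch.
    """
    try:
        # Stage 1: plain TCP connect. The timeout also applies to the handshake.
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"

    # Like `curl -k`: no certificate verification, we only test reachability.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        # Stage 2: wrap_socket performs the TLS handshake on a connected socket.
        with ctx.wrap_socket(raw, server_hostname=host):
            return "ok"
    except socket.timeout:
        # Client Hello was sent but no reply arrived within the timeout.
        return "tls_stalled"
    except OSError:
        # Covers ssl.SSLError and connection resets during the handshake.
        return "tls_failed"
    finally:
        raw.close()
```

Running `probe_tls("127.0.0.1", 6444)` on a node and `probe_tls("192.168.2.93", 6443)` against the master address from hosts.ini should show whether the stall is in the local load balancer or on the path to the master. I am assuming 6443 is the port the load balancer forwards to, as the later comments suggest.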

It seems like the master node has a server-side issue. To double-check, I listed the listening ports on the master node (the output shows the API server port 6443 listening, not 6444):

node@master:~ $ netstat -lupt
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp      356      0 master:sge-qmaster      0.0.0.0:*               LISTEN      -                   
tcp        0      0 master:10010            0.0.0.0:*               LISTEN      -                   
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      -                   
tcp        0      0 master:10249            0.0.0.0:*               LISTEN      -                   
tcp        0      0 master:10248            0.0.0.0:*               LISTEN      -                   
tcp        0      0 master:10259            0.0.0.0:*               LISTEN      -                   
tcp        0      0 master:10258            0.0.0.0:*               LISTEN      -                   
tcp        0      0 master:10257            0.0.0.0:*               LISTEN      -                   
tcp        0      0 master:10256            0.0.0.0:*               LISTEN      -                   
tcp        0      0 master:ipp              0.0.0.0:*               LISTEN      -                   
tcp6       0      0 [::]:10251              [::]:*                  LISTEN      -                   
tcp6       0      0 [::]:10250              [::]:*                  LISTEN      -                   
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      -                   
tcp6       0      0 localhost:ipp           [::]:*                  LISTEN      -                   
tcp6     825      0 [::]:6443               [::]:*                  LISTEN      -                   
udp        0      0 0.0.0.0:bootpc          0.0.0.0:*                           -                   
udp        0      0 0.0.0.0:631             0.0.0.0:*                           -                   
udp        0      0 0.0.0.0:mdns            0.0.0.0:*                           -                   
udp        0      0 0.0.0.0:8472            0.0.0.0:*                           -                   
udp        0      0 0.0.0.0:47457           0.0.0.0:*                           -                   
udp6       0      0 [::]:mdns               [::]:*                              -                   
udp6       0      0 [::]:59701              [::]:*                              -                   

This is the extent of my troubleshooting as I lack expertise on the inner workings of k3s. Any guidance would be appreciated. Happy to provide more information if required. I'm also willing to dedicate my setup to solve this issue so I will leave it in the aforementioned state.


slim-bean commented Jan 25, 2023

I ran into what I believe is the same problem. For me, it was a network connectivity issue between the agent nodes and the master node.

I think the error message is misleading because it has "localhost" in the connection, but once I unblocked the agents' access to the master node on port 6443, it started working.

Author

ztnel commented Jan 27, 2023

Yeah, I think requests on the nodes go through a locally hosted load balancer before being forwarded to the master node. I'm not sure what would cause a connection problem between my nodes.

@dereknola
Member

Typically, issues around port 6443 are related to a firewall. It is recommended to disable firewalls or provide minimal openings. See https://docs.k3s.io/advanced#ubuntu--debian. Additionally, this is being tracked in #234.
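
For anyone who wants to rule out a firewall quickly, a minimal Python sketch along these lines can be run from each agent node to check whether it can open a TCP connection to the master on port 6443. The helper name `check_tcp_ports` is hypothetical, and the address 192.168.2.93 is taken from the hosts.ini posted above; adjust both to your setup.

```python
import socket


def check_tcp_ports(host, ports, timeout=3.0):
    """Map each port to True if a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example (run on an agent node; the address is the master from hosts.ini):
#   check_tcp_ports("192.168.2.93", [6443])
```

A `False` result for 6443 points at a firewall or routing problem between the agent and the master. A `True` result only proves the TCP connect works, so a stalled TLS handshake like the one in the original report would still need a separate check.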
