
Configure Flannel networking task fails on Ubuntu 20.04 #115

Closed
NiftyMist opened this issue Sep 28, 2021 · 6 comments

@NiftyMist

I'm following along in the Ansible for Kubernetes book to stand up a 5-node cluster. The cluster is running Ubuntu 20.04 across the board. Node 1 (master) completes this task just fine; however, all 4 worker nodes fail on it with the following error:

Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

The only override vars I'm using are as follows:

# Kubernetes configuration.
kubernetes_version: '1.20'
kubernetes_allow_pods_on_master: false
kubernetes_apiserver_advertise_address: '10.0.0.10'
kubernetes_kubelet_extra_args: '--node-ip={{ ansible_host }}'

I will add any further info here as I continue to troubleshoot this issue.
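In case it helps with troubleshooting, the CA that a node's kubeconfig actually trusts can be decoded and inspected like this (just a sketch of a manual check, not something the role does):

# Decode the CA cert embedded in the kubeconfig and print its subject and
# fingerprint, so it can be compared against the one on the master.
grep certificate-authority-data /root/.kube/config \
  | awk '{print $2}' | base64 -d \
  | openssl x509 -noout -subject -fingerprint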

@NiftyMist
Author

I have verified that all nodes have the same certificate-authority-data in their /root/.kube/config.
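A quick way to spot-check that across every node at once (a sketch, using the same inventory file I use elsewhere in this thread):

# Hash the certificate-authority-data line on each node; the hashes should
# all match if every node trusts the same cluster CA.
ansible all -i inventory/hosts.yml -b -m shell \
  -a "grep certificate-authority-data /root/.kube/config | sha256sum"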

@NiftyMist
Author

A shot in the dark, but I added a full package update and reboot to see if that solved the issue. It was unsuccessful.

---
- hosts: kube
  become: true
  handlers:
    - name: reboot
      reboot:

  pre_tasks:
    # adding to see if updating all packages will resolve the issue of 
    # Configure Flannel networking task failing on worker nodes.
    - name: update all packages # noqa 403
      apt:
        name: '*'
        state: latest
        update_cache: true
      notify: reboot

    # ensure handlers are flushed before moving on to geerlingguy's roles.
    - name: flush handlers
      meta: flush_handlers

  # Geerlingguy's roles per Ansible for Kubernetes page 77 (2021Sep30).
  roles:
    - geerlingguy.security
    - geerlingguy.docker
    - geerlingguy.swap
    - geerlingguy.kubernetes

@NiftyMist
Author

I'm thinking maybe this issue should be filed against https://github.com/geerlingguy/ansible-for-kubernetes instead?

@NiftyMist
Author

I ran an ansible ad-hoc command to get all of the /etc/kubernetes/admin.conf files from my nodes so I could inspect them all on my local machine:

ansible -m fetch -a "src=/etc/kubernetes/admin.conf dest=/tmp/fetch" -i inventory/hosts.yml all -b
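The fetch module drops each host's copy under its own directory in /tmp/fetch, so comparing them is roughly (a sketch of the diff step):

# Compare each worker's fetched admin.conf against node01's copy.
for i in 2 3 4 5; do
  diff /tmp/fetch/node01/etc/kubernetes/admin.conf \
       /tmp/fetch/node0$i/etc/kubernetes/admin.conf
done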

I did a diff across all the files and saw that I was mistaken: the certificate-authority-data was different across the board. As a quick test, I copied the certificate-authority-data from my master node's admin.conf into my second node's admin.conf (both on my local machine) and pushed the modified file back out to node02. I SSHed to node02 and switched to the root user. Then just a kubectl get nodes and boom, no more certificate errors. However, I now got an error about not being a logged-in user:

root@node05:~# kubectl get nodes
error: You must be logged in to the server (Unauthorized)

@NiftyMist
Author

Replaced the /etc/kubernetes/admin.conf on all worker nodes with the exact same admin.conf I fetched to my local machine from node01, using a quick script:

#!/bin/bash
# Push node01's admin.conf out to each worker node.
for i in 2 3 4 5; do
  ansible -m copy \
    -a "src=/tmp/fetch/node01/etc/kubernetes/admin.conf dest=/etc/kubernetes/admin.conf" \
    -i inventory/hosts.yml all -b --limit node0$i
done

Then I ran the playbook again. It completed successfully, but I still only see node01 when I check from any of the nodes:

root@node01:~# kubectl get nodes
NAME                STATUS   ROLES    AGE   VERSION
node01.test.local   Ready    <none>   45h   v1.20.11
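To check that from every node in one go rather than sshing around, something like this works (a sketch using the same ad-hoc pattern as before):

# Run kubectl on every node against its local admin.conf; each node's output
# shows which cluster that kubeconfig actually points at.
ansible all -i inventory/hosts.yml -b -m shell \
  -a "kubectl get nodes --kubeconfig /etc/kubernetes/admin.conf"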

@NiftyMist
Author

NiftyMist commented Sep 30, 2021

I completely missed the kubernetes_role variable in the inventory on page 74 of Ansible for Kubernetes. I deleted and redeployed my nodes in my test environment and modified my inventory like so:

all:
  children:
    kube:
      children:
        kubemaster:
        kubeworker:
    kubemaster:
      hosts:
        node01:
    kubeworker:
      hosts:
        node0[2:5]:

inventory/group_vars/kubemaster.yml

---
# Kubernetes master configuration.
kubernetes_role: master

inventory/group_vars/kubeworker.yml

---
# Kubernetes worker configuration.
kubernetes_role: node
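A quick sanity check that the variable resolves the way I expect (a sketch; the debug module just prints the value each host ends up with):

# node01 should report "master"; node02-node05 should report "node".
ansible kube -i inventory/hosts.yml -m debug -a "var=kubernetes_role"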

I reran the playbook, logged back into node01, and could see all of the worker nodes in the cluster! 🎉

root@node01:~# kubectl get nodes
NAME                STATUS   ROLES                  AGE   VERSION
node01.test.local   Ready    control-plane,master   60s   v1.20.11
node02.test.local   Ready    <none>                 32s   v1.20.11
node03.test.local   Ready    <none>                 33s   v1.20.11
node04.test.local   Ready    <none>                 33s   v1.20.11
node05.test.local   Ready    <none>                 31s   v1.20.11

Sorry for the confusion and for opening a ticket unnecessarily. Thanks for all the work you do!
