
kubeadm injects broken no_proxy environment variables into /etc/kubernetes/manifests/kube-*.yaml #6250

Closed
jperville opened this issue Jun 9, 2020 · 7 comments · Fixed by #7275
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jperville (Contributor) commented Jun 9, 2020

Bug description

When deploying a k8s cluster with kubespray in a proxy environment, the "kubeadm | Initialize first master" task does not pass the Ansible proxy environment variables to kubeadm init, so kubeadm init generates the files in /etc/kubernetes/manifests/kube-*.yaml with broken proxy environment variables.

In particular, the no_proxy value injected by kubeadm init does not include the kube_pods_subnet and kube_service_addresses ranges, so the apiserver tries to go through the proxy to reach services/pods running inside the cluster. This breaks custom apiservices such as the metrics server or the prometheus adapter.

How to reproduce

First, setup the inventory:

  • in inventory/allinone/group_vars/all/all.yml, set at least http_proxy=http://10.0.3.84:3128 and https_proxy=http://10.0.3.84:3128 (replace with the actual HTTP proxy URL)
  • in inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml, configure metrics_server_enabled=true (an illustrative sketch of both files follows this list)
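
For reference, here is an illustrative fragment of those two group_vars files; the proxy URL and flag simply mirror the values used in this report, so adjust both to your own inventory:

# inventory/allinone/group_vars/all/all.yml
http_proxy: "http://10.0.3.84:3128"
https_proxy: "http://10.0.3.84:3128"

# inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml
metrics_server_enabled: true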

Before provisioning the target nodes, pre-populate the /etc/profile.d/proxy.sh file on each node so that shell commands are aware of the HTTP proxy:

cat <<EOF | tee /etc/profile.d/proxy.sh
export HTTP_PROXY=http://10.0.3.195:3128
export http_proxy=http://10.0.3.195:3128
export HTTPS_PROXY=http://10.0.3.195:3128
export https_proxy=http://10.0.3.195:3128
export NO_PROXY="127.0.0.1,localhost,192.168.220.100"
export no_proxy="127.0.0.1,localhost,192.168.220.100"
EOF

Finally, run ansible to provision the k8s cluster as usual. Provisioning should succeed (see https://gist.github.com/jperville/ea721eb1d3bf877345fc91fbcda88a58#file-02-ansible-log ).

However, if we check the output of kubectl get apiservices, we can see that the metrics server is broken.

$ kubectl get apiservices | grep -v True
NAME                                   SERVICE                      AVAILABLE                      AGE
v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (FailedDiscoveryCheck)   13m

$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       addonmanager.kubernetes.io/mode=Reconcile
Annotations:  API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-06-09T14:36:53Z
  Resource Version:    1126
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 0168b932-eb82-4c17-9e2c-6fe4c59cf402
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-06-09T14:36:53Z
    Message:               failing or missing response from https://10.233.39.159:443/apis/metrics.k8s.io/v1beta1: Get https://10.233.39.159:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

The reason is that kube-apiserver tries to go through the https_proxy to contact the apiservice pod via its cluster IP. Since that IP is not listed in the kube-apiserver no_proxy environment variable, the traffic goes nowhere.

This can be seen by checking the no_proxy environment variable value in /etc/kubernetes/manifests/kube-*.yaml:

[root@kapitan ~]# fgrep -i -A1  no_proxy /etc/kubernetes/manifests/kube-*.yaml 
    - name: NO_PROXY
      value: 127.0.0.1,localhost,192.168.220.100
--
    - name: no_proxy
      value: 127.0.0.1,localhost,192.168.220.100

For comparison, the docker/crio systemd override has the proper no_proxy value (which includes the kube_pods_subnet and kube_service_addresses ranges):

Environment="HTTP_PROXY=http://10.0.3.195:3128" "HTTPS_PROXY=http://10.0.3.195:3128" "NO_PROXY=192.168.220.100,kapitan,kapitan.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18"
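
For comparison with the broken manifests above, a correctly generated static pod manifest would carry the same ranges in its proxy env entries. A sketch of what the env section of /etc/kubernetes/manifests/kube-apiserver.yaml should contain, reusing the values from the runtime override above (this assumes, as that override does, that CIDR entries in no_proxy are honoured):

    - name: NO_PROXY
      value: 192.168.220.100,kapitan,kapitan.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18
    - name: no_proxy
      value: 192.168.220.100,kapitan,kapitan.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18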

Environment

  • Cloud provider or hardware configuration:

Running Ansible from an Ubuntu 18.04 workstation to provision k8s on a Vagrant VM running CentOS 7.

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.18.0-25-generic x86_64
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Version of Ansible (ansible --version):
ansible 2.9.9
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/julien/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.17 (default, Apr 15 2020, 17:20:14) [GCC 7.5.0]
  • Version of Python (python --version): Python 2.7.17

Kubespray version (commit) (git rev-parse --short HEAD): 81292f9c

Network plugin used: calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): See https://gist.github.com/jperville/ea721eb1d3bf877345fc91fbcda88a58#file-01-ansible-vars

Command used to invoke ansible:

ansible-playbook -i ../inventory/allinone/hosts.yml cluster.yml --user root --extra-vars @../inventory/allinone/kapitan_vars.yml

Output of ansible run:
See https://gist.github.com/jperville/ea721eb1d3bf877345fc91fbcda88a58#file-02-ansible-log

@jperville jperville added the kind/bug Categorizes issue or PR as related to a bug. label Jun 9, 2020
@jperville
Contributor Author

I worked around the issue with the following patch, which I use on my private fork, but I would welcome a proper PR for this issue.

$ git diff HEAD^
diff --git a/roles/kubernetes/master/tasks/kubeadm-setup.yml b/roles/kubernetes/master/tasks/kubeadm-setup.yml
index d3412855..cb840811 100644
--- a/roles/kubernetes/master/tasks/kubeadm-setup.yml
+++ b/roles/kubernetes/master/tasks/kubeadm-setup.yml
@@ -150,6 +150,12 @@
   failed_when: kubeadm_init.rc != 0 and "field is immutable" not in kubeadm_init.stderr
   environment:
     PATH: "{{ bin_dir }}:{{ ansible_env.PATH }}"
+    http_proxy: "{{ http_proxy | default('') }}"
+    HTTP_PROXY: "{{ http_proxy | default('') }}"
+    https_proxy: "{{ https_proxy | default('') }}"
+    HTTPS_PROXY: "{{ https_proxy | default('') }}"
+    no_proxy: "{{ no_proxy | default('') }}"
+    NO_PROXY: "{{ no_proxy | default('') }}"
   notify: Master | restart kubelet
 
 - name: set kubeadm certificate key
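
Note that the patch only helps if the no_proxy value handed to kubeadm already contains the cluster ranges. A minimal sketch of such a value in group_vars, assuming the standard kubespray variables kube_service_addresses and kube_pods_subnet are defined (the static entries are illustrative and should match your own hosts):

no_proxy: "127.0.0.1,localhost,192.168.220.100,{{ kube_service_addresses }},{{ kube_pods_subnet }}"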

@jperville jperville changed the title kubeadm injects broken no_proxy environment variables into /etc/kubernetes/manifests/kube-*.yml kubeadm injects broken no_proxy environment variables into /etc/kubernetes/manifests/kube-*.yaml Jun 9, 2020
@includerandom commented Jul 3, 2020

Great job! I have the same issue, but your workaround doesn't help me. I don't use a proxy, yet I get the same message from metrics-server as you do. Do you have any idea how to fix this?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2020
@jperville
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 30, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 29, 2021
@champtar
Contributor

/remove-lifecycle rotten
I recently did some work on the proxy side, but it won't work if you have PROXY* vars in the shell env.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 29, 2021