
kubeadm injects broken no_proxy environment variables into /etc/kubernetes/manifests/kube-*.yaml #6250

Closed
jperville opened this issue Jun 9, 2020 · 7 comments · Fixed by #7275
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jperville (Contributor) commented Jun 9, 2020

Bug description

When deploying a k8s cluster with kubespray in a proxy environment, the "kubeadm | Initialize first master" task does not pass the Ansible proxy environment variables to kubeadm init, so kubeadm init generates the files in /etc/kubernetes/manifests/kube-*.yaml with broken proxy environment variables.

In particular, the no_proxy value injected by kubeadm init does not include the kube_pods_subnet and kube_service_addresses ranges, so the apiserver tries to go through the proxy to reach services/pods running inside the cluster. This breaks custom apiservices such as the metrics server or the prometheus adapter.

How to reproduce

First, setup the inventory:

  • in inventory/allinone/group_vars/all/all.yml, set at least http_proxy=http://10.0.3.84:3128 and https_proxy=http://10.0.3.84:3128 (replace with the actual HTTP proxy URL)
  • in inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml, configure metrics_server_enabled=true (an illustrative sketch of both files follows this list)
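
For reference, here is an illustrative fragment of those two group_vars files; the proxy URL and flag simply mirror the values used in this report, so adjust both to your own inventory:

# inventory/allinone/group_vars/all/all.yml
http_proxy: "http://10.0.3.84:3128"
https_proxy: "http://10.0.3.84:3128"

# inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml
metrics_server_enabled: true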

Before provisioning the target nodes, pre-populate the /etc/profile.d/proxy.sh file on each node so that shell commands are aware of the HTTP proxy:

cat <<EOF | tee /etc/profile.d/proxy.sh
export HTTP_PROXY=http://10.0.3.195:3128
export http_proxy=http://10.0.3.195:3128
export HTTPS_PROXY=http://10.0.3.195:3128
export https_proxy=http://10.0.3.195:3128
export NO_PROXY="127.0.0.1,localhost,192.168.220.100"
export no_proxy="127.0.0.1,localhost,192.168.220.100"
EOF

Finally, run ansible to provision the k8s cluster as usual. Provisioning should succeed (see https://gist.github.com/jperville/ea721eb1d3bf877345fc91fbcda88a58#file-02-ansible-log ).

However, if we check the output of kubectl get apiservices, we can see that the metrics server is broken.

$ kubectl get apiservices | grep -v True
NAME                                   SERVICE                      AVAILABLE                      AGE
v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (FailedDiscoveryCheck)   13m

$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       addonmanager.kubernetes.io/mode=Reconcile
Annotations:  API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-06-09T14:36:53Z
  Resource Version:    1126
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 0168b932-eb82-4c17-9e2c-6fe4c59cf402
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2020-06-09T14:36:53Z
    Message:               failing or missing response from https://10.233.39.159:443/apis/metrics.k8s.io/v1beta1: Get https://10.233.39.159:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

The reason is that kube-apiserver tries to go through the https_proxy to contact the apiservice pod via its cluster IP. Since that IP is not listed in the kube-apiserver no_proxy environment variable, the traffic goes nowhere.

This can be seen by checking the no_proxy environment variable value in /etc/kubernetes/manifests/kube-*.yaml:

[root@kapitan ~]# fgrep -i -A1  no_proxy /etc/kubernetes/manifests/kube-*.yaml 
    - name: NO_PROXY
      value: 127.0.0.1,localhost,192.168.220.100
--
    - name: no_proxy
      value: 127.0.0.1,localhost,192.168.220.100

For comparison, the docker/crio systemd override has the proper no_proxy value (which includes the kube_pods_subnet and kube_service_addresses ranges):

Environment="HTTP_PROXY=http://10.0.3.195:3128" "HTTPS_PROXY=http://10.0.3.195:3128" "NO_PROXY=192.168.220.100,kapitan,kapitan.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18"
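
For comparison with the broken manifests above, a correctly generated static pod manifest would carry the same ranges in its proxy env entries. A sketch of what the env section of /etc/kubernetes/manifests/kube-apiserver.yaml should contain, reusing the values from the runtime override above (this assumes, as that override does, that CIDR entries in no_proxy are honoured):

    - name: NO_PROXY
      value: 192.168.220.100,kapitan,kapitan.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18
    - name: no_proxy
      value: 192.168.220.100,kapitan,kapitan.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18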

Environment

  • Cloud provider or hardware configuration:

Running Ansible from an Ubuntu 18.04 workstation to provision k8s on a Vagrant VM running CentOS 7.

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.18.0-25-generic x86_64
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Version of Ansible (ansible --version):
ansible 2.9.9
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/julien/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.17 (default, Apr 15 2020, 17:20:14) [GCC 7.5.0]
  • Version of Python (python --version): Python 2.7.17

Kubespray version (commit) (git rev-parse --short HEAD): 81292f9c

Network plugin used: calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): See https://gist.github.com/jperville/ea721eb1d3bf877345fc91fbcda88a58#file-01-ansible-vars

Command used to invoke ansible:

ansible-playbook -i ../inventory/allinone/hosts.yml cluster.yml --user root --extra-vars @../inventory/allinone/kapitan_vars.yml

Output of ansible run:
See https://gist.github.com/jperville/ea721eb1d3bf877345fc91fbcda88a58#file-02-ansible-log

@jperville jperville added the kind/bug Categorizes issue or PR as related to a bug. label Jun 9, 2020
@jperville
Contributor Author

I worked around the issue with the following patch, which I use on my private fork, but I would welcome a proper PR for this issue.

$ git diff HEAD^
diff --git a/roles/kubernetes/master/tasks/kubeadm-setup.yml b/roles/kubernetes/master/tasks/kubeadm-setup.yml
index d3412855..cb840811 100644
--- a/roles/kubernetes/master/tasks/kubeadm-setup.yml
+++ b/roles/kubernetes/master/tasks/kubeadm-setup.yml
@@ -150,6 +150,12 @@
   failed_when: kubeadm_init.rc != 0 and "field is immutable" not in kubeadm_init.stderr
   environment:
     PATH: "{{ bin_dir }}:{{ ansible_env.PATH }}"
+    http_proxy: "{{ http_proxy | default('') }}"
+    HTTP_PROXY: "{{ http_proxy | default('') }}"
+    https_proxy: "{{ https_proxy | default('') }}"
+    HTTPS_PROXY: "{{ https_proxy | default('') }}"
+    no_proxy: "{{ no_proxy | default('') }}"
+    NO_PROXY: "{{ no_proxy | default('') }}"
   notify: Master | restart kubelet
 
 - name: set kubeadm certificate key
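
Note that the patch only helps if the no_proxy value handed to kubeadm already contains the cluster ranges. A minimal sketch of such a value in group_vars, assuming the standard kubespray variables kube_service_addresses and kube_pods_subnet are defined (the static entries are illustrative and should match your own hosts):

no_proxy: "127.0.0.1,localhost,192.168.220.100,{{ kube_service_addresses }},{{ kube_pods_subnet }}"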

@jperville jperville changed the title kubeadm injects broken no_proxy environment variables into /etc/kubernetes/manifests/kube-*.yml kubeadm injects broken no_proxy environment variables into /etc/kubernetes/manifests/kube-*.yaml Jun 9, 2020
@includerandom commented Jul 3, 2020

Great job! I have the same issue, but your workaround doesn't help me. I don't use a proxy, yet I get the same message from metrics-server as you do. Do you have any idea how to fix this?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2020
@jperville
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 30, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 29, 2021
@champtar
Contributor

/remove-lifecycle rotten
I recently did some work on the proxy side, but it won't work if you have PROXY* vars in the shell env.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 29, 2021