
CNI plugin fails to start in LXD cluster #55151

Closed
ecejjar opened this issue Nov 6, 2017 · 5 comments

Labels: kind/bug, lifecycle/rotten


ecejjar commented Nov 6, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I deployed a Kubernetes 1.8.2 cluster manually within a couple of Ubuntu 16.04 LXD containers (kubemaster1 and kubeworker1), following the procedure described in this guide. I had to make a few tweaks to the default configuration to bring it up:

  1. LXD container config; I used the following (see also the sketch after this list):
    config: 
      boot.autostart: "true" 
      linux.kernel_modules: bridge,br_netfilter,ip_tables,ip6_tables,netlink_diag,nf_nat,overlay 
      raw.lxc: |- 
        lxc.aa_profile = unconfined 
        lxc.cgroup.devices.allow = a 
        lxc.mount.auto=proc:rw sys:rw cgroup:ro 
        lxc.cap.drop = 
      security.nesting: "true" 
      security.privileged: "true" 
      limits.memory.swap: "false" 
    devices: 
      eth0: 
        nictype: bridged 
        parent: lxdbr0 
        type: nic 
      root: 
        path: / 
        pool: default 
        type: disk
  2. Docker version: 17.06 or above from the 'test' apt repo (17.03 from 'stable' does not work due to this issue)

  3. Kubelet service configuration; I had to add a couple of arguments:
    [Service]
    Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --fail-swap-on=false –cgroup-driver=cgroupfs"
    Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"

(--fail-swap-on=false is required because, even though the containers are set up not to use swap, swap-related paths still show up in the containers' file systems and the kubelet stats those paths to detect swap usage.)
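
For reference, here is a rough sketch of how the setup above could be scripted; the profile name k8s-lxd is just a placeholder of mine, the config keys are the ones from step 1, and the multi-line raw.lxc block is easier to paste via 'lxc profile edit':

    # Create an LXD profile carrying the settings from step 1
    lxc profile create k8s-lxd
    lxc profile set k8s-lxd boot.autostart "true"
    lxc profile set k8s-lxd linux.kernel_modules bridge,br_netfilter,ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
    lxc profile set k8s-lxd security.nesting "true"
    lxc profile set k8s-lxd security.privileged "true"
    lxc profile set k8s-lxd limits.memory.swap "false"
    # raw.lxc is multi-line, so add it with 'lxc profile edit k8s-lxd'

    # Launch the two containers with that profile on top of the default one
    lxc launch ubuntu:16.04 kubemaster1 -p default -p k8s-lxd
    lxc launch ubuntu:16.04 kubeworker1 -p default -p k8s-lxd

    # After adding the kubelet arguments from step 3, reload and restart the service
    systemctl daemon-reload
    systemctl restart kubelet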

With those settings I can successfully run 'kubeadm init --skip-preflight-checks --pod-network-cidr=' in the kubemaster1 container to initialize the master, after which the kubelet service starts and I can use kubectl to interact with the master.
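
Concretely, the sequence on kubemaster1 is along these lines (a sketch; the pod network CIDR is whatever the chosen plugin expects, and copying admin.conf is the usual kubeadm post-init step):

    kubeadm init --skip-preflight-checks --pod-network-cidr=<plugin CIDR>
    # make kubectl talk to the new master
    mkdir -p $HOME/.kube
    cp /etc/kubernetes/admin.conf $HOME/.kube/config
    kubectl get nodes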

Now when I try to install a network plugin (e.g. flannel), all network-related pods fail to start (kube-flannel, kube-proxy, kube-dns). The reason is that they are unable to mount the default ServiceAccount token, as seen in the pod events reported by 'kubectl describe':

$ kubectl describe -n kube-system po/kube-proxy-b8gsb
Name:           kube-proxy-b8gsb
Namespace:      kube-system
Node:           kubemaster1/10.63.78.145
Start Time:     Sun, 05 Nov 2017 17:53:49 +0000
Labels:         controller-revision-hash=1272499655
                k8s-app=kube-proxy
                pod-template-generation=3
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"kube-system","name":"kube-proxy","uid":"b7a32261-bca4-11e7-9a83-02424f90a469","api...
Status:         Pending
IP:             10.63.78.145
Created By:     DaemonSet/kube-proxy
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  
    Image:         gcr.io/google_containers/kube-proxy-amd64:v1.8.2
    Image ID:      
    Port:          <none>
    Command:
      /usr/local/bin/kube-proxy
      --kubeconfig=/var/lib/kube-proxy/kubeconfig.conf
      --cluster-cidr=10.244.0.0/16
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-proxy-token-bfk49 (ro)
Conditions:
  Type          Status
  Initialized   True 
  Ready         False 
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:  HostPath (bare host directory volume)
    Path:  /run/xtables.lock
  kube-proxy-token-bfk49:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-proxy-token-bfk49
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.alpha.kubernetes.io/notReady:NoExecute
                 node.alpha.kubernetes.io/unreachable:NoExecute
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason                 Age   From                  Message
  ----     ------                 ----  ----                  -------
  Normal   SuccessfulMountVolume  14s   kubelet, kubemaster1  MountVolume.SetUp succeeded for volume "xtables-lock"
  Warning  FailedMount            14s   kubelet, kubemaster1  MountVolume.SetUp failed for volume "kube-proxy-token-bfk49" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49 --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49
Output: Running scope as unit run-r83a0a73327a94e409ef80c5bf5f31595.scope.
mount: only root can use "--types" option (effective UID is 1000000)
  Normal   SuccessfulMountVolume  14s  kubelet, kubemaster1  MountVolume.SetUp succeeded for volume "kube-proxy"
  Warning  FailedMount            14s  kubelet, kubemaster1  MountVolume.SetUp failed for volume "kube-proxy-token-bfk49" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49 --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49
Output: Running scope as unit run-rcf7a91cad03948eb96514c7cfc49595e.scope.
mount: only root can use "--types" option (effective UID is 1000000)
  Warning  FailedMount  12s  kubelet, kubemaster1  MountVolume.SetUp failed for volume "kube-proxy-token-bfk49" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49 --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49
Output: Running scope as unit run-ra4f31cc3de3241bc92efeac47669fd4d.scope.
mount: only root can use "--types" option (effective UID is 1000000)
  Warning  FailedMount  10s  kubelet, kubemaster1  MountVolume.SetUp failed for volume "kube-proxy-token-bfk49" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49 --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49
Output: Running scope as unit run-rd03fa7ba6ba146138513dde46a882ead.scope.
mount: only root can use "--types" option (effective UID is 1000000)
  Warning  FailedMount  6s  kubelet, kubemaster1  MountVolume.SetUp failed for volume "kube-proxy-token-bfk49" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49 --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/46eac8a9-c252-11e7-9a83-02424f90a469/volumes/kubernetes.io~secret/kube-proxy-token-bfk49
Output: Running scope as unit run-r7f68f54770ca4d5f90748d45f3563a0d.scope.
mount: only root can use "--types" option (effective UID is 1000000)

Indeed, if I try a 'mount -t' command at the kubemaster1 shell prompt it fails with the same error message. That seems to be the intended behavior for a container, even a privileged one, since mounting file systems is a dangerous operation that only the host's root (UID 0) should be able to perform.
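
For example, a plain tmpfs mount attempted by hand inside kubemaster1 fails the same way (I am using /mnt here purely as a test target):

    # run as root at the kubemaster1 shell prompt
    mount -t tmpfs tmpfs /mnt
    # mount: only root can use "--types" option (effective UID is 1000000)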

I can disable the default ServiceAccount token mounts in the respective DaemonSets' configurations, but then the pods won't start because they cannot find the token files at the mount path where the code expects them.
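
For reference, the knob I mean is the pod-spec field automountServiceAccountToken; a sketch of the kind of change I tried (illustrative, not the exact command I ran):

    # turn off the default ServiceAccount token mount on the kube-proxy DaemonSet
    kubectl -n kube-system patch daemonset kube-proxy --type merge \
      -p '{"spec":{"template":{"spec":{"automountServiceAccountToken":false}}}}'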

What you expected to happen:

CNI-related pods start and pod networking becomes operational. This should be possible, since e.g. Canonical's Juju is supposed to be able to deploy Kubernetes to a cluster of LXD containers (though that is not the case in my environment).

The kubelet should be able to find out that it is running inside an LXD container (or it should be possible to tell it through a command-line argument or configuration parameter) so that it does not use 'mount -t' to mount ServiceAccount tokens in that environment (I tried the '--containerized' argument and the kubelet won't even start inside the LXD container).
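
As an aside, detecting the LXD/LXC environment from inside the container is easy, so a check along these lines (just a sketch, not something the kubelet does today) would be enough to pick a different code path for mounting the tokens:

    # either of these identifies an LXC-based container when run inside kubemaster1
    systemd-detect-virt --container        # prints "lxc"
    grep -a container=lxc /proc/1/environ  # exits 0 inside an LXD container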

How to reproduce it (as minimally and precisely as possible):

Create two LXD containers (e.g. kubemaster1 and kubeworker1), then follow the steps described above to install the Kubernetes cluster and the networking plugin in kubemaster1.

Anything else we need to know?:
A seemingly related problem was reported and discussed in this issue.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:48:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:38:10Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:
    HP ZBook 15 portable workstation
    16GB RAM
    512GB HDD
    Intel Core i7 vPro chipset

  • OS (e.g. from /etc/os-release):
    Host:
    NAME="Ubuntu"
    VERSION="16.04.3 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.3 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial

LXD container:
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

  • Kernel (e.g. uname -a):
    Host:
    Linux elx74401d27 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:28:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

LXD container:
Linux kubemaster1 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:28:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Docker version (from docker version):
Server:
Version: 17.10.0-ce
API version: 1.33 (minimum version 1.12)
Go version: go1.8.3
Git commit: f4ffd25
Built: Tue Oct 17 19:02:56 2017
OS/Arch: linux/amd64
Experimental: false

k8s-ci-robot added the kind/bug label Nov 6, 2017
k8s-github-robot added the needs-sig label Nov 6, 2017

ecejjar commented Nov 7, 2017

/sig cluster-ops

k8s-github-robot removed the needs-sig label Nov 7, 2017
ktsakalozos (Contributor) commented

Probably this will make no difference, but here is the lxc profile used by Canonical's Kubernetes for lxd deployments: https://github.com/conjure-up/spells/blob/master/canonical-kubernetes/steps/00_process-providertype/lxd-profile.yaml
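
In case it helps anyone trying that profile: it can be applied with something like the following (a sketch; I am assuming the YAML has been saved locally as lxd-profile.yaml and the containers are named as in the report above):

    # create a profile from the linked YAML and attach it to the containers
    lxc profile create juju-kubernetes
    lxc profile edit juju-kubernetes < lxd-profile.yaml
    lxc profile add kubemaster1 juju-kubernetes
    lxc profile add kubeworker1 juju-kubernetes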

fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label May 3, 2018
fejta-bot commented

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 2, 2018
fejta-bot commented

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
