
kubelet can't mount the volume with --cloud-provider=external which is required by CCM #71018

Closed
yifan-gu opened this issue Nov 14, 2018 · 10 comments
Labels
kind/bug, lifecycle/rotten, sig/storage

Comments


yifan-gu commented Nov 14, 2018

What happened:

  • Running CCM with --cloud-provider=aws.
  • According to the doc, the kubelet needs to run with --cloud-provider=external.

However, the kubelet then failed to mount the EBS volumes:

Mounting command: mount
Mounting arguments: -o bind /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-1c/vol-0c1b6f36d694c79f2 /var/lib/kubelet/pods/227ce278-e7c0-11e8-ae8b-06346f5010a2/volumes/kubernetes.io~aws-ebs/pvc-227b754f-e7c0-11e8-ae8b-06346f5010a2
Output: mount: special device /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-1c/vol-0c1b6f36d694c79f2 does not exist
E1114 03:49:28.077708    1548 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/aws-ebs/227ce278-e7c0-11e8-ae8b-06346f5010a2-pvc-227b754f-e7c0-11e8-ae8b-06346f5010a2\" (\"227ce278-e7c0-11e8-ae8b-06346f5010a2\")" failed. No retries permitted until 2018-11-14 03:50:32.077674047 +0000 UTC m=+481.316888022 (durationBeforeRetry 1m4s). Error: "MountVolume.SetUp failed for volume \"pvc-227b754f-e7c0-11e8-ae8b-06346f5010a2\" (UniqueName: \"kubernetes.io/aws-ebs/227ce278-e7c0-11e8-ae8b-06346f5010a2-pvc-227b754f-e7c0-11e8-ae8b-06346f5010a2\") pod \"prometheus-prometheus-0\" (UID: \"227ce278-e7c0-11e8-ae8b-06346f5010a2\") : mount failed: exit status 32\nMounting command: mount\nMounting arguments: -o bind /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-1c/vol-0c1b6f36d694c79f2 /var/lib/kubelet/pods/227ce278-e7c0-11e8-ae8b-06346f5010a2/volumes/kubernetes.io~aws-ebs/pvc-227b754f-e7c0-11e8-ae8b-06346f5010a2\nOutput: mount: special device /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-1c/vol-0c1b6f36d694c79f2 does not exist\n\n"
E1114 03:50:27.196235    1548 kubelet.go:1616] Unable to mount volumes for pod "prometheus-prometheus-0_kube-system(227ce278-e7c0-11e8-ae8b-06346f5010a2)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"prometheus-prometheus-0". list of unmounted volumes=[prometheus-storage]. list of unattached volumes=[prometheus-storage config config-out prometheus-prometheus-rulefiles-0 prometheus-token-kqf4l]; skipping pod
E1114 03:50:27.196294    1548 pod_workers.go:186] Error syncing pod 227ce278-e7c0-11e8-ae8b-06346f5010a2 ("prometheus-prometheus-0_kube-system(227ce278-e7c0-11e8-ae8b-06346f5010a2)"), skipping: timeout expired waiting for volumes to attach or mount for pod "kube-system"/"prometheus-prometheus-0". list of unmounted volumes=[prometheus-storage]. list of unattached volumes=[prometheus-storage config config-out prometheus-prometheus-rulefiles-0 prometheus-token-kqf4l]
E1114 03:50:32.178373    1548 mount_linux.go:151] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -o bind /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-1c/vol-0c1b6f36d694c79f2 /var/lib/kubelet/pods/227ce278-e7c0-11e8-ae8b-06346f5010a2/volumes/kubernetes.io~aws-ebs/pvc-227b754f-e7c0-11e8-ae8b-06346f5010a2
Output: mount: special device /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-1c/vol-0c1b6f36d694c79f2 does not exist
E1114 03:50:32.178464    1548 aws_ebs.go:419] Mount of disk /var/lib/kubelet/pods/227ce278-e7c0-11e8-ae8b-06346f5010a2/volumes/kubernetes.io~aws-ebs/pvc-227b754f-e7c0-11e8-ae8b-06346f5010a2 failed: mount failed: exit status 32

What you expected to happen:
kubelet should still be able to mount volumes.

How to reproduce it (as minimally and precisely as possible):

  • Run CCM with --cloud-provider=aws
  • Run kubelet with --cloud-provider=external
  • Create an EBS storage class
  • Create a pod that requires a persistent volume (a minimal sketch of the storage class, claim, and pod follows).
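
For reference, a minimal sketch of the objects from the last two steps might look like the following (the names, volume size, and image are hypothetical; the StorageClass uses the in-tree kubernetes.io/aws-ebs provisioner):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-gp2                      # hypothetical class name
provisioner: kubernetes.io/aws-ebs   # in-tree EBS provisioner
parameters:
  type: gp2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc                     # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ebs-gp2
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod                     # hypothetical pod name
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data               # the EBS-backed volume is mounted here
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc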

Anything else we need to know?:
My CCM yaml:

apiVersion: v1
kind: Pod
metadata:
  name: cloud-controller-manager
  namespace: kube-system
  labels:
    k8s-app: cloud-controller-manager
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
  containers:
  - name: cloud-controller-manager
    image: k8s.gcr.io/hyperkube:v1.11.2
    command:
    - ./hyperkube
    - cloud-controller-manager
    - --kubeconfig=/etc/kubernetes/kubeconfig
    - --leader-elect=true
    - --use-service-account-credentials
    - --profiling=false
    - --cloud-provider=aws
    - --cloud-config=/etc/kubernetes/cloud-config.ini
    - --configure-cloud-routes=false
    - --allocate-node-cidrs=true
    - --cluster-cidr=172.16.0.0/16
    - --feature-gates=ExpandPersistentVolumes=true,ExpandInUsePersistentVolumes=true,ExperimentalCriticalPodAnnotation=true,Initializers=true
    livenessProbe:
      httpGet:
        path: /healthz
        port: 10253
      initialDelaySeconds: 15
      timeoutSeconds: 1
    volumeMounts:
    - name: etc-kubernetes
      mountPath: /etc/kubernetes
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - name: etc-kubernetes
    hostPath:
      path: /etc/kubernetes

Environment:

  • Kubernetes version (use kubectl version):
kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-30T21:39:16Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    AWS

  • OS (e.g. from /etc/os-release):

cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1855.4.0
VERSION_ID=1855.4.0
BUILD_ID=2018-09-11-0003
PRETTY_NAME="Container Linux by CoreOS 1855.4.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
  • Kernel (e.g. uname -a):
uname -a
Linux ip-10-3-7-91.us-west-1.compute.internal 4.14.67-coreos #1 SMP Mon Sep 10 23:14:26 UTC 2018 x86_64 Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz GenuineIntel GNU/Linux

@kubernetes/sig-aws-misc @kubernetes/sig-storage-bugs
/cc @Quentin-M

/kind bug

@k8s-ci-robot added the sig/aws, kind/bug, and sig/storage labels Nov 14, 2018

yifan-gu commented Nov 14, 2018

Kinda similar to #70921, but in my case the --cloud-provider flag needs to be set to external according to the doc.


yifan-gu commented Dec 4, 2018

This issue is preventing us from switching to the cloud controller manager. Is anyone looking into this?
I also confirmed that this happens on 1.12.2.


gnufied commented Dec 4, 2018

I do not think typical volume features will work with an external cloud controller manager. Unless you are using the CSI EBS driver (https://github.com/kubernetes-sigs/aws-ebs-csi-driver), in-tree EBS volumes won't work without a cloud provider configured in the controller-manager.
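
As a rough sketch (assuming the driver from the linked repo is installed; the class name is hypothetical), a StorageClass backed by the external EBS CSI driver would look something like:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-csi-gp2            # hypothetical class name
provisioner: ebs.csi.aws.com   # CSI driver name registered by aws-ebs-csi-driver
parameters:
  type: gp2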


gnufied commented Dec 4, 2018

In a nutshell, it is a known issue: if you are using an external CCM and don't have a cloud provider configured in the controller-manager, none of the volume features will work as expected.

That is why sig-storage is working on CSI, which allows external drivers to support attach/detach/provisioning etc.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 4, 2019
@feiskyer
Copy link
Member

feiskyer commented Mar 20, 2019

In a nutshell, it is a known issue: if you are using an external CCM and don't have a cloud provider configured in the controller-manager, none of the volume features will work as expected.

@andrewsykim I think this is still true today? Is CSI still the only solution for this?

@msau42
Copy link
Member

msau42 commented Mar 20, 2019

I'm not sure I understand why mount/unmount would be dependent on the cloud provider. @gnufied can you clarify?

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 19, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
