Unable to attach storage to pod due to timeout issues #54

Closed
errorsandwarnings opened this issue Apr 1, 2018 · 33 comments

@errorsandwarnings

Hi,

I am using Longhorn as the storage system. I can attach a volume to the host through the Longhorn UI, but pods cannot use the storage volume. I see the same issue with the OpenEBS storage driver.


kubectl describe pod jenkins
Name:         jenkins
Namespace:    default
Node:         kub1/<IP>
Start Time:   Sun, 01 Apr 2018 09:48:49 +0530
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
Containers:
  jenkins:
    Container ID:
    Image:          jenkins/jenkins:lts
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/jenkins_home from jenkins-home (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vhpq2 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  jenkins-home:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  jenkins-data
    ReadOnly:   false
  default-token-vhpq2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vhpq2
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age              From               Message
  ----     ------                 ----             ----               -------
  Warning  FailedScheduling       6m               default-scheduler  PersistentVolumeClaim is not bound: "jenkins-data" (repeated 3 times)
  Normal   Scheduled              6m               default-scheduler  Successfully assigned jenkins to kub1
  Normal   SuccessfulMountVolume  6m               kubelet, kub1      MountVolume.SetUp succeeded for volume "default-token-vhpq2"
  Warning  FailedMount            1m (x2 over 4m)  kubelet, kub1      Unable to mount volumes for pod "jenkins_default(c767552b-3563-11e8-a4b1-027e1d654f4a)": timeout expired waiting for volumes to attach/mount for pod "default"/"jenkins". list of unattached/unmounted volumes=[jenkins-home]


StorageClass, PVC, and Pod manifests

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn
provisioner: rancher.io/longhorn
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  fromBackup: ""
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10G
---
apiVersion: v1
kind: Pod
metadata:
  name: jenkins
  namespace: default
spec:
  containers:
  - name: jenkins
    image: jenkins/jenkins:lts
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: jenkins-home
      mountPath: "/var/jenkins_home"
    ports:
    - containerPort: 80
  volumes:
  - name: jenkins-home
    persistentVolumeClaim:
      claimName: jenkins-data
@yasker
Member

yasker commented Apr 1, 2018

It sounds exactly like the Flexvolume driver wasn't installed properly. We've covered that in our troubleshooting section. Can you check https://github.com/rancher/longhorn#volume-can-be-attacheddetached-from-ui-but-kubernetes-podstatefulset-etc-cannot-use-it to see if it helps?

Also, what are your Kubernetes version and guest OS version? And can you check the logs of the longhorn-manager instances, as well as longhorn-flexvolume-driver and longhorn-flexvolume-driver-deployer?

@errorsandwarnings
Author

Kubernetes version: 1.9.5 (also confirmed not working on 1.8.0)
Guest OS version: Debian GNU/Linux 9 (4.9.0)

longhorn-flexvolume-driver-deployer

INFO[0000] cannot find volumePluginDir key in node config, assume it's default
INFO[0000] Install Flexvolume to Kubernetes nodes directory /usr/libexec/kubernetes/kubelet-plugins/volume/exec/

longhorn-flexvolume-driver

Dependency checking
+ echo Dependency checking
++ nsenter --mount=/host/proc/1/ns/mnt -- nsenter -t 1 -n findmnt --version
+ OUT='findmnt from util-linux 2.29.2'
++ nsenter --mount=/host/proc/1/ns/mnt -- nsenter -t 1 -n curl --version
+ OUT='curl 7.52.1 (x86_64-pc-linux-gnu) libcurl/7.52.1 OpenSSL/1.0.2l zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) libssh2/1.7.0 nghttp2/1.18.1 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL '
++ nsenter --mount=/host/proc/1/ns/mnt -- nsenter -t 1 -n blkid -v
+ OUT='blkid from util-linux 2.29.2  (libblkid 2.29.2, 22-Feb-2017)'
+ exit 0
Detecting backend service IP for longhorn-backend
Backend service IP for longhorn-backend is 10.43.108.150
Flexvolume driver installed

@errorsandwarnings
Author

longhorn-manager


ERROR: logging before flag.Parse: I0401 02:15:37.675726       1 controller_utils.go:1041] Waiting for caches to sync for longhorn datastore controller
ERROR: logging before flag.Parse: I0401 02:15:37.876334       1 controller_utils.go:1048] Caches are synced for longhorn datastore controller
INFO[0000] Start Longhorn volume controller             
ERROR: logging before flag.Parse: I0401 02:15:37.876927       1 controller_utils.go:1041] Waiting for caches to sync for longhorn engines controller
INFO[0000] Start Longhorn replica controller            
ERROR: logging before flag.Parse: I0401 02:15:37.876993       1 controller_utils.go:1041] Waiting for caches to sync for longhorn replicas controller
INFO[0000] Start Longhorn engine controller             
ERROR: logging before flag.Parse: I0401 02:15:37.877026       1 controller_utils.go:1041] Waiting for caches to sync for longhorn engines controller
ERROR: logging before flag.Parse: I0401 02:15:37.931490       1 controller.go:407] Starting provisioner controller 917cfafd-3552-11e8-9136-027e1df8cb12!
INFO[0000] Listening on 10.42.226.133:9500              
ERROR: logging before flag.Parse: I0401 02:15:37.977393       1 controller_utils.go:1048] Caches are synced for longhorn engines controller
ERROR: logging before flag.Parse: I0401 02:15:37.980143       1 controller_utils.go:1048] Caches are synced for longhorn engines controller
ERROR: logging before flag.Parse: I0401 02:15:37.980154       1 controller_utils.go:1048] Caches are synced for longhorn replicas controller
ERROR: logging before flag.Parse: I0401 02:17:18.425012       1 controller.go:1084] scheduleOperation[lock-provision-default/jenkins-data[cd605352-3552-11e8-a4b1-027e1d654f4a]]
ERROR: logging before flag.Parse: I0401 02:17:18.472007       1 leaderelection.go:156] attempting to acquire leader lease...
ERROR: logging before flag.Parse: I0401 02:17:18.495054       1 leaderelection.go:178] successfully acquired lease to provision for pvc default/jenkins-data
ERROR: logging before flag.Parse: I0401 02:17:18.495166       1 controller.go:1084] scheduleOperation[provision-default/jenkins-data[cd605352-3552-11e8-a4b1-027e1d654f4a]]
DEBU[0100] Created volume pvc-cd605352-3552-11e8-a4b1-027e1d654f4a 
INFO[0100] provisioner: created volume %vpvc-cd605352-3552-11e8-a4b1-027e1d654f4a 
ERROR: logging before flag.Parse: I0401 02:17:18.520361       1 controller.go:817] volume "pvc-cd605352-3552-11e8-a4b1-027e1d654f4a" for claim "default/jenkins-data" created
ERROR: logging before flag.Parse: I0401 02:17:18.560743       1 controller.go:834] volume "pvc-cd605352-3552-11e8-a4b1-027e1d654f4a" for claim "default/jenkins-data" saved
ERROR: logging before flag.Parse: I0401 02:17:18.560775       1 controller.go:870] volume "pvc-cd605352-3552-11e8-a4b1-027e1d654f4a" provisioned for claim "default/jenkins-data"
ERROR: logging before flag.Parse: I0401 02:17:20.520154       1 leaderelection.go:198] stopped trying to renew lease to provision for pvc default/jenkins-data, task succeeded
ERROR: logging before flag.Parse: I0401 02:40:17.965458       1 controller.go:1084] scheduleOperation[delete-pvc-cd605352-3552-11e8-a4b1-027e1d654f4a[cd757a95-3552-11e8-a4b1-027e1d654f4a]]
ERROR: logging before flag.Parse: I0401 02:40:17.976903       1 controller.go:1051] deletion of volume "pvc-cd605352-3552-11e8-a4b1-027e1d654f4a" ignored: ignored because Not owned by current node
INFO[1480] Longhorn engine longhorn-system/pvc-cd605352-3552-11e8-a4b1-027e1d654f4a-e has been deleted 
INFO[1480] Longhorn replica longhorn-system/pvc-cd605352-3552-11e8-a4b1-027e1d654f4a-r-3d17d0dd has been deleted 
INFO[1480] Longhorn volume longhorn-system/pvc-cd605352-3552-11e8-a4b1-027e1d654f4a has been deleted 
INFO[1480] Longhorn replica longhorn-system/pvc-cd605352-3552-11e8-a4b1-027e1d654f4a-r-4794fcd9 has been deleted 
ERROR: logging before flag.Parse: I0401 04:18:49.695131       1 controller.go:1084] scheduleOperation[lock-provision-default/jenkins-data[c74fd3cb-3563-11e8-a4b1-027e1d654f4a]]
ERROR: logging before flag.Parse: I0401 04:18:49.776078       1 leaderelection.go:156] attempting to acquire leader lease...
ERROR: logging before flag.Parse: I0401 04:18:52.045440       1 leaderelection.go:163] stopped trying to acquire lease to provision for pvc default/jenkins-data, task succeeded

@yasker
Member

yasker commented Apr 1, 2018

@errorsandwarnings For 1.9.5, you need to follow https://github.com/rancher/longhorn#volume-can-be-attacheddetached-from-ui-but-kubernetes-podstatefulset-etc-cannot-use-it to check which path kubelet uses.

Especially:

The user can find the correct directory by running ps aux | grep kubelet on the host and checking the --volume-plugin-dir parameter. If there is none, the default /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ will be used.

@errorsandwarnings
Author

@yasker : I do not think it is the same issue. I did check that one.

@errorsandwarnings
Author

ps aux | grep kubelet

kubelet --kubeconfig=/etc/kubernetes/ssl/kubeconfig --allow-privileged=true --register-node=true --cloud-provider=rancher --healthz-bind-address=0.0.0.0 --cluster-dns=10.43.0.10 --fail-swap-on=false --cluster-domain=cluster.local --network-plugin=cni --cni-conf-dir=/etc/cni/managed.d --anonymous-auth=false --client-ca-file=/etc/kubernetes/ssl/ca.pem --pod-infra-container-image=rancher/pause-amd64:3.0 --cgroup-driver=cgroupfs --hostname-override kub1

@yasker
Member

yasker commented Apr 1, 2018

@errorsandwarnings it indeed sounds like the Flexvolume plugin wasn't called at all. The Flexvolume plugin performs attach/detach operations on behalf of Kubernetes. If the volume can be manually attached/detached through the UI but cannot be attached/detached by Kubernetes, it's most likely because kubelet cannot find the Flexvolume plugin. Can you check the kubelet log to see if it reports an error finding the Longhorn Flexvolume plugin?
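
For reference, a quick check on the host might look like this (a sketch; the path assumes the default plugin directory, and journalctl only applies if kubelet runs as a systemd service rather than a container):

# verify the Longhorn Flexvolume binary is where kubelet expects it
ls -l /usr/libexec/kubernetes/kubelet-plugins/volume/exec/rancher.io~longhorn/

# look for plugin-discovery errors in the kubelet log
journalctl -u kubelet | grep -i flexvolume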

@errorsandwarnings
Author

@yasker : Where will I find the kubelet log in Rancher? I have tried, but I cannot find it anywhere.

@yasker
Member

yasker commented Apr 1, 2018

Is kubelet deployed as a container? If not, try systemctl status kubelet.

Which version of Rancher are you using?

@errorsandwarnings
Author

I am using Rancher 1.6.5

systemctl status kubelet
Unit kubelet.service could not be found.

Using the Rancher UI, I went to the kubelets; their logs show the following.


01/04/2018 10:39:31 E0401 05:09:31.107293    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:31 E0401 05:09:31.503045    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:31 E0401 05:09:31.902416    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:32 E0401 05:09:32.304236    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:32 E0401 05:09:32.539632    7988 kubelet.go:1630] Unable to mount volumes for pod "jenkins_default(c767552b-3563-11e8-a4b1-027e1d654f4a)": timeout expired waiting for volumes to attach/mount for pod "default"/"jenkins". list of unattached/unmounted volumes=[jenkins-home]; skipping pod
01/04/2018 10:39:32 E0401 05:09:32.539720    7988 pod_workers.go:186] Error syncing pod c767552b-3563-11e8-a4b1-027e1d654f4a ("jenkins_default(c767552b-3563-11e8-a4b1-027e1d654f4a)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"jenkins". list of unattached/unmounted volumes=[jenkins-home]
01/04/2018 10:39:32 E0401 05:09:32.702711    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:33 E0401 05:09:33.103032    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:33 E0401 05:09:33.503437    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:33 E0401 05:09:33.901322    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:34 E0401 05:09:34.302168    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:34 E0401 05:09:34.703447    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:35 E0401 05:09:35.102584    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:35 E0401 05:09:35.503859    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:35 E0401 05:09:35.907996    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:36 E0401 05:09:36.302586    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:36 E0401 05:09:36.701586    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:37 E0401 05:09:37.102665    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched
01/04/2018 10:39:37 E0401 05:09:37.502611    7988 desired_state_of_world_populator.go:286] Failed to add volume "jenkins-home" (specName: "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a") for pod "c767552b-3563-11e8-a4b1-027e1d654f4a" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-c74fd3cb-3563-11e8-a4b1-027e1d654f4a" err=no volume plugin matched

@yasker
Member

yasker commented Apr 1, 2018

OK, Rancher is deploying kubelet as a container, and it lacks the bind mounts necessary for the Flexvolume driver to work.

You need to bind-mount /usr/libexec/kubernetes/kubelet-plugins:/usr/libexec/kubernetes/kubelet-plugins into the kubelet container to make the Flexvolume driver work.

You may not need to bind-mount /dev, since the kubelet container should already have it.
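
For illustration, the kubelet service in the Rancher template would need an extra volume entry along these lines (a docker-compose-style sketch, not the exact template):

kubelet:
  volumes:
    - /usr/libexec/kubernetes/kubelet-plugins:/usr/libexec/kubernetes/kubelet-plugins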

@errorsandwarnings
Author

@yasker : I use Rancher in combination with cloud plugins to add hosts on the fly. If I do this manually, how am I going to scale up dynamically? I will have to do this for each host I add in the future, right? That seems to require a fix.

@errorsandwarnings
Author

@yasker : How come this is working for everyone else? I am pretty much on default Rancher settings.

@yasker
Member

yasker commented Apr 1, 2018

@errorsandwarnings

For now, you can update the binding at https://github.com/rancher/rancher-catalog/blob/v1.6-development/infra-templates/k8s/45/docker-compose.yml.tpl#L69:26

Sorry that we missed it on 1.6. We will provide a guideline on how to enable Flexvolume on 1.6 soon.

@errorsandwarnings
Author

@yasker : Waiting for you to provide a solution better than manually stopping the kubelet on each node and starting it again with new commands. How would I do that with hundreds of nodes?

errorsandwarnings added a commit to errorsandwarnings/rancher-catalog that referenced this issue Apr 1, 2018
This is related to issue in longhorn longhorn/longhorn#54 
Affects other storage drivers too like OpenEBS
@yasker
Member

yasker commented Apr 1, 2018

@errorsandwarnings Just a reminder that that yaml file is the latest one; you may want to use your own version as the base rather than the latest one, to prevent unintended upgrades.

@errorsandwarnings
Author

@yasker : Thanks. Adding a pull request for this to be merged into master: rancher/rancher-catalog#1117

@errorsandwarnings
Author

@yasker : Got it.

@errorsandwarnings
Author

errorsandwarnings commented Apr 1, 2018

@yasker : I tried copying the rancher.io~longhorn directory to /var/lib/kubelet/volumeplugins and upgrading the kubelets with --volume-plugin-dir=/var/lib/kubelet/volumeplugins.
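
Roughly, the manual steps per node were (a sketch; paths are the ones discussed above):

# copy the Flexvolume driver into the new plugin directory
mkdir -p /var/lib/kubelet/volumeplugins
cp -r /usr/libexec/kubernetes/kubelet-plugins/volume/exec/rancher.io~longhorn /var/lib/kubelet/volumeplugins/

# then restart kubelet with the extra flag:
#   --volume-plugin-dir=/var/lib/kubelet/volumeplugins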

Now I see another issue:


Events:
  Type     Reason                 Age   From               Message
  ----     ------                 ----  ----               -------
  Normal   Scheduled              50s   default-scheduler  Successfully assigned jenkins-2 to kub2
  Normal   SuccessfulMountVolume  49s   kubelet, kub2      MountVolume.SetUp succeeded for volume "default-token-vhpq2"
  Warning  FailedMount            49s   kubelet, kub2      MountVolume.SetUp failed for volume "pvc-0192ada7-3589-11e8-a4b1-027e1d654f4a" : mount command failed, status: Failure, reason: create volume fail: fail to parse size error parsing size 'null': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
  Normal   SuccessfulMountVolume  22s   kubelet, kub2      MountVolume.SetUp succeeded for volume "pvc-0192ada7-3589-11e8-a4b1-027e1d654f4a"
  Normal   Pulling                21s   kubelet, kub2      pulling image "jenkins/jenkins:lts"

@errorsandwarnings
Author

@yasker

After changing the YAML (removing the quotes around the size value), it works. But permission errors are now showing up on the volume directory.

For instance, the Jenkins container fails with the error below.

touch: cannot touch '/var/jenkins_home/copy_reference_file.log': Permission denied
Can not write to /var/jenkins_home/copy_reference_file.log. Wrong volume permissions?

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: jenkins
  namespace: default
spec:
  containers:
  - name: jenkins
    image: jenkins/jenkins
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: jenkins-home
      mountPath: "/var/jenkins_home"
    ports:
    - containerPort: 80
  volumes:
  - name: jenkins-home
    persistentVolumeClaim:
      claimName: jenkins-data


@yasker
Member

yasker commented Apr 1, 2018

Can you log in to the pod and check if the directory is writable? It may be a Jenkins issue.
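
One common way to address this class of volume permission error with the jenkins image (an assumption here, not something verified in this thread) is to run the pod with a securityContext whose fsGroup matches the jenkins user, which is uid/gid 1000 in the official image:

apiVersion: v1
kind: Pod
metadata:
  name: jenkins
spec:
  securityContext:
    # fsGroup makes mounted volumes group-writable for this gid;
    # gid 1000 matches the jenkins user in jenkins/jenkins images
    fsGroup: 1000
  containers:
  - name: jenkins
    image: jenkins/jenkins:lts
    volumeMounts:
    - name: jenkins-home
      mountPath: /var/jenkins_home
  volumes:
  - name: jenkins-home
    persistentVolumeClaim:
      claimName: jenkins-data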

@yasker
Member

yasker commented Apr 1, 2018

Also, you can try the latest driver at: https://raw.githubusercontent.com/yasker/longhorn-manager/work/deploy/02-components/04-driver.yaml

Note that you would need to update FLEXVOLUME_DIR in the file to /var/lib/kubelet/volumeplugins.
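
The relevant section of that file would then look something like this (a sketch, based on the FLEXVOLUME_DIR variable that appears later in this thread):

        env:
        - name: FLEXVOLUME_DIR
          value: "/var/lib/kubelet/volumeplugins"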

After updating FLEXVOLUME_DIR, run:

kubectl delete -f 04-driver.yaml
kubectl create -f 04-driver.yaml

This will only upgrade the driver.

@wattwood

wattwood commented Apr 3, 2018

I, too, am running into this issue.
Rancher: 1.6.15
Kubernetes: 1.9.0
-- Details:
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.5-rancher1", GitCommit:"f11c6299ce2b927c3e34ea2afdf57cd08596802f", GitTreeState:"clean", BuildDate:"2018-03-20T16:40:55Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Is there a simple way to resolve this issue, or do I need to wait until the next release of Rancher? I am running the rancher container.

@yasker
Member

yasker commented Apr 3, 2018

@wattwood You can work around it for now:

  1. Add --volume-plugin-dir=/var/lib/kubelet/volumeplugins to the kubelet configuration, then restart kubelet (or add it to https://github.com/rancher/rancher-catalog/blob/v1.6-development/infra-templates/k8s/45/docker-compose.yml.tpl#L32 ).
  2. Download longhorn.yaml, add value: "/var/lib/kubelet/volumeplugins" here: https://github.com/rancher/longhorn/blob/master/deploy/longhorn.yaml#L304 , then redeploy only the longhorn-flexvolume-driver-deployer (see the sketch after these steps).
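
Put together, the two changes look roughly like this (a sketch; exact line positions in the linked files may differ):

# 1. extra kubelet flag (in the kubelet configuration or the k8s
#    docker-compose template linked above)
--volume-plugin-dir=/var/lib/kubelet/volumeplugins

# 2. longhorn-flexvolume-driver-deployer environment in longhorn.yaml
env:
- name: FLEXVOLUME_DIR
  value: "/var/lib/kubelet/volumeplugins"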

@wattwood

wattwood commented Apr 3, 2018

@yasker I am on a bare-metal installation, with rancher:server running as a Docker container on a VM. With this in mind (I am not using GKE), how would I modify the volume plugin directory setting, and why move it away from the default?

Right now, the default location has a file:
/usr/libexec/kubernetes/kubelet-plugins/volume/exec/rancher.io~longhorn/longhorn

If Kubernetes is configured to use that by default, while the output (error) is the same, is this a different issue? Do I still need to mount the default location since it's not showing up in the kubelet?

@wattwood

wattwood commented Apr 3, 2018

I found where to modify it.

@wattwood

wattwood commented Apr 3, 2018

Alright, my kubelet is updated:
"command": [ 15 items
"kubelet",
"--kubeconfig=/etc/kubernetes/ssl/kubeconfig",
"--allow-privileged=true",
"--register-node=true",
"--cloud-provider=rancher",
"--healthz-bind-address=0.0.0.0",
"--cluster-dns=10.43.0.10",
"--fail-swap-on=false",
"--cluster-domain=cluster.local",
"--network-plugin=cni",
"--cni-conf-dir=/etc/cni/managed.d",
"--anonymous-auth=false",
"--client-ca-file=/etc/kubernetes/ssl/ca.pem",
"--pod-infra-container-image=rancher/pause-amd64:3.0",
"--volume-plugin-dir=/usr/libexec/kubernetes/kubelet-plugins/volume/exec/"
],
"dataVolumes": [ 10 items
"/run:/run:rprivate",
"/var/run:/var/run:rprivate",
"/sys:/sys:ro,rprivate",
"/var/lib/docker:/var/lib/docker:rprivate",
"/var/lib/kubelet:/var/lib/kubelet:shared",
"/var/log/containers:/var/log/containers:rprivate",
"/var/log/pods:/var/log/pods:rprivate",
"rancher-cni-driver:/etc/cni:ro",
"rancher-cni-driver:/opt/cni:ro",
"/dev:/host/dev:rprivate"
],

I still have the error. Should I now switch to /var/lib/kubelet/volumeplugins? The folder doesn't exist on my worker nodes.

4/3/2018 4:50:06 PM E0403 22:50:06.383842 20770 desired_state_of_world_populator.go:286] Failed to add volume "volv" (specName: "pvc-56cd3c9f-3776-11e8-b95f-02d1e7c5d723") for pod "4f3bcb8a-3790-11e8-b95f-02d1e7c5d723" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-56cd3c9f-3776-11e8-b95f-02d1e7c5d723" err=no volume plugin matched
4/3/2018 4:50:06 PM E0403 22:50:06.784412 20770 desired_state_of_world_populator.go:286] Failed to add volume "volv" (specName: "pvc-56cd3c9f-3776-11e8-b95f-02d1e7c5d723") for pod "4f3bcb8a-3790-11e8-b95f-02d1e7c5d723" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-56cd3c9f-3776-11e8-b95f-02d1e7c5d723" err=no volume plugin matched
4/3/2018 4:50:06 PM E0403 22:50:06.960782 20770 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
4/3/2018 4:50:07 PM E0403 22:50:07.184386 20770 desired_state_of_world_populator.go:286] Failed to add volume "volv" (specName: "pvc-56cd3c9f-3776-11e8-b95f-02d1e7c5d723") for pod "4f3bcb8a-3790-11e8-b95f-02d1e7c5d723" to desiredStateOfWorld. err=failed to get Plugin from volumeSpec for volume "pvc-56cd3c9f-3776-11e8-b95f-02d1e7c5d723" err=no volume plugin matched

I also updated longhorn-flexvolume-driver-deployer to include:
- name: FLEXVOLUME_DIR
  value: "/usr/libexec/kubernetes/kubelet-plugins/volume/exec/"

@yasker
Member

yasker commented Apr 3, 2018

@wattwood You need to change the directory to /var/lib/kubelet/volumeplugins, since /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ isn't bind-mounted into the kubelet container.
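
This works because /var/lib/kubelet is already bind-mounted into the kubelet container (it appears in the dataVolumes list above with the :shared option), so a plugin directory underneath it is visible to kubelet without adding a new mount:

# already covered by /var/lib/kubelet:/var/lib/kubelet:shared, so kubelet
# can see the driver at this path without a new bind mount:
/var/lib/kubelet/volumeplugins/rancher.io~longhorn/longhorn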

@errorsandwarnings
Author

@wattwood : Wrong path.
Use --volume-plugin-dir=/var/lib/kubelet/volumeplugins

@errorsandwarnings
Author

@wattwood : The same goes for FLEXVOLUME_DIR.

@wattwood

wattwood commented Apr 4, 2018

@errorsandwarnings & @yasker: It all fell into place on why it wasn't a good idea to add the missing mount to the kubelet. I did need to make sure in the longhorn-flexvolume-driver-deployer that the value had a / at the end, otherwise K8S threw a JSON error:
value: "/var/lib/kubelet/volumeplugins/"

4/3/2018 10:51:42 PM E0404 04:51:42.449075 64840 driver-call.go:237] Failed to unmarshal output for command: mount, output: "", error: unexpected end of JSON input

@yasker
Member

yasker commented Apr 4, 2018

@wattwood

Can you manually attach the volume to the host through the Longhorn UI? The error means something is wrong with the mount call.

What guest OS are you using?

Can you post the log of longhorn-manager?

errorsandwarnings added a commit to errorsandwarnings/rancher-catalog that referenced this issue Apr 11, 2018
This is related to issue in longhorn longhorn/longhorn#54
Affects other storage drivers too like OpenEBS

Adding changes by Review from @yasker
@yasker
Member

yasker commented Apr 13, 2018

The PR to Rancher 1.6 Kubernetes has been merged. Closing this issue.

yasker closed this as completed Apr 13, 2018