Pod is not able to mount openebs PVC #1688
Thanks @rtnpro for creating the issue with these detailed steps! Could you run the following commands on both nodes (node0 and node1) and share the output, please?
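(The exact commands requested were not captured here; a hedged sketch of typical node-side iSCSI checks, all generic and not quoted from the thread:)

```sh
# Generic node-side iSCSI checks (illustrative, not the exact commands requested)
systemctl status iscsid      # is the iSCSI daemon running?
iscsiadm --version           # is the initiator tooling installed?
lsmod | grep iscsi_tcp       # is the iSCSI transport kernel module loaded?
```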
@kmova Here you go:
However, kubespray runs
Here's the container running on the node:
This shows that iscsiadm is able to log in from node0 and that kubelet is running directly on the node. A couple more troubleshooting steps:
@kmova Here you go:
[root@node0 ~]# iscsiadm -m node session
[root@node0 ~]# iscsiadm -m node -l
[root@node0 ~]# iscsiadm -m node session
[root@node0 ~]# iscsiadm -m node -u
[root@node0 ~]# iscsiadm -m node session
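For reference, the usual discovery/login cycle behind the commands above looks like this (the portal address is the service IP from this issue):

```sh
# Discover targets exposed at the OpenEBS ctrl service, then log in and out
iscsiadm -m discovery -t sendtargets -p 10.233.27.8:3260   # list available targets
iscsiadm -m node -l                                        # log in to discovered targets
iscsiadm -m session                                        # show active sessions
iscsiadm -m node -u                                        # log out again
```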
kubectl logs -f -n redis pvc-a036d681-8fd4-11e8-ad96-de1a202c9007-ctrl-6c77c545d9-zsl9v pvc-a036d681-8fd4-11e8-ad96-de1a202c9007-ctrl-con
kubectl logs -f -n redis pvc-a036d681-8fd4-11e8-ad96-de1a202c9007-ctrl-6c77c545d9-zsl9v maya-volume-exporter
@kmova I will try out that version.
@kmova I updated the openebs setup to
kubectl describe storageclass openebs-db
I may have to set this up locally to debug further. I will use the gist you provided above.
@kmova Here's the output for
I just built a new cluster and had exactly the same iscsiadm errors in my pods. On my system I had set my default storage class to use xfs. It turned out that my new node did not have xfsprogs installed, so it failed to create the filesystem, which then failed to mount. The crucial clue for my situation appeared in my node's dmesg. It might be worth checking whether you have anything similar in your node's dmesg.
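A hedged sketch of that check, assuming CentOS/RHEL nodes as used elsewhere in this thread:

```sh
# Verify xfsprogs is present and scan dmesg for filesystem creation/mount errors
rpm -q xfsprogs                                  # mkfs.xfs comes from this package
dmesg | grep -iE 'xfs|i/o error' | tail -n 20    # look for failed mkfs/mount clues
```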
Thanks @halcyon for sharing the update.
I'd like to add that my comment was based on an earlier version.
Anyone have a fix for this? I am seeing these exact same errors using Rancher 2.13, K8s 1.11.5, Docker 18.6, and OpenEBS 0.8.
Hi @rhugga, @BalaBalaYi, @halcyon, @rtnpro here is the blog for setting up OpenEBS on a custom Rancher cluster; let us know if it helps. Most probably the Docker version is the problem, since Rancher only supports the following Docker versions:
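A quick, generic way to report the Docker engine version on each node (not a command from the thread):

```sh
# Print the Docker server (engine) version on this node
docker version --format '{{.Server.Version}}'
```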
Hi @rhugga, @BalaBalaYi, @halcyon, @rtnpro I was able to reproduce this in my lab. Here is what I did to fix it, with help from @sonasingh46 and @shan. Kubelet is running with this config:
iscsi-initiator-utils was installed on all nodes using the command below, but that does not load the iscsi_tcp driver by default.
Separately loaded iscsi_tcp on all worker nodes. Here is how it looks after loading iscsi_tcp (a sketch of both steps follows below):
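A hedged sketch of those two steps on CentOS/RHEL worker nodes:

```sh
# Install the initiator utils, then load the iscsi_tcp transport module separately
yum install -y iscsi-initiator-utils
modprobe iscsi_tcp                                 # load the kernel module now
lsmod | grep iscsi                                 # confirm iscsi_tcp shows up
echo iscsi_tcp > /etc/modules-load.d/iscsi.conf    # persist the module across reboots
```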
After the above steps:
Here is the description of a percona app deployed with 3 replicas:
I was using CAS... what is the correct storage class for a cStor pool? The storage class doesn't seem to have been my problem, though. I get past the LUN allocation, and it fails when trying to mount. Outside of the storage class, my config is exactly the same as yours. Not sure how/why yours is working.
Can you please give more detail on your environment? I'm still seeing the exact same issues.
I got into this state when I lost a node in my cluster... I'm confused: I thought this was the exact problem OpenEBS was supposed to solve?
Hi @kmova, thanks very much for replying! node master
node1 iZ2zef6lmymbaovo4qdl84Z
node2 iZ2ze9ockjuosa8l5k8aydZ
Thanks very much.
@Yaxian .. from the above, I can confirm that iSCSI is installed and the required kernel modules are loaded on the host. In the iscsid logs, I am still missing some of the expected entries. From the logs provided above, I see that some of the volumes are logged in. Did this mount error start occurring recently? Were there any recent changes/upgrades done to the cluster? How is the K8s cluster built? kubeadm? Rancher? Would it be possible to connect the cluster to https://mayaonline.io to help with debugging? If not, I will reach out for more logs and configuration.
@kmova Thanks! I don't know how to check the logs of
@Yaxian you can run |
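(The exact command is elided above; a hypothetical sketch of fetching the target-pod logs, with placeholder names modeled on the ctrl pod naming seen earlier in this thread:)

```sh
# Locate the volume's ctrl pod, then read its containers' logs (names are placeholders)
kubectl get pods --all-namespaces | grep ctrl
kubectl logs -n <namespace> <pvc-...-ctrl-pod> -c <pvc-...-ctrl-con>
kubectl logs -n <namespace> <pvc-...-ctrl-pod> -c maya-volume-exporter
```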
@utkarshmani1997 Thanks very much. Here are the PV logs:
@Yaxian is it possible to get the complete logs? The logs above are not helping us.
@Yaxian Thank you for the logs. I just looked at the controller logs: a replica got disconnected, but the controller could not remove it. This is a known issue: #2403. BTW, which version of OpenEBS are you using? We have recently made a few improvements in this area. Can you provide us a stack trace of the controller, if possible?
@utkarshmani1997 Thank you for replying. The version is 'openebs-0.8.1', installed by helm. Here is the goroutine.log. Thank you again.
@Yaxian did you try restarting the controller? Just let me know if that helps.
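(A hypothetical sketch of "restarting the controller", i.e. deleting the ctrl pod so its Deployment recreates it; names are placeholders:)

```sh
# Delete the ctrl pod; the owning Deployment schedules a replacement automatically
kubectl delete pod -n <namespace> <pvc-...-ctrl-pod>
kubectl get pods -n <namespace> -w     # watch for the new pod to come up Running
```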
Sorry, I made a mistake. The version is
I will try updating to
No problem @Yaxian
Hi @utkarshmani1997, I have updated the ctrl version to
and deleted the ctrl pod. Here is the goroutine.log. Should I delete the deployment, svc, pv, and pvc of the volume? Thank you.
@Yaxian - Did you upgrade using the steps mentioned here: https://docs.openebs.io/docs/next/upgrade.html? (You don't need to delete anything. The steps mentioned in the above docs will update the 0.8.0 components to 0.8.1.)
Yes.
Hi @Yaxian, sorry for the delayed response. I noticed that this got missed.
@Yaxian can you please join the Slack by following the link below: https://github.com/openebs/openebs/blob/master/community/README.md
Thanks @Yaxian for helping resolve this issue. (Thanks @halcyon for the tip above.) The issue got resolved after installing xfsprogs.
The kubelet had the following errors during the mount process:
And dmesg was showing errors like:
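For anyone hitting the same thing, a hedged sketch of the resolution on CentOS/RHEL worker nodes:

```sh
# Install xfsprogs so kubelet can format the volume with mkfs.xfs
yum install -y xfsprogs
mkfs.xfs -V      # confirm the binary is now available
```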
It may happen when the value of |
The steps were derived from this issue: openebs/openebs#1688.
@Yaxian The resolution step is also documented. https://docs.openebs.io/docs/next/troubleshooting.html#unable-to-mount-xfs-volume |
Thanks to everyone who did the troubleshooting. In my case, I found that if the iSCSI tool is not pre-installed on the node, then even after installing the tool, the controller pod fails to listen on port 3260.
root@pvc-20149e4c-292e-4d8f-b9f3-139c2a03b2a3-ctrl-5bfb77ccf5-5gbjt:/# netstat -tulpn
Please, I have the same issue. When I run it, dmesg shows errors like:
[12497.347553] blk_update_request: I/O error, dev sdc, sector 70920 op 0x1:(WRITE) flags 0x800 phys_seg 6 prio class 0
[12497.354007] Buffer I/O error on dev sdc, logical block 8865, lost async page write
[12497.358397] Buffer I/O error on dev sdc, logical block 8866, lost async page write
[12497.362752] Buffer I/O error on dev sdc, logical block 8867, lost async page write
[12497.367255] Buffer I/O error on dev sdc, logical block 8868, lost async page write
[12497.371691] Buffer I/O error on dev sdc, logical block 8869, lost async page write
[12498.344212] sd 4:0:0:0: [sdc] tag#6 timing out command, waited 180s
[12498.348436] sd 4:0:0:0: [sdc] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[12498.348446] sd 4:0:0:0: [sdc] tag#6 CDB: Write(10) 2a 00 00 04 00 00 00 00 10 00
[12498.348454] blk_update_request: I/O error, dev sdc, sector 262144 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[12498.354100] buffer_io_error: 1 callbacks suppressed
[12498.354113] Buffer I/O error on dev sdc, logical block 32768, lost async page write
[12498.358328] Buffer I/O error on dev sdc, logical block 32769, lost async page write
[12499.344950] sd 4:0:0:0: [sdc] tag#7 timing out command, waited 180s
[12499.350506] sd 4:0:0:0: [sdc] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[12499.350514] sd 4:0:0:0: [sdc] tag#7 CDB: Write(10) 2a 00 00 0c 00 00 00 00 10 00
[12499.350519] blk_update_request: I/O error, dev sdc, sector 786432 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[12499.357107] Buffer I/O error on dev sdc, logical block 98304, lost async page write
[12499.361597] Buffer I/O error on dev sdc, logical block 98305, lost async page write
[12500.346264] sd 4:0:0:0: [sdc] tag#8 timing out command, waited 180s
[12500.351507] sd 4:0:0:0: [sdc] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[12500.351518] sd 4:0:0:0: [sdc] tag#8 CDB: Write(10) 2a 00 00 14 00 00 00 00 10 00
[12500.351528] blk_update_request: I/O error, dev sdc, sector 1310720 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[12500.358211] Buffer I/O error on dev sdc, logical block 163840, lost async page write
[12500.363707] Buffer I/O error on dev sdc, logical block 163841, lost async page write
[12501.347321] sd 4:0:0:0: [sdc] tag#9 timing out command, waited 180s
[12501.352639] sd 4:0:0:0: [sdc] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[12501.352656] sd 4:0:0:0: [sdc] tag#9 CDB: Write(10) 2a 00 00 1c 00 00 00 00 10 00
[12501.352671] blk_update_request: I/O error, dev sdc, sector 1835008 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[12501.359322] Buffer I/O error on dev sdc, logical block 229376, lost async page write
[12501.364827] Buffer I/O error on dev sdc, logical block 229377, lost async page write
[12502.349078] sd 4:0:0:0: [sdc] tag#10 timing out command, waited 180s
[12502.354925] sd 4:0:0:0: [sdc] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[12502.354949] sd 4:0:0:0: [sdc] tag#10 CDB: Write(10) 2a 00 00 24 00 00 00 00 10 00
[12502.354960] blk_update_request: I/O error, dev sdc, sector 2359296 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[12502.364084] Buffer I/O error on dev sdc, logical block 294912, lost async page write
[12502.370920] Buffer I/O error on dev sdc, logical block 294913, lost async page write
[12503.369999] sd 4:0:0:0: [sdc] tag#1 timing out command, waited 180s
[12503.389548] sd 4:0:0:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[12503.389558] sd 4:0:0:0: [sdc] tag#1 CDB: Write(10) 2a 00 00 00 00 00 00 0a 00 00
[12503.389572] blk_update_request: I/O error, dev sdc, sector 0 op 0x1:(WRITE) flags 0x4800 phys_seg 320 prio class 0
[12503.394808] Buffer I/O error on dev sdc, logical block 0, lost async page write
[12503.399033] Buffer I/O error on dev sdc, logical block 1, lost async page write
[12503.403316] Buffer I/O error on dev sdc, logical block 2, lost async page write
[12503.407690] Buffer I/O error on dev sdc, logical block 3, lost async page write
[12503.411903] Buffer I/O error on dev sdc, logical block 4, lost async page write
[12503.416210] Buffer I/O error on dev sdc, logical block 5, lost async page write
[12503.421254] Buffer I/O error on dev sdc, logical block 6, lost async page write
[12503.425704] Buffer I/O error on dev sdc, logical block 7, lost async page write
[12503.429887] Buffer I/O error on dev sdc, logical block 8, lost async page write
[12503.433855] Buffer I/O error on dev sdc, logical block 9, lost async page write
[12504.387007] sd 4:0:0:0: [sdc] tag#16 timing out command, waited 180s
[12504.391792] sd 4:0:0:0: [sdc] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[12504.391810] sd 4:0:0:0: [sdc] tag#16 CDB: Write(10) 2a 00 00 00 0a 00 00 0a 00 00
[12504.391820] blk_update_request: I/O error, dev sdc, sector 2560 op 0x1:(WRITE) flags 0x4800 phys_seg 320 prio class 0
Any idea?
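Not from the thread, but a hedged sketch of how one might inspect the iSCSI session when writes time out like this:

```sh
# Dump detailed session state, including attached SCSI devices and session health
iscsiadm -m session -P 3 | grep -iE 'state|attached'
dmesg | grep -iE 'iscsi|connection' | tail -n 20   # look for transport-level errors
```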
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
What happened:
During deployment of a pod with an RWO PVC from the openebs storageclass, it got stuck in the ContainerCreating stage forever, because it could not mount the openebs volume. Please note that 10.233.27.8 is the IP of a service. I am able to access 10.233.27.8:3260 from my k8s worker nodes (see the sketch below).
What you expected to happen:
The pod should have been able to properly mount the openebs volume and start.
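(A hedged sketch of the kind of connectivity check referenced above; the exact command used was not captured:)

```sh
# From a worker node: confirm the ctrl service's iSCSI portal is reachable
nc -zv 10.233.27.8 3260
```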
How to reproduce it (as minimally and precisely as possible):
1. Used kubespray to provision my k8s cluster on baremetal nodes running CentOS 7.4, which runs the kubelet binary on the worker nodes. This is the kubespray config I used for setting up my k8s cluster: https://gist.github.com/rtnpro/14943d261e88d01685d7400cc6880ed5
2. Installed iscsi-initiator-utils as mentioned here: https://blog.openebs.io/using-openebs-as-kubernetes-persistent-volume-daccae4bdce2
3. Verified the iscsid service is running as per https://blog.openebs.io/using-openebs-as-kubernetes-persistent-volume-daccae4bdce2, or is disabled as per Use OpenEBS with Rancher 2 and PostgreSQL #1450 (comment)
Anything else we need to know?:
kubectl get nodes -o wide
kubectl get pods --all-namespaces -o wide
kubectl get services --all-namespaces -o wide
kubectl get sc
kubectl get pv
kubectl get pvc --all-namespaces
cat /etc/os-release
Kernel (build from Scaleway): 4.4.127-mainline-rev1
Installed helm app: stable/redis with the following values.yaml:
Openebs storageclass and storagepool defs: