CephFS Provisioner Input/Output Error #345

Closed
tmysl opened this issue Sep 8, 2017 · 23 comments · Fixed by #773

@tmysl commented Sep 8, 2017

We are having issues with the CephFS provisioner on a few of our Kubernetes clusters.

The provisioner runs correctly and I am able to create a PVC and mount it in my container, but when I perform any operation against the mount I get an I/O error:

kubectl exec -it test-pod touch /mnt/test --namespace=sre
touch: cannot touch '/mnt/test': Input/output error

The behavior is visible outside the container as well; this is on a node with the CephFS pool mounted, inside the directories the provisioner created:

ls: cannot access /data/cephfs/volumes/kubernetes/kubernetes-dynamic-pvc-3041dba2-9415-11e7-862f-169051d230b2: Input/output error
total 0
drwxr-xr-x 1 root root 1 Sep  7 21:40 .
drwxr-xr-x 1 root root 2 Sep  7 21:40 ..
d????????? ? ?    ?    ?            ? kubernetes-dynamic-pvc-3041dba2-9415-11e7-862f-169051d230b2


[root@ceph-client001 tmysliwiec]# ls -la /data/cephfs/volumes/kubernetes/kubernetes-dynamic-pvc-3041dba2-9415-11e7-862f-169051d230b2
ls: cannot access /data/cephfs/volumes/kubernetes/kubernetes-dynamic-pvc-3041dba2-9415-11e7-862f-169051d230b2: Input/output error

I'm able to perform operations against the CephFS everywhere else, and the cluster has been healthy, serving mounts on plenty of other non-Kubernetes nodes. I'm not sure what would cause this behavior. Let me know if there's anything I can do to resolve this; thanks in advance!

@kairen (Contributor) commented Sep 10, 2017

I also got the same error; more details below:

$ kubectl describe po/fs-mount-pod
...
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  41s		41s		1	default-scheduler			Normal		Scheduled		Successfully assigned fs-mount-pod to node4
  41s		41s		1	kubelet, node4				Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-txx5l"
  41s		8s		7	kubelet, node4				Warning		FailedMount		MountVolume.SetUp failed for volume "pvc-a45a87c3-95e1-11e7-b116-448a5b81d4bd" : CephFS: mount failed: mount failed: exit status 5
Mounting command: mount
Mounting arguments: 172.20.3.92:6789,172.20.3.93:6789,172.20.3.95:6789:/volumes/kubernetes/kubernetes-dynamic-pvc-a468a609-95e1-11e7-a436-fa8092874d5d /var/lib/kubelet/pods/1eeb3320-95e2-11e7-b116-448a5b81d4bd/volumes/kubernetes.io~cephfs/pvc-a45a87c3-95e1-11e7-b116-448a5b81d4bd ceph [name=kubernetes-dynamic-user-a468a63c-95e1-11e7-a436-fa8092874d5d,secret=AQBZwbRZtOLULhAAiFMjKIb8Q35OlS1cpJaSrw==]
Output: mount error 5 = Input/output error

@tmysl (Author) commented Sep 12, 2017

@kairen The mount completes successfully on my end; the I/O errors happen when I try to read/write to the mount dir. Historically, when I've seen mount error 5 with Ceph it was the result of a failure to reach the MDS servers. Check whether you can reach them from within your container, e.g. as sketched below.
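
A minimal reachability check, assuming the pod image ships nc (netcat); the monitor IPs are reused from the mount output above, while <mds-host> and the port are placeholders since MDS daemons bind to dynamic ports (6800-7300 by default):

# Monitors listen on 6789 (IPs taken from the mount arguments above)
kubectl exec -it test-pod --namespace=sre -- nc -zv 172.20.3.92 6789
# MDS daemons use dynamic ports in the 6800-7300 range; substitute your MDS host/port
kubectl exec -it test-pod --namespace=sre -- nc -zv <mds-host> 6800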

@playworker commented Sep 18, 2017

I'm experiencing the same error as @kairen: outside of K8s I can mount all of the directory tree except the final kubernetes-dynamic-pvc-... directory. Ubuntu 16.04, kernel 4.4.0.

@rootfs (Member) commented Sep 18, 2017

@tmysl @playworker Can you manually mount the top-level directory? i.e.

mount -t ceph 172.20.3.92:6789,172.20.3.93:6789,172.20.3.95:6789:/volumes/ /mnt -o name=kubernetes-dynamic-user-a468a63c-95e1-11e7-a436-fa8092874d5d,secret=AQBZwbRZtOLULhAAiFMjKIb8Q35OlS1cpJaSrw==

@playworker commented Sep 18, 2017

@rootfs Yes I can, and the same goes for the /volumes/kubernetes directory; it's only the final kubernetes-dynamic-pvc-... directory that I can't mount manually.

@rootfs (Member) commented Sep 18, 2017

@playworker there is a known limitation (3rd bullet here).

I haven't tested newer kernels yet.

@playworker commented Sep 18, 2017

Thanks @rootfs, I'm not particularly familiar with CephFS. What does the 3rd bullet mean? It reads to me like it has security implications rather than functional ones.
I guess we're talking about this limitation: "Currently each Ceph user created by the provisioner has allow r MDS cap to permit CephFS mount."?

@rootfs (Member) commented Sep 18, 2017

see "Path Restriction" here

@playworker commented Sep 18, 2017

@rootfs I'm sorry, I still don't understand why you think the limitation of each user being given the "allow r" MDS cap is the cause of my issue. The mount command specifies the user; when the provisioner created that user, I assume it was given the "allow r" capability and should therefore be able to mount the specified path?

@rootfs (Member) commented Sep 18, 2017

@playworker can you post your ceph auth for user kubernetes-dynamic-user-a468a63c-95e1-11e7-a436-fa8092874d5d?
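
For reference, that auth entry can be dumped with (requires admin access to the cluster):

ceph auth get client.kubernetes-dynamic-user-a468a63c-95e1-11e7-a436-fa8092874d5d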

@playworker commented Sep 18, 2017

My particular issue was down to namespace support in the CephFS kernel driver, which wasn't introduced until kernel version 4.7/4.8 (not sure exactly which). Upgrading the Kubernetes workers to kernel 4.10 has fixed the problem. Thanks for your help everyone :)
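
For anyone checking whether their workers are affected, the running kernel is visible per node without logging in:

kubectl get nodes -o wide    # includes a KERNEL-VERSION column
# or, on the node itself:
uname -r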

@tmysl (Author) commented Sep 19, 2017

Thanks, guys, for all your comments. We were indeed running on CentOS 7 / kernel 3.10.*. After upgrading the kernel to mainline on one node and setting up a taint & toleration (sketched below), we were able to get the provisioner and the accompanying test pod to mount and let us manipulate the dir.
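
One way to reproduce that taint-and-toleration setup; the node name and taint key/value below are hypothetical:

# Taint the upgraded node so ordinary pods stay off it
kubectl taint nodes upgraded-node-1 cephfs-kernel=mainline:NoSchedule

# ...and give the test pod a matching toleration in its spec (add a
# nodeSelector or nodeName if you also want to pin it to that node):
tolerations:
- key: "cephfs-kernel"
  operator: "Equal"
  value: "mainline"
  effect: "NoSchedule"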

@Adriien-M commented Nov 8, 2017

I updated my Ubuntu kernel from 4.4.0 to 4.10.0 (sudo apt install linux-image-4.10.0-28-generic) and after a reboot the error is gone; everything works fine now :)

openstack-gerrit pushed a commit to openstack/openstack-helm-infra that referenced this issue Jan 9, 2018
Gate: Deploy HWE kernel on ubuntu hosts
This PS deploys the HWE kernel on Ubuntu Hosts, which is required
for CephFS:
 * kubernetes-incubator/external-storage#345

Change-Id: I2ebd46eadf5a4c7a857d42302f388511691ab0db

@wongma7 (Contributor) commented Jun 15, 2018

resolved, kernel related
/area ceph/cephfs

@ktpktr0 commented Aug 1, 2018

I also encountered this problem, but I don't know how to solve it. Is it necessary to upgrade Ceph to Mimic? I am using "quay.io/external_storage/cephfs-provisioner:latest". I have seen someone modify cephfs_provisioner.py to remove the namespace part, but I don't know whether that solves it, nor how to modify the image.

My k8s version is 1.10.3, the Ceph version is Luminous, on CentOS 7.4.

@ktpktr0 commented Aug 1, 2018

@tmysl Hello, can you tell me the specific steps? Thank you

@ktpktr0 commented Aug 6, 2018

Upgrade the Linux kernel to 4.10 or above, or upgrade Ceph to the Mimic release; a sketch of the CentOS 7 kernel upgrade is below.
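
A sketch of the CentOS 7 mainline-kernel route via ELRepo's kernel-ml packages (repo URL and package names as published by ELRepo at the time; verify them against your environment before running):

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-ml
grub2-set-default 0                       # boot the newest installed kernel
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot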

@kinzess commented Aug 14, 2018

I already upgraded Ceph to Mimic, but it didn't help.

ceph version

ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)

I will try upgrading the kernel to 4.10 now.

@majorinche commented Aug 22, 2018

@kinzess What was the result?

@kinzess commented Aug 22, 2018

@majorinche First I upgraded my CentOS 7 kernel from 3.10.* to 4.17.* (kernel-ml-4.17.14-1.el7.elrepo.x86_64.rpm) and all PGs were lost.
Then I upgraded to 4.18.* (kernel-ml-4.18.3-1.el7.elrepo.x86_64.rpm) and all PGs came back.
Everything works fine now.

@majorinche commented Aug 22, 2018

OK, I just added -disable-ceph-namespace-isolation=true to the deployment args and the Input/output error is gone:

- args:
  - '-id=cephfs-provisioner-1'
  - '-disable-ceph-namespace-isolation=true'
  command:
  - /usr/local/bin/cephfs-provisioner
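
For context: this flag works around the problem because, as noted earlier in the thread, the provisioner's namespace isolation relies on kernel-side CephFS namespace support that only landed around 4.7/4.8; disabling it trades per-volume isolation for compatibility with older kernels.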

@chenqiao3199 commented Oct 17, 2018

@majorinche My 'Input/output error' is gone after using your approach. Thank you very much!!!

@swenzi commented Nov 27, 2018

My 'Input/output error' is gone after using the same approach. Thank you very much!!!

csk-bsi added a commit to BSI-Business-Systems-Integration-AG/external-storage that referenced this issue Jan 4, 2019
Fixed CephFS Provisioner Input/Output Error
- This fix might be obsolete after switching to a newer Linux kernel (>= 4.0)
- kubernetes-incubator#345