Failed to resolve server nfs-server.kubeflow.svc.cluster.local: Name or service not known for pipelines #2443

Closed
dredwilliams opened this issue Feb 10, 2019 · 3 comments · Fixed by #2446

Comments

@dredwilliams

Installed the latest Kubeflow on a private Kubernetes 1.13.3 cluster with flannel networking.

Got to the 'kfctl.sh apply k8s' step in the getting started guide, and everything starts up except for the minio pod, which doesn't start because its storage cannot be mounted. kubectl (and the kubelet logs) give the error message in the title:

Feb 10 15:24:59 compute6 systemd: Started Kubernetes transient mount for /var/lib/kubelet/pods/e8c69ac7-2d71-11e9-ba34-000acd2a54d1/volumes/kubernetes.io~nfs/nfs-pv.
Feb 10 15:24:59 compute6 kubelet: E0210 15:24:59.275316    3905 mount_linux.go:152] Mount failed: exit status 32
Feb 10 15:24:59 compute6 kubelet: Mounting command: systemd-run
Feb 10 15:24:59 compute6 kubelet: Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/e8c69ac7-2d71-11e9-ba34-000acd2a54d1/volumes/kubernetes.io~nfs/nfs-pv --scope 
Feb 10 15:24:59 compute6 kubelet: Output: Running scope as unit run-31270.scope.
Feb 10 15:24:59 compute6 kubelet: mount.nfs: Failed to resolve server nfs-server.kubeflow.svc.cluster.local: Name or service not known
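The log above shows mount.nfs being run on the node (compute6) through systemd-run, so the server name is presumably looked up with the node's own resolver rather than from inside a pod. A minimal sketch for checking that from the node, assuming Python 3 is available there; the helper name `can_resolve` is hypothetical and only the hostname comes from the log:

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if this machine's resolver can resolve hostname."""
    try:
        # getaddrinfo uses the system resolver configuration of the
        # machine it runs on (for a node, the host's, not a pod's).
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised for "Name or service not known" and similar failures.
        return False


if __name__ == "__main__":
    # Hostname taken from the mount.nfs error message.
    name = "nfs-server.kubeflow.svc.cluster.local"
    status = "resolves" if can_resolve(name) else "does not resolve"
    print(f"{name}: {status}")
```

If the name does not resolve on the node but does resolve from inside a pod, that would be consistent with the mount failure while the PV still shows as bound.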

I've tried provisioning the three PVCs dynamically, and manually, but it seems to make no difference. When I look at the PVs using kubectl, the nfs-pv shows as bound to the nfs-pvc as it should, but the system doesn't seem to be able to mount it to the minio pod -- leaving it in the ContainerCreating state.

Eventually, other pods in the 'ml-pipeline-*' set start dying and restarting, but I am assuming it is a side effect of not being able to mount the nfs-pv ...

Thoughts?

@jlewi changed the title from "Failed to resolve server nfs-server.kubeflow.svc.cluster.local: Name or service not known" to "Failed to resolve server nfs-server.kubeflow.svc.cluster.local: Name or service not known for pipelines" on Feb 11, 2019
@jlewi
Contributor

jlewi commented Feb 11, 2019

@dredwilliams I think this issue is limited to pipelines; so you should be able to use other parts of Kubeflow without resolving this issue.

@IronPan @vicaire This looks like an issue with pipelines. Is NFS required by minio? Is pipelines installing an NFS server?

@IronPan
Member

IronPan commented Feb 11, 2019

NFS shouldn't be needed unless explicitly specified. I'll send a fix shortly.

@dredwilliams
Author

That update took care of the issue -- thanks for the quick response!
