
volumes/nfs example: service name instead hardcoded IP #44528

Closed
z1nkum opened this issue Apr 15, 2017 · 44 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@z1nkum

z1nkum commented Apr 15, 2017

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.6", GitCommit:"114f8911f9597be669a747ab72787e0bd74c9359", GitTreeState:"clean", BuildDate:"2017-03-28T13:36:31Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GCP

What happened:
I've tried this great nfs example here

I tried to use the service name instead of a hardcoded IP in the PersistentVolume configuration, but it does not work: the pod status shows "Failed to resolve server nfs-server.default.svc.cluster.local: Temporary failure in name resolution". At the same time, I can ping this FQDN (and just the host nfs-server) from other containers.

I saw this note in README.md:

In the future, we'll be able to tie these together using the service names, but for now, you have to hardcode the IP.

So, what are your plans for this feature?
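
For reference, a minimal sketch of the PersistentVolume I'm describing (volume name, size, and path are illustrative, not taken from the example):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 1Mi
  accessModes:
    - ReadWriteMany
  nfs:
    # Using the service DNS name instead of the service's cluster IP;
    # this is what fails with "Temporary failure in name resolution".
    server: nfs-server.default.svc.cluster.local
    path: "/"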

@wongma7
Contributor

wongma7 commented Apr 17, 2017

Old issue here: #8735. The message in the README is about as old. I would be interested in this too :)

@saad-ali saad-ali added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Apr 21, 2017
@saad-ali
Member

CC @kubernetes/sig-storage-feature-requests

@thockin thockin removed the sig/network Categorizes an issue or PR as relevant to SIG Network. label May 19, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 24, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 23, 2018
@jtgans

jtgans commented Feb 4, 2018

Ran into this myself while attempting to configure my StorageClasses to speak to a heketi GlusterFS pod by service name. When I rebooted my cluster, the cluster IP changed, which broke my GlusterFS storage solution.

@msau42
Member

msau42 commented Feb 4, 2018

I believe the issue is that the node's resolv.conf needs to be configured to point to Kubernetes' DNS service.

This is a host configuration that needs to be done per deployment. I believe we do it automatically for GCE/GKE but I'm unsure about other environments. cc @jingxu97

@jtgans

jtgans commented Feb 4, 2018 via email

@msau42
Member

msau42 commented Feb 5, 2018

Hm maybe @kubernetes/sig-network-misc knows something that can be done here.

The problem is that volume mounts are done by the kubelet, so the NFS server IP/hostname needs to be resolvable and reachable from the kubelet's network.

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Feb 5, 2018
@jtgans

jtgans commented Feb 5, 2018

Which on a baremetal deployment should be the node's/host's network, correct?

@msau42
Member

msau42 commented Feb 5, 2018

Correct. So if your NFS server is being provided by a Pod, then you need the node/host network to be able to access the Pod's network, which, like you pointed out, could be tricky depending on how you've configured your network.

@jtgans

jtgans commented Feb 5, 2018

Okay, so I see two solutions to this problem, then:

  1. Adjust the Service for kube-dns from ClusterIP to NodePort and adjust the node's /etc/resolv.conf to point to the local IPs to get name resolution working.
  2. Adjust the Service for NFS or GlusterFS (in my case) from ClusterIP to NodePort, and then change the StorageClass to point to one of the node's static IPs in the node's subnet.

Of the two, the first seems like the more generic solution for getting name resolution working across the cluster, but it may have unintended side effects if things are set up to expect kube-dns as a ClusterIP service. The second solves this direct problem. I'll try the second option when I get home tonight and report back.

We may want to update the public-facing docs to mention that StorageClass definitions use the kubelet's network so that others don't run aground when trying to set this up.

@yaumeg

yaumeg commented Feb 20, 2018

Did you have a chance to experiment a bit? For solution 2, NodePort assigns ports in the 30000-32767 range, so it seems it would also be necessary to change the NFS PV's default ports (2049 & 111)?

I have a baremetal setup with flannel, and editing resolv.conf doesn't work because, like you pointed out, my nodes don't have access to the containers' network.

@jtgans

jtgans commented Feb 25, 2018

Unfortunately, I can't /edit/ the restUrl for my storageclass because "updating parameters is illegal"... I may end up losing data by deleting the storage class and recreating it with the right url.

@jtgans

jtgans commented Feb 25, 2018

Fortunately my analysis was wrong -- didn't lose any data at all, thankfully.

So what I've done is update my Services to be NodePorts, exposed them on port 32708, and set the resturls in my StorageClasses to http://<random-node-ip-from-cluster>:32708. This allows things to continue to work, but there are two major downsides now:

  1. My REST endpoint is available from outside of the cluster.
  2. If the node that I chose in the rest URL falls over, my storage provisioner becomes unreachable, because I can't edit the resturls once I've set them.

But things work at least.
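
For reference, a rough sketch of the shape this ends up as (the class name, node IP, and port are placeholders, not my actual values):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs
provisioner: kubernetes.io/glusterfs
parameters:
  # Heketi REST endpoint exposed via a NodePort Service;
  # the node IP and port below are placeholders.
  resturl: "http://10.0.0.12:32708"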

@MrHohn
Member

MrHohn commented Feb 25, 2018

When I rebooted my cluster, the cluster IP changed, which broke my glusterfs storage solution.

@jtgans Sorry if I'm missing the point, but why does rebooting the cluster change the cluster IP? Did the NFS service get deleted and recreated? If so, what about giving the NFS service a fixed IP in the manifest and keeping it as type=ClusterIP?
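
Something along these lines (untested sketch; the IP is only an example and must be a free address inside your cluster's service CIDR, and the selector/ports depend on your NFS server deployment):

apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  type: ClusterIP
  # Pinning the cluster IP so it survives deletion/recreation of the Service.
  clusterIP: 10.96.100.100
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    role: nfs-server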

@jtgans

jtgans commented Feb 25, 2018

I actually didn't realize that ClusterIP services could specify their IPs in the definition. I'll give this a try today.

Pretty sure I didn't recreate the service post reboot, but it's been a while since I restarted the cluster.

@mtricolici

Use the full service name; it works fine:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-zuzu
spec:
  capacity:
    storage: 1Mi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.build.svc.cluster.local
    path: "/"

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@digglife

@mtricolici Tried. Didn't work.

@sjerdo

sjerdo commented Aug 6, 2018

@mtricolici @digglife 'build' should be replaced by the namespace.
E.g., when using no namespace (the default), the value for server should be nfs-server.default.svc.cluster.local.

@PaulMazzuca

I have also tried using the service name, and that did not work. Can you clarify the kubectl command that gets the "full" service name that resolves?

@PaulMazzuca

Take that back. I just got it to work using "{service-name}.{namespace}.svc.cluster.local". I did not realize that svc.cluster.local was always the same.

@wsourdin

I just tried {service-name}.{namespace}.svc.cluster.local on EKS and it doesn't work.

e.g. service name nfs-service on default namespace

I got

mount.nfs: Failed to resolve server nfs-service.default.svc.cluster.local: Name or service not known

Is there still no fix or clean workaround available?

@bmbferreira

I just tried {service-name}.{namespace}.svc.cluster.local on EKS and it doesn't work.

e.g. service name nfs-service on default namespace

I got

mount.nfs: Failed to resolve server nfs-service.default.svc.cluster.local: Name or service not known

Is there still no fix or clean workaround available?

I'm also having this issue on EKS.

@MikeDaniel18

Also having this issue on Digital Ocean.

@rjohnson3

/reopen

@k8s-ci-robot
Contributor

@rjohnson3: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kendru

kendru commented Jul 10, 2019

Also having this issue on GKE

@noeliajimenezg

noeliajimenezg commented Jul 30, 2019

Same issue on OpenStack.

@linclaus

Same issue on NFS

@will-beta

I just tried {service-name}.{namespace}.svc.cluster.local on EKS and it doesn't work.

e.g. service name nfs-service on default namespace

I got

mount.nfs: Failed to resolve server nfs-service.default.svc.cluster.local: Name or service not known

Is there still no fix or clean workaround available?

I'm also having this issue on AKS.

@k8s-ci-robot
Contributor

@will-beta: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cl4u2

cl4u2 commented Nov 4, 2019

Same issue on CDK.

@pentago

pentago commented Feb 17, 2020

From what I just learned, this is an issue only on non-GKE Kubernetes. Can't wait for an upstream fix so we can get proper service DNS name resolution on all providers.

Any progress perhaps?

@colearendt

colearendt commented Feb 23, 2020

Worth noting that the example highlighted here is now at: https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs

@pentago

pentago commented Feb 23, 2020

Yeah, regardless, the question still applies, as there was no upstream solution at the time the docs were written.

What does GKE do that allows for this different behavior from upstream Kubernetes?

@Marek00Malik

Has there been any work on this?

@h0jeZvgoxFepBQ2C

@saad-ali @thockin Could you reopen this issue please?

It's not possible for non-maintainers to reopen it.

@h0jeZvgoxFepBQ2C

Did anyone find a solution for this? It really surprises me that this (imo big) issue hasn't been resolved in 3 years. How do people cope with this situation if they want to run multiple NFS servers - you can't always hardcode the IPs?

Any suggestions on how to work around this? Specifying nfs-service.default.svc.cluster.local didn't work out for us.

@pentago

pentago commented Dec 15, 2020

I'm not sure, but could using an ExternalName service be a viable solution?

For example, using the NFS server service's DNS name in the externalName property?

I was under the impression that this particular object type was created to solve these issues. I haven't tried it yet but would welcome feedback from those who did, regardless of the outcome.
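
Something like this, I imagine (untested; note the kubelet would still need to resolve the target name through cluster DNS, so I'm not sure it actually sidesteps the original problem):

apiVersion: v1
kind: Service
metadata:
  name: nfs-fixed-name
spec:
  type: ExternalName
  # Returns a CNAME pointing at the real NFS Service's DNS name.
  externalName: nfs-server.default.svc.cluster.local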

@colearendt

colearendt commented Dec 15, 2020

FWIW I have worked around this by using the nfs-server-provisioner helm chart, persistent volume claims, and moving on with my life. I will say, something has changed about helm's website and now there seem to be two (identical?) options for this, which is a bit weird.

https://artifacthub.io/packages/helm/kvaps/nfs-server-provisioner

Hope that helps! It has worked well enough for me! It would definitely be nice to have a fix though!

A coworker dug into the source and suspected the bug was here, in case it is any help: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/nfs/nfs.go#L256
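
For anyone going this route, the claim side looks roughly like this, assuming the chart was installed with a StorageClass named nfs (that name depends on your chart values):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  # StorageClass created by the nfs-server-provisioner chart; the name here is an assumption.
  storageClassName: nfs
  resources:
    requests:
      storage: 1Gi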

@msau42
Member

msau42 commented Dec 15, 2020

The issue is that in some environments the kubelet's host network does not have access to the cluster DNS. Using https://github.com/kubernetes-csi/csi-driver-nfs should resolve this because it runs as a Pod and therefore has access to cluster services.
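
A rough sketch of a statically provisioned PV with that driver, using its server/share volume attributes (names and size are illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-csi
spec:
  capacity:
    storage: 1Mi
  accessModes:
    - ReadWriteMany
  csi:
    driver: nfs.csi.k8s.io
    # volumeHandle just needs to be unique across volumes.
    volumeHandle: nfs-server.default.svc.cluster.local/share
    volumeAttributes:
      # The mount is performed by the driver Pod, which is why (per the comment above)
      # a service DNS name can resolve here.
      server: nfs-server.default.svc.cluster.local
      share: /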

@h0jeZvgoxFepBQ2C

This does not apply to us, as far as I understand it: when I connect via shell, I can ping the NFS server directly via nfs-server-service and the IP resolution works fine. So the kube proxy knows where our NFS server is; it's just that the volume mount doesn't.

@andre-lx

andre-lx commented Jan 18, 2021

As @msau42 mentioned, I solved this issue using https://github.com/kubernetes-csi/csi-driver-nfs.
