
NFS example does not work on google container engine. #24687

Closed
arenoir opened this issue Apr 22, 2016 · 63 comments
Labels: area/example, kind/documentation, priority/backlog, sig/storage

Comments


arenoir commented Apr 22, 2016

The NFS example at /examples/nfs does not work on Google Container Engine.

The nfs-server runs but the nfs-busybox won't mount the PersistentVolumeClaim.

The busybox pod errors out trying to mount the persistent volume claim.

Output: mount.nfs: Connection timed out

The nfs-pv uses the nfs-server service IP. Both the persistent volume and the persistent volume claim are bound.

I did notice the NFS server logged a warning:

rpcinfo: can't contact rpcbind: : RPC: Unable to receive; errno = Connection refused

I have tried exposing additional ports (111 TCP/UDP and 2049 UDP), but that had no effect.
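
For reference, this is roughly what I mean by the extra ports; a sketch of the nfs-server Service, not my exact manifest, with illustrative port names:

apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  selector:
    role: nfs-server
  ports:
  # original NFS port from the example
  - name: nfs-tcp
    port: 2049
    protocol: TCP
  # additional ports I tried exposing
  - name: nfs-udp
    port: 2049
    protocol: UDP
  - name: rpcbind-tcp
    port: 111
    protocol: TCP
  - name: rpcbind-udp
    port: 111
    protocol: UDP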

Please help.

#kubectl describe service nfs-server
Name:           nfs-server
Namespace:      default
Labels:         <none>
Selector:       role=nfs-server
Type:           ClusterIP
IP:         10.19.247.137
Port:           <unset> 2049/TCP
Endpoints:      10.16.3.4:2049
Session Affinity:   None
No events.
#kubectl describe pv nfs
Name:       nfs
Labels:     <none>
Status:     Bound
Claim:      default/nfs
Reclaim Policy: Retain
Access Modes:   RWX
Capacity:   1Mi
Message:    
Source:
    Type:   NFS (an NFS mount that lasts the lifetime of a pod)
    Server: 10.19.247.137
    Path:   /
    ReadOnly:   false
#kubectl describe pvc nfs
Name:       nfs
Namespace:  default
Status:     Bound
Volume:     nfs
Labels:     <none>
Capacity:   1Mi
Access Modes:   RWX
#kubectl describe pod nfs-server-e71xs
Name:       nfs-server-e71xs
Namespace:  default
Node:       gke-fieldphone-32335ca1-node-9o0q/10.128.0.4
Start Time: Fri, 22 Apr 2016 13:39:57 -0700
Labels:     role=nfs-server
Status:     Running
IP:     10.16.3.4
Controllers:    ReplicationController/nfs-server
Containers:
  nfs-server:
    Container ID:   docker://d0f11148b09986163c73baf525d57b4a59b3bce149f1776f117adcb444993a5c
    Image:      gcr.io/google_containers/volume-nfs
    Image ID:       docker://3f8217a3a8f1e891612aece9cbf8b8defeb1f1ffa39836ebb7de5e03139f56a7
    Port:       2049/TCP
    QoS Tier:
      cpu:  Burstable
      memory:   BestEffort
    Requests:
      cpu:      100m
    State:      Running
      Started:      Fri, 22 Apr 2016 13:39:59 -0700
    Ready:      True
    Restart Count:  0
    Environment Variables:
Conditions:
  Type      Status
  Ready     True 
Volumes:
  default-token-szz1v:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-szz1v
Events:
  FirstSeen LastSeen    Count   From                        SubobjectPath           Type        Reason      Message
  --------- --------    -----   ----                        -------------           --------    ------      -------
  29m       29m     1   {default-scheduler }                                Normal      Scheduled   Successfully assigned nfs-server-e71xs to gke-fieldphone-32335ca1-node-9o0q
  29m       29m     1   {kubelet gke-fieldphone-32335ca1-node-9o0q} spec.containers{nfs-server} Normal      Pulling     pulling image "gcr.io/google_containers/volume-nfs"
  29m       29m     1   {kubelet gke-fieldphone-32335ca1-node-9o0q} spec.containers{nfs-server} Normal      Pulled      Successfully pulled image "gcr.io/google_containers/volume-nfs"
  29m       29m     1   {kubelet gke-fieldphone-32335ca1-node-9o0q} spec.containers{nfs-server} Normal      Created     Created container with docker id d0f11148b099
  29m       29m     1   {kubelet gke-fieldphone-32335ca1-node-9o0q} spec.containers{nfs-server} Normal      Started     Started container with docker id d0f11148b099
#kubectl describe pod nfs-busybox-fu4el
Name:       nfs-busybox-fu4el
Namespace:  default
Node:       gke-fieldphone-32335ca1-node-00f2/10.128.0.9
Start Time: Fri, 22 Apr 2016 13:49:25 -0700
Labels:     name=nfs-busybox
Status:     Pending
IP:     
Controllers:    ReplicationController/nfs-busybox
Containers:
  busybox:
    Container ID:   
    Image:      busybox
    Image ID:       
    Port:       
    Command:
      sh
      -c
      while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done
    QoS Tier:
      cpu:  Burstable
      memory:   BestEffort
    Requests:
      cpu:      100m
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment Variables:
Conditions:
  Type      Status
  Ready     False 
Volumes:
  nfs:
    Type:   PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nfs
    ReadOnly:   false
  default-token-szz1v:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-szz1v
Events:
  FirstSeen LastSeen    Count   From                        SubobjectPath   Type        Reason          Message
  --------- --------    -----   ----                        -------------   --------    ------          -------
  19m       14m     21  {default-scheduler }                        Warning     FailedScheduling    PersistentVolumeClaim is not bound: "nfs"
  13m       13m     4   {default-scheduler }                        Warning     FailedScheduling    PersistentVolumeClaim 'default/nfs' is not in cache
  13m       13m     1   {default-scheduler }                        Normal      Scheduled       Successfully assigned nfs-busybox-fu4el to gke-fieldphone-32335ca1-node-00f2
  11m       1m      5   {kubelet gke-fieldphone-32335ca1-node-00f2}         Warning     FailedMount     Unable to mount volumes for pod "nfs-busybox-fu4el_default(ce04b0d3-08ca-11e6-a6a8-42010af000bb)": Mount failed: exit status 32
Mounting arguments: 10.19.247.137:/ /var/lib/kubelet/pods/ce04b0d3-08ca-11e6-a6a8-42010af000bb/volumes/kubernetes.io~nfs/nfs nfs []
Output: mount.nfs: Connection timed out


  11m   1m  5   {kubelet gke-fieldphone-32335ca1-node-00f2}     Warning FailedSync  Error syncing pod, skipping: Mount failed: exit status 32
Mounting arguments: 10.19.247.137:/ /var/lib/kubelet/pods/ce04b0d3-08ca-11e6-a6a8-42010af000bb/volumes/kubernetes.io~nfs/nfs nfs []
Output: mount.nfs: Connection timed out
@erinboyd

What version of NFS are you using?

@erinboyd

Can you also include your exports from the NFS server?


arenoir commented Apr 22, 2016

@erinboyd I don't know; whatever version is running in the Google image from the example, gcr.io/google_containers/volume-nfs.

I can't find the source for Google's Docker images.

This is a vanilla setup taken straight out of the documentation; it should "just work".


arenoir commented Apr 22, 2016

According to the README it is exporting /mnt/data as /.

The server exports /mnt/data directory as / (fsid=0). The directory contains dummy index.html. Wait until the pod is running by checking kubectl get pods -lrole=nfs-server.
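
For context, an fsid=0 export of that directory would typically look something like the line below in /etc/exports; this is illustrative only, not necessarily the exact configuration baked into the image:

# illustrative export entry; the options are an assumption, not the image's actual exports file
/mnt/data *(rw,fsid=0,insecure,no_root_squash)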


arenoir commented Apr 25, 2016

@erinboyd I have tried a couple of other Docker images in place of gcr.io/google_containers/volume-nfs without any success. Do you have, or know of, a working nfs-server Docker implementation?

This is the last piece keeping me from moving my setup from colocation to Google Cloud.


xidui commented Apr 26, 2016

+1
I also had the same problem.


xidui commented Apr 26, 2016

here is my output:

[root@kubernetes-master nfs]# kubectl get pod
NAME                READY     STATUS              RESTARTS   AGE
nfs-busybox-30a30   0/1       ContainerCreating   0          9m
nfs-busybox-8dw8q   0/1       ContainerCreating   0          9m
nfs-server-0e7rq    1/1       Running             0          4s
[root@kubernetes-master nfs]# kubectl logs nfs-server-0e7rq
Serving /exports
Serving /
rpcinfo: can't contact rpcbind: : RPC: Unable to receive; errno = Connection refused
Starting rpcbind
NFS started
[root@kubernetes-master ~]# kubectl describe po nfs-busybox-8dw8q
Name:                           nfs-busybox-8dw8q
Namespace:                      default
Image(s):                       busybox
Node:                           node-2-slave-1/107.170.32.151
Start Time:                     Mon, 25 Apr 2016 22:29:56 -0400
Labels:                         name=nfs-busybox
Status:                         Pending
Reason:
Message:
IP:
Replication Controllers:        nfs-busybox (2/2 replicas created)
Containers:
  busybox:
    Container ID:
    Image:              busybox
    Image ID:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Ready         False
Volumes:
  nfs:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nfs
    ReadOnly:   false
  default-token-pjwpm:
    Type:       Secret (a secret that should populate this volume)
    SecretName: default-token-pjwpm
Events:
  FirstSeen     LastSeen        Count   From                            SubobjectPath   Reason          Message
  ─────────     ────────        ─────   ────                            ─────────────   ──────          ───────
  16m           16m             1       {scheduler }                                    Scheduled       Successfully assigned nfs-busybox-8dw8q to node-2-slave-1
  16m           6s              92      {kubelet node-2-slave-1}                        FailedMount     Unable to mount volumes for pod "nfs-busybox-8dw8q_default            status 32
Mounting arguments: 10.254.207.116:/ /var/lib/kubelet/pods/bd8e9ba7-0b56-11e6-a2e5-0401cb31c801/volumes/kubernetes.io~nfs/nfs nfs []
Output: Job for rpc-statd.service failed because the control process exited with error code. See "systemctl status rpc-statd.service" and "journalctl -xe" for det
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified


  16m   6s      92      {kubelet node-2-slave-1}                FailedSync      Error syncing pod, skipping: Mount failed: exit status 32
Mounting arguments: 10.254.207.116:/ /var/lib/kubelet/pods/bd8e9ba7-0b56-11e6-a2e5-0401cb31c801/volumes/kubernetes.io~nfs/nfs nfs []
Output: Job for rpc-statd.service failed because the control process exited with error code. See "systemctl status rpc-statd.service" and "journalctl -xe" for det
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified


xidui commented Apr 26, 2016

@arenoir I have solved this problem; see if my situation also applies to you:

I found that there are some problems with the image gcr.io/google_containers/volume-nfs.
Check whether there is a /mnt/data directory in the nfs-server container.

The Dockerfile says that it copies index.html to /mnt/data/index.html, but after executing kubectl exec -it nfs-server bash and running ls /mnt, I found there is no data directory inside. So I rebuilt the image and used the new one. After that, the issue was solved.
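
A hypothetical Dockerfile fragment for such a rebuild; the base image, packages, and script name are assumptions rather than the actual upstream Dockerfile, the point is just to make sure /mnt/data and its index.html exist in the image:

# hypothetical rebuild sketch; not the upstream Dockerfile
FROM centos:7
RUN yum -y install nfs-utils && yum clean all
RUN mkdir -p /exports /mnt/data
COPY index.html /mnt/data/index.html
COPY run_nfs.sh /usr/local/bin/run_nfs.sh
EXPOSE 2049/tcp 20048/tcp 111/tcp 111/udp
ENTRYPOINT ["/usr/local/bin/run_nfs.sh", "/exports", "/mnt/data"]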


arenoir commented Apr 28, 2016

@xidui, thanks for the info... I was not able to get the gcr.io/google_containers/volume-nfs image to work. I ended up using the image jsafrane/nfsexporter. All's well that ends well.

@mml mml added kind/documentation Categorizes issue or PR as related to documentation. sig/storage Categorizes an issue or PR as relevant to SIG Storage. team/cluster labels May 2, 2016

mml commented May 2, 2016

@arenoir it looks like you got things working, but I believe we should still correct our docs, at least.


ekozan commented May 3, 2016

I think somebody has pushed the image built from PR #22665 to gcr.io/google_containers/volume-nfs.

That's why it no longer works. @rootfs


rootfs commented May 3, 2016

@pwittrock ^^^

@bgrant0607

cc @kubernetes/examples


liubin commented Jun 14, 2016

The same error (in Vagrant):

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.4", GitCommit:"3eed1e3be6848b877ff80a93da3785d9034d0a4f", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.4", GitCommit:"3eed1e3be6848b877ff80a93da3785d9034d0a4f", GitTreeState:"clean"}

And the nfs-server error:

2016-06-14T10:57:10.776492975Z Serving /exports
2016-06-14T10:57:10.780350890Z Serving /
2016-06-14T10:57:10.878049309Z rpcinfo: can't contact rpcbind: : RPC: Unable to receive; errno = Connection refused
2016-06-14T10:57:10.887171106Z Starting rpcbind
2016-06-14T10:57:11.143431556Z NFS started


rootfs commented Jun 14, 2016

Note, the image is updated per #22665.

The example works on GCE/GKE, AWS and Cinder.


klaus commented Jun 23, 2016

I am also getting this output, but the NFS sample still works for me in the GCE cloud installation:

Output: mount.nfs: Connection timed out

It might be interesting to know that it reproducibly does NOT work in a local k8s-in-Docker installation based on Debian. I am wondering if this might be a useful hint? Locally I am using HostPath PVs and am connected to a network manager running on my laptop.

What points me in the network direction rather than at HostPath is that when I start the nfs-common service on my local machine while running the NFS example, I eventually get a full lockup of my box, with something like "CPU locked for more than 22s":

Jun 22 16:32:27 vex kernel: [ 1240.971949] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [rpc.nfsd:27497]
Jun 22 16:32:27 vex kernel: [ 1240.971952] Modules linked in: xt_nat(E) xt_recent(E) xt_mark(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) xt_comment(E) veth(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) rfcomm(E) fuse(E) x
t_conntrack(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) x_tables(E) br_netfilter(E) nf_nat(E) nf_conntrack(E
) bridge(E) stp(E) llc(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) overlay(E) bnep(E) cpufreq_powersave(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) fscache(E) sunrpc(E) nls_utf8(E) nl
s_cp437(E) vfat(E) fat(E) dm_crypt(E) wl(POE) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) arc4(E) btusb(E) kvm_intel(E) kvm(E) btrtl(E) btbcm(E) btintel(E) bluetooth(E) iTCO_wdt(E) iTCO_vendor_support
(E) evdev(E) irqbypass(E) ath9k(E) ath9k_common(E) ath9k_hw(E) dcdbas(E) ath(E) mac80211(E) cfg80211(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_
intel(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) sg(E) snd_pcm(E) snd_timer(E) pcspkr(E) rfkill(E) snd(E) mei_me(E) 8250_fintek(E) battery(E) mei(E) soundcore(E) lpc_ich(E) mfd_core(E) efi_pstore(E) ie31
200_edac(E) edac_core(E) i2c_i801(E) video(E) shpchp(E) serio_raw(E) tpm_tis(E) efivars(E) tpm(E) processor(E) button(E) amdgpu(E) parport_pc(E) ppdev(E) lp(E) parport(E) efivarfs(E) autofs4(E) ext4(E) ecb(E) crc16(E) jbd2(E) 
crc32c_generic(E) mbcache(E) dm_mod(E) raid1(E) md_mod(E) hid_generic(E) usbhid(E) hid(E) sr_mod(E) cdrom(E) sd_mod(E) crc32c_intel(E) ahci(E) aesni_intel(E) libahci(E) radeon(E) xhci_pci(E) i2c_algo_bit(E) xhci_hcd(E) libata(
E) aes_x86_64(E) ehci_pci(E) glue_helper(E) lrw(E) gf128mul(E) ehci_hcd(E) ablk_helper(E) cryptd(E) e1000e(E) drm_kms_helper(E) psmouse(E) scsi_mod(E) ttm(E) ptp(E) usbcore(E) pps_core(E) drm(E) usb_common(E) thermal(E) fjes(E
)
Jun 22 16:32:27 vex kernel: [ 1240.972022] CPU: 1 PID: 27497 Comm: rpc.nfsd Tainted: P           OEL  4.6.0-1-amd64 #1 Debian 4.6.1-1
Jun 22 16:32:27 vex kernel: [ 1240.972022] Hardware name: Dell Inc. OptiPlex 7010/0GY6Y8, BIOS A16 09/09/2013
Jun 22 16:32:27 vex kernel: [ 1240.972023] task: ffff8800a2de6080 ti: ffff8804c6364000 task.ti: ffff8804c6364000
Jun 22 16:32:27 vex kernel: [ 1240.972025] RIP: 0010:[<ffffffff8109a138>]  [<ffffffff8109a138>] blocking_notifier_chain_register+0x38/0x90
Jun 22 16:32:27 vex kernel: [ 1240.972029] RSP: 0018:ffff8804c6367dd8  EFLAGS: 00000246
Jun 22 16:32:27 vex kernel: [ 1240.972030] RAX: ffffffffc0c90cd0 RBX: ffffffff81add020 RCX: 0000000000000000
Jun 22 16:32:27 vex kernel: [ 1240.972031] RDX: ffffffffc0c90cd8 RSI: ffffffffc0bf7810 RDI: ffffffff81add020
Jun 22 16:32:27 vex kernel: [ 1240.972032] RBP: ffffffffc0bf7810 R08: 0000000000000005 R09: ffff8804abffc500
Jun 22 16:32:27 vex kernel: [ 1240.972032] R10: ffff8804c1fbb200 R11: 0000000000000000 R12: ffff8800a2e661c0
Jun 22 16:32:27 vex kernel: [ 1240.972033] R13: ffff8804c1fbb200 R14: 0000000000000000 R15: ffff8804c1fbb200
Jun 22 16:32:27 vex kernel: [ 1240.972034] FS:  00007fe8678fe840(0000) GS:ffff88061dc80000(0000) knlGS:0000000000000000
Jun 22 16:32:27 vex kernel: [ 1240.972035] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 22 16:32:27 vex kernel: [ 1240.972036] CR2: 00007fcb61036000 CR3: 00000004ac007000 CR4: 00000000001406e0
Jun 22 16:32:27 vex kernel: [ 1240.972036] Stack:
Jun 22 16:32:27 vex kernel: [ 1240.972037]  ffff8804690ac200 ffff8800a2e661c0 ffffffffc0bebe3d 0000000000000002
Jun 22 16:32:27 vex kernel: [ 1240.972039]  ffff8800a2e661c0 0000000000000000 ffff8804c1fbb200 ffffffffc0c570e7
Jun 22 16:32:27 vex kernel: [ 1240.972040]  ffffffffc0c58510 ffff8804e1d5c008 ffff8800a2e661c0 ffffffffc0c58510
Jun 22 16:32:27 vex kernel: [ 1240.972041] Call Trace:
Jun 22 16:32:27 vex kernel: [ 1240.972046]  [<ffffffffc0bebe3d>] ? lockd_up+0x11d/0x330 [lockd]
Jun 22 16:32:27 vex kernel: [ 1240.972053]  [<ffffffffc0c570e7>] ? nfsd_svc+0x1c7/0x2a0 [nfsd]
Jun 22 16:32:27 vex kernel: [ 1240.972058]  [<ffffffffc0c58510>] ? write_leasetime+0x80/0x80 [nfsd]
Jun 22 16:32:27 vex kernel: [ 1240.972062]  [<ffffffffc0c58510>] ? write_leasetime+0x80/0x80 [nfsd]
Jun 22 16:32:27 vex kernel: [ 1240.972065]  [<ffffffffc0c58596>] ? write_threads+0x86/0xe0 [nfsd]
Jun 22 16:32:27 vex kernel: [ 1240.972068]  [<ffffffff81176862>] ? get_zeroed_page+0x12/0x40
Jun 22 16:32:27 vex kernel: [ 1240.972070]  [<ffffffff812186b0>] ? simple_transaction_get+0xa0/0xb0
Jun 22 16:32:27 vex kernel: [ 1240.972074]  [<ffffffffc0c57b33>] ? nfsctl_transaction_write+0x43/0x70 [nfsd]
Jun 22 16:32:27 vex kernel: [ 1240.972076]  [<ffffffff811f1374>] ? vfs_write+0xa4/0x1a0
Jun 22 16:32:27 vex kernel: [ 1240.972077]  [<ffffffff811f2762>] ? SyS_write+0x52/0xc0
Jun 22 16:32:27 vex kernel: [ 1240.972080]  [<ffffffff815c65b6>] ? system_call_fast_compare_end+0xc/0x96
Jun 22 16:32:27 vex kernel: [ 1240.972081] Code: 74 4a 55 53 48 89 fb 48 89 f5 e8 04 aa 52 00 48 8b 43 28 48 8d 53 28 48 85 c0 74 1c 8b 4d 10 3b 48 10 7e 07 eb 12 39 48 10 7c 0d <48> 8d 50 08 48 8b 40 08 48 85 c0 75 ee 48 89 45 08 48 89 2a 48 

@pwittrock

cc @fabioy


sijnc commented Jun 27, 2016

I got this example working; however, while testing the busybox RC (replicas=2) I noticed that both busybox pods seemed to write to the file at the same time (first entry), whereas all other cat commands show only one host, like the second one. Is this normal?

[root@nfs-web-363353589-qausi exports]# cat index.html
Mon Jun 27 22:53:42 UTC 2016
nfs-web-busybox-tf95c
nfs-web-busybox-wgxl4
[root@nfs-web-363353589-qausi exports]# cat index.html
Mon Jun 27 22:54:38 UTC 2016
nfs-web-busybox-wgxl4


rootfs commented Jun 28, 2016

@sijnc yes, that's how the example works. There are two replicas writing to the NFS share. What you see from curl is the current snapshot of the file.


klaus commented Jun 28, 2016

I now got the example working, though I still think my comment is not fully invalidated by this.

I am running a k8s 1.4.2 local cluster via Docker. This oddly pulls in quite old components; in particular, the DNS component is outdated and not working, and it throws a lot of messages. I did not initially notice them, as the cluster was otherwise behaving quite well. So really, NFS was the first service that fully depended on a perfectly working DNS server. Two changes were needed (roughly the commands sketched after this list):

  • I changed gcr.io/google-containers/kubedns-amd64:1.2 to 1.3 via kubectl edit rc kube-dns-v13 --namespace kube-system (I needed to restart the whole cluster for this to take effect).
  • Second, I had to install nfs-common (sudo apt-get install nfs-common) on the host too. Immediately after that, the volumes got mounted and the busybox containers started. Amazingly, I initially had to purge all NFS-related setup on the host machine to prevent the kernel lockups described above.
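
The command sketch, roughly; exact resource names may differ on other local setups:

# bump the kube-dns image from kubedns-amd64:1.2 to :1.3, then restart the cluster
kubectl edit rc kube-dns-v13 --namespace kube-system
# install the NFS client utilities on the host
sudo apt-get install nfs-common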

so, with caveats, the example is working ...


jingxu97 commented Jan 5, 2017 via email

@pmblatino

@jingxu97 Thanks a lot for pointing those lines out. Just thought of letting people know that it doesn't work on GCE Kubernetes 1.4.6, only on 1.4.7, and when listing this NFS disk (df -h) in the pod it shows the size of the original disk and not the size set on the nfs-pvc.


ghost commented Jan 25, 2017

Just to note that I was able to use the NFS example on GKE node version 1.4.8, but only when using the container-vm node image. If I try to use the gci node image it doesn't work, giving this information in the event history of the pod:

  18m		5m		7	{kubelet gke-qamar-n1-standard2-55a0bb05-wcqw}			Warning		FailedMount	Unable to mount volumes for pod "frontoffice-rc-39.0-u7vp7_default(ebd46ce9-e253-11e6-b119-42010a84013c)": timeout expired waiting for volumes to attach/mount for pod "frontoffice-rc-39.0-u7vp7"/"default". list of unattached/unmounted volumes=[nfsvol]
  18m		5m		7	{kubelet gke-qamar-n1-standard2-55a0bb05-wcqw}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "frontoffice-rc-39.0-u7vp7"/"default". list of unattached/unmounted volumes=[nfsvol]
  20m		4m		15	{kubelet gke-qamar-n1-standard2-55a0bb05-wcqw}			Warning		FailedMount	MountVolume.SetUp failed for volume "kubernetes.io/nfs/ebd46ce9-e253-11e6-b119-42010a84013c-client-nfs-pv" (spec.Name: "client-nfs-pv") pod "ebd46ce9-e253-11e6-b119-42010a84013c" (UID: "ebd46ce9-e253-11e6-b119-42010a84013c") with: mount failed: exit status 32
Mounting command: /home/kubernetes/bin/mounter
Mounting arguments: 10.100.245.245:/exports /var/lib/kubelet/pods/ebd46ce9-e253-11e6-b119-42010a84013c/volumes/kubernetes.io~nfs/client-nfs-pv nfs []
Output: Running mount using a rkt fly container
run: group "rkt" not found, will use default gid when rendering images
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified


jingxu97 commented Jan 25, 2017 via email


avra911 commented Jan 25, 2017

@jingxu97, using path '/' gives a different error: Invalid specification: destination can't be '/', while using gcr.io/google_containers/volume-nfs:0.8.

For the client container, the error is still the same:

Running mount using a rkt fly container run: group "rkt" not found, 
will use default gid when rendering images mount.nfs: 
Failed to resolve server magento-nfs: Temporary failure in name resolution

@jingxu97

@avra911 The following are the changes I made to make the /examples/nfs test work (both edits are sketched below):

Edit examples/volumes/nfs/nfs-pv.yaml:
change the last line to path: "/"

Edit examples/volumes/nfs/nfs-server-rc.yaml:
change the image to the one that enables NFSv4:
image: gcr.io/google_containers/volume-nfs:0.8
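
A sketch of both edits; the surrounding fields are paraphrased, not copied verbatim from the example files:

# examples/volumes/nfs/nfs-pv.yaml (tail of the spec)
  nfs:
    server: 10.19.247.137   # the nfs-server Service IP, as in the original report above
    path: "/"

# examples/volumes/nfs/nfs-server-rc.yaml (container image)
    spec:
      containers:
      - name: nfs-server
        image: gcr.io/google_containers/volume-nfs:0.8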

Your error message contains Temporary failure in name resolution. Are you using an IP in the volume spec for the client pod? You could also share your YAML file with us so we can take a look.
Please let me know if you have any problems with these changes. Thanks!


avra911 commented Jan 27, 2017

Hello,

I will try again today with exactly the files from the example. If it works, I will double-check my files and come back here for more help.

@jingxu97, quick question: does it matter how the cluster is created? I am starting with gcloud container clusters create my-project --zone europe-west1-d --disk-size=40 --num-nodes=1 --machine-type n1-highcpu-2. I haven't tried on a cluster created with kube-up.sh; maybe rkt support is missing on the host.

EDITED: The files from the example work out of the box, without any modifications to the image or path. Tested on 1.5.2 (server and client) on a cluster created with cluster/kube-up.sh and using the rkt runtime.

Thank you!


wstrange commented Feb 7, 2017

Also having issues. I have made the suggested edits above, but the busybox pod cannot mount the PVC:

8b6f997-ed66-11e6-88e4-08002702efd7-nfs" (spec.Name: "nfs") pod "38b6f997-ed66-11e6-88e4-08002702efd7" (UID: "38b6f997-ed66-11e6-88e4-08002702efd7") with: mount failed: exit status 32
Mounting command: mount
Mounting arguments: 10.0.0.212:/ /var/lib/kubelet/pods/38b6f997-ed66-11e6-88e4-08002702efd7/volumes/kubernetes.io~nfs/nfs nfs []
Output: mount.nfs: an incorrect mount option was specified

This is on minikube 0.16, kube 1.5.2

Also - is there a tracking bug for

@jingxu97

@avra911 sorry that I missed your message. It does not matter how the cluster is created, I think.

@jingxu97

All users: NFSv3 is now also supported on GKE. Please give it a try and let us know if there is any problem. Thanks!


ahmetb commented Jun 27, 2017

We moved the examples to their own repo (https://github.com/kubernetes/examples) for further maintenance. However, we encourage such popular examples to host their own repo for maintenance, and/or be converted into Helm charts.

It also looks like this issue is now fixed. @jingxu97, should we close it now?

@kwiesmueller

@jingxu97 We just migrated to the new COS due to the deprecation of container-vm, and I can confirm that NFS is still not working for us. We keep getting the Output: mount.nfs: Connection timed out error on an up-to-date cluster.


jingxu97 commented Jul 5, 2017

@kwiesmueller Could you please provide more details about your NFS setup so that we can help figure out what the problem is? Thanks!

@kwiesmueller

@jingxu97 Sure thing!
The cluster is a default GCE project on 1.6.4 using COS.
The NFS server is this:

...
containers:
      - name: nfs-server
        image: gcr.io/google-samples/nfs-server:1.1
        imagePullPolicy: IfNotPresent
        ports:
        - name: nfs
          containerPort: 2049
        - name: mountd
          containerPort: 20048
        - name: rpcbind
          containerPort: 111
        securityContext:
          privileged: true
...

The Volume for the Server is a Google PD.

The Clients are using the Server like this:

volumes:
      - name: file-store
        nfs:
          server: 10.55.254.247
          path: '/exports'

Oh and the NFS Server IP is fixed in the Service:

spec:
  type: ClusterIP
  clusterIP: 10.55.254.247
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111

Oh, and there are no errors on the server side, and I cannot find any in the Google logs either...
But the Pods return this:

Warning		FailedMount	MountVolume.SetUp failed for volume "kubernetes.io/nfs/ef6d92c5-619f-11e7-a897-42010a84016a-file-store" (spec.Name: "file-store") pod "ef6d92c5-619f-11e7-a897-42010a84016a" (UID: "ef6d92c5-619f-11e7-a897-42010a84016a") with: mount failed: exit status 1
Mounting command: /home/kubernetes/containerized_mounter/mounter
Mounting arguments: 10.55.254.247:/exports /var/lib/kubelet/pods/ef6d92c5-619f-11e7-a897-42010a84016a/volumes/kubernetes.io~nfs/file-store nfs []
Output: Mount failed: Mount failed: exit status 32
Mounting command: chroot
Mounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs 10.55.254.247:/exports /var/lib/kubelet/pods/ef6d92c5-619f-11e7-a897-42010a84016a/volumes/kubernetes.io~nfs/file-store]
Output: mount.nfs: Connection timed out


jingxu97 commented Jul 5, 2017

Could you try using gcr.io/google_containers/volume-nfs:0.8 as the NFS server container image, and also use path: '/' in the client volumes? This makes sure NFSv4 is used.
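
In other words, for your setup the client volume would look roughly like this; the name and IP are taken from your manifests above:

volumes:
- name: file-store
  nfs:
    server: 10.55.254.247   # the fixed ClusterIP of the nfs-server Service
    path: "/"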

@kwiesmueller

Will try it first thing tomorrow, thanks!

@kwiesmueller

Getting this on the NFS Pod:

2017-07-06T12:37:45.365208494Z Serving /exports
2017-07-06T12:37:45.365990557Z Serving /
2017-07-06T12:37:47.626999272Z Starting rpcbind
2017-07-06T12:37:47.637040428Z /usr/local/bin/run_nfs.sh: line 18:     9 Killed                  /usr/sbin/rpcinfo 127.0.0.1 > /dev/null
2017-07-06T12:37:48.235323386Z exportfs: / does not support NFS export
2017-07-06T12:37:52.52793174Z NFS started

The client pod now only gives a timeout / NFS error:

Events:
  FirstSeen	LastSeen	Count	From							SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----							-------------	--------	------		-------
  2m		2m		1	default-scheduler						Normal		Scheduled	Successfully assigned app-1675861351-xv4qc to gke-cluster-1-default-pool-8e381c1c-4l61
  21s		21s		1	kubelet, gke-cluster-1-default-pool-8e381c1c-4l61		Warning		FailedMount	Unable to mount volumes for pod "app-1675861351-xv4qc_gartentechnik-com-test(f7006f38-6247-11e7-a897-42010a84016a)": timeout expired waiting for volumes to attach/mount for pod "gartentechnik-com-test"/"app-1675861351-xv4qc". list of unattached/unmounted volumes=[file-store]
  21s		21s		1	kubelet, gke-cluster-1-default-pool-8e381c1c-4l61		Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "gartentechnik-com-test"/"app-1675861351-xv4qc". list of unattached/unmounted volumes=[file-store]

@kwiesmueller

NFS Deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fileserver
  namespace: {{ "NAMESPACE" | env }}
  labels:
    app: fileserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fileserver-worker
  template:
    metadata:
      labels:
        app: fileserver-worker
    spec:
      containers:
      - name: fileserver-server
        image: 'gcr.io/google_containers/volume-nfs:0.8'
        imagePullPolicy: IfNotPresent
        ports:
        - name: nfs
          containerPort: 2049
        - name: mountd
          containerPort: 20048
        - name: rpcbind
          containerPort: 111
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: 200m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 10Mi
        volumeMounts:
          - mountPath: /exports
            name: files-store
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 2049
          timeoutSeconds: 2
        readinessProbe:
          failureThreshold: 1
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 2049
          timeoutSeconds: 2
      volumes:
      - name: files-store
        gcePersistentDisk:
          fsType: "ext4"
          pdName: "{{ "NFS_PD_NAME" | env }}"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-preemptible
                operator: DoesNotExist

Client:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: app
  namespace: {{ "NAMESPACE" | env }}
  labels:
    app: app
spec:
  replicas: 2
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app: app-worker
  template:
    metadata:
      labels:
        app: app-worker
    spec:
      containers:
      - name: app-apache
        image: 'eu.gcr.io/project/app:{{ "VERSION" | env }}'
        imagePullPolicy: IfNotPresent
        # command: ["tail", "-f", "/var/log/dpkg.log"]
        ports:
        - name: gt
          containerPort: 8080
        - name: api
          containerPort: 8085
        resources:
          limits:
            # cpu: 4
            memory: 2Gi
          requests:
            cpu: 0.1
            memory: 0.5Gi
        volumeMounts:
          - mountPath: /mnt
            name: file-store
      volumes:
      - name: file-store
        nfs:
          server: {{ "NFS_SERVER_IP" | env }}
          path: /
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-preemptible
                operator: DoesNotExist


avra911 commented Jul 6, 2017 via email

@kwiesmueller

@avra911 No, I cannot mount / on my PD, obviously... I was just following @jingxu97's advice to switch the client path.

@kwiesmueller

Never mind... it's running now.
NFSv4 works and the new container is good. @jingxu97, where could one find the Dockerfile for this one?
And the last issue, for which I posted the manifests above, was that NFS did not have enough resources and then killed itself.
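
For anyone hitting the same thing: the server Deployment above only allowed 200m CPU / 100Mi memory, and giving the NFS server container more headroom along these lines avoided that (the exact values are illustrative, not a recommendation):

        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 128Mi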

erik777 commented Jul 6, 2017

Thanks @jingxu97. I got it working with your client-side method instead of the PVC from the example, which never matched the PV.

Do you know if there is a way to provide input on k8s direction to help with prioritization? I'd like to see k8s provide a purely dynamic solution, including elastic scaling, with no single points of failure, for a horizontally clustered database. This NFS solution, with a static IP in the client YAML pointing to a single point of failure (the node running nfs-server), appears to be a work-around until that can be achieved.

Setting the NFS patch solution aside, I love k8s and am very hopeful its direction will address these use cases.



jingxu97 commented Jul 7, 2017

@erik777 You mentioned the PVC never matches the PV. Do you know the reason for it?

I am not sure I understand your second question. Could you please give me more details? Thanks!


erik777 commented Jul 7, 2017

@jingxu97 I could not figure out the reason. It does not produce an error. Is there a way to diagnose the matching logic of PVCs on GKE?

I think I found the answer to my second question, how to contribute to direction: https://github.com/kubernetes/community

Heck, I even found Priority column in this spreadsheet. lol


msau42 commented Aug 5, 2017

The NFS example should now be updated to work on the latest version of K8s. Can this be closed?


jingxu97 commented Aug 5, 2017

Yes, it is fixed by kubernetes/examples#30. Closing this issue.

@jingxu97 jingxu97 closed this as completed Aug 5, 2017