Can't grab NFS lock in LXZone #144

Smithx10 · 2017-05-17T19:24:21Z

While trying to run sonatype/nexus3 (http://www.sonatype.org/nexus/) in an lx-branded zone (image 23ee2dbc-c155-11e6-ab6d-bf5689f582fd, centos-7, 12/13/2016) with an NFS mount as its data directory, I got the following error. If I mount with -o nolock, the application works fine.

[root@8461974a-f64c-e153-f288-d2dba74d5ba8 nexus]# bin/nexus run
WARNING: ************************************************************
WARNING: Detected execution as "root" user.  This is NOT recommended!
WARNING: ************************************************************
Unable to update instance pid: No locks available
ERROR: Error creating bundle cache.
java.lang.Exception: Unable to lock bundle cache: java.io.IOException: No locks available
        at org.apache.felix.framework.cache.BundleCache.<init>(BundleCache.java:176)
        at org.apache.felix.framework.Felix.init(Felix.java:691)
        at org.apache.felix.framework.Felix.init(Felix.java:625)
        at org.apache.karaf.main.Main.launch(Main.java:296)
        at org.sonatype.nexus.karaf.NexusMain.launch(NexusMain.java:106)
        at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
        at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)
Error creating bundle cache.
java.lang.NullPointerException
        at org.apache.karaf.main.Main.destroy(Main.java:626)
        at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
        at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)

Instructions to Reproduce:

Client:

sudo yum install nfs-utils
sudo yum install java-1.8.0-openjdk.x86_64
sudo mkdir /app && cd /app
sudo wget https://sonatype-download.global.ssl.fastly.net/nexus/3/nexus-3.3.1-01-unix.tar.gz
sudo tar -xvf nexus-3.3.1-01-unix.tar.gz
sudo mv nexus-3.3.1-01 nexus
cd nexus/
sudo mount nfs.tritonhost.com:/zones/nfs/nexus /app/sonatype-work
sudo ./bin/nexus run

SmartOS Server:
zfs set sharenfs=rw=@10.1.102.0/24 zones/nfs/nexus
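
For a smaller reproducer than a full Nexus install, something like the following might do. It is only a sketch: it assumes Python is available in the zone, and the function name and probe file path are made up for illustration. It takes a non-blocking exclusive record lock with fcntl.lockf(), which should issue fcntl(F_SETLK), command 6 on Linux.

```python
import fcntl
import os


def try_lock(path):
    """Try a non-blocking exclusive fcntl() record lock on path.

    Returns None if the lock was taken and released, otherwise the errno
    value reported by the failed lock call.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # With LOCK_NB, lockf() uses F_SETLK rather than the blocking F_SETLKW.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return None
    except (IOError, OSError) as exc:
        return exc.errno
    finally:
        os.close(fd)
```

Pointed at a file on the NFS mount, for example try_lock("/app/sonatype-work/lock-probe") (a hypothetical file name), one would expect an affected zone to return the Linux ENOLCK value (37) instead of None, matching the "No locks available" message above.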

DTrace of the failure:

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# dtrace -Fqn '::lx_fcntl64:entry /arg1 == 6/ { self->t=1; } ::nfs*:entry,::fop_*:entry,::*_frlock:entry,::lm_get_sysid:entry,::nlm_knc_to_netid:entry,::nlm_host_findcreate:entry,::nlm*:entry /self->t/ { printf("\n"); } ::nfs*:return,::fop_*:return,::*_frlock:return,::lm_get_sysid:return,::nlm_knc_to_netid:return,::nlm_host_findcreate:return,::nlm*:return /self->t/ { printf("%d (0x%p)\n", arg1, arg1); } ::lx_fcntl64:return /self->t/ { self->t=0; }'
CPU FUNCTION
  2  -> fop_frlock
  2   | fop_frlock:entry
  2    -> zfs_frlock
  2      -> fs_frlock
  2      <- fs_frlock                         0 (0x0)
  2    <- zfs_frlock                          0 (0x0)
  2   | fop_frlock:return                     0 (0x0)
  2  <- fop_frlock                            0 (0x0)
  2  -> fop_frlock
  2   | fop_frlock:entry
  2    -> zfs_frlock
  2      -> fs_frlock
  2      <- fs_frlock                         0 (0x0)
  2    <- zfs_frlock                          0 (0x0)
  2   | fop_frlock:return                     0 (0x0)
  2  <- fop_frlock                            0 (0x0)
  2  -> fop_frlock
  2   | fop_frlock:entry
  2    -> nfs4_frlock
  2     | nfs4_frlock:entry
  2      -> nfs_zone
  2      <- nfs_zone                          -52562137079744 (0xffffd031ec7da040)
  2      -> nfs_rw_enter_sig
  2      <- nfs_rw_enter_sig                  0 (0x0)
  2      -> nfs4_safelock
  2      <- nfs4_safelock                     1 (0x1)
  2      -> nfs4_putpage
  2        -> nfs_zone
  2        <- nfs_zone                        -52562137079744 (0xffffd031ec7da040)
  2        -> nfs4_putpages
  2          -> nfs4_has_pages
  2          <- nfs4_has_pages                0 (0x0)
  2        <- nfs4_putpages                   0 (0x0)
  2      <- nfs4_putpage                      0 (0x0)
  2      -> nfs4frlock
  2        -> nfs4_error_zinit
  2        <- nfs4_error_zinit                0 (0x0)
  2        -> nfs4frlock_validate_args
  2        <- nfs4frlock_validate_args        0 (0x0)
  2        -> nfs4frlock_get_sysid
  2          -> nfs4_find_sysid
  2            -> lm_get_sysid
  2              -> nlm_knc_to_netid
  2               | nlm_knc_to_netid:entry
  2               | nlm_knc_to_netid:return   -131999496 (0xfffffffff821d8f8)
  2              <- nlm_knc_to_netid          -131999496 (0xfffffffff821d8f8)
  2              -> nlm_host_findcreate
  2               | nlm_host_findcreate:entry
  2               | nlm_host_findcreate:return 0 (0x0)
  2              <- nlm_host_findcreate       0 (0x0)
  2            <- lm_get_sysid                0 (0x0)
  2          <- nfs4_find_sysid               0 (0x0)
  2        <- nfs4frlock_get_sysid            46 (0x2e)
  2      <- nfs4frlock                        46 (0x2e)
  2      -> nfs_rw_exit
  2      <- nfs_rw_exit                       0 (0x0)
  2     | nfs4_frlock:return                  46 (0x2e)
  2    <- nfs4_frlock                         46 (0x2e)
  2   | fop_frlock:return                     46 (0x2e)
  2  <- fop_frlock                            46 (0x2e)
  3  -> fop_frlock
  3   | fop_frlock:entry
  3    -> zfs_frlock
  3      -> fs_frlock
  3      <- fs_frlock                         0 (0x0)
  3    <- zfs_frlock                          0 (0x0)
  3   | fop_frlock:return                     0 (0x0)
  3  <- fop_frlock                            0 (0x0)
  3  -> fop_frlock
  3   | fop_frlock:entry
  3    -> zfs_frlock
  3      -> fs_frlock
  3      <- fs_frlock                         0 (0x0)
  3    <- zfs_frlock                          0 (0x0)
  3   | fop_frlock:return                     0 (0x0)
  3  <- fop_frlock                            0 (0x0)


jjelinek · 2017-05-17T19:33:52Z

This should be fixed in commit:
commit 6e02122
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Date: Fri Mar 24 17:56:42 2017 +0000

OS-5873 Need NFS client lockd support: fcntl F_SETLK returns ENOLCK in LX zone

Can you verify on a newer platform build?

Smithx10 · 2017-05-17T19:42:02Z

joyent_20170413T062134Z

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# clear
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# vmadm list
UUID                                  TYPE  RAM      STATE             ALIAS
828cf0cd-0946-6873-d28c-92d7cb49247e  LX    128      running           elk_consul_1
8e63c0b8-1e83-49b4-80c9-cfd574870488  OS    256      running           storage.phl.tritonhost.com-8e63c0b8
cb26736d-2ddd-4b6e-9d11-54004f1977a3  OS    256      running           marlin.phl.tritonhost.com-cb26736d
d4809633-9862-415f-b92f-203cec9f0172  OS    256      running           marlin.phl.tritonhost.com-d4809633
949324bd-1d1b-c456-972b-aaebe4ab1023  LX    1024     running           test
b77419de-16d5-caeb-9290-d769e8ce253e  LX    1024     running           lonely_wilson
8461974a-f64c-e153-f288-d2dba74d5ba8  LX    4096     running           c7-nfs-test
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# zlogin 8461974a-f64c-e153-f288-d2dba74d5ba8
[Connected to zone '8461974a-f64c-e153-f288-d2dba74d5ba8' pts/7]
Last login: Wed May 17 18:46:17 from zone:global
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 ~]# cd /app/nexus/
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 nexus]# bin/nexus run
WARNING: ************************************************************
WARNING: Detected execution as "root" user.  This is NOT recommended!
WARNING: ************************************************************
Unable to update instance pid: No locks available
ERROR: Error creating bundle cache.
java.lang.Exception: Unable to lock bundle cache: java.io.IOException: No locks available
        at org.apache.felix.framework.cache.BundleCache.<init>(BundleCache.java:176)
        at org.apache.felix.framework.Felix.init(Felix.java:691)
        at org.apache.felix.framework.Felix.init(Felix.java:625)
        at org.apache.karaf.main.Main.launch(Main.java:296)
        at org.sonatype.nexus.karaf.NexusMain.launch(NexusMain.java:106)
        at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
        at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)
Error creating bundle cache.
java.lang.NullPointerException
        at org.apache.karaf.main.Main.destroy(Main.java:626)
        at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
        at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 nexus]# exit
logout

[Connection to zone '8461974a-f64c-e153-f288-d2dba74d5ba8' pts/7 closed]
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# uname -a
SunOS 40-8d-5c-b3-b4-aa 5.11 joyent_20170413T062134Z i86pc i386 i86pc

jjelinek · 2017-05-17T19:53:50Z

It will take me a little while to get set up to reproduce and debug this, but maybe you could provide the same DTrace output you captured earlier, but on the newer platform? From that output it is easy to see that we're failing when we return NULL from nlm_host_findcreate because the kernel's network lock code was never properly initialized. That should be fixed in the commit I mentioned above, so it would be good to see where we're failing now. If it's not obvious, I'll work on getting something set up to try to reproduce this.
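
To make that reading of the trace concrete, here is a rough sketch (not part of the original debugging) that scans `dtrace -F` output for the first function returning 46, which is ENOLCK on illumos, and reports the return that preceded it. The function and constant names are made up for illustration.

```python
import re

# Matches flow-indented return lines such as:
#   2      <- nfs4frlock_get_sysid            46 (0x2e)
RETURN_LINE = re.compile(r"<-\s+(\w+)\s+(-?\d+)\s+\(0x[0-9a-f]+\)")

ILLUMOS_ENOLCK = 46  # illumos errno value; Linux uses 37 for ENOLCK


def first_error_return(trace_text, err=ILLUMOS_ENOLCK):
    """Find the first `<-` line whose return value equals err.

    Returns (function_name, previous_return), where previous_return is the
    (name, value) pair of the return line just before it (or None if there
    was none). Returns None if no function returned err.
    """
    previous = None
    for line in trace_text.splitlines():
        match = RETURN_LINE.search(line)
        if not match:
            continue
        name, value = match.group(1), int(match.group(2))
        if value == err:
            return name, previous
        previous = (name, value)
    return None
```

Fed the trace from the first comment, this should report nfs4frlock_get_sysid as the first function to return 46, immediately after nfs4_find_sysid returned 0, that is, after the sysid lookup beneath it (lm_get_sysid, nlm_host_findcreate) came back empty.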

Smithx10 · 2017-05-17T19:58:00Z

I ran

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# dtrace -Fqn '::lx_fcntl64:entry /arg1 == 6/ { self->t=1; } ::nfs*:entry,::fop_*:entry,::*_frlock:entry,::lm_get_sysid:entry,::nlm_knc_to_netid:entry,::nlm_host_findcreate:entry,::nlm*:entry /self->t/ { printf("\n"); } ::nfs*:return,::fop_*:return,::*_frlock:return,::lm_get_sysid:return,::nlm_knc_to_netid:return,::nlm_host_findcreate:return,::nlm*:return /self->t/ { printf("%d (0x%p)\n", arg1, arg1); } ::lx_fcntl64:return /self->t/ { self->t=0; }'

in the GZ on this platform:

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# uname -a
SunOS 40-8d-5c-b3-b4-aa 5.11 joyent_20170413T062134Z i86pc i386 i86pc

Smithx10 · 2017-05-17T19:58:37Z

Did that commit land in joyent_20170413T062134Z? I can try updating the platform and see what happens.

Smithx10 · 2017-05-17T23:56:59Z

@jjelinek

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# uname -a
SunOS 40-8d-5c-b3-b4-aa 5.11 joyent_20170516T123047Z i86pc i386 i86pc
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# vmadm list
UUID                                  TYPE  RAM      STATE             ALIAS
828cf0cd-0946-6873-d28c-92d7cb49247e  LX    128      running           elk_consul_1
8e63c0b8-1e83-49b4-80c9-cfd574870488  OS    256      running           storage.phl.tritonhost.com-8e63c0b8
cb26736d-2ddd-4b6e-9d11-54004f1977a3  OS    256      running           marlin.phl.tritonhost.com-cb26736d
d4809633-9862-415f-b92f-203cec9f0172  OS    256      running           marlin.phl.tritonhost.com-d4809633
949324bd-1d1b-c456-972b-aaebe4ab1023  LX    1024     stopped           test
b77419de-16d5-caeb-9290-d769e8ce253e  LX    1024     stopped           lonely_wilson
8461974a-f64c-e153-f288-d2dba74d5ba8  LX    4096     running           c7-nfs-test
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# zlogin 8461974a-f64c-e153-f288-d2dba74d5ba8
[Connected to zone '8461974a-f64c-e153-f288-d2dba74d5ba8' pts/2]
Last login: Wed May 17 23:54:10 from zone:global
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 ~]# mount
/dev/zfs on / type zfs (rw)
devtmpfs on /dev type devtmpfs (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
tmpfs on /dev/shm type tmpfs (rw)
tmpfs on /run type tmpfs (rw)
tmpfs on /sys/fs/cgroup type tmpfs (rw)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw)
tmpfs on /run/user/0 type tmpfs (rw)
10.1.102.250:/zones/nfs/nexus on /app/sonatype-work type nfs (rw)
/native/usr on /native/usr type zfs (ro)
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 ~]# /app/nexus/bin/nexus run
WARNING: ************************************************************
WARNING: Detected execution as "root" user.  This is NOT recommended!
WARNING: ************************************************************
Unable to update instance pid: No locks available
ERROR: Error creating bundle cache.
java.lang.Exception: Unable to lock bundle cache: java.io.IOException: No locks available
	at org.apache.felix.framework.cache.BundleCache.<init>(BundleCache.java:176)
	at org.apache.felix.framework.Felix.init(Felix.java:691)
	at org.apache.felix.framework.Felix.init(Felix.java:625)
	at org.apache.karaf.main.Main.launch(Main.java:296)
	at org.sonatype.nexus.karaf.NexusMain.launch(NexusMain.java:106)
	at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:49)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
	at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)
Error creating bundle cache.
java.lang.NullPointerException
	at org.apache.karaf.main.Main.destroy(Main.java:626)
	at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:56)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
	at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)

Smithx10 · 2017-05-18T00:18:21Z

@jjelinek, also, here is the output from a CentOS 7 KVM instance running the same mount version.

Curious: why doesn't lx report the same mount info as KVM does?

type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.1.102.81,local_lock=none,addr=10.1.102.250)

 [root@c7-nfs-test-vm ~]# mount | grep nfs.tr
nfs.tritonhost.com:/zones/nfs/nexus on /app/sonatype-work type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.1.102.81,local_lock=none,addr=10.1.102.250)
[root@c7-nfs-test-vm ~]# mount --version
mount from util-linux 2.23.2 (libmount 2.23.0: selinux, debug, assert)
[root@c7-nfs-test-vm ~]# /app/nexus/bin/nexus run
WARNING: ************************************************************
WARNING: Detected execution as "root" user.  This is NOT recommended!
WARNING: ************************************************************
2017-05-18 00:15:08,209+0000 WARN  [FelixStartLevel] *SYSTEM uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4JInitialiser - Your logging framework class org.ops4j.pax.logging.slf4j.Slf4jLogger is not known - if it needs access to the standard println methods on the console you will need to register it by calling registerLoggingSystemPackage
2017-05-18 00:15:08,213+0000 INFO  [FelixStartLevel] *SYSTEM uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4J - Package org.ops4j.pax.logging.slf4j registered; all classes within it or subpackages of it will be allowed to print to System.out and System.err
2017-05-18 00:15:08,220+0000 INFO  [FelixStartLevel] *SYSTEM uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4J - Replaced standard System.out and System.err PrintStreams with SLF4JPrintStreams
2017-05-18 00:15:08,224+0000 INFO  [FelixStartLevel] *SYSTEM uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4J - Redirected System.out and System.err to SLF4J for this context
2017-05-18 00:15:08,240+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder - Properties:
2017-05-18 00:15:08,245+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   application-host='0.0.0.0'
2017-05-18 00:15:08,247+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   application-port='8081'
2017-05-18 00:15:08,248+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   fabric.etc='/app/nexus/etc/fabric'
2017-05-18 00:15:08,248+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   jetty.etc='/app/nexus/etc/jetty'
2017-05-18 00:15:08,248+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   karaf.base='/app/nexus'
2017-05-18 00:15:08,249+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   karaf.data='/app/sonatype-work/nexus3'
2017-05-18 00:15:08,249+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   karaf.etc='/app/nexus/etc/karaf'

jjelinek · 2017-05-18T12:08:52Z

The fix for OS-5873 is in the 3/30/2017 platform release build, so you should have it in the 4/13/2017 platform that you are running. I'll have to set up this software and see if I can reproduce the problem you're hitting. As to your other question, lx is a bare-metal container, so it is running on the underlying SmartOS kernel, whereas in kvm you are running your CentOS 7 kernel.

Smithx10 · 2017-05-30T17:11:02Z

@jjelinek need anything else from me?

jjelinek · 2017-05-31T12:02:09Z

I haven't had time to look into this yet, but it is on my list. The only thing that you might include here, if you have time, is the output of the same DTrace run you did originally, but while running on a newer platform (joyent_20170413T062134Z is fine).

Smithx10 · 2017-05-31T14:49:24Z

@jjelinek,

I was able to run the following on joyent_20170519T195636Z.

[root@c4-54-44-64-35-80 (bs-1) ~]# uname -a
SunOS c4-54-44-64-35-80 5.11 joyent_20170519T195636Z i86pc i386 i86pc
 [root@c4-54-44-64-35-80 (bs-1) ~]# dtrace -Fqn '::lx_fcntl64:entry /arg1 == 6/ { self->t=1; } ::nfs*:entry,::fop_*:entry,::*_frlock:entry,::lm_get_sysid:entry,::nlm_knc_to_netid:entry,::nlm_host_findcreate:entry,::nlm*:entry /self->t/ { printf("\n"); } ::nfs*:return,::fop_*:return,::*_frlock:return,::lm_get_sysid:return,::nlm_knc_to_netid:return,::nlm_host_findcreate:return,::nlm*:return /self->t/ { printf("%d (0x%p)\n", arg1, arg1); } ::lx_fcntl64:return /self->t/ { self->t=0; }'
CPU FUNCTION
 15  -> fop_frlock
 15   | fop_frlock:entry
 15    -> zfs_frlock
 15      -> fs_frlock
 15      <- fs_frlock                         0 (0x0)
 15    <- zfs_frlock                          0 (0x0)
 15   | fop_frlock:return                     0 (0x0)
 15  <- fop_frlock                            0 (0x0)
 15  -> fop_frlock
 15   | fop_frlock:entry
 15    -> zfs_frlock
 15      -> fs_frlock
 15      <- fs_frlock                         0 (0x0)
 15    <- zfs_frlock                          0 (0x0)
 15   | fop_frlock:return                     0 (0x0)
 15  <- fop_frlock                            0 (0x0)
 24  -> fop_frlock
 24   | fop_frlock:entry
 24    -> nfs4_frlock
 24     | nfs4_frlock:entry
 24      -> nfs_zone
 24      <- nfs_zone                          -761054202240 (0xffffff4ecda27680)
 24      -> nfs_rw_enter_sig
 24      <- nfs_rw_enter_sig                  0 (0x0)
 24      -> nfs4_safelock
 24      <- nfs4_safelock                     1 (0x1)
 24      -> nfs4_putpage
 24        -> nfs_zone
 24        <- nfs_zone                        -761054202240 (0xffffff4ecda27680)
 24        -> nfs4_putpages
 24          -> nfs4_has_pages
 24          <- nfs4_has_pages                0 (0x0)
 24        <- nfs4_putpages                   0 (0x0)
 24      <- nfs4_putpage                      0 (0x0)
 24      -> nfs4frlock
 24        -> nfs4_error_zinit
 24        <- nfs4_error_zinit                0 (0x0)
 24        -> nfs4frlock_validate_args
 24        <- nfs4frlock_validate_args        0 (0x0)
 24        -> nfs4frlock_get_sysid
 24          -> nfs4_find_sysid
 24            -> lm_get_sysid
 24              -> nlm_knc_to_netid
 24               | nlm_knc_to_netid:entry
 24               | nlm_knc_to_netid:return   -131917576 (0xfffffffff82318f8)
 24              <- nlm_knc_to_netid          -131917576 (0xfffffffff82318f8)
 24              -> nlm_host_findcreate
 24               | nlm_host_findcreate:entry
 24               | nlm_host_findcreate:return 0 (0x0)
 24              <- nlm_host_findcreate       0 (0x0)
 24            <- lm_get_sysid                0 (0x0)
 24          <- nfs4_find_sysid               0 (0x0)
 24        <- nfs4frlock_get_sysid            46 (0x2e)
 24      <- nfs4frlock                        46 (0x2e)
 24      -> nfs_rw_exit
 24      <- nfs_rw_exit                       0 (0x0)
 24     | nfs4_frlock:return                  46 (0x2e)
 24    <- nfs4_frlock                         46 (0x2e)
 24   | fop_frlock:return                     46 (0x2e)
 24  <- fop_frlock                            46 (0x2e)
 25  -> fop_frlock
 25   | fop_frlock:entry
 25    -> zfs_frlock
 25      -> fs_frlock
 25      <- fs_frlock                         0 (0x0)
 25    <- zfs_frlock                          0 (0x0)
 25   | fop_frlock:return                     0 (0x0)
 25  <- fop_frlock                            0 (0x0)
 25  -> fop_frlock
 25   | fop_frlock:entry
 25    -> zfs_frlock
 25      -> fs_frlock
 25      <- fs_frlock                         0 (0x0)
 25    <- zfs_frlock                          0 (0x0)
 25   | fop_frlock:return                     0 (0x0)
 25  <- fop_frlock                            0 (0x0)

jjelinek · 2017-05-31T21:58:42Z

I have been able to reproduce this locally and I filed OS-6155 to track it internally. This appears to be specific to using a CentOS 7 image in the zone, since I am able to perform the locking fine from within an Ubuntu zone. I'll investigate what is going on.

Smithx10 · 2017-05-31T22:21:36Z

@jjelinek NICE!! Killin' it! JERRY! lol, seems crazy that different lx userlands acquire NFS locks differently?! Can't wait to see what the problem is!

jjelinek · 2017-06-01T14:00:15Z

The problem on CentOS 7 is that rpcbind and rpc.statd are not running. Due to some of the differences in our NFS lockd handling vs. Linux, these two services need to be running for NFS locking to work on lx. I will explore how we can handle this better, but in the meantime, the following should allow you to work around this.

Check whether an rpcbind process is running. If it is not, start it:
systemctl start rpcbind
Confirm that rpcbind is running but that rpc.statd is not registered:
rpcinfo -p
Assuming that you don't see a 'status' service registered, the easiest way I've found to enable it and make it persistent on CentOS 7 is to run:
systemctl enable nfs-server
systemctl restart nfs-server
You should now see the 'status' service if you run 'rpcinfo -p'.

These two services need to be running before you perform the NFS mount. If you already did the mount, you could unmount the filesystem and remount it, or simply reboot. After a reboot, you should be able to run 'rpcinfo -p' and see that both the rpcbind service and the rpc.statd service are available. Once you have done the NFS mount, you should also see that the 'nlockmgr' is registered with rpcbind on several versions and protocols.
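
The checks above could be scripted roughly like this. A sketch only: it assumes the usual five-column `rpcinfo -p` layout (program, vers, proto, port, service), and the function names are invented for illustration.

```python
import subprocess

# Services expected once the NFS mount is in place; before the mount only
# "portmapper" (rpcbind) and "status" (rpc.statd) are expected.
REQUIRED = ("portmapper", "status", "nlockmgr")


def missing_services(rpcinfo_output, required=REQUIRED):
    """Return the required RPC service names absent from `rpcinfo -p` output."""
    seen = set()
    for line in rpcinfo_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5:
            seen.add(fields[4])  # fifth column is the service name
    return [name for name in required if name not in seen]


def check_local_rpc(required=REQUIRED):
    """Run `rpcinfo -p` on this host and list the required services it lacks."""
    out = subprocess.check_output(["rpcinfo", "-p"]).decode()
    return missing_services(out, required)
```

Before the mount, check_local_rpc(("portmapper", "status")) should come back empty; after the mount, the default list, which also includes nlockmgr, should too.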

Smithx10 · 2017-06-01T14:38:14Z

I'll test the NFS functionality in docker with Alpine / Ubuntu / Debian to see if this rears its head there. I believe the Nexus image I was running is a CentOS image.

Smithx10 · 2017-07-12T02:18:00Z

Jerry,

Saw https://cr.joyent.us/#/c/2200/ and pulled the latest platform from the dev channel. Working 👍

jjelinek · 2017-07-14T21:33:47Z

Thanks for the update, I closed this out as fixed with 3a5445f.


jjelinek self-assigned this May 31, 2017

jjelinek closed this as completed Jul 14, 2017
