Can't grab NFS lock in LXZone #144

Closed
Smithx10 opened this issue May 17, 2017 · 17 comments

@Smithx10

Smithx10 commented May 17, 2017

While trying to run sonatype/nexus3 (http://www.sonatype.org/nexus/) in an lx-branded zone (image 23ee2dbc-c155-11e6-ab6d-bf5689f582fd, centos-7, 12/13/2016) with an NFS mount as its data directory, I got the following error. If I mount with -o nolock, the application works fine.

[root@8461974a-f64c-e153-f288-d2dba74d5ba8 nexus]# bin/nexus run
WARNING: ************************************************************
WARNING: Detected execution as "root" user.  This is NOT recommended!
WARNING: ************************************************************
Unable to update instance pid: No locks available
ERROR: Error creating bundle cache.
java.lang.Exception: Unable to lock bundle cache: java.io.IOException: No locks available
        at org.apache.felix.framework.cache.BundleCache.<init>(BundleCache.java:176)
        at org.apache.felix.framework.Felix.init(Felix.java:691)
        at org.apache.felix.framework.Felix.init(Felix.java:625)
        at org.apache.karaf.main.Main.launch(Main.java:296)
        at org.sonatype.nexus.karaf.NexusMain.launch(NexusMain.java:106)
        at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
        at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)
Error creating bundle cache.
java.lang.NullPointerException
        at org.apache.karaf.main.Main.destroy(Main.java:626)
        at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
        at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)

Instructions to Reproduce:

Client:

yum install nfs-utils
sudo yum install java-1.8.0-openjdk.x86_64
sudo mkdir /app && cd /app
sudo wget https://sonatype-download.global.ssl.fastly.net/nexus/3/nexus-3.3.1-01-unix.tar.gz
tar -xvf nexus-3.3.1-01-unix.tar.gz
mv nexus-3.3.1-01 nexus
cd nexus/
mount nfs.tritonhost.com:/zones/nfs/nexus /app/sonatype-work
./bin/nexus run

SmartOS Server:
zfs set sharenfs=rw=@10.1.102.0/24 zones/nfs/nexus

DTrace of the failure:

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# dtrace -Fqn '::lx_fcntl64:entry /arg1 == 6/ { self->t=1; } ::nfs*:entry,::fop_*:entry,::*_frlock:entry,::lm_get_sysid:entry,::nlm_knc_to_netid:entry,::nlm_host_findcreate:entry,::nlm*:entry /self->t/ { printf("\n"); } ::nfs*:return,::fop_*:return,::*_frlock:return,::lm_get_sysid:return,::nlm_knc_to_netid:return,::nlm_host_findcreate:return,::nlm*:return /self->t/ { printf("%d (0x%p)\n", arg1, arg1); } ::lx_fcntl64:return /self->t/ { self->t=0; }'
CPU FUNCTION
  2  -> fop_frlock
  2   | fop_frlock:entry
  2    -> zfs_frlock
  2      -> fs_frlock
  2      <- fs_frlock                         0 (0x0)
  2    <- zfs_frlock                          0 (0x0)
  2   | fop_frlock:return                     0 (0x0)
  2  <- fop_frlock                            0 (0x0)
  2  -> fop_frlock
  2   | fop_frlock:entry
  2    -> zfs_frlock
  2      -> fs_frlock
  2      <- fs_frlock                         0 (0x0)
  2    <- zfs_frlock                          0 (0x0)
  2   | fop_frlock:return                     0 (0x0)
  2  <- fop_frlock                            0 (0x0)
  2  -> fop_frlock
  2   | fop_frlock:entry
  2    -> nfs4_frlock
  2     | nfs4_frlock:entry
  2      -> nfs_zone
  2      <- nfs_zone                          -52562137079744 (0xffffd031ec7da040)
  2      -> nfs_rw_enter_sig
  2      <- nfs_rw_enter_sig                  0 (0x0)
  2      -> nfs4_safelock
  2      <- nfs4_safelock                     1 (0x1)
  2      -> nfs4_putpage
  2        -> nfs_zone
  2        <- nfs_zone                        -52562137079744 (0xffffd031ec7da040)
  2        -> nfs4_putpages
  2          -> nfs4_has_pages
  2          <- nfs4_has_pages                0 (0x0)
  2        <- nfs4_putpages                   0 (0x0)
  2      <- nfs4_putpage                      0 (0x0)
  2      -> nfs4frlock
  2        -> nfs4_error_zinit
  2        <- nfs4_error_zinit                0 (0x0)
  2        -> nfs4frlock_validate_args
  2        <- nfs4frlock_validate_args        0 (0x0)
  2        -> nfs4frlock_get_sysid
  2          -> nfs4_find_sysid
  2            -> lm_get_sysid
  2              -> nlm_knc_to_netid
  2               | nlm_knc_to_netid:entry
  2               | nlm_knc_to_netid:return   -131999496 (0xfffffffff821d8f8)
  2              <- nlm_knc_to_netid          -131999496 (0xfffffffff821d8f8)
  2              -> nlm_host_findcreate
  2               | nlm_host_findcreate:entry
  2               | nlm_host_findcreate:return 0 (0x0)
  2              <- nlm_host_findcreate       0 (0x0)
  2            <- lm_get_sysid                0 (0x0)
  2          <- nfs4_find_sysid               0 (0x0)
  2        <- nfs4frlock_get_sysid            46 (0x2e)
  2      <- nfs4frlock                        46 (0x2e)
  2      -> nfs_rw_exit
  2      <- nfs_rw_exit                       0 (0x0)
  2     | nfs4_frlock:return                  46 (0x2e)
  2    <- nfs4_frlock                         46 (0x2e)
  2   | fop_frlock:return                     46 (0x2e)
  2  <- fop_frlock                            46 (0x2e)
  3  -> fop_frlock
  3   | fop_frlock:entry
  3    -> zfs_frlock
  3      -> fs_frlock
  3      <- fs_frlock                         0 (0x0)
  3    <- zfs_frlock                          0 (0x0)
  3   | fop_frlock:return                     0 (0x0)
  3  <- fop_frlock                            0 (0x0)
  3  -> fop_frlock
  3   | fop_frlock:entry
  3    -> zfs_frlock
  3      -> fs_frlock
  3      <- fs_frlock                         0 (0x0)
  3    <- zfs_frlock                          0 (0x0)
  3   | fop_frlock:return                     0 (0x0)
  3  <- fop_frlock                            0 (0x0)
@jjelinek

This should be fixed in commit:
commit 6e02122
Author: Jerry Jelinek jerry.jelinek@joyent.com
Date: Fri Mar 24 17:56:42 2017 +0000

OS-5873 Need NFS client lockd support: fcntl F_SETLK returns ENOLCK in LX zone

Can you verify on a newer platform build?
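A quicker check than starting Nexus might be to attempt the same kind of lock directly. The following is a minimal sketch, assuming Python 3 is available in the zone; the function name `try_lock` is hypothetical and not part of Nexus or SmartOS. It relies on `fcntl.lockf`, which wraps the `fcntl()` record-locking calls (a non-blocking request maps to `F_SETLK`).

```python
import errno
import fcntl
import os


def try_lock(path):
    """Attempt a non-blocking exclusive POSIX record lock on `path`.

    Returns "locked" on success, "ENOLCK" when the kernel reports that no
    locks are available, or the symbolic errno name for any other failure.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # With LOCK_NB set, lockf() issues a non-blocking fcntl() lock request.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "locked"
    except OSError as exc:
        if exc.errno == errno.ENOLCK:
            return "ENOLCK"
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
```

Calling `try_lock("/app/sonatype-work/locktest")` (an illustrative file name on the NFS mount) would be expected to return "ENOLCK" while the problem is present and "locked" once locking works.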

@Smithx10

joyent_20170413T062134Z

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# clear
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# vmadm list
UUID                                  TYPE  RAM      STATE             ALIAS
828cf0cd-0946-6873-d28c-92d7cb49247e  LX    128      running           elk_consul_1
8e63c0b8-1e83-49b4-80c9-cfd574870488  OS    256      running           storage.phl.tritonhost.com-8e63c0b8
cb26736d-2ddd-4b6e-9d11-54004f1977a3  OS    256      running           marlin.phl.tritonhost.com-cb26736d
d4809633-9862-415f-b92f-203cec9f0172  OS    256      running           marlin.phl.tritonhost.com-d4809633
949324bd-1d1b-c456-972b-aaebe4ab1023  LX    1024     running           test
b77419de-16d5-caeb-9290-d769e8ce253e  LX    1024     running           lonely_wilson
8461974a-f64c-e153-f288-d2dba74d5ba8  LX    4096     running           c7-nfs-test
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# zlogin 8461974a-f64c-e153-f288-d2dba74d5ba8
[Connected to zone '8461974a-f64c-e153-f288-d2dba74d5ba8' pts/7]
Last login: Wed May 17 18:46:17 from zone:global
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 ~]# cd /app/nexus/
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 nexus]# bin/nexus run
WARNING: ************************************************************
WARNING: Detected execution as "root" user.  This is NOT recommended!
WARNING: ************************************************************
Unable to update instance pid: No locks available
ERROR: Error creating bundle cache.
java.lang.Exception: Unable to lock bundle cache: java.io.IOException: No locks available
        at org.apache.felix.framework.cache.BundleCache.<init>(BundleCache.java:176)
        at org.apache.felix.framework.Felix.init(Felix.java:691)
        at org.apache.felix.framework.Felix.init(Felix.java:625)
        at org.apache.karaf.main.Main.launch(Main.java:296)
        at org.sonatype.nexus.karaf.NexusMain.launch(NexusMain.java:106)
        at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
        at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)
Error creating bundle cache.
java.lang.NullPointerException
        at org.apache.karaf.main.Main.destroy(Main.java:626)
        at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
        at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 nexus]# exit
logout

[Connection to zone '8461974a-f64c-e153-f288-d2dba74d5ba8' pts/7 closed]
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# uname -a
SunOS 40-8d-5c-b3-b4-aa 5.11 joyent_20170413T062134Z i86pc i386 i86pc

@jjelinek

It will take me a little while to set up to reproduce and debug this, but maybe you could provide the same DTrace output you captured earlier, but on the newer platform? From that output it is easy to see that we're failing when nlm_host_findcreate returns NULL because the kernel's network lock code was never properly initialized. That should be fixed in the commit I mentioned above, so it would be good to see where we're failing now. If it's not obvious, I'll work on getting something set up to try to reproduce this.
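For reference when reading the trace above: the value 46 (0x2e) returned up the fop_frlock chain is the illumos errno for ENOLCK, which the lx zone reports to the Linux process as "No locks available". The sketch below only decodes that one value; the constant and function names are hypothetical, and the errno numbers are assumptions taken from each platform's errno.h.

```python
# Errno numbering differs between the illumos kernel and the Linux ABI that
# the lx brand presents inside the zone. Only ENOLCK matters for this trace.
ILLUMOS_ENOLCK = 46  # assumption: ENOLCK value in illumos <sys/errno.h>
LINUX_ENOLCK = 37    # assumption: ENOLCK value in Linux <asm-generic/errno.h>


def explain_frlock_return(value):
    """Describe a return value printed by the DTrace script for *_frlock."""
    if value == 0:
        return "success"
    if value == ILLUMOS_ENOLCK:
        return "ENOLCK (illumos %d, Linux errno %d inside the zone)" % (
            ILLUMOS_ENOLCK, LINUX_ENOLCK)
    return "errno %d (not decoded in this sketch)" % value
```

Note that the 0 returned by nlm_host_findcreate in the trace is a NULL pointer rather than an errno, so this helper applies only to the *_frlock return lines.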

@Smithx10

I ran

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# dtrace -Fqn '::lx_fcntl64:entry /arg1 == 6/ { self->t=1; } ::nfs*:entry,::fop_*:entry,::*_frlock:entry,::lm_get_sysid:entry,::nlm_knc_to_netid:entry,::nlm_host_findcreate:entry,::nlm*:entry /self->t/ { printf("\n"); } ::nfs*:return,::fop_*:return,::*_frlock:return,::lm_get_sysid:return,::nlm_knc_to_netid:return,::nlm_host_findcreate:return,::nlm*:return /self->t/ { printf("%d (0x%p)\n", arg1, arg1); } ::lx_fcntl64:return /self->t/ { self->t=0; }'

in the GZ of this host:

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# uname -a
SunOS 40-8d-5c-b3-b4-aa 5.11 joyent_20170413T062134Z i86pc i386 i86pc

@Smithx10

Did that commit land in joyent_20170413T062134Z? I can try updating the platform to see what happens.

@Smithx10

@jjelinek

[root@40-8d-5c-b3-b4-aa (phl-1) ~]# uname -a
SunOS 40-8d-5c-b3-b4-aa 5.11 joyent_20170516T123047Z i86pc i386 i86pc
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# vmadm list
UUID                                  TYPE  RAM      STATE             ALIAS
828cf0cd-0946-6873-d28c-92d7cb49247e  LX    128      running           elk_consul_1
8e63c0b8-1e83-49b4-80c9-cfd574870488  OS    256      running           storage.phl.tritonhost.com-8e63c0b8
cb26736d-2ddd-4b6e-9d11-54004f1977a3  OS    256      running           marlin.phl.tritonhost.com-cb26736d
d4809633-9862-415f-b92f-203cec9f0172  OS    256      running           marlin.phl.tritonhost.com-d4809633
949324bd-1d1b-c456-972b-aaebe4ab1023  LX    1024     stopped           test
b77419de-16d5-caeb-9290-d769e8ce253e  LX    1024     stopped           lonely_wilson
8461974a-f64c-e153-f288-d2dba74d5ba8  LX    4096     running           c7-nfs-test
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# zlogin 8461974a-f64c-e153-f288-d2dba74d5ba8
[Connected to zone '8461974a-f64c-e153-f288-d2dba74d5ba8' pts/2]
Last login: Wed May 17 23:54:10 from zone:global
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 ~]# mount
/dev/zfs on / type zfs (rw)
devtmpfs on /dev type devtmpfs (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
tmpfs on /dev/shm type tmpfs (rw)
tmpfs on /run type tmpfs (rw)
tmpfs on /sys/fs/cgroup type tmpfs (rw)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw)
tmpfs on /run/user/0 type tmpfs (rw)
10.1.102.250:/zones/nfs/nexus on /app/sonatype-work type nfs (rw)
/native/usr on /native/usr type zfs (ro)
[root@8461974a-f64c-e153-f288-d2dba74d5ba8 ~]# /app/nexus/bin/nexus run
WARNING: ************************************************************
WARNING: Detected execution as "root" user.  This is NOT recommended!
WARNING: ************************************************************
Unable to update instance pid: No locks available
ERROR: Error creating bundle cache.
java.lang.Exception: Unable to lock bundle cache: java.io.IOException: No locks available
	at org.apache.felix.framework.cache.BundleCache.<init>(BundleCache.java:176)
	at org.apache.felix.framework.Felix.init(Felix.java:691)
	at org.apache.felix.framework.Felix.init(Felix.java:625)
	at org.apache.karaf.main.Main.launch(Main.java:296)
	at org.sonatype.nexus.karaf.NexusMain.launch(NexusMain.java:106)
	at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:49)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
	at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)
Error creating bundle cache.
java.lang.NullPointerException
	at org.apache.karaf.main.Main.destroy(Main.java:626)
	at org.sonatype.nexus.karaf.NexusMain.main(NexusMain.java:56)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:62)
	at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:63)

@Smithx10

Smithx10 commented May 18, 2017

@jjelinek, also, here is the output from a CentOS 7 KVM instance running the same mount version.

Out of curiosity, why doesn't lx report the same mount info as what I see in KVM?

[root@c7-nfs-test-vm ~]# mount | grep nfs.tr
nfs.tritonhost.com:/zones/nfs/nexus on /app/sonatype-work type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.1.102.81,local_lock=none,addr=10.1.102.250)
[root@c7-nfs-test-vm ~]# mount --version
mount from util-linux 2.23.2 (libmount 2.23.0: selinux, debug, assert)
[root@c7-nfs-test-vm ~]# /app/nexus/bin/nexus run
WARNING: ************************************************************
WARNING: Detected execution as "root" user.  This is NOT recommended!
WARNING: ************************************************************
2017-05-18 00:15:08,209+0000 WARN  [FelixStartLevel] *SYSTEM uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4JInitialiser - Your logging framework class org.ops4j.pax.logging.slf4j.Slf4jLogger is not known - if it needs access to the standard println methods on the console you will need to register it by calling registerLoggingSystemPackage
2017-05-18 00:15:08,213+0000 INFO  [FelixStartLevel] *SYSTEM uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4J - Package org.ops4j.pax.logging.slf4j registered; all classes within it or subpackages of it will be allowed to print to System.out and System.err
2017-05-18 00:15:08,220+0000 INFO  [FelixStartLevel] *SYSTEM uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4J - Replaced standard System.out and System.err PrintStreams with SLF4JPrintStreams
2017-05-18 00:15:08,224+0000 INFO  [FelixStartLevel] *SYSTEM uk.org.lidalia.sysoutslf4j.context.SysOutOverSLF4J - Redirected System.out and System.err to SLF4J for this context
2017-05-18 00:15:08,240+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder - Properties:
2017-05-18 00:15:08,245+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   application-host='0.0.0.0'
2017-05-18 00:15:08,247+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   application-port='8081'
2017-05-18 00:15:08,248+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   fabric.etc='/app/nexus/etc/fabric'
2017-05-18 00:15:08,248+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   jetty.etc='/app/nexus/etc/jetty'
2017-05-18 00:15:08,248+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   karaf.base='/app/nexus'
2017-05-18 00:15:08,249+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   karaf.data='/app/sonatype-work/nexus3'
2017-05-18 00:15:08,249+0000 INFO  [FelixStartLevel] *SYSTEM org.sonatype.nexus.bootstrap.ConfigurationBuilder -   karaf.etc='/app/nexus/etc/karaf'

@jjelinek

The fix for OS-5873 is in the 3/30/2017 platform release build, so you should have it in the 4/13/2017 platform that you are running. I'll have to set up this software and see if I can reproduce the problem you're hitting. As to your other question, lx is a bare-metal container, so it runs on the underlying SmartOS kernel, whereas under KVM you are running your CentOS 7 kernel.

@Smithx10

@jjelinek need anything else from me?

@jjelinek

I haven't had time to look into this yet, but it is on my list. The only thing you might add here, if you have time, is the output of the same DTrace run you did originally, but captured on a newer platform (joyent_20170413T062134Z is fine).

@Smithx10

Smithx10 commented May 31, 2017

@jjelinek,

I was able to run the following on joyent_20170519T195636Z.

[root@c4-54-44-64-35-80 (bs-1) ~]# uname -a
SunOS c4-54-44-64-35-80 5.11 joyent_20170519T195636Z i86pc i386 i86pc
 [root@c4-54-44-64-35-80 (bs-1) ~]# dtrace -Fqn '::lx_fcntl64:entry /arg1 == 6/ { self->t=1; } ::nfs*:entry,::fop_*:entry,::*_frlock:entry,::lm_get_sysid:entry,::nlm_knc_to_netid:entry,::nlm_host_findcreate:entry,::nlm*:entry /self->t/ { printf("\n"); } ::nfs*:return,::fop_*:return,::*_frlock:return,::lm_get_sysid:return,::nlm_knc_to_netid:return,::nlm_host_findcreate:return,::nlm*:return /self->t/ { printf("%d (0x%p)\n", arg1, arg1); } ::lx_fcntl64:return /self->t/ { self->t=0; }'
CPU FUNCTION
 15  -> fop_frlock
 15   | fop_frlock:entry
 15    -> zfs_frlock
 15      -> fs_frlock
 15      <- fs_frlock                         0 (0x0)
 15    <- zfs_frlock                          0 (0x0)
 15   | fop_frlock:return                     0 (0x0)
 15  <- fop_frlock                            0 (0x0)
 15  -> fop_frlock
 15   | fop_frlock:entry
 15    -> zfs_frlock
 15      -> fs_frlock
 15      <- fs_frlock                         0 (0x0)
 15    <- zfs_frlock                          0 (0x0)
 15   | fop_frlock:return                     0 (0x0)
 15  <- fop_frlock                            0 (0x0)
 24  -> fop_frlock
 24   | fop_frlock:entry
 24    -> nfs4_frlock
 24     | nfs4_frlock:entry
 24      -> nfs_zone
 24      <- nfs_zone                          -761054202240 (0xffffff4ecda27680)
 24      -> nfs_rw_enter_sig
 24      <- nfs_rw_enter_sig                  0 (0x0)
 24      -> nfs4_safelock
 24      <- nfs4_safelock                     1 (0x1)
 24      -> nfs4_putpage
 24        -> nfs_zone
 24        <- nfs_zone                        -761054202240 (0xffffff4ecda27680)
 24        -> nfs4_putpages
 24          -> nfs4_has_pages
 24          <- nfs4_has_pages                0 (0x0)
 24        <- nfs4_putpages                   0 (0x0)
 24      <- nfs4_putpage                      0 (0x0)
 24      -> nfs4frlock
 24        -> nfs4_error_zinit
 24        <- nfs4_error_zinit                0 (0x0)
 24        -> nfs4frlock_validate_args
 24        <- nfs4frlock_validate_args        0 (0x0)
 24        -> nfs4frlock_get_sysid
 24          -> nfs4_find_sysid
 24            -> lm_get_sysid
 24              -> nlm_knc_to_netid
 24               | nlm_knc_to_netid:entry
 24               | nlm_knc_to_netid:return   -131917576 (0xfffffffff82318f8)
 24              <- nlm_knc_to_netid          -131917576 (0xfffffffff82318f8)
 24              -> nlm_host_findcreate
 24               | nlm_host_findcreate:entry
 24               | nlm_host_findcreate:return 0 (0x0)
 24              <- nlm_host_findcreate       0 (0x0)
 24            <- lm_get_sysid                0 (0x0)
 24          <- nfs4_find_sysid               0 (0x0)
 24        <- nfs4frlock_get_sysid            46 (0x2e)
 24      <- nfs4frlock                        46 (0x2e)
 24      -> nfs_rw_exit
 24      <- nfs_rw_exit                       0 (0x0)
 24     | nfs4_frlock:return                  46 (0x2e)
 24    <- nfs4_frlock                         46 (0x2e)
 24   | fop_frlock:return                     46 (0x2e)
 24  <- fop_frlock                            46 (0x2e)
 25  -> fop_frlock
 25   | fop_frlock:entry
 25    -> zfs_frlock
 25      -> fs_frlock
 25      <- fs_frlock                         0 (0x0)
 25    <- zfs_frlock                          0 (0x0)
 25   | fop_frlock:return                     0 (0x0)
 25  <- fop_frlock                            0 (0x0)
 25  -> fop_frlock
 25   | fop_frlock:entry
 25    -> zfs_frlock
 25      -> fs_frlock
 25      <- fs_frlock                         0 (0x0)
 25    <- zfs_frlock                          0 (0x0)
 25   | fop_frlock:return                     0 (0x0)
 25  <- fop_frlock                            0 (0x0)

@jjelinek

I have been able to reproduce this locally and I filed OS-6155 to track it internally. This appears to be specific to using a CentOS 7 image in the zone, since I am able to perform the locking fine from within an Ubuntu zone. I'll investigate what is going on.

@jjelinek jjelinek self-assigned this May 31, 2017
@Smithx10

@jjelinek NICE!! Killin' it! JERRY! lol, seems crazy that different lx userlands acquire NFS locks differently?! Can't wait to see what the problem is!

@jjelinek

jjelinek commented Jun 1, 2017

The problem on CentOS 7 is that rpcbind and rpc.statd are not running. Due to some of the differences in our NFS lockd handling vs. Linux, these two services need to be running for NFS locking to work on lx. I will explore how we can handle this better, but in the meantime, the following should allow you to work around this.

  1. Check whether an rpcbind process is running. If it is not, start it:
    systemctl start rpcbind
  2. Confirm that rpcbind is running and check whether rpc.statd is registered:
    rpcinfo -p
  3. If you don't see a 'status' service registered, the easiest way I've found to enable it and make it persistent on CentOS 7 is to run:
    systemctl enable nfs-server
    systemctl restart nfs-server
  4. You should now see the 'status' service if you run 'rpcinfo -p'.

These two services need to be running before you perform the NFS mount. If you already did the mount, you could unmount the filesystem and remount it, or simply reboot. After a reboot, you should be able to run 'rpcinfo -p' and see that both the rpcbind service and the rpc.statd service are available. Once you have done the NFS mount, you should also see that the 'nlockmgr' is registered with rpcbind on several versions and protocols.
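The check described above can be scripted. The following is a minimal sketch, assuming Python 3 and the usual five-column `rpcinfo -p` layout (program, vers, proto, port, service), where rpcbind appears as 'portmapper' and rpc.statd as 'status'. All function names here are hypothetical helpers, not part of any existing tool.

```python
import subprocess

# Service names as they appear in `rpcinfo -p`: rpcbind and rpc.statd.
REQUIRED = ("portmapper", "status")


def registered_services(rpcinfo_output):
    """Return the set of service names listed in `rpcinfo -p` output."""
    services = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port service.
        # The header row is skipped because its first field is not numeric.
        if len(fields) == 5 and fields[0].isdigit():
            services.add(fields[4])
    return services


def missing_services(rpcinfo_output, required=REQUIRED):
    """Return the required services that are not registered, in order."""
    found = registered_services(rpcinfo_output)
    return [name for name in required if name not in found]


def check_live():
    """Run `rpcinfo -p` on this host and return any missing services."""
    out = subprocess.check_output(["rpcinfo", "-p"], universal_newlines=True)
    return missing_services(out)
```

An empty list from `check_live()` before mounting would suggest both services are registered; after the mount, `"nlockmgr" in registered_services(...)` could be used to check for the lock manager as well.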

@Smithx10

Smithx10 commented Jun 1, 2017

I'll test the NFS functionality in Docker with Alpine / Ubuntu / Debian to see if this rears its head there. I believe the Nexus image I was running is a CentOS image.

@Smithx10

Jerry,

Saw https://cr.joyent.us/#/c/2200/ and pulled the latest platform from the dev channel. Working 👍

@jjelinek

Thanks for the update, I closed this out as fixed with 3a5445f.

mgerdts pushed a commit to mgerdts/illumos-joyent that referenced this issue Mar 16, 2018
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>

If the pool/dataset command-line argument is specified with a trailing
slash, for example, "tank/", we should interpret it as the topmost
dataset (rather than the whole pool).

References: openzfs/zfs#3415

Closes TritonDataCenter#144