Can't grab NFS lock in LXZone #144
This should be fixed in commit:
Can you verify on a newer platform build?
joyent_20170413T062134Z
It will take me a little while to set up to reproduce and debug this, but maybe you could provide the same DTrace you performed earlier, but on the newer platform? From that output it is easy to see that we're failing when we return NULL from nlm_host_findcreate because the kernel's network lock code was never properly initialized. That should be fixed in the commit I mentioned above, so it would be good to see where we're failing now. If it's not obvious, I'll work on getting something set up to try to reproduce this.
I ran:
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# dtrace -Fqn '::lx_fcntl64:entry /arg1 == 6/ { self->t=1; } ::nfs*:entry,::fop_*:entry,::*frlock:entry,::lm_get_sysid:entry,::nlm_knc_to_netid:entry,::nlm_host_findcreate:entry,::nlm*:entry /self->t/ { printf("\n"); } ::nfs*:return,::fop*:return,::*_frlock:return,::lm_get_sysid:return,::nlm_knc_to_netid:return,::nlm_host_findcreate:return,::nlm*:return /self->t/ { printf("%d (0x%p)\n", arg1, arg1); } ::lx_fcntl64:return /self->t/ { self->t=0; }'
in the GZ of the following system:
[root@40-8d-5c-b3-b4-aa (phl-1) ~]# uname -a
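To drive exactly the code path that this DTrace script watches, a small program inside the zone can issue a single F_SETLK request against a file on the NFS mount. On 64-bit Linux, F_SETLK is fcntl command 6, so the `/arg1 == 6/` predicate should select it, assuming arg1 of lx_fcntl64 is the fcntl command argument. This is a sketch in Python, assuming a 64-bit Linux userland such as the CentOS 7 zone; `try_setlk` and `LINUX_F_SETLK` are hypothetical names, not part of any tool in this thread:

```python
import fcntl
import os
import struct

# Assumption: 64-bit Linux, where F_GETLK=5, F_SETLK=6, F_SETLKW=7.
# The DTrace predicate /arg1 == 6/ on lx_fcntl64:entry should therefore fire
# for F_SETLK requests, if arg1 is the fcntl command.
LINUX_F_SETLK = 6


def try_setlk(path: str) -> bool:
    """Request a non-blocking, whole-file write lock on `path` with F_SETLK.

    Returns True if the lock was granted and False if another process holds a
    conflicting lock. Any other failure (for example ENOLCK) is re-raised as
    OSError so the errno can be inspected.
    """
    # struct flock as laid out on 64-bit Linux (assumed):
    #   short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid;
    # "hhqqi" in native alignment plus 4 bytes of tail padding = 32 bytes.
    flock = struct.pack("hhqqi4x", fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # l_start == 0 and l_len == 0 mean "from the start to end of file".
        fcntl.fcntl(fd, fcntl.F_SETLK, flock)
        return True
    except (BlockingIOError, PermissionError):
        # EAGAIN / EACCES: a conflicting lock is held by another process.
        return False
    finally:
        # Closing the descriptor releases this process's POSIX locks on the file.
        os.close(fd)
```

Run it inside the zone against a file under the NFS-mounted data directory while the DTrace script runs in the GZ; the trace then covers a single, known lock request rather than whatever the application happens to do.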
Did that commit land in joyent_20170413T062134Z? I can try updating the platform and see what happens.
@jjelinek, also, here is the output from a CentOS 7 KVM instance running the same mount version. Out of curiosity, why doesn't lx report the same information that shows up in KVM?
The fix for OS-5873 is in the 3/30/2017 platform release build, so you should have it in the 4/13/2017 platform that you are running. I'll have to set up this software and see if I can reproduce the problem you're hitting. As to your other question, lx is a bare-metal container, so it is running on the underlying SmartOS kernel, whereas in KVM you are running your CentOS 7 kernel.
@jjelinek, need anything else from me?
I haven't had time to look into this yet, but it is on my list. The only thing that you might include here, if you have time, is the output of the same DTrace run you did originally, but while running on a newer platform (joyent_20170413T062134Z is fine).
I was able to run the following on joyent_20170519T195636Z.
I have been able to reproduce this locally and I filed OS-6155 to track it internally. This appears to be specific to using a CentOS 7 image in the zone, since I am able to perform the locking fine from within an Ubuntu zone. I'll investigate what is going on.
@jjelinek NICE!! Killin' it! JERRY! lol, seems crazy that different lx userlands acquire NFS locks differently?! Can't wait to see what the problem is!
The problem on CentOS 7 is that rpcbind and rpc.statd are not running. Due to some of the differences in our NFS lockd handling vs. Linux, these two services need to be running for NFS locking to work on lx. I will explore how we can handle this better, but in the meantime, the following should allow you to work around this.
These two services need to be running before you perform the NFS mount. If you already did the mount, you could unmount the filesystem and remount it, or simply reboot. After a reboot, you should be able to run 'rpcinfo -p' and see that both the rpcbind service and the rpc.statd service are available. Once you have done the NFS mount, you should also see that the 'nlockmgr' is registered with rpcbind on several versions and protocols.
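The 'rpcinfo -p' check can be scripted so it runs before and after the mount. The sketch below, in Python, parses the output and looks for the well-known ONC RPC program numbers: 100000 (portmapper, served by rpcbind), 100024 (status, served by rpc.statd) and 100021 (nlockmgr). The helper names are hypothetical, and it assumes the rpcinfo binary is installed in the zone:

```python
from __future__ import annotations

import subprocess

# Well-known ONC RPC program numbers.
PORTMAPPER = 100000  # rpcbind
STATUS = 100024      # rpc.statd (network status monitor)
NLOCKMGR = 100021    # network lock manager

REQUIRED_BEFORE_MOUNT = {PORTMAPPER: "portmapper (rpcbind)", STATUS: "status (rpc.statd)"}
EXPECTED_AFTER_MOUNT = {NLOCKMGR: "nlockmgr"}


def parse_rpcinfo(text: str) -> dict[int, set[tuple[int, str]]]:
    """Map RPC program number -> {(version, protocol)} from `rpcinfo -p` output."""
    programs: dict[int, set[tuple[int, str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric program number; this skips the header row.
        if len(fields) >= 4 and fields[0].isdigit():
            programs.setdefault(int(fields[0]), set()).add((int(fields[1]), fields[2]))
    return programs


def missing_services(text: str, required: dict[int, str]) -> list[str]:
    """Return the names of required RPC programs that are not registered."""
    programs = parse_rpcinfo(text)
    return [name for number, name in required.items() if number not in programs]


def local_rpcinfo() -> str:
    """Run `rpcinfo -p` against the local rpcbind and return its stdout."""
    result = subprocess.run(["rpcinfo", "-p"], capture_output=True, text=True, check=True)
    return result.stdout
```

Before mounting, `missing_services(local_rpcinfo(), REQUIRED_BEFORE_MOUNT)` should come back empty; after the mount, `EXPECTED_AFTER_MOUNT` should be satisfied too. Matching on program numbers rather than the service-name column avoids depending on the zone's /etc/rpc naming.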
I'll test the NFS functionality in Docker with Alpine / Ubuntu / Debian to see if this pokes its head up there. I believe the Nexus image I was running is a CentOS image.
Jerry, saw https://cr.joyent.us/#/c/2200/ and pulled the latest platform from the dev channel. Working 👍
Thanks for the update, I closed this out as fixed with 3a5445f.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> If the pool/dataset command-line argument is specified with a trailing slash, for example, "tank/", we should interpret it as the topmost dataset (rather than the whole pool). References: openzfs/zfs#3415 Closes TritonDataCenter#144
While trying to run sonatype/nexus3 (http://www.sonatype.org/nexus/) in an lx-branded zone (image 23ee2dbc-c155-11e6-ab6d-bf5689f582fd, centos-7 12/13/2016) with an NFS mount as its data directory, I got the following error. If I mount with -o nolock, the application works fine.
Instructions to Reproduce:
Client:
SmartOS Server:
zfs set sharenfs=rw=@10.1.102.0/24 zones/nfs/nexus
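In the illumos share_nfs access-list syntax, an entry that starts with @ names a network, so rw=@10.1.102.0/24 grants read-write access to clients whose address falls in 10.1.102.0/24; the lx zone's NFS client address has to be inside that range. A small Python sketch of that membership check follows; `client_allowed` is a hypothetical helper, and it handles only the @network/prefix form, not hostnames, domains or netgroups:

```python
import ipaddress


def client_allowed(client_ip: str, access_entry: str) -> bool:
    """Return True if client_ip falls inside an '@network/prefix' access entry."""
    if not access_entry.startswith("@"):
        raise ValueError("this sketch only handles '@network/prefix' entries")
    # strict=False tolerates host bits being set in the configured network.
    network = ipaddress.ip_network(access_entry[1:], strict=False)
    return ipaddress.ip_address(client_ip) in network
```

For example, `client_allowed("10.1.102.37", "@10.1.102.0/24")` is True, while a client at 10.1.103.5 would fall outside the share's access list.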
DTrace of the failure: