New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File.flock does not work on Solaris for 9k #3343

Closed
oblutak opened this Issue Sep 22, 2015 · 37 comments

Comments

Projects
None yet
5 participants
@oblutak

oblutak commented Sep 22, 2015

Same issue reported for 3254. The code change fixed the 1.7 branch (1.7.22) but issue is still in 9k. I thought it would be in 9.0.1.0 since it was released after:

$ ./jruby --version
jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [SunOS-amd64]

$ ./jruby -e "File.open('/tmp/test', File::RDWR|File::CREAT) { |f| f.flock(File::LOCK_EX) }"
AsmRuntime.java:40:in `newUnsatisifiedLinkError': java.lang.UnsatisfiedLinkError: unknown
        from null:-1:in `flock'
        from BaseNativePOSIX.java:527:in `flock'
        from CheckedPOSIX.java:416:in `flock'
        from LazyPOSIX.java:412:in `flock'
        from PosixShim.java:179:in `flock'
        from OpenFile.java:2488:in `run'
        from OpenFile.java:2485:in `run'
        from RubyThread.java:1352:in `executeTask'
        from OpenFile.java:2485:in `threadFlock'
        from RubyFile.java:298:in `flock'
        from RubyFile$INVOKER$i$1$0$flock.gen:-1:in `call'
        from CachingCallSite.java:313:in `cacheAndCall'
        from CachingCallSite.java:163:in `call'
        from -e:-1:in `invokeOther1:flock'
        from -e:1:in `RUBY$block$-e$0'
        from CompiledIRBlockBody.java:70:in `commonYieldPath'
        from IRBlockBody.java:140:in `doYield'
        from BlockBody.java:77:in `yield'
        from Block.java:147:in `yield'
        from RubyIO.java:1133:in `ensureYieldClose'
        from RubyIO.java:1126:in `open'
        from RubyIO$INVOKER$s$0$0$open.gen:-1:in `call'
        from DynamicMethod.java:209:in `call'
        from CachingCallSite.java:343:in `cacheAndCall'
        from CachingCallSite.java:205:in `callBlock'
        from CachingCallSite.java:209:in `call'
        from -e:-1:in `invokeOther7:open'
        from -e:1:in `RUBY$script'
        from MethodHandle.java:625:in `invokeWithArguments'
        from Compiler.java:111:in `load'
        from Ruby.java:822:in `runScript'
        from Ruby.java:814:in `runScript'
        from Ruby.java:752:in `runNormally'
        from Ruby.java:574:in `runFromMain'
        from Main.java:409:in `doRunFromMain'
        from Main.java:304:in `internalRun'
        from Main.java:231:in `run'
        from Main.java:200:in `main'
@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Oct 19, 2015

Hearing 9.0.2.0 coming out really soon. Any chance to squeak a fix for this issue? I just tried a snapshot from ci.jruby.com but problem is still present. It is a blocker for my to run 9k on Solaris, since I can't install gems
Last version I tried.
./jruby --version
jruby 9.0.2.0-SNAPSHOT (2.2.2) 2015-10-18 f37539a Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [SunOS-amd64]

oblutak commented Oct 19, 2015

Hearing 9.0.2.0 coming out really soon. Any chance to squeak a fix for this issue? I just tried a snapshot from ci.jruby.com but problem is still present. It is a blocker for my to run 9k on Solaris, since I can't install gems
Last version I tried.
./jruby --version
jruby 9.0.2.0-SNAPSHOT (2.2.2) 2015-10-18 f37539a Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [SunOS-amd64]

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 19, 2015

Member

Looking at the code, I see the same fix for #3254 applied in JRuby master:

if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD && !Platform.IS_SOLARIS) {

The only way this check could pass is if we're not on Solaris, or if that check is not correctly detecting Solaris.

Member

headius commented Oct 19, 2015

Looking at the code, I see the same fix for #3254 applied in JRuby master:

if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD && !Platform.IS_SOLARIS) {

The only way this check could pass is if we're not on Solaris, or if that check is not correctly detecting Solaris.

@headius headius added this to the JRuby 9.0.2.0 milestone Oct 19, 2015

@headius headius self-assigned this Oct 19, 2015

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Oct 19, 2015

ok... weird
./jruby -e 'import org.jruby.platform.Platform; puts Platform::OS'
sunos
./jruby -e 'import org.jruby.platform.Platform; puts Platform::IS_SOLARIS'
false

oblutak commented Oct 19, 2015

ok... weird
./jruby -e 'import org.jruby.platform.Platform; puts Platform::OS'
sunos
./jruby -e 'import org.jruby.platform.Platform; puts Platform::IS_SOLARIS'
false

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Oct 19, 2015

I'm REALLY confused. 1.7.22 responds the same way but I don't have the issue on 1.7.22

./jruby --version
jruby 1.7.22 (1.9.3p551) 2015-08-20 c28f492 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16 +jit [SunOS-amd64]

./jruby -e "File.open('/tmp/test', File::RDWR|File::CREAT) { |f| f.flock(File::LOCK_EX) }"
io/console not supported; tty will not be manipulated

./jruby -e 'import org.jruby.platform.Platform; puts Platform::IS_SOLARIS'
io/console not supported; tty will not be manipulated
false

oblutak commented Oct 19, 2015

I'm REALLY confused. 1.7.22 responds the same way but I don't have the issue on 1.7.22

./jruby --version
jruby 1.7.22 (1.9.3p551) 2015-08-20 c28f492 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16 +jit [SunOS-amd64]

./jruby -e "File.open('/tmp/test', File::RDWR|File::CREAT) { |f| f.flock(File::LOCK_EX) }"
io/console not supported; tty will not be manipulated

./jruby -e 'import org.jruby.platform.Platform; puts Platform::IS_SOLARIS'
io/console not supported; tty will not be manipulated
false

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Oct 19, 2015

on irc, was asked for uname -a output
SunOS ltetools 5.10 Generic_144489-12 i86pc i386 i86pc

oblutak commented Oct 19, 2015

on irc, was asked for uname -a output
SunOS ltetools 5.10 Generic_144489-12 i86pc i386 i86pc

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 19, 2015

Member

I believe 1.7 did not have native flock logic, so it wouldn't hit the bug. It seems our IS_SOLARIS is faulty somehow. Will look into that.

Member

headius commented Oct 19, 2015

I believe 1.7 did not have native flock logic, so it wouldn't hit the bug. It seems our IS_SOLARIS is faulty somehow. Will look into that.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 19, 2015

Member

@oblutak Can you provide the output of the following code?

puts ENV_JAVA["os.name"]
puts ENV_JAVA["os.arch"]

The fact that io/console is giving you a warning makes me think we're not handling native code loading right on Solaris either.

Member

headius commented Oct 19, 2015

@oblutak Can you provide the output of the following code?

puts ENV_JAVA["os.name"]
puts ENV_JAVA["os.arch"]

The fact that io/console is giving you a warning makes me think we're not handling native code loading right on Solaris either.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 19, 2015

Member

Ok, I have a fix for the IS_SOLARIS problem, but there's a larger issue.

In JRuby 9k, we tend to prefer real native channels even for opening files. Unfortunately that means we can't use the pure-Java fallback logic here anyway. We'll either need to not use native file channels on Solaris or, more likely, emulate flock with fcntl the same way MRI does.

Member

headius commented Oct 19, 2015

Ok, I have a fix for the IS_SOLARIS problem, but there's a larger issue.

In JRuby 9k, we tend to prefer real native channels even for opening files. Unfortunately that means we can't use the pure-Java fallback logic here anyway. We'll either need to not use native file channels on Solaris or, more likely, emulate flock with fcntl the same way MRI does.

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Oct 20, 2015

Here is the requested info
./jruby -e 'puts ENV_JAVA["os.name"], ENV_JAVA["os.arch"]'
SunOS
amd64

On your last comment about flock emulation, is that something that would be done in jnr-posix? Unfortunately, I'm not a coder so can't help that way, but if there is anything I could help testing with, please let me know.

oblutak commented Oct 20, 2015

Here is the requested info
./jruby -e 'puts ENV_JAVA["os.name"], ENV_JAVA["os.arch"]'
SunOS
amd64

On your last comment about flock emulation, is that something that would be done in jnr-posix? Unfortunately, I'm not a coder so can't help that way, but if there is anything I could help testing with, please let me know.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 20, 2015

Member

@oblutak Yes, I'm working on the flock emulation in jnr-posix. I've hit some snags, though, so this won't make it into the next 9k release. It should be in the one after that.

Member

headius commented Oct 20, 2015

@oblutak Yes, I'm working on the flock emulation in jnr-posix. I've hit some snags, though, so this won't make it into the next 9k release. It should be in the one after that.

@headius headius modified the milestones: JRuby 9.0.4.0, JRuby 9.0.2.0 Oct 20, 2015

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Oct 20, 2015

Understood. Thanks for the update. Would you rather I open a different issue for the submission of the IS_SOLARIS fix, or will you just submit both fixes under this issue?

oblutak commented Oct 20, 2015

Understood. Thanks for the update. Would you rather I open a different issue for the submission of the IS_SOLARIS fix, or will you just submit both fixes under this issue?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 20, 2015

Member

@oblutak I'll do both fixes as part of this issue.

Member

headius commented Oct 20, 2015

@oblutak I'll do both fixes as part of this issue.

@headius headius modified the milestones: JRuby 9.0.5.0, JRuby 9.0.4.0 Oct 22, 2015

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 22, 2015

Member

Bumping to 9.0.5.0 because 9.0.4 will be another quick flip to fix some regressions.

Member

headius commented Oct 22, 2015

Bumping to 9.0.5.0 because 9.0.4 will be another quick flip to fix some regressions.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Jan 19, 2016

Member

I never got back to this, but I did at least start to attempt a flock impl for Solaris based on fcntl. Unfortunately never completed and had to revert when I accidentally pushed it: jnr/jnr-posix@32b9c84

Member

headius commented Jan 19, 2016

I never got back to this, but I did at least start to attempt a flock impl for Solaris based on fcntl. Unfortunately never completed and had to revert when I accidentally pushed it: jnr/jnr-posix@32b9c84

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Jan 21, 2016

Member

Bumping to 9.1 so we can get 9.0.5 out.

Member

headius commented Jan 21, 2016

Bumping to 9.1 so we can get 9.0.5 out.

@headius headius modified the milestones: JRuby 9.1.0.0, JRuby 9.0.5.0 Jan 21, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Mar 3, 2016

Member

@oblutak For 9.1 the current code basically just falls back on the old JDK logic when we're on Solaris. Can you test out a 9.1 snapshot build and let me know if it's working ok for you? http://ci.jruby.org

Member

headius commented Mar 3, 2016

@oblutak For 9.1 the current code basically just falls back on the old JDK logic when we're on Solaris. Can you test out a 9.1 snapshot build and let me know if it's working ok for you? http://ci.jruby.org

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Mar 3, 2016

Member

I believe I found the bug in my patch and will attempt to apply it now. Downloading a Solaris vbox VM to test against.

Member

headius commented Mar 3, 2016

I believe I found the bug in my patch and will attempt to apply it now. Downloading a Solaris vbox VM to test against.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Mar 3, 2016

Member

Pushed my tweaks for the patch in jnr/jnr-posix@f34106c (patch was applied in previous commit).

Member

headius commented Mar 3, 2016

Pushed my tweaks for the patch in jnr/jnr-posix@f34106c (patch was applied in previous commit).

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Mar 3, 2016

Member

Ok, I was unable to get the flock test to pass on Solaris and I have another task I need to work on.

@oblutak If you're still out there, you could pull master from jnr-posix and try to get the flock test passing on Solaris. I'm guessing the struct layout is not quite right, but I'm not certain.

Member

headius commented Mar 3, 2016

Ok, I was unable to get the flock test to pass on Solaris and I have another task I need to work on.

@oblutak If you're still out there, you could pull master from jnr-posix and try to get the flock test passing on Solaris. I'm guessing the struct layout is not quite right, but I'm not certain.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Mar 3, 2016

Member

Patch to apply when flock is working in jnr-posix:

diff --git a/core/src/main/java/org/jruby/util/io/PosixShim.java b/core/src/main/java/org/jruby/util/io/PosixShim.java
index 1c122dc..7a4ff67 100644
--- a/core/src/main/java/org/jruby/util/io/PosixShim.java
+++ b/core/src/main/java/org/jruby/util/io/PosixShim.java
@@ -187,7 +187,7 @@ public class PosixShim {

         int real_fd = fd.realFileno;

-        if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD && !Platform.IS_SOLARIS) {
+        if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD) {
             // we have a real fd and not on Solaris...try native flocking
             // see jruby/jruby#3254 and jnr/jnr-posix#60
             int result = posix.flock(real_fd, lockMode);
Member

headius commented Mar 3, 2016

Patch to apply when flock is working in jnr-posix:

diff --git a/core/src/main/java/org/jruby/util/io/PosixShim.java b/core/src/main/java/org/jruby/util/io/PosixShim.java
index 1c122dc..7a4ff67 100644
--- a/core/src/main/java/org/jruby/util/io/PosixShim.java
+++ b/core/src/main/java/org/jruby/util/io/PosixShim.java
@@ -187,7 +187,7 @@ public class PosixShim {

         int real_fd = fd.realFileno;

-        if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD && !Platform.IS_SOLARIS) {
+        if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD) {
             // we have a real fd and not on Solaris...try native flocking
             // see jruby/jruby#3254 and jnr/jnr-posix#60
             int result = posix.flock(real_fd, lockMode);
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Mar 8, 2016

Member

Punting to 9.1.1.0. I need assistance.

Member

headius commented Mar 8, 2016

Punting to 9.1.1.0. I need assistance.

@mcarpenter

This comment has been minimized.

Show comment
Hide comment
@mcarpenter

mcarpenter Mar 14, 2016

I've taken a look at this but have also been unable to get it working. Sharing my thoughts in the hope that someone can punt this a little further...

The failing test: https://github.com/jnr/jnr-posix/blob/master/src/test/java/jnr/posix/FileTest.java#L172

flockTest(jnr.posix.FileTest)  Time elapsed: 0.029 sec  <<< FAILURE!
java.lang.AssertionError: expected:<-1> but was:<0>
...
        at jnr.posix.FileTest.flockTest(FileTest.java:172)

I wrote a small standalone Java program imitating the failing test to pass to truss. That produced the following output:

22950/2:        fcntl(275, F_SETLKW64, 0xB2210000)              = 0
22950/2:                typ=F_RDLCK  whence=SEEK_SET start=0     len=0     sys=0  pid=0
22950/2:        fcntl(275, F_SETLKW64, 0xB2210020)              = 0
22950/2:                typ=F_UNLCK  whence=SEEK_SET start=0     len=0     sys=0  pid=0
22950/2:        fcntl(275, F_SETLKW64, 0xB2210040)              = 0
22950/2:                typ=F_WRLCK  whence=SEEK_SET start=0     len=0     sys=0  pid=0
22950/2:        fcntl(276, F_SETLK64, 0xB2210060)               = 0
22950/2:                typ=F_WRLCK  whence=SEEK_SET start=0     len=0     sys=0  pid=0

The struct flock looks okay but that 4th non-blocking exclusive-lock call to fcntl (that the unit test expects to fail) continues to succeed. (Also I noticed that jnr-constant appears to be returning the 64 bit version of F_SETLK[W]64 but changing this didn't make any difference).

I started to doubt the assumption of the test, so wrote some C:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>                                                           
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, const char *argv[])
{
    system("/bin/rm -f hello ; /bin/touch hello ; /bin/chmod g+l hello");     
    int fd1 = open("hello", O_RDWR);
    int fd2 = open("hello", O_RDWR);

    struct flock lock;
    lock.l_type = 0;
    lock.l_whence = SEEK_SET;                                                 
    lock.l_start = 0;
    lock.l_len = 0;                                                           
    lock.l_sysid = 0;
    lock.l_pid = 0;

    lock.l_type = F_RDLCK;                                                    
    if(fcntl(fd1, F_SETLKW, &lock)) {
        fprintf(stderr, "flock1\n");                                          
        exit(1);                                                              
    }
    lock.l_type = F_UNLCK;                                                    
    if(fcntl(fd1, F_SETLKW, &lock)) {                                         
        fprintf(stderr, "flock1\n");
        exit(1);
    }
    lock.l_type = F_WRLCK;                                                    
    if(fcntl(fd1, F_SETLKW, &lock)) {                                         
        fprintf(stderr, "flock3\n");
        exit(1);                                                              
    }
    lock.l_type = F_WRLCK;  
    if(!fcntl(fd2, F_SETLK, &lock)) {                                         
        fprintf(stderr, "flock4\n");                                          
        exit(1);
    }                                                                         
    lock.l_type = F_UNLCK;
    if(fcntl(fd1, F_SETLKW, &lock)) {                                         
        fprintf(stderr, "flock5\n");                                          
        exit(1);
    }

    return 0;
}

Compiling and running this:

$ make foo
gcc     foo.c   -o foo
$ ./foo 
flock4
$ truss -tfcntl -vall ./foo
fcntl(3, F_SETLKW, 0xFEFFE994)                  = 0
        typ=F_RDLCK  whence=SEEK_SET start=0     len=0     sys=4267573568 pid=-16782700
fcntl(3, F_SETLKW, 0xFEFFE994)                  = 0
        typ=F_UNLCK  whence=SEEK_SET start=0     len=0     sys=4267573568 pid=-16782700
fcntl(3, F_SETLKW, 0xFEFFE994)                  = 0
        typ=F_WRLCK  whence=SEEK_SET start=0     len=0     sys=4267573568 pid=-16782700
fcntl(4, F_SETLK, 0xFEFFE994)                   = 0
        typ=F_WRLCK  whence=SEEK_SET start=0     len=0     sys=4267573568 pid=-16782700
flock4
$

Same result: that fourth "exclusive lock it again" call continues to succeed :(

Other things I've considered:

  • I'm using ZFS, and this is a supported file system (see https://docs.oracle.com/cd/E19683-01/806-4125/6jd7pe6bk/index.html).
  • Mandatory vs. advisory locking (the sgid bit set in the above by the call to chmod(1)): this makes
    no difference. (See above link).
  • I note locks are destroyed by close(2)ing any other file, but that doesn't happen here.
  • Lack of lock inheritance on fork(2) is irrelevant.
  • I've tested the C program on Solaris 10 SPARC and Solaris 11 x86 with same results.

So we conclude that... the unit test assumption is wrong? Why? Or there's a bug in Solaris fcntl(2) (which... seems unlikely...)?

Other possibilities?

mcarpenter commented Mar 14, 2016

I've taken a look at this but have also been unable to get it working. Sharing my thoughts in the hope that someone can punt this a little further...

The failing test: https://github.com/jnr/jnr-posix/blob/master/src/test/java/jnr/posix/FileTest.java#L172

flockTest(jnr.posix.FileTest)  Time elapsed: 0.029 sec  <<< FAILURE!
java.lang.AssertionError: expected:<-1> but was:<0>
...
        at jnr.posix.FileTest.flockTest(FileTest.java:172)

I wrote a small standalone Java program imitating the failing test to pass to truss. That produced the following output:

22950/2:        fcntl(275, F_SETLKW64, 0xB2210000)              = 0
22950/2:                typ=F_RDLCK  whence=SEEK_SET start=0     len=0     sys=0  pid=0
22950/2:        fcntl(275, F_SETLKW64, 0xB2210020)              = 0
22950/2:                typ=F_UNLCK  whence=SEEK_SET start=0     len=0     sys=0  pid=0
22950/2:        fcntl(275, F_SETLKW64, 0xB2210040)              = 0
22950/2:                typ=F_WRLCK  whence=SEEK_SET start=0     len=0     sys=0  pid=0
22950/2:        fcntl(276, F_SETLK64, 0xB2210060)               = 0
22950/2:                typ=F_WRLCK  whence=SEEK_SET start=0     len=0     sys=0  pid=0

The struct flock looks okay but that 4th non-blocking exclusive-lock call to fcntl (that the unit test expects to fail) continues to succeed. (Also I noticed that jnr-constant appears to be returning the 64 bit version of F_SETLK[W]64 but changing this didn't make any difference).

I started to doubt the assumption of the test, so wrote some C:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>                                                           
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, const char *argv[])
{
    system("/bin/rm -f hello ; /bin/touch hello ; /bin/chmod g+l hello");     
    int fd1 = open("hello", O_RDWR);
    int fd2 = open("hello", O_RDWR);

    struct flock lock;
    lock.l_type = 0;
    lock.l_whence = SEEK_SET;                                                 
    lock.l_start = 0;
    lock.l_len = 0;                                                           
    lock.l_sysid = 0;
    lock.l_pid = 0;

    lock.l_type = F_RDLCK;                                                    
    if(fcntl(fd1, F_SETLKW, &lock)) {
        fprintf(stderr, "flock1\n");                                          
        exit(1);                                                              
    }
    lock.l_type = F_UNLCK;                                                    
    if(fcntl(fd1, F_SETLKW, &lock)) {                                         
        fprintf(stderr, "flock1\n");
        exit(1);
    }
    lock.l_type = F_WRLCK;                                                    
    if(fcntl(fd1, F_SETLKW, &lock)) {                                         
        fprintf(stderr, "flock3\n");
        exit(1);                                                              
    }
    lock.l_type = F_WRLCK;  
    if(!fcntl(fd2, F_SETLK, &lock)) {                                         
        fprintf(stderr, "flock4\n");                                          
        exit(1);
    }                                                                         
    lock.l_type = F_UNLCK;
    if(fcntl(fd1, F_SETLKW, &lock)) {                                         
        fprintf(stderr, "flock5\n");                                          
        exit(1);
    }

    return 0;
}

Compiling and running this:

$ make foo
gcc     foo.c   -o foo
$ ./foo 
flock4
$ truss -tfcntl -vall ./foo
fcntl(3, F_SETLKW, 0xFEFFE994)                  = 0
        typ=F_RDLCK  whence=SEEK_SET start=0     len=0     sys=4267573568 pid=-16782700
fcntl(3, F_SETLKW, 0xFEFFE994)                  = 0
        typ=F_UNLCK  whence=SEEK_SET start=0     len=0     sys=4267573568 pid=-16782700
fcntl(3, F_SETLKW, 0xFEFFE994)                  = 0
        typ=F_WRLCK  whence=SEEK_SET start=0     len=0     sys=4267573568 pid=-16782700
fcntl(4, F_SETLK, 0xFEFFE994)                   = 0
        typ=F_WRLCK  whence=SEEK_SET start=0     len=0     sys=4267573568 pid=-16782700
flock4
$

Same result: that fourth "exclusive lock it again" call continues to succeed :(

Other things I've considered:

  • I'm using ZFS, and this is a supported file system (see https://docs.oracle.com/cd/E19683-01/806-4125/6jd7pe6bk/index.html).
  • Mandatory vs. advisory locking (the sgid bit set in the above by the call to chmod(1)): this makes
    no difference. (See above link).
  • I note locks are destroyed by close(2)ing any other file, but that doesn't happen here.
  • Lack of lock inheritance on fork(2) is irrelevant.
  • I've tested the C program on Solaris 10 SPARC and Solaris 11 x86 with same results.

So we conclude that... the unit test assumption is wrong? Why? Or there's a bug in Solaris fcntl(2) (which... seems unlikely...)?

Other possibilities?

@headius headius modified the milestones: JRuby 9.1.1.0, JRuby 9.1.2.0 May 11, 2016

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak May 17, 2016

My apologies, I've been away from all (j)ruby activities since xmas. As indicated before, I'm no designer/coder so I quickly fall out of my element in too technical discussion.
I would love to help, but I think I would use more of your time. I don't even know how to use maven and get to running the jnr-posix tests. I haven't given up yet so if you have any good resources for me, please send them my way

I also tried to read/search re: fcntl and flock. The behavior of the C code above just didn't make sense to me. Comments in stackoverflow discussion I think give a hint.

  • simply: fnctl locks work as a Process <--> File relationship, ignoring filedescriptors
  • flock() places it's locks via a FD on a "Open file description"
    So I think the unit test of locking the same file/region within the same process will never 'fail' for Solaris/fcntl.
    Using C code provided above and compiling to versions with 'sleeps' in it, I think shows that other process trying to get exclusive lock to same file region behaves as expected.
    So I think the unit test assumption is wrong but don't know how to write a different test to show 'behavior' between two different processes. Is that 'adequate' locking for the needs here? There is certainly difference and have no idea how you would make fcntl equivalent to flock.

oblutak commented May 17, 2016

My apologies, I've been away from all (j)ruby activities since xmas. As indicated before, I'm no designer/coder so I quickly fall out of my element in too technical discussion.
I would love to help, but I think I would use more of your time. I don't even know how to use maven and get to running the jnr-posix tests. I haven't given up yet so if you have any good resources for me, please send them my way

I also tried to read/search re: fcntl and flock. The behavior of the C code above just didn't make sense to me. Comments in stackoverflow discussion I think give a hint.

  • simply: fnctl locks work as a Process <--> File relationship, ignoring filedescriptors
  • flock() places it's locks via a FD on a "Open file description"
    So I think the unit test of locking the same file/region within the same process will never 'fail' for Solaris/fcntl.
    Using C code provided above and compiling to versions with 'sleeps' in it, I think shows that other process trying to get exclusive lock to same file region behaves as expected.
    So I think the unit test assumption is wrong but don't know how to write a different test to show 'behavior' between two different processes. Is that 'adequate' locking for the needs here? There is certainly difference and have no idea how you would make fcntl equivalent to flock.
@mcarpenter

This comment has been minimized.

Show comment
Hide comment
@mcarpenter

mcarpenter May 18, 2016

I'm no designer/coder

You are so busted :) This was a really good comment, I think you're right on the money. I've since been looking at the Linux fcntl(2) man page and a couple of things jump out at me:

A single process can hold only one type of lock on a file region; if a new lock
is  applied  to  an already-locked region, then the existing lock is converted
to the new lock type.

and:

F_SETLK (struct flock *)
          Acquire a lock [...]  If a  conflicting lock is held *by another process*,
this call returns -1

(That's with *my emphasis*). Opengroup has similar wording "An exclusive lock shall prevent any other process from setting a shared lock or an exclusive lock" (again my emphasis).

Given the unit test exists (and presumably doesn't fail) for systems with native flock() I guess that's what we should be aiming for if we wanted absolutely correct (equivalent) behavior. Linux flock() man page explains the unit test:

If a process uses open(2) (or similar) to obtain more than one descriptor for
the same file, these descriptors are treated independently by flock(). An
attempt to lock the file using one of these file descriptors may be denied by
a lock that the calling process has already placed via another descriptor.

On the other hand:

duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer
to the same lock

I agree this seems hard to emulate with plain fcntl() and I have no good ideas on that for the moment. So now where?

mcarpenter commented May 18, 2016

I'm no designer/coder

You are so busted :) This was a really good comment, I think you're right on the money. I've since been looking at the Linux fcntl(2) man page and a couple of things jump out at me:

A single process can hold only one type of lock on a file region; if a new lock
is  applied  to  an already-locked region, then the existing lock is converted
to the new lock type.

and:

F_SETLK (struct flock *)
          Acquire a lock [...]  If a  conflicting lock is held *by another process*,
this call returns -1

(That's with *my emphasis*). Opengroup has similar wording "An exclusive lock shall prevent any other process from setting a shared lock or an exclusive lock" (again my emphasis).

Given the unit test exists (and presumably doesn't fail) for systems with native flock() I guess that's what we should be aiming for if we wanted absolutely correct (equivalent) behavior. Linux flock() man page explains the unit test:

If a process uses open(2) (or similar) to obtain more than one descriptor for
the same file, these descriptors are treated independently by flock(). An
attempt to lock the file using one of these file descriptors may be denied by
a lock that the calling process has already placed via another descriptor.

On the other hand:

duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer
to the same lock

I agree this seems hard to emulate with plain fcntl() and I have no good ideas on that for the moment. So now where?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius May 18, 2016

Member

MRI supports some of these platforms that don't have flock...we really just need to do what they're doing.

I'd thought we might just be able to use pure-Java flocking, but then we'd need to make sure we're not using any native descriptors...which kills a bunch of more valuable features.

Member

headius commented May 18, 2016

MRI supports some of these platforms that don't have flock...we really just need to do what they're doing.

I'd thought we might just be able to use pure-Java flocking, but then we'd need to make sure we're not using any native descriptors...which kills a bunch of more valuable features.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius
Member

headius commented May 18, 2016

See also ruby/spec#255.

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak May 19, 2016

If I understand the flock_spec from ruby/spec

 it "returns false if trying to lock an exclusively locked file" do
    @file.flock File::LOCK_EX

    ruby_exe(<<-END_OF_CODE, escape: true).should == "false"
      File.open('#{@name}', "w") do |f2|
        print f2.flock(File::LOCK_EX | File::LOCK_NB).to_s
      end
    END_OF_CODE
  end

The test is checking that a file with an exclusive lock appears locked to another process.
So I think the 'flock-fcntl bridge' implemented would satisfy (j)ruby's needs.

As for making jnr/posix tests passed for a flock-fcntl bridge, I don't know if that is possible. All existing flock test are perfectly valid for flock. Maybe fcntl locking needs to exists separately with its own set of unit tests (in jnr/posix) and at 'higher' layer (jruby?) need to make the distinction of using/implement a flock-fcntl bridge when a ruby 'flock' is called. (sorry, I said before I'm no coder so don't know the correct terminology but hopefully you understand what I am trying to say)

oblutak commented May 19, 2016

If I understand the flock_spec from ruby/spec

 it "returns false if trying to lock an exclusively locked file" do
    @file.flock File::LOCK_EX

    ruby_exe(<<-END_OF_CODE, escape: true).should == "false"
      File.open('#{@name}', "w") do |f2|
        print f2.flock(File::LOCK_EX | File::LOCK_NB).to_s
      end
    END_OF_CODE
  end

The test is checking that a file with an exclusive lock appears locked to another process.
So I think the 'flock-fcntl bridge' implemented would satisfy (j)ruby's needs.

As for making jnr/posix tests passed for a flock-fcntl bridge, I don't know if that is possible. All existing flock test are perfectly valid for flock. Maybe fcntl locking needs to exists separately with its own set of unit tests (in jnr/posix) and at 'higher' layer (jruby?) need to make the distinction of using/implement a flock-fcntl bridge when a ruby 'flock' is called. (sorry, I said before I'm no coder so don't know the correct terminology but hopefully you understand what I am trying to say)

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius May 20, 2016

Member

@oblutak Actually that makes a lot of sense. jnr-posix should just implement and test both flock and fcntl, and if flock is not available it doesn't run the tests, like any other non-universal feature. We'd mimic MRI's logic for using fcntl as flock from the JRuby side.

So I think next steps would be to ensure that fcntl locking works with jnr-posix's fcntl today and write some jnr-posix tests for it. Then we look at how MRI emulates flock with fcntl, implement that in JRuby, and we're good?

Member

headius commented May 20, 2016

@oblutak Actually that makes a lot of sense. jnr-posix should just implement and test both flock and fcntl, and if flock is not available it doesn't run the tests, like any other non-universal feature. We'd mimic MRI's logic for using fcntl as flock from the JRuby side.

So I think next steps would be to ensure that fcntl locking works with jnr-posix's fcntl today and write some jnr-posix tests for it. Then we look at how MRI emulates flock with fcntl, implement that in JRuby, and we're good?

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak May 20, 2016

I think the flock (using fnctl) as implemented in jnr-posix master in SolarisPOSIX.java replicates what is done in MRI.

Now for fcntl locking unit tests for jnr-posix, I wish I could help but I have no idea how. I don't know how you can 'launch' another process, required to try to get a lock on the same file. Maybe we just do the positive paths test for now just to validate that fcntl calls return successfully and rely on the flock_spec test at ruby/jruby to confirm 'flock' at (j)ruby layer behave as expected.

If you think there is a way I could help, let me know how, since I don't know java at all and never used tools like maven. I am trying a bit on my own. I got a copy of jnr-posix, got jdk-1.8u92 and maven 3.3.9 installed. Not sure what mvn commands to use. Doing mvn verify gave me some test failure as I would expect but I also get a core dump! I think asking to your help to just figuring out these basics steps would just impede and use your time needlessly. But I will look at the final code submitted and hope to learn from it and maybe one day I will be more helpful.

oblutak commented May 20, 2016

I think the flock (using fnctl) as implemented in jnr-posix master in SolarisPOSIX.java replicates what is done in MRI.

Now for fcntl locking unit tests for jnr-posix, I wish I could help but I have no idea how. I don't know how you can 'launch' another process, required to try to get a lock on the same file. Maybe we just do the positive paths test for now just to validate that fcntl calls return successfully and rely on the flock_spec test at ruby/jruby to confirm 'flock' at (j)ruby layer behave as expected.

If you think there is a way I could help, let me know how, since I don't know java at all and never used tools like maven. I am trying a bit on my own. I got a copy of jnr-posix, got jdk-1.8u92 and maven 3.3.9 installed. Not sure what mvn commands to use. Doing mvn verify gave me some test failure as I would expect but I also get a core dump! I think asking to your help to just figuring out these basics steps would just impede and use your time needlessly. But I will look at the final code submitted and hope to learn from it and maybe one day I will be more helpful.

@enebo enebo modified the milestones: JRuby 9.1.2.0, JRuby 9.1.3.0 May 23, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 17, 2016

Member

@oblutak Can you recommend a good free Solaris (or friends) that I could use to verify the fix on a VirtualBox VM?

Member

headius commented Aug 17, 2016

@oblutak Can you recommend a good free Solaris (or friends) that I could use to verify the fix on a VirtualBox VM?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 17, 2016

Member

@oblutak RE: testing...I'm not sure either. Is there a standard way we can attempt to lock a file without introducing a lot of dependencies? If not, then I think the next best option would be to simply have the test launch another JVM with some prepared code that tries to lock and prints its success.

If the jnr-posix tests fail (and especially if they crash) we should get that fixed.

Member

headius commented Aug 17, 2016

@oblutak RE: testing...I'm not sure either. Is there a standard way we can attempt to lock a file without introducing a lot of dependencies? If not, then I think the next best option would be to simply have the test launch another JVM with some prepared code that tries to lock and prints its success.

If the jnr-posix tests fail (and especially if they crash) we should get that fixed.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 17, 2016

Member

I think this is turning out to be a bit more work than we can spin in the next week, so we'll bump this to 9.1.4.0 and work with @oblutak to get jnr-posix up to scratch on Solaris.

Member

headius commented Aug 17, 2016

I think this is turning out to be a bit more work than we can spin in the next week, so we'll bump this to 9.1.4.0 and work with @oblutak to get jnr-posix up to scratch on Solaris.

@headius headius modified the milestones: JRuby 9.1.4.0, JRuby 9.1.3.0 Aug 17, 2016

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Aug 23, 2016

Hi @headius, Sorry for the delay. I've spent vacations totally 'unplugged'. As for 'free Solaris and friends' I've only used Solaris on real hardware. On another issue reported, someone commented similar issue on SmartOS. There wiki has a section 'SmartOS as a Sandboxed VirtualBox Guest', but I don't have any personal experience to share.

oblutak commented Aug 23, 2016

Hi @headius, Sorry for the delay. I've spent vacations totally 'unplugged'. As for 'free Solaris and friends' I've only used Solaris on real hardware. On another issue reported, someone commented similar issue on SmartOS. There wiki has a section 'SmartOS as a Sandboxed VirtualBox Guest', but I don't have any personal experience to share.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Sep 20, 2016

Member

I have just pushed f8da225 for #4162, which implements flock like MRI using Ruby FFI. This should now be fixed.

Please test a build from http://ci.jruby.org once there's an updated snapshot.

Member

headius commented Sep 20, 2016

I have just pushed f8da225 for #4162, which implements flock like MRI using Ruby FFI. This should now be fixed.

Please test a build from http://ci.jruby.org once there's an updated snapshot.

@headius headius closed this Sep 20, 2016

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Sep 21, 2016

Looks promising!
$ jruby --version jruby 9.1.6.0-SNAPSHOT (2.3.1) 2016-09-21 01030ab Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [solaris-x86_64] $ jruby -e "File.open('/tmp/test', File::RDWR|File::CREAT) { |f| f.flock(File::LOCK_EX) }"
still unable to install gems, but looks like a totally different issue. I'll try to investigate a bit more and track under different issue, if required. Thanks for the fix!

oblutak commented Sep 21, 2016

Looks promising!
$ jruby --version jruby 9.1.6.0-SNAPSHOT (2.3.1) 2016-09-21 01030ab Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [solaris-x86_64] $ jruby -e "File.open('/tmp/test', File::RDWR|File::CREAT) { |f| f.flock(File::LOCK_EX) }"
still unable to install gems, but looks like a totally different issue. I'll try to investigate a bit more and track under different issue, if required. Thanks for the fix!

@oblutak

This comment has been minimized.

Show comment
Hide comment
@oblutak

oblutak Sep 21, 2016

In case someone stumble across my previous comment... Error is/was
jruby-9.1.6.0-SNAPSHOT/bin/jruby -S gem install builder LoadError: no such file to load -- rubygems/resolver/molinillo/lib/molinillo/dependency_graph/log <...>
I used the default 'tar' that ships with Solaris which had issue with long paths, so directory content looked like
ls jruby-9.1.6.0-SNAPSHOT/lib/ruby/stdlib/rubygems/resolver/molinillo/lib/molinillo/dependency_graph ac ad de lo se ta ve
instead of
ls jruby-9.1.6.0-SNAPSHOT/lib/ruby/stdlib/rubygems/resolver/molinillo/lib/molinillo/dependency_graph action.rb add_vertex.rb log.rb tag.rb add_edge_no_circular.rb detach_vertex_named.rb set_payload.rb vertex.rb
just use gnu tar instead to 'fix this issue'.

oblutak commented Sep 21, 2016

In case someone stumble across my previous comment... Error is/was
jruby-9.1.6.0-SNAPSHOT/bin/jruby -S gem install builder LoadError: no such file to load -- rubygems/resolver/molinillo/lib/molinillo/dependency_graph/log <...>
I used the default 'tar' that ships with Solaris which had issue with long paths, so directory content looked like
ls jruby-9.1.6.0-SNAPSHOT/lib/ruby/stdlib/rubygems/resolver/molinillo/lib/molinillo/dependency_graph ac ad de lo se ta ve
instead of
ls jruby-9.1.6.0-SNAPSHOT/lib/ruby/stdlib/rubygems/resolver/molinillo/lib/molinillo/dependency_graph action.rb add_vertex.rb log.rb tag.rb add_edge_no_circular.rb detach_vertex_named.rb set_payload.rb vertex.rb
just use gnu tar instead to 'fix this issue'.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Sep 21, 2016

Member

Interesting. Narrow that down and file it...we'll get Solaris 100% yet!

Member

headius commented Sep 21, 2016

Interesting. Narrow that down and file it...we'll get Solaris 100% yet!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment