Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File.flock does not work on Solaris for 9k #3343

Closed
oblutak opened this issue Sep 22, 2015 · 37 comments
Closed

File.flock does not work on Solaris for 9k #3343

oblutak opened this issue Sep 22, 2015 · 37 comments
Assignees
Milestone

Comments

@oblutak
Copy link

@oblutak oblutak commented Sep 22, 2015

Same issue reported for 3254. The code change fixed the 1.7 branch (1.7.22) but issue is still in 9k. I thought it would be in 9.0.1.0 since it was released after:

$ ./jruby --version
jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [SunOS-amd64]

$ ./jruby -e "File.open('/tmp/test', File::RDWR|File::CREAT) { |f| f.flock(File::LOCK_EX) }"
AsmRuntime.java:40:in `newUnsatisifiedLinkError': java.lang.UnsatisfiedLinkError: unknown
        from null:-1:in `flock'
        from BaseNativePOSIX.java:527:in `flock'
        from CheckedPOSIX.java:416:in `flock'
        from LazyPOSIX.java:412:in `flock'
        from PosixShim.java:179:in `flock'
        from OpenFile.java:2488:in `run'
        from OpenFile.java:2485:in `run'
        from RubyThread.java:1352:in `executeTask'
        from OpenFile.java:2485:in `threadFlock'
        from RubyFile.java:298:in `flock'
        from RubyFile$INVOKER$i$1$0$flock.gen:-1:in `call'
        from CachingCallSite.java:313:in `cacheAndCall'
        from CachingCallSite.java:163:in `call'
        from -e:-1:in `invokeOther1:flock'
        from -e:1:in `RUBY$block$-e$0'
        from CompiledIRBlockBody.java:70:in `commonYieldPath'
        from IRBlockBody.java:140:in `doYield'
        from BlockBody.java:77:in `yield'
        from Block.java:147:in `yield'
        from RubyIO.java:1133:in `ensureYieldClose'
        from RubyIO.java:1126:in `open'
        from RubyIO$INVOKER$s$0$0$open.gen:-1:in `call'
        from DynamicMethod.java:209:in `call'
        from CachingCallSite.java:343:in `cacheAndCall'
        from CachingCallSite.java:205:in `callBlock'
        from CachingCallSite.java:209:in `call'
        from -e:-1:in `invokeOther7:open'
        from -e:1:in `RUBY$script'
        from MethodHandle.java:625:in `invokeWithArguments'
        from Compiler.java:111:in `load'
        from Ruby.java:822:in `runScript'
        from Ruby.java:814:in `runScript'
        from Ruby.java:752:in `runNormally'
        from Ruby.java:574:in `runFromMain'
        from Main.java:409:in `doRunFromMain'
        from Main.java:304:in `internalRun'
        from Main.java:231:in `run'
        from Main.java:200:in `main'
@oblutak
Copy link
Author

@oblutak oblutak commented Oct 19, 2015

Hearing 9.0.2.0 coming out really soon. Any chance to squeak a fix for this issue? I just tried a snapshot from ci.jruby.com but problem is still present. It is a blocker for my to run 9k on Solaris, since I can't install gems
Last version I tried.
./jruby --version
jruby 9.0.2.0-SNAPSHOT (2.2.2) 2015-10-18 f37539a Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [SunOS-amd64]

@headius
Copy link
Member

@headius headius commented Oct 19, 2015

Looking at the code, I see the same fix for #3254 applied in JRuby master:

if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD && !Platform.IS_SOLARIS) {

The only way this check could pass is if we're not on Solaris, or if that check is not correctly detecting Solaris.

@headius headius added this to the JRuby 9.0.2.0 milestone Oct 19, 2015
@headius headius self-assigned this Oct 19, 2015
@oblutak
Copy link
Author

@oblutak oblutak commented Oct 19, 2015

ok... weird
./jruby -e 'import org.jruby.platform.Platform; puts Platform::OS'
sunos
./jruby -e 'import org.jruby.platform.Platform; puts Platform::IS_SOLARIS'
false

@oblutak
Copy link
Author

@oblutak oblutak commented Oct 19, 2015

I'm REALLY confused. 1.7.22 responds the same way but I don't have the issue on 1.7.22

./jruby --version
jruby 1.7.22 (1.9.3p551) 2015-08-20 c28f492 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16 +jit [SunOS-amd64]

./jruby -e "File.open('/tmp/test', File::RDWR|File::CREAT) { |f| f.flock(File::LOCK_EX) }"
io/console not supported; tty will not be manipulated

./jruby -e 'import org.jruby.platform.Platform; puts Platform::IS_SOLARIS'
io/console not supported; tty will not be manipulated
false

@oblutak
Copy link
Author

@oblutak oblutak commented Oct 19, 2015

on irc, was asked for uname -a output
SunOS ltetools 5.10 Generic_144489-12 i86pc i386 i86pc

@headius
Copy link
Member

@headius headius commented Oct 19, 2015

I believe 1.7 did not have native flock logic, so it wouldn't hit the bug. It seems our IS_SOLARIS is faulty somehow. Will look into that.

@headius
Copy link
Member

@headius headius commented Oct 19, 2015

@oblutak Can you provide the output of the following code?

puts ENV_JAVA["os.name"]
puts ENV_JAVA["os.arch"]

The fact that io/console is giving you a warning makes me think we're not handling native code loading right on Solaris either.

@headius
Copy link
Member

@headius headius commented Oct 19, 2015

Ok, I have a fix for the IS_SOLARIS problem, but there's a larger issue.

In JRuby 9k, we tend to prefer real native channels even for opening files. Unfortunately that means we can't use the pure-Java fallback logic here anyway. We'll either need to not use native file channels on Solaris or, more likely, emulate flock with fcntl the same way MRI does.

@oblutak
Copy link
Author

@oblutak oblutak commented Oct 20, 2015

Here is the requested info
./jruby -e 'puts ENV_JAVA["os.name"], ENV_JAVA["os.arch"]'
SunOS
amd64

On your last comment about flock emulation, is that something that would be done in jnr-posix? Unfortunately, I'm not a coder so can't help that way, but if there is anything I could help testing with, please let me know.

@headius
Copy link
Member

@headius headius commented Oct 20, 2015

@oblutak Yes, I'm working on the flock emulation in jnr-posix. I've hit some snags, though, so this won't make it into the next 9k release. It should be in the one after that.

@headius headius modified the milestones: JRuby 9.0.4.0, JRuby 9.0.2.0 Oct 20, 2015
@oblutak
Copy link
Author

@oblutak oblutak commented Oct 20, 2015

Understood. Thanks for the update. Would you rather I open a different issue for the submission of the IS_SOLARIS fix, or will you just submit both fixes under this issue?

@headius
Copy link
Member

@headius headius commented Oct 20, 2015

@oblutak I'll do both fixes as part of this issue.

@headius headius modified the milestones: JRuby 9.0.5.0, JRuby 9.0.4.0 Oct 22, 2015
@headius
Copy link
Member

@headius headius commented Oct 22, 2015

Bumping to 9.0.5.0 because 9.0.4 will be another quick flip to fix some regressions.

@headius
Copy link
Member

@headius headius commented Jan 19, 2016

I never got back to this, but I did at least start to attempt a flock impl for Solaris based on fcntl. Unfortunately never completed and had to revert when I accidentally pushed it: jnr/jnr-posix@32b9c84

@headius
Copy link
Member

@headius headius commented Jan 21, 2016

Bumping to 9.1 so we can get 9.0.5 out.

@headius headius modified the milestones: JRuby 9.1.0.0, JRuby 9.0.5.0 Jan 21, 2016
@headius
Copy link
Member

@headius headius commented Mar 3, 2016

@oblutak For 9.1 the current code basically just falls back on the old JDK logic when we're on Solaris. Can you test out a 9.1 snapshot build and let me know if it's working ok for you? http://ci.jruby.org

@headius
Copy link
Member

@headius headius commented Mar 3, 2016

I believe I found the bug in my patch and will attempt to apply it now. Downloading a Solaris vbox VM to test against.

@headius
Copy link
Member

@headius headius commented Mar 3, 2016

Pushed my tweaks for the patch in jnr/jnr-posix@f34106c (patch was applied in previous commit).

@headius
Copy link
Member

@headius headius commented Mar 3, 2016

Ok, I was unable to get the flock test to pass on Solaris and I have another task I need to work on.

@oblutak If you're still out there, you could pull master from jnr-posix and try to get the flock test passing on Solaris. I'm guessing the struct layout is not quite right, but I'm not certain.

@headius
Copy link
Member

@headius headius commented Mar 3, 2016

Patch to apply when flock is working in jnr-posix:

diff --git a/core/src/main/java/org/jruby/util/io/PosixShim.java b/core/src/main/java/org/jruby/util/io/PosixShim.java
index 1c122dc..7a4ff67 100644
--- a/core/src/main/java/org/jruby/util/io/PosixShim.java
+++ b/core/src/main/java/org/jruby/util/io/PosixShim.java
@@ -187,7 +187,7 @@ public class PosixShim {

         int real_fd = fd.realFileno;

-        if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD && !Platform.IS_SOLARIS) {
+        if (real_fd != -1 && real_fd < FilenoUtil.FIRST_FAKE_FD) {
             // we have a real fd and not on Solaris...try native flocking
             // see jruby/jruby#3254 and jnr/jnr-posix#60
             int result = posix.flock(real_fd, lockMode);
@oblutak
Copy link
Author

@oblutak oblutak commented May 17, 2016

My apologies, I've been away from all (j)ruby activities since xmas. As indicated before, I'm no designer/coder so I quickly fall out of my element in too technical discussion.
I would love to help, but I think I would use more of your time. I don't even know how to use maven and get to running the jnr-posix tests. I haven't given up yet so if you have any good resources for me, please send them my way

I also tried to read/search re: fcntl and flock. The behavior of the C code above just didn't make sense to me. Comments in stackoverflow discussion I think give a hint.

  • simply: fnctl locks work as a Process <--> File relationship, ignoring filedescriptors
  • flock() places it's locks via a FD on a "Open file description"
    So I think the unit test of locking the same file/region within the same process will never 'fail' for Solaris/fcntl.
    Using C code provided above and compiling to versions with 'sleeps' in it, I think shows that other process trying to get exclusive lock to same file region behaves as expected.
    So I think the unit test assumption is wrong but don't know how to write a different test to show 'behavior' between two different processes. Is that 'adequate' locking for the needs here? There is certainly difference and have no idea how you would make fcntl equivalent to flock.
@mcarpenter
Copy link

@mcarpenter mcarpenter commented May 18, 2016

I'm no designer/coder

You are so busted :) This was a really good comment, I think you're right on the money. I've since been looking at the Linux fcntl(2) man page and a couple of things jump out at me:

A single process can hold only one type of lock on a file region; if a new lock
is  applied  to  an already-locked region, then the existing lock is converted
to the new lock type.

and:

F_SETLK (struct flock *)
          Acquire a lock [...]  If a  conflicting lock is held *by another process*,
this call returns -1

(That's with *my emphasis*). Opengroup has similar wording "An exclusive lock shall prevent any other process from setting a shared lock or an exclusive lock" (again my emphasis).

Given the unit test exists (and presumably doesn't fail) for systems with native flock() I guess that's what we should be aiming for if we wanted absolutely correct (equivalent) behavior. Linux flock() man page explains the unit test:

If a process uses open(2) (or similar) to obtain more than one descriptor for
the same file, these descriptors are treated independently by flock(). An
attempt to lock the file using one of these file descriptors may be denied by
a lock that the calling process has already placed via another descriptor.

On the other hand:

duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer
to the same lock

I agree this seems hard to emulate with plain fcntl() and I have no good ideas on that for the moment. So now where?

@headius
Copy link
Member

@headius headius commented May 18, 2016

MRI supports some of these platforms that don't have flock...we really just need to do what they're doing.

I'd thought we might just be able to use pure-Java flocking, but then we'd need to make sure we're not using any native descriptors...which kills a bunch of more valuable features.

@headius
Copy link
Member

@headius headius commented May 18, 2016

See also ruby/spec#255.

@oblutak
Copy link
Author

@oblutak oblutak commented May 19, 2016

If I understand the flock_spec from ruby/spec

 it "returns false if trying to lock an exclusively locked file" do
    @file.flock File::LOCK_EX

    ruby_exe(<<-END_OF_CODE, escape: true).should == "false"
      File.open('#{@name}', "w") do |f2|
        print f2.flock(File::LOCK_EX | File::LOCK_NB).to_s
      end
    END_OF_CODE
  end

The test is checking that a file with an exclusive lock appears locked to another process.
So I think the 'flock-fcntl bridge' implemented would satisfy (j)ruby's needs.

As for making jnr/posix tests passed for a flock-fcntl bridge, I don't know if that is possible. All existing flock test are perfectly valid for flock. Maybe fcntl locking needs to exists separately with its own set of unit tests (in jnr/posix) and at 'higher' layer (jruby?) need to make the distinction of using/implement a flock-fcntl bridge when a ruby 'flock' is called. (sorry, I said before I'm no coder so don't know the correct terminology but hopefully you understand what I am trying to say)

@headius
Copy link
Member

@headius headius commented May 20, 2016

@oblutak Actually that makes a lot of sense. jnr-posix should just implement and test both flock and fcntl, and if flock is not available it doesn't run the tests, like any other non-universal feature. We'd mimic MRI's logic for using fcntl as flock from the JRuby side.

So I think next steps would be to ensure that fcntl locking works with jnr-posix's fcntl today and write some jnr-posix tests for it. Then we look at how MRI emulates flock with fcntl, implement that in JRuby, and we're good?

@oblutak
Copy link
Author

@oblutak oblutak commented May 20, 2016

I think the flock (using fnctl) as implemented in jnr-posix master in SolarisPOSIX.java replicates what is done in MRI.

Now for fcntl locking unit tests for jnr-posix, I wish I could help but I have no idea how. I don't know how you can 'launch' another process, required to try to get a lock on the same file. Maybe we just do the positive paths test for now just to validate that fcntl calls return successfully and rely on the flock_spec test at ruby/jruby to confirm 'flock' at (j)ruby layer behave as expected.

If you think there is a way I could help, let me know how, since I don't know java at all and never used tools like maven. I am trying a bit on my own. I got a copy of jnr-posix, got jdk-1.8u92 and maven 3.3.9 installed. Not sure what mvn commands to use. Doing mvn verify gave me some test failure as I would expect but I also get a core dump! I think asking to your help to just figuring out these basics steps would just impede and use your time needlessly. But I will look at the final code submitted and hope to learn from it and maybe one day I will be more helpful.

@enebo enebo modified the milestones: JRuby 9.1.2.0, JRuby 9.1.3.0 May 23, 2016
@headius
Copy link
Member

@headius headius commented Aug 17, 2016

@oblutak Can you recommend a good free Solaris (or friends) that I could use to verify the fix on a VirtualBox VM?

@headius
Copy link
Member

@headius headius commented Aug 17, 2016

@oblutak RE: testing...I'm not sure either. Is there a standard way we can attempt to lock a file without introducing a lot of dependencies? If not, then I think the next best option would be to simply have the test launch another JVM with some prepared code that tries to lock and prints its success.

If the jnr-posix tests fail (and especially if they crash) we should get that fixed.

@headius
Copy link
Member

@headius headius commented Aug 17, 2016

I think this is turning out to be a bit more work than we can spin in the next week, so we'll bump this to 9.1.4.0 and work with @oblutak to get jnr-posix up to scratch on Solaris.

@headius headius modified the milestones: JRuby 9.1.4.0, JRuby 9.1.3.0 Aug 17, 2016
@oblutak
Copy link
Author

@oblutak oblutak commented Aug 23, 2016

Hi @headius, Sorry for the delay. I've spent vacations totally 'unplugged'. As for 'free Solaris and friends' I've only used Solaris on real hardware. On another issue reported, someone commented similar issue on SmartOS. There wiki has a section 'SmartOS as a Sandboxed VirtualBox Guest', but I don't have any personal experience to share.

@headius
Copy link
Member

@headius headius commented Sep 20, 2016

I have just pushed f8da225 for #4162, which implements flock like MRI using Ruby FFI. This should now be fixed.

Please test a build from http://ci.jruby.org once there's an updated snapshot.

@headius headius closed this Sep 20, 2016
@oblutak
Copy link
Author

@oblutak oblutak commented Sep 21, 2016

Looks promising!
$ jruby --version jruby 9.1.6.0-SNAPSHOT (2.3.1) 2016-09-21 01030ab Java HotSpot(TM) 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [solaris-x86_64] $ jruby -e "File.open('/tmp/test', File::RDWR|File::CREAT) { |f| f.flock(File::LOCK_EX) }"
still unable to install gems, but looks like a totally different issue. I'll try to investigate a bit more and track under different issue, if required. Thanks for the fix!

@oblutak
Copy link
Author

@oblutak oblutak commented Sep 21, 2016

In case someone stumble across my previous comment... Error is/was
jruby-9.1.6.0-SNAPSHOT/bin/jruby -S gem install builder LoadError: no such file to load -- rubygems/resolver/molinillo/lib/molinillo/dependency_graph/log <...>
I used the default 'tar' that ships with Solaris which had issue with long paths, so directory content looked like
ls jruby-9.1.6.0-SNAPSHOT/lib/ruby/stdlib/rubygems/resolver/molinillo/lib/molinillo/dependency_graph ac ad de lo se ta ve
instead of
ls jruby-9.1.6.0-SNAPSHOT/lib/ruby/stdlib/rubygems/resolver/molinillo/lib/molinillo/dependency_graph action.rb add_vertex.rb log.rb tag.rb add_edge_no_circular.rb detach_vertex_named.rb set_payload.rb vertex.rb
just use gnu tar instead to 'fix this issue'.

@headius
Copy link
Member

@headius headius commented Sep 21, 2016

Interesting. Narrow that down and file it...we'll get Solaris 100% yet!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.