Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent File#flock calls of the same file results in Errno::EINVAL exception on Windows #5909

Closed
meineerde opened this issue Oct 9, 2019 · 10 comments

Comments

@meineerde
Copy link

commented Oct 9, 2019

Environment

  • jruby 9.2.8.0 (2.5.3) 2019-08-12 a1ac7ff OpenJDK 64-Bit Server VM 25.222-b10 on 1.8.0_222-ojdkbuild-b10 +jit [mswin32-x86_64]
  • Operating System: Windows 10 1803

Expected Behavior

Consider the following code:

require 'tempfile'
tmpfile = Tempfile.new('')

10.times do
  Thread.new do
    mode = File::WRONLY | File::APPEND | File::CREAT
    file = File.new(tmpfile.path, mode: mode)

    50.times do
      file.flock(File::LOCK_EX)
      file.puts Thread.current.object_id
      file.flock(File::LOCK_UN)
    end
  end
end

Each Thread should write its object_id 50 times to the file while exclusively flocking it around each write. The flocks should not interfere with each other and should not result in an exception. Instead, they should serialize the writes of the different threads.

This works as expected on both Linux (e.g. on Travis CI) as well as macOS with JRuby and MRI. It also works correctly with all MRI versions between 2.1 - trunk, even on Windows.

Actual Behavior

With the current versions of JRuby 9.0, 9.1, and 9.2 on Windows (e.g. on AppVeyor), the above code results in the following error in some of the concurrent threads when attempting the file.flock(File::LOCK_EX):

Errno::EINVAL Invalid argument - C:/path/to/file
    flock at org/jruby/RubyFile.java:342

Note: Since this appears to be a race condition, you might have to adjust the number of threads and the inner loop a bit to reliably reproduce this depending on your environment or try the code multiple times.

@meineerde meineerde changed the title Concurrent File.flock calls of the same file results in Errno::EINVAL exception on Windows Concurrent File#flock calls of the same file results in Errno::EINVAL exception on Windows Oct 9, 2019
@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

I'm guessing it's the EINVAL from here:

} catch (OverlappingFileLockException ioe) {
setErrno(Errno.EINVAL);
errmsg = "overlapping file locks";
}

When on Windows, I believe we are not using a true native flock functions. Instead we are leaning on the JDK's own file locking logic, which has some differing semantics.

In this case, assuming the EINVAL is from the above code, it's happening because the file appears to already be locked by this process and we attempt to lock it again. That could be a race in our code or in the JDK code for file locking. I'm poking around a bit now

@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

If you could, provide a run with -Xbacktrace.style=full passed to JRuby. I don't currently have a handy Windows environment to test again.

@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

Disabled native support on MacOS and confirmed that it's caused by what I suspected:

java.nio.channels.OverlappingFileLockException
	at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
	at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
	at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1062)
	at org.jruby.util.io.PosixShim.lock(PosixShim.java:623)
	at org.jruby.util.io.PosixShim.flock(PosixShim.java:217)
	at org.jruby.util.io.OpenFile$4.run(OpenFile.java:2534)
	at org.jruby.util.io.OpenFile$4.run(OpenFile.java:2531)
	at org.jruby.RubyThread.executeTask(RubyThread.java:1580)
	at org.jruby.util.io.OpenFile.threadFlock(OpenFile.java:2531)
	at org.jruby.RubyFile.flock(RubyFile.java:323)
...

Now the question is whether this is expected JDK behavior or if we are doing something wrong.

@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

Well a modified version of your script that uses the JDK file-locking API directly does not appear to error:

require 'tempfile'
tmpfile = Tempfile.new('')

Thread.abort_on_exception = true

20.times do
  Thread.new do
    mode = File::WRONLY | File::APPEND | File::CREAT
    file = File.new(tmpfile.path, mode: mode)

    500.times do
      lock = file.to_channel.lock(0, java.lang.Long::MAX_VALUE, false)
      file.puts Thread.current.object_id
      lock.release
    end
  end
end
@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

Confirmed also on JDK 11.

@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

Ah, I think I see the problem. In our ChannelFD class, we have a single field for tracking the currently-aqcuired FileLock object. This obviously can be stepped on by multiple threads, and there's no simple way to avoid that.

I'll make it a thread-local value.

@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

Thread-local seems to have fixed it. 👍

@headius headius closed this in 3ff23d7 Oct 16, 2019
@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

This appears to be fixed, or at least the original problem is solved (a race dealing with the FileLock object when doing flock using JDK APIs).

I am seeing some other errors looping the same script, perhaps one out of a dozen runs with 20 threads and 500 writes produces a string of errors that seem to indicate the file is getting deleted prematurely. I see various combinations of these:

Errno::EINVAL: Invalid argument - /var/folders/cq/ylcgmnn556x33f5hsqd0h54h0000gn/T/20191015-1106131243-1q1pamf
       getStackTrace at java/lang/Thread.java:1606
    getBacktraceData at org/jruby/runtime/backtrace/TraceType.java:242
        getBacktrace at org/jruby/runtime/backtrace/TraceType.java:53
    captureBacktrace at org/jruby/RubyException.java:379
            preRaise at org/jruby/exceptions/RaiseException.java:113
            preRaise at org/jruby/exceptions/RaiseException.java:95
              <init> at org/jruby/exceptions/RaiseException.java:61
              <init> at org/jruby/exceptions/Exception.java:38
              <init> at org/jruby/exceptions/StandardError.java:38
              <init> at org/jruby/exceptions/SystemCallError.java:38
  constructThrowable at org/jruby/RubySystemCallError.java:133
         toThrowable at org/jruby/RubyException.java:357
                from at org/jruby/exceptions/RaiseException.java:71
   newRaiseException at org/jruby/Ruby.java:4285
     newErrnoFromInt at org/jruby/Ruby.java:3913
   newErrnoFromErrno at org/jruby/Ruby.java:3923
               flock at org/jruby/RubyFile.java:342

This one happens at the top-level of flock, and does not appear to be related to this bug or its fix. I'm not sure what's causing it, but it might be related to...

IOError: closed stream
       getStackTrace at java/lang/Thread.java:1606
    getBacktraceData at org/jruby/runtime/backtrace/TraceType.java:242
        getBacktrace at org/jruby/runtime/backtrace/TraceType.java:53
    captureBacktrace at org/jruby/RubyException.java:379
            preRaise at org/jruby/exceptions/RaiseException.java:113
            preRaise at org/jruby/exceptions/RaiseException.java:95
              <init> at org/jruby/exceptions/RaiseException.java:61
              <init> at org/jruby/exceptions/Exception.java:38
              <init> at org/jruby/exceptions/StandardError.java:38
              <init> at org/jruby/exceptions/IOError.java:38
  constructThrowable at org/jruby/RubyIOError.java:51
         toThrowable at org/jruby/RubyException.java:357
                from at org/jruby/exceptions/RaiseException.java:71
   newRaiseException at org/jruby/Ruby.java:4285
          newIOError at org/jruby/Ruby.java:4202
         checkClosed at org/jruby/util/io/OpenFile.java:683
  getOpenFileChecked at org/jruby/RubyIO.java:288
               flock at org/jruby/RubyFile.java:314

This bubbles out of flock and indicates that the file has been closed, for some reason.

I'll open a separate issue for this.

@headius

This comment has been minimized.

Copy link
Member

commented Oct 16, 2019

Aha I think I figured out the problem: the script exits.

This logic spins up a number of threads with a number of files, and then does not keep the main script from existing. Upon exit, we actively try to shut down any open IO, in order to make sure they have flushed their contents. This interferes with the still-running threads attempting to flock and puts against those IOs.

If I modify the script as follows, I never see any errors:

require 'tempfile'
tmpfile = Tempfile.new('')

20.times.map do
  Thread.new do
    mode = File::WRONLY | File::APPEND
    file = File.new(tmpfile.path, mode: mode)

    500.times do
      file.flock(File::LOCK_EX)
      file.puts Thread.current.object_id
      file.flock(File::LOCK_UN)
    end
  end
end.each(&:join)
headius added a commit to headius/jruby that referenced this issue Oct 16, 2019
This has come up many times; Ruby threads are daemon threads by
default, which means they will not keep the main script from
exiting. As a result, if there's any threads still runnign when
the main script exits, we may tear down resources they need to
run properly. Even though they're about to die, they may produce
errors on the way out as shown in jruby#5909 (comment)
and other bug reports over the years like jruby#5519, jruby#3316, jruby#3313 and
others.

This is not a fix for the behavior, but it introduces a
non-verbose warning if the JRuby runtime is torn down while
there's still active threads.
@headius headius added this to the JRuby 9.2.9.0 milestone Oct 16, 2019
@meineerde

This comment has been minimized.

Copy link
Author

commented Oct 16, 2019

Thanks for fixing this issue!

Sorry for the missing joins. I was originally running the test script under irb where this issue didn't occur.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.