Concurrent File#flock calls on the same file result in an Errno::EINVAL exception on Windows #5909
Consider the following code:
```ruby
require 'tempfile'

tmpfile = Tempfile.new('')

10.times do
  Thread.new do
    mode = File::WRONLY | File::APPEND | File::CREAT
    file = File.new(tmpfile.path, mode: mode)
    50.times do
      file.flock(File::LOCK_EX)
      file.puts Thread.current.object_id
      file.flock(File::LOCK_UN)
    end
  end
end
```
Each Thread should write its object_id to the shared tempfile 50 times, serialized by the lock.
This works as expected on both Linux (e.g. on Travis CI) and macOS with JRuby and MRI. It also works correctly with all MRI versions from 2.1 through trunk, even on Windows.
With the current versions of JRuby 9.0, 9.1, and 9.2 on Windows (e.g. on AppVeyor), the above code results in the following error in some of the concurrent threads when attempting the `flock` call:
Note: Since this appears to be a race condition, you might have to adjust the number of threads and the inner loop a bit to reliably reproduce this depending on your environment or try the code multiple times.
I'm guessing it's the EINVAL from here:
When on Windows, I believe we are not using a true native `flock` and instead fall back to JDK file locking.
In this case, assuming the EINVAL is from the above code, it's happening because the file appears to already be locked by this process and we attempt to lock it again. That could be a race in our code or in the JDK code for file locking. I'm poking around a bit now.
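If that guess is right, the failure mode can be sketched in plain Ruby. This is a hypothetical emulation for illustration only, not JRuby's actual implementation (the class and method names are invented): when the "already locked by this process" check and the lock acquisition are not performed atomically, two threads can interleave between them, and the loser surfaces as `Errno::EINVAL`.

```ruby
# Hypothetical sketch of a check-then-act race in an flock emulation.
# `@acquired` stands in for the cached JDK FileLock; all names are illustrative.
class FlockEmulation
  def initialize
    @acquired = nil            # would hold the JDK FileLock object
    @state_lock = Mutex.new    # guards the cached lock reference
  end

  # Racy version: the check and the acquisition are not atomic, so a
  # second thread can slip in between them and see stale state.
  def flock_racy
    raise Errno::EINVAL if @acquired   # "already locked by this process"
    # ...another thread can run here before @acquired is set...
    @acquired = :lock
  end

  # Fixed version: check and acquisition happen under one mutex,
  # so the cached state can never be observed mid-update.
  def flock_safe
    @state_lock.synchronize do
      raise Errno::EINVAL if @acquired
      @acquired = :lock
    end
  end

  def unlock
    @state_lock.synchronize { @acquired = nil }
  end
end
```

The sketch only illustrates why guarding the cached lock state with a single mutex removes the window in which a stale "locked" flag can be observed.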
Disabled native support on macOS and confirmed that it's caused by what I suspected:
Now the question is whether this is expected JDK behavior or if we are doing something wrong.
Well a modified version of your script that uses the JDK file-locking API directly does not appear to error:
```ruby
require 'tempfile'

tmpfile = Tempfile.new('')

Thread.abort_on_exception = true

20.times do
  Thread.new do
    mode = File::WRONLY | File::APPEND | File::CREAT
    file = File.new(tmpfile.path, mode: mode)
    500.times do
      lock = file.to_channel.lock(0, java.lang.Long::MAX_VALUE, false)
      file.puts Thread.current.object_id
      lock.release
    end
  end
end
```
This appears to be fixed, or at least the original problem is solved (a race dealing with the FileLock object when doing flock using JDK APIs).
I am seeing some other errors looping the same script, perhaps one out of a dozen runs with 20 threads and 500 writes produces a string of errors that seem to indicate the file is getting deleted prematurely. I see various combinations of these:
This one happens at the top-level of flock, and does not appear to be related to this bug or its fix. I'm not sure what's causing it, but it might be related to...
This bubbles out of flock and indicates that the file has been closed, for some reason.
I'll open a separate issue for this.
Aha, I think I figured out the problem: the script exits.
This logic spins up a number of threads with a number of files, and then does not keep the main script from exiting. Upon exit, we actively try to shut down any open IO, in order to make sure they have flushed their contents. This interferes with the still-running threads attempting to flock and puts against those IOs.
If I modify the script as follows, I never see any errors:
```ruby
require 'tempfile'

tmpfile = Tempfile.new('')

20.times.map do
  Thread.new do
    mode = File::WRONLY | File::APPEND
    file = File.new(tmpfile.path, mode: mode)
    500.times do
      file.flock(File::LOCK_EX)
      file.puts Thread.current.object_id
      file.flock(File::LOCK_UN)
    end
  end
end.each(&:join)
```
This has come up many times; Ruby threads are daemon threads by default, which means they will not keep the main script from exiting. As a result, if any threads are still running when the main script exits, we may tear down resources they need to run properly. Even though they're about to die, they may produce errors on the way out, as shown in jruby#5909 (comment) and other bug reports over the years like jruby#5519, jruby#3316, jruby#3313 and others. This is not a fix for the behavior, but it introduces a non-verbose warning if the JRuby runtime is torn down while there are still active threads.
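The join-before-exit pattern can be verified with a small plain-Ruby sketch (thread and iteration counts here are arbitrary): because every thread is joined before the main script exits, the runtime never tears down IO objects that are still in use, and the file ends up with exactly the expected number of lines.

```ruby
require 'tempfile'

tmpfile = Tempfile.new('')

threads = 5.times.map do
  Thread.new do
    file = File.new(tmpfile.path, mode: File::WRONLY | File::APPEND)
    file.sync = true                   # flush each line while the lock is held
    20.times do
      file.flock(File::LOCK_EX)
      file.puts Thread.current.object_id
      file.flock(File::LOCK_UN)
    end
    file.close
  end
end

# Joining keeps the main script alive until every daemon thread finishes,
# so runtime teardown never races with in-flight flock/puts calls.
threads.each(&:join)

puts File.readlines(tmpfile.path).size  # 5 threads * 20 writes = 100 lines
```

Without the `each(&:join)`, the main script can reach its end while writer threads are mid-loop, which is exactly the teardown race described above.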