Work around read/close race (x2) #26714
IO#close and IO#read across threads don't get along so well: After T1 enters #read and releases the GVL, T2 can call #close on the IO, thereby both closing the fd and freeing the buffer while T1 is using them.
Work around read/close race (x2) (partial backport only)
Partially backported as e6b435d (websocket-client-simple is only used on master)
* io.c (fptr_finalize): defer freeing buffers not to invalidate the buffer which is in use to read, in the case of read/close race. rails/rails#26714
* thread.c (rb_thread_fd_close): return TRUE if the fd is in use.
I can't reproduce it with simple code.
@nobu I'm not sure. I've run it for a while, and haven't seen the previous segfault. But I have seen some errors like
Before, I got a segfault every ~20 runs. With this PR, I ran 1500 times without any problems. Without this PR and with ruby/ruby@b8f16cc, I've seen two (different) missing constant errors and one test failure, in just 300 runs.
If it helps, here's a recipe to reproduce the original problem:
On Linux (I'm using Ubuntu 16.04 on AWS; 4.4.0-38-generic x86_64) -- I could not reproduce on Mac OS X.
I built ruby/ruby@54b8015 (tip of
Clone 39fb306
For me, that will segfault and stop at about loop 20-30.
I've also tried to get a simple script to reproduce it. This crashes in the same area, but it has a different error:
Edit: This still crashes with ruby/ruby@54b8015 + a cherry-pick of ruby/ruby@b8f16cc -- gist showing both a read into freed memory, and a read into 0x0
100.times do
  60.times.map do |i|
    Thread.new do
      x = i.to_s
      s1, s2 = IO.pipe
      # Writer: keeps the pipe supplied with data.
      t2 = Thread.new { 100.times { s2 << (x * 1000) } }
      # Reader: spends most of its time inside a read on s1.
      t = Thread.new { loop { s1.getc } }
      # Close s1 from another thread while the reader is still reading.
      Thread.new { sleep 0.1; s1.close }.join
      # sleep 0.1; s1.close
      begin
        t.join
      rescue IOError
      end
      print "."
    end
  end.map(&:join)
  print "\n"
end
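For contrast, here is a minimal sketch of a variant that should not hit the race, because the IO is only closed after the reading thread has left its read. This is an illustration, not the change made in this PR: `drain_until_stopped` and the `stop` flag are hypothetical names, and the 0.05s poll interval is an arbitrary choice.

```ruby
# Hypothetical helper (not from the PR): read from `io` until asked to stop.
# It never blocks indefinitely inside a read, so it can notice the stop flag.
def drain_until_stopped(io, stop)
  bytes = 0
  until stop[:requested]
    # Wait briefly for readability so the stop flag is re-checked regularly.
    next unless IO.select([io], nil, nil, 0.05)
    chunk = io.read_nonblock(4096, exception: false)
    break if chunk.nil?               # EOF: the writer closed its end
    next if chunk == :wait_readable   # spurious wakeup, try again
    bytes += chunk.bytesize
  end
  bytes
end

reader_io, writer_io = IO.pipe
stop = { requested: false }
reader = Thread.new { drain_until_stopped(reader_io, stop) }

writer_io << ("x" * 1000)
sleep 0.1

# Ask the reader to stop, wait for it, and only then close the IO.
stop[:requested] = true
total = reader.value
reader_io.close
writer_io.close
```

The trade-off is latency: close is delayed by up to one poll interval, in exchange for never closing the IO underneath an in-progress read.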
Also, I can't reproduce it now
Thank you, I've updated the branch.
It caused unexpected
It is very very not-good if one thread calls close on an IO at the same time that another calls read.
The segfaults we've recently seen on Travis appear to be attributable to websocket-client-simple, but our real close method seems subject to the race too.
Branch is PRed here: shokai/websocket-client-simple#25
valgrind --tool=memcheck --max-stackframe=8382448 --track-origins=yes ruby …
output: https://gist.github.com/matthewd/0d584378aa3c083fa3581c72f15e292a
Huge thanks to @tenderlove, who spent a lot of time on this with me -- and without whom I'd still just be staring blankly at Valgrind documentation 🍻
I suspect the upstream fix is to wait inside rb_thread_fd_close until there are no threads where th->waiting_fd == fd. But I'll defer to @ko1 and @nobu on that 😅
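To make the "wait until nobody is still reading" idea concrete, here is a rough Ruby-level model of it. It is only an analogy under stated assumptions: `GuardedIO` is a hypothetical wrapper, the reader count stands in for "threads where th->waiting_fd == fd", and none of this reflects how rb_thread_fd_close is actually written in C.

```ruby
# Hypothetical wrapper: close waits until no thread is inside a read.
class GuardedIO
  def initialize(io)
    @io = io
    @lock = Mutex.new
    @idle = ConditionVariable.new
    @readers = 0        # number of threads currently inside read_chunk
    @closing = false
  end

  # Returns a chunk, or nil on EOF or once a close has been requested.
  def read_chunk(maxlen = 4096)
    @lock.synchronize do
      return nil if @closing
      @readers += 1
    end
    begin
      loop do
        return nil if @closing
        # Poll with a short timeout so a pending close is noticed promptly.
        next unless IO.select([@io], nil, nil, 0.05)
        chunk = @io.read_nonblock(maxlen, exception: false)
        return chunk unless chunk == :wait_readable
      end
    ensure
      @lock.synchronize do
        @readers -= 1
        @idle.broadcast if @readers.zero?
      end
    end
  end

  # Marks the IO as closing, waits for in-flight readers, then closes it.
  def close
    @lock.synchronize do
      @closing = true
      @idle.wait(@lock) until @readers.zero?
      @io.close unless @io.closed?
    end
  end
end

r, w = IO.pipe
guarded = GuardedIO.new(r)
reader = Thread.new { nil while guarded.read_chunk }

w << "hello"
sleep 0.1
guarded.close   # returns only after the reader has left read_chunk
reader.join
w.close
```

The point of the model is the ordering: the underlying close happens strictly after the last reader has stopped touching the IO, which is the property the suggested upstream fix would provide.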