Segfault in Celluloid::IO specs due to thread safety issue in nio4r #8

Closed
dbussink opened this Issue Oct 22, 2012 · 1 comment

Comments

Projects
None yet
2 participants
@dbussink
Contributor

dbussink commented Oct 22, 2012

The Celluloid::IO specs sometimes crash on Rubinius, which is due to a thread safety issue in nio4r. The problem is that a NIO_Selector is being shut down while it's waiting in epoll_wait concurrently. This causes undefined behaviour because data structures are still used after cleanup. This often manifests as corruption in Rubinius which subsequently crashes in different ways.

Rubinius doesn't enable the C-API lock anymore by default, but even with the C-API lock this is still a problem since the epoll_wait() happens in a rb_thread_block_region.

Here is a gdb log of the problem, showing the different threads in the state that is problematic here:

(gdb) info threads
  Id   Target Id         Frame 
  13   Thread 0x7fffee2f9700 (LWP 18226) "rbx" 0x00007ffff6aa9a43 in epoll_wait () from /usr/lib/libc.so.6
  8    Thread 0x7fffee90a700 (LWP 18221) "rbx" 0x00007ffff7284954 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  7    Thread 0x7fffeed0b700 (LWP 18220) "rbx" 0x00007ffff7284954 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  6    Thread 0x7fffef5ce700 (LWP 18219) "rbx" 0x00007ffff7284954 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  5    Thread 0x7ffff57f4700 (LWP 18218) "rbx" 0x00007ffff6aa3073 in select () from /usr/lib/libc.so.6
* 1    Thread 0x7ffff7fed740 (LWP 18209) "rbx" NIO_Selector_shutdown (selector=0x2a24bb0) at selector.c:124
(gdb)

As you can see here, thread 13 is in epoll_wait() but thread 1 is cleaning up the selector. This is the (partial) C++ and the Ruby backtrace for thread 1:

(gdb) bt 
#0  NIO_Selector_shutdown (selector=0x2a2acd0) at selector.c:124
#1  0x00007fffee2ff1bf in NIO_Selector_close (self=33386032) at selector.c:405
#2  0x000000000083f7ed in rubinius::ZeroArguments::invoke (state=0x17ef8a0, nm=0x7ffff694a048, env=0x1846160, args=...) at vm/builtin/nativemethod.cpp:226
#3  0x000000000084276b in rubinius::NativeMethod::executor_implementation (state=0x17ef8a0, call_frame=0x7ffffffde550, exec=0x7ffff694a048, mod=0x283afe0, args=...) at vm/builtin/nativemethod.cpp:706
#4  0x00000000006a0bc7 in rubinius::Dispatch::send (this=0x7ffffffdd440, state=0x17ef8a0, call_frame=0x7ffffffde550, lookup=..., args=..., reason=rubinius::eNormal) at vm/dispatch.cpp:42
#5  0x0000000000846e33 in rubinius::Object::send_prim (this=0x7ffff66390f8, state=0x17ef8a0, call_frame=0x7ffffffde550, exec=0x7ffff5ebbca0, mod=0x18333b0, args=..., min_visibility=0xac6) at vm/builtin/object.cpp:562
#6  0x0000000000846ea1 in rubinius::Object::private_send_prim (this=0x7ffff66390f8, state=0x17ef8a0, call_frame=0x7ffffffde550, exec=0x7ffff5ebbca0, mod=0x18333b0, args=...) at vm/builtin/object.cpp:566
#7  0x0000000000764cb4 in rubinius::Primitives::object_send (state=0x17ef8a0, call_frame=0x7ffffffde550, exec=0x7ffff5ebbca0, mod=0x18333b0, args=...) at vm/gen/method_primitives.cpp:10876
#8  0x00000000006ba536 in rubinius::InlineCache::check_cache_reference (state=0x17ef8a0, cache=0x2a1aee8, call_frame=0x7ffffffde550, args=...) at vm/inline_cache.cpp:592
#9  0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x2a1aee8, state=0x17ef8a0, call_frame=0x7ffffffde550, args=...) at vm/inline_cache.hpp:197
#10 0x00000000006c1f1d in rubinius::MachineCode::interpreter (state=0x17ef8a0, mcode=0x2a1a740, call_frame=0x7ffffffde550) at ./vm/gen/instruction_implementations.hpp:571
#11 0x00000000007375f7 in rubinius::MachineCode::execute_specialized (state=0x17ef8a0, previous=0x7ffffffdf730, exec=0x7ffff68dafb8, mod=0x282ea50, args=...) at vm/machine_code.cpp:630
#12 0x00000000006ba536 in rubinius::InlineCache::check_cache_reference (state=0x17ef8a0, cache=0x2a073a0, call_frame=0x7ffffffdf730, args=...) at vm/inline_cache.cpp:592
#13 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x2a073a0, state=0x17ef8a0, call_frame=0x7ffffffdf730, args=...) at vm/inline_cache.hpp:197
#14 0x00000000006c1b35 in rubinius::MachineCode::interpreter (state=0x17ef8a0, mcode=0x2a071d0, call_frame=0x7ffffffdf730) at ./vm/gen/instruction_implementations.hpp:524
#15 0x0000000000737170 in rubinius::MachineCode::execute_specialized (state=0x17ef8a0, previous=0x7ffffffe08c0, exec=0x7ffff68b4688, mod=0x28151d0, args=...) at vm/machine_code.cpp:630
#16 0x00000000006ba536 in rubinius::InlineCache::check_cache_reference (state=0x17ef8a0, cache=0x2a19d38, call_frame=0x7ffffffe08c0, args=...) at vm/inline_cache.cpp:592
#17 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x2a19d38, state=0x17ef8a0, call_frame=0x7ffffffe08c0, args=...) at vm/inline_cache.hpp:197
#18 0x00000000006c1b35 in rubinius::MachineCode::interpreter (state=0x17ef8a0, mcode=0x2a19760, call_frame=0x7ffffffe08c0) at ./vm/gen/instruction_implementations.hpp:524
#19 0x0000000000737a7e in rubinius::MachineCode::execute_specialized (state=0x17ef8a0, previous=0x7ffffffe1aa0, exec=0x7ffff4e31fc0, mod=0x284fd70, args=...) at vm/machine_code.cpp:630
#20 0x00000000006ba536 in rubinius::InlineCache::check_cache_reference (state=0x17ef8a0, cache=0x29d7968, call_frame=0x7ffffffe1aa0, args=...) at vm/inline_cache.cpp:592
#21 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x29d7968, state=0x17ef8a0, call_frame=0x7ffffffe1aa0, args=...) at vm/inline_cache.hpp:197
#22 0x00000000006c1b35 in rubinius::MachineCode::interpreter (state=0x17ef8a0, mcode=0x29d7110, call_frame=0x7ffffffe1aa0) at ./vm/gen/instruction_implementations.hpp:524
#23 0x00000000007f0386 in rubinius::BlockEnvironment::execute_interpreter (state=0x17ef8a0, previous=0x7ffffffe2e10, env=0x7ffff6945120, args=..., invocation=...) at vm/builtin/block_environment.cpp:340
#24 0x00000000007efe7a in rubinius::BlockEnvironment::invoke (state=0x17ef8a0, previous=0x7ffffffe2e10, env=0x7ffff6945120, args=..., invocation=...) at vm/builtin/block_environment.cpp:82
#25 0x00000000007f06ef in rubinius::BlockEnvironment::call_under (this=0x7ffff6945120, state=0x17ef8a0, call_frame=0x7ffffffe2e10, exec=0x7ffff5ebfed8, mod=0x18448e0, args=...) at vm/builtin/block_environment.cpp:396
#26 0x0000000000744816 in rubinius::Primitives::block_call_under (state=0x17ef8a0, call_frame=0x7ffffffe2e10, exec=0x7ffff5ebfed8, mod=0x18448e0, args=...) at vm/gen/method_primitives.cpp:2205
#27 0x00000000006ba536 in rubinius::InlineCache::check_cache_reference (state=0x17ef8a0, cache=0x1c56550, call_frame=0x7ffffffe2e10, args=...) at vm/inline_cache.cpp:592
#28 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x1c56550, state=0x17ef8a0, call_frame=0x7ffffffe2e10, args=...) at vm/inline_cache.hpp:197
#29 0x00000000006c1b35 in rubinius::MachineCode::interpreter (state=0x17ef8a0, mcode=0x1c551b0, call_frame=0x7ffffffe2e10) at ./vm/gen/instruction_implementations.hpp:524
#30 0x0000000000736ce9 in rubinius::MachineCode::execute_specialized (state=0x17ef8a0, previous=0x7ffffffe4130, exec=0x7ffff5fe6288, mod=0x18333b0, args=...) at vm/machine_code.cpp:630
#31 0x00000000006b9b68 in rubinius::InlineCache::empty_cache (state=0x17ef8a0, cache=0x29fab00, call_frame=0x7ffffffe4130, args=...) at vm/inline_cache.cpp:390
#32 0x00000000006bb9f4 in rubinius::InlineCache::initialize (this=0x29fab00, state=0x17ef8a0, call_frame=0x7ffffffe4130, args=...) at vm/inline_cache.hpp:201
#33 0x00000000006baaee in rubinius::InlineCache::check_cache_poly (state=0x17ef8a0, cache=0x29fab00, call_frame=0x7ffffffe4130, args=...) at vm/inline_cache.cpp:721
#34 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x29fab00, state=0x17ef8a0, call_frame=0x7ffffffe4130, args=...) at vm/inline_cache.hpp:197
0x7ffffffde550: Celluloid::IO::Reactor#shutdown in /home/dirkjan/rubinius/lib/19/forwardable.rb:182 (+34)
0x7ffffffdf730: Celluloid::IO::Mailbox#shutdown in /home/dirkjan/celluloid-io/lib/celluloid/io/mailbox.rb:75 (+5)
0x7ffffffe08c0: Celluloid::Actor.kill in /home/dirkjan/rubinius/gems/1.9/bundler/gems/celluloid-092c27e7676a/lib/celluloid/actor.rb:152 (+24)
0x7ffffffe1aa0: __block__ in /home/dirkjan/rubinius/gems/1.9/bundler/gems/celluloid-092c27e7676a/spec/support/actor_examples.rb:13 (+52)
0x7ffffffe2e10: BasicObject#instance_eval in kernel/common/eval19.rb:45 (+118)
0x7ffffffe4130: __block__ in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example.rb:113 (+36)
0x7ffffffe53a0: RSpec::Core::Example#with_around_each_hooks in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example.rb:253 (+21)
0x7ffffffe6590: RSpec::Core::Example#run in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example.rb:0 (+53)
0x7ffffffe7870: __block__ in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example_group.rb:378 (+41)
0x7ffffffe8ae0: Array#map in kernel/bootstrap/array19.rb:18 (+100)
0x7ffffffe9db0: RSpec::Core::ExampleGroup.run_examples in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example_group.rb:374 (+11)
0x7ffffffeb010: RSpec::Core::ExampleGroup.run in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example_group.rb:360 (+68)
0x7ffffffec2d0: __block__ in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example_group.rb:361 (+8)
0x7ffffffed520: Array#map in kernel/bootstrap/array19.rb:18 (+100)
0x7ffffffee7e0: RSpec::Core::ExampleGroup.run in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example_group.rb:361 (+82)
0x7ffffffefa10: __block__ in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example_group.rb:361 (+8)
0x7fffffff0c60: Array#map in kernel/bootstrap/array19.rb:18 (+100)
0x7fffffff1e90: RSpec::Core::ExampleGroup.run in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/example_group.rb:361 (+82)
0x7fffffff31a0: __block__ in /home/dirkjan/rubinius/gems/1.9/gems/rspec-core-2.11.1/lib/rspec/core/command_line.rb:28 (+8)
0x7fffffff43f0: Array#map in kernel/bootstrap/array19.rb:18 (+100)

For thread 13 these are the backtraces. You can also see that this is exactly the same selector object:

(gdb) thread 13
[Switching to thread 13 (Thread 0x7fffee2f9700 (LWP 18268))]
(gdb) frame 7
#7  NIO_Selector_select_synchronized (args=33385360) at selector.c:298
298     ready = NIO_Selector_run(selector, timeout);
(gdb) p selector
$1 = (struct NIO_Selector *) 0x2a2acd0

C++ backtrace:

#0  0x00007ffff6aa9a43 in epoll_wait () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff6aa9a43 in epoll_wait () from /usr/lib/libc.so.6
#1  0x00007fffee300802 in epoll_poll (loop=0x2a2ad60, timeout=) at ../libev/ev_epoll.c:153
#2  0x00007fffee2ffda2 in ev_backend_poll (ptr=) at ../libev/ev.c:2945
#3  0x00000000009357f3 in rb_thread_blocking_region (func=0x7fffee2ffd90 , data=0x7fffee2efd30, ubf=0xffffffffffffffff, ubf_data=0x0) at vm/capi/thread.cpp:135
#4  0x00007fffee302a74 in ev_run (loop=0x2a2ad60, flags=flags@entry=2) at ../libev/ev.c:3120
#5  0x00007fffee2fecd5 in ev_loop (loop=, flags=2) at ../libev/ev.h:820
#6  NIO_Selector_run (timeout=, selector=0x2a2acd0) at selector.c:338
#7  NIO_Selector_select_synchronized (args=33385360) at selector.c:298
#8  0x0000000000920bf2 in rb_ensure (func=0x7fffee2fec40 , arg1=33385360, ensure_func=0x7fffee2fe9e0 , arg2=33386032) at vm/capi/kernel.cpp:141
#9  0x00007fffee2ff2e2 in NIO_Selector_select (argc=1, argv=0x7fffee2eff60, self=33386032) at selector.c:280
#10 0x000000000083fc92 in rubinius::GenericArguments::invoke (state=0x7fffee2f8e40, nm=0x7ffff694a138, env=0x7fffd8005310, args=...) at vm/builtin/nativemethod.cpp:302
#11 0x0000000000843f4b in rubinius::NativeMethod::executor_implementation (state=0x7fffee2f8e40, call_frame=0x7fffee2f1790, exec=0x7ffff694a138, mod=0x283afe0, args=...) at vm/builtin/nativemethod.cpp:706
#12 0x00000000006ba536 in rubinius::InlineCache::check_cache_reference (state=0x7fffee2f8e40, cache=0x7fffd8002ab0, call_frame=0x7fffee2f1790, args=...) at vm/inline_cache.cpp:592
#13 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x7fffd8002ab0, state=0x7fffee2f8e40, call_frame=0x7fffee2f1790, args=...) at vm/inline_cache.hpp:197
#14 0x00000000006c1cdd in rubinius::MachineCode::interpreter (state=0x7fffee2f8e40, mcode=0x7fffd8002860, call_frame=0x7fffee2f1790) at ./vm/gen/instruction_implementations.hpp:542
#15 0x0000000000736ce9 in rubinius::MachineCode::execute_specialized (state=0x7fffee2f8e40, previous=0x7fffee2f2950, exec=0x7ffff68b4000, mod=0x282ea50, args=...) at vm/machine_code.cpp:630
#16 0x00000000006ba536 in rubinius::InlineCache::check_cache_reference (state=0x7fffee2f8e40, cache=0x7fffd8001888, call_frame=0x7fffee2f2950, args=...) at vm/inline_cache.cpp:592
#17 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x7fffd8001888, state=0x7fffee2f8e40, call_frame=0x7fffee2f2950, args=...) at vm/inline_cache.hpp:197
#18 0x00000000006c1b35 in rubinius::MachineCode::interpreter (state=0x7fffee2f8e40, mcode=0x7fffd8000ba0, call_frame=0x7fffee2f2950) at ./vm/gen/instruction_implementations.hpp:524
#19 0x0000000000736ce9 in rubinius::MachineCode::execute_specialized (state=0x7fffee2f8e40, previous=0x7fffee2f3b60, exec=0x7ffff68b44f8, mod=0x28151d0, args=...) at vm/machine_code.cpp:630
#20 0x00000000006ba536 in rubinius::InlineCache::check_cache_reference (state=0x7fffee2f8e40, cache=0x7fffe0004018, call_frame=0x7fffee2f3b60, args=...) at vm/inline_cache.cpp:592
#21 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x7fffe0004018, state=0x7fffee2f8e40, call_frame=0x7fffee2f3b60, args=...) at vm/inline_cache.hpp:197
#22 0x00000000006c1b35 in rubinius::MachineCode::interpreter (state=0x7fffee2f8e40, mcode=0x7fffe0003510, call_frame=0x7fffee2f3b60) at ./vm/gen/instruction_implementations.hpp:524
#23 0x0000000000737170 in rubinius::MachineCode::execute_specialized (state=0x7fffee2f8e40, previous=0x7fffee2f4df0, exec=0x7ffff4e360f8, mod=0x284f670, args=...) at vm/machine_code.cpp:630
#24 0x00000000006b9f86 in rubinius::InlineCache::empty_cache_vcall (state=0x7fffee2f8e40, cache=0x2827aa0, call_frame=0x7fffee2f4df0, args=...) at vm/inline_cache.cpp:472
#25 0x00000000006bb9f4 in rubinius::InlineCache::initialize (this=0x2827aa0, state=0x7fffee2f8e40, call_frame=0x7fffee2f4df0, args=...) at vm/inline_cache.hpp:201
#26 0x00000000006ba631 in rubinius::InlineCache::check_cache (state=0x7fffee2f8e40, cache=0x2827aa0, call_frame=0x7fffee2f4df0, args=...) at vm/inline_cache.cpp:615
#27 0x00000000006eb1ec in rubinius::InlineCache::execute (this=0x2827aa0, state=0x7fffee2f8e40, call_frame=0x7fffee2f4df0, args=...) at vm/inline_cache.hpp:197
#28 0x00000000006c19af in rubinius::MachineCode::interpreter (state=0x7fffee2f8e40, mcode=0x280f4b0, call_frame=0x7fffee2f4df0) at ./vm/gen/instruction_implementations.hpp:507
#29 0x00000000007f0386 in rubinius::BlockEnvironment::execute_interpreter (state=0x7fffee2f8e40, previous=0x7fffee2f6040, env=0x7ffff663a6a0, args=..., invocation=...) at vm/builtin/block_environment.cpp:340
#30 0x00000000007efe7a in rubinius::BlockEnvironment::invoke (state=0x7fffee2f8e40, previous=0x7fffee2f6040, env=0x7ffff663a6a0, args=..., invocation=...) at vm/builtin/block_environment.cpp:82
#31 0x00000000007f0444 in rubinius::BlockEnvironment::call (this=0x7ffff663a6a0, state=0x7fffee2f8e40, call_frame=0x7fffee2f6040, args=..., flags=0) at vm/builtin/block_environment.cpp:351
#32 0x00000000006c2e3a in rubinius::MachineCode::interpreter (state=0x7fffee2f8e40, mcode=0x27e9b80, call_frame=0x7fffee2f6040) at ./vm/gen/instruction_implementations.hpp:760
#33 0x00000000007f0386 in rubinius::BlockEnvironment::execute_interpreter (state=0x7fffee2f8e40, previous=0x7fffee2f74c0, env=0x7ffff663aa38, args=..., invocation=...) at vm/builtin/block_environment.cpp:340
#34 0x00000000007efe7a in rubinius::BlockEnvironment::invoke (state=0x7fffee2f8e40, previous=0x7fffee2f74c0, env=0x7ffff663aa38, args=..., invocation=...) at vm/builtin/block_environment.cpp:82
#35 0x00000000007f0444 in rubinius::BlockEnvironment::call (this=0x7ffff663aa38, state=0x7fffee2f8e40, call_frame=0x7fffee2f74c0, args=..., flags=0) at vm/builtin/block_environment.cpp:351

Ruby backtrace:

0x7fffee2f1790: Celluloid::IO::Reactor#run_once in /home/dirkjan/celluloid-io/lib/celluloid/io/reactor.rb:49 (+17)
0x7fffee2f2950: Celluloid::IO::Mailbox#receive in /home/dirkjan/celluloid-io/lib/celluloid/io/mailbox.rb:53 (+95)
0x7fffee2f3b60: Celluloid::Actor#run in /home/dirkjan/rubinius/gems/1.9/bundler/gems/celluloid-092c27e7676a/lib/celluloid/actor.rb:194 (+26)
0x7fffee2f4df0: __block__ in /home/dirkjan/rubinius/gems/1.9/bundler/gems/celluloid-092c27e7676a/lib/celluloid/actor.rb:184 (+38)
0x7fffee2f6040: __block__ in /home/dirkjan/rubinius/gems/1.9/bundler/gems/celluloid-092c27e7676a/lib/celluloid/thread_handle.rb:12 (+9)
0x7fffee2f74c0: __block__ in /home/dirkjan/rubinius/gems/1.9/bundler/gems/celluloid-092c27e7676a/lib/celluloid/internal_pool.rb:48 (+22)
0x7fffee2f8960: Thread#__run__ in kernel/bootstrap/thread19.rb:42 (+38)
@dbussink

This comment has been minimized.

Show comment Hide comment
@dbussink

dbussink Oct 22, 2012

Contributor

This might be related to #7

Contributor

dbussink commented Oct 22, 2012

This might be related to #7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment