Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc-common/cpp/helloworld segfaults/hits assertion #946

Closed
mkustermann opened this issue Mar 4, 2015 · 7 comments
Closed

grpc-common/cpp/helloworld segfaults/hits assertion #946

mkustermann opened this issue Mar 4, 2015 · 7 comments
Assignees
Milestone

Comments

@mkustermann
Copy link

Using grpc at 1acbf43 and running the C++ grpc-common/cpp/helloworld example segfaults when hitting it very hard at roughly 15k requests/second.

$ gdb --args ./greetings_server
(gdb) r
...
Server listening on 0.0.0.0:50051
E0304 17:34:00.805358617   30051 sync_posix.c:55] assertion failed: result == 0
(gdb) bt
#0  0x00007ffff696dcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff69710d8 in __GI_abort () at abort.c:89
#2  0x00007ffff62fbfa8 in gpr_mu_lock (mu=mu@entry=0x7fffb00bdc28) at src/core/support/sync_posix.c:55
#3  0x00007ffff79d9716 in lock (call=0x7fffb00bdc10) at src/core/surface/call.c:360
#4  unlock (call=call@entry=0x7fffb00bdc10) at src/core/surface/call.c:406
#5  0x00007ffff79d9b6d in grpc_call_start_ioreq_and_call_back (call=0x7fffb00bdc10, reqs=reqs@entry=0x7fffe65c6af0, 
    nreqs=nreqs@entry=2, on_complete=on_complete@entry=0x7ffff79dca20 <publish_registered_or_batch>, user_data=0x6143f0)
    at src/core/surface/call.c:809
#6  0x00007ffff79dd722 in begin_call (calld=calld@entry=0x7fffb00be0e8, rc=rc@entry=0x7fffe65c6b80, server=0x6142e0)
    at src/core/surface/server.c:1073
#7  0x00007ffff79dd8aa in queue_call_request (server=server@entry=0x6142e0, rc=rc@entry=0x7fffe65c6b80)
    at src/core/surface/server.c:950
#8  0x00007ffff79dea45 in grpc_server_request_registered_call (server=0x6142e0, rm=0x611dd0, call=0x614410, deadline=0x614418, 
    initial_metadata=0x614428, optional_payload=0x614440, cq_bind=0x7fff6c00cbd0, tag=0x6143f0) at src/core/surface/server.c:990
#9  0x00007ffff776b1a2 in Request (server=0x6142e0, this=0x6143f0) at src/cpp/server/server.cc:82
#10 grpc::Server::RunRpc (this=0x619530) at src/cpp/server/server.cc:392
#11 0x00007ffff776d641 in operator() (this=0x7fffe65c6e50) at /usr/include/c++/4.8/functional:2464
#12 operator() (__closure=0x613140) at src/cpp/server/thread_pool.cc:54
#13 _M_invoke<> (this=0x613140) at /usr/include/c++/4.8/functional:1732
#14 operator() (this=0x613140) at /usr/include/c++/4.8/functional:1720
#15 std::thread::_Impl<std::_Bind_simple<grpc::ThreadPool::ThreadPool(int)::__lambda0()> >::_M_run(void) (this=0x613128)
    at /usr/include/c++/4.8/thread:115
#16 0x00007ffff6fc3bf0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff6508182 in start_thread (arg=0x7fffe65c7700) at pthread_create.c:312
#18 0x00007ffff6a3147d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I've added a small printf() statement to get the return value of pthread_mutex_lock and it turns out to be 22 which means EINVAL.

@ctiller ctiller self-assigned this Mar 4, 2015
@ctiller
Copy link
Member

ctiller commented Mar 4, 2015

Do you happen to have a client that reproduces this available? (It'd save
me a little time)

My guess is that we're hitting an already deleted call, which would
probably be a bug in the core library. I'm going to run it under ASAN (make
CONFIG=asan buildtests_c) and see if anything shows up.

On Wed, Mar 4, 2015 at 8:40 AM Martin Kustermann notifications@github.com
wrote:

Using grpc at 1acbf43
1acbf43
and running the C++ grpc-common/cpp/helloworld example segfaults when
hitting it very hard at roughly 15k requests/second.

$ gdb --args ./greetings_server
(gdb) r
...
Server listening on 0.0.0.0:50051
E0304 17:34:00.805358617 30051 sync_posix.c:55] assertion failed: result == 0
(gdb) bt
#0 0x00007ffff696dcc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff69710d8 in __GI_abort () at abort.c:89
#2 0x00007ffff62fbfa8 in gpr_mu_lock (mu=mu@entry=0x7fffb00bdc28) at src/core/support/sync_posix.c:55
#3 0x00007ffff79d9716 in lock (call=0x7fffb00bdc10) at src/core/surface/call.c:360
#4 unlock (call=call@entry=0x7fffb00bdc10) at src/core/surface/call.c:406
#5 0x00007ffff79d9b6d in grpc_call_start_ioreq_and_call_back (call=0x7fffb00bdc10, reqs=reqs@entry=0x7fffe65c6af0,
nreqs=nreqs@entry=2, on_complete=on_complete@entry=0x7ffff79dca20 <publish_registered_or_batch>, user_data=0x6143f0)
at src/core/surface/call.c:809
#6 0x00007ffff79dd722 in begin_call (calld=calld@entry=0x7fffb00be0e8, rc=rc@entry=0x7fffe65c6b80, server=0x6142e0)
at src/core/surface/server.c:1073
#7 0x00007ffff79dd8aa in queue_call_request (server=server@entry=0x6142e0, rc=rc@entry=0x7fffe65c6b80)
at src/core/surface/server.c:950
#8 0x00007ffff79dea45 in grpc_server_request_registered_call (server=0x6142e0, rm=0x611dd0, call=0x614410, deadline=0x614418,
initial_metadata=0x614428, optional_payload=0x614440, cq_bind=0x7fff6c00cbd0, tag=0x6143f0) at src/core/surface/server.c:990
#9 0x00007ffff776b1a2 in Request (server=0x6142e0, this=0x6143f0) at src/cpp/server/server.cc:82
#10 grpc::Server::RunRpc (this=0x619530) at src/cpp/server/server.cc:392
#11 0x00007ffff776d641 in operator() (this=0x7fffe65c6e50) at /usr/include/c++/4.8/functional:2464
#12 operator() (__closure=0x613140) at src/cpp/server/thread_pool.cc:54
#13 _M_invoke<> (this=0x613140) at /usr/include/c++/4.8/functional:1732
#14 operator() (this=0x613140) at /usr/include/c++/4.8/functional:1720
#15 std::thread::_Implstd::_Bind_simple<grpc::ThreadPool::ThreadPool(int)::__lambda0() >::_M_run(void) (this=0x613128)
at /usr/include/c++/4.8/thread:115
#16 0x00007ffff6fc3bf0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff6508182 in start_thread (arg=0x7fffe65c7700) at pthread_create.c:312
#18 0x00007ffff6a3147d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I've added a small printf() statement to get the return value of
pthread_mutex_lock and it turns out to be 22 which means EINVAL.


Reply to this email directly or view it on GitHub
#946.

@vjpai vjpai assigned vjpai and unassigned ctiller Mar 4, 2015
@mkustermann
Copy link
Author

I've sent you a reproduction of the bug via email.

@vjpai
Copy link
Member

vjpai commented Mar 5, 2015

I'll be looking at it now.

@vjpai
Copy link
Member

vjpai commented Mar 5, 2015

Ok, so I've seen that your custom HTTP/2 client seems to wrap the HTTP/2 stream ID after 24-bits. This is non-compliant with the spec (as the stream ID is supposed to keep increasing), but we need to be checking for it also in the gRPC libary. I've created a separate issue #957 for us to deal with that problem. I believe that recent changes in HEAD plus the changes that will be needed for #957 should solve the current problem, but I will keep checking it out. Thanks for sending your code.

@mikepb
Copy link

mikepb commented Jul 23, 2015

I have also run into this issue using the iOS binding. The bug is reproducible on the latest master after attempting multiple connections to the RPC server within 30 seconds.

#2  0x000000019411eb18 in abort ()
#3  0x000000010016db94 in gpr_mu_lock at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/support/sync_posix.c:48
#4  0x000000010015bb68 in grpc_pollset_add_fd at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/iomgr/pollset_posix.c:107
#5  0x000000010015ccc8 in grpc_pollset_set_add_fd at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/iomgr/pollset_set_posix.c:105
#6  0x000000010016e0d0 in grpc_tcp_client_connect at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/iomgr/tcp_client_posix.c:243
#7  0x0000000100143b4c in connector_connect at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/surface/channel_create.c:99
#8  0x000000010014a8c8 in grpc_connector_connect at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/client_config/connector.c:48
#9  0x000000010016d470 in continue_connect at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/client_config/subchannel.c:298
#10 0x000000010016cbfc in start_connect at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/client_config/subchannel.c:307
#11 0x000000010016ca20 in grpc_subchannel_create_call at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/client_config/subchannel.c:349
#12 0x000000010014946c in picked_target at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/channel/client_channel.c:232
#13 0x000000010015a16c in pf_pick at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/client_config/lb_policies/pick_first.c:109
#14 0x0000000100154e88 in grpc_lb_policy_pick at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/client_config/lb_policy.c:74
#15 0x0000000100149254 in pick_target at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/channel/client_channel.c:250
#16 0x0000000100148bf4 in perform_transport_stream_op at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/channel/client_channel.c:366
#17 0x0000000100147f18 in cc_start_transport_stream_op at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/channel/client_channel.c:392
#18 0x00000001001422b0 in execute_op at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/surface/call.c:1163
#19 0x000000010013f31c in unlock at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/surface/call.c:524
#20 0x000000010013f714 in grpc_call_start_ioreq_and_call_back at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/surface/call.c:1083
#21 0x00000001001402fc in grpc_call_start_batch at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/core/surface/call.c:1471
#22 0x0000000100136584 in -[GRPCWrappedCall startBatchWithOperations:errorHandler:] at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/objective-c/GRPCClient/private/GRPCWrappedCall.m:269
#23 0x000000010013626c in -[GRPCWrappedCall startBatchWithOperations:] at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/objective-c/GRPCClient/private/GRPCWrappedCall.m:259
#24 0x00000001001309c8 in -[GRPCCall sendHeaders:] at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/objective-c/GRPCClient/GRPCCall.m:231
#25 0x0000000100131f5c in -[GRPCCall startWithWriteable:] at /Users/mikepb/Developer/M2Demo/Pods/gRPC/src/objective-c/GRPCClient/GRPCCall.m:344

@vjpai
Copy link
Member

vjpai commented Aug 6, 2015

I believe that the initial reported bug is now working. Issue #957 was addressed in the code, but further tests are necessary and have been pushed to GA. I recommend closing this issue and reopening a new issue with the problem reported by @mikepb as I don't think that they are the same codepath at all.

Is everyone ok with that?

@mikepb
Copy link

mikepb commented Aug 6, 2015

Sounds great. I found that using the gRPC client to connect to a http:// server that does not implement gRPC reliably triggers the bug.

@vjpai vjpai closed this as completed Aug 8, 2015
@lock lock bot locked as resolved and limited conversation to collaborators Oct 5, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants