Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bm_chttp2_transport --benchmark_filter=BM_TransportStreamSend/134217728$ GRPC_POLL_STRATEGY=poll #18416

Closed
markdroth opened this issue Mar 18, 2019 · 11 comments · Fixed by #19342
Assignees
Labels
disposition/BUILDNURSE For all buildnurse related build/test failures and flakes kind/bug lang/c++ priority/P1

Comments

@markdroth
Copy link
Member

https://source.cloud.google.com/results/invocations/17e1f1a3-a281-468a-9dd7-2c803fb25266/targets/github%2Fgrpc%2Fcpp_linux_dbg_native/tests

It's not clear what the actual failure was, since there aren't any error messages in the log:

D0315 20:32:46.716832373   18963 ev_posix.cc:169]            Using polling engine: poll
D0315 20:32:46.717027089   18963 dns_resolver_ares.cc:488]   Using ares dns resolver
Run on (32 X 2300 MHz CPU s)
2019-03-15 20:32:46
***WARNING*** Library was built as DEBUG. Timings may be affected.
@jtattermusch
Copy link
Contributor

Seen the same issue on run_tests.py stack:
https://source.cloud.google.com/results/invocations/a682e6f6-87bc-4a76-a856-d680dfc4d7bd/targets/github%2Fgrpc%2Fcpp_linux_tsan_native_x64_clang7.0/tests

2019-04-05 04:26:41
Running bins/tsan/bm_chttp2_transport
Run on (32 X 2600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x16)
  L1 Instruction 32K (x16)
  L2 Unified 256K (x16)
  L3 Unified 20480K (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
==================
WARNING: ThreadSanitizer: data race (pid=31556)
  Write of size 8 at 0x7b7c00000008 by main thread:
    #0 free /tmp/clang-build/src/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:709:3 (bm_chttp2_transport+0x548694)
    #1 gpr_free /var/local/git/grpc/src/core/lib/gpr/alloc.cc:77:3 (bm_chttp2_transport+0x7263c4)
    #2 Stream::~Stream() /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:201:5 (bm_chttp2_transport+0x5bbc2a)
    #3 std::default_delete<Stream>::operator()(Stream*) const /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/unique_ptr.h:76:2 (bm_chttp2_transport+0x5c43f6)
    #4 std::unique_ptr<Stream, std::default_delete<Stream> >::reset(Stream*) /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/unique_ptr.h:344:4 (bm_chttp2_transport+0x5bf88d)
    #5 BM_TransportStreamSend(benchmark::State&) /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:456:5 (bm_chttp2_transport+0x5b4b04)
    #6 benchmark::internal::FunctionBenchmark::Run(benchmark::State&) /var/local/git/grpc/third_party/benchmark/src/benchmark_register.cc:453:42 (bm_chttp2_transport+0x5ebd46)
    #7 benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::Benchmark::Instance const*, unsigned long, int, benchmark::internal::ThreadManager*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:166:17 (bm_chttp2_transport+0x5d0bec)
    #8 benchmark::internal::(anonymous namespace)::RunBenchmark(benchmark::internal::Benchmark::Instance const&, std::vector<benchmark::BenchmarkReporter::Run, std::allocator<benchmark::BenchmarkReporter::Run> >*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:210:7 (bm_chttp2_transport+0x5cfb84)
    #9 benchmark::internal::(anonymous namespace)::RunBenchmarks(std::vector<benchmark::internal::Benchmark::Instance, std::allocator<benchmark::internal::Benchmark::Instance> > const&, benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:430:11 (bm_chttp2_transport+0x5ce6f5)
    #10 benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:544:5 (bm_chttp2_transport+0x5cddda)
    #11 benchmark::RunSpecifiedBenchmarks() /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:486:10 (bm_chttp2_transport+0x5cd811)
    #12 benchmark::RunTheBenchmarksNamespaced() /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:641:37 (bm_chttp2_transport+0x5b5955)
    #13 main /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:647:3 (bm_chttp2_transport+0x5b5a13)
  Previous read of size 8 at 0x7b7c00000008 by thread T1:
    #0 grpc_chttp2_stream::~grpc_chttp2_stream() /var/local/git/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc:692:7 (bm_chttp2_transport+0x6516c9)
    #1 destroy_stream_locked(void*, grpc_error*) /var/local/git/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc:754:6 (bm_chttp2_transport+0x664cd1)
    #2 grpc_combiner_continue_exec_ctx() /var/local/git/grpc/src/core/lib/iomgr/combiner.cc:268:5 (bm_chttp2_transport+0x6a31c2)
    #3 grpc_core::ExecCtx::Flush() /var/local/git/grpc/src/core/lib/iomgr/exec_ctx.cc:151:17 (bm_chttp2_transport+0x630cba)
    #4 grpc_core::Executor::RunClosures(char const*, grpc_closure_list) /var/local/git/grpc/src/core/lib/iomgr/executor.cc:138:32 (bm_chttp2_transport+0x6a58e0)
    #5 grpc_core::Executor::ThreadMain(void*) /var/local/git/grpc/src/core/lib/iomgr/executor.cc:258:22 (bm_chttp2_transport+0x6a5fe8)
    #6 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::operator()(void*) const /var/local/git/grpc/src/core/lib/gprpp/thd_posix.cc:114:27 (bm_chttp2_transport+0x72f605)
    #7 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::__invoke(void*) /var/local/git/grpc/src/core/lib/gprpp/thd_posix.cc:84:25 (bm_chttp2_transport+0x72f488)
  Thread T1 'default-executo' (tid=31564, running) created by main thread at:
    #0 pthread_create /tmp/clang-build/src/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992:3 (bm_chttp2_transport+0x549d25)
    #1 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&) /var/local/git/grpc/src/core/lib/gprpp/thd_posix.cc:83:10 (bm_chttp2_transport+0x72f179)
    #2 grpc_core::(anonymous namespace)::ThreadInternalsPosix* grpc_core::New<grpc_core::(anonymous namespace)::ThreadInternalsPosix, char const*&, void (*&)(void*), void*&, bool*, grpc_core::Thread::Options const&>(char const*&, void (*&)(void*), void*&, bool*&&, grpc_core::Thread::Options const&) /var/local/git/grpc/./src/core/lib/gprpp/memory.h:47:18 (bm_chttp2_transport+0x72ed15)
    #3 grpc_core::Thread::Thread(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&) /var/local/git/grpc/src/core/lib/gprpp/thd_posix.cc:160:11 (bm_chttp2_transport+0x72eae7)
    #4 grpc_core::Executor::SetThreading(bool) /var/local/git/grpc/src/core/lib/iomgr/executor.cc:174:9 (bm_chttp2_transport+0x6a50fa)
    #5 grpc_core::Executor::Init() /var/local/git/grpc/src/core/lib/iomgr/executor.cc:109:25 (bm_chttp2_transport+0x6a4c2d)
    #6 grpc_core::Executor::InitAll() /var/local/git/grpc/src/core/lib/iomgr/executor.cc:408:58 (bm_chttp2_transport+0x6a6cfb)
    #7 grpc_iomgr_init() /var/local/git/grpc/src/core/lib/iomgr/iomgr.cc:55:3 (bm_chttp2_transport+0x6a81b1)
    #8 grpc_init /var/local/git/grpc/src/core/lib/surface/init.cc:145:5 (bm_chttp2_transport+0x765950)
    #9 grpc::internal::GrpcLibrary::init() /var/local/git/grpc/include/grpcpp/impl/grpc_library.h:34:26 (bm_chttp2_transport+0x5caaf4)
    #10 Library::Library() /var/local/git/grpc/test/cpp/microbenchmarks/helpers.cc:30:13 (bm_chttp2_transport+0x5ca340)
    #11 Library::get() /var/local/git/grpc/./test/cpp/microbenchmarks/helpers.h:35:20 (bm_chttp2_transport+0x5bacb7)
    #12 __cxx_global_var_init.2 /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:39:38 (bm_chttp2_transport+0x5206f5)
    #13 _GLOBAL__sub_I_bm_chttp2_transport.cc /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc (bm_chttp2_transport+0x520c09)
    #14 __libc_csu_init <null> (bm_chttp2_transport+0x8d85dc)
SUMMARY: ThreadSanitizer: data race /var/local/git/grpc/src/core/lib/gpr/alloc.cc:77:3 in gpr_free
==================

@apolcyn
Copy link
Contributor

apolcyn commented Apr 15, 2019

Possibly related, though slightly different error:

cpp_linux_dbg_native keyboard_arrow_right bins/dbg/bm_chttp2_transport --benchmark_filter=BM_TransportStreamSend/16777216$ GRPC_POLL_STRATEGY=poll

D0410 18:10:56.890676112   22362 ev_posix.cc:169]            Using polling engine: poll
D0410 18:10:56.890830390   22362 dns_resolver_ares.cc:440]   Using ares dns resolver
2019-04-10 18:10:56
Running bins/dbg/bm_chttp2_transport
Run on (32 X 2300 MHz CPU s)
CPU Caches:
  L1 Data 32K (x16)
  L1 Instruction 32K (x16)
  L2 Unified 256K (x16)
  L3 Unified 46080K (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.

https://source.cloud.google.com/results/invocations/e97c93f9-694c-4ae1-a83a-1c3882e520cf/targets/github%2Fgrpc%2Fcpp_linux_dbg_native/tests

@rmstar
Copy link
Contributor

rmstar commented May 22, 2019

Running bins/asan/bm_chttp2_transport
Run on (32 X 2300 MHz CPU s)
CPU Caches:
  L1 Data 32K (x16)
  L1 Instruction 32K (x16)
  L2 Unified 256K (x16)
  L3 Unified 46080K (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
=================================================================
[1m[31m==11241==ERROR: AddressSanitizer: heap-use-after-free on address 0x6200000040a0 at pc 0x0000008b22f3 bp 0x7ffdb2965280 sp 0x7ffdb2965278
[1m[0m[1m[34mWRITE of size 8 at 0x6200000040a0 thread T0[1m[0m
    #0 0x8b22f2 in gpr_mpscq_push(gpr_mpscq*, gpr_mpscq_node*) /var/local/git/grpc/src/core/lib/gpr/mpscq.cc:40:3
    #1 0x7c2d39 in combiner_exec(grpc_closure*, grpc_error*) /var/local/git/grpc/src/core/lib/iomgr/combiner.cc:188:3
    #2 0x61d29f in grpc_closure_sched(char const*, int, grpc_closure*, grpc_error*) /var/local/git/grpc/./src/core/lib/iomgr/closure.h:299:5
    #3 0x74e640 in destroy_transport(grpc_transport*) /var/local/git/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc:569:3
    #4 0x723240 in grpc_transport_destroy(grpc_transport*) /var/local/git/grpc/src/core/lib/transport/transport.cc:122:3
    #5 0x61ac77 in Fixture::~Fixture() /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:141:16
    #6 0x60c7dc in BM_TransportStreamSend(benchmark::State&) /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:459:1
    #7 0x66d667 in benchmark::internal::FunctionBenchmark::Run(benchmark::State&) /var/local/git/grpc/third_party/benchmark/src/benchmark_register.cc:453:42
    #8 0x63dd93 in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::Benchmark::Instance const*, unsigned long, int, benchmark::internal::ThreadManager*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:166:17
    #9 0x63b421 in benchmark::internal::(anonymous namespace)::RunBenchmark(benchmark::internal::Benchmark::Instance const&, std::vector<benchmark::BenchmarkReporter::Run, std::allocator<benchmark::BenchmarkReporter::Run> >*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:210:7
    #10 0x638c93 in benchmark::internal::(anonymous namespace)::RunBenchmarks(std::vector<benchmark::internal::Benchmark::Instance, std::allocator<benchmark::internal::Benchmark::Instance> > const&, benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:430:11
    #11 0x6378ee in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:544:5
    #12 0x636c24 in benchmark::RunSpecifiedBenchmarks() /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:486:10
    #13 0x60f2c8 in benchmark::RunTheBenchmarksNamespaced() /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:640:37
    #14 0x60f4ca in main /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:647:3
    #15 0x7fc51d2deb44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)
    #16 0x534558 in _start (/var/local/git/grpc/bins/asan/bm_chttp2_transport+0x534558)
[1m[32m0x6200000040a0 is located 32 bytes inside of 3536-byte region [0x620000004080,0x620000004e50)
[1m[0m[1m[35mfreed by thread T0 here:[1m[0m
    #0 0x5d7f22 in free /tmp/clang-build/src/compiler-rt/lib/asan/asan_malloc_linux.cc:68:3
    #1 0x8afcb7 in gpr_free /var/local/git/grpc/src/core/lib/gpr/alloc.cc:77:3
    #2 0x61ab3c in Stream::~Stream() /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:200:5
    #3 0x627264 in std::default_delete<Stream>::operator()(Stream*) const /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/unique_ptr.h:76:2
    #4 0x620cb5 in std::unique_ptr<Stream, std::default_delete<Stream> >::reset(Stream*) /usr/lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/bits/unique_ptr.h:344:4
    #5 0x60c5f8 in BM_TransportStreamSend(benchmark::State&) /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:455:5
    #6 0x66d667 in benchmark::internal::FunctionBenchmark::Run(benchmark::State&) /var/local/git/grpc/third_party/benchmark/src/benchmark_register.cc:453:42
    #7 0x63dd93 in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::Benchmark::Instance const*, unsigned long, int, benchmark::internal::ThreadManager*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:166:17
    #8 0x63b421 in benchmark::internal::(anonymous namespace)::RunBenchmark(benchmark::internal::Benchmark::Instance const&, std::vector<benchmark::BenchmarkReporter::Run, std::allocator<benchmark::BenchmarkReporter::Run> >*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:210:7
    #9 0x638c93 in benchmark::internal::(anonymous namespace)::RunBenchmarks(std::vector<benchmark::internal::Benchmark::Instance, std::allocator<benchmark::internal::Benchmark::Instance> > const&, benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:430:11
    #10 0x6378ee in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:544:5
    #11 0x636c24 in benchmark::RunSpecifiedBenchmarks() /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:486:10
    #12 0x60f2c8 in benchmark::RunTheBenchmarksNamespaced() /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:640:37
    #13 0x60f4ca in main /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:647:3
    #14 0x7fc51d2deb44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)
[1m[35mpreviously allocated by thread T0 here:[1m[0m
    #0 0x5d8263 in __interceptor_malloc /tmp/clang-build/src/compiler-rt/lib/asan/asan_malloc_linux.cc:88:3
    #1 0x8afb76 in gpr_malloc /var/local/git/grpc/src/core/lib/gpr/alloc.cc:57:7
    #2 0x619f2c in Stream::Stream(Fixture*) /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:194:15
    #3 0x60b2aa in BM_TransportStreamSend(benchmark::State&) /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:390:40
    #4 0x66d667 in benchmark::internal::FunctionBenchmark::Run(benchmark::State&) /var/local/git/grpc/third_party/benchmark/src/benchmark_register.cc:453:42
    #5 0x63dd93 in benchmark::internal::(anonymous namespace)::RunInThread(benchmark::internal::Benchmark::Instance const*, unsigned long, int, benchmark::internal::ThreadManager*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:166:17
    #6 0x63b421 in benchmark::internal::(anonymous namespace)::RunBenchmark(benchmark::internal::Benchmark::Instance const&, std::vector<benchmark::BenchmarkReporter::Run, std::allocator<benchmark::BenchmarkReporter::Run> >*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:210:7
    #7 0x638c93 in benchmark::internal::(anonymous namespace)::RunBenchmarks(std::vector<benchmark::internal::Benchmark::Instance, std::allocator<benchmark::internal::Benchmark::Instance> > const&, benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:430:11
    #8 0x6378ee in benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*) /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:544:5
    #9 0x636c24 in benchmark::RunSpecifiedBenchmarks() /var/local/git/grpc/third_party/benchmark/src/benchmark.cc:486:10
    #10 0x60f2c8 in benchmark::RunTheBenchmarksNamespaced() /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:640:37
    #11 0x60f4ca in main /var/local/git/grpc/test/cpp/microbenchmarks/bm_chttp2_transport.cc:647:3
    #12 0x7fc51d2deb44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)

https://source.cloud.google.com/results/invocations/39180363-63c7-4101-8dfc-34286f4e616a/targets/github%2Fgrpc%2Fcpp_linux_asan_native_x64_clang7.0/tests

@jtattermusch
Copy link
Contributor

On linux I'm seeing a similar problem:

I0531 15:27:06.603509337   19410 ev_epoll1_linux.cc:116]     grpc epoll fd: 3
D0531 15:27:06.603647167   19410 ev_posix.cc:174]            Using polling engine: epoll1
D0531 15:27:06.603750931   19410 dns_resolver_ares.cc:480]   Using ares dns resolver
2019-05-31 15:27:06
Running bins/dbg/bm_chttp2_transport
Run on (32 X 2300 MHz CPU s)
CPU Caches:
  L1 Data 32K (x16)
  L1 Instruction 32K (x16)
  L2 Unified 256K (x16)
  L3 Unified 46080K (x1)
***WARNING*** Library was built as DEBUG. Timings may be affected.
E0531 15:27:07.107980144   19410 closure.h:285]              Closure already scheduled. (closure: 0x27b30a0, created: [test/cpp/microbenchmarks/bm_chttp2_transport.cc:187], previously scheduled at: [src/core/ext/transport/chttp2/transport/chttp2_transport.cc: 742], newly scheduled at [src/core/ext/transport/chttp2/transport/chttp2_transport.cc: 742], run?: false

@jtattermusch
Copy link
Contributor

This is currently our top flake - it's failing >50% builds on macos (measured over the last week).

It's flaky mostly on Mac, but it seems that it can be reproduced on Linux as well.

@jtattermusch
Copy link
Contributor

@AspirinSJL can you take a look?

@markdroth
Copy link
Member Author

@yashykt owns chttp2, so he's probably the right person to look at this.

@yashykt
Copy link
Member

yashykt commented Jun 14, 2019

The issue is that we are relying on the closures being scheduled on the exec_ctx for correctness. If the closures end up being scheduled on the executor thread, then we might end up destroying the Stream object before the grpc stream is freed.

The fix is to simply delete the Stream after the grpc stream is done being destroyed.

@yashykt
Copy link
Member

yashykt commented Jun 17, 2019

I think #19342 partially solves this. There still seems to be a race.

@hcaseyal
Copy link
Contributor

This was fixed by #20091

@lock lock bot locked as resolved and limited conversation to collaborators Dec 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
disposition/BUILDNURSE For all buildnurse related build/test failures and flakes kind/bug lang/c++ priority/P1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants