Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoadStatsIntegrationTest flake #11784

Closed
mattklein123 opened this issue Jun 27, 2020 · 4 comments · Fixed by #12265
Closed

LoadStatsIntegrationTest flake #11784

mattklein123 opened this issue Jun 27, 2020 · 4 comments · Fixed by #12265

Comments

@mattklein123
Copy link
Member

2020-06-27T00:21:11.6210865Z [ RUN      ] IpVersionsClientType/LoadStatsIntegrationTest.InProgress/1
2020-06-27T00:21:11.6211464Z #0 Envoy::SignalAction::sigHandler() [0x43c11c4]
2020-06-27T00:21:11.6211970Z #1 __restore_rt [0x7feac4b2c890]
2020-06-27T00:21:11.6212436Z #2 __llvm_coverage_mapping [0x1ef8191]
2020-06-27T00:21:11.6212984Z #3 std::__1::__invoke_void_return_wrapper<>::__call<>() [0x1ef8111]
2020-06-27T00:21:11.6213537Z #4 __llvm_coverage_mapping [0x1ef80d1]
2020-06-27T00:21:11.6214072Z #5 std::__1::__function::__func<>::operator()() [0x1ef6ae3]
2020-06-27T00:21:11.6214602Z #6 __llvm_coverage_mapping [0x1e7dd1e]
2020-06-27T00:21:11.6215086Z #7 std::__1::function<>::operator()() [0x1e7db39]
2020-06-27T00:21:11.6215676Z #8 Envoy::Common::CallbackManager<>::runCallbacks() [0x1e7d996]
2020-06-27T00:21:11.6216279Z #9 Envoy::SharedConnectionWrapper::onEvent() [0x1e7cfb0]
2020-06-27T00:21:11.6216914Z #10 Envoy::Network::ConnectionImplBase::raiseConnectionEvent() [0x3d6d8da]
2020-06-27T00:21:11.6217777Z #11 Envoy::Network::ConnectionImpl::raiseEvent() [0x3d5615a]
2020-06-27T00:21:11.6218390Z #12 Envoy::Network::ConnectionImpl::closeSocket() [0x3d55eb8]
2020-06-27T00:21:11.6219011Z #13 Envoy::Network::ConnectionImpl::onFileEvent() [0x3d57a6c]
2020-06-27T00:21:11.6219655Z #14 Envoy::Network::ConnectionImpl::ConnectionImpl()::$_6::operator()() [0x3d67132]
2020-06-27T00:21:11.6220272Z #15 std::__1::__invoke<>() [0x3d670c5]
2020-06-27T00:21:11.6220851Z #16 std::__1::__invoke_void_return_wrapper<>::__call<>() [0x3d67036]
2020-06-27T00:21:11.6221509Z #17 std::__1::__function::__alloc_func<>::operator()() [0x3d66fe6]
2020-06-27T00:21:11.6222116Z #18 std::__1::__function::__func<>::operator()() [0x3d659f8]
2020-06-27T00:21:11.6222750Z #19 std::__1::__function::__value_func<>::operator()() [0x2f5d6d6]
2020-06-27T00:21:11.6223347Z #20 std::__1::function<>::operator()() [0x2f5d613]
2020-06-27T00:21:11.6223984Z #21 Envoy::Event::FileEventImpl::assignEvents()::$_0::operator()() [0x3d4a271]
2020-06-27T00:21:11.6224915Z #22 Envoy::Event::FileEventImpl::assignEvents()::$_0::__invoke() [0x3d4a186]
2020-06-27T00:21:11.6225518Z #23 event_persist_closure [0x58447a0]
2020-06-27T00:21:11.6226017Z #24 event_process_active_single_queue [0x5843c77]
2020-06-27T00:21:11.6226512Z #25 event_process_active [0x583d9f3]
2020-06-27T00:21:11.6226941Z #26 event_base_loop [0x583c493]
2020-06-27T00:21:11.6227476Z #27 Envoy::Event::LibeventScheduler::run() [0x3f6984a]
2020-06-27T00:21:11.6228063Z #28 Envoy::Event::DispatcherImpl::run() [0x3d3adff]
2020-06-27T00:21:11.6228636Z #29 Envoy::FakeUpstream::threadRoutine() [0x1e93ecd]
2020-06-27T00:21:11.6229143Z #30 __llvm_coverage_mapping [0x1eafc8c]
2020-06-27T00:21:11.6229627Z #31 __llvm_coverage_mapping [0x1eafc21]
2020-06-27T00:21:11.6230266Z #32 __llvm_coverage_mapping [0x1eafba1]
2020-06-27T00:21:11.6230734Z #33 __llvm_coverage_mapping [0x1eafb61]
2020-06-27T00:21:11.6231175Z #34 __llvm_coverage_mapping [0x1eae583]
2020-06-27T00:21:11.6231647Z #35 __llvm_coverage_mapping [0x1e7dd1e]
2020-06-27T00:21:11.6232161Z #36 std::__1::function<>::operator()() [0x1e7db39]
2020-06-27T00:21:11.6232790Z #37 Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::operator()() [0x582bb56]
2020-06-27T00:21:11.6233553Z #38 Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x582bb15]
2020-06-27T00:21:11.6234166Z #39 start_thread [0x7feac4b216db]
2020-06-27T00:21:11.6234784Z external/bazel_tools/tools/test/collect_coverage.sh: line 131:   861 Aborted                 (core dumped) "$@"
@alyssawilk
Copy link
Contributor

also
https://dev.azure.com/cncf/4684fb3d-0389-4e0b-8251-221942316e06/_apis/build/builds/43751/logs/31

2020-07-01T18:50:30.2092982Z [ RUN ] IpVersionsClientType/LoadStatsIntegrationTest.InProgress/0
2020-07-01T18:50:30.2093534Z #0 Envoy::SignalAction::sigHandler() [0x43f6a44]
2020-07-01T18:50:30.2093980Z #1 __restore_rt [0x7fc097d58890]
2020-07-01T18:50:30.2094409Z #2 __llvm_coverage_mapping [0x1f086d1]
2020-07-01T18:50:30.2094967Z #3 std::__1::__invoke_void_return_wrapper<>::__call<>() [0x1f08651]
2020-07-01T18:50:30.2095507Z #4 __llvm_coverage_mapping [0x1f08611]
2020-07-01T18:50:30.2096017Z #5 std::__1::__function::__func<>::operator()() [0x1f07023]
2020-07-01T18:50:30.2096516Z #6 __llvm_coverage_mapping [0x1e8d59e]
2020-07-01T18:50:30.2096981Z #7 std::__1::function<>::operator()() [0x1e8d3b9]
2020-07-01T18:50:30.2097542Z #8 Envoy::Common::CallbackManager<>::runCallbacks() [0x1e8d216]
2020-07-01T18:50:30.2098119Z #9 Envoy::SharedConnectionWrapper::onEvent() [0x1e8c830]
2020-07-01T18:50:30.2098742Z #10 Envoy::Network::ConnectionImplBase::raiseConnectionEvent() [0x3da0cda]
2020-07-01T18:50:30.2099402Z #11 Envoy::Network::ConnectionImpl::raiseEvent() [0x3d89aea]
2020-07-01T18:50:30.2100007Z #12 Envoy::Network::ConnectionImpl::closeSocket() [0x3d89848]
2020-07-01T18:50:30.2100588Z #13 Envoy::Network::ConnectionImpl::onFileEvent() [0x3d8b3fc]
2020-07-01T18:50:30.2101240Z #14 Envoy::Network::ConnectionImpl::ConnectionImpl()::$_6::operator()() [0x3d9ab12]
2020-07-01T18:50:30.2101837Z #15 std::__1::__invoke<>() [0x3d9aaa5]
2020-07-01T18:50:30.2102376Z #16 std::__1::__invoke_void_return_wrapper<>::__call<>() [0x3d9aa16]
2020-07-01T18:50:30.2103005Z #17 std::__1::__function::__alloc_func<>::operator()() [0x3d9a9c6]
2020-07-01T18:50:30.2103593Z #18 std::__1::__function::__func<>::operator()() [0x3d993d8]
2020-07-01T18:50:30.2104225Z #19 std::__1::__function::__value_func<>::operator()() [0x2f79766]
2020-07-01T18:50:30.2104811Z #20 std::__1::function<>::operator()() [0x2f796a3]
2020-07-01T18:50:30.2105392Z #21 Envoy::Event::FileEventImpl::mergeInjectedEventsAndRunCb() [0x3d7ba68]
2020-07-01T18:50:30.2106066Z #22 Envoy::Event::FileEventImpl::assignEvents()::$_1::operator()() [0x3d7bb78]
2020-07-01T18:50:30.2106750Z #23 Envoy::Event::FileEventImpl::assignEvents()::$_1::__invoke() [0x3d7ba96]
2020-07-01T18:50:30.2107462Z #24 event_persist_closure [0x5884640]
2020-07-01T18:50:30.2107941Z #25 event_process_active_single_queue [0x5883b17]
2020-07-01T18:50:30.2108396Z #26 event_process_active [0x587d893]
2020-07-01T18:50:30.2108796Z #27 event_base_loop [0x587c333]
2020-07-01T18:50:30.2109285Z #28 Envoy::Event::LibeventScheduler::run() [0x3f9d69a]
2020-07-01T18:50:30.2109896Z #29 Envoy::Event::DispatcherImpl::run() [0x3d6c35f]
2020-07-01T18:50:30.2110422Z #30 Envoy::FakeUpstream::threadRoutine() [0x1ea374d]
2020-07-01T18:50:30.2110973Z #31 __llvm_coverage_mapping [0x1ebf50c]
2020-07-01T18:50:30.2111509Z #32 __llvm_coverage_mapping [0x1ebf4a1]
2020-07-01T18:50:30.2112217Z #33 __llvm_coverage_mapping [0x1ebf421]
2020-07-01T18:50:30.2112726Z #34 __llvm_coverage_mapping [0x1ebf3e1]
2020-07-01T18:50:30.2113367Z #35 __llvm_coverage_mapping [0x1ebde03]
2020-07-01T18:50:30.2113850Z #36 __llvm_coverage_mapping [0x1e8d59e]

@mattklein123
Copy link
Member Author

[2020-07-01 21:44:04.635][32][debug][router] [source/common/router/router.cc:789] [C3][S0] upstream timeout
[2020-07-01 21:44:04.639][32][debug][router] [source/common/router/upstream_request.cc:284] [C3][S0] resetting pool request
[2020-07-01 21:44:04.639][32][debug][client] [source/common/http/codec_client.cc:115] [C4] request reset
[2020-07-01 21:44:04.639][32][trace][main] [source/common/event/dispatcher_impl.cc:175] item added to deferred deletion list (size=1)
[2020-07-01 21:44:04.639][32][debug][connection] [source/common/network/connection_impl.cc:112] [C4] closing data_to_write=0 type=1
[2020-07-01 21:44:04.639][32][debug][connection] [source/common/network/connection_impl.cc:208] [C4] closing socket: 1
[2020-07-01 21:44:04.639][32][debug][client] [source/common/http/codec_client.cc:92] [C4] disconnect. resetting 0 pending requests
[2020-07-01 21:44:04.639][21][trace][connection] [source/common/network/connection_impl.cc:506] [C5] socket event: 6
[2020-07-01 21:44:04.639][32][debug][pool] [source/common/conn_pool/conn_pool_base.cc:255] [C4] client disconnected, failure reason: 
[2020-07-01 21:44:04.639][21][debug][connection] [source/common/network/connection_impl.cc:527] [C5] remote early close
[2020-07-01 21:44:04.639][21][debug][connection] [source/common/network/connection_impl.cc:208] [C5] closing socket: 0
[2020-07-01 21:44:04.639][21][critical][assert] [./test/integration/fake_upstream.h:365] assert failure: parented_ || allow_unexpected_disconnects_. Details: An queued upstream connection was torn down without being associated with a fake connection. Either manage the connection via waitForRawConnection() or waitForHttpConnection(), or set_allow_unexpected_disconnects(true).
 See https://github.com/envoyproxy/envoy/blob/master/test/integration/README.md#unparented-upstream-connections
[2020-07-01 21:44:04.639][21][critical][backtrace] [bazel-out/k8-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:104] Caught Aborted, suspect faulting address 0x3e800000011

The issue is:

  1. We are losing critical assert logs on coverage. I will open an issue on this.
  2. The response is actually timing out somehow, I guess due to the machine being pegged, and then the connection is getting closed, causing an unexpected disconnect.

@alyssawilk @antoniovicente I don't think I'm going to have time to fix this before I take off for vacation. This might be the cause of similar coverage flakes. cc @lizan @jmarantz

@mattklein123
Copy link
Member Author

Opened #11860 to track why it's so hard to debug failures on coverage.

@alyssawilk
Copy link
Contributor

If it's a timeout it could be that the response is timing out, or that some other infinite wait is causing us to not get to the point where the upstream operates (one more reason I like autonomous upstream).
I'll see if I can fix the infinite timeout issue and see if that gets us more data.

alyssawilk added a commit that referenced this issue Jul 2, 2020
Replacing an infinite timeout in load stats test with a "fast" fail of 5s.

Risk Level: n/a
Testing: test passes
Ideally gets debug info for #11784

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
mattklein123 added a commit that referenced this issue Jul 24, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes #11784

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Jul 24, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes #11784

Signed-off-by: Matt Klein <mklein@lyft.com>
lambdai pushed a commit to lambdai/envoy-dai that referenced this issue Jul 24, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes envoyproxy#11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
istio-testing pushed a commit to istio/envoy that referenced this issue Jul 25, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes envoyproxy#11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>

Co-authored-by: Matt Klein <mklein@lyft.com>
KBaichoo pushed a commit to KBaichoo/envoy that referenced this issue Jul 30, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes envoyproxy#11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
chaoqin-li1123 pushed a commit to chaoqin-li1123/envoy that referenced this issue Aug 7, 2020
Waiting on a load stats response can race with resetting
the counters when initializing a watch. Moving the counter
increment prevents the race.

Fixes envoyproxy#11784

Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants