Crash at RDS onConfigUpdate #3953

Closed
qiwzhang opened this issue Jul 25, 2018 · 5 comments
@qiwzhang
Contributor

Description:

Please see crash backtrace in https://github.com/istio/proxy/issues/1877

It crashed at this line:

last_updated_ = factory_context_.systemTimeSource().currentTime();

factory_context_ is gone. One possible cause: this RdsRouteConfigProvider is created by one ListenerImpl, and that ListenerImpl is stored as factory_context_ in the provider. But the provider is shared with a second ListenerImpl, since provider_manager is a singleton (global). The first ListenerImpl object may already be gone when the provider's onConfigUpdate is called.
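
The suspected lifetime problem can be sketched in isolation. This is a simplified model, not Envoy's code: `FactoryContext`, `Provider`, `ProviderManager`, `Listener`, and `runDemo` are hypothetical stand-ins, and the timestamp is a fixed placeholder. The description above suggests the real provider keeps a plain reference to its creator's context, which is what would make the later access a use-after-free; the sketch holds a `std::weak_ptr` instead, so the stale state can be observed without undefined behavior.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the per-listener factory context.
struct FactoryContext {
  long currentTime() const { return 42; }  // placeholder timestamp
};

// Hypothetical stand-in for the shared route config provider.
// It remembers only the context of the listener that created it.
class Provider {
public:
  explicit Provider(std::weak_ptr<FactoryContext> ctx) : creator_ctx_(std::move(ctx)) {}

  // Returns false when the creating listener's context no longer exists.
  bool onConfigUpdate() {
    auto ctx = creator_ctx_.lock();
    if (!ctx) {
      return false;  // with a raw reference, this is where the crash would be
    }
    last_updated_ = ctx->currentTime();
    return true;
  }

private:
  std::weak_ptr<FactoryContext> creator_ctx_;
  long last_updated_ = 0;
};

// Hypothetical singleton-like manager: one provider per route name,
// shared by every listener that asks for that name.
class ProviderManager {
public:
  std::shared_ptr<Provider> getOrCreate(const std::string& name,
                                        const std::shared_ptr<FactoryContext>& ctx) {
    auto it = providers_.find(name);
    if (it != providers_.end()) {
      if (auto existing = it->second.lock()) {
        return existing;  // the caller's own ctx is ignored here
      }
    }
    auto created = std::make_shared<Provider>(ctx);
    providers_[name] = created;
    return created;
  }

private:
  std::unordered_map<std::string, std::weak_ptr<Provider>> providers_;
};

// Hypothetical listener: owns its context and shares the provider.
struct Listener {
  std::shared_ptr<FactoryContext> context = std::make_shared<FactoryContext>();
  std::shared_ptr<Provider> provider;
};

struct DemoResult {
  bool before_teardown;
  bool after_teardown;
};

DemoResult runDemo() {
  ProviderManager manager;
  auto first = std::make_unique<Listener>();
  first->provider = manager.getOrCreate("route", first->context);

  Listener second;
  second.provider = manager.getOrCreate("route", second.context);
  assert(first->provider == second.provider);  // both listeners share one provider

  DemoResult result{};
  result.before_teardown = second.provider->onConfigUpdate();
  first.reset();  // the first listener and its context are destroyed
  result.after_teardown = second.provider->onConfigUpdate();
  return result;
}
```

In this model the update succeeds while the first listener is alive and fails once it is torn down, even though the second listener still holds the provider. That matches the hypothesis that the provider outlives the context it captured at creation time.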

Repro steps: None

Call Stack:

#0 0x0000000000d71379 in uw_frame_state_for () at src/base/logging.h:229
#1 0x0000000000d72fb8 in _Unwind_Backtrace () at src/base/logging.h:229
#2 0x0000000000ce46ac in backward::details::Unwinder<backward::StackTraceImpl<backward::system_tag::linux_tag>::callback>::operator() (depth=72,
f=..., this=0x7fa881f95a00) at external/com_github_bombela_backward/backward.hpp:647
#3 backward::details::unwind<backward::StackTraceImpl<backward::system_tag::linux_tag>::callback> (depth=72, f=...)
at external/com_github_bombela_backward/backward.hpp:688
#4 backward::StackTraceImpl<backward::system_tag::linux_tag>::load_here (this=this@entry=0x7fa881f95a60, depth=depth@entry=72)
at external/com_github_bombela_backward/backward.hpp:704
#5 0x0000000000ce5c65 in backward::StackTraceImpl<backward::system_tag::linux_tag>::load_from (depth=64, addr=0x55, this=0x7fa881f95a60)
at external/com_github_bombela_backward/backward.hpp:710
#6 Envoy::BackwardsTrace::captureFrom (address=0x55, this=0x7fa881f95a60)
at bazel-out/k8-opt/bin/external/envoy/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:66
#7 Envoy::SignalAction::sigHandler (sig=11, info=<optimized out>, context=<optimized out>) at external/envoy/source/exe/signal_action.cc:33
#8 <signal handler called>
#9 0x0000000000000055 in ?? ()
#10 0x00000000007c089d in Envoy::Router::RdsRouteConfigProviderImpl::onConfigUpdate (this=0x28e0c40, resources=..., version_info=...)
at external/envoy/source/common/router/rds_impl.cc:100
#11 0x00000000007c2188 in Envoy::Config::GrpcMuxSubscriptionImpl<envoy::api::v2::RouteConfiguration>::onConfigUpdate (this=0x300dc70,
resources=..., version_info=...)
at bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:55
#12 0x00000000008d2f3f in Envoy::Config::GrpcMuxImpl::onReceiveMessage(std::unique_ptr<envoy::api::v2::DiscoveryResponse, std::default_delete<envoy::api::v2::DiscoveryResponse> >&&) (this=0x28b67e0,
message=<unknown type in /home/qiwzhang/debug_proxy/sym/usr/local/bin/envoy, CU 0x8ac27a2, DIE 0x8b7e86d>)
at external/envoy/source/common/config/grpc_mux_impl.cc:217
#13 0x00000000008d06e3 in Envoy::Grpc::TypedAsyncStreamCallbacks<envoy::api::v2::DiscoveryResponse>::onReceiveMessageUntyped(std::unique_ptr<google::protobuf::Message, std::default_delete<google::protobuf::Message> >&&) (this=0x28b67e8, message=)
at bazel-out/k8-opt/bin/external/envoy/include/envoy/grpc/_virtual_includes/async_client_interface/envoy/grpc/async_client.h:172
#14 0x00000000008f8a66 in Envoy::Grpc::AsyncStreamImpl::onData (this=0x296d7c0, data=..., end_stream=)
at external/envoy/source/common/grpc/async_client_impl.cc:138
#15 0x00000000008fd73c in Envoy::Http::AsyncStreamImpl::encodeData (this=0x2949b00, data=..., end_stream=false)
at external/envoy/source/common/http/async_client_impl.cc:103
#16 0x0000000000932f54 in Envoy::Http::Http2::ConnectionImpl::onFrameReceived (this=0x2979c08, frame=0x2b8c198)
at external/envoy/source/common/http/http2/codec_impl.cc:426
#17 0x0000000000936bc7 in session_call_on_frame_received (frame=0x2b8c198, session=0x2b8c000) at nghttp2_session.c:3295
#18 nghttp2_session_on_data_received (session=session@entry=0x2b8c000, frame=frame@entry=0x2b8c198) at nghttp2_session.c:4941
#19 0x000000000093a942 in session_process_data_frame (session=0x2b8c000) at nghttp2_session.c:4961
#20 nghttp2_session_mem_recv (session=0x2b8c000, in=0x2ba4b67 "", inlen=13246) at nghttp2_session.c:6581
#21 0x0000000000932b1f in Envoy::Http::Http2::ConnectionImpl::dispatch (this=0x2979c08, data=...)
at external/envoy/source/common/http/http2/codec_impl.cc:285
#22 0x00000000008df32f in Envoy::Http::CodecClient::onData (this=0x2b881b0, data=...) at external/envoy/source/common/http/codec_client.cc:115
#23 0x00000000008df4ad in Envoy::Http::CodecClient::CodecReadFilter::onData (this=, data=...)
at bazel-out/k8-opt/bin/external/envoy/source/common/http/_virtual_includes/codec_client_lib/common/http/codec_client.h:162
#24 0x00000000007b0b97 in Envoy::Network::FilterManagerImpl::onContinueReading (this=this@entry=0x297bb98, filter=filter@entry=0x0)
at external/envoy/source/common/network/filter_manager_impl.cc:56
#25 0x00000000007b0c6c in Envoy::Network::FilterManagerImpl::onRead (this=this@entry=0x297bb98)
at external/envoy/source/common/network/filter_manager_impl.cc:66
#26 0x00000000007af21f in Envoy::Network::ConnectionImpl::onRead (read_buffer_size=13246, this=0x297bb80)
at external/envoy/source/common/network/connection_impl.cc:213
#27 Envoy::Network::ConnectionImpl::onReadReady (this=this@entry=0x297bb80) at external/envoy/source/common/network/connection_impl.cc:446
#28 0x00000000007afa0e in Envoy::Network::ConnectionImpl::onFileEvent (this=0x297bb80, events=3)
at external/envoy/source/common/network/connection_impl.cc:422
#29 0x00000000007a8e68 in std::function<void (unsigned int)>::operator()(unsigned int) const (__args#0=3, this=)
at /usr/include/c++/5/functional:2267
#30 Envoy::Event::FileEventImpl::<lambda(int, short int, void*)>::operator() (__closure=0x0, arg=, what=)
at external/envoy/source/common/event/file_event_impl.cc:60
#31 Envoy::Event::FileEventImpl::<lambda(int, short int, void*)>::_FUN(int, short, void *) ()
at external/envoy/source/common/event/file_event_impl.cc:61
#32 0x0000000000aa07e2 in event_persist_closure (ev=, base=0x28b82c0) at event.c:1580
#33 event_process_active_single_queue (base=base@entry=0x28b82c0, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0,
activeq=) at event.c:1639
#34 0x0000000000aa0f3f in event_process_active (base=0x28b82c0) at event.c:1738
#35 event_base_loop (base=0x28b82c0, flags=0) at event.c:1961
#36 0x00000000007795ee in Envoy::Server::InstanceImpl::run (this=0x286ec80) at external/envoy/source/server/server.cc:413
#37 0x0000000000572b81 in Envoy::MainCommonBase::run (this=this@entry=0x28a69a8) at external/envoy/source/exe/main_common.cc:83
#38 0x00000000004227d3 in Envoy::MainCommon::run (this=)
at bazel-out/k8-opt/bin/external/envoy/source/exe/_virtual_includes/envoy_main_common_lib/exe/main_common.h:44
#39 main (argc=18, argv=0x7ffc9a3284c8) at external/envoy/source/exe/main.cc:37

@lizan
Member

lizan commented Jul 25, 2018

provider_manager_ is a singleton instantiated here:
https://github.com/envoyproxy/envoy/blob/master/source/extensions/filters/network/http_connection_manager/config.cc#L67

That could be the case. Do you have a minimal setup to reproduce this?

@qiwzhang
Contributor Author

I will see if I can create a test to reproduce the crash.

@lizan
Member

lizan commented Jul 26, 2018

#3955 reproduces this consistently; I'm working on a fix.

PiotrSikora added a commit to PiotrSikora/envoy that referenced this issue Jul 26, 2018
This doesn't fix the underlying issue (that factory_context_ is being
referenced after being freed), but it stops Envoy from crashing.

*Risk Level*: Low
*Testing*: bazel test //test/... (new test crashing without this patch)
*Docs Changes*: n/a
*Release Notes*: n/a

Workaround for envoyproxy#3953.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
@PiotrSikora
Contributor

Thanks to @qiwzhang's test (#3955) and some bisecting, I can confirm that the regression was introduced in #3691 (commit a22d960).

I pushed a workaround in #3957 (including a minimal reproducible test with some comments, based on #3955), but it doesn't fix the underlying issue of factory_context_ being used after being freed.

@derekargueta
Member

Thanks for your work on this! We started experiencing the same issue.
