Compile again #6

Closed
bodleytunes opened this issue Nov 21, 2017 · 6 comments

Comments

@bodleytunes

Hi,

The latest version doesn't seem to want to compile.

I tried on Debian 8 and also Ubuntu 17.04; it stops in the same place on both.

The build fails at link time with an undefined reference into thrift:

Scanning dependencies of target openrlib
[ 50%] Building CXX object CMakeFiles/openrlib.dir/allocators/PrefixAllocator.cpp.o
[ 51%] Building CXX object CMakeFiles/openrlib.dir/common/KnownKeysStore.cpp.o
[ 51%] Building CXX object CMakeFiles/openrlib.dir/common/ExponentialBackoff.cpp.o
[ 52%] Building CXX object CMakeFiles/openrlib.dir/common/Util.cpp.o
[ 53%] Building CXX object CMakeFiles/openrlib.dir/common/Constants.cpp.o
[ 53%] Building CXX object CMakeFiles/openrlib.dir/config-store/PersistentStore.cpp.o
[ 54%] Building CXX object CMakeFiles/openrlib.dir/config-store/PersistentStoreClient.cpp.o
[ 54%] Building CXX object CMakeFiles/openrlib.dir/decision/Decision.cpp.o
[ 55%] Building CXX object CMakeFiles/openrlib.dir/fib/Fib.cpp.o
[ 56%] Building CXX object CMakeFiles/openrlib.dir/health-checker/HealthChecker.cpp.o
[ 56%] Building CXX object CMakeFiles/openrlib.dir/kvstore/KvStoreClient.cpp.o
[ 57%] Building CXX object CMakeFiles/openrlib.dir/kvstore/KvStore.cpp.o
[ 57%] Building CXX object CMakeFiles/openrlib.dir/kvstore/KvStoreWrapper.cpp.o
[ 58%] Building CXX object CMakeFiles/openrlib.dir/link-monitor/LinkMonitor.cpp.o
[ 59%] Building CXX object CMakeFiles/openrlib.dir/nl/NetlinkIfSocket.cpp.o
[ 59%] Building CXX object CMakeFiles/openrlib.dir/nl/NetlinkRouteSocket.cpp.o
[ 60%] Building CXX object CMakeFiles/openrlib.dir/nl/NetlinkSubscriber.cpp.o
[ 60%] Building CXX object CMakeFiles/openrlib.dir/platform/NetlinkFibHandler.cpp.o
[ 61%] Building CXX object CMakeFiles/openrlib.dir/platform/NetlinkSystemHandler.cpp.o
[ 62%] Building CXX object CMakeFiles/openrlib.dir/platform/PlatformPublisher.cpp.o
[ 62%] Building CXX object CMakeFiles/openrlib.dir/prefix-manager/PrefixManager.cpp.o
[ 63%] Building CXX object CMakeFiles/openrlib.dir/prefix-manager/PrefixManagerClient.cpp.o
[ 64%] Building CXX object CMakeFiles/openrlib.dir/spark/IoProvider.cpp.o
[ 64%] Building CXX object CMakeFiles/openrlib.dir/spark/SparkWrapper.cpp.o
[ 65%] Building CXX object CMakeFiles/openrlib.dir/spark/Spark.cpp.o
[ 65%] Linking CXX static library libopenrlib.a
[ 65%] Built target openrlib
Scanning dependencies of target fib_test
[ 66%] Building CXX object CMakeFiles/fib_test.dir/fib/tests/FibTest.cpp.o
[ 66%] Building CXX object CMakeFiles/fib_test.dir/fib/tests/MockNetlinkFibHandler.cpp.o
[ 67%] Linking CXX executable fib_test
/usr/local/lib/libthriftcpp2.so: undefined reference to `apache::thrift::RequestRpcMetadata::~RequestRpcMetadata()'
collect2: error: ld returned 1 exit status
CMakeFiles/fib_test.dir/build.make:138: recipe for target 'fib_test' failed
make[2]: *** [fib_test] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/fib_test.dir/all' failed
make[1]: *** [CMakeFiles/fib_test.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2
root@hub2-proxhub2-prox /home/jon/openr/build #


@cmpt-mnt

I had the same issue and error messages on Ubuntu 16.04...

@jstrizich
Contributor

Hi @bodleytunes, @cmpt-mnt: addressing this now. Should have a fix by tomorrow.

@bodleytunes
Author

OK cheers :)

@jstrizich
Contributor

jstrizich commented Nov 27, 2017

The fix has landed here, and build_openr.sh should build with this version of fbthrift or later.

@bodleytunes
Author

Hi

I just updated the repo with a git pull and tried to install again, but now I get this :(

Install the project...
-- Install configuration: "Release"
-- Up-to-date: /usr/local/lib/libmstch.a
-- Up-to-date: /usr/local/include/mstch/mstch.hpp
-- Up-to-date: /usr/local/lib/cmake/mstch/mstch-targets.cmake
-- Installing: /usr/local/lib/cmake/mstch/mstch-targets-release.cmake
-- Up-to-date: /usr/local/lib/cmake/mstch/mstch-config.cmake
-- Up-to-date: /usr/local/lib/cmake/mstch/mstch-config-version.cmake

+ ldconfig
+ popd
/home/jon/openr/build/deps
+ install_zstd
+ pushd .
/home/jon/openr/build/deps /home/jon/openr/build/deps
+ [[ ! -e zstd ]]
+ cd zstd
+ make
make[1]: Entering directory '/home/jon/openr/build/deps/zstd/lib'
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/home/jon/openr/build/deps/zstd/lib'
make[1]: Entering directory '/home/jon/openr/build/deps/zstd/programs'
make[1]: Nothing to be done for 'default'.
make[1]: Leaving directory '/home/jon/openr/build/deps/zstd/programs'
cp programs/zstd .
+ make install
make[1]: Entering directory '/home/jon/openr/build/deps/zstd/lib'
Installing libraries
Installing includes
zstd static and shared library installed
make[1]: Leaving directory '/home/jon/openr/build/deps/zstd/lib'
make[1]: Entering directory '/home/jon/openr/build/deps/zstd/programs'
Installing binaries
Installing man pages
zstd installation completed
make[1]: Leaving directory '/home/jon/openr/build/deps/zstd/programs'
+ ldconfig
+ popd
/home/jon/openr/build/deps
+ install_folly
+ pushd .
/home/jon/openr/build/deps /home/jon/openr/build/deps
+ [[ ! -e folly ]]
++ find_github_hash facebook/folly
++ [[ 1 -eq 1 ]]
++ rev_file=github_hashes/facebook/folly-rev.txt
++ [[ -f github_hashes/facebook/folly-rev.txt ]]
++ head -1 github_hashes/facebook/folly-rev.txt
++ awk '{ print $3 }'
+ rev=ee207f19892790e091082cc7c7ab8c5df0398061
+ cd folly/folly
+ [[ ! -z ee207f19892790e091082cc7c7ab8c5df0398061 ]]
+ git checkout ee207f19892790e091082cc7c7ab8c5df0398061
fatal: reference is not a tree: ee207f19892790e091082cc7c7ab8c5df0398061


@bodleytunes
Author

Sorry, it works now: I re-cloned from scratch and it builds perfectly.

Running tests...
Test project /home/jon/openr/build
Start 1: OpenrSystemTest
1/19 Test #1: OpenrSystemTest .................. Passed 80.68 sec
Start 2: PrefixAllocatorTest
2/19 Test #2: PrefixAllocatorTest .............. Passed 2.28 sec
Start 3: RangeAllocatorTest
3/19 Test #3: RangeAllocatorTest ............... Passed 2.42 sec
Start 4: ExponentialBackoffTest
4/19 Test #4: ExponentialBackoffTest ........... Passed 0.02 sec
Start 5: KnownKeysStoreTest
5/19 Test #5: KnownKeysStoreTest ............... Passed 0.05 sec
Start 6: UtilTest
6/19 Test #6: UtilTest ......................... Passed 0.03 sec
Start 7: PersistentStoreTest
7/19 Test #7: PersistentStoreTest .............. Passed 0.02 sec
Start 8: DecisionTest
8/19 Test #8: DecisionTest ..................... Passed 41.84 sec
Start 9: HealthCheckerTest
9/19 Test #9: HealthCheckerTest ................ Passed 10.02 sec
Start 10: KvStoreTest
10/19 Test #10: KvStoreTest ...................... Passed 6.76 sec
Start 11: KvStoreClientTest
11/19 Test #11: KvStoreClientTest ................ Passed 1.67 sec
Start 12: LinkMonitorTest
12/19 Test #12: LinkMonitorTest .................. Passed 39.09 sec
Start 13: FibTest
13/19 Test #13: FibTest .......................... Passed 11.12 sec
Start 14: NetlinkIfSocketTest
14/19 Test #14: NetlinkIfSocketTest .............. Passed 0.11 sec
Start 15: NetlinkRouteSocketTest
15/19 Test #15: NetlinkRouteSocketTest ........... Passed 28.53 sec
Start 16: NetlinkSubscriberTest
16/19 Test #16: NetlinkSubscriberTest ............ Passed 0.33 sec
Start 17: PrefixManagerTest
17/19 Test #17: PrefixManagerTest ................ Passed 0.47 sec
Start 18: SparkTest
18/19 Test #18: SparkTest ........................ Passed 13.98 sec
Start 19: MockIoProviderTest
19/19 Test #19: MockIoProviderTest ............... Passed 0.25 sec

100% tests passed, 0 tests failed out of 19

Total Test time (real) = 239.67 sec

+ popd
/home/jon/openr/build/deps
+ echo 'OpenR built and installed successfully'
OpenR built and installed successfully

Cheers!

Jon.

facebook-github-bot pushed a commit that referenced this issue Nov 27, 2017
Summary:
Fixes the issue raised [here](#6 (comment)).
If we pull a new version of openr, we can get into a state where we try to check out
a revision of another repo without fetching it first.

Reviewed By: saifhhasan

Differential Revision: D6414432

fbshipit-source-id: d9dbfce5691251b938da241bb339bccf18cdb365
facebook-github-bot pushed a commit that referenced this issue Feb 28, 2020
Summary:
There is a race condition when KvStoreWrapper and ZMQ background threads are
destroyed. This diff is an attempt to explicitly destroy KvStoreWrapper in TearDown.

Also use unique_ptr instead of shared_ptr.

```
WARNING: ThreadSanitizer: data race (pid=49594)
  Write of size 8 at 0x7ba000000220 by thread T1:
    #0 close <null> (link_monitor_test+0xdb1f6a)
    #1 zmq::signaler_t::~signaler_t() /home/engshare/third-party2/zeromq/4.3.1/src/zeromq-4.3.1/src/signaler.cpp:114:20 (link_monitor_test+0xbd1274)

  Previous read of size 8 at 0x7ba000000220 by main thread:
    #0 epoll_ctl <null> (link_monitor_test+0xd9bdef)
    #1 epoll_del /home/engshare/third-party2/libevent/1.4.14b_hphp/src/libevent-1.4.14b-stable/epoll.c:485 (link_monitor_test+0xec4253)
    #2 folly::EventBaseEvent::eb_event_del() <null> (link_monitor_test+0xa747ef)
    #3 folly::EventHandler::~EventHandler() <null> (link_monitor_test+0xa74a70)
    #4 std::_Hashtable<int, std::pair<int const, openr::OpenrEventBase::ZmqEventHandler>, std::allocator<std::pair<int const, openr::OpenrEventBase::ZmqEventHandler> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::~_Hashtable() <null> (link_monitor_test+0x219620)
    #5 openr::OpenrEventBase::~OpenrEventBase() <null> (link_monitor_test+0x212d8d)
    #6 openr::KvStore::~KvStore() <null> (link_monitor_test+0x1bfbb7)
    #7 openr::KvStore::~KvStore() <null> (link_monitor_test+0x1bfbe9)
    #8 openr::KvStoreWrapper::~KvStoreWrapper() <null> (link_monitor_test+0x117533)
    #9 std::_Sp_counted_ptr_inplace<openr::KvStoreWrapper, std::allocator<openr::KvStoreWrapper>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() <null> (link_monitor_test+0x116fbd)
    #10 LinkMonitorTestFixture::~LinkMonitorTestFixture() <null> (link_monitor_test+0xf8285)
    #11 LinkMonitorTestFixture_BasicOperation_Test::~LinkMonitorTestFixture_BasicOperation_Test() <null> (link_monitor_test+0xf7ee9)
    #12 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (link_monitor_test+0xc4b2ae)
    #13 __libc_start_main /home/engshare/third-party2/glibc/2.26/src/glibc-2.26/csu/libc-start.c:308:16 (libc.so.6+0x211a5)
```
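
The ownership change described in this summary can be sketched roughly as follows. This is a minimal illustration, not the actual test code: `Wrapper` and `Fixture` are hypothetical stand-ins for `KvStoreWrapper` and the gtest fixture, and the sketch avoids gtest and real threads so it stays self-contained. It only shows why a sole-owner `unique_ptr` reset in `TearDown` gives a deterministic destruction point, while a `shared_ptr` with a second owner does not.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for KvStoreWrapper. In the real code the destructor
// would stop background threads; here it only records that it ran.
class Wrapper {
 public:
  explicit Wrapper(std::vector<std::string>& log) : log_(log) {}
  ~Wrapper() { log_.push_back("wrapper destroyed"); }

 private:
  std::vector<std::string>& log_;
};

// Hypothetical fixture shaped like a gtest fixture (SetUp/TearDown).
struct Fixture {
  std::vector<std::string> log;
  std::unique_ptr<Wrapper> wrapper;  // sole owner, so reset() really destroys

  void SetUp() { wrapper = std::make_unique<Wrapper>(log); }
  void TearDown() {
    wrapper.reset();  // explicit, deterministic destruction point
    log.push_back("teardown done");
  }
};

// Runs SetUp/TearDown once and returns the recorded event order.
std::vector<std::string> runFixture() {
  Fixture f;
  f.SetUp();
  f.TearDown();
  return f.log;
}

// With shared_ptr, reset() on one handle does not destroy the object while
// another owner is still alive. Returns whether the destructor ran on reset().
bool sharedResetDestroys() {
  std::vector<std::string> log;
  auto first = std::make_shared<Wrapper>(log);
  auto second = first;  // a second owner
  first.reset();        // object stays alive because `second` still owns it
  return !log.empty();
}
```

In this sketch `runFixture()` records the wrapper's destruction before `TearDown` finishes, whereas `sharedResetDestroys()` reports that the `shared_ptr` reset left the object alive.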

Reviewed By: yi-xian

Differential Revision: D20148667

fbshipit-source-id: 71634c08df8ebbc98a6b8c4aa3329166834453f2
facebook-github-bot pushed a commit that referenced this issue Oct 12, 2020
Summary:
As titled, several different approaches have been added to address the original issue of a crash when the eventbase is destructed. See P135204939

```
(gdb) bt
#0  0x000000000187ea8e in re2::RE2::Set::Match (this=0x2, text=..., v=0x7fffd8c0d500, error_info=0x0) at re2/set.cc:110
#1  0x000000000072d304 in openr::KeyPrefix::keyMatch (this=<optimized out>, key=...) at openr/common/Util.cpp:50
#2  0x00000000006d3649 in openr::KvStoreFilters::keyMatch (this=0x7fb22c613660, key=..., value=...) at openr/kvstore/KvStore.cpp:69
#3  0x00000000006d60e4 in openr::KvStore::mergeKeyValues (kvStore=..., keyVals=..., filters=...) at openr/kvstore/KvStore.cpp:246
#4  0x00000000006e3813 in openr::KvStoreDb::mergePublication (this=0x7fb232fbad28, rcvdPublication=..., senderId=...)
    at openr/kvstore/KvStore.cpp:2835
#5  0x00000000006e32d7 in openr::KvStoreDb::processThriftSuccess (this=0x7fb232fbad28, peerName=..., pub=..., timeDelta=...)
    at openr/kvstore/KvStore.cpp:1395
#6  0x000000000070487b in openr::KvStoreDb::requestThriftPeerSync()::$_19::operator()(openr::thrift::Publication&&) const (
    this=<optimized out>, pub=...) at openr/kvstore/KvStore.cpp:1338
#7  folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}::operator()() const (this=<optimized out>)
    at folly/futures/Future-inl.h:99
#8  folly::futures::detail::InvokeResultWrapper<void>::wrapResult<folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}>(folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}) (fn=...) at folly/futures/Future-inl.h:91
#9  folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&) (t=..., f=...) at folly/futures/Future-inl.h:109
#10 folly::Future<openr::thrift::Publication>::thenValue<openr::KvStoreDb::requestThriftPeerSync()::$_19>(openr::KvStoreDb::requestThriftPeerSync()::$_19&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&)#1}::operator()(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&) (this=<optimized out>, t=...)
    at folly/futures/Future-inl.h:1033
#11 folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<openr::thrift::Publication>::thenValue<openr::KvStoreDb::requestThriftPeerSync()::$_19>(openr::KvStoreDb::requestThriftPeerSync()::$_19&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&)#1}>::invoke<folly::Executor::KeepAlive<folly::Executor>, folly::Try<openr::thrift::Publication> >(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&) (this=<optimized out>, args=..., args=...)
    at folly/futures/Future-inl.h:145
...
#19 0x00000000015d00ee in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7fb22c63b1f0) at folly/Function.h:416
#20 folly::EventBase::FunctionLoopCallback::runLoopCallback (this=0x7fb22c63b1c0) at folly/io/async/EventBase.h:188
#21 folly::EventBase::runLoopCallbacks (this=<optimized out>) at folly/io/async/EventBase.cpp:703
#22 folly::EventBase::loopBody (this=0x7fb23ecb9410, flags=1, ignoreKeepAlive=false) at folly/io/async/EventBase.cpp:402
#23 0x00000000015cdc60 in folly::EventBase::loopOnce (this=0x7fb23ecb9410, flags=0) at folly/io/async/EventBase.cpp:330
#24 folly::EventBase::~EventBase (this=0x7fb23ecb9410, vtt=<optimized out>) at folly/io/async/EventBase.cpp:211
#25 0x00000000006f284e in openr::KvStore::~KvStore (this=0x7fb23ecb9400) at openr/kvstore/KvStore.h:532
```

To fix this, we introduced a map to hold every individual future from the thrift client. However, this occasionally makes KvStore destruction hang while waiting for all futures to be fulfilled.

We should NOT track every individual future; that is NOT necessary at all.

From the crash trace, we are clearly calling `mergePublications()` from `processThriftPublication()` before checking whether the peer is valid or NOT.

Fix:
Skip the rest of the callback logic in `processThriftPublication()` if peerName is NOT valid.
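
A rough sketch of that guard, under stated assumptions: `Db`, `Peer`, and `onPeerResponse` are hypothetical, simplified stand-ins and not the real `KvStoreDb` types or signatures. The only point is that the callback looks the peer up first and returns early if it is gone, before touching any other state.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical, simplified peer record.
struct Peer {
  int syncCount = 0;
};

// Hypothetical, simplified store; not the real Open/R KvStoreDb.
struct Db {
  std::unordered_map<std::string, Peer> peers;
  std::map<std::string, std::string> store;

  // Callback run when a peer answers a sync request.
  // Returns the number of keys merged into the store.
  std::size_t onPeerResponse(const std::string& peerName,
                             const std::map<std::string, std::string>& pub) {
    auto it = peers.find(peerName);
    if (it == peers.end()) {
      return 0;  // peer was removed in the meantime: skip the rest
    }
    std::size_t merged = 0;
    for (const auto& kv : pub) {
      store[kv.first] = kv.second;
      ++merged;
    }
    ++it->second.syncCount;
    return merged;
  }
};
```

A response from a known peer is merged as usual; a late response that arrives after the peer has been erased leaves the store untouched.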

Reviewed By: saifhhasan

Differential Revision: D24262710

fbshipit-source-id: fa69aaa5c6e43cfc861de7431b9c1e26195684a0
facebook-github-bot pushed a commit that referenced this issue Jul 26, 2023
…_ destroyed

Summary:
# background
Open/R has been haunted by an unclean-exit issue carried over multiple oncall rotations, especially the notorious one in `KvStore`. The following gdb trace is from T158806075:
```
[xiangxu1121@devvm1867.nao0 ~/local/fbsource (470cb762a)]$ fboss-dbg-helper gdb fsw008.p062.f01.rva2
0) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58
Enter the file's number: 0
0) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/dogpile.unknown_tw_task.servicerouter.1583324.QUEUE_LAG.230715-025056 - Sat Jul 15 02:50:56 2023
1) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/dynocat - Fri Jul 21 15:05:59 2023
2) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/openr - Wed Jul 19 09:07:29 2023
Enter the file's number: 2
INFO:root:Received output from file command:
INFO:root:/home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/openr: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/openr --v=1 --vmodule=BgpSerializer*=1,FiberBgp*=2 --logging=DBG1;def', real uid: 36662, effective uid: 36662, real gid: 36337, effective gid: 36337, execfn: '/usr/sbin/openr', platform: 'x86_64'
(gdb) bt
#0  0x00000000035aa5b4 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::KvStorePeer>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::KvStorePeer> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
    at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/hashtable_policy.h:431
#1  0x00000000035baf56 in openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::processThriftFailure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, folly::basic_fbstring<char, std::char_traits<char>, std::allocator<char>, folly::fbstring_core<char> > const&, std::chrono::duration<long, std::ratio<1l, 1000l> >) ()
    at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/unordered_map.h:869
#2  0x00000000035c4fac in openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::requestThriftPeerSync()::{lambda(folly::exception_wrapper const&)#1}::operator()(folly::exception_wrapper const&) const ()
    at fbcode/openr/kvstore/KvStore-inl.h:1753
#3  0x00000000035c4d0c in _ZZNO5folly6FutureINS_4UnitEE13thenErrorImplIZN5openr9KvStoreDbIN6apache6thrift6ClientINS4_6thrift12OpenrCtrlCppEEEE21requestThriftPeerSyncEvEUlRKNS_17exception_wrapperEE_EENSt9enable_ifIXntsr20isFutureOrSemiFutureINS_13invoke_detail6traitsIT_E6resultISD_EEEE5valueES2_E4typeEOSK_NS_7futures6detail18InlineContinuationEENUlONS_8Executor9KeepAliveISU_EEONS_3TryIS1_EEE_clESX_S10_ () at fbcode/folly/futures/Future-inl.h:137
#4  0x00000000035c4bea in _ZN5folly6detail8function14FunctionTraitsIFvRNS_7futures6detail8CoreBaseEONS_8Executor9KeepAliveIS7_EEPNS_17exception_wrapperEEE7callBigIZNS4_4CoreINS_4UnitEE11setCallbackIZNOS_6FutureISH_E13thenErrorImplIZN5openr9KvStoreDbIN6apache6thrift6ClientINSN_6thrift12OpenrCtrlCppEEEE21requestThriftPeerSyncEvEUlRKSB_E_EENSt9enable_ifIXntsr20isFutureOrSemiFutureINS_13invoke_detail6traitsIT_E6resultISB_EEEE5valueESL_E4typeEOS12_NS4_18InlineContinuationEEUlSA_ONS_3TryISH_EEE_EEvS18_OSt10shared_ptrINS_14RequestContextEES19_EUlS6_SA_SC_E_EEvS6_SA_SC_RNS1_4DataE () at fbcode/folly/futures/detail/Core.h:619
warning: Could not find DWO CU buck-out/v2/gen/fbcode/ad98fc21e927c889/folly/futures/detail/__core__/__objects__/Core.cpp.o(0xb8563f2cc6214b1a) referenced by CU at offset 0x10ac73 [in module /data/users/xiangxu1121/fboss_dbg/packages/openr:325/openr.debuginfo]
#5  0x0000000002408dc9 in folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0::operator()(folly::Executor::KeepAlive<folly::Executor>&&) () at fbcode/folly/Function.h:375
warning: Could not find DWO CU buck-out/v2/gen/fbcode/ad98fc21e927c889/folly/io/async/__async_base__/__objects__/EventBase.cpp.o(0x7990ef623e52da16) referenced by CU at offset 0x1029d6 [in module /data/users/xiangxu1121/fboss_dbg/packages/openr:325/openr.debuginfo]
#6  0x000000000266571d in folly::EventBase::loopMain(int, bool) () at fbcode/folly/Function.h:375
#7  0x0000000002d20cc3 in folly::EventBase::loopOnce(int) () at fbcode/folly/io/async/EventBase.cpp:345
#8  0x0000000002542c12 in folly::EventBase::~EventBase() () at fbcode/folly/io/async/EventBase.cpp:195
#9  0x00000000035a420e in openr::KvStore<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::~KvStore() ()
    at fbcode/openr/kvstore/KvStore.h:629
#10 0x0000000003585a9e in main ()
    at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/unique_ptr.h:85
```
# RCA
At a high level, the issue happens in the following order:
- Step1: `KvStoreDb::stop()` destructs the `thriftPeers_` obj.
- Step2: `KvStore::stop()` invokes `OpenrEventBase::stop()`, which destructs `folly::EventBase`, which in turn calls `loopOnce` to clean up the callbacks injected into the eventbase.
- Step3: In the crash scenario, >1 callbacks are invoked (e.g. `processThriftSuccess` or `processThriftFailure`).
- Step4: Sample failures are:
   - `processThriftSuccess` P576623566
   - `processThriftFailure` P592005831

The callbacks being invoked come from:
   - `requestThriftPeerSync`
   - `keepAlive`

# Fix
The fix should be straightforward: cancel the scheduled callbacks on the `folly::AsyncTimeout` side via a `reset()` or `cancelTimeout()` call.

NOTE: for the keepAlive `AsyncTimeout`, we can consider removing it by leveraging the socket keepalive option, which would simplify this logic.
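
The cancel-before-destroy idea can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyLoop` and `ToyTimeout` are hypothetical classes, not `folly::EventBase` or `folly::AsyncTimeout`, and the toy `cancel()` merely stands in for the cancellation call mentioned above. The sketch mirrors one property from the RCA: the loop's destructor runs whatever callbacks are still pending.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy event loop (hypothetical). Its destructor drains pending callbacks,
// which is the behavior that triggers the crash in the RCA above.
class ToyLoop {
 public:
  // Queues a callback and returns a flag; setting it to false cancels it.
  std::shared_ptr<bool> schedule(std::function<void()> cb) {
    auto active = std::make_shared<bool>(true);
    pending_.emplace_back(active, std::move(cb));
    return active;
  }

  ~ToyLoop() {
    for (auto& entry : pending_) {
      if (*entry.first) {
        entry.second();  // only callbacks that were not cancelled run
      }
    }
  }

 private:
  std::vector<std::pair<std::shared_ptr<bool>, std::function<void()>>> pending_;
};

// Toy timeout (hypothetical) with an explicit cancel operation.
class ToyTimeout {
 public:
  void scheduleOn(ToyLoop& loop, std::function<void()> cb) {
    active_ = loop.schedule(std::move(cb));
  }
  void cancel() {
    if (active_) {
      *active_ = false;
    }
  }

 private:
  std::shared_ptr<bool> active_;
};

// Runs a shutdown sequence and reports how many callbacks fired while the
// loop was being destroyed.
int callbacksFiredAtShutdown(bool cancelFirst) {
  int fired = 0;
  {
    ToyLoop loop;
    ToyTimeout syncTimer;       // stands in for the peer-sync timer
    ToyTimeout keepAliveTimer;  // stands in for the keepAlive timer
    syncTimer.scheduleOn(loop, [&fired] { ++fired; });
    keepAliveTimer.scheduleOn(loop, [&fired] { ++fired; });
    if (cancelFirst) {
      syncTimer.cancel();
      keepAliveTimer.cancel();
    }
  }  // timers are destroyed first, then ~ToyLoop drains what is left
  return fired;
}
```

Without cancellation both callbacks fire during loop destruction; with cancellation none do, so nothing can touch state that was already torn down.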

Reviewed By: TangoRoxy

Differential Revision: D47691545

fbshipit-source-id: 67108399a7ce9f96d08682d45d03799e6ef608fd