Docker Build Scripts not working OOB #3
Comments
Maybe the problem is the GCC version on Ubuntu 16.04; it may be too new.
Hi @remingtonc: Apologies for the misinformation in the top-level README. Docker builds actually require a bit more than just running that script. Can you take a look at build/fbcode_builder/README.md? That script is meant only for building on a Travis CI host. If you're trying to build with Docker locally, you will need to generate the Docker context locally and point the build script at it by setting the appropriate variables. There is still a minor error in that script that we should have a fix for soon. In the meantime, if you can, I would suggest just building in your standard environment. You can run build/build_openr.sh to build and install all dependencies plus openr on your machine. This works well on Ubuntu 16.04.
Summary: This should fix the Travis CI builds. It adds Rust toolchain support inside Docker and sets the required THRIFT env variable.
Pull Request resolved: facebookexperimental/rust-shed#3
Reviewed By: krallin
Differential Revision: D18905608
Pulled By: lukaspiatkowski
fbshipit-source-id: 5db1eff6f215a6617d8acaa0c99a62d45225956b
Summary: There is a race condition when KvStoreWrapper and ZMQ background threads are destroyed. This diff is an attempt to explicitly destroy KvStoreWrapper in TearDown. Also use unique_ptr instead of shared_ptr.
```
WARNING: ThreadSanitizer: data race (pid=49594)
  Write of size 8 at 0x7ba000000220 by thread T1:
    #0 close <null> (link_monitor_test+0xdb1f6a)
    #1 zmq::signaler_t::~signaler_t() /home/engshare/third-party2/zeromq/4.3.1/src/zeromq-4.3.1/src/signaler.cpp:114:20 (link_monitor_test+0xbd1274)
  Previous read of size 8 at 0x7ba000000220 by main thread:
    #0 epoll_ctl <null> (link_monitor_test+0xd9bdef)
    #1 epoll_del /home/engshare/third-party2/libevent/1.4.14b_hphp/src/libevent-1.4.14b-stable/epoll.c:485 (link_monitor_test+0xec4253)
    #2 folly::EventBaseEvent::eb_event_del() <null> (link_monitor_test+0xa747ef)
    #3 folly::EventHandler::~EventHandler() <null> (link_monitor_test+0xa74a70)
    #4 std::_Hashtable<int, std::pair<int const, openr::OpenrEventBase::ZmqEventHandler>, std::allocator<std::pair<int const, openr::OpenrEventBase::ZmqEventHandler> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::~_Hashtable() <null> (link_monitor_test+0x219620)
    #5 openr::OpenrEventBase::~OpenrEventBase() <null> (link_monitor_test+0x212d8d)
    #6 openr::KvStore::~KvStore() <null> (link_monitor_test+0x1bfbb7)
    #7 openr::KvStore::~KvStore() <null> (link_monitor_test+0x1bfbe9)
    #8 openr::KvStoreWrapper::~KvStoreWrapper() <null> (link_monitor_test+0x117533)
    #9 std::_Sp_counted_ptr_inplace<openr::KvStoreWrapper, std::allocator<openr::KvStoreWrapper>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() <null> (link_monitor_test+0x116fbd)
    #10 LinkMonitorTestFixture::~LinkMonitorTestFixture() <null> (link_monitor_test+0xf8285)
    #11 LinkMonitorTestFixture_BasicOperation_Test::~LinkMonitorTestFixture_BasicOperation_Test() <null> (link_monitor_test+0xf7ee9)
    #12 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) <null> (link_monitor_test+0xc4b2ae)
    #13 __libc_start_main /home/engshare/third-party2/glibc/2.26/src/glibc-2.26/csu/libc-start.c:308:16 (libc.so.6+0x211a5)
```
Reviewed By: yi-xian
Differential Revision: D20148667
fbshipit-source-id: 71634c08df8ebbc98a6b8c4aa3329166834453f2
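The idea in the commit above (own the wrapper through a `unique_ptr` and release it explicitly in `TearDown`, rather than leaving it to the fixture destructor) can be sketched without gtest or Open/R. `FakeStoreWrapper`, `FakeFixture`, and `runFixtureLifecycle` are hypothetical names used only for illustration; the sketch shows the destruction order, not the actual test code.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an object that owns background threads.
class FakeStoreWrapper {
 public:
  explicit FakeStoreWrapper(std::vector<std::string>& log) : log_(log) {
    log_.push_back("wrapper created");
  }
  ~FakeStoreWrapper() { log_.push_back("wrapper destroyed"); }

 private:
  std::vector<std::string>& log_;
};

// Hypothetical fixture mirroring the SetUp/TearDown shape of a test fixture.
class FakeFixture {
 public:
  explicit FakeFixture(std::vector<std::string>& log) : log_(log) {}
  ~FakeFixture() { log_.push_back("fixture destroyed"); }

  void SetUp() { wrapper_ = std::make_unique<FakeStoreWrapper>(log_); }

  void TearDown() {
    // Destroy the wrapper at a known point, before any other teardown work
    // and before the fixture's own destructor runs.
    wrapper_.reset();
    log_.push_back("teardown finished");
  }

 private:
  std::vector<std::string>& log_;
  std::unique_ptr<FakeStoreWrapper> wrapper_;  // sole owner, no shared_ptr
};

// Runs one SetUp/TearDown cycle and returns the recorded event order.
std::vector<std::string> runFixtureLifecycle() {
  std::vector<std::string> log;
  {
    FakeFixture fixture(log);
    fixture.SetUp();
    fixture.TearDown();
  }
  return log;
}
```

With sole ownership, the point of destruction is decided by `TearDown` alone; with a `shared_ptr`, any remaining copy could postpone it.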
Summary:
- replace ZMQTimeout with folly::AsyncTimeout
- fix the flaky UT RangeAllocatorFixture.InsufficentRange:

## #1 issue:
- `heap-use-after-free` issue: P132339616 (see the `cb->secondAddress: 0x30`). The allocator callback is destroyed before it is called again.
- FIX: add the `runImmediatelyOrRunInEventBaseThreadAndWait` to `~RangeAllocator()`.

## #2 issue:
Flaky: `post()` could be called on a `folly::Baton` more than once: P132813435
```
headers/folly/synchronization/Baton.h:139: void folly::Baton<true, std::atomic>::post() [MayBlock = true, Atom = std::atomic]: Assertion `before == INIT || before == WAITING || before == TIMED_OUT' failed.
```
- post() could be called from:
  - `KvStoreClientInternal::processPublication()` callback: https://fburl.com/diffusion/liktp9pv
  - `RangeAllocator<T>::keyValUpdated()` callback: https://fburl.com/diffusion/iw0l201i
- FIX: add a `baton_post_once` lambda function to the UT to make sure that, for each lifetime of a Baton, there is at most one call to post().

## #3 issue:
Flaky/Race: `KvStoreClientInternal::dumpAllWithPrefix()` was called before all the keys were set, so `isRangeConsumed()` got fewer keys than expected. It happened 7 times with `stress-runs=20000`:
```
openr/allocators/tests/RangeAllocatorTest.cpp:391: Failure
Value of: allocators.front()->isRangeConsumed()
  Actual: false
Expected: true
...
✓ Pass: openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/1 (0.255)
✓ Pass: openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/1 (0.291)
✓ Pass: openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/1 (0.312)
✓ Pass: openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/1 (0.336)
Summary
  Pass: 19992
  Fail: 7
✗ openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/0
✗ openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/0
✗ openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/0
✗ openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/0
✗ openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/1
✗ openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/0
✗ openr/allocators/tests:range_allocator_test - RangeAllocatorInstance/RangeAllocatorFixture.InsufficentRange/0
log: https://www.internalfb.com/intern/testinfra/testconsole/testrun/1688849891490018/
```
- failure log analysis with 5 KvStoreClients: P132822741

The last `setKey()` was called at:
```
11:57:44.401295 565757 KvStoreClientInternal.cpp:321] KvStoreClientInternal: setKey called for key value:63
```
1.559ms later, `isRangeConsumed()` called `dumpAllWithPrefix()` to get and iterate over all prefixes:
```
11:57:44.402854 565742 RangeAllocator-inl.h:176] val: 62, allocRange_.first ,61 allocRange_.second64
```
I think `KvStore::setKvStoreKeyVals()` for that key was still in progress, so the dump() could not get all the keys.
- FIX: `isRangeConsumed()` is called right after `Baton.wait()`, so wait for 500ms before calling `Baton.post()`.

Reviewed By: saifhhasan
Differential Revision: D21933504
fbshipit-source-id: c55b2561ce986778f41ad6a76deac25ea04589c1
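The `baton_post_once` idea from issue #2 above can be approximated with the standard library alone. The following is a minimal sketch, not the code from the diff: `makePostOnce` is a hypothetical helper, and the wrapped callable stands in for `folly::Baton::post()`. An atomic flag ensures that only the first caller forwards to the underlying callable, even if two callbacks race.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical helper: wraps `post` so that, however many times the returned
// callable is invoked (possibly from different threads), `post` itself runs
// at most once.
std::function<void()> makePostOnce(std::function<void()> post) {
  auto posted = std::make_shared<std::atomic<bool>>(false);
  return [posted, post = std::move(post)]() {
    // exchange() returns the previous value; only the first caller sees false.
    if (!posted->exchange(true)) {
      post();
    }
  };
}
```

A fresh wrapper would be created for each lifetime of the baton, so that a reset baton can be posted again.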
Summary: As titled, several different approaches have been added to address original issue of crash when eventbase is destructed. See P135204939 ``` (gdb) bt #0 0x000000000187ea8e in re2::RE2::Set::Match (this=0x2, text=..., v=0x7fffd8c0d500, error_info=0x0) at re2/set.cc:110 #1 0x000000000072d304 in openr::KeyPrefix::keyMatch (this=<optimized out>, key=...) at openr/common/Util.cpp:50 #2 0x00000000006d3649 in openr::KvStoreFilters::keyMatch (this=0x7fb22c613660, key=..., value=...) at openr/kvstore/KvStore.cpp:69 #3 0x00000000006d60e4 in openr::KvStore::mergeKeyValues (kvStore=..., keyVals=..., filters=...) at openr/kvstore/KvStore.cpp:246 #4 0x00000000006e3813 in openr::KvStoreDb::mergePublication (this=0x7fb232fbad28, rcvdPublication=..., senderId=...) at openr/kvstore/KvStore.cpp:2835 #5 0x00000000006e32d7 in openr::KvStoreDb::processThriftSuccess (this=0x7fb232fbad28, peerName=..., pub=..., timeDelta=...) at openr/kvstore/KvStore.cpp:1395 #6 0x000000000070487b in openr::KvStoreDb::requestThriftPeerSync()::$_19::operator()(openr::thrift::Publication&&) const ( this=<optimized out>, pub=...) at openr/kvstore/KvStore.cpp:1338 #7 folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}::operator()() const (this=<optimized out>) at folly/futures/Future-inl.h:99 #8 folly::futures::detail::InvokeResultWrapper<void>::wrapResult<folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}>(folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&)::{lambda()#1}) (fn=...) 
at folly/futures/Future-inl.h:91 #9 folly::futures::detail::wrapInvoke<openr::thrift::Publication, openr::KvStoreDb::requestThriftPeerSync()::$_19>(folly::Try<openr::thrift::Publication>&&, openr::KvStoreDb::requestThriftPeerSync()::$_19&&) (t=..., f=...) at folly/futures/Future-inl.h:109 #10 folly::Future<openr::thrift::Publication>::thenValue<openr::KvStoreDb::requestThriftPeerSync()::$_19>(openr::KvStoreDb::requestThriftPeerSync()::$_19&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&)#1}::operator()(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&) (this=<optimized out>, t=...) at folly/futures/Future-inl.h:1033 #11 folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<openr::thrift::Publication>::thenValue<openr::KvStoreDb::requestThriftPeerSync()::$_19>(openr::KvStoreDb::requestThriftPeerSync()::$_19&&) &&::{lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&)#1}>::invoke<folly::Executor::KeepAlive<folly::Executor>, folly::Try<openr::thrift::Publication> >(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<openr::thrift::Publication>&&) (this=<optimized out>, args=..., args=...) at folly/futures/Future-inl.h:145 ... 
#19 0x00000000015d00ee in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7fb22c63b1f0) at folly/Function.h:416
#20 folly::EventBase::FunctionLoopCallback::runLoopCallback (this=0x7fb22c63b1c0) at folly/io/async/EventBase.h:188
#21 folly::EventBase::runLoopCallbacks (this=<optimized out>) at folly/io/async/EventBase.cpp:703
#22 folly::EventBase::loopBody (this=0x7fb23ecb9410, flags=1, ignoreKeepAlive=false) at folly/io/async/EventBase.cpp:402
#23 0x00000000015cdc60 in folly::EventBase::loopOnce (this=0x7fb23ecb9410, flags=0) at folly/io/async/EventBase.cpp:330
#24 folly::EventBase::~EventBase (this=0x7fb23ecb9410, vtt=<optimized out>) at folly/io/async/EventBase.cpp:211
#25 0x00000000006f284e in openr::KvStore::~KvStore (this=0x7fb23ecb9400) at openr/kvstore/KvStore.h:532
```
To fix this, we introduced a map to hold every individual future from the thrift client. However, this occasionally makes KvStore destruction hang while waiting for all futures to be fulfilled. We should NOT track every individual future; that is NOT necessary at all. From the crash trace, we are clearly doing `mergePublications()` when invoking `processThriftPublication()` before checking whether the peer is valid.

Fix: Skip the rest of the `processThriftPublication()` callback logic if peerName is NOT valid.

Reviewed By: saifhhasan
Differential Revision: D24262710
fbshipit-source-id: fa69aaa5c6e43cfc861de7431b9c1e26195684a0
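The fix described above amounts to an early return in the response callback when the peer is no longer known. A minimal sketch of that guard follows, assuming a hypothetical `FakePeerDb` with a plain map of peers; it is not the Open/R `KvStoreDb` code, and `onSyncResponse` only stands in for the thrift success callback.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, simplified peer database used only to illustrate the guard.
struct FakePeerDb {
  std::unordered_map<std::string, int> peers;  // peerName -> opaque state
  int mergedPublications = 0;

  // Stand-in for the sync-response callback. Returns true if the publication
  // was merged, false if it was ignored because the peer is no longer valid.
  bool onSyncResponse(const std::string& peerName) {
    if (peers.find(peerName) == peers.end()) {
      // The peer was removed (or the store is shutting down) while the
      // request was in flight: skip the rest of the callback.
      return false;
    }
    ++mergedPublications;  // placeholder for the real merge work
    return true;
  }
};
```

The check costs one lookup per response and avoids tracking every outstanding future.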
…_ destroyed Summary: # background Open/R has been haunted with unclean exit issue carried over multiple oncall rotations. Especially the notorious one in `KvStore`, like the following gdb traces happened inside T158806075 ``` [xiangxu1121@devvm1867.nao0 ~/local/fbsource (470cb762a)]$ fboss-dbg-helper gdb fsw008.p062.f01.rva2 0) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58 Enter the file's number: 0 0) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/dogpile.unknown_tw_task.servicerouter.1583324.QUEUE_LAG.230715-025056 - Sat Jul 15 02:50:56 2023 1) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/dynocat - Fri Jul 21 15:05:59 2023 2) /home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/openr - Wed Jul 19 09:07:29 2023Enter the file's number: 2 INFO:root:Received output from file command: INFO:root:/home/xiangxu1121/local/fboss_dbg/fsw008.p062.f01.rva2/snapshot_2023-07-21-15:42:58/cores/openr: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/openr --v=1 --vmodule=BgpSerializer*=1,FiberBgp*=2 --logging=DBG1;def', real uid: 36662, effective uid: 36662, real gid: 36337, effective gid: 36337, execfn: '/usr/sbin/openr', platform: 'x86_64' (gdb) bt #0 0x00000000035aa5b4 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::KvStorePeer>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::KvStorePeer> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, 
std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::find(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/hashtable_policy.h:431 #1 0x00000000035baf56 in openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::processThriftFailure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, folly::basic_fbstring<char, std::char_traits<char>, std::allocator<char>, folly::fbstring_core<char> > const&, std::chrono::duration<long, std::ratio<1l, 1000l> >) () at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/unordered_map.h:869 #2 0x00000000035c4fac in openr::KvStoreDb<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::requestThriftPeerSync()::{lambda(folly::exception_wrapper const&)#1}::operator()(folly::exception_wrapper const&) const () at fbcode/openr/kvstore/KvStore-inl.h:1753 #3 0x00000000035c4d0c in _ZZNO5folly6FutureINS_4UnitEE13thenErrorImplIZN5openr9KvStoreDbIN6apache6thrift6ClientINS4_6thrift12OpenrCtrlCppEEEE21requestThriftPeerSyncEvEUlRKNS_17exception_wrapperEE_EENSt9enable_ifIXntsr20isFutureOrSemiFutureINS_13invoke_detail6traitsIT_E6resultISD_EEEE5valueES2_E4typeEOSK_NS_7futures6detail18InlineContinuationEENUlONS_8Executor9KeepAliveISU_EEONS_3TryIS1_EEE_clESX_S10_ () at fbcode/folly/futures/Future-inl.h:137 #4 0x00000000035c4bea in 
_ZN5folly6detail8function14FunctionTraitsIFvRNS_7futures6detail8CoreBaseEONS_8Executor9KeepAliveIS7_EEPNS_17exception_wrapperEEE7callBigIZNS4_4CoreINS_4UnitEE11setCallbackIZNOS_6FutureISH_E13thenErrorImplIZN5openr9KvStoreDbIN6apache6thrift6ClientINSN_6thrift12OpenrCtrlCppEEEE21requestThriftPeerSyncEvEUlRKSB_E_EENSt9enable_ifIXntsr20isFutureOrSemiFutureINS_13invoke_detail6traitsIT_E6resultISB_EEEE5valueESL_E4typeEOS12_NS4_18InlineContinuationEEUlSA_ONS_3TryISH_EEE_EEvS18_OSt10shared_ptrINS_14RequestContextEES19_EUlS6_SA_SC_E_EEvS6_SA_SC_RNS1_4DataE () at fbcode/folly/futures/detail/Core.h:619 warning: Could not find DWO CU buck-out/v2/gen/fbcode/ad98fc21e927c889/folly/futures/detail/__core__/__objects__/Core.cpp.o(0xb8563f2cc6214b1a) referenced by CU at offset 0x10ac73 [in module /data/users/xiangxu1121/fboss_dbg/packages/openr:325/openr.debuginfo] #5 0x0000000002408dc9 in folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0::operator()(folly::Executor::KeepAlive<folly::Executor>&&) () at fbcode/folly/Function.h:375 warning: Could not find DWO CU buck-out/v2/gen/fbcode/ad98fc21e927c889/folly/io/async/__async_base__/__objects__/EventBase.cpp.o(0x7990ef623e52da16) referenced by CU at offset 0x1029d6 [in module /data/users/xiangxu1121/fboss_dbg/packages/openr:325/openr.debuginfo] #6 0x000000000266571d in folly::EventBase::loopMain(int, bool) () at fbcode/folly/Function.h:375 #7 0x0000000002d20cc3 in folly::EventBase::loopOnce(int) () at fbcode/folly/io/async/EventBase.cpp:345 #8 0x0000000002542c12 in folly::EventBase::~EventBase() () at fbcode/folly/io/async/EventBase.cpp:195 #9 0x00000000035a420e in openr::KvStore<apache::thrift::Client<openr::thrift::OpenrCtrlCpp> >::~KvStore() () at fbcode/openr/kvstore/KvStore.h:629 #10 0x0000000003585a9e in main () at ./fbcode/third-party-buck/platform010-compat/build/libgcc/include/c++/trunk/bits/unique_ptr.h:85 ``` # RCA High level speaking, the issue 
happens in the following order:
- Step1: `KvStoreDb::stop()` destructs the `thriftPeers_` obj.
- Step2: `KvStore::stop()` invokes `OpenrEventBase::stop()`, which destructs `folly::EventBase`, which calls `loopOnce` to clean up the callbacks injected into the eventbase.
- Step3: In the crash scenario, there are >1 callbacks being invoked (e.g. `processThriftSuccess` or `processThriftFailure`).
- Step4: Sample failures are:
  - `processThriftSuccess` P576623566
  - `processThriftFailure` P592005831

We can see there are callbacks being invoked with:
- `requestThriftPeerSync`
- `keepAlive`

# Fix
The fix should be straightforward: cancel the scheduled cb from the `folly::AsyncTimeout` perspective via a `reset()` or `cancelTimeout()` call.

NOTE: for the keepAlive `AsyncTimeout`, we can consider removing it by leveraging the socket keepalive option to simplify this logic.

Reviewed By: TangoRoxy
Differential Revision: D47691545
fbshipit-source-id: 67108399a7ce9f96d08682d45d03799e6ef608fd
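The ordering problem in the RCA above (state destroyed first, pending callbacks drained afterwards) and the cancel-before-destroy fix can be sketched with a toy event loop. `FakeLoop` and `FakeStore` are hypothetical and use only the standard library; the cancellation token here merely plays the role that `folly::AsyncTimeout::cancelTimeout()` plays in the commit.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal loop: each queued callback carries a cancellation
// token and runs from drain() only if the token is still active.
class FakeLoop {
 public:
  using Token = std::shared_ptr<bool>;

  Token schedule(std::function<void()> cb) {
    auto active = std::make_shared<bool>(true);
    pending_.push_back(Entry{active, std::move(cb)});
    return active;
  }

  static void cancel(const Token& token) {
    if (token) *token = false;
  }

  // Runs all still-active callbacks once and returns how many ran.
  int drain() {
    std::vector<Entry> batch;
    batch.swap(pending_);
    int ran = 0;
    for (auto& entry : batch) {
      if (*entry.active) {
        entry.cb();
        ++ran;
      }
    }
    return ran;
  }

 private:
  struct Entry {
    Token active;
    std::function<void()> cb;
  };
  std::vector<Entry> pending_;
};

// Hypothetical store whose scheduled callback touches peer state.
class FakeStore {
 public:
  explicit FakeStore(FakeLoop& loop)
      : peers_(std::make_unique<std::vector<int>>(3, 0)) {
    syncTimer_ = loop.schedule([this] { touched_ += peers_->size(); });
  }

  void stop() {
    FakeLoop::cancel(syncTimer_);  // cancel first ...
    peers_.reset();                // ... then release the state it would use
  }

  std::size_t touched() const { return touched_; }

 private:
  std::unique_ptr<std::vector<int>> peers_;
  FakeLoop::Token syncTimer_;
  std::size_t touched_ = 0;
};
```

After `stop()`, a final `drain()` finds nothing left to run, so no callback can observe the released peer state.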
Supplied build instructions/script via Docker do not work out of the box on Ubuntu 16.04.
Initially Travis was not installed. I installed the Travis gem thinking that might be the issue, but it did not remedy the problem. As someone without much Travis experience, I would find some build clarification incredibly helpful.
Are there extra configuration steps required before script execution?
Build commands in the fbcode_builder do work, however.