-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Channel names should be required to start with / #5
Comments
frc971-automation
pushed a commit
that referenced
this issue
Aug 17, 2023
This patch came out of trying to make `message_bridge_test` less flaky. The test currently fails occasionally without an error message. It just stops. When run under `gdb`, we see the following: ... [New Thread 0x7fffeda496c0 (LWP 59)] I20230806 21:23:50.636907 59 message_bridge_test_lib.cc:28] Started pong [New Thread 0x7fffed3e06c0 (LWP 60)] I20230806 21:23:50.648432 60 message_bridge_test_lib.cc:28] Started ping [New Thread 0x7fffecd776c0 (LWP 61)] I20230806 21:23:50.658921 61 message_bridge_test_lib.cc:28] Started pi1_message_bridge_server I20230806 21:23:50.702045 61 shm_event_loop.cc:1026] Setting priority to 1 [New Thread 0x7fffec70e6c0 (LWP 62)] I20230806 21:23:50.705500 62 message_bridge_test_lib.cc:28] Started pi1_message_bridge_client I20230806 21:23:50.718152 62 shm_event_loop.cc:1026] Setting priority to 1 [New Thread 0x7fffec0a56c0 (LWP 63)] I20230806 21:23:50.724790 63 message_bridge_test_lib.cc:28] Started pi2_message_bridge_client I20230806 21:23:50.735529 63 shm_event_loop.cc:1026] Setting priority to 1 [New Thread 0x7fffeba3c6c0 (LWP 64)] I20230806 21:23:50.739696 64 message_bridge_test_lib.cc:28] Started pi2_message_bridge_server I20230806 21:23:50.751094 64 shm_event_loop.cc:1026] Setting priority to 1 Thread 3 "ping" received signal SIG36, Real-time event 36. [Switching to Thread 0x7fffed3e06c0 (LWP 60)] __GI___pthread_sigmask (how=1, newmask=<optimized out>, oldmask=0x7fffed3de2f8) at ./nptl/pthread_sigmask.c:43 43 ./nptl/pthread_sigmask.c: No such file or directory. #0 __GI___pthread_sigmask (how=1, newmask=<optimized out>, oldmask=0x7fffed3de2f8) at ./nptl/pthread_sigmask.c:43 #1 0x00007ffff530c503 in aos::ipc_lib::(anonymous namespace)::wrapped_pthread_sigmask (how=1, set=0xb2cf38, oldset=0x7fffed3de2f8) at aos/ipc_lib/signalfd.cc:65 #2 0x00007ffff530c558 in aos::ipc_lib::SignalFd::~SignalFd (this=0xb2cf30) at aos/ipc_lib/signalfd.cc:113 #3 0x00007ffff58b2c9b in std::default_delete<aos::ipc_lib::SignalFd>::operator() (this=0x7fffffffd4e0, __ptr=0xb2cf30) at external/amd64_debian_sysroot//usr/include/c++/10/bits/unique_ptr.h:85 #4 0x00007ffff58b473c in std::__uniq_ptr_impl<aos::ipc_lib::SignalFd, std::default_delete<aos::ipc_lib::SignalFd> >::reset (this=0x7fffffffd4e0, __p=0x0) at external/amd64_debian_sysroot//usr/include/c++/10/bits/unique_ptr.h:182 #5 0x00007ffff58a535d in std::unique_ptr<aos::ipc_lib::SignalFd, std::default_delete<aos::ipc_lib::SignalFd> >::reset (this=0x7fffffffd4e0, __p=0x0) at external/amd64_debian_sysroot//usr/include/c++/10/bits/unique_ptr.h:456 #6 0x00007ffff58a213e in aos::ShmEventLoop::Run (this=0x7fffffffd200) at aos/events/shm_event_loop.cc:1063 #7 0x00007ffff7eb3980 in aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0::operator()() const ( this=0xaef998) at aos/network/message_bridge_test_lib.cc:30 #8 0x00007ffff7eb382d in std::__invoke_impl<void, aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0>(std::__invoke_other, aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0&&) (__f=...) at external/amd64_debian_sysroot//usr/include/c++/10/bits/invoke.h:60 #9 0x00007ffff7eb37bd in std::__invoke<aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0>(aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0&&) (__fn=...) at external/amd64_debian_sysroot//usr/include/c++/10/bits/invoke.h:95 #10 0x00007ffff7eb3795 in std::thread::_Invoker<std::tuple<aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0xaef998) at external/amd64_debian_sysroot//usr/include/c++/10/thread:264 #11 0x00007ffff7eb3765 in std::thread::_Invoker<std::tuple<aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0> >::operator()() (this=0xaef998) at external/amd64_debian_sysroot//usr/include/c++/10/thread:271 #12 0x00007ffff7eb36c9 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<aos::message_bridge::testing::ThreadedEventLoopRunner::ThreadedEventLoopRunner(aos::ShmEventLoop*)::$_0> > >::_M_run() (this=0xaef990) at external/amd64_debian_sysroot//usr/include/c++/10/thread:215 #13 0x00007ffff7cd44a3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #14 0x00007ffff48a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 #15 0x00007ffff49285fc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 Sometimes, as soon as the `ping` thread starts exiting, a wakeup signal arrives. That causes the default signal handler to be called. Which for SIGRT2 is to terminate the program. For most applications that's probably fine in the grand scheme of things, but it's a bit annoying for tests. Especially for `message_bridge_test` which has multiple threads with watchers. I am not 100% convinced that the issue is happening because of a race condition in `lockless_queue.cc`, but it seems the most likely candidate. Quoted from the `LocklessQueueWakeUpper::Wakeup` function: // Grab a copy so it won't change out from underneath us, and we can sort it // nicely in C++. // Do note that there is still a window where the process can die *after* we // read everything. We will still PI boost and send a signal to the thread in // question. There is no way without pidfd's to close this window, and // creating a pidfd is likely not RT. The way I'm reading this is that there's a window between the sender grabbing a copy of the watchers and the watcher unblocking the signal. That window allows for the receiver to be killed before it's finished tearing down. Rough sequence of events: 1. The sender reads the list of watchers from SHMEM. 2. The reader shuts down and unblocks the wakeup signal. 3. The sender sends the wakeup signal to all watchers. 4. The reader invokes the default signal handler for the wakeup signal. By default this causes the process to exit uncleanly immediately. Before this patch, I saw 20-30 tests out of 1000 fail due to this error. After this patch, there are no failures of this kind. Change-Id: I12a18e2977b50b9cb33334373e131ae9774e7579 Signed-off-by: Philipp Schrader <philipp.schrader@gmail.com>
frc971-automation
pushed a commit
that referenced
this issue
Feb 17, 2024
We're currently seeing the following undefined behaviour in the code that converts `realtime_clock::time_point` values into human-readable strings. The issue is that the computation overflows what we can represent in 64 bits when stringifying `realtime_clock::min_time`. $ bazel test --config=ubsan @aos//aos/events/logging:logfile_utils_test --test_filter=PartsMessageReaderDeathTest.TooFarOutOfOrder --test_output=streamed --test_env=UBSAN_SYMBOLIZER_PATH="$(readlink -f external/llvm_k8/bin/llvm-symbolizer)" ... [ RUN ] PartsMessageReaderDeathTest.TooFarOutOfOrder external/aos/aos/events/logging/logfile_utils_test.cc:169: Failure Death test: { reader.ReadMessage(); } Result: died but not with expected error. Expected: contains regular expression "-0.000000001sec vs. 0.000000000sec" Actual msg: [ DEATH ] external/llvm_k8/include/c++/v1/__chrono/duration.h:102:59: runtime error: signed integer overflow: -9223372037 * 1000000000 cannot be represented in type '_Ct' (aka 'long long') ... [ DEATH ] #4 0x7f6788f54d9a in aos::operator<<(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, std::__1::chrono::time_point<aos::realtime_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) external/aos/aos/time/time.cc:189:55 [ DEATH ] #5 0x7f67938da571 in aos::logger::operator<<(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, aos::logger::LogParts const&) external/aos/aos/events/logging/logfile_sorting.cc:2163:50 [ DEATH ] #6 0x7f6793aeec53 in aos::logger::PartsMessageReader::ReadMessage() external/aos/aos/events/logging/logfile_utils.cc:1679:32 [ DEATH ] #7 0x55ed45105cb0 in aos::logger::testing::PartsMessageReaderDeathTest_TooFarOutOfOrder_Test::TestBody() external/aos/aos/events/logging/logfile_utils_test.cc:169:3 ... [ DEATH ] [ DEATH ] SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior external/llvm_k8/include/c++/v1/__chrono/duration.h:102:59 in [ DEATH ] [ FAILED ] PartsMessageReaderDeathTest.TooFarOutOfOrder (200 ms) [----------] 1 test from PartsMessageReaderDeathTest (200 ms total) This patch fixes the issue by detecting potential overflow and return a mostly non-human-readable string version of the time stamp. Change-Id: I4bff6272ac6f4d48c646face2c13fbc7dd3bb0ed Signed-off-by: James Kuszmaul <james.kuszmaul@bluerivertech.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I've seen evidence that we don't enforce that channel names start with /. We should fix that.
The text was updated successfully, but these errors were encountered: