-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pybind11 for signal handling, and delete now unused rclpy_common, pycapsule, and handle code #814
Conversation
@ivanpauno may I ask for a review of this one, especially since changes to handle sigterm like sigint are probably coming? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I'm worried about signal-safety of some parts of the code, e.g. I don't think that triggering a guard condition is necessarily signal-safe, but that's preexistent code.
rcl_guard_condition_t ** old_gcs = g_guard_conditions.exchange(new_gcs); | ||
if (NULL != old_gcs) { | ||
allocator.deallocate(old_gcs, allocator.state); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is preexistent code, but why are we using atomics here?
If this is for thread-safety, it doesn't look okay.
We're maybe alraedy taking a mutex in the python code, if that's the case this is okay.
If not we should be taking a global mutex here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're maybe alraedy taking a mutex in the python code, if that's the case this is okay.
There is a global mutex in the Python code; it's holding onto the GIL. I think the atomics are to make it safe to be interrupted by the signal handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a global mutex in the Python code; it's holding onto the GIL.
Yes, but the signal handler might get executed without the GIL locked, and that might be an issue here (main thread executing the signal handler and another thread deallocating the array).
I think we should be using the python signal module instead of C signal handlers, which only flags the main thread to run the signal handler later, avoiding most of this issues.
I would left this comment unaddressed here, as it's unrelated to this PR and I want to avoid conflicts with #830.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the signal handler might get executed without the GIL locked, and that might be an issue here (main thread executing the signal handler and another thread deallocating the array)
Ah, I didn't realize other threads kept running when a signal is received.
@sloretz friendly ping |
Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>
Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>
Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>
Signed-off-by: Shane Loretz <sloretz@openrobotics.org>
959fe99
to
0cbce01
Compare
Windows failures re not in the most recent nightly Windows repeated job, so investigating these.
OSX Failures are also all in the most recent nightlty OSX repeated job, so it's unlikely they're caused by this PR: https://ci.ros2.org/view/nightly/job/nightly_osx_repeated/2493/#showFailuresLink
|
Windows CI failure is definitely not due to this PR. It's due to a client not getting a service response. I don't know why that is, but I opened ros2/launch_ros#273 to improve how the test responds to that case. |
@ivanpauno if this still looks good to you after the new commit, I think CI is good enough to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
🕵️♂️ I think is causing test lots of test regressions in windows CI.
It seems that Windows can't find the correct path to Can I ask you to take a quick look? @sloretz You may have more context about what's going on. |
This uses pybind11 for the signal handler APIs. It replaces
rcutils
atomics with withstd::atomic
because C and C++ atomics headers can't be mixed (until C++23?). Finally this deletes the now unused code for handle, pycapsule, and rclpy_common.Replaces #728
Closes #665 (finally 🎉)