Crash when reallocating in rcl_clock_remove_jump_callback #938
Comments
Based on your latter two stack traces it looks like the double-free is coming from
trace
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f66f8bc4380 (LWP 23341))]
(gdb) backtrace
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f66f6c60801 in __GI_abort () at abort.c:79
#2 0x00007f66f6ca9897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f66f6dd6b9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007f66f6cb090a in malloc_printerr (str=str@entry=0x7f66f6dd8828 "double free or corruption (fasttop)") at malloc.c:5350
#4 0x00007f66f6cb31d5 in _int_free (av=0x7f66f700bc40 , p=0x5606adb31200, have_lock=) at malloc.c:4230
#5 0x00007f66f6cb579e in _int_realloc (av=av@entry=0x7f66f700bc40 , oldp=oldp@entry=0x5606adb2fc40, oldsize=oldsize@entry=5616, nb=nb@entry=5568) at malloc.c:4650
#6 0x00007f66f6cb8f9b in __GI___libc_realloc (oldmem=0x5606adb2fc50, bytes=5560) at malloc.c:3230
#7 0x00007f66f6562349 in rcl_clock_remove_jump_callback () from /opt/ros/eloquent/lib/librcl.so
#8 0x00007f66f6563d62 in rcl_timer_fini () from /opt/ros/eloquent/lib/librcl.so
#9 0x00007f66f7d29a92 in ?? () from /opt/ros/eloquent/lib/librclcpp.so
#10 0x00007f66f7cb05d2 in rclcpp::memory_strategies::allocator_memory_strategy::AllocatorMemoryStrategy >::remove_null_handles(rcl_wait_set_t*) () from /opt/ros/eloquent/lib/librclcpp.so
#11 0x00007f66f7ca2327 in rclcpp::executor::Executor::wait_for_work(std::chrono::duration >) () from /opt/ros/eloquent/lib/librclcpp.so
#12 0x00007f66f7ca2775 in rclcpp::executor::Executor::get_next_executable(rclcpp::executor::AnyExecutable&, std::chrono::duration >) () from /opt/ros/eloquent/lib/librclcpp.so
#13 0x00007f66f7ca66e0 in rclcpp::executors::SingleThreadedExecutor::spin() () from /opt/ros/eloquent/lib/librclcpp.so
#14 0x00007f66f7ca33e2 in rclcpp::spin(std::shared_ptr) () from /opt/ros/eloquent/lib/librclcpp.so
In particular the function
Though, nothing wrong about the implementation stands out to me. It's possible there's a race between threads at the
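For readers following along, here is a minimal sketch of how a race on a realloc-backed array could end in the "double free or corruption" abort seen in frames #3 to #7. If two threads call realloc on the same old pointer without synchronization, both may release the same block. The sketch is not rcl's implementation: CallbackList, add_callback, remove_callback and stress are hypothetical names, and the single mutex is an assumed remedy, not the fix discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a clock's jump-callback list; not rcl's real types.
struct CallbackList {
  void ** items = nullptr;  // heap array that is resized with realloc
  std::size_t count = 0;
  std::mutex mutex;         // assumption: one lock serializes every resize
};

// Append a handle. Returns false if the allocation fails.
bool add_callback(CallbackList & list, void * handle) {
  std::lock_guard<std::mutex> lock(list.mutex);
  void ** grown = static_cast<void **>(
    std::realloc(list.items, (list.count + 1) * sizeof(void *)));
  if (grown == nullptr) {
    return false;  // the old array is still valid and still owned by list
  }
  list.items = grown;
  list.items[list.count++] = handle;
  return true;
}

// Remove a handle and shrink the array. Returns false if it was not registered.
bool remove_callback(CallbackList & list, void * handle) {
  std::lock_guard<std::mutex> lock(list.mutex);
  for (std::size_t i = 0; i < list.count; ++i) {
    if (list.items[i] != handle) {
      continue;
    }
    // Close the gap left by the removed entry.
    for (std::size_t j = i + 1; j < list.count; ++j) {
      list.items[j - 1] = list.items[j];
    }
    --list.count;
    if (list.count == 0) {
      std::free(list.items);
      list.items = nullptr;
    } else {
      void ** shrunk = static_cast<void **>(
        std::realloc(list.items, list.count * sizeof(void *)));
      if (shrunk != nullptr) {
        list.items = shrunk;  // on failure the old block stays valid
      }
    }
    return true;
  }
  return false;
}

// Two threads add and remove their own handles concurrently.
// Returns the number of entries left once both threads have finished.
std::size_t stress(int iterations) {
  CallbackList list;
  std::vector<int> a(iterations), b(iterations);
  auto worker = [&list, iterations](int * base) {
    for (int i = 0; i < iterations; ++i) {
      if (add_callback(list, base + i)) {
        remove_callback(list, base + i);
      }
    }
  };
  std::thread t1(worker, a.data());
  std::thread t2(worker, b.data());
  t1.join();
  t2.join();
  std::size_t remaining = list.count;
  std::free(list.items);
  return remaining;
}
```

Without the lock_guard lines, the same stress loop has a data race on items and count, which is the kind of situation where glibc can detect a block being freed twice. Running it under ThreadSanitizer or AddressSanitizer is one way to check whether a suspected race is real.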
I have now throttled the scan rate from 25 Hz to 12.5 Hz, and it has been running for about 10 hours.
It sounds like this rclcpp/rclcpp/src/rclcpp/clock.cpp Lines 129 to 131 in 4a5eed9
Now I tried amcl as well, and it resulted in a seg fault after ~5 hours.
@maxlein That error seems unrelated to the previous one. About the first one, check my previous comment #938 (comment).
I don't feel I have enough knowledge of rcl to fix this; I might break more than I fix ;) So either it happens only on my system, or I am the only person who uses ROS2 for longer periods of time?
Thanks! This will take time to test, as I have no source build on the robot yet ...
@maxlein Friendly ping. Are you going to be able to test this?
The problem is that I can't reproduce this bug on my dev notebook, although it is nearly the same system... Is there a date when this fix will be released in a deb pkg?
The problem is that the fix breaks ABI ... If that PR is really fixing something, I can ask for it to be backported, and discuss whether breaking ABI is worth it. Do you have a way to test the fix without a deb pkg?
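The thread does not say how the fix breaks ABI. As a rough illustration only, one common cause is adding a data member to a type declared in a public header, which changes the object size that already-compiled binaries assume. ClockImplV1, ClockImplV2 and layout_changed are hypothetical names, not types from rcl or rclcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical "before" layout of a type exposed in a public header.
struct ClockImplV1 {
  void ** jump_callbacks;
  std::size_t num_jump_callbacks;
};

// Hypothetical "after" layout: a fix adds a lock as a new data member.
struct ClockImplV2 {
  void ** jump_callbacks;
  std::size_t num_jump_callbacks;
  std::mutex callbacks_mutex;  // assumption: the new member introduced by the fix
};

// True when code compiled against V1 would disagree with V2 about object size.
constexpr bool layout_changed() {
  return sizeof(ClockImplV1) != sizeof(ClockImplV2);
}
```

A package built against the old header would allocate the smaller object while the new library expects the larger one, which is why such a change normally cannot ship as a drop-in deb update for a released distribution.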
I will have time to set up a source build on the robot in about a month at the earliest.
Sounds good. Thanks @maxlein!
…#938) * Handle RCL_WARN_UNUSED * Avoid duplicate type names Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
I don't really know how I can describe this best.
My robot is running 24/7. About 7 nodes are running (driver, laser, IMU, localization, navigation, etc.).
And somehow the localization_node from slam_toolbox is dying after ~1 to 3 hours, no other node.
It seems obvious that the toolbox does something wrong, but the stack traces say something different.
-> toolbox issue
Crashes on executor, tf subscriber, message_filter (issue)
Maybe someone of you guys has any idea what's going on here.
I am a bit clueless here.
Bug report
Required Info:
Steps to reproduce issue
Some setup for getting slam_toolbox working is needed.
I am running a "standard" robot stack.
No simulation, real hardware.
scan->base_link->odom->map
localization_node of slam_toolbox needs to run, and after some time (1 to 3 hours) it crashes.
Expected behavior
No node crashes.
Actual behavior
localization_node crashes with different errors.