"cartographer_node" received signal SIGILL, Illegal instruction. #10

Closed
cschuet opened this issue Jun 6, 2018 · 16 comments · Fixed by cartographer-project/cartographer_ros#893

Comments

@cschuet

cschuet commented Jun 6, 2018

We installed the ros-melodic-cartographer-ros package from the ros-shadow-fixed repository and it seems none of the binaries can be run as they crash with an "Illegal instruction" error.

Reproduced on two systems:

uname -a
Linux lenovo-Y50 4.15.0-22-generic #24-Ubuntu SMP Wed May 16 12:15:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

and

uname -a
Linux ubuntu 4.15.0-22-generic #24-Ubuntu SMP Wed May 16 12:15:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04 LTS
Release:	18.04
Codename:	bionic

Observation: file reports a different ELF OS/ABI for the cartographer binaries (GNU/Linux) than for the other ROS Melodic binaries (SYSV):

file /opt/ros/melodic/bin/cartographer_occupancy_grid_node 
/opt/ros/melodic/bin/cartographer_occupancy_grid_node: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=b46d61360a0b88eda46f3dd4e9526899e769f18c, stripped
file /opt/ros/melodic/bin/rviz
/opt/ros/melodic/bin/rviz: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=586128baac0a81db175b2968490a8c9bc8538944, stripped

cartographer_node

Starting program: /opt/ros/melodic/lib/cartographer_ros/cartographer_node -configuration_directory /opt/ros/melodic/share/cartographer_ros/configuration_files -configuration_basename backpack_2d.lua
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe9dc0700 (LWP 18715)]
[New Thread 0x7fffe95bf700 (LWP 18716)]
[New Thread 0x7fffe8dbe700 (LWP 18717)]
[New Thread 0x7fffe3fff700 (LWP 18718)]

Thread 1 "cartographer_no" received signal SIGILL, Illegal instruction.
0x0000555555666598 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, char const*) ()
(gdb) bt
#0  0x0000555555666598 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, char const*) ()
#1  0x00007ffff5b315a1 in ros::start() () from /opt/ros/melodic/lib/libroscpp.so
#2  0x000055555564e167 in main ()

cartographer_rosbag_validate

Starting program: /opt/ros/melodic/lib/cartographer_ros/cartographer_rosbag_validate -bag_filename /home/gaschler/Downloads/b2-2016-04-05-14-44-52.bag
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
cartographer_ros::(anonymous namespace)::LaserScanToPointCloudWithIntensities<sensor_msgs::MultiEchoLaserScan_<std::allocator<void> > > (msg=...)
    at ./cartographer_ros/msg_conversion.cc:104
104	./cartographer_ros/msg_conversion.cc: No such file or directory.
(gdb) bt
#0  cartographer_ros::(anonymous namespace)::LaserScanToPointCloudWithIntensities<sensor_msgs::MultiEchoLaserScan_<std::allocator<void> > > (msg=...)
    at ./cartographer_ros/msg_conversion.cc:104
#1  0x0000555555588a7d in cartographer_ros::ToPointCloudWithIntensities (msg=...) at ./cartographer_ros/msg_conversion.cc:181
#2  0x000055555556e94f in cartographer_ros::(anonymous namespace)::RangeDataChecker::ReadRangeMessage<sensor_msgs::MultiEchoLaserScan_<std::allocator<void> > > (
    to=<synthetic pointer>, from=<synthetic pointer>, range_checksum=0x7fffffffd460, message=...) at ./cartographer_ros/rosbag_validate_main.cc:218
#3  cartographer_ros::(anonymous namespace)::RangeDataChecker::CheckMessage<sensor_msgs::MultiEchoLaserScan_<std::allocator<void> > > (message=..., this=0x7fffffffd370)
    at ./cartographer_ros/rosbag_validate_main.cc:155
#4  cartographer_ros::(anonymous namespace)::Run (bag_filename=..., dump_timing=<optimized out>) at ./cartographer_ros/rosbag_validate_main.cc:275
#5  0x000055555556c7b2 in main (argc=<optimized out>, argv=<optimized out>) at ./cartographer_ros/rosbag_validate_main.cc:423
@cschuet
Author

cschuet commented Jun 6, 2018

@mikaelarguedas Let us know how we can help to debug this.

@cschuet
Author

cschuet commented Jun 6, 2018

Illegal instruction seems to be vmovdqu64

Dump of assembler code for function _ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EEOS8_PKS5_:
    0x0000555555695cb0 <+0>:     push   %r12
    0x0000555555695cb2 <+2>:     push   %rbp
    0x0000555555695cb3 <+3>:     mov    %rsi,%rbp
    0x0000555555695cb6 <+6>:     push   %rbx
    0x0000555555695cb7 <+7>:     mov    %rdi,%rbx
    0x0000555555695cba <+10>:    mov    %rdx,%rdi
    0x0000555555695cbd <+13>:    mov    %rdx,%r12
    0x0000555555695cc0 <+16>:    callq  0x555555658ec0 <strlen@plt>
    0x0000555555695cc5 <+21>:    movabs $0x7fffffffffffffff,%rcx
    0x0000555555695ccf <+31>:    sub    0x8(%rbp),%rcx
    0x0000555555695cd3 <+35>:    cmp    %rcx,%rax
    0x0000555555695cd6 <+38>:    ja     0x555555695d36 <_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EEOS8_PKS5_+134>
    0x0000555555695cd8 <+40>:    mov    %rax,%rdx
    0x0000555555695cdb <+43>:    mov    %r12,%rsi
    0x0000555555695cde <+46>:    mov    %rbp,%rdi
    0x0000555555695ce1 <+49>:    callq  0x555555659ef0 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm@plt>
    0x0000555555695ce6 <+54>:    lea    0x10(%rbx),%rdx
    0x0000555555695cea <+58>:    mov    %rdx,(%rbx)
    0x0000555555695ced <+61>:    mov    (%rax),%rcx
    0x0000555555695cf0 <+64>:    lea    0x10(%rax),%rdx
    0x0000555555695cf4 <+68>:    cmp    %rdx,%rcx
    0x0000555555695cf7 <+71>:    je     0x555555695d28 <_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EEOS8_PKS5_+120>
    0x0000555555695cf9 <+73>:    mov    %rcx,(%rbx)
    0x0000555555695cfc <+76>:    mov    0x10(%rax),%rcx
    0x0000555555695d00 <+80>:    mov    %rcx,0x10(%rbx)
    0x0000555555695d04 <+84>:    mov    0x8(%rax),%rcx
    0x0000555555695d08 <+88>:    mov    %rcx,0x8(%rbx)
    0x0000555555695d0c <+92>:    mov    %rdx,(%rax)
    0x0000555555695d0f <+95>:    movq   $0x0,0x8(%rax)
    0x0000555555695d17 <+103>:   movb   $0x0,0x10(%rax)
    0x0000555555695d1b <+107>:   mov    %rbx,%rax
    0x0000555555695d1e <+110>:   pop    %rbx
    0x0000555555695d1f <+111>:   pop    %rbp
    0x0000555555695d20 <+112>:   pop    %r12
    0x0000555555695d22 <+114>:   retq
    0x0000555555695d23 <+115>:   nopl   0x0(%rax,%rax,1)
=> 0x0000555555695d28 <+120>:   vmovdqu64 0x10(%rax),%xmm0
    0x0000555555695d2f <+127>:   vmovups %xmm0,0x10(%rbx)
    0x0000555555695d34 <+132>:   jmp    0x555555695d04 <_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EEOS8_PKS5_+84>
    0x0000555555695d36 <+134>:   lea    0x15a2dc(%rip),%rdi        # 0x5555557f0019
    0x0000555555695d3d <+141>:   callq  0x5555556591a0 <_ZSt20__throw_length_errorPKc@plt>
End of assembler dump.

@cschuet
Author

cschuet commented Jun 6, 2018

@mikaelarguedas vmovdqu64 is part of AVX-512, which is only supported by a range of server CPUs (see here).

Could it be possible that your build servers have one of those CPUs and you had

-march=native

as a flag in the compilation call, which would cause the compiler to generate code for the host machine's instruction set? Just a wild guess.
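
One way to test this guess on an affected machine is to check whether its CPU advertises the AVX-512 features that vmovdqu64 on an xmm register needs (AVX512F plus AVX512VL). A minimal sketch, assuming an x86 Linux system that lists CPU features on the "flags" lines of /proc/cpuinfo; the helper name has_cpu_flag is made up for illustration:

```shell
#!/bin/sh
# has_cpu_flag FLAG (hypothetical helper): read cpuinfo-formatted text on
# stdin and succeed if FLAG appears as a whole word on a "flags" line.
has_cpu_flag() {
  grep -E '^flags[[:space:]]*:' | grep -qw -- "$1"
}
```

Running `has_cpu_flag avx512f < /proc/cpuinfo` and `has_cpu_flag avx512vl < /proc/cpuinfo` on the crashing machine should both fail if the guess is right, and both succeed on a build host that compiled with -march=native for an AVX-512 CPU.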

@cschuet
Author

cschuet commented Jun 6, 2018

The cartographer binary uses instructions from the following instruction sets, so it only runs on CPUs that support all of them:

MODE64 (call)
CMOV (cmovg)
SSE1 (movss)
SSE2 (movsd)
AVX (vzeroupper)
AVX512 (vmovdqu64)
VLX (vmovdqu64)
FMA (vfmsub231sd)
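
A rough way to cross-check a list like this without special tooling is to count occurrences of a given mnemonic in the disassembly. This is only a sketch, assuming GNU objdump output on stdin; count_mnemonic is a hypothetical helper name, not the tool used to produce the list above:

```shell
#!/bin/sh
# count_mnemonic MNEMONIC (hypothetical helper): read `objdump -d` style
# disassembly on stdin and print how many lines contain MNEMONIC as a whole
# word. `|| true` keeps the exit status zero when the count is 0.
count_mnemonic() {
  grep -cw -- "$1" || true
}
```

For example, `objdump -d --no-show-raw-insn /opt/ros/melodic/lib/cartographer_ros/cartographer_node | count_mnemonic vmovdqu64` should print a non-zero count for the affected binary. A non-zero count only shows that the instruction is present, not that every code path reaches it; the gdb session above is what shows it is actually executed.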

@mikaelarguedas
Contributor

@cschuet thanks for reporting it and investigating!

I'll have a look tomorrow.

If you get a chance, can you compare the assembly instructions and signature of the same executables on 0.3.0 (currently in the ROS main repository)?

If it helps, the build log of cartographer_ros 0.3.0 is here, and the one for 1.0.0 is here. We don't invoke make in verbose mode, so we may not gain much from it.

@cschuet
Author

cschuet commented Jun 7, 2018

@mikaelarguedas Thanks.

I could not really find the 0.3.0 package deb, but 0.2.0 only requires

MODE64 (call)
CMOV (cmovg)
SSE1 (movss)
SSE2 (movsd)

@cschuet
Author

cschuet commented Jun 7, 2018

@mikaelarguedas Just checked the logs. Indeed for 1.0.0

-march=native -msse4.2

but actually also for 0.3.0

-march=native -msse4.2

Could it be that 0.3.0 suffers from the same problem? Can you send me a link to a deb?

@cschuet cschuet closed this as completed Jun 7, 2018
@cschuet cschuet reopened this Jun 7, 2018
@mikaelarguedas
Contributor

mikaelarguedas commented Jun 7, 2018

@cschuet the cartographer_ros 0.3.0 debs can be found here: http://packages.ros.org/ros/ubuntu/pool/main/r/ros-melodic-cartographer-ros/

@gaschler

gaschler commented Jun 7, 2018

@mikaelarguedas I installed ros-melodic-cartographer-ros deb 0.3.0 from the stable packages (in a bionic docker container) and observed the same illegal instruction in cartographer executables, but not in rviz.

@gaschler

gaschler commented Jun 7, 2018

Looking at the build log http://build.ros.org/view/Mbin_uB64/job/Mbin_uB64__cartographer_ros__ubuntu_bionic_amd64__binary/11/consoleFull, the compile flags are:
-g -O2 -fdebug-prefix-map=/tmp/binarydeb/ros-melodic-cartographer-ros-1.0.0=. -fstack-protector-strong -Wformat -Werror=format-security -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2 -march=native -msse4.2 -mfpmath=sse -pthread -std=c++11 -fPIC -Wall -Wpedantic -Werror=format-security -Werror=missing-braces -Werror=reorder -Werror=return-type -Werror=switch -Werror=uninitialized -o CMakeFiles/cartographer_ros.dir/cartographer_ros/assets_writer.cc

It seems like -march=native -msse4.2 -mfpmath=sse is added before cartographer's build flags -pthread -std=c++11 ....

One possibility: we prepend ${TARGET_COMPILE_FLAGS} to our flags in
https://github.com/googlecartographer/cartographer/blob/master/cmake/functions.cmake#L26
(it is empty in my local setup). Do you set that somewhere?

@mikaelarguedas
Contributor

So it looks like the same problem happens with 0.3.0 as well.
If I compile cartographer_ros with catkin_make_isolated I can also see the -march=native -msse4.2 flags being passed.
Instruction set of the cartographer_node executable (on my machine)

MODE64 (call)
SSE2 (movq)
SSE1 (movhps)
CMOV (cmovg)
AVX (vzeroupper)
FMA (vfmadd231sd)
AVX2 (vbroadcastsd)
BMI2 (shlx)

If I explicitly set the CMake build type None:

MODE64 (call)
SSE1 (movss)
SSE2 (pxor)
AVX (vmovsd)
CMOV (cmovne)

I don't see these flags in any of the ROS packages I have around.
My guess is that there is something in the cartographer cmake functions that is introducing these flags.
Looking into it..

@gaschler

gaschler commented Jun 7, 2018

@mikaelarguedas Interesting, on my Lunar on Debian stretch the compile flags are only -O3 -DNDEBUG -pthread -std=c++11 -fPIC -Wall -Wpedantic...
I tested both with catkin_make_isolated --use-ninja and catkin_make_isolated --install (with make).

To track this down, could you add

  message(WARNING "${NAME} TARGET_COMPILE_FLAGS ${TARGET_COMPILE_FLAGS}")
  message(WARNING "${NAME} GOOG_CXX_FLAGS ${GOOG_CXX_FLAGS}")

to the beginning of cartographer/cmake/functions.cmake:_common_compile_stuff?

@mikaelarguedas
Contributor

To track this down, could you add

👍 Great minds think alike :), and the TARGET_COMPILE_FLAGS are empty on my side as well..
I'm testing in a bionic container.
I build using: VERBOSE=1 catkin_make_isolated --install --cmake-args -DCMAKE_BUILD_TYPE=None

cmake diff applied:

diff --git a/cmake/functions.cmake b/cmake/functions.cmake
index 3bfd343..5693dd4 100644
--- a/cmake/functions.cmake
+++ b/cmake/functions.cmake
@@ -23,6 +23,8 @@ macro(_parse_arguments ARGS)
 endmacro(_parse_arguments)
 
 macro(_common_compile_stuff VISIBILITY)
+  message(WARNING "TARGET_COMPILE_FLAGS: ${TARGET_COMPILE_FLAGS}")
+  message(WARNING "GOOG_CXX_FLAGS: ${GOOG_CXX_FLAGS}")
   set(TARGET_COMPILE_FLAGS "${TARGET_COMPILE_FLAGS} ${GOOG_CXX_FLAGS}")
 
   set_target_properties(${NAME} PROPERTIES
@@ -49,6 +51,7 @@ function(google_binary NAME)
 
   add_executable(${NAME} ${ARG_SRCS})
 
+  message(WARNING "NAME: ${NAME}")
   _common_compile_stuff("PRIVATE")
 
   install(TARGETS "${NAME}" RUNTIME DESTINATION bin)

assets_writer compile line (that does contain -march=native -msse4.2 -mfpmath=sse):

[  2%] Building CXX object CMakeFiles/cartographer_ros.dir/cartographer_ros/assets_writer.cc.o
/usr/bin/c++  -DGFLAGS_IS_A_DLL=0 -DROSCONSOLE_BACKEND_LOG4CXX -DROS_PACKAGE_NAME=\"cartographer_ros\" -DURDFDOM_HEADERS_HAS_SHARED_PTR_DEFS -isystem /usr/include/lua5.2 -isystem /usr/include/pcl-1.8 -isystem /usr/include/eigen3 -isystem /root/catkin_ws/install_isolated/include -isystem /opt/ros/melodic/include -isystem /opt/ros/melodic/share/orocos_kdl/cmake/../../../include -isystem /opt/ros/melodic/share/xmlrpcpp/cmake/../../../include/xmlrpcpp -isystem /usr/include/ni -isystem /usr/include/openni2 -isystem /usr/include/vtk-6.3 -isystem /usr/include/freetype2 -isystem /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include -isystem /usr/include/python2.7 -isystem /usr/include/x86_64-linux-gnu -isystem /usr/include/hdf5/openmpi -isystem /usr/include/libxml2 -isystem /usr/include/jsoncpp -isystem /usr/include/tcl -I/root/catkin_ws/build_isolated/cartographer_ros -I/root/catkin_ws/src/cartographer_ros/cartographer_ros -I/usr/src/googletest/googlemock/gtest/include -isystem /usr/include/cairo -isystem /usr/include/glib-2.0 -isystem /usr/lib/x86_64-linux-gnu/glib-2.0/include -isystem /usr/include/pixman-1 -isystem /usr/include/libpng16  -march=native -msse4.2 -mfpmath=sse  -pthread -std=c++11 -fPIC  -Wall -Wpedantic -Werror=format-security -Werror=missing-braces -Werror=reorder -Werror=return-type -Werror=switch -Werror=uninitialized -o CMakeFiles/cartographer_ros.dir/cartographer_ros/assets_writer.cc.o -c /root/catkin_ws/src/cartographer_ros/cartographer_ros/cartographer_ros/assets_writer.cc
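
Compile lines this long are hard to scan by eye. A small filter can pull out just the machine-specific options from verbose build output. This is a sketch only; arch_flags is a hypothetical helper name, and it assumes options are separated by spaces or tabs and start with "-m" followed by a lowercase letter:

```shell
#!/bin/sh
# arch_flags (hypothetical helper): read compiler command lines on stdin and
# print each distinct machine-specific option (-march=..., -msse4.2,
# -mfpmath=..., and similar) once, one per line, in sorted order.
arch_flags() {
  tr -s ' \t' '\n\n' | grep -E '^-m[a-z]' | sort -u
}
```

Piping the verbose build through it, for example `VERBOSE=1 catkin_make_isolated --install 2>&1 | arch_flags`, should print -march=native, -mfpmath=sse and -msse4.2 for a build like the one above, and nothing for a build that does not pick up these flags.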

CMake variable values for the target cartographer_assets_writer:

CMake Warning at /root/catkin_ws/install_isolated/share/cartographer/cmake/functions.cmake:54 (message):
  NAME: cartographer_assets_writer
Call Stack (most recent call first):
  cartographer_ros/CMakeLists.txt:15 (google_binary)


CMake Warning at /root/catkin_ws/install_isolated/share/cartographer/cmake/functions.cmake:26 (message):
  TARGET_COMPILE_FLAGS:
Call Stack (most recent call first):
  /root/catkin_ws/install_isolated/share/cartographer/cmake/functions.cmake:55 (_common_compile_stuff)
  cartographer_ros/CMakeLists.txt:15 (google_binary)


CMake Warning at /root/catkin_ws/install_isolated/share/cartographer/cmake/functions.cmake:27 (message):
  GOOG_CXX_FLAGS: -pthread -std=c++11 -fPIC -Wall -Wpedantic
  -Werror=format-security -Werror=missing-braces -Werror=reorder
  -Werror=return-type -Werror=switch -Werror=uninitialized
Call Stack (most recent call first):
  /root/catkin_ws/install_isolated/share/cartographer/cmake/functions.cmake:55 (_common_compile_stuff)
  cartographer_ros/CMakeLists.txt:15 (google_binary)

@mikaelarguedas
Contributor

Looking at other platforms, this flag doesn't show up on Debian Stretch but is present on Ubuntu Artful and Bionic. (I'm looking at the "msg_conversion.cc.o" compiler line.)

So it may be a gcc 7 thing. That still doesn't explain why it doesn't show up in the other packages built for that platform, though.

@gaschler

gaschler commented Jun 7, 2018

Perhaps this is related to PointCloudLibrary/pcl#2239?
Does the string "-march=native -msse4.2 -mfpmath=sse" appear in your PCLConfig.cmake?

@mikaelarguedas
Contributor

mikaelarguedas commented Jun 7, 2018

Good point.
I didn't find any cmake Config file containing "march=native".

Edit: @gaschler well that's embarrassing.. I was in the stretch container...
Indeed PCLConfig.cmake is the culprit:

root@98f7eafd9eaa:/usr# cat `find -name PCLConfig.cmake` | grep msse4
list(APPEND PCL_DEFINITIONS " -march=native -msse4.2 -mfpmath=sse ")
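
To check whether other installed packages export the same flag, the search can be widened to every CMake package config file and made to print file names and line numbers. A minimal sketch, assuming a grep that supports -H (GNU and BSD grep do); find_march_native is a hypothetical helper name:

```shell
#!/bin/sh
# find_march_native DIR (hypothetical helper): list every *Config.cmake file
# under DIR that mentions -march=native, as "file:line:matching text".
find_march_native() {
  find "$1" -type f -name '*Config.cmake' -exec grep -Hn -e '-march=native' {} +
}
```

In the bionic container, `find_march_native /usr` should report the PCLConfig.cmake line shown above; no output means no config file under that directory carries the flag.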

mikaelarguedas added a commit to mikaelarguedas/cartographer_ros that referenced this issue Jun 7, 2018
This is an issue on PCL 1.8.X causing SIGILL, Illegal instruction crashes: ros-gbp/cartographer_ros-release#10
Should be fixed in future PCL version with PointCloudLibrary/pcl#2100
cschuet pushed a commit to cartographer-project/cartographer_ros that referenced this issue Jun 7, 2018
* remove architecture specific definitions exported by PCL

This is an issue on PCL 1.8.X causing SIGILL, Illegal instruction crashes: ros-gbp/cartographer_ros-release#10
Should be fixed in future PCL version with PointCloudLibrary/pcl#2100
gaschler pushed a commit to cartographer-project/cartographer_ros that referenced this issue Jun 8, 2018
* remove architecture specific definitions exported by PCL

This is an issue on PCL 1.8.X causing SIGILL, Illegal instruction crashes: ros-gbp/cartographer_ros-release#10
Should be fixed in future PCL version with PointCloudLibrary/pcl#2100
ojura pushed a commit to larics/cartographer_combined that referenced this issue Jun 12, 2018
…by PCL (#893)

* remove architecture specific definitions exported by PCL

This is an issue on PCL 1.8.X causing SIGILL, Illegal instruction crashes: ros-gbp/cartographer_ros-release#10
Should be fixed in future PCL version with PointCloudLibrary/pcl#2100

Original commit:
cartographer-project/cartographer_ros@4fd904d
ojura pushed a commit to larics/cartographer_combined that referenced this issue Aug 11, 2018
…by PCL (#893)

* remove architecture specific definitions exported by PCL

This is an issue on PCL 1.8.X causing SIGILL, Illegal instruction crashes: ros-gbp/cartographer_ros-release#10
Should be fixed in future PCL version with PointCloudLibrary/pcl#2100

Original commit:
cartographer-project/cartographer_ros@4fd904d