Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

segfault in all connext request-response tests on Windows #31

Closed
wjwwood opened this issue Apr 30, 2015 · 7 comments
Closed

segfault in all connext request-response tests on Windows #31

wjwwood opened this issue Apr 30, 2015 · 7 comments
Assignees

Comments

@wjwwood
Copy link
Member

wjwwood commented Apr 30, 2015

All of the connext based request-response tests and examples crash in the std::string implementation with this error:

Unhandled exception at 0x00007FFF11C61332 (vcruntime140d.dll) in test_server__rmw_connext_cpp.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

The back trace (of the connext version of test_server):

>   userland_msgs__rosidl_typesupport_connext_cpp.dll!std::char_traits<char>::copy(char * _First1, const char * _First2, unsigned __int64 _Count) Line 529  C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right, unsigned __int64 _Roff, unsigned __int64 _Count) Line 1132  C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right) Line 1115   C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::operator=(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right) Line 1003    C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!connext::ReplierParams<userland_msgs::dds_::AddTwoIntsRequest_,userland_msgs::dds_::AddTwoIntsResponse_>::service_name(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & service_name) Line 72 C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!userland_msgs::service_type_support::create_replier__AddTwoInts(void * untyped_participant, const char * service_name, void * * untyped_reader) Line 74   C++
    rmw_connext_cpp.dll!rmw_create_service(const rmw_node_t * node, const rosidl_service_type_support_t * type_support, const char * service_name) Line 640 C++
    test_server__rmw_connext_cpp.exe!rclcpp::node::Node::create_service<userland_msgs::AddTwoInts>(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & service_name, std::function<void __cdecl(std::shared_ptr<rmw_request_id_t> const &,std::shared_ptr<userland_msgs::AddTwoIntsRequest_<std::allocator<void> > > const &,std::shared_ptr<userland_msgs::AddTwoIntsResponse_<std::allocator<void> > > &)> callback_with_header, std::shared_ptr<rclcpp::callback_group::CallbackGroup> group) Line 219  C++
    test_server__rmw_connext_cpp.exe!main(int argc, char * * argv) Line 36  C++

It's my belief that this is related to ros2/ros2#31, because when researching that issue I remember a discussion online about how the libc++ developers intentionally caused a link time error even though they could have made it work in order to avoid a run time error which would occur when creating a std::string in one library and passing it to a library built with a different version. Here is the summary:

http://stackoverflow.com/questions/8454329/why-cant-clang-with-libc-in-c0x-mode-link-this-boostprogram-options-examp/8457799#8457799

In order to turn this run time crash into a link time error, libc++ uses a C++11 language feature called inline namespace to change the ABI of std::string without impacting the API of std::string. That is, to you std::string looks the same. But to the linker, std::string is being mangled as if it is in namespace std::__1. Thus the linker knows that std::basic_string and std::__1::basic_string are two different data structures (the former coming from gcc's libstdc++ and the latter coming from libc++).

And I think that the VS2013 - VS2015 Preview headers do not use this trick, but potentially do change the implementation of basic_string, which would cause an issue. This might be expected since they do not encourage you to use binaries built with one VC with binaries built from another.

So, it looks to me that we have no bug in our code, but simply that we need newer or from source versions of the RTI libraries to make this work.

@wjwwood
Copy link
Member Author

wjwwood commented Apr 30, 2015

Also, I'd be happy to walk anyone who's interested through the setup I have for a second opinion.

@wjwwood
Copy link
Member Author

wjwwood commented Apr 30, 2015

Here's the back trace for the connext version of test_client too:

    appcrt140d.dll!00007fff0652d7f1()   Unknown
    appcrt140d.dll!00007fff0652d48a()   Unknown
    appcrt140d.dll!00007fff0654dea6()   Unknown
    vcruntime140d.dll!00007fff11c670f6()    Unknown
    test_client__rmw_connext_cpp.exe!__scrt_unhandled_exception_filter(_EXCEPTION_POINTERS * const pointers) Line 78    C++
    KernelBase.dll!00007fff1bf41c72()   Unknown
    ntdll.dll!00007fff1ecdfb33()    Unknown
    ntdll.dll!00007fff1ecc2896()    Unknown
    ntdll.dll!00007fff1ecd3f0d()    Unknown
    ntdll.dll!00007fff1ec94887()    Unknown
    ntdll.dll!00007fff1ec93aad()    Unknown
    KernelBase.dll!00007fff1be68b9c()   Unknown
    msvcr120.dll!00007fff060de3d9() Unknown
    ntdll.dll!00007fff1ecd3573()    Unknown
    rticonnextmsgcpp.dll!`std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Copy'::`1'::catch$1() Line 2219    C++
    msvcr120.dll!00007fff060e6920() Unknown
    msvcr120.dll!00007fff060de36d() Unknown
    ntdll.dll!00007fff1ecd3573()    Unknown
    rticonnextmsgcpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Copy(unsigned __int64 _Newsize, unsigned __int64 _Oldlen) Line 2215 C++
    rticonnextmsgcpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right, unsigned __int64 _Roff, unsigned __int64 _Count) Line 1138   C++
>   rticonnextmsgcpp.dll!connext::RequesterParams::service_name(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & name) Line 66  C++
    userland_msgs__rosidl_typesupport_connext_cpp.dll!userland_msgs::service_type_support::create_requester__AddTwoInts(void * untyped_participant, const char * service_name, void * * untyped_reader) Line 43 C++
    rmw_connext_cpp.dll!rmw_create_client(const rmw_node_t * node, const rosidl_service_type_support_t * type_support, const char * service_name) Line 566  C++
    test_client__rmw_connext_cpp.exe!rclcpp::node::Node::create_client<userland_msgs::AddTwoInts>(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & service_name, std::shared_ptr<rclcpp::callback_group::CallbackGroup> group) Line 161 C++
    test_client__rmw_connext_cpp.exe!request<userland_msgs::AddTwoInts>(std::shared_ptr<rclcpp::node::Node> node, std::function<void __cdecl(std::shared_ptr<userland_msgs::AddTwoIntsRequest_<std::allocator<void> > > &,unsigned __int64)> set_data_func, unsigned __int64 number_of_requests) Line 31    C++
    test_client__rmw_connext_cpp.exe!main(int argc, char * * argv) Line 83  C++

The back trace is different, but the problem traces to this line in our code:

https://github.com/ros2/rmw_connext/blob/master/rosidl_typesupport_connext_cpp/resource/srv_ServiceTypeSupport.cpp.template#L56

And stepping through code I can confirm that the service_name c-string is valid and that the requester_params object looks valid as well.

@esteve esteve self-assigned this May 19, 2015
@esteve esteve removed their assignment Jun 16, 2015
@esteve esteve self-assigned this Oct 23, 2015
@esteve esteve added the in progress Actively being worked on (Kanban column) label Oct 23, 2015
@esteve
Copy link
Member

esteve commented Oct 27, 2015

As an experiment, I've ported our custom implementation of services from OpenSplice to Connext (only for the static variant) and the tests now pass:

http://ci.ros2.org/job/ros2_batch_ci_windows/534/testReport/%28root%29/test_services_cpp__rmw_connext_cpp/

As a side benefit, OpenSplice and Connext can interoperate.

I also agree with @wjwwood that this is due to a change in std::basic_string in VS2015.

@esteve esteve added in review Waiting for review (Kanban column) ready Work is about to start (Kanban column) and removed in progress Actively being worked on (Kanban column) in review Waiting for review (Kanban column) labels Oct 28, 2015
@tfoote
Copy link
Contributor

tfoote commented Nov 17, 2015

Holding off on this hoping for an upstream fix to work with VS2015

@dirk-thomas
Copy link
Member

The first engineering build of Connext with VS2015 does not magically make it work (as we had hoped). I have debugged some more on the Windows machine (running test_services_server_cpp__rmw_connext_cpp). The application crashed with an access violation within the connext::details::EntityParams. I have asked our contacts at RTI for support.

@dirk-thomas
Copy link
Member

This has been addressed by building Release on Windows: ament/ament_tools#72

The followup ticket #89 will allow to build Debug again by selecting the corresponding Connext libraries.

@gerkey
Copy link
Member

gerkey commented Jan 19, 2016

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants