
irodsServer occasionally segfaults on startup when built against libstdc++ #7747

Closed
2 tasks done
SwooshyCueb opened this issue May 15, 2024 · 8 comments

@SwooshyCueb
Member

SwooshyCueb commented May 15, 2024

  • main
  • 4-3-stable

When iRODS is built against libstdc++, the server occasionally (maybe 5% of the time?) segfaults during startup.

Log output before segfault:

{"log_category":"server","log_level":"info","log_message":"Initializing server ...","server_host":"93c23b681ce5","server_pid":1605,"server_timestamp":"2024-05-15T20:12:25.788Z","server_type":"server","server_zone":"tempZone"}
{"log_category":"server","log_level":"info","log_message":"Setting up UNIX domain socket for agent factory ...","server_host":"93c23b681ce5","server_pid":1605,"server_timestamp":"2024-05-15T20:12:25.793Z","server_type":"server","server_zone":"tempZone"}
{"log_category":"server","log_level":"info","log_message":"Forking agent factory ...","server_host":"93c23b681ce5","server_pid":1605,"server_timestamp":"2024-05-15T20:12:25.794Z","server_type":"server","server_zone":"tempZone"}
{"log_category":"server","log_level":"info","log_message":"Connecting to agent factory [agent_factory_pid=1606] ...","server_host":"93c23b681ce5","server_pid":1605,"server_timestamp":"2024-05-15T20:12:25.795Z","server_type":"server","server_zone":"tempZone"}
{"log_category":"agent_factory","log_level":"info","log_message":"Initializing agent factory ...","server_host":"93c23b681ce5","server_pid":1606,"server_timestamp":"2024-05-15T20:12:25.799Z","server_type":"agent_factory","server_zone":"tempZone"}
{"log_category":"server","log_level":"debug","log_message":"Setting stacktrace dump signal handler for process [1606].","server_host":"93c23b681ce5","server_pid":1606,"server_timestamp":"2024-05-15T20:12:25.800Z","server_type":"agent_factory","server_zone":"tempZone"}
{"log_category":"server","log_level":"debug","log_message":"Stacktraces will be dumped to [/var/lib/irods/stacktraces].","server_host":"93c23b681ce5","server_pid":1606,"server_timestamp":"2024-05-15T20:12:25.800Z","server_type":"agent_factory","server_zone":"tempZone"}
{"local_hostname":"localhost, 93c23b681ce5","log_category":"server","log_level":"info","port":"1247","server_host":"93c23b681ce5","server_pid":1605,"server_timestamp":"2024-05-15T20:12:25.876Z","server_type":"server","server_zone":"tempZone"}
{"log_category":"server","log_level":"info","server_host":"93c23b681ce5","server_pid":1605,"server_timestamp":"2024-05-15T20:12:25.915Z","server_type":"server","server_zone":"tempZone","zone_info.host":"localhost","zone_info.name":"tempZone","zone_info.port":"1247","zone_info.type":"LOCAL_ICAT"}
{"log_category":"server","log_level":"info","log_message":"rodsServer Release version rods4.90.0 - API Version d is up","server_host":"93c23b681ce5","server_pid":1605,"server_timestamp":"2024-05-15T20:12:25.916Z","server_type":"server","server_zone":"tempZone"}
{"log_category":"legacy","log_level":"info","log_message":">>> control plane :: listening on port 1248\n","server_host":"93c23b681ce5","server_pid":1605,"server_timestamp":"2024-05-15T20:12:25.919Z","server_type":"server","server_zone":"tempZone"}

Demangled backtrace from catchsegv:

/lib/libirods_plugin_dependencies.so.4.90.0(boost::unordered::detail::grouped_bucket_array<boost::unordered::detail::bucket<boost::unordered::detail::node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> >, void*>, void*>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> > >, boost::unordered::detail::prime_fmod_size<void> >::append_bucket_group(boost::unordered::detail::grouped_bucket_iterator<boost::unordered::detail::bucket<boost::unordered::detail::node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> >, void*>, void*> >)+0x2d)[0x736f2e8774bd]
/lib/libirods_plugin_dependencies.so.4.90.0(boost::unordered::detail::grouped_bucket_array<boost::unordered::detail::bucket<boost::unordered::detail::node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> >, void*>, void*>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> > >, boost::unordered::detail::prime_fmod_size<void> >::insert_node(boost::unordered::detail::grouped_bucket_iterator<boost::unordered::detail::bucket<boost::unordered::detail::node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> >, void*>, void*> >, boost::unordered::detail::node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> >, void*>*)+0x39)[0x736f2e877429]
/lib/libirods_plugin_dependencies.so.4.90.0(std::pair<boost::unordered::detail::iterator_detail::iterator<boost::unordered::detail::node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> >, void*>, boost::unordered::detail::bucket<boost::unordered::detail::node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> >, void*>, void*> >, bool> boost::unordered::detail::table<boost::unordered::detail::map<std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::shared_ptr<irods::network>, irods::irods_string_hash, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::try_emplace_unique<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x247)[0x736f2e87a5e7]
/lib/libirods_plugin_dependencies.so.4.90.0(boost::unordered::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::shared_ptr<irods::network>, irods::irods_string_hash, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<irods::network> > > >::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x21)[0x736f2e87a381]
/lib/libirods_plugin_dependencies.so.4.90.0(irods::lookup_table<boost::shared_ptr<irods::network>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, irods::irods_string_hash>::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x19)[0x736f2e875b09]
/lib/libirods_plugin_dependencies.so.4.90.0(irods::network_manager::init_from_type(int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::shared_ptr<irods::network>&)+0x30c)[0x736f2e8751ec]
/lib/libirods_plugin_dependencies.so.4.90.0(irods::tcp_object::resolve(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::shared_ptr<irods::plugin_base>&)+0x341)[0x736f2e881941]
/lib/libirods_server.so.4.90.0(sendRodsMsg(boost::shared_ptr<irods::network_object>, char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol)+0x7a)[0x736f2ffac2ba]
/lib/libirods_server.so.4.90.0(sendStartupPack+0x5e4)[0x736f2ffab8c4]
/lib/libirods_server.so.4.90.0(connectToRhost+0x99)[0x736f2ffaa279]
/lib/libirods_server.so.4.90.0(_rcConnect+0x3b0)[0x736f2ff84730]
/lib/libirods_server.so.4.90.0(rcConnect+0x80)[0x736f2ff84370]
/lib/libirods_server.so.4.90.0(irods::experimental::client_connection::only_connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, irods::experimental::fully_qualified_username const&)+0xa0)[0x736f2fef4e30]
/lib/libirods_server.so.4.90.0(irods::experimental::client_connection::connect_and_login(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, irods::experimental::fully_qualified_username const&)+0x35)[0x736f2fef49b5]
/lib/libirods_server.so.4.90.0(irods::experimental::client_connection::client_connection()+0x130)[0x736f2fef48b0]
irodsServer[0x563a1b]
irodsServer[0x562a05]
irodsServer[0x5629bd]
irodsServer[0x56295d]
irodsServer[0x56285d]
irodsServer(std::function<void ()>::operator()() const+0x35)[0x616265]
irodsServer(irods::experimental::cron::cron_task::operator()()+0x88)[0x6160c8]
irodsServer(irods::experimental::cron::cron::run()+0x87)[0x6153f7]
irodsServer[0x562740]
irodsServer[0x56269d]
irodsServer[0x56262d]
irodsServer[0x562605]
irodsServer[0x5625d5]
irodsServer[0x562519]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253)[0x736f2def2253]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x736f2dc61ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x736f2dcf3850]

This problem is hampering my ability to test the main branch of iRODS with the new externals, so I'm going to try to fix it as part of my work toward Ubuntu 24.04 support.

@SwooshyCueb
Member Author

Switching from boost::unordered_map to std::unordered_map seems to make this happen less often, but doesn't eliminate it.

@korydraughn
Collaborator

The fact that lookup_table is listed in the stacktrace sticks out to me.

I remember Asan reporting a memory leak relating to that type, but I never tracked down what was causing it. Perhaps it is related to this.

@SwooshyCueb
Member Author

SwooshyCueb commented May 16, 2024

> The fact that lookup_table is listed in the stacktrace sticks out to me.
>
> I remember Asan reporting a memory leak relating to that type, but I never tracked down what was causing it. Perhaps it is related to this.

That is a distinct possibility. Is there an open issue about that? Did Asan provide any information about that leak that might be useful?
EDIT: It just occurred to me that by "Asan", you mean the address sanitizer, not the name of an iRODS user. 🤦🏼‍♂️

@korydraughn
Collaborator

I didn't create an issue for it since it falls under general improvements and I figured we'd see it again the next time someone attacks memory leaks.

There are a few issues which reference the lookup_table indirectly though. These appear to be the most relevant.

PR #6673 may contain comments about the lookup_table, but I'm not sure of that.

@SwooshyCueb
Member Author

Looks like the crash mentioned in #6954 (comment) is very similar to the one I'm running into here. If I can get this fixed, it might be a two-birds-with-one-stone situation.

@korydraughn
Collaborator

Got it.

@SwooshyCueb
Member Author

Replacing boost::unordered_map with std::unordered_map and boost::shared_ptr with std::shared_ptr didn't fix the problem, but it did make it a lot easier to pick at. With the help of a modified version of libdebugme and gdb, I found a clue. At the point of the segfault, the unordered_map contains two entries with the key "tcp".
Predictably, far more people have used the std:: versions of these classes than the boost:: versions, so more people have run into this kind of problem and posted about it online. This Stack Overflow answer gives me one lead that fits the clue I found: concurrent access, i.e. multiple threads modifying the map at the same time without synchronization, which could explain how the same key ended up in the map twice. If that leads nowhere, it looks like I might be able to use _GLIBCXX_DEBUG (libstdc++'s debug mode) to get more information about the problem.
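For illustration, here is a minimal C++17 sketch of the usual remedy for that kind of race, assuming the culprit is an unsynchronized lookup-or-insert like the `unordered_map::operator[]` call at the top of the backtrace. Standard containers make no guarantees when several threads call non-const members on the same object at once, and two threads that both see "tcp" as missing and both insert it would be consistent with the duplicate-key clue. The names `guarded_table`, `get_or_create`, `network_stub`, and `entries_after_concurrent_inserts` are hypothetical; this is not iRODS code and not necessarily how the issue was resolved.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a plugin object such as irods::network.
struct network_stub
{
    std::string name;
};

// Hypothetical example type, not part of iRODS: a string-keyed table whose
// lookup-or-insert step is serialized by a mutex.
template <typename T>
class guarded_table
{
  public:
    // Returns the entry for `key`, creating it with `make` if it is missing.
    // The find and the insert happen under the same lock, so two threads can
    // never both decide the key is absent and insert it twice.
    std::shared_ptr<T> get_or_create(const std::string& key,
                                     const std::function<std::shared_ptr<T>()>& make)
    {
        std::lock_guard<std::mutex> lock{mutex_};
        if (auto it = table_.find(key); it != table_.end()) {
            return it->second;
        }
        auto value = make();
        table_.emplace(key, value);
        return value;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock{mutex_};
        return table_.size();
    }

  private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, std::shared_ptr<T>> table_;
};

// Hammers one key from several threads and reports how many entries exist
// afterwards. With the mutex in place the result is always 1.
std::size_t entries_after_concurrent_inserts(int thread_count, int iterations)
{
    guarded_table<network_stub> networks;
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&networks, iterations] {
            for (int n = 0; n < iterations; ++n) {
                networks.get_or_create("tcp", [] {
                    return std::make_shared<network_stub>(network_stub{"tcp"});
                });
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return networks.size();
}
```

Holding one lock across both `find()` and `emplace()` is the important choice here: locking each call separately would still let two threads pass the "not found" check before either inserts. Whatever the returned object does afterwards still needs its own synchronization.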

@trel
Member

trel commented May 17, 2024

Hot on the trail. I'll be very interested to learn whether we were holding it wrong, or whether it's a real bug that will require a workaround.

SwooshyCueb added a commit to SwooshyCueb/irods that referenced this issue Jun 12, 2024
SwooshyCueb added a commit to SwooshyCueb/irods that referenced this issue Jun 12, 2024
- `typedef`s converted to `using`s
- type aliases added
    - `key_type`
    - `value_type`
    - `size_type`
    - `hasher`
    - `iterator_value_type`
    - `const_iterator_value_type`
- references to template type arguments in member function declarations swapped for aliases
- `operator[](key_type)` overload replaced with `operator[](const key_type&)` and `operator[](key_type&&)`
- `size()` is now `noexcept` and returns `size_type` instead of `int`
- `has_entry(key_type)` overload replaced with `has_entry(const key_type&)`
- `erase(key_type)` overload replaced with `erase(const key_type&)` that returns `size_type` instead of `size_t`
- `clear()` is now `noexcept`
- `empty()` is now `noexcept` and `nodiscard`
- `begin()` and `end()` are now `noexcept` and have `const` overloads that return a `const_iterator`
- `cbegin()` and `cend()` are now `noexcept`
- `find(key_type)` overload replaced with `find(const key_type&)` and a `const` overload that returns a `const_iterator`
- `get()` implementation optimized - uses `find()` instead of `has_entry()`, allowing re-use of the iterator.
- `set()` implementation now calls `insert_or_assign()` instead of the `[]` operator.

The implementation is now more similar to that of `std::unordered_map`.
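To make the list above concrete, here is a simplified, hypothetical C++17 sketch of a wrapper with that shape. It is not the actual `irods::lookup_table`: the class name `lookup_table_sketch` is invented, the default hasher is `std::hash` rather than `irods::irods_string_hash`, `get()` simply returns `bool`, `value_type` is assumed to alias the mapped type, and the `iterator_value_type` aliases are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified sketch; not the real irods::lookup_table.
template <typename ValueType, typename KeyType = std::string, typename HashType = std::hash<KeyType>>
class lookup_table_sketch
{
  public:
    // Type aliases, std::unordered_map style (`using` instead of `typedef`).
    using key_type = KeyType;
    using value_type = ValueType; // assumed: aliases the mapped type
    using hasher = HashType;
    using map_type = std::unordered_map<key_type, value_type, hasher>;
    using size_type = typename map_type::size_type;
    using iterator = typename map_type::iterator;
    using const_iterator = typename map_type::const_iterator;

    // Separate lvalue and rvalue overloads instead of one by-value overload.
    value_type& operator[](const key_type& key) { return table_[key]; }
    value_type& operator[](key_type&& key) { return table_[std::move(key)]; }

    size_type size() const noexcept { return table_.size(); }
    [[nodiscard]] bool empty() const noexcept { return table_.empty(); }
    void clear() noexcept { table_.clear(); }

    bool has_entry(const key_type& key) const { return table_.find(key) != table_.end(); }
    size_type erase(const key_type& key) { return table_.erase(key); }

    iterator begin() noexcept { return table_.begin(); }
    iterator end() noexcept { return table_.end(); }
    const_iterator begin() const noexcept { return table_.begin(); }
    const_iterator end() const noexcept { return table_.end(); }
    const_iterator cbegin() const noexcept { return table_.cbegin(); }
    const_iterator cend() const noexcept { return table_.cend(); }

    iterator find(const key_type& key) { return table_.find(key); }
    const_iterator find(const key_type& key) const { return table_.find(key); }

    // One find() both tests for the key and yields the value, instead of
    // has_entry() followed by a second lookup.
    bool get(const key_type& key, value_type& out) const
    {
        if (auto it = find(key); it != end()) {
            out = it->second;
            return true;
        }
        return false;
    }

    // insert_or_assign() inserts or assigns directly; operator[] would first
    // default-construct a missing value and then assign to it.
    void set(const key_type& key, const value_type& value) { table_.insert_or_assign(key, value); }

  private:
    map_type table_;
};
```

Like `std::unordered_map` itself, this sketch does no locking, so concurrent writers would still need external synchronization.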
SwooshyCueb added a commit to SwooshyCueb/irods that referenced this issue Jun 12, 2024
alanking pushed a commit that referenced this issue Jun 12, 2024
SwooshyCueb added a commit to SwooshyCueb/irods that referenced this issue Jun 12, 2024
SwooshyCueb added a commit to SwooshyCueb/irods that referenced this issue Jun 12, 2024
alanking pushed a commit that referenced this issue Jun 12, 2024
3 participants