Random fatal errors in the OpenGL visualizer #4692

Open
jngrad opened this issue Mar 21, 2023 · 0 comments

The OpenGL visualizer randomly crashes with a fatal error during live visualization. The backtrace often ends in the OpenMPI shared objects (Ubuntu), or in the malloc or OpenMPI dynamic libraries (macOS).

Reproducible on both Ubuntu 22.04 / Python 3.10.6 / PyOpenGL 3.1.6 and macOS / Python 3.11.2 / PyOpenGL 3.1.6 using this sample from 62d1479:

./pypresso ../samples/visualization_constraints.py --sphere

After a few minutes, the visualizer crashes. Here is a backtrace from a sanitizers build on Ubuntu:

terminate called after throwing an instance of 'boost::wrapexcept<boost::mpi::exception>'
  what():  MPI_Free_mem: MPI_ERR_NO_MEM: out of memory
[lama:3166139] *** Process received signal ***
[lama:3166139] Signal: Aborted (6)
[lama:3166139] Signal code:  (-6)
[lama:3166139] [ 0] /lib/x86_64-linux-gnu/libopen-pal.so.40(+0x6ab79)[0x14c1d09b7b79]
[lama:3166139] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x14c21d042520]
[lama:3166139] [ 2] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x14c21d096a7c]
[lama:3166139] [ 3] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x14c21d042476]
[lama:3166139] [ 4] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x14c21d0287f3]
[lama:3166139] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2bbe)[0x14c21cca2bbe]
[lama:3166139] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae24c)[0x14c21ccae24c]
[lama:3166139] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae2b7)[0x14c21ccae2b7]
[lama:3166139] [ 8] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae518)[0x14c21ccae518]
[lama:3166139] [ 9] /work/jgrad/espresso/build-lama-sanitizers/src/script_interface/espresso_script_interface.so(void boost::throw_exception<boost::mpi::exception>(boost::mpi::exception const&)+0x54)[0x14c1d38b32f4]
[lama:3166139] [10] /work/jgrad/espresso/build-lama-sanitizers/src/script_interface/espresso_script_interface.so(std::allocator_traits<boost::mpi::allocator<char> >::deallocate(boost::mpi::allocator<char>&, char*, unsigned long)+0x134)[0x14c1d38b3eb4]
[lama:3166139] [11] /work/jgrad/espresso/build-lama-sanitizers/src/script_interface/espresso_script_interface.so(void std::vector<char, boost::mpi::allocator<char> >::_M_range_insert<char const*>(__gnu_cxx::__normal_iterator<char*, std::vector<char, boost::mpi::allocator<char> > >, char const*, char const*, std::forward_iterator_tag)+0x634)[0x14c1d38b2584]
[lama:3166139] [12] /work/jgrad/espresso/build-lama-sanitizers/src/script_interface/espresso_script_interface.so(__gnu_cxx::__normal_iterator<char*, std::vector<char, boost::mpi::allocator<char> > > std::vector<char, boost::mpi::allocator<char> >::insert<char const*, void>(__gnu_cxx::__normal_iterator<char const*, std::vector<char, boost::mpi::allocator<char> > >, char const*, char const*)+0x171)[0x14c1d38b1e11]
[lama:3166139] [13] /work/jgrad/espresso/build-lama-sanitizers/src/script_interface/espresso_script_interface.so(boost::mpi::binary_buffer_oprimitive::save_impl(void const*, int)+0xc3)[0x14c1d38b1c33]
[lama:3166139] [14] /work/jgrad/espresso/build-lama-sanitizers/src/script_interface/espresso_script_interface.so(void boost::mpi::packed_oarchive::save_override<int>(int const&)+0x7f)[0x14c1d38b5a1f]
[lama:3166139] [15] /work/jgrad/espresso/build-lama-sanitizers/src/script_interface/espresso_script_interface.so(boost::mpi::packed_oarchive& boost::archive::detail::interface_oarchive<boost::mpi::packed_oarchive>::operator<< <int>(int const&)+0xc8)[0x14c1d38b0d88]
[lama:3166139] [16] /work/jgrad/espresso/build-lama-sanitizers/src/core/espresso_core.so(void Utils::detail::for_each_impl<std::tuple<int&, int&>, Communication::MpiCallbacks::call<int&, int&>(int, int&, int&) const::{lambda(auto:1&&)#1}, 0ul, 1ul>(Communication::MpiCallbacks::call<int&, int&>(int, int&, int&) const::{lambda(auto:1&&)#1}&&, std::tuple<int&, int&>, std::integer_sequence<unsigned long, 0ul, 1ul>)+0x9a)[0x14c1d1a7589a]
[lama:3166139] [17] /work/jgrad/espresso/build-lama-sanitizers/src/core/espresso_core.so(void Utils::for_each<Communication::MpiCallbacks::call<int&, int&>(int, int&, int&) const::{lambda(auto:1&&)#1}, std::tuple<int&, int&> >(Communication::MpiCallbacks::call<int&, int&>(int, int&, int&) const::{lambda(auto:1&&)#1}&&, std::tuple<int&, int&>&&)+0xb0)[0x14c1d1a75750]
[lama:3166139] [18] /work/jgrad/espresso/build-lama-sanitizers/src/core/espresso_core.so(void Communication::MpiCallbacks::call<int&, int&>(int, int&, int&) const+0x1b9)[0x14c1d1a75429]
[lama:3166139] [19] /work/jgrad/espresso/build-lama-sanitizers/src/core/espresso_core.so(std::remove_reference<decltype ({parm#2}(std::declval<int>(), std::declval<int>()))>::type Communication::MpiCallbacks::call<std::plus<int>, int, int, int>(Communication::Result::Reduction, std::plus<int>, int (*)(int, int), int, int) const+0x156)[0x14c1d1a75056]
[lama:3166139] [20] /work/jgrad/espresso/build-lama-sanitizers/src/core/espresso_core.so(auto mpi_call<Communication::Result::Reduction, std::plus<int>, int, int, int, int&, int&>(Communication::Result::Reduction, std::plus<int>&&, int (*)(int, int), int&, int&)+0x52)[0x14c1d1a5fd92]
[lama:3166139] [21] /work/jgrad/espresso/build-lama-sanitizers/src/core/espresso_core.so(python_integrate(int, bool, bool)+0x6e9)[0x14c1d1a5e4b9]
[lama:3166139] [22] /work/jgrad/espresso/build-lama-sanitizers/src/python/espressomd/integrate.so(+0x4999a)[0x14c0b41f799a]
[lama:3166139] [23] /work/jgrad/espresso/build-lama-sanitizers/src/python/espressomd/integrate.so(+0x466f7)[0x14c0b41f46f7]
[lama:3166139] [24] /work/jgrad/espresso/build-lama-sanitizers/src/python/espressomd/script_interface.so(+0x137855)[0x14c0c71e0855]
[lama:3166139] [25] /work/jgrad/espresso/build-lama-sanitizers/src/python/espressomd/script_interface.so(+0x134b83)[0x14c0c71ddb83]
[lama:3166139] [26] /usr/bin/python3.10(_PyObject_MakeTpCall+0x25b)[0x562b49e844ab]
[lama:3166139] [27] /usr/bin/python3.10(+0x16b060)[0x562b49e9c060]
[lama:3166139] [28] /work/jgrad/espresso/build-lama-sanitizers/src/python/espressomd/integrate.so(+0x3cbb7)[0x14c0b41eabb7]
[lama:3166139] [29] /work/jgrad/espresso/build-lama-sanitizers/src/python/espressomd/integrate.so(+0x41443)[0x14c0b41ef443]
[lama:3166139] *** End of error message ***
Aborted (core dumped)

No ASAN or UBSAN report was generated. The macOS backtrace is almost identical.

Removing the calls to OpenGL.GLUT.glutSolidSphere() in both openGLLive_draw_system_particles() and Sphere.draw() makes the fatal errors harder to reproduce on Ubuntu and macOS. Other sources of fatal errors then become visible, such as this backtrace from an Ubuntu release build:

[lama:3177480] *** Process received signal ***
[lama:3177480] Signal: Segmentation fault (11)
[lama:3177480] Signal code: Address not mapped (1)
[lama:3177480] Failing at address: 0x10
[lama:3177480] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x1461a1042520]
[lama:3177480] [ 1] /lib/x86_64-linux-gnu/libopen-pal.so.40(opal_rb_tree_insert+0x47)[0x1461594bb357]
[lama:3177480] [ 2] /lib/x86_64-linux-gnu/libopen-pal.so.40(mca_mpool_base_tree_insert+0x2a)[0x14615950e7fa]
[lama:3177480] [ 3] /lib/x86_64-linux-gnu/libopen-pal.so.40(mca_mpool_base_alloc+0x68)[0x14615950eb58]
[lama:3177480] [ 4] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Alloc_mem+0xc4)[0x146159d317c4]
[lama:3177480] [ 5] /work/jgrad/espresso/build-lama-release/src/core/espresso_core.so(void std::vector<char, boost::mpi::allocator<char> >::_M_range_insert<char const*>(__gnu_cxx::__normal_iterator<char*, std::vector<char, boost::mpi::allocator<char> > >, char const*, char const*, std::forward_iterator_tag)+0x7d7)[0x14615ac80627]
[lama:3177480] [ 6] /work/jgrad/espresso/build-lama-release/src/core/espresso_core.so(mpi_integrate(int, int)+0x157)[0x14615acbad27]
[lama:3177480] [ 7] /work/jgrad/espresso/build-lama-release/src/core/espresso_core.so(python_integrate(int, bool, bool)+0xe5)[0x14615acbb375]
[lama:3177480] [ 8] /work/jgrad/espresso/build-lama-release/src/python/espressomd/integrate.so(+0x121a2)[0x146187e231a2]
[lama:3177480] [ 9] /usr/bin/python3.10(_PyObject_MakeTpCall+0x25b)[0x5628ec2bb4ab]
[lama:3177480] [10] /usr/bin/python3.10(+0x16b060)[0x5628ec2d3060]
[lama:3177480] [11] /work/jgrad/espresso/build-lama-release/src/python/espressomd/integrate.so(+0xb869)[0x146187e1c869]
[lama:3177480] [12] /usr/bin/python3.10(_PyObject_MakeTpCall+0x25b)[0x5628ec2bb4ab]
[lama:3177480] [13] /usr/bin/python3.10(+0x16af0b)[0x5628ec2d2f0b]
[lama:3177480] [14] /usr/bin/python3.10(_PyEval_EvalFrameDefault+0x63b2)[0x5628ec2b3462]
[lama:3177480] [15] /usr/bin/python3.10(_PyFunction_Vectorcall+0x7c)[0x5628ec2c51ec]
[lama:3177480] [16] /usr/bin/python3.10(_PyEval_EvalFrameDefault+0x2a40)[0x5628ec2afaf0]
[lama:3177480] [17] /usr/bin/python3.10(_PyFunction_Vectorcall+0x7c)[0x5628ec2c51ec]
[lama:3177480] [18] /usr/bin/python3.10(_PyEval_EvalFrameDefault+0x81b)[0x5628ec2ad8cb]
[lama:3177480] [19] /usr/bin/python3.10(_PyFunction_Vectorcall+0x7c)[0x5628ec2c51ec]
[lama:3177480] [20] /usr/bin/python3.10(_PyEval_EvalFrameDefault+0x81b)[0x5628ec2ad8cb]
[lama:3177480] [21] /usr/bin/python3.10(+0x16ae91)[0x5628ec2d2e91]
[lama:3177480] [22] /usr/bin/python3.10(+0x296e5b)[0x5628ec3fee5b]
[lama:3177480] [23] /usr/bin/python3.10(+0x28cf58)[0x5628ec3f4f58]
[lama:3177480] [24] /lib/x86_64-linux-gnu/libc.so.6(+0x94b43)[0x1461a1094b43]
[lama:3177480] [25] /lib/x86_64-linux-gnu/libc.so.6(+0x126a00)[0x1461a1126a00]
[lama:3177480] *** End of error message ***
Segmentation fault (core dumped)

A very similar backtrace is produced on macOS.

These backtraces almost always feature the ESPResSo custom MpiCallbacks framework, which we are progressively removing from the core in favor of standard MPI communication.
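
When many crash logs have to be compared, it can help to tally which shared objects appear in each backtrace and which frames mention MpiCallbacks. The following is a minimal sketch and not part of ESPResSo: the names FRAME_RE, SAMPLE and summarize_backtrace are hypothetical, and the regular expression assumes the frame layout printed in the backtraces above.

```python
import re
from collections import Counter

# Assumed frame layout (taken from the backtraces in this issue):
# "[host:pid] [ N] /path/to/lib.so(symbol+0xoffset)[0xaddress]"
FRAME_RE = re.compile(r"^\[[^\]]+\] \[\s*(\d+)\] (\S+?)\((.*)\)\[0x[0-9a-f]+\]$")

# Two frames and one non-frame line copied from the first backtrace.
SAMPLE = """\
[lama:3166139] [ 4] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x14c21d0287f3]
[lama:3166139] [18] /work/jgrad/espresso/build-lama-sanitizers/src/core/espresso_core.so(void Communication::MpiCallbacks::call<int&, int&>(int, int&, int&) const+0x1b9)[0x14c1d1a75429]
[lama:3166139] *** End of error message ***
"""


def summarize_backtrace(backtrace):
    """Count frames per shared object and list frames that mention MpiCallbacks."""
    libraries = Counter()
    callbacks_frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # header, footer or signal description, not a frame
        index, path, symbol = match.groups()
        libraries[path.rsplit("/", 1)[-1]] += 1  # keep only the file name
        if "MpiCallbacks" in symbol:
            callbacks_frames.append(int(index))
    return libraries, callbacks_frames


libraries, callbacks_frames = summarize_backtrace(SAMPLE)
```

In release builds the MpiCallbacks frames may be inlined, as in the second backtrace above, so an empty list of matching frames does not rule out the framework being involved.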

Thanks to Le Qiao for helping investigate the macOS errors.
