Support boost 1.69.0-1.70.0 #1296

Closed
3 tasks done
vincentchabannes opened this issue Mar 20, 2019 · 5 comments

Comments

@vincentchabannes
Member

vincentchabannes commented Mar 20, 2019

Some fixes are required:

  • Installing Boost now adds an architecture tag (x64) to the library names. Detecting it requires a recent CMake (>= 3.12), and we must define the CMake variable Boost_ARCHITECTURE.
  • A lambda expression or functor can't be passed as a Boost.MPI argument (all arguments/parameters of a functor should be static). This was used in feel/feells/reinit_fms_impl.hpp.
  • MPI serialization with std::vector fails when mpi::wait_all() is called.
@vincentchabannes vincentchabannes self-assigned this Mar 20, 2019
vincentchabannes added a commit that referenced this issue Mar 20, 2019
- add default value for Boost_ARCHITECTURE (+little clean of boost detect)
- fix use of cmake machine configuration
vincentchabannes added a commit that referenced this issue Mar 20, 2019
@vincentchabannes
Member Author

I have found a bug in Boost.MPI and have sent a fix in this pull request: boostorg/mpi#84. I hope it will be accepted before the 1.70 release.
This means that we need to patch the 1.69 release of Boost.

@vincentchabannes vincentchabannes changed the title Support boost 1.69 Support boost 1.69-1.70 Nov 27, 2019
@vincentchabannes vincentchabannes changed the title Support boost 1.69-1.70 Support boost 1.69.0-1.70.0 Nov 27, 2019
vincentchabannes added a commit that referenced this issue Nov 27, 2019
…d to #1296 (@tmetivet)

We need to ask the Boost.MPI developers why these changes are required.
@vincentchabannes
Member Author

Unfortunately, Boost.MPI still has some issues. They are quite hard to reproduce because they do not appear every time.

terminate called after throwing an instance of 'boost::wrapexcept<boost::mpi::exception>'
  what():  MPI_Test: MPI_ERR_TRUNCATE: message truncated
*** Aborted at 1576576705 (unix time) try "date -d @1576576705" if you are using GNU date ***
terminate called after throwing an instance of 'boost::wrapexcept<boost::mpi::exception>'
  what():  MPI_Test: MPI_ERR_TRUNCATE: message truncated
*** Aborted at 1576576705 (unix time) try "date -d @1576576705" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x5c00000386c) received by PID 14444 (TID 0x7fa9aba2b500) from PID 14444; stack trace: ***
terminate called after throwing an instance of 'boost::wrapexcept<boost::mpi::exception>'
terminate called after throwing an instance of '    @     0x7fa973909390 (unknown)
boost::wrapexcept<boost::mpi::exception>'
  what():  MPI_Test: MPI_ERR_TRUNCATE: message truncated  what():  MPI_Test: MPI_ERR_TRUNCATE: message truncated

*** Aborted at 1576576705 (unix time) try "date -d @1576576705" if you are using GNU date ***
*** Aborted at 1576576705 (unix time) try "date -d @1576576705" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x5c00000386b) received by PID 14443 (TID 0x7f24b20f1500) from PID 14443; stack trace: ***
    @     0x7fa96f8c5428 gsignal
    @     0x7fa96f8c702a abort
    @     0x7f2479fcf390 (unknown)
PC: @                0x0 (unknown)
*** SIGABRT (@0x5c00000386d) received by PID 14445 (TID 0x7f7be22b8500) from PID 14445; stack trace: ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x5c00000255b) received by PID 9563 (TID 0x7f8719331500) from PID 9563; stack trace: ***
terminate called after throwing an instance of 'boost::wrapexcept<boost::mpi::exception>'
  what():  MPI_Test: MPI_ERR_TRUNCATE: message truncated
    @     0x7f2475f8b428 gsignal
*** Aborted at 1576576705 (unix time) try "date -d @1576576705" if you are using GNU date ***
    @     0x7fa9731b9af3 __gnu_cxx::__verbose_terminate_handler()
    @     0x7f7baa196390 (unknown)
    @     0x7f86e120d390 (unknown)
    @     0x7f2475f8d02a abort
    @     0x7f7ba6152428 gsignal
    @     0x7fa9731bfa56 __cxxabiv1::__terminate()
    @     0x7f86dd0fe428 gsignal
terminate called after throwing an instance of 'boost::wrapexcept<boost::mpi::exception>'
    @     0x7f247987faf3 __gnu_cxx::__verbose_terminate_handler()
  what():  MPI_Test: MPI_ERR_TRUNCATE: message truncatedPC: @                0x0 (unknown)

*** SIGABRT (@0x5c00000255c) received by PID 9564 (TID 0x7fc8f9875500) from PID 9564; stack trace: ***
*** Aborted at 1576576705 (unix time) try "date -d @1576576705" if you are using GNU date ***
    @     0x7f7ba615402a abort
    @     0x7f86dd10002a abort
    @     0x7fa9731bfa91 std::terminate()
    @     0x7f2479885a56 __cxxabiv1::__terminate()
    @     0x7fc8c1751390 (unknown)
    @     0x7f7ba9a46af3 __gnu_cxx::__verbose_terminate_handler()
    @     0x7f86e0abdaf3 __gnu_cxx::__verbose_terminate_handler()
    @     0x7fc8bd642428 gsignal
PC: @                0x0 (unknown)
*** SIGABRT (@0x5c00000255e) received by PID 9566 (TID 0x7f6621022500) from PID 9566; stack trace: ***
    @     0x7fa9731bfcc3 __cxa_throw
    @     0x7f2479885a91 std::terminate()
    @     0x7f7ba9a4ca56 __cxxabiv1::__terminate()
    @     0x7fc8bd64402a abort
    @     0x7f86e0ac3a56 __cxxabiv1::__terminate()
    @     0x7f65e8efe390 (unknown)
    @     0x7f2479885cc3 __cxa_throw
    @     0x7fc8c1001af3 __gnu_cxx::__verbose_terminate_handler()
    @     0x7f65e4def428 gsignal
    @     0x7f7ba9a4ca91 std::terminate()
    @     0x7f86e0ac3a91 std::terminate()
    @     0x7fa9aa430b65 boost::throw_exception<>()
    @     0x7f65e4df102a abort
    @     0x7fc8c1007a56 __cxxabiv1::__terminate()
    @     0x7f7ba9a4ccc3 __cxa_throw
    @     0x7f86e0ac3cc3 __cxa_throw
    @     0x7f65e87aeaf3 __gnu_cxx::__verbose_terminate_handler()
    @     0x7f24b0af6b65 boost::throw_exception<>()
    @     0x7fc8c1007a91 std::terminate()
    @     0x7f65e87b4a56 __cxxabiv1::__terminate()
    @     0x7fa9a543e13f boost::mpi::request::handle_dynamic_primitive_array_irecv<>()

@vincentchabannes
Member Author

The problem comes from messages overtaking one another in some irecv calls when they are used with std::vector (the same tag is used to send the size and the data). But the MPI specification says, under "Order", that messages are non-overtaking, so it should work, and it does almost every time. The bug shows up because the partitionio reader has two isend/irecv steps, and the first one sometimes seems to disturb the second. Putting a barrier between the two steps also fixes the issue. I finally fixed the bug by replacing the vector with an array, because the size was known.

I am closing the issue, but we need to review our use of Boost.MPI (the same problem may appear in other parts of the code).
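
To make the failure mode easier to picture, here is a minimal sketch in plain C++. It is a toy in-memory model, not Boost.MPI code and not the partitionio reader: `Message`, `drain`, `twoStepVectorProtocol` and `fixedSizeProtocol` are hypothetical names introduced only for illustration. It assumes one possible interleaving, in which the size receives of both steps are posted before any data receive exists, so that the second size receive is matched against the first data message.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy model (assumption, NOT Boost.MPI): messages for one (source, tag) pair
// arrive in FIFO order and are matched to receives in posting order.
struct Message {
    std::size_t count;  // number of elements carried by the message
};

// Match every posted receive with the next message on the wire.
// A message larger than the posted buffer is reported as "truncated",
// mimicking MPI_ERR_TRUNCATE.
std::vector<std::string> drain(std::deque<Message> wire,
                               std::deque<std::size_t> posted) {
    std::vector<std::string> results;
    while (!wire.empty() && !posted.empty()) {
        results.push_back(wire.front().count > posted.front() ? "truncated"
                                                              : "ok");
        wire.pop_front();
        posted.pop_front();
    }
    return results;
}

// Vector-style protocol: each vector travels as a size message (1 element)
// followed by a data message (n elements), both with the same tag.
// Hypothetical interleaving: both size receives are posted up front.
std::vector<std::string> twoStepVectorProtocol(std::size_t n1, std::size_t n2) {
    std::deque<Message> wire = {Message{1}, Message{n1}, Message{1}, Message{n2}};
    std::deque<std::size_t> posted = {1, 1};
    return drain(wire, posted);
}

// Fixed-size protocol: the size is known on both sides, so each step is a
// single message and each receive buffer already has the right length.
std::vector<std::string> fixedSizeProtocol(std::size_t n1, std::size_t n2) {
    std::deque<Message> wire = {Message{n1}, Message{n2}};
    std::deque<std::size_t> posted = {n1, n2};
    return drain(wire, posted);
}
```

In this model, `twoStepVectorProtocol(5, 3)` reports the second receive as truncated, because a one-element buffer meets the five-element data message of the first step, while `fixedSizeProtocol(5, 3)` matches every receive with a message of the expected length. That is consistent with replacing the vector by an array, although the real matching order inside Boost.MPI may differ from this assumed interleaving.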

@prudhomm
Member

thanks for the clear explanation

@romainhild
Contributor

I ran into the same issue. Here is the backtrace:

terminate called after throwing an instance of 'boost::wrapexcept<boost::mpi::exception>'
  what():  MPI_Test: MPI_ERR_TRUNCATE: message truncated
*** Aborted at 1580981702 (unix time) try "date -d @1580981702" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x60100021fc4) received by PID 139204 (TID 0x7fc99f9b6500) from PID 139204; stack trace: ***
    @     0x7fc959cfb390 (unknown)
    @     0x7fc955bec428 gsignal
    @     0x7fc955bee02a abort
    @     0x7fc9595abaf3 __gnu_cxx::__verbose_terminate_handler()
    @     0x7fc9595b1a56 __cxxabiv1::__terminate()
    @     0x7fc9595b1a91 std::terminate()
    @     0x7fc9595b1cc3 __cxa_throw
    @           0x6605d5 boost::throw_exception<>()
    @           0x77cd9f boost::mpi::request::handle_dynamic_primitive_array_irecv<>()
    @     0x7fc98590f1d2 boost::mpi::request::test()
    @           0x77b9aa boost::mpi::wait_all<>()
    @     0x7fc99523ffbe Feel::CreateSubmeshTool<>::updateParallelInputRange()
    @     0x7fc99523d457 Feel::CreateSubmeshTool<>::build()
    @     0x7fc99523bcfb Feel::CreateSubmeshTool<>::build()
    @     0x7fc99523baec Feel::boost_param_default_1345createSubmesh<>()
    @     0x7fc994fdfd60 Feel::FeelModels::MixedPoisson<>::initSpaces()
    @     0x7fc994fdbc71 Feel::FeelModels::MixedPoisson<>::init()

vincentchabannes added a commit that referenced this issue Jul 2, 2020
- a mesh support can be passed instead of a mesh: this allows forcing some extracted elements (those isolated in the mesh support) to be ghosts (can fix some special cases of partitioning)
- up mpi comm for #1296