Using deal.II dev crashes when loading any shared libraries #2794

Closed
gassmoeller opened this Issue Feb 1, 2019 · 4 comments

gassmoeller commented Feb 1, 2019

I am in the process of fixing #2602 by rebasing and fixing #1957. I found a number of small things, but I have a more severe problem that I could not debug in several hours today. I will take another look tomorrow or next week, but maybe @tjhei or @bangerth immediately see the problem, so I thought I could ask.

With the recent deal.II master and my rebased ASPECT branch that fixes #2602 (and works for normal models), all models that load shared libraries into ASPECT fail immediately (when the library is loaded) with

Loading shared library <./libsimple_nonlinear.so>
malloc(): memory corruption
[rene-laptop:27704] *** Process received signal ***
[rene-laptop:27704] Signal: Aborted (6)
[rene-laptop:27704] Signal code:  (-6)

gdb does not give helpful output, and when I run valgrind/memcheck for this model I get the following output (shortened; the part I left out repeats the same error message many times for different finite element classes):

==14966== Memcheck, a memory error detector
==14966== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==14966== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==14966== Command: ../aspect simple_nonlinear.x.prm
==14966== 
==14966== Warning: set address range perms: large range [0x4e3c000, 0x1ccb4000) (defined)
--14966-- warning: DiCfSI 0xe64e800 .. 0xed4011f is huge; length = 7280928 (libdeal_II.g.so.9.1.0-pre)
==14966== Conditional jump or move depends on uninitialised value(s)
==14966==    at 0x25568375: opal_value_unload (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.20.10.1)
==14966==    by 0x1DED897A: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==14966==    by 0x1DEDC8A4: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==14966==    by 0x1DEFD404: PMPI_Init_thread (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.20.10.1)
==14966==    by 0x15DF757F: dealii::Utilities::MPI::MPI_InitFinalize::MPI_InitFinalize(int&, char**&, unsigned int) (mpi.cc:541)
==14966==    by 0x1772367: main (main.cc:711)
==14966== 
-----------------------------------------------------------------------------
-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
--     . version 2.1.0-pre (use_dealii_particles, e37b0910a)
--     . using deal.II 9.1.0-pre (fix_serialization_of_variable_data, 1e94813d07)
--     . using Trilinos 12.10.1
--     . using p4est 2.0.0
--     . running in DEBUG mode
--     . running with 1 MPI process
-----------------------------------------------------------------------------

Loading shared library <./libsimple_nonlinear.so>
==14966== Invalid write of size 8
==14966==    at 0x1557261: std::_Vector_base<std::atomic<bool>*, std::allocator<std::atomic<bool>*> >::_Vector_impl::_Vector_impl() (stl_vector.h:89)
==14966==    by 0x15426FF: std::_Vector_base<std::atomic<bool>*, std::allocator<std::atomic<bool>*> >::_Vector_base() (stl_vector.h:127)
==14966==    by 0x1541B53: std::vector<std::atomic<bool>*, std::allocator<std::atomic<bool>*> >::vector() (stl_vector.h:263)
==14966==    by 0x1539AD0: dealii::Subscriptor::Subscriptor() (subscriptor.h:295)
==14966==    by 0x5ED80B4C: FEFactoryBase (fe_tools.h:81)
==14966==    by 0x5ED80B4C: make_unique<dealii::FETools::FEFactory<dealii::FE_Q_Hierarchical<1> > > (fe_tools.h:116)
==14966==    by 0x5ED80B4C: void dealii::(anonymous namespace)::fill_no_codim_fe_names<1>(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<dealii::Subscriptor const, std::default_delete<dealii::Subscriptor const> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unique_ptr<dealii::Subscriptor const, std::default_delete<dealii::Subscriptor const> > > > >&) (fe_tools.templates.h:1072)
==14966==    by 0x5ED8859B: dealii::(anonymous namespace)::fill_default_map() (fe_tools.templates.h:1174)
==14966==    by 0x5ED88699: __static_initialization_and_destruction_0(int, int) (fe_tools.templates.h:1221)
==14966==    by 0x5ED886CC: _GLOBAL__sub_I_fe_tools.cc (fe_tools.cc:24)
==14966==    by 0x4010732: call_init (dl-init.c:72)
==14966==    by 0x4010732: _dl_init (dl-init.c:119)
==14966==    by 0x40151FE: dl_open_worker (dl-open.c:522)
==14966==    by 0x1F0662DE: _dl_catch_exception (dl-error-skeleton.c:196)
==14966==    by 0x40147C9: _dl_open (dl-open.c:605)
==14966==  Address 0x4d1371c8 is 0 bytes after a block of size 72 alloc'd
==14966==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14966==    by 0x5ED80AFA: make_unique<dealii::FETools::FEFactory<dealii::FE_Q_Hierarchical<1> > > (unique_ptr.h:825)
==14966==    by 0x5ED80AFA: void dealii::(anonymous namespace)::fill_no_codim_fe_names<1>(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<dealii::Subscriptor const, std::default_delete<dealii::Subscriptor const> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unique_ptr<dealii::Subscriptor const, std::default_delete<dealii::Subscriptor const> > > > >&) (fe_tools.templates.h:1072)
==14966==    by 0x5ED8859B: dealii::(anonymous namespace)::fill_default_map() (fe_tools.templates.h:1174)
==14966==    by 0x5ED88699: __static_initialization_and_destruction_0(int, int) (fe_tools.templates.h:1221)
==14966==    by 0x5ED886CC: _GLOBAL__sub_I_fe_tools.cc (fe_tools.cc:24)
==14966==    by 0x4010732: call_init (dl-init.c:72)
==14966==    by 0x4010732: _dl_init (dl-init.c:119)
==14966==    by 0x40151FE: dl_open_worker (dl-open.c:522)
==14966==    by 0x1F0662DE: _dl_catch_exception (dl-error-skeleton.c:196)
==14966==    by 0x40147C9: _dl_open (dl-open.c:605)
==14966==    by 0x1CCB4F95: dlopen_doit (dlopen.c:66)
==14966==    by 0x1F0662DE: _dl_catch_exception (dl-error-skeleton.c:196)
==14966==    by 0x1F06636E: _dl_catch_error (dl-error-skeleton.c:215)
==14966== 

To me it looks like a static member in the FETools namespace is causing problems during its construction. Could this be related to the fact that we load the shared library, i.e. do we create the static member twice and cause problems that way? I would be grateful for pointers, as I am completely new to the FETools namespace and find it quite confusing.


tjhei commented Feb 1, 2019

Yes, that looks like the static construction incorrectly happens again when loading the .so. I will try to reproduce this here...
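For readers unfamiliar with the mechanism under discussion: the backtrace above shows `fill_default_map()` being called from a static initializer while `dlopen` runs. A minimal sketch of that general pattern follows. All class and function names in it are hypothetical stand-ins, not deal.II code. It only illustrates that a namespace-scope object with a dynamic initializer is constructed once for every copy of the code that the loader brings into the process.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are not deal.II classes.
struct FactoryBase
{
  virtual ~FactoryBase() = default;
};

struct HierarchicalFactory : FactoryBase
{};

struct LagrangeFactory : FactoryBase
{};

using FactoryMap = std::map<std::string, std::unique_ptr<const FactoryBase>>;

// Counts how often the registry was filled by this copy of the code.
// A function-local static avoids any initialization-order problem.
int &fill_count()
{
  static int count = 0;
  return count;
}

// Builds the default registry, one heap-allocated factory per name.
FactoryMap fill_default_map()
{
  ++fill_count();
  FactoryMap result;
  result["Hierarchical"] = std::make_unique<HierarchicalFactory>();
  result["Lagrange"]     = std::make_unique<LagrangeFactory>();
  return result;
}

// Namespace-scope object with a dynamic initializer. In an executable this
// runs before main(); in a shared library it runs while the library is being
// loaded, which is where the backtrace above shows the invalid write.
const FactoryMap default_map = fill_default_map();
```

Within a single program the initializer runs exactly once. If a plugin pulls a second copy of the library into the process, that copy has its own `default_map` and runs its own initializer during loading.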


tjhei commented Feb 1, 2019

> I found a number of small things, but I have a more severe problem that I could not debug in several hours today

Do you have the patch to make things compile? I am seeing

../source/particle/particle_handler.cc:861:23: error: no matching function for call to ‘dealii::parallel::distributed::Triangulation<3, 3>::register_data_attach

and I assume you tackled those already...


tjhei commented Feb 1, 2019

Well, I just commented out the particle stuff and now I can compile. I cannot reproduce the bug with gcc 7.3.1 and a deal.II from a few days ago. Can you try rebuilding deal.II from scratch?


gassmoeller commented Feb 1, 2019

I rebased #1957, which should now compile with any recent deal.II, and I also found the issue.
Thanks for the pointer that normal shared libraries work. It was indeed only the shared libraries created by ctest that did not work, because they were linked against the wrong deal.II version (see #2796).
So this is not a problem in deal.II, but in my ASPECT setup. Thanks for the help, sometimes you just need someone to point out the obvious (like 'normal shared libraries work') 😄.
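A mismatch of this kind could in principle be detected at run time by checking which shared objects are loaded after the plugin has been opened. The following is a rough sketch, assuming Linux with glibc (`dl_iterate_phdr` from `<link.h>`). The helper names `loaded_libraries_matching` and `has_conflicting_copies` are made up for this illustration and exist in neither ASPECT nor deal.II.

```cpp
#include <link.h> // dl_iterate_phdr, struct dl_phdr_info (glibc-specific)

#include <cstddef>
#include <set>
#include <string>
#include <vector>

namespace
{
  // Callback for dl_iterate_phdr: records the path of every loaded object.
  // The main executable is usually reported with an empty name and skipped.
  int collect_path(struct dl_phdr_info *info, std::size_t /*size*/, void *data)
  {
    auto *paths = static_cast<std::vector<std::string> *>(data);
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0')
      paths->emplace_back(info->dlpi_name);
    return 0; // zero means: continue with the next object
  }
} // namespace

// Hypothetical helper: paths of all currently loaded shared objects whose
// path contains 'needle', for example "libdeal_II".
std::vector<std::string> loaded_libraries_matching(const std::string &needle)
{
  std::vector<std::string> all;
  dl_iterate_phdr(&collect_path, &all);

  std::vector<std::string> matches;
  for (const auto &path : all)
    if (path.find(needle) != std::string::npos)
      matches.push_back(path);
  return matches;
}

// Hypothetical helper: true if more than one distinct path is in the list,
// which would suggest that two builds of the same library are loaded at once.
bool has_conflicting_copies(const std::vector<std::string> &paths)
{
  return std::set<std::string>(paths.begin(), paths.end()).size() > 1;
}
```

Calling `has_conflicting_copies(loaded_libraries_matching("libdeal_II"))` right after the plugin is loaded would flag the situation described above, under the assumption that the two deal.II builds live at different paths. Running `ldd` on the plugin gives similar information without writing any code.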

gassmoeller closed this Feb 1, 2019
