Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A bug in advection assemblers #2252

Closed
YiminJin opened this issue May 18, 2018 · 8 comments
Closed

A bug in advection assemblers #2252

YiminJin opened this issue May 18, 2018 · 8 comments

Comments

@YiminJin
Copy link
Contributor

I believe that I have found a bug in assemblers of advection fields across all versions of ASPECT.

In struct internal::Assembly::Scratch::AdvectionSystem, there are pointers of FEFaceValues and FESubfaceValues, which will be newed by the constructor if discontinuous Galerkin method is chosen to solve the advection system. However, the copy constructor copies the pointer rather than FEFaceValues itself to the new object. So if the user use multi-threads, there will be several threads visiting the same auxiliary arrays in FEFaceValues at the same time, which will lead to mistake.

Since FEFaceValues cannot be copied, I think the correct way is to new a same one in the copy constructor.

Please let me know if it is really a bug, or I've made something wrong.

Thank you very much!

@gassmoeller
Copy link
Member

Hi YiminJin, welcome, and thanks for reporting this! This looks indeed like it could be a bug. We only introduced the multithreading support recently, and apparently this case was not tested. Could you post a minimal case that seems to cause the error? Is it enough to run aspect with -j on one of the discontinuous tests?
Thanks.

@YiminJin
Copy link
Contributor Author

You can add a line "set Use discontinuous composition discretization = true" in /tests/multicomponent_harmonic.prm, and try to run this test with -j. The output on my computer is:


-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
-- . version 2.1.0-pre (master, 86d90e2)
-- . using deal.II 8.5.1
-- . using Trilinos 12.12.1
-- . using p4est 1.1.0
-- . running in DEBUG mode
-- . running with 1 MPI process
-- . using 8 threads
-- How to cite ASPECT: https://aspect.geodynamics.org/cite.html

Number of active cells: 4,096 (on 6 levels)
Number of degrees of freedom: 164,964 (33,410+4,257+16,705+36,864+36,864+36,864)

*** Timestep 0: t=0 years
Solving temperature system... 0 iterations.
Solving C_1 system ... 0 iterations.
Solving C_2 system ... 0 iterations.
Solving C_3 system ... 0 iterations.
Rebuilding Stokes preconditioner...
Solving Stokes system... 30+12 iterations.


An error occurred in line <3457> of file </home/.../build/dealii/dealii-8.5.1/source/fe/fe_values.cc> in function
void dealii::FEValuesBase<dim, spacedim>::invalidate_present_cell() [with int dim = 2; int spacedim = 2]
The violated condition was:
present_cell.get() != 0
Additional information:
This exception -- which is used in many places in the library -- usually indicates that some condition which the author of the code thought must be satisfied at a certain point in an algorithm, is not fulfilled. An example would be that the first part of an algorithm sorts elements of an array in ascending order, and a second part of the algorithm later encounters an element that is not larger than the previous one.

There is usually not very much you can do if you encounter such an exception since it indicates an error in deal.II, not in your own program. Try to come up with the smallest possible program that still demonstrates the error and contact the deal.II mailing lists with it to obtain help.

Stacktrace:

#0 /opt/dealii/dealii-8.5.1/lib/libdeal_II.g.so.8.5.1: dealii::FEValuesBase<2, 2>::invalidate_present_cell()
#1 /opt/dealii/dealii-8.5.1/lib/libdeal_II.g.so.8.5.1: boost::detail::function::void_function_obj_invoker0<std::_Bind<std::_Mem_fn<void (dealii::FEValuesBase<2, 2>::*)()> (std::reference_wrapper<dealii::FEValuesBase<2, 2> >)>, void>::invoke(boost::detail::function::function_buffer&)
#2 ./aspect: boost::signals2::detail::signal_impl<void (), boost::signals2::optional_last_value, int, std::less, boost::function<void ()>, boost::function<void (boost::signals2::connection const&)>, boost::signals2::mutex>::operator()()
#3 /opt/dealii/dealii-8.5.1/lib/libdeal_II.g.so.8.5.1: boost::signals2::signal<void (), boost::signals2::optional_last_value, int, std::less, boost::function<void ()>, boost::function<void (boost::signals2::connection const&)>, boost::signals2::mutex>::operator()()
#4 /opt/dealii/dealii-8.5.1/lib/libdeal_II.g.so.8.5.1: dealii::Triangulation<2, 2>::execute_coarsening_and_refinement()
#5 /opt/dealii/dealii-8.5.1/lib/libdeal_II.g.so.8.5.1: dealii::parallel::distributed::Triangulation<2, 2>::execute_coarsening_and_refinement()
#6 /opt/dealii/dealii-8.5.1/lib/libdeal_II.g.so.8.5.1: dealii::parallel::distributed::Triangulation<2, 2>::copy_local_forest_to_triangulation()
#7 /opt/dealii/dealii-8.5.1/lib/libdeal_II.g.so.8.5.1: dealii::parallel::distributed::Triangulation<2, 2>::execute_coarsening_and_refinement()
#8 ./aspect: aspect::Simulator<2>::refine_mesh(unsigned int)
#9 ./aspect: aspect::Simulator<2>::maybe_do_initial_refinement(unsigned int)
#10 ./aspect: aspect::Simulator<2>::run()
#11 ./aspect: void run_simulator<2>(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool, bool)
#12 ./aspect: main

Aborted (core dumped)

The error message may not be the same each time you run this test. Of course, the program runs well with the original parameter file.

@bangerth
Copy link
Contributor

@YiminJin -- thanks for submitting this!

I think your diagnosis is exactly correct. In assemblers/interface.h, we have this declaration:

        template <int dim>
        struct AdvectionSystem: public ScratchBase<dim>
        {
          AdvectionSystem (const FiniteElement<dim> &finite_element,
                           const FiniteElement<dim> &advection_element,
                           const Mapping<dim>       &mapping,
                           const Quadrature<dim>    &quadrature,
                           const Quadrature<dim-1>  &face_quadrature,
                           const UpdateFlags         update_flags,
                           const UpdateFlags         face_update_flags,
                           const unsigned int        n_compositional_fields,
                           const typename Simulator<dim>::AdvectionField     &advection_field);
          AdvectionSystem (const AdvectionSystem &data);

          FEValues<dim> finite_element_values;

          std_cxx11::shared_ptr<FEFaceValues<dim> >    face_finite_element_values;
          std_cxx11::shared_ptr<FEFaceValues<dim> >    neighbor_face_finite_element_values;
          std_cxx11::shared_ptr<FESubfaceValues<dim> > subface_finite_element_values;

But then in assemblers/interface.cc, we have this copy constructor:

        template <int dim>
        AdvectionSystem<dim>::
        AdvectionSystem (const AdvectionSystem &scratch)
          :
          ScratchBase<dim>(scratch),

          finite_element_values (scratch.finite_element_values.get_mapping(),
                                 scratch.finite_element_values.get_fe(),
                                 scratch.finite_element_values.get_quadrature(),
                                 scratch.finite_element_values.get_update_flags()),
          face_finite_element_values (scratch.face_finite_element_values),
          neighbor_face_finite_element_values (scratch.neighbor_face_finite_element_values),
          subface_finite_element_values (scratch.subface_finite_element_values),

In other words, we correctly copy the finite_element_values member variable, whereas the other member variables are simply shared pointers.

The bug would have been obvious if we had (correctly) used std_cxx11::unique_ptr for these member variables because these are not copyable. So, the correct steps are:

  • use std_cxx11::unique_ptr
  • replace the direct copy with something like the following:
  face_finite_element_values (scratch.face_finite_element_values.get() != NULL ?
                                                new FEFaceValues<dim> (scratch.face_finite_element_values->get_mapping(),
                                                                                           scratch.face_finite_element_values->get_fe(),
                                                                                           scratch.face_finite_element_values->get_quadrature(),
                                                                                           scratch.face_finite_element_values->get_update_flags())
                                                NULL),

The same change has to be made for the other two member variables, obviously.

Do you want to try this and whether it fixes the problem? If you are not confident in making the change yourself, then I can take care of it. But it would be nice to have you among our list of developers!

@YiminJin
Copy link
Contributor Author

YiminJin commented May 21, 2018

@bangerth thanks for your detailed solution!

I'm a beginner of github and it took me some time to learn to use it. I have submitted a pull-request and fixed the problem there.

However, another problem arises when those bugs are fixed. When running ./aspect -j $PATH_TO_TESTS/discontinuous_composition_1.prm on my computer, it prints:


-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
-- . version 2.1.0-pre (iss#2252, 5f43cb7)
-- . using deal.II 8.5.1
-- . using Trilinos 12.12.1
-- . using p4est 1.1.0
-- . running in DEBUG mode
-- . running with 1 MPI process
-- . using 8 threads
-- How to cite ASPECT: https://aspect.geodynamics.org/cite.html

... (timesteps)

Termination requested by criterion: end time

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start | 1.26s | |
| | | |
| Section | no. calls | wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system | 18 | 0.0698s | 5.5% |
| Assemble composition system | 36 | 0.399s | 32% |
| Assemble temperature system | 18 | 0.144s | 11% |
| Build Stokes preconditioner | 18 | 0.0992s | 7.9% |
| Build composition preconditioner| 36 | 0.0498s | 4% |
| Build temperature preconditioner| 18 | 0.00621s | 0.49% |
| Solve Stokes system | 18 | 0.0524s | 4.2% |
| Solve composition system | 36 | 0.0209s | 1.7% |
| Solve temperature system | 18 | 0.00952s | 0.76% |
| Initialization | 1 | 0.035s | 2.8% |
| Postprocessing | 17 | 0.0254s | 2% |
| Refine mesh structure, part 1 | 9 | 0.0585s | 4.6% |
| Refine mesh structure, part 2 | 9 | 0.0064s | 0.51% |
| Setup dof systems | 10 | 0.0382s | 3% |
| Setup initial conditions | 2 | 0.00735s | 0.58% |
+---------------------------------+-----------+------------+------------+

WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
Option left: name:-j value: ../tests/discontinuous_composition_1.prm

I don't know what the warnings mean. I'm sure that multi-thread works because the program runs faster in system assembling than running without -j. Maybe I didn't write the command line in the right form?

gassmoeller added a commit that referenced this issue May 21, 2018
@gassmoeller
Copy link
Member

You did everything right, I think the problem is that PETSc reads parameters from your ASPECT command line and tries to interpret them (even if you do not use it). Could you check if the warning disappears if you apply the attached patch to main.cc? It essentially just filters out all of the ASPECT command line options before initializing PETSc.
patch.txt

@jdannberg jdannberg added this to Bugfixes in Issue tracker May 21, 2018
@YiminJin
Copy link
Contributor Author

@gassmoeller thank you! The patch works and the warnings disappear now.

@gassmoeller
Copy link
Member

Great, then I will open a pull request with that patch later.

@gassmoeller
Copy link
Member

Can this issue be closed?

Issue tracker automation moved this from Bugfixes to Done May 23, 2018
tjhei pushed a commit to tjhei/aspect that referenced this issue Jun 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Issue tracker
  
Done
Development

No branches or pull requests

3 participants