floating point exception (clang6) #2211

Closed
tjhei opened this issue May 4, 2018 · 12 comments

@tjhei
Member

tjhei commented May 4, 2018

I am getting

Thread 1 "aspect" received signal SIGFPE, Arithmetic exception.
0x0000000001a1f19d in aspect::Assemblers::StokesIncompressibleTerms<2>::execute (this=0x603000556c38, scratch_base=..., data_base=...)
    at ../source/simulator/assemblers/stokes.cc:205
205	          for (unsigned int i=0; i<stokes_dofs_per_cell; ++i)

when running solcx in the second nonlinear solve. I am not sure what is going on here.

@tjhei tjhei added this to the 2.0 milestone May 4, 2018
@gassmoeller
Member

I hit an assertion in sol_cx and other places with deal.II 9.0. It might be related, but it is slightly different:

--------------------------------------------------------
An error occurred in line <7264> of file </home/rengas/Software/dealii/include/deal.II/numerics/vector_tools.templates.h> in function
    void dealii::VectorTools::internal::do_integrate_difference(const dealii::hp::MappingCollection<dim, spacedim>&, const DoFHandlerType&, const InVector&, const dealii::Function<spacedim>&, OutVector&, const dealii::hp::QCollection<dim>&, const dealii::VectorTools::NormType&, const dealii::Function<spacedim>*, double) [with int dim = 2; InVector = dealii::TrilinosWrappers::MPI::BlockVector; OutVector = dealii::Vector<float>; DoFHandlerType = dealii::DoFHandler<2, 2>; int spacedim = 2]
The violated condition was: 
    exact_solution.n_components==n_components
Additional information: 
    Dimension 1 not equal to 4.

Stacktrace:
-----------
#0  /home/rengas/Software/deal.II-dev/lib/libdeal_II.g.so.9.0.0-rc0: 
#1  /home/rengas/Software/deal.II-dev/lib/libdeal_II.g.so.9.0.0-rc0: void dealii::VectorTools::integrate_difference<2, dealii::TrilinosWrappers::MPI::BlockVector, dealii::Vector<float>, 2>(dealii::Mapping<2, 2> const&, dealii::DoFHandler<2, 2> const&, dealii::TrilinosWrappers::MPI::BlockVector const&, dealii::Function<2, double> const&, dealii::Vector<float>&, dealii::Quadrature<2> const&, dealii::VectorTools::NormType const&, dealii::Function<2, double> const*, double)
#2  ./libsol_cx_2.so: aspect::InclusionBenchmark::SolCxPostprocessor<2>::execute(dealii::TableHandler&)
#3  ../aspect: aspect::Postprocess::Manager<2>::execute(dealii::TableHandler&)
#4  ../aspect: aspect::Simulator<2>::postprocess()
#5  ../aspect: aspect::Simulator<2>::run()
#6  ../aspect: void run_simulator<2>(std::string const&, bool, bool)
#7  ../aspect: main
--------------------------------------------------------

I have an idea where it is coming from and will fix it. Let's see if that solves your issue as well.

@gassmoeller
Member

The fix for my problem is in #2214. Does your error still occur after that fix?

@gassmoeller
Member

I can reproduce your exact error message on Ubuntu 14.04 (clang 6 manually installed) and 18.04 (clang 6 installed from the repository). I cannot see what is happening, though. Is there a tester for deal.II with clang 6 and this particular setup? Then we could at least narrow down whether the problem is in aspect or deal.II.

@gassmoeller
Member

This is the full callstack:

[cb45769cb27a:06347] *** Process received signal ***
[cb45769cb27a:06347] Signal: Floating point exception (8)
[cb45769cb27a:06347] Signal code: Invalid floating point operation (7)
[cb45769cb27a:06347] Failing at address: 0x135f71c
[cb45769cb27a:06347] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f0098b0a890]
[cb45769cb27a:06347] [ 1] ../aspect(_ZNK6aspect10Assemblers25StokesIncompressibleTermsILi2EE7executeERNS_8internal8Assembly7Scratch11ScratchBaseILi2EEERNS4_8CopyData12CopyDataBaseILi2EEE+0x3ec)[0x135f71c]
[cb45769cb27a:06347] [ 2] ../aspect(_ZN6aspect9SimulatorILi2EE28local_assemble_stokes_systemERKN6dealii18TriaActiveIteratorINS2_15DoFCellAccessorINS2_10DoFHandlerILi2ELi2EEELb0EEEEERNS_8internal8Assembly7Scratch12StokesSystemILi2EEERNSC_8CopyData12StokesSystemILi2EEE+0x396)[0x128c3c6]
[cb45769cb27a:06347] [ 3] ../aspect(_ZNSt5_BindIFMN6aspect9SimulatorILi2EEEFvRKN6dealii18TriaActiveIteratorINS3_15DoFCellAccessorINS3_10DoFHandlerILi2ELi2EEELb0EEEEERNS0_8internal8Assembly7Scratch12StokesSystemILi2EEERNSD_8CopyData12StokesSystemILi2EEEEPS2_St12_PlaceholderILi1EESP_ILi2EESP_ILi3EEEE6__callIvJRNS3_16FilteredIteratorIS9_EESH_SL_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0x95)[0x129e425]
[cb45769cb27a:06347] [ 4] ../aspect(_ZNSt5_BindIFMN6aspect9SimulatorILi2EEEFvRKN6dealii18TriaActiveIteratorINS3_15DoFCellAccessorINS3_10DoFHandlerILi2ELi2EEELb0EEEEERNS0_8internal8Assembly7Scratch12StokesSystemILi2EEERNSD_8CopyData12StokesSystemILi2EEEEPS2_St12_PlaceholderILi1EESP_ILi2EESP_ILi3EEEEclIJRNS3_16FilteredIteratorIS9_EESH_SL_EvEET0_DpOT_+0x51)[0x129dc21]
[cb45769cb27a:06347] [ 5] ../aspect(_ZN6dealii10WorkStream3runISt5_BindIFMN6aspect9SimulatorILi2EEEFvRKNS_18TriaActiveIteratorINS_15DoFCellAccessorINS_10DoFHandlerILi2ELi2EEELb0EEEEERNS3_8internal8Assembly7Scratch12StokesSystemILi2EEERNSF_8CopyData12StokesSystemILi2EEEEPS5_St12_PlaceholderILi1EESR_ILi2EESR_ILi3EEEES2_IFMS5_FvRKSM_ESQ_SS_EENS_16FilteredIteratorISB_EESI_SM_EEvRKT1_RKNS_8identityIS15_E4typeET_T0_RKT2_RKT3_jj+0x107)[0x128cea7]
[cb45769cb27a:06347] [ 6] ../aspect(_ZN6aspect9SimulatorILi2EE22assemble_stokes_systemEv+0x43f)[0x128cbef]
[cb45769cb27a:06347] [ 7] ../aspect(_ZN6aspect9SimulatorILi2EE25assemble_and_solve_stokesEbPd+0xaa)[0x13a80ea]
[cb45769cb27a:06347] [ 8] ../aspect(_ZN6aspect9SimulatorILi2EE34solve_no_advection_iterated_stokesEv+0x7a)[0x13a842a]
[cb45769cb27a:06347] [ 9] ../aspect(_ZN6aspect9SimulatorILi2EE14solve_timestepEv+0x14e)[0x12d7d5e]
[cb45769cb27a:06347] [10] ../aspect(_ZN6aspect9SimulatorILi2EE3runEv+0x300)[0x12d70d0]
[cb45769cb27a:06347] [11] ../aspect(_Z13run_simulatorILi2EEvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb+0xca)[0x10d364a]
[cb45769cb27a:06347] [12] ../aspect(main+0x33a)[0x10d2e3a]
[cb45769cb27a:06347] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f0098728b97]
[cb45769cb27a:06347] [14] ../aspect(_start+0x2a)[0x10627aa]
[cb45769cb27a:06347] *** End of error message ***

Does it tell us anything that the exception is raised from libpthread?

@bangerth
Contributor

bangerth commented May 7, 2018

Demangled, this looks as follows:

[cb45769cb27a:06347] *** Process received signal ***
[cb45769cb27a:06347] Signal: Floating point exception (8)
[cb45769cb27a:06347] Signal code: Invalid floating point operation (7)
[cb45769cb27a:06347] Failing at address: 0x135f71c
[cb45769cb27a:06347] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f0098b0a890]
[cb45769cb27a:06347] [ 1] ../aspect(aspect::Assemblers::StokesIncompressibleTerms<2>::execute(aspect::internal::Assembly::Scratch::ScratchBase<2>&, aspect::internal::Assembly::CopyData::CopyDataBase<2>&) const+0x3ec)[0x135f71c]
[cb45769cb27a:06347] [ 2] ../aspect(aspect::Simulator<2>::local_assemble_stokes_system(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)+0x396)[0x128c3c6]
[cb45769cb27a:06347] [ 3] ../aspect(void std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)>::__call<void, dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&, 0ul, 1ul, 2ul, 3ul>(std::tuple<dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&>&&, std::_Index_tuple<0ul, 1ul, 2ul, 3ul>)+0x95)[0x129e425]
[cb45769cb27a:06347] [ 4] ../aspect(void std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)>::operator()<dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&, void>(dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)+0x51)[0x129dc21]
[cb45769cb27a:06347] [ 5] ../aspect(void dealii::WorkStream::run<std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)>, std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>))(aspect::internal::Assembly::CopyData::StokesSystem<2> const&)>, dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >, aspect::internal::Assembly::Scratch::StokesSystem<2>, aspect::internal::Assembly::CopyData::StokesSystem<2> >(dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > > const&, dealii::identity<dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > > >::type const&, std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)>, std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>))(aspect::internal::Assembly::CopyData::StokesSystem<2> const&)>, aspect::internal::Assembly::Scratch::StokesSystem<2> const&, aspect::internal::Assembly::CopyData::StokesSystem<2> const&, unsigned int, unsigned int)+0x107)[0x128cea7]
[cb45769cb27a:06347] [ 6] ../aspect(aspect::Simulator<2>::assemble_stokes_system()+0x43f)[0x128cbef]
[cb45769cb27a:06347] [ 7] ../aspect(aspect::Simulator<2>::assemble_and_solve_stokes(bool, double*)+0xaa)[0x13a80ea]
[cb45769cb27a:06347] [ 8] ../aspect(aspect::Simulator<2>::solve_no_advection_iterated_stokes()+0x7a)[0x13a842a]
[cb45769cb27a:06347] [ 9] ../aspect(aspect::Simulator<2>::solve_timestep()+0x14e)[0x12d7d5e]
[cb45769cb27a:06347] [10] ../aspect(aspect::Simulator<2>::run()+0x300)[0x12d70d0]
[cb45769cb27a:06347] [11] ../aspect(void run_simulator<2>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool)+0xca)[0x10d364a]
[cb45769cb27a:06347] [12] ../aspect(main+0x33a)[0x10d2e3a]

I don't think the problem is inside libpthread. Rather, the pthread library has installed a signal handler that cleans up the thread before it aborts the program.
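For reference, symbol names like the ones in the raw trace can also be demangled programmatically. The following is a minimal sketch, assuming a toolchain that provides the Itanium C++ ABI header `<cxxabi.h>` (GCC and Clang do); the helper name `demangle` is made up for this illustration and is not part of aspect or deal.II.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: turn a mangled symbol name into readable C++.
// If the name cannot be demangled, the input is returned unchanged.
std::string demangle(const char *mangled)
{
  assert(mangled != nullptr);

  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with malloc.
  char *buffer = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || buffer == nullptr)
    return mangled;

  std::string result(buffer);
  std::free(buffer); // the caller owns the malloc'ed buffer
  return result;
}
```

Calling `demangle("_ZN6aspect9SimulatorILi2EE3runEv")` corresponds to frame 10 of the trace above. On the command line, piping the trace through `c++filt` does the same job.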

@bangerth
Contributor

bangerth commented May 7, 2018

In other words, I assume that the problem happens inside the execute() function. Can you narrow things down by putting printf statements in there?

@gassmoeller
Member

@bangerth: Timo posted the line that gdb shows, but that makes no sense, because there is no floating point operation there. So it must happen at some point before that line. Would printf check for FPEs?

Some more information:

  • If I disable FP_Exceptions with clang6 the tests run fine.
  • If I try to use gcc-7.3.0 on the same system I am testing at the moment (ubuntu 18.04) it does not allow me to enable FPEs.

So does our test just incorrectly assume that FPEs would work on this system? Then #2225 would be the solution. Should we just go with that and make the release? I do not see a reason why clang6 should suddenly find errors that other compilers did not find before.
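For readers unfamiliar with what "enabling FPEs" means here, the following is a rough sketch of how trapping floating point exceptions can be switched on and off on Linux. It assumes glibc, whose `feenableexcept`, `fedisableexcept`, and `fegetexcept` are non-standard extensions; the three wrapper functions are hypothetical names for this illustration and are not the check that aspect or #2225 actually uses.

```cpp
#include <cassert>
#include <cfenv>

#if defined(__GLIBC__)
#  include <fenv.h> // glibc extensions: feenableexcept, fedisableexcept, fegetexcept
#endif

// Hypothetical helper: ask the system to raise SIGFPE on division by zero
// and on invalid operations. Returns false if this cannot be requested.
bool enable_fp_traps()
{
#if defined(__GLIBC__)
  // Clear stale sticky flags first so that enabling does not trap immediately.
  std::feclearexcept(FE_ALL_EXCEPT);
  return feenableexcept(FE_DIVBYZERO | FE_INVALID) != -1;
#else
  return false; // assumption: no portable way to enable traps elsewhere
#endif
}

// Hypothetical helper: go back to non-trapping behavior.
void disable_fp_traps()
{
#if defined(__GLIBC__)
  fedisableexcept(FE_DIVBYZERO | FE_INVALID);
#endif
}

// Hypothetical helper: report whether either trap is currently enabled.
bool fp_traps_enabled()
{
#if defined(__GLIBC__)
  return (fegetexcept() & (FE_DIVBYZERO | FE_INVALID)) != 0;
#else
  return false;
#endif
}
```

With traps enabled, an operation such as 0.0/0.0 terminates the program with SIGFPE instead of silently producing NaN, which is the behavior seen in the traces above.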

@tjhei
Member Author

tjhei commented May 7, 2018

> I do not see a reason why clang6 should suddenly find errors that other compilers did not find before.

I assume clang is more aggressive in optimizing the code in debug mode. Without FP exceptions, it is of course legal to optimize something like

const double bdf2_factor = (use_bdf2_scheme)? ((2*time_step + old_time_step) /
                                                           (time_step + old_time_step)) : 1.0;

and always do the divide. I don't think we have a bug in our code.
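To make the concern concrete, here is a small sketch built around the expression quoted above. It wraps the ternary in a free function and uses the sticky status flags from `<cfenv>` (rather than trapping) to observe whether the division was evaluated even though the guard was false. The function `guarded_call_raises_fp_flag` is a hypothetical name for this illustration, and its result is an assumption-laden probe: it depends on the compiler, the optimization level, and how strictly the compiler models the floating point environment, so it may well report nothing on a given system.

```cpp
#include <cassert>
#include <cfenv>

// The guarded expression from the discussion, wrapped in a function.
double bdf2_factor(const bool   use_bdf2_scheme,
                   const double time_step,
                   const double old_time_step)
{
  return use_bdf2_scheme
           ? (2 * time_step + old_time_step) / (time_step + old_time_step)
           : 1.0;
}

// Hypothetical probe: with both time steps zero, the division is 0.0/0.0.
// If the compiler evaluates it speculatively although the guard is false,
// the FE_INVALID sticky flag may be set as a side effect.
bool guarded_call_raises_fp_flag()
{
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double result = bdf2_factor(false, 0.0, 0.0);
  (void)result;
  return std::fetestexcept(FE_INVALID | FE_DIVBYZERO) != 0;
}
```

The returned value of `bdf2_factor(false, 0.0, 0.0)` is 1.0 either way. Only the side effect on the floating point environment differs, which is why the problem is invisible unless FP exceptions are made to trap.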

> Should we just go with that and make the release?

Hardcoding a check like this for a specific compiler version is not ideal. I would prefer to extend the check. Give me a plane ride to see if I can figure this out. ;-)

@bangerth
Contributor

bangerth commented May 8, 2018

Let's let @tjhei have his plane ride :-)

@gassmoeller -- no, printf doesn't fix the issue of course. I just meant this as a way to figure out in which line the problem happens -- put some printfs throughout the function and see which ones get executed before the exception happens. printf is an expensive and non-inlined function, so the compiler will generally not move instructions across these calls. That means that if a particular printf shows its output, the offending instruction must indeed be in the lines that follow.
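A sketch of this printf bisection might look as follows. The function `assemble_local_term` is a made-up stand-in, not the actual `execute()` from stokes.cc, and the `CHECKPOINT` macro is likewise hypothetical. Output goes to stderr and is flushed, so the last message is still visible if the process dies on a signal.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical checkpoint macro: print a label and the source line, then flush.
#define CHECKPOINT(label)                                                    \
  do                                                                         \
    {                                                                        \
      std::fprintf(stderr, "checkpoint: %s (line %d)\n", label, __LINE__);  \
      std::fflush(stderr);                                                   \
    }                                                                        \
  while (false)

// Made-up stand-in for a local assembly routine.
double assemble_local_term(const bool   use_bdf2_scheme,
                           const double time_step,
                           const double old_time_step)
{
  CHECKPOINT("entered");

  const double bdf2_factor =
    use_bdf2_scheme
      ? (2 * time_step + old_time_step) / (time_step + old_time_step)
      : 1.0;

  CHECKPOINT("after bdf2_factor");

  double sum = 0.0;
  for (unsigned int i = 0; i < 4; ++i)
    sum += bdf2_factor * i;

  CHECKPOINT("after loop");
  return sum;
}
```

If the run prints "entered" but never "after bdf2_factor", the faulting instruction lies between those two calls. Moving the checkpoints closer together narrows the range further.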

@tjhei
Member Author

tjhei commented May 8, 2018

So, my guess was correct: clang is optimizing around simple bool checks and eagerly evaluates expressions that can raise floating point exceptions, like the bdf2_factor above. I can work around this by moving it into a separate function, for example.
Note that I am hitting similar problems in other functions...
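One way the "separate function" workaround could look is sketched below. This is an illustration under assumptions, not the change that was made in aspect: the names `bdf2_ratio` and `bdf2_factor_safe` are hypothetical, and the `noinline` attribute is an addition of this sketch (the comment above only mentions a separate function). The attribute is compiler-specific and makes speculation across the call less likely rather than impossible.

```cpp
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_NOINLINE __attribute__((noinline))
#else
#  define EXAMPLE_NOINLINE
#endif

// Hypothetical helper: the division lives in its own function so that it is
// only reached through an actual call.
EXAMPLE_NOINLINE double bdf2_ratio(const double time_step,
                                   const double old_time_step)
{
  return (2 * time_step + old_time_step) / (time_step + old_time_step);
}

// Hypothetical replacement for the ternary expression: the helper is called
// only when the BDF2 scheme is actually in use.
double bdf2_factor_safe(const bool   use_bdf2_scheme,
                        const double time_step,
                        const double old_time_step)
{
  if (use_bdf2_scheme)
    return bdf2_ratio(time_step, old_time_step);
  return 1.0;
}
```

The cost is exactly what is objected to later in the thread: every guarded floating point expression would need this treatment, which obscures the source for the benefit of one compiler version.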

I tried extending our FPE check to contain code similar to this, but I haven't succeeded in making it fail the check.

So what do we do? Try to disable these clang optimizations? Rewrite the functions to be safe? Blacklist all clang 6.0+ for FPEs?

@bangerth
Contributor

bangerth commented May 8, 2018

That's clearly a compiler bug then. I vote to just disable FPEs for clang 6, as already implemented in #2225. This has the advantage that (i) we don't further obfuscate our source code, and (ii) we don't penalize everyone who is using a different compiler. The number of people who would be impacted by #2225 is likely quite small, and that's useful.

@tjhei
Member Author

tjhei commented May 8, 2018

While not "fixed", let's close this with #2225 as the solution.

@tjhei tjhei closed this as completed May 8, 2018

3 participants