Particles with free surface hang #5594

Closed
ryanstoner1 opened this issue Feb 29, 2024 · 9 comments

Comments

@ryanstoner1
Contributor

When running models with particles and a free surface, both @vaturino and I used to get the following error message:

An error occurred in line <62> of file </work2/09184/rstoner1/stampede3/software/deal.II-v9.5.2/tmp/unpack/deal.II-v9.5.2/source/particles/property_pool.cc> in function
    void dealii::Particles::PropertyPool<2, 2>::clear() [dim = 2, spacedim = 2]
The violated condition was:
    n_open_handles == 0
Additional information:
    This property pool currently still holds 18 open handles to memory
    that was allocated via allocate_properties_array() but that has not
    been returned via deregister_particle().

This behavior changed after #5446 (which is not in 2.5). Since then, ASPECT hangs without an error message, at the second time step at the earliest.

For example, it hangs after printing:

*** Timestep 1426:  t=6.88599e+06 years, dt=2456.18 years

For the previous error message I was able to run a debugger and trace the crash to
https://github.com/geodynamics/aspect/blob/162b878a925d86cd54ac0dab1fa7eba04091bcd3/source/simulator/core.cc#L610C8-L610C42

ASPECT tends to hang (or, previously, crash) after adaptive mesh refinement (AMR). If AMR is turned off completely, neither the hang nor the crash occurs.

The issue still appears even if I increase the maximum number of particles per cell so that no particles are removed. Without a free surface the behavior is normal and no hang occurs.

@gassmoeller
Member

Could you post the full log.txt of an affected model and a minimal parameter file that reproduces the problem? This behavior should not happen and may be related to the internal particle handling in deal.II. Also, in #5446 Wolfgang refers to some problem you reported, but I cannot see where that was reported. Is there a forum thread or issue about it? That would help me get the context for the problem and what was already tried to solve it.

@vaturino

vaturino commented Mar 1, 2024

Hi Rene, thank you for your help!
After re-compiling ASPECT with Wolfgang's fix (#5446), I still get the first error message that @ryanstoner1 reported. Attached is a zip file with the parameter file (and what's needed to run it) and the log file ASPECT prints out.
kinematic_particles_free_surface.tar.gz

@ryanstoner1
Contributor Author

The problem Wolfgang referred to was brought up during one of the weekly user meetings. It was the same error quoted at the top of this issue (the open handles that weren't returned).

@gassmoeller
Member

Thanks for the files @vaturino and for the report @ryanstoner1. I think what you two are seeing is indeed the result of one (or two) bugs in deal.II.

  1. The Assert message should be fixed by the PR I just opened, dealii/dealii#16709 ("Properly deregister a particle if deleted during refinement"). It is an oversight in deal.II's particle memory management: some particles are deleted, but their memory is not properly freed. It should have no consequence for your model results, and the error should disappear if you run your model in optimized mode instead of debug mode. (A side note: I hope you usually run your large application models in optimized mode. Debug mode is mostly for setting up your models and development, see https://aspect-documentation.readthedocs.io/en/latest/user/run-aspect/debug-mode.html).

  2. I cannot immediately see how the error in 1. would lead to hanging models. However, a deal.II developer recently found a bug in the particle system that could indeed lead to hanging models (dealii/dealii#16691, "ParticleHandler: use tolerance for is_inside_unit_cell() check"). A fix was merged last week, so the latest deal.II development version should no longer contain that bug either.
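
For readers unfamiliar with the assertion in item 1, here is a minimal Python sketch of the handle accounting it checks. This is a toy model, not deal.II's implementation: the class `ToyPropertyPool`, the helper `drop_particles`, and the `return_handles` and `debug` flags are all made up for illustration. Only the idea comes from the thread: every particle's memory has to be handed back before the pool is cleared, and the check only fires in debug mode.

```python
class ToyPropertyPool:
    """Toy stand-in for a particle property pool (hypothetical, not deal.II)."""

    def __init__(self, debug=True):
        self.debug = debug            # mimics debug vs. optimized builds
        self._next_handle = 0
        self._open_handles = set()

    def register_particle(self):
        """Hand out a new handle and remember that it is open."""
        handle = self._next_handle
        self._next_handle += 1
        self._open_handles.add(handle)
        return handle

    def deregister_particle(self, handle):
        """Return a handle to the pool."""
        self._open_handles.remove(handle)

    @property
    def n_open_handles(self):
        return len(self._open_handles)

    def clear(self):
        """Reset the pool; in debug mode, complain about unreturned handles."""
        if self.debug and self.n_open_handles != 0:
            raise RuntimeError(
                f"pool still holds {self.n_open_handles} open handles"
            )
        self._open_handles.clear()


def drop_particles(pool, handles, n_drop, return_handles):
    """Delete the first n_drop particles, optionally returning their handles.

    return_handles=False imitates the oversight described in item 1:
    the particles are gone, but the pool still counts their memory as in use.
    """
    dropped, kept = handles[:n_drop], handles[n_drop:]
    if return_handles:
        for handle in dropped:
            pool.deregister_particle(handle)
    return kept
```

If 20 particles are registered and 18 are dropped with `return_handles=False`, `clear()` raises in the toy's debug mode and passes silently with `debug=False`. That mirrors the description above: the message disappears in optimized mode because the check is skipped, not because the handles were returned.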

Could one of you test your model with the deal.II branch of my PR dealii/dealii#16709, which contains both fixes, and let me know if that solves your crashes? You will also have to update your ASPECT to a recent development version, though (and make sure SUNDIALS is enabled in deal.II), as that is required for the latest ASPECT version to work.
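
The title of the second fix mentions a tolerance for the `is_inside_unit_cell()` check. The sketch below shows in plain Python what such a tolerance changes. It is not the deal.II code: the function is a hypothetical re-implementation, and the tolerance value `1e-12` is an assumption chosen for illustration. The thread does not say which value the fix uses or exactly how the strict check leads to a hang.

```python
import sys


def is_inside_unit_cell(point, eps=0.0):
    """Return True if every reference coordinate lies in [-eps, 1 + eps].

    Hypothetical sketch; eps=0.0 gives the strict check.
    """
    return all(-eps <= x <= 1.0 + eps for x in point)


# A reference coordinate one floating-point step above 1.0, as round-off
# in a coordinate transformation might produce for a particle sitting on
# a cell face.
just_outside = (1.0 + sys.float_info.epsilon, 0.5)

strict = is_inside_unit_cell(just_outside)               # rejected
tolerant = is_inside_unit_cell(just_outside, eps=1e-12)  # accepted
```

With the strict check the point is rejected even though it is on the cell face to within round-off. With the tolerance it is accepted, while a point that is clearly outside, such as `(1.5, 0.5)`, is still rejected.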

@vaturino

vaturino commented Mar 4, 2024

Hi @gassmoeller, thanks for your instructions. I usually run models in optimized mode, yes, but due to a couple of previous issues, that particular model was run in debug mode. I tried running it again in optimized mode (both with ASPECT 2.5 and 2.6-pre), as suggested, but I get the same error message. I have re-installed ASPECT on Stampede3 from scratch, so I also re-installed candi and deal.II.
I am now trying to follow the instructions in your second point, with ASPECT 2.6-pre, to see if that solves it. I'll write back as soon as I know if it worked. Thanks for your help.

@ryanstoner1
Contributor Author

  1. @vaturino and I used your branch to run models with particles, and it solves our issues, with the caveat that we cannot use the master branch on Stampede because the latest deal.II isn't installing. Instead we reverted to an August 2023 version and cherry-picked your commit from dealii/dealii#16709.

We've narrowed the break (possibly Intel-compiler related) down to some time between Sep. 10 and 12, 2023, but I feel that discussion would be best continued in #5569.

  2. I think the hanging models were potentially related to Stampede3 issues. Either way, the behavior is correct now after dealii/dealii#16709.

A separate issue is that the fix for 1. works with ASPECT 2.5, but ASPECT 2.6-pre doesn't compile. That is likely also best addressed in #5569, and I am currently working on it.

To summarize, the pull request solves our issues, but I'll continue working to find why more recent versions of deal.II don't compile on Stampede3.

@gassmoeller
Member

Sounds good, thanks for the feedback. I will continue working on the deal.II PR to get it merged.

If you feel you found something during the installation that seems to be an issue in deal.II, feel free to open an issue in the deal.II repository; there are plenty of developers there to help.

@gassmoeller
Member

It seems that, except for the compile problems discussed in #5569, this issue can be closed?

@ryanstoner1
Contributor Author

Yes, closing it. The compile problems are separate.
