Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not compare against NaN time #4303

Merged
merged 1 commit into from Aug 12, 2021

Conversation

anne-glerum
Copy link
Contributor

After restart, the setup_dofs function in mesh_deformation/interface.cc is evaluated when sim.time is a NaN. On my local machine, this does not give any issues, but on our cluster, restarted models fail due to this.

I've replaced the condition with what is used several lines below to deform the initial mesh. However:

When setup_dofs is evaluated at the start of a model run,
sim.time / this->get_time() is 0
sim.timestep_number / this->get_timestep_number() is max int

When setup_dofs is evaluated after restart,
sim.time / this->get_time() is NaN
sim.timestep_number / this->get_timestep_number() is max int

So the timestep_number comparison seems superfluous, as that condition is never met.

For all pull requests:

For new features/models or changes of existing features:

  • I have tested my new feature locally to ensure it is correct.
  • I have created a testcase for the new feature/benchmark in the tests/ directory.
  • I have added a changelog entry in the doc/modules/changes directory that will inform other users of my change.

Copy link
Member

@gassmoeller gassmoeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is correct, the comparison against time does not seem reasonable (we could for example also have a start time that is different from 0).

However I think the timestep number comparison is important. The two cases you mentioned are correct, but there is also the case of initial refinement steps (see core.cc:1961). After each initial refinement step setup_dofs() is called again, and at that time simulator_is_past_initialization is already set to true (core.cc:1940) and timestep_number is set to 0 (core.cc:1919) so in order to apply the initial topography after initial refinement we need this condition. Or am I misreading this? I have not actually checked in models what happens.

@gassmoeller
Copy link
Member

Oh, and concerning the checkpoints: The difference in time comes from the order in which we read things after resuming from a checkpoint. Currently, we first load the mesh and solution, then call setup_dofs, then load the serialized member variables of the Simulator class. This is a bit weird, but had not affect as long as setup_dofs does not depend on a serialized state. I don't think anything would prevent us from changing that order (e.g. load serialized state then load mesh then call setup_dofs) in checkpoint_restart.cc:resume_from_snapshot(). There is one call to signals.post_resume_load_user_data(triangulation); that may depend on the mesh and may need to be moved, but otherwise you could probably move the whole try-catch block that reads the restart.resume.z file before the part that loads the triangulation.

@gassmoeller gassmoeller merged commit 5b92350 into geodynamics:master Aug 12, 2021
@anne-glerum
Copy link
Contributor Author

This change is correct, the comparison against time does not seem reasonable (we could for example also have a start time that is different from 0).

However I think the timestep number comparison is important. The two cases you mentioned are correct, but there is also the case of initial refinement steps (see core.cc:1961). After each initial refinement step setup_dofs() is called again, and at that time simulator_is_past_initialization is already set to true (core.cc:1940) and timestep_number is set to 0 (core.cc:1919) so in order to apply the initial topography after initial refinement we need this condition. Or am I misreading this? I have not actually checked in models what happens.

Good point. I tested it without initial refinement, so it didn't show up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants