Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solution and mesh scrambled when recovering and manually toggling mesh adaptivity off #9652

Open
jessecarterMOOSE opened this issue Aug 14, 2017 · 10 comments
Labels
C: Framework P: normal A defect affecting operation with a low possibility of significantly affects. T: defect An anomaly, which is anything that deviates from expectations.

Comments

@jessecarterMOOSE
Copy link
Contributor

jessecarterMOOSE commented Aug 14, 2017

Description of the enhancement or error report

When manually turning off the mesh adaptivity with Adaptivity::setAdaptivityOn(false) inside of a user object, the solution can come out scrambled after a recover operation occurs. In addition, if the adaptivity call isn't in the right place (ie only in execute()), the mesh can become scrambled too.

Rationale for the enhancement or information for reproducing the error

Pull down my branch: https://github.com/jessecarterMOOSE/moose/tree/toggle_adaptivity and run:
./run_tests --recover toggle_mesh_adaptivity
FYI the mesh corruption isn't shown in that branch as-is, but if you take out initialSetup() in the user object the tests use, you'll get it.

Identified impact

Better adaptivity and recovery behavior

Here's an illustration:

last step before recover:
solve-before

first solve after recover:
solve-after

example of mesh corruption:

before recover (marker shown):
no-solve-before

after recover (marker shown):
no-solve-after

Tagging @permcody @roystgnr

@roystgnr
Copy link
Contributor

Is this the two circle marker test again? In dbg mode run_tests without '--re somethingorother' takes forever for me.

Which initialSetup() call needs to be disabled to replicate the failure? TestControl.C is the only one I see in test/ (and it doesn't look relevant) and there's dozens of those calls in framework/

@jessecarterMOOSE
Copy link
Contributor Author

@roystgnr No this is a new test using existing markers in the test suite. There are only three tests so it should be pretty quick. If the adaptivity system was working as I had hoped, there shouldn't be ANY adaptivity going on apart from the initial refinement. You'll see the test files in tests/userobjects/toggle_mesh_adaptivity. They use a new user object in my branch which you can see at src/userobjects/ToggleMeshAdaptivity.C. That's the one with the initialSetup() call I referred to. It's actually the same call that is in execute(), but it needed to be called earlier in the simulation setup so I put it in both places. You'll see what I mean. It's pretty simple.

@permcody permcody added C: Framework P: normal A defect affecting operation with a low possibility of significantly affects. T: defect An anomaly, which is anything that deviates from expectations. labels Aug 14, 2017
@jessecarterMOOSE
Copy link
Contributor Author

jessecarterMOOSE commented Aug 14, 2017

@roystgnr I'm not sure if you saw the discussion on the MOOSE google group (here). This seems to me to be a two-part problem, but the two parts may be related.

  1. When "manually" switching off mesh adaptivity (here from within a user object), the adaptivity gets switched back on upon recovering and the user object isn't executed early enough to turn it back off, so some "unwanted" adaptivity can occur upon recovery startup.
  2. Even if I can get in front of that extra adaptivity by putting the call early enough in the startup logic (here in initialSetup()), the solution can still get messed up.

@roystgnr
Copy link
Contributor

Okay, I can replicate this.

And in theory (2) should be a tiny fix - there's no way we should be doing any kind of adaptivity in between mesh recovery and solution recovery, so we just need to find that and turn it off. The "find that" part might be easier said than done...

@roystgnr
Copy link
Contributor

But it's strange that we see changes to the solution, on one processor, in the "early enough in the startup logic" case where there are no changes to the mesh.

Wait a minute. In both Checkpoint.C and Resurrector.C, we're testing _fe_problem.adaptivity().isOn() when deciding whether to order solution data by current numbering or by libHilbert ordering. But if that value isn't consistent between the write and the read then we're basically asking for the solution to be scrambled.

@roystgnr
Copy link
Contributor

I'm tempted to just turn off the partition_agnostic libHilbert ordering. MOOSE doesn't currently support N-to-M restarts regardless, but it does support multiple nodes at the same location. libHilbert is only useful for the former, and breaks if it encounters the latter, so it seems like it's a lose-lose option here.

@roystgnr
Copy link
Contributor

Oh, wait, I fixed libHilbert to handle multiple nodes at the same location if libMesh was configured with unique_id() support. So I guess either "libHilbert always on" or "libHilbert always off" would be an option. I'll still go with "always off" for now, since that's the more efficient case and since there's a bit of other work to be done in MOOSE before we can support N-to-M restarts, but I'll leave a comment describing what we should change if we want to add that support in the future.

roystgnr added a commit to roystgnr/moose that referenced this issue Aug 17, 2017
At the moment, MOOSE can't do N-to-M restarts with .rd files, which
means MOOSE can't do N-to-M restarts.

If we can't do N-to-M restarts, there's no point in using libHilbert,
which adds a little CPU and a little more communication cost solely to
support N-to-M restarts.

Most importantly, we don't dare use libHilbert *inconsistently* -
trying to select libHilbert ordering based on the state of the
Adaptivity object was causing solution scrambling when user objects
manually adjusted adaptivity settings mid-simulation.

This fixes the second problem in idaholab#9652 for me, but probably shouldn't
close that issue because it doesn't fix the first problem and I'm not
sure if the initialSetup() workaround in his ToggleMeshAdaptivity test
object is acceptable to @jessecarterMOOSE as a long-term solution.
roystgnr added a commit to roystgnr/moose that referenced this issue Aug 17, 2017
At the moment, MOOSE can't do N-to-M restarts with .rd files, which
means MOOSE can't do N-to-M restarts.

If we can't do N-to-M restarts, there's no point in using libHilbert,
which adds a little CPU and a little more communication cost solely to
support N-to-M restarts.

Most importantly, we don't dare use libHilbert *inconsistently* -
trying to select libHilbert ordering based on the state of the
Adaptivity object was causing solution scrambling when user objects
manually adjusted adaptivity settings mid-simulation.

This fixes the second problem in idaholab#9652 for me, but probably shouldn't
close that issue because it doesn't fix the first problem and I'm not
sure if the initialSetup() workaround in his ToggleMeshAdaptivity test
object is acceptable to @jessecarterMOOSE as a long-term solution.
roystgnr added a commit to roystgnr/moose that referenced this issue Aug 17, 2017
At the moment, MOOSE can't do N-to-M restarts with .rd files, which
means MOOSE can't do N-to-M restarts.

If we can't do N-to-M restarts, there's no point in using libHilbert,
which adds a little CPU and a little more communication cost solely to
support N-to-M restarts.

Most importantly, we don't dare use libHilbert *inconsistently* -
trying to select libHilbert ordering based on the state of the
Adaptivity object was causing solution scrambling when user objects
manually adjusted adaptivity settings mid-simulation.

This fixes the second problem in idaholab#9652 for me, but probably shouldn't
close that issue because it doesn't fix the first problem and I'm not
sure if the initialSetup() workaround in his ToggleMeshAdaptivity test
object is acceptable to @jessecarterMOOSE as a long-term solution.
@jessecarterMOOSE
Copy link
Contributor Author

@roystgnr I yield to you on the fix for 2. But for 1, if @permcody can refactor the logic for the Transient loop (like all of it), the problem might just go away.

I'm wondering if a simpler and quicker fix would be to make the _mesh_adaptivity_on member variable restartable. That way, when this line is hit in FEProblemBase::adaptMesh() when recovering, we will skip the adaptivity altogether because Adaptivity::isAdaptivityDue() will correctly return false. Also Adaptivity::isOn() will return the right thing.

However, it doesn't appear that the Adaptivity system is set up to handle recoverable data (according to its doxygen page). Can that be added?

roystgnr added a commit to roystgnr/moose that referenced this issue Aug 17, 2017
At the moment, MOOSE can't do N-to-M restarts with .rd files, which
means MOOSE can't do N-to-M restarts.

If we can't do N-to-M restarts, there's no point in using libHilbert,
which adds a little CPU and a little more communication cost solely to
support N-to-M restarts.

Most importantly, we don't dare use libHilbert *inconsistently* -
trying to select libHilbert ordering based on the state of the
Adaptivity object was causing solution scrambling when user objects
manually adjusted adaptivity settings mid-simulation.

This fixes the second problem in idaholab#9652 for me, but probably shouldn't
close that issue because it doesn't fix the first problem and I'm not
sure if the initialSetup() workaround in his ToggleMeshAdaptivity test
object is acceptable to @jessecarterMOOSE as a long-term solution.
@roystgnr
Copy link
Contributor

Ha! I was just reopening this tab to say "a simpler and quicker fix would be to make the mesh adaptivity setting restartable" in slightly different words. I'm not certain that's the best fix, but if you already came up with the idea independently then that's evidence in its favor.

@roystgnr
Copy link
Contributor

I'll defer to others on the final fix for 1, though. I'm learning more about MOOSE internals by the day, but there's still got to be someone else who can futz with Adaptivity parent classes in less of a bull-in-a-china-shop fashion.

jarons pushed a commit to jarons/moose that referenced this issue Oct 5, 2017
At the moment, MOOSE can't do N-to-M restarts with .rd files, which
means MOOSE can't do N-to-M restarts.

If we can't do N-to-M restarts, there's no point in using libHilbert,
which adds a little CPU and a little more communication cost solely to
support N-to-M restarts.

Most importantly, we don't dare use libHilbert *inconsistently* -
trying to select libHilbert ordering based on the state of the
Adaptivity object was causing solution scrambling when user objects
manually adjusted adaptivity settings mid-simulation.

This fixes the second problem in idaholab#9652 for me, but probably shouldn't
close that issue because it doesn't fix the first problem and I'm not
sure if the initialSetup() workaround in his ToggleMeshAdaptivity test
object is acceptable to @jessecarterMOOSE as a long-term solution.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: Framework P: normal A defect affecting operation with a low possibility of significantly affects. T: defect An anomaly, which is anything that deviates from expectations.
Projects
None yet
Development

No branches or pull requests

3 participants