Solution and mesh scrambled when recovering and manually toggling mesh adaptivity off #9652

jessecarterMOOSE · 2017-08-14T04:22:06Z

Description of the enhancement or error report

When manually turning off the mesh adaptivity with Adaptivity::setAdaptivityOn(false) inside of a user object, the solution can come out scrambled after a recover operation occurs. In addition, if the adaptivity call isn't in the right place (ie only in execute()), the mesh can become scrambled too.

Rationale for the enhancement or information for reproducing the error

Pull down my branch: https://github.com/jessecarterMOOSE/moose/tree/toggle_adaptivity and run:
./run_tests --recover toggle_mesh_adaptivity
FYI the mesh corruption isn't shown in that branch as-is, but if you take out initialSetup() in the user object the tests use, you'll get it.

Identified impact

Better adaptivity and recovery behavior

Here's an illustration:

last step before recover:

first solve after recover:

example of mesh corruption:

before recover (marker shown):

after recover (marker shown):

Tagging @permcody @roystgnr

The text was updated successfully, but these errors were encountered:

roystgnr · 2017-08-14T15:36:04Z

Is this the two circle marker test again? In dbg mode run_tests without '--re somethingorother' takes forever for me.

Which initialSetup() call needs to be disabled to replicate the failure? TestControl.C is the only one I see in test/ (and it doesn't look relevant) and there's dozens of those calls in framework/

jessecarterMOOSE · 2017-08-14T16:10:23Z

@roystgnr No this is a new test using existing markers in the test suite. There are only three tests so it should be pretty quick. If the adaptivity system was working as I had hoped, there shouldn't be ANY adaptivity going on apart from the initial refinement. You'll see the test files in tests/userobjects/toggle_mesh_adaptivity. They use a new user object in my branch which you can see at src/userobjects/ToggleMeshAdaptivity.C. That's the one with the initialSetup() call I referred to. It's actually the same call that is in execute(), but it needed to be called earlier in the simulation setup so I put it in both places. You'll see what I mean. It's pretty simple.

jessecarterMOOSE · 2017-08-14T19:59:08Z

@roystgnr I'm not sure if you saw the discussion on the MOOSE google group (here). This seems to me to be a two-part problem, but the two parts may be related.

When "manually" switching off mesh adaptivity (here from within a user object), the adaptivity gets switched back on upon recovering and the user object isn't executed early enough to turn it back off, so some "unwanted" adaptivity can occur upon recovery startup.
Even if I can get in front of that extra adaptivity by putting the call early enough in the startup logic (here in initialSetup()), the solution can still get messed up.

roystgnr · 2017-08-17T14:19:03Z

Okay, I can replicate this.

And in theory (2) should be a tiny fix - there's no way we should be doing any kind of adaptivity in between mesh recovery and solution recovery, so we just need to find that and turn it off. The "find that" part might be easier said than done...

roystgnr · 2017-08-17T14:44:39Z

But it's strange that we see changes to the solution, on one processor, in the "early enough in the startup logic" case where there are no changes to the mesh.

Wait a minute. In both Checkpoint.C and Resurrector.C, we're testing _fe_problem.adaptivity().isOn() when deciding whether to order solution data by current numbering or by libHilbert ordering. But if that value isn't consistent between the write and the read then we're basically asking for the solution to be scrambled.

roystgnr · 2017-08-17T14:47:20Z

I'm tempted to just turn off the partition_agnostic libHilbert ordering. MOOSE doesn't currently support N-to-M restarts regardless, but it does support multiple nodes at the same location. libHilbert is only useful for the former, and breaks if it encounters the latter, so it seems like it's a lose-lose option here.

roystgnr · 2017-08-17T15:04:25Z

Oh, wait, I fixed libHilbert to handle multiple nodes at the same location if libMesh was configured with unique_id() support. So I guess either "libHilbert always on" or "libHilbert always off" would be an option. I'll still go with "always off" for now, since that's the more efficient case and since there's a bit of other work to be done in MOOSE before we can support N-to-M restarts, but I'll leave a comment describing what we should change if we want to add that support in the future.

@jessecarterMOOSE

At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.

@jessecarterMOOSE

At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.

@jessecarterMOOSE

At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.

jessecarterMOOSE · 2017-08-17T15:47:57Z

@roystgnr I yield to you on the fix for 2. But for 1, if @permcody can refactor the logic for the Transient loop (like all of it), the problem might just go away.

I'm wondering if a simpler and quicker fix would be to make the _mesh_adaptivity_on member variable restartable. That way, when this line is hit in FEProblemBase::adaptMesh() when recovering, we will skip the adaptivity altogether because Adaptivity::isAdaptivityDue() will correctly return false. Also Adaptivity::isOn() will return the right thing.

However, it doesn't appear that the Adaptivity system is set up to handle recoverable data (according to its doxygen page). Can that be added?

@jessecarterMOOSE

At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.

roystgnr · 2017-08-17T15:54:44Z

Ha! I was just reopening this tab to say "a simpler and quicker fix would be to make the mesh adaptivity setting restartable" in slightly different words. I'm not certain that's the best fix, but if you already came up with the idea independently then that's evidence in its favor.

roystgnr · 2017-08-17T16:13:08Z

I'll defer to others on the final fix for 1, though. I'm learning more about MOOSE internals by the day, but there's still got to be someone else who can futz with Adaptivity parent classes in less of a bull-in-a-china-shop fashion.

@jessecarterMOOSE

At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.

permcody added C: Framework P: normal A defect affecting operation with a low possibility of significantly affects. T: defect An anomaly, which is anything that deviates from expectations. labels Aug 14, 2017

roystgnr mentioned this issue Aug 17, 2017

Adaptivity-toggling test+fix #9694

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Solution and mesh scrambled when recovering and manually toggling mesh adaptivity off #9652

Solution and mesh scrambled when recovering and manually toggling mesh adaptivity off #9652

jessecarterMOOSE commented Aug 14, 2017 •

edited

roystgnr commented Aug 14, 2017

jessecarterMOOSE commented Aug 14, 2017

jessecarterMOOSE commented Aug 14, 2017 •

edited

roystgnr commented Aug 17, 2017

roystgnr commented Aug 17, 2017

roystgnr commented Aug 17, 2017

roystgnr commented Aug 17, 2017

jessecarterMOOSE commented Aug 17, 2017

roystgnr commented Aug 17, 2017

roystgnr commented Aug 17, 2017

Solution and mesh scrambled when recovering and manually toggling mesh adaptivity off #9652

Solution and mesh scrambled when recovering and manually toggling mesh adaptivity off #9652

Comments

jessecarterMOOSE commented Aug 14, 2017 • edited

Description of the enhancement or error report

Rationale for the enhancement or information for reproducing the error

Identified impact

roystgnr commented Aug 14, 2017

jessecarterMOOSE commented Aug 14, 2017

jessecarterMOOSE commented Aug 14, 2017 • edited

roystgnr commented Aug 17, 2017

roystgnr commented Aug 17, 2017

roystgnr commented Aug 17, 2017

roystgnr commented Aug 17, 2017

jessecarterMOOSE commented Aug 17, 2017

roystgnr commented Aug 17, 2017

roystgnr commented Aug 17, 2017

jessecarterMOOSE commented Aug 14, 2017 •

edited

jessecarterMOOSE commented Aug 14, 2017 •

edited