New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solution and mesh scrambled when recovering and manually toggling mesh adaptivity off #9652
Comments
Is this the two circle marker test again? In dbg mode run_tests without '--re somethingorother' takes forever for me. Which initialSetup() call needs to be disabled to replicate the failure? TestControl.C is the only one I see in test/ (and it doesn't look relevant) and there's dozens of those calls in framework/ |
@roystgnr No this is a new test using existing markers in the test suite. There are only three tests so it should be pretty quick. If the adaptivity system was working as I had hoped, there shouldn't be ANY adaptivity going on apart from the initial refinement. You'll see the test files in tests/userobjects/toggle_mesh_adaptivity. They use a new user object in my branch which you can see at src/userobjects/ToggleMeshAdaptivity.C. That's the one with the |
@roystgnr I'm not sure if you saw the discussion on the MOOSE google group (here). This seems to me to be a two-part problem, but the two parts may be related.
|
Okay, I can replicate this. And in theory (2) should be a tiny fix - there's no way we should be doing any kind of adaptivity in between mesh recovery and solution recovery, so we just need to find that and turn it off. The "find that" part might be easier said than done... |
But it's strange that we see changes to the solution, on one processor, in the "early enough in the startup logic" case where there are no changes to the mesh. Wait a minute. In both Checkpoint.C and Resurrector.C, we're testing _fe_problem.adaptivity().isOn() when deciding whether to order solution data by current numbering or by libHilbert ordering. But if that value isn't consistent between the write and the read then we're basically asking for the solution to be scrambled. |
I'm tempted to just turn off the partition_agnostic libHilbert ordering. MOOSE doesn't currently support N-to-M restarts regardless, but it does support multiple nodes at the same location. libHilbert is only useful for the former, and breaks if it encounters the latter, so it seems like it's a lose-lose option here. |
Oh, wait, I fixed libHilbert to handle multiple nodes at the same location if libMesh was configured with unique_id() support. So I guess either "libHilbert always on" or "libHilbert always off" would be an option. I'll still go with "always off" for now, since that's the more efficient case and since there's a bit of other work to be done in MOOSE before we can support N-to-M restarts, but I'll leave a comment describing what we should change if we want to add that support in the future. |
At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.
At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.
At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.
@roystgnr I yield to you on the fix for 2. But for 1, if @permcody can refactor the logic for the Transient loop (like all of it), the problem might just go away. I'm wondering if a simpler and quicker fix would be to make the _mesh_adaptivity_on member variable restartable. That way, when this line is hit in FEProblemBase::adaptMesh() when recovering, we will skip the adaptivity altogether because Adaptivity::isAdaptivityDue() will correctly return false. Also Adaptivity::isOn() will return the right thing. However, it doesn't appear that the Adaptivity system is set up to handle recoverable data (according to its doxygen page). Can that be added? |
At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.
Ha! I was just reopening this tab to say "a simpler and quicker fix would be to make the mesh adaptivity setting restartable" in slightly different words. I'm not certain that's the best fix, but if you already came up with the idea independently then that's evidence in its favor. |
I'll defer to others on the final fix for 1, though. I'm learning more about MOOSE internals by the day, but there's still got to be someone else who can futz with Adaptivity parent classes in less of a bull-in-a-china-shop fashion. |
At the moment, MOOSE can't do N-to-M restarts with .rd files, which means MOOSE can't do N-to-M restarts. If we can't do N-to-M restarts, there's no point in using libHilbert, which adds a little CPU and a little more communication cost solely to support N-to-M restarts. Most importantly, we don't dare use libHilbert *inconsistently* - trying to select libHilbert ordering based on the state of the Adaptivity object was causing solution scrambling when user objects manually adjusted adaptivity settings mid-simulation. This fixes the second problem in idaholab#9652 for me, but probably shouldn't close that issue because it doesn't fix the first problem and I'm not sure if the initialSetup() workaround in his ToggleMeshAdaptivity test object is acceptable to @jessecarterMOOSE as a long-term solution.
Description of the enhancement or error report
When manually turning off the mesh adaptivity with
Adaptivity::setAdaptivityOn(false)
inside of a user object, the solution can come out scrambled after a recover operation occurs. In addition, if the adaptivity call isn't in the right place (ie only inexecute()
), the mesh can become scrambled too.Rationale for the enhancement or information for reproducing the error
Pull down my branch: https://github.com/jessecarterMOOSE/moose/tree/toggle_adaptivity and run:
./run_tests --recover toggle_mesh_adaptivity
FYI the mesh corruption isn't shown in that branch as-is, but if you take out
initialSetup()
in the user object the tests use, you'll get it.Identified impact
Better adaptivity and recovery behavior
Here's an illustration:
last step before recover:
first solve after recover:
example of mesh corruption:
before recover (marker shown):
after recover (marker shown):
Tagging @permcody @roystgnr
The text was updated successfully, but these errors were encountered: