Add auto_advance parameter to PicardSolve #15614

Merged
merged 3 commits into from
Jul 23, 2020

lindsayad
Member

This allows for uniform time-step cutting across multi-app levels even
when not performing Picard between two levels. For example, to prevent a
sub-application from auto-advancing despite the state of its solve, a
user can specify auto_advance = false to require the master to cut its
timestep.

Closes #15166

@lindsayad lindsayad force-pushed the bug_multilevel branch 2 times, most recently from dc81ecc to d325696 Compare July 15, 2020 23:57
@moosebuild
Contributor

moosebuild commented Jul 16, 2020

Job Documentation on db89cd3 wanted to post the following:

View the site here

This comment will be updated on new commits.

@moosebuild
Contributor

Job App tests on d325696 : invalidated by @lindsayad

@moosebuild
Contributor

Job Documentation on d325696 : invalidated by @lindsayad

@YaqiWang
Contributor

I have to say this auto_auto makes my eyes bleed ;-) before reviewing this.

@lindsayad
Member Author

auto_advance ?

@YaqiWang
Contributor

YaqiWang commented Jul 20, 2020

Nah, I have not reviewed this yet. I may like it after looking into the code, or come up with other suggestions. It is just that the word auto-auto seems complicated to me.

@fdkong
Contributor

fdkong commented Jul 20, 2020

I have to say this auto_auto makes my eyes bleed ;-) before reviewing this.

auto_square might be better :-)

@lindsayad
Member Author

I do not see auto auto anywhere

@lindsayad lindsayad changed the title from "Add auto_auto advance parameter to PicardSolve" to "Add auto_advanced advance parameter to PicardSolve" Jul 20, 2020
@lindsayad lindsayad changed the title from "Add auto_advanced advance parameter to PicardSolve" to "Add auto_advance parameter to PicardSolve" Jul 20, 2020
@lindsayad
Member Author

Lol I see it was in the PR title. Changed the PR title. There is no auto auto in the code, nor auto_auto

Contributor

@fdkong fdkong left a comment

Looks good. A few minor comments

@@ -0,0 +1,9 @@
# TimePostprocessor

Contributor

Could we explain this object using one sentence?

Member Author

I don't like the redundancy that that brings. See this postprocessor's doc page, vs. the one for TimestepSize which has an explicit description in the .md file. If I didn't have the !syntax description /Postprocessors/TimePostprocessor I would absolutely agree with you.

* Whether sub-applications are automatically advanced no matter what happens during their solves
*/
bool autoAdvance() const;

Contributor

Sounds like we should use forceAutoAdvance?

Member Author

forceAutoAdvance sounds like it would be a setter-type method to me. This is just querying the state of whether we are auto-advancing.

@@ -99,6 +99,9 @@ PicardSolve::validParams()
params.addParam<bool>("update_xfem_at_timestep_begin",
false,
"Should XFEM update the mesh at the beginning of the timestep");
params.addParam<bool>("auto_advance",
"Whether to automatically advance sub-applications regardless of whether "
"their solve converges.");
Contributor

It might be a good idea to document what use case requires us to advance the state even though sub-apps fail to solve?

Member Author

I don't know what those use cases are. As you know I hate multiapps 😄

Member Author

Do you, @YaqiWang, or @vincentlaboure know of some? I agree that I should add documentation about those cases. Otherwise yea even I don't understand why it's there!

Contributor

I can't think of an example where auto_advance=true would be desired but it doesn't mean there isn't one

Contributor

This does sound counterintuitive. If the solve in the subapp is not successful, why not stop rather than advance?

Contributor

why not name the parameter as force_advance and default it to false?

Member Author

@lindsayad lindsayad Jul 21, 2020

The current logic auto advances sub-applications as long as you are not doing Picard. If I do what you are describing, then I will be changing the default behavior. Presumably we have tests and/or applications that rely on this default behavior. I don't know how this dumpster fire advanced to the point where we are at now, but I am terrified of modifying default behavior, as I assume the original code-writer had some reason for it being that way. This seems like a classic Chesterton's fence.

I am shocked any time I make changes in the PicardSolve/Transient/TransientMultiApp code system, and I don't break something. I suppose I could try changing the default and seeing whether any tests fail... How much should we bet that tests fail? 😄

Contributor

You would have been even more shocked before I created PicardSolve ;-) This is exactly why we want to have tests. Breaking tests will force us to think about the design.

Member Author

The test I added applies auto_advance = false. All other test cases in the world test the other case.

Contributor

OK, the conclusion is that we still do not know why we have that option when subapps fail

vincentlaboure previously approved these changes Jul 21, 2020
Contributor

@vincentlaboure vincentlaboure left a comment

The test works as expected, thank you @lindsayad!

Contributor

@YaqiWang YaqiWang left a comment

I know the current Transient and TransientMultiApp are messy, even after I factored the Picard logic out into the PicardSolve object. This looks like a bandage to me. I think the key is to document that extra parameter carefully and to have a test. That way, when we refactor Transient and TransientMultiApp in the future, we know what we are dealing with.

*/
virtual void finishStep() {}
virtual void finishStep(bool /*recurse_through_multiapp_levels*/ = false) {}
Contributor

This guy is doing nothing?

Contributor

oh it is the base, nvm.

Contributor

Should we always do this recursively? Of course, I do not know the implications here ;-) just throwing wrenches.

Member Author

No, we should not. We also call this from incrementStepOrReject, where we do not want to recurse through. But we also call this method when the master solve is totally finished, and it is then that we need to recurse through; otherwise we do not finish the steps of multiapp levels farther down than the first level.
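
The difference between the two call sites can be illustrated with a small stand-alone model. This is only a sketch, not MOOSE code: ToyApp, finishSubAppSteps, and makeThreeLevelChain are hypothetical names, and the real finishStep presumably does far more than flip a flag. It only shows why a non-recursive call leaves levels below the first one unfinished.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for one application level in a multi-app hierarchy.
struct ToyApp
{
  std::vector<std::unique_ptr<ToyApp>> sub_apps;
  bool step_finished = false;

  // Finish the step of every direct sub-app. Descend into deeper levels
  // only when the caller asks for it, mirroring the shape of
  // finishStep(bool recurse_through_multiapp_levels = false).
  void finishSubAppSteps(bool recurse_through_multiapp_levels = false)
  {
    for (auto & sub : sub_apps)
    {
      sub->step_finished = true;
      if (recurse_through_multiapp_levels)
        sub->finishSubAppSteps(true);
    }
  }
};

// Build a master -> sub -> sub-sub chain for experimentation.
inline std::unique_ptr<ToyApp>
makeThreeLevelChain()
{
  auto master = std::make_unique<ToyApp>();
  master->sub_apps.push_back(std::make_unique<ToyApp>());
  master->sub_apps[0]->sub_apps.push_back(std::make_unique<ToyApp>());
  return master;
}
```

Calling finishSubAppSteps(false) on the master of this chain marks only the first-level sub-app as finished; calling it with true also reaches the sub-sub-app.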

@YaqiWang
Contributor

I guess I only need you to update TransientMultiApp.md or maybe Transient.md, then I will approve ;-)

fdkong previously approved these changes Jul 22, 2020
Contributor

@fdkong fdkong left a comment

I am OK with this PR, even though I really really want an example to demonstrate that we have to keep going when subapps fail and crash.

@lindsayad lindsayad dismissed stale reviews from fdkong and vincentlaboure via face32d July 22, 2020 15:52
@lindsayad lindsayad marked this pull request as draft July 22, 2020 15:52
@lindsayad
Member Author

I just pushed up a commit to never auto-advance...let's see what the results are

@lindsayad
Member Author

I hate these systems with a fiery passion

@YaqiWang
Contributor

Which systems?

@lindsayad
Member Author

Ok I think my conclusion is this: within the current design we need to have auto_advance = true in order for TransientMultiApp to work with restart. This is because the incrementing of the sub-application state happens well after checkpoint output has occurred. Checkpoint output happens in the master application's Transient::endStep; however, the incrementing of non-auto-advanced (Picard) sub-applications occurs in the master application's call to Transient::incrementStepOrReject. So if a sub-application is not auto-advanced, then restart data will show that sub-application as being on the previous time step relative to the master application.

This conundrum could probably be fixed by having PICARD_END and TIMESTEP_END. However, that is a much bigger undertaking. For now, I think we should stick with our default of auto-advancing sub-applications when not doing Picard in order to ensure that those simulations can work with restart and recover. Then we will have this additional auto_advance parameter which the user can set if they want to.

How does that sound to people?
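
The ordering problem described above can be reproduced in a toy loop. This is a rough sketch under simplifying assumptions, not the actual Transient or TransientMultiApp logic: Checkpoint and runToy are hypothetical names, each application is reduced to a single step counter, and every solve is taken to succeed. The only thing it models is that checkpoint output happens before a non-auto-advanced sub-application is incremented.

```cpp
#include <vector>

// What a checkpoint written at the end of a master step would record.
struct Checkpoint
{
  int master_step;
  int sub_step;
};

// Run num_steps master time steps and collect the state seen by each checkpoint.
inline std::vector<Checkpoint>
runToy(int num_steps, bool auto_advance)
{
  std::vector<Checkpoint> checkpoints;
  int master_step = 0;
  int sub_step = 0;
  for (int n = 1; n <= num_steps; ++n)
  {
    // Stand-in for incrementStepOrReject: a sub-app that was not
    // auto-advanced only has its previous step accepted here.
    if (!auto_advance && sub_step < master_step)
      ++sub_step;

    // The master is now on step n.
    master_step = n;

    // Stand-in for the solve: an auto-advanced sub-app moves on right away.
    if (auto_advance)
      ++sub_step;

    // Stand-in for endStep: checkpoint output happens here.
    checkpoints.push_back({master_step, sub_step});
  }
  return checkpoints;
}
```

With auto_advance set to true, every checkpoint in this model records equal master and sub-app steps; with false, the sub-app is always one step behind, which is the mismatch that breaks restart.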

@lindsayad
Member Author

I am OK with this PR, even though I really really want an example to demonstrate that we have to keep going when subapps fail and crash.

@fdkong hopefully I answered this in the above comment. The purpose of auto-advance is not to keep going even when subapps fail; the purpose is to keep the states of sub-applications in sync with the states of master applications whenever we can.

fdkong previously approved these changes Jul 22, 2020
bool
PicardSolve::autoAdvance() const
{
bool auto_advance = !(_has_picard_its && _problem.isTransient());
Contributor

Could you add your findings right here, so we know why auto-advance is on when there is no Picard?

Member Author

Yes I'll add some good documentation to the .md files of what I've outlined in my comments here on github.

Note that I have this PR on draft, so I'm guessing the auto merge label isn't going to have an effect... Actually this could be an interesting test of CIVET 😄
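
For readers following the autoAdvance() hunk above, the decision can be sketched as a free function. The default expression is taken from the diff; everything else is an assumption: UserAutoAdvance is a hypothetical stand-in for an optional input parameter, and the rule that an explicitly set auto_advance overrides the default is inferred from the discussion rather than shown in the diff.

```cpp
// Hypothetical model of an optional boolean input parameter.
struct UserAutoAdvance
{
  bool is_set; // did the user specify auto_advance at all?
  bool value;  // the value given, meaningful only when is_set is true
};

// Sketch of the decision, not the MOOSE implementation.
inline bool
autoAdvance(bool has_picard_its, bool is_transient, UserAutoAdvance user_param)
{
  // Default from the diff: auto-advance unless Picard iterations are being
  // performed in a transient problem.
  bool auto_advance = !(has_picard_its && is_transient);

  // Assumption: an explicitly set parameter takes precedence over the default.
  if (user_param.is_set)
    auto_advance = user_param.value;

  return auto_advance;
}
```

Under these assumptions, leaving the parameter unset reproduces the pre-existing behavior, which is what keeps the default restart-friendly.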

@fdkong fdkong added the "PR: Auto Merge" label (add this label to have CIVET merge on success) Jul 22, 2020
@lindsayad
Member Author

lindsayad commented Jul 22, 2020

With this PR, there are now multiple ways to approach the possibility of a failed sub-app solve.

  1. Set auto_advance = false in the Executioner block of the master application. This will cause the master application to immediately cut its time-step when the sub-application fails. However, setting this parameter also eliminates the possibility of doing restart/recover because the master and sub are out of sync when checkpoint output occurs.
  2. Set catch_up = true in the TransientMultiApp block. This will cause the sub-application to try to catch up to the master application after a failed sub-app solve. If catch-up is unsuccessful, then we register this as a true failure of the solve, and the master dt will then get cut. This option has the advantage of keeping the master and sub transient states in sync, enabling accurate restart/recover data.

@vincentlaboure I assume that you're aware of the catch_up parameter. You seem like a multi-app expert, whereas I am a newcomer.
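
The two options in the list above can be summarized as a small decision table. This is a hedged sketch, not MOOSE code: Outcome and onSubAppSolveFailure are hypothetical names, and the precedence between the two parameters when both are set is an assumption, since the thread does not discuss that combination.

```cpp
// Possible results of a failed sub-app solve in this toy model.
enum class Outcome
{
  SubAdvancedDespiteFailure, // default auto-advance behavior
  SubCaughtUp,               // catch_up = true and the catch-up solve worked
  MasterCutsTimeStep         // the failure propagates to the master dt
};

inline Outcome
onSubAppSolveFailure(bool auto_advance, bool catch_up, bool catch_up_succeeds)
{
  // Option 1: without auto-advance, the master cuts its time step right away.
  // Assumption: this check takes precedence over catch_up.
  if (!auto_advance)
    return Outcome::MasterCutsTimeStep;

  // Option 2: the sub-app tries to catch up; only a failed catch-up counts
  // as a true failure that cuts the master dt.
  if (catch_up)
    return catch_up_succeeds ? Outcome::SubCaughtUp : Outcome::MasterCutsTimeStep;

  // Neither option chosen: the sub-app is advanced regardless of its solve.
  return Outcome::SubAdvancedDespiteFailure;
}
```

The table makes the trade-off visible: option 1 reacts immediately but leaves master and sub out of sync at checkpoint time, while option 2 only cuts the master dt after catch-up has also failed.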

@lindsayad lindsayad marked this pull request as ready for review July 23, 2020 14:40
@lindsayad
Member Author

Ok, documentation added to TransientMultiApp.md

@vincentlaboure
Contributor

With this PR, there are now multiple ways to approach the possibility of a failed sub-app solve.

  1. Set auto_advance = false in the Executioner block of the master application. This will cause the master application to immediately cut its time-step when the sub-application fails. However, setting this parameter also eliminates the possibility of doing restart/recover because the master and sub are out of sync when checkpoint output occurs.
  2. Set catch_up = true in the TransientMultiApp block. This will cause the sub-application to try to catch up to the master application after a failed sub-app solve. If catch-up is unsuccessful, then we register this as a true failure of the solve, and the master dt will then get cut. This option has the advantage of keeping the master and sub transient states in sync, enabling accurate restart/recover data.

@vincentlaboure I assume that you're aware of the catch_up parameter. You seem like a multi-app expert, whereas I am a newcomer.

I actually have never used catch-up so I'll give it a try. Thanks for the detailed explanation!

@lindsayad
Member Author

This is ready for review/merge

@fdkong fdkong merged commit 7511c47 into idaholab:next Jul 23, 2020
@fdkong
Contributor

fdkong commented Jul 23, 2020

Great, I would like to use catch up all the time.

@lindsayad lindsayad deleted the bug_multilevel branch July 23, 2020 23:57