
Residual based automatic scaling #14397

Closed
lindsayad opened this issue Nov 18, 2019 · 29 comments · Fixed by #14422
Assignees
Labels
C: Framework P: normal (A defect affecting operation with a low possibility of significant effects.) T: task (An enhancement to the software.)

Comments

@lindsayad
Member

Reason

Convergence checks, both linear and nonlinear, are based on residuals. In multi-physics simulations with bad comparative residual scaling, a linear solve may be considered converged even when it has not sufficiently solved one variable, resulting in a bad Newton step for that variable. This can eventually result in a failed nonlinear solve (in the case of ReferenceResidualProblem, for example) because the "less important" variable never converges. Alternatively, with the traditional convergence check for FEProblem, a nonlinear solve may be considered converged even when the "less important" variable hasn't moved at all.

Design

Very similar to automatic scaling based on the Jacobian/preconditioning matrix, but using the residual values. Moreover, we may want the default to be to compute the scaling on every time step, since loads may change dramatically in these multi-physics simulations.
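A minimal sketch of the idea in Python, with hypothetical names (`residual_scaling_factors`, the variable-to-DoF map, and the guard tolerance are all illustrative, not the eventual MOOSE API): compute one factor per variable from the norm of its residual sub-vector, then scale that sub-vector by it.

```python
import numpy as np

def residual_scaling_factors(residual, var_dofs, tol=1e-100):
    """One scaling factor per variable: the inverse of the norm of that
    variable's residual sub-vector (guarded against a zero residual)."""
    return {var: 1.0 / max(np.linalg.norm(residual[dofs]), tol)
            for var, dofs in var_dofs.items()}

def apply_scaling(residual, var_dofs, factors):
    """Multiply each variable's residual entries by its factor."""
    scaled = residual.copy()
    for var, dofs in var_dofs.items():
        scaled[dofs] *= factors[var]
    return scaled

# Two variables whose residual norms differ by ~6 orders of magnitude
r = np.array([1.0e3, 2.0e3, 3.0e-3, 4.0e-3])
dofs = {"temperature": [0, 1], "displacement": [2, 3]}
f = residual_scaling_factors(r, dofs)
scaled = apply_scaling(r, dofs, f)
# After scaling, each variable's residual sub-vector has unit norm
```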

Impact

Hopefully improve solve convergence for some multi-physics applications

@lindsayad lindsayad added the T: task (An enhancement to the software.) and P: normal (A defect affecting operation with a low possibility of significant effects.) labels Nov 18, 2019
@lindsayad lindsayad self-assigned this Nov 18, 2019

@fdkong
Contributor

fdkong commented Nov 18, 2019

@lindsayad Actually, we could do both if we want. Do you think that is reasonable?

@lindsayad
Member Author

What do you mean? Have the option to do both, or do both within the same simulation? I struggle to see how we could use them both in one simulation.

@lindsayad
Member Author

Would you scale the residual in a custom convergence check (both linear and nonlinear) if you wanted to do both?

@fdkong
Contributor

fdkong commented Nov 18, 2019

OK. Residual scaling helps bring the nonlinear residual to a nice scale, which helps the convergence check.

'Jacobian scaling' will improve the condition number of the matrix.

If we want to apply both, then 'Jacobian scaling' can be considered a linear preconditioner in this context.

But I am not sure whether we would get any benefit by doing this.

@tophmatthews
Contributor

I struggle to see why this matters for reference residual, since each variable is compared against itself.

@dschwen
Member

dschwen commented Nov 18, 2019

I think it might matter for Eisenstat-Walker, which doesn't know anything about the scaling. Alex mentioned that ideally both the Jacobian and the residual should be close to unity.

@lindsayad
Member Author

It could matter for any linear solve, because one could envision a situation where, even if the linear residual has been cranked down by orders of magnitude, examining the sub-vector of that linear residual corresponding to an individual variable might reveal that its norm has barely changed at all...
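That situation can be checked numerically by comparing each variable's sub-vector reduction rather than only the global norm. A hedged sketch (the variable names, DoF partition, and numbers are made up for illustration):

```python
import numpy as np

def per_variable_reduction(r0, r, var_dofs):
    """Relative residual reduction of each variable's sub-vector."""
    return {var: np.linalg.norm(r[dofs]) / np.linalg.norm(r0[dofs])
            for var, dofs in var_dofs.items()}

# Global norm drops by ~4 orders of magnitude...
r0 = np.array([1.0e4, 1.0e4, 1.0e-2, 1.0e-2])
r  = np.array([1.0e0, 1.0e0, 0.9e-2, 0.9e-2])
dofs = {"temperature": [0, 1], "displacement": [2, 3]}

global_reduction = np.linalg.norm(r) / np.linalg.norm(r0)
per_var = per_variable_reduction(r0, r, dofs)
# ...yet the displacement sub-vector has barely moved (reduction ~0.9)
```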

@lindsayad
Member Author

If we want to apply both, then 'Jacobian scaling' can be considered as a linear preconditioner in this context.

Ok, you want to do this?

@bwspenc
Contributor

bwspenc commented Nov 18, 2019

If we want to apply both, then 'Jacobian scaling' can be considered as a linear preconditioner in this context.

So would it be applied on top of whatever other preconditioner was computed? I don't think that would work because the preconditioner was already computed to be correct for the physics, and scaling it would make it wrong.

@bwspenc
Contributor

bwspenc commented Nov 18, 2019

I like this idea, but I'm a little skeptical that it will help. It seems like we just need to accept that the residuals of the individual solution variables can have widely varying scaling, and make all of the convergence checks work robustly when there is a wide disparity. Can we add a PETSc callback similar to the one we have now for checking nonlinear convergence, but for linear convergence? That wouldn't help us with Eisenstat-Walker, but I think it would completely cover us for the standard PJFNK algorithm.

@fdkong
Contributor

fdkong commented Nov 18, 2019

So would it be applied on top of whatever other preconditioner was computed? I don't think that would work because the preconditioner was already computed to be correct for the physics, and scaling it would make it wrong.

It will definitely work. The question is whether it is worthwhile to do that.

It will be applied at the linear level.

The simplest way is to use a PCCOMPOSITE https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCCOMPOSITE.html

But I would like to do a native implementation on the MOOSE side.

@bwspenc
Contributor

bwspenc commented Nov 18, 2019

Oh, I guess the other place this matters is for the PJFNK epsilon. If it really would work to apply an additional scaling as @fdkong says it would, then it seems worth doing.

@lindsayad
Member Author

lindsayad commented Nov 19, 2019 via email

@lindsayad
Member Author

It will definitely work. The question is whether it is worthwhile to do that.

Why wouldn't it be worthwhile? The current automatic scaling implementation has been nothing but beneficial in single-physics cases, so I think it would be silly to abandon it simply because we are encountering some convergence issues in this single BISON run.

@friedmud
Contributor

This is definitely NOT simply a convergence criteria issue. Improper scaling of the residual can lead to precision loss and non-convergence. This is especially true in JFNK.

@lindsayad that "vector in the matrix-vector product" is the residual...

Regardless of how epsilon is computed... having widely varying entries in the residual vector will mean there will be precision loss in the finite-difference... one way or the other.

Why can't we simply combine the two using a linear parameter? 0 for pure Jacobian scaling, 1 for pure residual scaling... and anything in-between for a mix of the two?

I'm really interested in seeing some BISON results with ranges of manual scaling applied to solid mechanics...
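The precision-loss argument above is easy to demonstrate with a toy matrix-free Jacobian action: a single differencing parameter chosen from the global norm of `u` destroys the derivative for entries far below that scale. (The norm-based choice of `eps` below is only a stand-in for whatever formula PETSc actually uses.)

```python
import numpy as np

def jfnk_action(F, u, v, eps):
    """Matrix-free Jacobian action via a forward finite difference."""
    return (F(u + eps * v) - F(u)) / eps

F = lambda u: u**2             # exact Jacobian action here is 2*u*v
u = np.array([1.0e6, 1.0e-6])  # entries 12 orders of magnitude apart
v = np.ones(2)
eps = 1.0e-8 * np.linalg.norm(u)  # a typical norm-based differencing parameter

approx = jfnk_action(F, u, v, eps)
exact = 2.0 * u * v
rel_err = np.abs(approx - exact) / np.abs(exact)
# rel_err[0] is tiny; rel_err[1] is enormous -- the small-scale entry is lost
```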

@friedmud
Contributor

Wait - @lindsayad ... @fdkong and I are looking at PETSc... something is definitely not right here. Will report back.

@friedmud
Contributor

@fdkong and I are going to talk to Barry about this. To us... the formulas for the differencing parameter look wrong. In their current form they could definitely be causing (possibly huge) precision loss.

We will report back soon...

@lindsayad
Member Author

lindsayad commented Nov 21, 2019

@lindsayad that "vector in the matrix-vector product" is the residual...

Indeed (or recursive application of the Jacobian action to it). Good point! 👍

Why can't we simply combine the two using a linear parameter? 0 for pure Jacobian scaling, 1 for pure residual scaling... and anything in-between for a mix of the two?

Sure yea that sounds good.

To us... the formulas for the differencing parameter look wrong. In their current form they could definitely be causing (possibly huge) precision loss.

They don't match the formulas on the PETSc manual pages?

@dschwen
Member

dschwen commented Nov 21, 2019

Are we sure we want linear mixing of those factors? It seems to me that we might want logarithmic mixing (i.e., a weighted average of the exponents in the scientific notation).
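For comparison, the two mixing rules can be sketched as follows (factor values are made up; `alpha = 0` is pure Jacobian scaling, `alpha = 1` pure residual scaling). The linear mix is dominated by the larger factor when the two differ by orders of magnitude, which is the motivation for the logarithmic variant:

```python
import numpy as np

def mix_linear(f_jac, f_res, alpha):
    """Linear interpolation between the two scaling factors."""
    return (1.0 - alpha) * f_jac + alpha * f_res

def mix_log(f_jac, f_res, alpha):
    """Weighted average of the exponents, i.e. a weighted geometric mean."""
    return f_jac ** (1.0 - alpha) * f_res ** alpha

f_jac, f_res = 1.0e-6, 1.0e2   # factors eight orders of magnitude apart
lin = mix_linear(f_jac, f_res, 0.5)  # ~50: swamped by the larger factor
log = mix_log(f_jac, f_res, 0.5)     # 1e-2: geometric mean of 1e-6 and 1e2
```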

@lindsayad
Member Author

lindsayad commented Nov 21, 2019 via email

@fdkong
Contributor

fdkong commented Nov 21, 2019

It will definitely work. The question is whether it is worthwhile to do that.

Why wouldn't it be worthwhile? The current automatic scaling implementation has been nothing but beneficial in single-physics cases, so I think it would be silly to abandon it simply because we are encountering some convergence issues in this single BISON run.

It is unclear how much Jacobian scaling will help after residual scaling has been done.

That is why I asked whether it was worthwhile to implement both at the same time.

Anyway, it is not hard to implement.

  1. Scale the Jacobian during the Jacobian evaluation and, at the same time, adjust the residual accordingly. The residual can be retrieved from the SNES.

  2. After the linear solve is done, scale the system back in the line-search precheck so that the nonlinear residual honors the residual scaling.
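The property that makes step 1 safe is that, in exact arithmetic, scaling the Jacobian rows and the residual by the same diagonal leaves the Newton update unchanged; only the reported residual norms change. A small numerical check with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)  # well-conditioned test Jacobian
r = rng.standard_normal(4)
D = np.diag([1.0e3, 1.0e3, 1.0e-3, 1.0e-3])        # per-variable row scaling

dx_unscaled = np.linalg.solve(J, -r)
dx_scaled = np.linalg.solve(D @ J, -(D @ r))  # scale rows and residual together
# The Newton step is the same either way; only the residual norms differ
```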

@lindsayad
Member Author

lindsayad commented Nov 21, 2019 via email

@lindsayad
Member Author

lindsayad commented Nov 21, 2019 via email

@fdkong
Contributor

fdkong commented Nov 21, 2019

But for PJFNK we have to consider what @friedmud has been harping on...the residual will factor into selection of the differencing parameter and the accuracy of the approximation or the Jacobian action.

If Pmat is at the right scale, then the residual function used for computing the Jacobian action should also be at the right scale, since Pmat is an approximation of the Jacobian matrix.

That being said: the residual callbacks from GMRES should use the scaling factor from Pmat, as those residual evaluations are used to compute derivatives.

And then the convergence checks use the residual scaling.

@lindsayad
Member Author

If Pmat is at the right scale, then the residual function used for computing the Jacobian action should also be at the right scale, since Pmat is an approximation of the Jacobian matrix.

But that BISON run perfectly illustrates that the residual and Pmat may have dramatically different scales.

@friedmud
Contributor

Ok - after studying this closer with @fdkong ... there are 3 things that need to be handled.

  1. The solution values for all of the physics need to be near 1
  2. The residual values for all of the physics need to be near 1
  3. The Jacobian should be scaled to improve the condition number (making diagonal entries near 1 is a good start).

(1) is new (to me)... but is critical. If the solution vector has wildly varying ranges in it... then there is no possibility of finding a perturbation that will be accurate for all of the entries.

This goes back to @lindsayad 's suggestion of non-dimensionalization. That's still not the answer... but I am starting to believe that we need to be a bit more careful about choosing units to make the values in the solution vector closer to "1".

Actually: all of this is making me rethink JFNK in general... we have some new ideas in AD... and maybe AD with Newton is just better because it doesn't suffer from these issues (not all of them anyway - the Jacobian scaling is still important).
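Point 3 is straightforward to illustrate: symmetric diagonal (Jacobi-style) scaling that brings the diagonal entries near 1 can cut the condition number dramatically. The matrix below is contrived for illustration:

```python
import numpy as np

# Badly scaled symmetric system: diagonal entries 12 orders of magnitude apart
J = np.array([[1.0e6, 1.0],
              [1.0,   2.0e-6]])

# Symmetric diagonal (Jacobi) scaling: D J D has a unit diagonal
D = np.diag(1.0 / np.sqrt(np.abs(np.diag(J))))
J_scaled = D @ J @ D

cond_before = np.linalg.cond(J)       # ~1e12
cond_after = np.linalg.cond(J_scaled)  # single digits
```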

@lindsayad
Member Author

lindsayad commented Nov 21, 2019

This goes back to @lindsayad 's suggestion of non-dimensionalization. That's still not the answer... but I am starting to believe that we need to be a bit more careful about choosing units to make the values in the solution vector closer to "1".

This can yield the same result in practice. I've done this in TMAP.

Actually: all of this is making me rethink JFNK in general... we have some new ideas in AD... and maybe AD with Newton is just better because it doesn't suffer from these issues (not all of them anyway - the Jacobian scaling is still important).

Yay for NEWTON! Although PJFNK will still have to be an important player in many applications. Putting AD into libMesh will help along the road to NEWTON although I worry about the computational expense. I feel like there will be decisions to make between the computational expense of AD NEWTON vs the computational expense (in terms of added non-linear iterations)/lack of robustness of a poorly scaled PJFNK vs the developer/user effort required to construct a well-scaled PJFNK.

For the current BISON problem, the relative unit scaling is not great (T ~ 1000, disp ~ 1e-2) for finding a good perturbation, but I still think it's worth some effort to see whether hybrid residual-Jacobian scaling (or pure residual scaling) can improve things. I'm working on the code right now. It shouldn't take me too long to get a PR up, maybe by the end of the day.

@novasr
Contributor

novasr commented Nov 21, 2019

@elementx54 and @acasagran, fyi

lindsayad added a commit to lindsayad/moose that referenced this issue Nov 22, 2019
lindsayad added a commit to lindsayad/moose that referenced this issue Jan 15, 2020
lindsayad added a commit to lindsayad/moose that referenced this issue Jan 17, 2020
lindsayad added a commit to lindsayad/moose that referenced this issue Jan 18, 2020

8 participants