
Residual based automatic scaling #14397

Closed
lindsayad opened this issue Nov 18, 2019 · 29 comments · Fixed by #14422
Assignees
Labels
C: Framework P: normal (A defect affecting operation with a low possibility of significant effects.) T: task (An enhancement to the software.)

Comments

@lindsayad
Member

Reason

Convergence checks, both linear and nonlinear, are based on residuals. In multi-physics simulations with bad comparative residual scaling, a linear solve may be considered converged even when it has not sufficiently solved one variable, resulting in a bad Newton step for that variable. This can eventually result in a failed nonlinear solve (in the case of ReferenceResidualProblem, for example) because the "less important" variable never converges. Alternatively, with the traditional convergence check for FEProblem, a nonlinear solve may be considered converged even when the "less important" variable hasn't moved at all.

Design

Very similar to automatic scaling based on the Jacobian/preconditioning matrix, but using the residual values. Moreover, we may want the default to be to compute the scaling on every time step, since loads may change dramatically in these multi-physics simulations.
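A minimal sketch of the idea in Python, with hypothetical names (`residual_scaling_factors`, the variable-to-DoF map, and the guard tolerance are all illustrative, not the eventual MOOSE API): compute one factor per variable from the norm of its residual sub-vector, then scale that sub-vector by it.

```python
import numpy as np

def residual_scaling_factors(residual, var_dofs, tol=1e-100):
    """One scaling factor per variable: the inverse of the norm of that
    variable's residual sub-vector (guarded against a zero residual)."""
    return {var: 1.0 / max(np.linalg.norm(residual[dofs]), tol)
            for var, dofs in var_dofs.items()}

def apply_scaling(residual, var_dofs, factors):
    """Multiply each variable's residual entries by its factor."""
    scaled = residual.copy()
    for var, dofs in var_dofs.items():
        scaled[dofs] *= factors[var]
    return scaled

# Two variables whose residual norms differ by ~6 orders of magnitude
r = np.array([1.0e3, 2.0e3, 3.0e-3, 4.0e-3])
dofs = {"temperature": [0, 1], "displacement": [2, 3]}
f = residual_scaling_factors(r, dofs)
scaled = apply_scaling(r, dofs, f)
# After scaling, each variable's residual sub-vector has unit norm
```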

Impact

Hopefully improve solve convergence for some multi-physics applications

@lindsayad lindsayad added the T: task (An enhancement to the software.) and P: normal (A defect affecting operation with a low possibility of significant effects.) labels Nov 18, 2019
@lindsayad lindsayad self-assigned this Nov 18, 2019

@fdkong
Contributor

fdkong commented Nov 18, 2019

@lindsayad Actually, we could do both if we want. Do you think that is reasonable?

@lindsayad
Member Author

What do you mean? Have the option to do both, or do both within the same simulation? I struggle to see how we could use them both in one simulation.

@lindsayad
Member Author

Would you scale the residual in a custom convergence check (both linear and nonlinear) if you wanted to do both?

@fdkong
Contributor

fdkong commented Nov 18, 2019

OK. Residual scaling helps bring the nonlinear residual to a nice scale, which helps the convergence check.

'Jacobian scaling' will improve the condition number of the matrix.

If we want to apply both, then 'Jacobian scaling' can be considered a linear preconditioner in this context.

But I am not sure whether we would get any benefit by doing this.

@tophmatthews
Contributor

I struggle to see why this matters for reference residual, since each variable is compared against itself.

@dschwen
Member

dschwen commented Nov 18, 2019

I think it might matter for Eisenstat-Walker, which doesn't know anything about the scaling. Alex mentioned that ideally both the Jacobian and the residual should be close to unity.

@lindsayad
Member Author

It could matter for any linear solve, because one could envision a situation where, even if the linear residual has been cranked down by orders of magnitude, examining the sub-vector of that linear residual corresponding to an individual variable might reveal that its norm has barely changed at all...
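That situation can be checked numerically by comparing each variable's sub-vector reduction rather than only the global norm. A hedged sketch (the variable names, DoF partition, and numbers are made up for illustration):

```python
import numpy as np

def per_variable_reduction(r0, r, var_dofs):
    """Relative residual reduction of each variable's sub-vector."""
    return {var: np.linalg.norm(r[dofs]) / np.linalg.norm(r0[dofs])
            for var, dofs in var_dofs.items()}

# Global norm drops by ~4 orders of magnitude...
r0 = np.array([1.0e4, 1.0e4, 1.0e-2, 1.0e-2])
r  = np.array([1.0e0, 1.0e0, 0.9e-2, 0.9e-2])
dofs = {"temperature": [0, 1], "displacement": [2, 3]}

global_reduction = np.linalg.norm(r) / np.linalg.norm(r0)
per_var = per_variable_reduction(r0, r, dofs)
# ...yet the displacement sub-vector has barely moved (reduction ~0.9)
```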

@lindsayad
Member Author

If we want to apply both, then 'Jacobian scaling' can be considered as a linear preconditioner in this context.

Ok, you want to do this?

@bwspenc
Contributor

bwspenc commented Nov 18, 2019

If we want to apply both, then 'Jacobian scaling' can be considered as a linear preconditioner in this context.

So would it be applied on top of whatever other preconditioner was computed? I don't think that would work because the preconditioner was already computed to be correct for the physics, and scaling it would make it wrong.

@bwspenc
Contributor

bwspenc commented Nov 18, 2019

I like this idea, but I'm a little skeptical that it will help. It seems like we just need to accept that the residuals of the individual solution variables can have widely varying scaling, and make all of the convergence checks work robustly when there is a wide disparity. Can we add a PETSc callback similar to the one we have now for checking nonlinear convergence, but for linear convergence? That wouldn't help us with Eisenstat-Walker, but I think it would completely cover us for the standard PJFNK algorithm.

@fdkong
Contributor

fdkong commented Nov 18, 2019

So would it be applied on top of whatever other preconditioner was computed? I don't think that would work because the preconditioner was already computed to be correct for the physics, and scaling it would make it wrong.

It will definitely work. The question is whether it is worthwhile to do that.

It will be applied at the linear level.

The simplest way is to use a PCCOMPOSITE https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCCOMPOSITE.html

But I would like to do a native implementation on the MOOSE side.

@bwspenc
Contributor

bwspenc commented Nov 18, 2019

Oh, I guess the other place this matters is for the PJFNK epsilon. If it really would work to apply an additional scaling as @fdkong says it would, then it seems worth doing.

@lindsayad
Member Author

lindsayad commented Nov 19, 2019 via email

@lindsayad
Member Author

It will definitely work. The question is whether it is worthwhile to do that.

Why wouldn't it be worthwhile? The current automatic scaling implementation has been nothing but beneficial in single-physics cases, so I think it would be silly to abandon it simply because we are encountering some convergence issues in this single BISON run.

@friedmud
Contributor

This is definitely NOT simply a convergence criteria issue. Improper scaling of the residual can lead to precision loss and non-convergence. This is especially true in JFNK.

@lindsayad that "vector in the matrix-vector product" is the residual...

Regardless of how epsilon is computed... having widely varying entries in the residual vector will mean there will be precision loss in the finite-difference... one way or the other.

Why can't we simply combine the two using a linear parameter? 0 for pure Jacobian scaling, 1 for pure residual scaling... and anything in-between for a mix of the two?

I'm really interested in seeing some BISON results with ranges of manual scaling applied to solid mechanics...
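The precision-loss argument above is easy to demonstrate with a toy matrix-free Jacobian action: a single differencing parameter chosen from the global norm of `u` destroys the derivative for entries far below that scale. (The norm-based choice of `eps` below is only a stand-in for whatever formula PETSc actually uses.)

```python
import numpy as np

def jfnk_action(F, u, v, eps):
    """Matrix-free Jacobian action via a forward finite difference."""
    return (F(u + eps * v) - F(u)) / eps

F = lambda u: u**2             # exact Jacobian action here is 2*u*v
u = np.array([1.0e6, 1.0e-6])  # entries 12 orders of magnitude apart
v = np.ones(2)
eps = 1.0e-8 * np.linalg.norm(u)  # a typical norm-based differencing parameter

approx = jfnk_action(F, u, v, eps)
exact = 2.0 * u * v
rel_err = np.abs(approx - exact) / np.abs(exact)
# rel_err[0] is tiny; rel_err[1] is enormous -- the small-scale entry is lost
```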

@friedmud
Contributor

Wait - @lindsayad ... @fdkong and I are looking at PETSc... something is definitely not right here. Will report back.

@friedmud
Contributor

@fdkong and I are going to talk to Barry about this. To us... the formulas for the differencing parameter look wrong. In their current form they could definitely be causing (possibly huge) precision loss.

We will report back soon...

@lindsayad
Member Author

lindsayad commented Nov 21, 2019

@lindsayad that "vector in the matrix-vector product" is the residual...

Indeed (or recursive application of the Jacobian action to it). Good point! 👍

Why can't we simply combine the two using a linear parameter? 0 for pure Jacobian scaling, 1 for pure residual scaling... and anything in-between for a mix of the two?

Sure yea that sounds good.

To us... the formulas for the differencing parameter look wrong. In their current form they could definitely be causing (possibly huge) precision loss.

They don't match the formulas on the PETSc manual pages?

@dschwen
Member

dschwen commented Nov 21, 2019

Are we sure we want linear mixing of those factors? It seems to me that we might want logarithmic mixing (i.e., a weighted average of the exponents in the scientific notation).
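For comparison, the two mixing rules can be sketched as follows (factor values are made up; `alpha = 0` is pure Jacobian scaling, `alpha = 1` pure residual scaling). The linear mix is dominated by the larger factor when the two differ by orders of magnitude, which is the motivation for the logarithmic variant:

```python
import numpy as np

def mix_linear(f_jac, f_res, alpha):
    """Linear interpolation between the two scaling factors."""
    return (1.0 - alpha) * f_jac + alpha * f_res

def mix_log(f_jac, f_res, alpha):
    """Weighted average of the exponents, i.e. a weighted geometric mean."""
    return f_jac ** (1.0 - alpha) * f_res ** alpha

f_jac, f_res = 1.0e-6, 1.0e2   # factors eight orders of magnitude apart
lin = mix_linear(f_jac, f_res, 0.5)  # ~50: swamped by the larger factor
log = mix_log(f_jac, f_res, 0.5)     # 1e-2: geometric mean of 1e-6 and 1e2
```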

@lindsayad
Member Author

lindsayad commented Nov 21, 2019 via email

@fdkong
Contributor

fdkong commented Nov 21, 2019

It will definitely work. The question is whether it is worthwhile to do that.

Why wouldn't it be worthwhile? The current automatic scaling implementation has been nothing but beneficial in single-physics cases, so I think it would be silly to abandon it simply because we are encountering some convergence issues in this single BISON run.

It is unclear how much Jacobian scaling will help after residual scaling has been done.

That is why I asked whether it was worthwhile to implement both at the same time.

Anyway, it is not hard to implement.

  1. Scale the Jacobian during the Jacobian evaluation and, at the same time, adjust the residual accordingly. The residual can be retrieved from the SNES.

  2. After the linear solve is done, scale the system back in the line-search precheck so that the nonlinear residual honors the residual scaling.
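The property that makes step 1 safe is that, in exact arithmetic, scaling the Jacobian rows and the residual by the same diagonal leaves the Newton update unchanged; only the reported residual norms change. A small numerical check with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)  # well-conditioned test Jacobian
r = rng.standard_normal(4)
D = np.diag([1.0e3, 1.0e3, 1.0e-3, 1.0e-3])        # per-variable row scaling

dx_unscaled = np.linalg.solve(J, -r)
dx_scaled = np.linalg.solve(D @ J, -(D @ r))  # scale rows and residual together
# The Newton step is the same either way; only the residual norms differ
```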

@lindsayad
Member Author

lindsayad commented Nov 21, 2019 via email

@lindsayad
Member Author

lindsayad commented Nov 21, 2019 via email

@fdkong
Contributor

fdkong commented Nov 21, 2019

But for PJFNK we have to consider what @friedmud has been harping on...the residual will factor into selection of the differencing parameter and the accuracy of the approximation or the Jacobian action.

If Pmat is at the right scale, then the residual function used for computing the Jacobian action should also be at the right scale, since Pmat is an approximation of the Jacobian matrix.

That being said: the residual callbacks from GMRES should use the scaling factor from Pmat, as those residual evaluations are used to compute derivatives.

And then the convergence checks use the residual scaling.

@lindsayad
Member Author

If Pmat is at the right scale, then the residual function used for computing the Jacobian action should also be at the right scale, since Pmat is an approximation of the Jacobian matrix.

But that BISON run perfectly illustrates that the residual and Pmat may have dramatically different scales.

@friedmud
Contributor

Ok - after studying this closer with @fdkong ... there are 3 things that need to be handled.

  1. The solution values for all of the physics need to be near 1
  2. The residual values for all of the physics need to be near 1
  3. The Jacobian should be scaled to improve the condition number (making diagonal entries near 1 is a good start).

(1) is new (to me)... but is critical. If the solution vector has wildly varying ranges in it... then there is no possibility of finding a perturbation that will be accurate for all of the entries.

This goes back to @lindsayad 's suggestion of non-dimensionalization. That's still not the answer... but I am starting to believe that we need to be a bit more careful about choosing units to make the values in the solution vector closer to "1".

Actually: all of this is making me rethink JFNK in general... we have some new ideas in AD... and maybe AD with Newton is just better because it doesn't suffer from these issues (not all of them anyway - the Jacobian scaling is still important).
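Point 3 is straightforward to illustrate: symmetric diagonal (Jacobi-style) scaling that brings the diagonal entries near 1 can cut the condition number dramatically. The matrix below is contrived for illustration:

```python
import numpy as np

# Badly scaled symmetric system: diagonal entries 12 orders of magnitude apart
J = np.array([[1.0e6, 1.0],
              [1.0,   2.0e-6]])

# Symmetric diagonal (Jacobi) scaling: D J D has a unit diagonal
D = np.diag(1.0 / np.sqrt(np.abs(np.diag(J))))
J_scaled = D @ J @ D

cond_before = np.linalg.cond(J)       # ~1e12
cond_after = np.linalg.cond(J_scaled)  # single digits
```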

@lindsayad
Member Author

lindsayad commented Nov 21, 2019

This goes back to @lindsayad 's suggestion of non-dimensionalization. That's still not the answer... but I am starting to believe that we need to be a bit more careful about choosing units to make the values in the solution vector closer to "1".

This can yield the same result in practice. I've done this in TMAP.

Actually: all of this is making me rethink JFNK in general... we have some new ideas in AD... and maybe AD with Newton is just better because it doesn't suffer from these issues (not all of them anyway - the Jacobian scaling is still important).

Yay for NEWTON! Although PJFNK will still have to be an important player in many applications. Putting AD into libMesh will help along the road to NEWTON although I worry about the computational expense. I feel like there will be decisions to make between the computational expense of AD NEWTON vs the computational expense (in terms of added non-linear iterations)/lack of robustness of a poorly scaled PJFNK vs the developer/user effort required to construct a well-scaled PJFNK.

For the current BISON problem, the relative unit scaling is not great (T ~ 1000, disp ~ 1e-2) for finding a good perturbation, but I still think it's worth some effort to see whether hybrid residual-Jacobian scaling (or pure residual scaling) can improve things. I'm working on the code right now. It shouldn't take me too long to get a PR up, maybe by the end of the day.

@novasr
Contributor

novasr commented Nov 21, 2019

@elementx54 and @acasagran, fyi

lindsayad added a commit to lindsayad/moose that referenced this issue Nov 22, 2019
lindsayad added a commit to lindsayad/moose that referenced this issue Jan 15, 2020
lindsayad added a commit to lindsayad/moose that referenced this issue Jan 17, 2020
lindsayad added a commit to lindsayad/moose that referenced this issue Jan 18, 2020

8 participants