
Use deal.II CG solver instead of trilinos #5614

Merged
merged 3 commits from use_dealii_cg into geodynamics:main on Mar 27, 2024

Conversation

gassmoeller (Member)

We use CG solvers for the inner A and S blocks of the preconditioner when doing the expensive iterations. For the GMG preconditioner we use the CG solver provided by deal.II in deal.II/lac/solver_cg.h, but for the AMG preconditioner we use the CG solver provided by Trilinos through deal.II/lac/trilinos_solver.h. While working on #5613 I tried using the deal.II CG solver for the AMG preconditioner as well. This makes no difference in the number of A or S iterations (they are literally identical) or in the solution (also identical), but the deal.II solver is consistently faster, by about 10-20% of the total Stokes solver time (including the outer GMRES iterations).
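For reference, the change essentially amounts to swapping the solver class while keeping the Trilinos matrix, vector, and AMG preconditioner types. A minimal sketch of the two interfaces (the function names are illustrative, not the actual ASPECT code):

```cpp
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_solver.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>

using namespace dealii;

// Before: the Trilinos (AztecOO) CG solver, wrapped by deal.II.
void solve_with_trilinos_cg(const TrilinosWrappers::SparseMatrix &A,
                            TrilinosWrappers::MPI::Vector &x,
                            const TrilinosWrappers::MPI::Vector &b,
                            const TrilinosWrappers::PreconditionAMG &amg,
                            SolverControl &control)
{
  TrilinosWrappers::SolverCG solver(control);
  solver.solve(A, x, b, amg);
}

// After: deal.II's own templated CG from deal.II/lac/solver_cg.h,
// instantiated on the Trilinos vector type. Matrix, vectors, and
// preconditioner stay exactly the same.
void solve_with_dealii_cg(const TrilinosWrappers::SparseMatrix &A,
                          TrilinosWrappers::MPI::Vector &x,
                          const TrilinosWrappers::MPI::Vector &b,
                          const TrilinosWrappers::PreconditionAMG &amg,
                          SolverControl &control)
{
  SolverCG<TrilinosWrappers::MPI::Vector> solver(control);
  solver.solve(A, x, b, amg);
}
```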
Here is an example for the cookbook transform_fault_behn_2007/temperature_dependent.prm run with the AMG preconditioner and only expensive Stokes iterations on Frontera:

With trilinos CG:

-----------------------------------------------------------------------------
--                             This is ASPECT                              --
-- The Advanced Solver for Planetary Evolution, Convection, and Tectonics. --
-----------------------------------------------------------------------------
--     . version 2.6.0-pre (main, b2e93b7a5)
--     . using deal.II 9.5.2
--     .       with 64 bit indices and vectorization level 3 (512 bits)
--     . using Trilinos 13.2.0
--     . using p4est 2.3.2
--     . using Geodynamic World Builder 0.5.0
--     . running in OPTIMIZED mode
--     . running with 56 MPI processes
-----------------------------------------------------------------------------

...

+----------------------------------------------+------------+------------+
| Total wallclock time elapsed since start     |      98.6s |            |
|                                              |            |            |
| Section                          | no. calls |  wall time | % of total |
+----------------------------------+-----------+------------+------------+
| Assemble Stokes system           |        10 |      19.5s |        20% |
| Assemble temperature system      |        10 |      10.3s |        10% |
| Build Stokes preconditioner      |        10 |      15.8s |        16% |
| Build temperature preconditioner |        10 |     0.969s |      0.98% |
| Initialization                   |         1 |     0.159s |      0.16% |
| Postprocessing                   |         4 |     0.264s |      0.27% |
| Setup dof systems                |         1 |     0.154s |      0.16% |
| Setup initial conditions         |         1 |     0.125s |      0.13% |
| Setup matrices                   |         1 |      1.25s |       1.3% |
| Solve Stokes system              |        10 |      48.9s |        50% |
| Solve temperature system         |        10 |     0.731s |      0.74% |
+----------------------------------+-----------+------------+------------+

With this branch:

+----------------------------------------------+------------+------------+
| Total wallclock time elapsed since start     |      91.3s |            |
|                                              |            |            |
| Section                          | no. calls |  wall time | % of total |
+----------------------------------+-----------+------------+------------+
| Assemble Stokes system           |        10 |      19.2s |        21% |
| Assemble temperature system      |        10 |      10.6s |        12% |
| Build Stokes preconditioner      |        10 |      15.8s |        17% |
| Build temperature preconditioner |        10 |     0.967s |       1.1% |
| Initialization                   |         1 |     0.154s |      0.17% |
| Postprocessing                   |         4 |     0.273s |       0.3% |
| Setup dof systems                |         1 |     0.159s |      0.17% |
| Setup initial conditions         |         1 |     0.125s |      0.14% |
| Setup matrices                   |         1 |      1.25s |       1.4% |
| Solve Stokes system              |        10 |      41.8s |        46% |
| Solve temperature system         |        10 |     0.518s |      0.57% |
+----------------------------------+-----------+------------+------------+

To me it looks like the deal.II version is clearly superior and we should use that one. Since this is the version that is also used in the GMG solver, we know it scales well to large problem sizes. Is there any other reason to use the Trilinos solver that I didn't see?

@tjhei (Member) commented Mar 24, 2024

The only thing I could think of is that Trilinos might have an implementation that does some tricks to give better scalability (avoiding communication or hiding latency).
I am surprised how much faster our implementation is!

@bangerth (Contributor)

This fails on the following tests:

The following tests FAILED:
	594 - parallel_output_group_0 (Failed)
	595 - parallel_output_group_1 (Failed)
	596 - parallel_output_group_2 (Failed)
	691 - point_value_02_mpi (Failed)
	891 - stokes_solver_fail_A (Failed)
	892 - stokes_solver_fail_S (Failed)

and on another tester with

The following tests FAILED:
	 27 - advection_solver_fail (Failed)
	157 - checkpoint_07_enable_free_surface_resume (Failed)
	891 - stokes_solver_fail_A (Failed)
	892 - stokes_solver_fail_S (Failed)

@bangerth (Contributor)

There is not very much one can optimize about CG (unlike GMRES, for example). So this looks ok to me, and I'm not overly concerned about it making a difference on large parallel jobs.
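To make that point concrete: each CG iteration consists of exactly one matrix-vector product, two inner products (the only global reductions in a parallel run), and a few vector updates, so implementations have little room to differ. A plain, unpreconditioned sketch for illustration (standalone C++, not either library's code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook CG for a dense SPD matrix A. Each iteration performs one
// mat-vec, two inner products, and a handful of vector updates --
// there is not much else an implementation could restructure.
std::vector<double> cg(const std::vector<std::vector<double>> &A,
                       const std::vector<double> &b,
                       const double tol = 1e-12)
{
  const std::size_t n = b.size();
  std::vector<double> x(n, 0.0), r = b, p = r, q(n);

  const auto dot = [n](const std::vector<double> &u,
                       const std::vector<double> &v) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
      s += u[i] * v[i];
    return s;
  };

  double rr = dot(r, r);
  while (std::sqrt(rr) > tol)
    {
      for (std::size_t i = 0; i < n; ++i) // the one mat-vec per iteration
        {
          q[i] = 0.0;
          for (std::size_t j = 0; j < n; ++j)
            q[i] += A[i][j] * p[j];
        }
      const double alpha = rr / dot(p, q); // global reduction #1
      for (std::size_t i = 0; i < n; ++i)
        {
          x[i] += alpha * p[i];
          r[i] -= alpha * q[i];
        }
      const double rr_new = dot(r, r);     // global reduction #2
      for (std::size_t i = 0; i < n; ++i)
        p[i] = r[i] + (rr_new / rr) * p[i];
      rr = rr_new;
    }
  return x;
}
```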

@gassmoeller (Member, Author)

I rebased the PR and updated the few changed test results. Most of the changed results are only marginally different.
The only significant differences are in the output of stokes_solver_fail_A and stokes_solver_fail_S, where the deal.II solver reports a different exception than the Trilinos solver. I updated that output too.
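For context, the deal.II solvers signal a failed solve by throwing SolverControl::NoConvergence, which carries the last iteration number and residual, so the message that ends up in the test output differs from the one the Trilinos wrapper produced. A minimal sketch of catching it (the helper function and the printed diagnostic are illustrative, not ASPECT's actual error handling):

```cpp
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>

#include <iostream>

using namespace dealii;

// Hypothetical helper: run a CG solve and report non-convergence.
template <typename MatrixType, typename VectorType, typename PreconditionerType>
void try_solve(const MatrixType &A,
               VectorType &x,
               const VectorType &b,
               const PreconditionerType &preconditioner)
{
  SolverControl control(/*max_steps=*/1000, /*tolerance=*/1e-8 * b.l2_norm());
  SolverCG<VectorType> solver(control);
  try
    {
      solver.solve(A, x, b, preconditioner);
    }
  catch (const SolverControl::NoConvergence &exc)
    {
      // exc.last_step and exc.last_residual describe where the iteration
      // stopped; text like this is what changes in the
      // stokes_solver_fail_A/_S reference output.
      std::cerr << "Solver did not converge after " << exc.last_step
                << " iterations (last residual " << exc.last_residual << ")."
                << std::endl;
      throw;
    }
}
```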

@MFraters (Member) left a comment

Looks good to me, and given that both @bangerth and @tjhei have approved it, I will merge it. We just need to remember to update the error message once the new deal.II release is out. It would be interesting to see whether this could eventually remove the need for Trilinos altogether.

@MFraters MFraters merged commit a557949 into geodynamics:main Mar 27, 2024
8 checks passed
@gassmoeller gassmoeller deleted the use_dealii_cg branch March 27, 2024 17:45
@gassmoeller gassmoeller restored the use_dealii_cg branch April 19, 2024 14:33