Skip to content

Conversation

@JHopeCollins
Copy link
Member

@JHopeCollins JHopeCollins commented Nov 4, 2024

Addresses #565

Changes that pass tests:

  1. Constant Jacobians for LinearVariationalSolvers in the diagnostics module.
  2. Constant Jacobians for the LinearVariationalSolvers in the solvers module.
  3. Simple iterative solver (ILU+CG) for the mass solves of explicit scheme time derivatives.
  4. Constant preconditioner (via snes lagging) for the mass solves of explicit schemes with linear time derivatives.

Changes that fail tests:

  1. Constant Jacobian (via snes lagging) for the mass solves of explicit schemes with linear time derivatives.
    Fails the Boussinesq tests on KGOs.
  2. Field-by-field iterative solver for the mass solves of explicit schemes with linear time derivatives. This uses CG+ILU on continuous fields (only tests for HDiv!) and preonly+ILU for the discontinuous fields where ILU is a direct solve (including theta space on extruded meshes). This minimises total work and number of inner products, but probably needs a better test for if the field is discontinuous or not (e.g. CG for the vorticity would need CG+ILU).
    Fails the linear swe wave test on KGOs.
  3. Iterative solver for the riesz map in the SIQN forcing calculation. This replaces the default LU direct solver - even ignoring the wallclock time, for larger cases LU is impractical because you run out of memory unless the DoFs/core is small.
    Fails the Boussinesq and simultaneous SIQN tests on KGOs.

All changes together fail additional tests in total (see final CI run).

@JHopeCollins JHopeCollins self-assigned this Nov 4, 2024
@JHopeCollins JHopeCollins linked an issue Nov 7, 2024 that may be closed by this pull request
@tommbendall tommbendall added the optimisation Involves optimising performance of code label Nov 14, 2024
@JHopeCollins
Copy link
Member Author

Profiles for a few timesteps of the baroclinic wave test case with C30L30 (6.5MDoFs). The lines are:

  1. "Safe" means only those updates which pass all tests.
  2. "No LU" changes the Riesz map in the SIQN forcing calculation to an iterative solver (CG+ILU). See the improvements in the forcing calculation.
  3. "Less CG" changes the iterative solvers to do a fieldsplit method, using PREONLY+ILU on the discontinuous fields (density and temperature) and CG+ILU on the continuous field (velocity). See the improvements in the transport.

barry_nc30_nl30-timestep_duration
barry_nc30_nl30-implicit_solve
barry_nc30_nl30-transport
barry_nc30_nl30-implicit_forcing
barry_nc30_nl30-explicit_forcing

@JHopeCollins
Copy link
Member Author

JHopeCollins commented Dec 17, 2024

Profiles for a few timesteps of the baroclinic wave test case with C90L30 (~59MDoFs). All runs have the direct solve on the SIQN forcing swapped for an iterative solve. This change made a big difference in the timing at C30L30, but at this scale the direct solve causes an out-of-memory error too!
The lines are:

  1. "No lag" Is the same as "No LU" in the C30L30 figures.
  2. "Lag" sets the mass solves in the RK time derivatives to lag the Jacobian - i.e. never rebuild it. See the improvement in the Transport profile.
  3. "Less CG" makes the same change as in the C30L30 profiles above. Doesn't seem to make as much difference here, I'm not 100% sure why yet. Possibly because the number of inner products are the same whether you do CG at the outer level or only on the HDiv space, and this is the bit that scales worst as the core count increases. There is some small difference but its small enough that I wouldn't trust it to be above the noise of the machine.

barry_nc90_nl30-timestep_duration
barry_nc90_nl30-implicit_solve
barry_nc90_nl30-transport
barry_nc90_nl30-implicit_forcing
barry_nc90_nl30-explicit_forcing

Copy link
Contributor

@tommbendall tommbendall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for doing these changes, the results are really impressive.

Just to record something we discussed offline: the all of the linear solvers include a reference profile in the left-hand side of their linear problem, so we discussed making an abstract update_reference_profiles method, which would allow us to call invalidate_jacobian if they are updated

@JHopeCollins
Copy link
Member Author

Final test results:

2024-12-18T18:22:45.2071835Z =========================== short test summary info ============================
2024-12-18T18:22:45.2072347Z FAILED integration-tests/equations/test_boussinesq.py::test_boussinesq[True] - AssertionError: Values for u in compressible test do not match KGO values
2024-12-18T18:22:45.2072477Z assert np.float64(2.3448621121291603e-10) < 1e-10
2024-12-18T18:22:45.2073004Z FAILED integration-tests/model/test_simultaneous_SIQN.py::test_simult_SIQN[0] - AssertionError: Values for u in the order 0 elements test do not match KGO values
2024-12-18T18:22:45.2073165Z assert np.float64(2.7295802911651962e-09) < 1e-10
2024-12-18T18:22:45.2073675Z FAILED integration-tests/equations/test_boussinesq.py::test_boussinesq[False] - AssertionError: Values for p in incompressible test do not match KGO values
2024-12-18T18:22:45.2073805Z assert np.float64(3.860060353323629e-10) < 1e-10
2024-12-18T18:22:45.2074307Z FAILED integration-tests/model/test_simultaneous_SIQN.py::test_simult_SIQN[1] - AssertionError: Values for u in the order 1 elements test do not match KGO values
2024-12-18T18:22:45.2074435Z assert np.float64(4.074234406642719e-09) < 1e-10
2024-12-18T18:22:45.2074638Z ==== 4 failed, 461 passed, 5 xfailed, 21847 warnings in 1550.82s (0:25:50) =====

Copy link
Contributor

@jshipton jshipton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fabulous - thanks @JHopeCollins !!

Copy link
Contributor

@tommbendall tommbendall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is brilliant, thanks so much!

@tommbendall tommbendall merged commit a1d5959 into main Dec 18, 2024
4 checks passed
@tommbendall tommbendall deleted the JHopeCollins/lvs_constant_jacobian branch December 18, 2024 19:13
@tommbendall tommbendall restored the JHopeCollins/lvs_constant_jacobian branch December 18, 2024 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimisation Involves optimising performance of code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Only refactor matrices when necessary

4 participants