use single precision for gmg #3532

tcclevenger · 2020-06-15T22:37:28Z

Based on the discussion in section 4.2 of https://dl.acm.org/doi/pdf/10.1145/3322813.

This pull request changes the GMG solver to be set up and run in single precision. It will not change the accuracy of the solution (since all active level computations are done in double precision), but it improves runtime.

Test with Nsinker, 7 global refinements (6.8M DoFs), run with 8 cores:
Single precision GMG: 28 iterations, 10.5s Stokes solve time
Double precision GMG: 28 iterations, 14.9s Stokes solve time
This gives a roughly 30% decrease in Stokes solve time, which is similar to the results in the paper.
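
As a quick check of that figure, using only the two Stokes solve times listed for this test:

$$
1 - \frac{10.5\,\mathrm{s}}{14.9\,\mathrm{s}} \approx 0.295
$$

so the single precision run spends about 29.5% less time in the Stokes solve.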

The changes mostly consist of switching a few double declarations to float. I define a type gmg_number as float at the top of stokes_matrix_free.h. It can be changed to double to go back to double precision GMG. I'll comment on the only other significant change.
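
The pattern described here can be sketched as follows. This is a minimal illustration, not the code in stokes_matrix_free.h: the alias name follows the GMGNumberType spelling used later in this thread, while `ViscosityTable` and `memory_consumption()` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single switch for the multigrid number type. Changing float to double
// here would restore double precision GMG. (Alias name taken from the
// thread; its exact location in the header is not shown in this sketch.)
using GMGNumberType = float;

// Hypothetical stand-in for a per-level coefficient table that is stored
// in the multigrid number type.
template <typename Number>
struct ViscosityTable
{
  std::vector<Number> values;

  explicit ViscosityTable(const std::size_t n_entries)
    : values(n_entries, Number(1))
  {}

  // Bytes used by the stored values only.
  std::size_t memory_consumption() const
  {
    return values.size() * sizeof(Number);
  }
};
```

On platforms where float is 4 bytes and double is 8 bytes, `ViscosityTable<GMGNumberType>(n)` needs half the storage of `ViscosityTable<double>(n)`, which is one reason memory statistics in test output can change along with the number type.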

source/simulator/stokes_matrix_free.cc

tjhei

great, this looks very good. I left a few small comments.

include/aspect/stokes_matrix_free.h

source/simulator/stokes_matrix_free.cc

gassmoeller

Very cool! Ready except for Timo's comments.

gassmoeller · 2020-06-16T19:03:23Z

/rebuild

gassmoeller · 2020-06-17T23:58:42Z

Merging #3495 created some conflicts, could you rebase? Otherwise good to go.

tjhei · 2020-06-18T00:15:27Z

and test results are somehow broken again.

tcclevenger · 2020-06-18T00:26:36Z

> and test results are somehow broken again.

I will have to investigate. But the iteration counts seem to have risen drastically for only some of the tests.

tjhei · 2020-06-18T00:37:49Z

you will need to rebase and fix the single precision issues as well. Let me know if you need help with finding the bug.

tcclevenger · 2020-06-18T01:57:18Z

> you will need to rebase and fix the single precision issues as well. Let me know if you need help with finding the bug.

I fixed the bug. I'll focus on testing the failing benchmarks tomorrow.

tcclevenger · 2020-06-18T17:15:23Z

The issue seems to show up when using no Advection, iterated Stokes, but not on the first iteration (so far I had only tested with first timestep only, single Stokes, which gives basically identical output for float and double). I'm still investigating.

Outside of that, though, there could be an issue with the scale of values needed. I believe the float type is bounded above by 3.40282e+38. Can viscosity values exceed this number?

gassmoeller · 2020-06-18T19:02:42Z

> Outside of that, though, there could be an issue with the scale of values needed. I believe float type is bounded above by 3.40282e+38. Is it the case where values of viscosity can exceed this number?

Realistic values for viscosity lie between 1e16 (worst case; even lower in some melt models, and 1 for nondimensional models) and 1e25. 1e38 is outside of realistic ranges, but may occur in some unrealistic tests. If that turns out to be the problem, I think it is ok to throw an exception.
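
A range check of the kind suggested here could look like the sketch below. It is only an illustration under stated assumptions: `checked_viscosity_cast` is a hypothetical helper, not an ASPECT function, and the choice of `std::overflow_error` is arbitrary. It checks a single value only; products of large coefficients could overflow earlier and are not covered.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>

using GMGNumberType = float; // assumed multigrid number type

// Hypothetical helper: convert a double precision viscosity to the
// multigrid number type, throwing if the value does not fit.
// The negated comparison also rejects NaN.
inline GMGNumberType checked_viscosity_cast(const double viscosity)
{
  const double largest = std::numeric_limits<GMGNumberType>::max();
  if (!(std::abs(viscosity) <= largest))
    throw std::overflow_error(
      "viscosity exceeds the range of the multigrid number type");

  return static_cast<GMGNumberType>(viscosity);
}
```

With this helper, a value such as 1e25 converts without complaint, while 1e39 throws because it lies above the largest finite float.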

naliboff · 2020-06-18T19:05:48Z

> Realistic values for viscosity will lie between 1e16 (worst case) to even lower (in some melt models, and 1 for nondimensional models) and 1e25

It likely would not make much of a difference here, but I often have 1e26 or 1e28 Pa s as the upper viscosity limit in many models. The minimum viscosity in these models would typically be 1e18 or 1e20 Pa s.

tjhei · 2020-06-18T20:15:34Z

> I'm still investigating.

could this be a missing update_ghost_values() on the diagonal vector?

tcclevenger · 2020-06-19T17:01:28Z

I'm continuing to investigate. It looks like the initial_nonlinear_residual (first return value of solve_stokes()) is very different on the 2+ iterated Stokes solves between float and double GMG. I'm a little confused since computing this value does not involve GMG (except that the previous solution is used). Perhaps the previous solution is slightly less accurate? But this doesn't make sense since it is solved to the same tolerance no matter the GMGNumberType.

gassmoeller · 2020-07-10T00:53:31Z

Any new insights? Let us know if you want input.

tjhei · 2020-07-10T21:38:51Z

> Any new insights? Let us know if you want input.

thanks, Rene. We are working on things and we do have a lead. Conrad will get to this in the next 2 weeks, I hope.

tcclevenger · 2020-07-13T14:29:21Z

@gassmoeller sorry for the absence. I’ll be getting back to this this week. @tjhei and I talked last week and there are a few simple fixes that could work, so I should be able to wrap this up. I’ll keep you guys updated.

tcclevenger · 2020-07-25T05:20:11Z

After a closer look, I believe the new code is implemented correctly. Here is a comparison of the performance differences for a few benchmarks:

Nonlinear channel flow, 4 timesteps (0-4), 214K DoFs, 1 processor:
Single precision GMG: 63 total Stokes solves, 75.5s solve time
Double precision GMG: 65 total Stokes solves, 79.9s solve time

SolCx, no advection iterated Stokes, 3K DoFs, 1 processor:
Single precision GMG: First Stokes: 12 iterations, 2nd Stokes: 3 iterations
Double precision GMG: First Stokes: 12 iterations, 2nd Stokes: 0 iterations

Nsinker, no advection iterated Stokes, 150K DoFs, 1 processor:
Single precision GMG: First Stokes: 23 iterations, 2nd Stokes: 1 iteration
Double precision GMG: First Stokes: 23 iterations, 2nd Stokes: 1 iteration

Nsinker, first timestep only, single Stokes, 6.8M DoFs, 8 processors:
Single precision GMG: 28 iterations, 10.5s solve time
Double precision GMG: 28 iterations, 14.9s solve time

There were 2 types of failing tests before:

1. Tests which were failing due to slight differences in solver performance (the preconditioner is slightly less effective, so we expect higher iteration counts; the nonlinear channel flow test is in this category) or different memory statistics, since the viscosity tables in single precision require less memory (matrix_nonzeros_7).
2. Tests which were failing due to major differences in solver performance, specifically when using "no advection, iterated Stokes". All these tests were on very small problems (<1000 unknowns), so we are not worried about the differences. I tested the same benchmarks with a larger number of DoFs and we get the expected output. For all of these (except prescribed_dilation_gmg, since its .prm is based on a different test) I decided to change to "first timestep only, single Stokes" for simplicity.

Other tests will fail now with the changes and so I will be sure to correct these when the tester is finished.

tjhei · 2020-07-25T14:48:31Z

Thanks, Conrad. I will take a look at the tests in more detail. Would you mind squashing your commits down and rebasing to the current master? That would make it a lot easier to inspect the changes in the test results.

tjhei

I went over the code again and things look good. I just pushed a commit with updated test results. While some problems have a different number of linear/nonlinear iterations, I think we are good here.

tjhei · 2020-07-28T22:58:33Z

@gassmoeller can you please take another look?

gassmoeller · 2020-07-29T19:17:09Z

I am ok with the slight increase in linear iterations (expected due to lower precision of preconditioner), but can you explain your reasoning here again:

> Tests which were failing due to major differences in solver performance, specifically when using "no advection, iterated Stokes". All these tests were on very small problems (<1000 unknowns) and so we are not worried about the differences. I tested the same benchmarks with larger number of DoFs and we get expected output. For all of these (except prescribed_dilation_gmg since its .prm is based on a different test) I decided to change to "first timestep only, single Stokes" for simplicity.

I ran the sol_cx_2_gmg test with double and single precision with the iterated Stokes scheme and, like you, found significant differences in nonlinear solver behavior. The old version converges to 1e-8 in the second iteration (expected, because this is actually a linear problem: viscosity and density are independent of the solution), while in single precision I get this:

   Solving Stokes system... 16+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 1: 1

   Solving Stokes system... 14+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 2: 4.08645

   Solving Stokes system... 14+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 3: 2.29006

   Solving Stokes system... 16+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 4: 1.01736

   Solving Stokes system... 14+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 5: 0.644649

   Solving Stokes system... 14+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 6: 0.132881

   Solving Stokes system... 12+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 7: 0.0387798

   Solving Stokes system... 9+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 8: 0.00600291

   Solving Stokes system... 8+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 9: 0.00220304

   Solving Stokes system... 8+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 10: 0.000221702

Also when I decrease the linear solver tolerance (from default 1e-7 to 1e-9 or 1e-11) the nonlinear solver behavior becomes worse (higher residual after 10 nonlinear iterations).

This suggests the solution in the first linear solve is actually worse than before (i.e. either the tolerance computation is different, or the solver stops although the tolerance is not reached, maybe because some residual computation is done in single precision?), and additionally it does not converge in the expected number of nonlinear iterations. Your changed test does not show this anymore because it uses a single Stokes solve, but that does not solve or explain the problem. Why do you think it is ok, because this is a small test?

tjhei · 2020-07-29T20:46:32Z

This is a good point, @gassmoeller . I think we need to look at this in more detail. Conrad somehow convinced me that the change is acceptable in our last conversation.

> This suggests the solution in the first linear solve is actually worse than before (i.e. either the tolerance computation is different, or the solver stops although the tolerance is not reached, maybe because some residual computation is done in single precision?),

Tolerance and residual computations are done in double precision as before (only the preconditioner is different), so I am not sure what could be different. I will need to take a look as well, I think.
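
The expectation stated here, that a lower precision preconditioner should affect iteration counts but not the accuracy reached by a double precision outer solve, can be illustrated with a toy problem. The sketch below is not the ASPECT Stokes solver: it assumes a small tridiagonal matrix, a Jacobi preconditioner, and a preconditioned Richardson iteration, and all names (`residual`, `solve`, `SolveResult`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// r = b - A x for A = tridiag(-1, 4, -1), always evaluated in double.
std::vector<double> residual(const std::vector<double> &x,
                             const std::vector<double> &b)
{
  const std::size_t n = x.size();
  std::vector<double> r(n);
  for (std::size_t i = 0; i < n; ++i)
    {
      double Ax = 4.0 * x[i];
      if (i > 0)
        Ax -= x[i - 1];
      if (i + 1 < n)
        Ax -= x[i + 1];
      r[i] = b[i] - Ax;
    }
  return r;
}

double l2_norm(const std::vector<double> &v)
{
  double sum = 0.0;
  for (const double entry : v)
    sum += entry * entry;
  return std::sqrt(sum);
}

struct SolveResult
{
  std::vector<double> x;
  unsigned int        iterations;
  double              final_residual;
};

// Preconditioned Richardson iteration x += P^{-1} (b - A x).
// Only the Jacobi preconditioner P^{-1} = diag(A)^{-1} is applied in
// PreconditionerNumber; the residual and stopping test stay in double.
template <typename PreconditionerNumber>
SolveResult solve(const std::vector<double> &b,
                  const double               relative_tolerance,
                  const unsigned int         max_iterations)
{
  const PreconditionerNumber inverse_diagonal =
    PreconditionerNumber(1) / PreconditionerNumber(4);

  std::vector<double> x(b.size(), 0.0);
  std::vector<double> r = residual(x, b);
  const double initial_residual = l2_norm(r);

  unsigned int it = 0;
  while (l2_norm(r) > relative_tolerance * initial_residual &&
         it < max_iterations)
    {
      for (std::size_t i = 0; i < x.size(); ++i)
        {
          // The residual entry is rounded to the preconditioner's precision.
          const PreconditionerNumber correction =
            inverse_diagonal * static_cast<PreconditionerNumber>(r[i]);
          x[i] += static_cast<double>(correction);
        }
      r = residual(x, b);
      ++it;
    }
  return {x, it, l2_norm(r)};
}
```

Calling `solve<float>(b, 1e-10, 200)` and `solve<double>(b, 1e-10, 200)` on the same right-hand side should both reach the requested tolerance in this toy setting, because the stopping criterion never sees a single precision quantity. If a real solver behaves differently, as reported above, that points at something the toy model does not capture.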

bangerth · 2020-08-03T20:54:12Z

What happens if you put the new typedef to double? You should be getting the exact same results as are currently in the testsuite. If you don't, then that's a lead worth pursuing.

tjhei · 2020-08-06T11:58:38Z

> What happens if you put the new typedef to double?

After rebase to current master and setting things back to double, all tests pass without changes to test output.

bangerth · 2020-08-06T14:51:17Z

So then at least you haven't introduced a bug :-)

tjhei · 2020-08-06T15:41:16Z

I made some progress with my research, but I will have to talk to Conrad. It looks like the resulting constant pressure is vastly different between the float and double preconditioner.

gassmoeller · 2020-08-06T21:36:41Z

Are we running into the issue that the dynamic pressure is usually 3-7 orders of magnitude smaller than the static pressure and the dynamic pressure is the only part that affects the velocities? This is why many other codes solve for dynamic pressure only.
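
To make the scale question concrete, the sketch below shows how a small contribution added to a large value behaves in single versus double precision. It is only an illustration of floating point resolution, not a diagnosis of this PR: the pressure magnitudes (1e10 and 1e3, seven orders of magnitude apart) and the function name are hypothetical.

```cpp
#include <cassert>

// Store static + dynamic pressure in Number, then subtract the static
// part again. The return value is what survives of the dynamic part.
template <typename Number>
double recovered_dynamic_pressure(const double static_pressure,
                                  const double dynamic_pressure)
{
  const Number p_static = static_cast<Number>(static_pressure);
  const Number p_total =
    static_cast<Number>(static_pressure + dynamic_pressure);
  return static_cast<double>(p_total - p_static);
}
```

In IEEE 754 single precision, neighboring representable numbers near 1e10 are 1024 apart, so `recovered_dynamic_pressure<float>(1e10, 1e3)` cannot return exactly 1000, while the double version can. A dynamic part several more orders of magnitude smaller would be lost entirely in float.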

tjhei added the ready to test label Jun 16, 2020

tjhei approved these changes Jun 16, 2020

gassmoeller reviewed Jun 16, 2020

tjhei mentioned this pull request Jul 25, 2020

enable tests that require deal.II 9.2 #3614

Merged

use single precision gmg

35e3b53

tcclevenger force-pushed the gmg_single_precision branch from 7d41943 to 35e3b53 Compare July 25, 2020 15:15

update test results

29775d1

tjhei approved these changes Jul 28, 2020

tjhei mentioned this pull request Oct 7, 2020

GMG: template argument for number type #3863

Merged
