use single precision for gmg #3532

Open
wants to merge 2 commits into
base: main

Conversation

tcclevenger
Contributor

Based on the discussion in section 4.2 of https://dl.acm.org/doi/pdf/10.1145/3322813.

This pull request changes the GMG solver to be set up and run in single precision. It does not change the accuracy of the solution (since all active level computations are done in double precision), but it improves runtime.

  • Test with Nsinker, 7 global refinements (6.8M DoFs), run with 8 cores:
    Single precision GMG: 28 iterations, 10.5s Stokes solve time
    Double precision GMG: 28 iterations, 14.9s Stokes solve time
    This gives a roughly 30% decrease in runtime ((14.9 - 10.5) / 14.9 ≈ 0.30), which is similar to the results in the paper.

The changes mostly amount to changing a few double to float. I define a type gmg_number as float at the top of stokes_matrix_free.h. This can be changed to double to go back to double precision GMG. I'll comment on the only other significant change.
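
For readers who have not seen this pattern, here is a minimal sketch of the idea, not the actual ASPECT code. A single alias selects the precision used inside the preconditioner, and double-precision data is converted when it is handed over. The alias is called GMGNumberType here (the name used later in this thread); the helpers to_gmg_precision and table_bytes are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Alias selecting the precision used inside the GMG preconditioner.
// Changing float to double gives back a double-precision GMG.
using GMGNumberType = float;

// Hypothetical helper: copy a double-precision vector into the precision
// Number, e.g. before handing a residual to the preconditioner.
template <typename Number>
std::vector<Number> to_gmg_precision(const std::vector<double> &src)
{
  std::vector<Number> dst;
  dst.reserve(src.size());
  for (const double value : src)
    dst.push_back(static_cast<Number>(value));
  return dst;
}

// Hypothetical helper: bytes needed to store n values of type Number,
// e.g. one viscosity value per quadrature point.
template <typename Number>
constexpr std::size_t table_bytes(const std::size_t n)
{
  return n * sizeof(Number);
}
```

Keeping the precision behind one alias means the double-precision behavior can be recovered by editing a single line, which is also a convenient way to check that the change itself introduced no bug.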

Member

@tjhei tjhei left a comment

great, this looks very good. I left a few small comments.

include/aspect/stokes_matrix_free.h (outdated, resolved)
source/simulator/stokes_matrix_free.cc (resolved)
Member

@gassmoeller gassmoeller left a comment

Very cool! Ready except for Timo's comments.

@gassmoeller
Member

/rebuild

@gassmoeller
Member

Merging #3495 created some conflicts, could you rebase? Otherwise good to go.

@tjhei
Member

tjhei commented Jun 18, 2020

and test results are somehow broken again.

@tcclevenger
Contributor Author

> and test results are somehow broken again.

I will have to investigate. But the iteration counts seem to have risen drastically for only some of the tests.

@tjhei
Member

tjhei commented Jun 18, 2020

you will need to rebase and fix the single precision issues as well. Let me know if you need help with finding the bug.

@tcclevenger
Contributor Author

> you will need to rebase and fix the single precision issues as well. Let me know if you need help with finding the bug.

I fixed the bug. I'll focus on testing the failing benchmarks tomorrow.

@tcclevenger
Contributor Author

The issue seems to show up when using "no Advection, iterated Stokes", but not on the first iteration (I had only tested with "first timestep only, single Stokes", which gives basically identical output for float and double). I'm still investigating.

Outside of that, though, there could be an issue with the scale of values needed. I believe the float type is bounded above by 3.40282e+38. Can viscosity values exceed this number?

@gassmoeller
Member

> Outside of that, though, there could be an issue with the scale of values needed. I believe float type is bounded above by 3.40282e+38. Is it the case where values of viscosity can exceed this number?

Realistic values for viscosity lie between 1e16 (worst case; even lower in some melt models, and 1 for nondimensional models) and 1e25. 1e38 is outside of realistic ranges, but may occur in some unrealistic tests. If that turns out to be the problem, I think it is OK to throw an exception.
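
As an illustration of the exception idea, here is a small sketch under the assumption that viscosities are computed in double and then stored as float for the preconditioner. The function name checked_viscosity_to_float is hypothetical and not part of ASPECT; only std::numeric_limits and the standard exception types are real library facilities.

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert a double-precision viscosity to float and
// throw if the value cannot be represented as a finite float.
inline float checked_viscosity_to_float(const double viscosity)
{
  // Largest finite float, about 3.40282e+38.
  const double float_max = std::numeric_limits<float>::max();

  if (!std::isfinite(viscosity) || std::abs(viscosity) > float_max)
    throw std::overflow_error("viscosity does not fit into single precision");

  return static_cast<float>(viscosity);
}
```

With the ranges mentioned in this thread (up to roughly 1e28 Pa s), the conversion succeeds; a value such as 1e39 would trigger the exception instead of silently becoming infinity.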

@naliboff
Contributor

> Realistic values for viscosity will lie between 1e16 (worst case) to even lower (in some melt models, and 1 for nondimensional models) and 1e25

It likely would not make much of a difference here, but I often have 1e26 or 1e28 Pa s as the upper viscosity limit in many models. The minimum viscosity in these models would typically be 1e18 or 1e20 Pa s.

@tjhei
Member

tjhei commented Jun 18, 2020

> I'm still investigating.

could this be a missing update_ghost_values() on the diagonal vector?

@tcclevenger
Contributor Author

I'm continuing to investigate. It looks like the initial_nonlinear_residual (first return value of solve_stokes()) is very different on the 2+ iterated Stokes solves between float and double GMG. I'm a little confused since computing this value does not involve GMG (except that the previous solution is used). Perhaps the previous solution is slightly less accurate? But this doesn't make sense since it is solved to the same tolerance no matter the GMGNumberType.

@gassmoeller
Member

Any new insights? Let us know if you want input.

@tjhei
Member

tjhei commented Jul 10, 2020

> Any new insights? Let us know if you want input.

thanks, Rene. We are working on things and we do have a lead. Conrad will get to this in the next 2 weeks, I hope.

@tcclevenger
Contributor Author

@gassmoeller sorry for the absence. I’ll be getting back to this this week. @tjhei and I talked last week and there are a few simple fixes that could work, so I should be able to wrap this up. I’ll keep you guys updated.

@tcclevenger
Contributor Author

After a closer look, I believe the new code is implemented correctly. Here is a comparison of the differences in performance for a few benchmarks:

  • Nonlinear channel flow, 4 timesteps (0-4), 214K DoFs, 1 processor
    Single precision GMG: 63 total Stokes solves, 75.5s solve time
    Double precision GMG: 65 total Stokes solves, 79.9s solve time

  • SolCx, no advection iterated Stokes, 3K DoFs, 1 processor
    Single precision GMG: First Stokes: 12 iterations, 2nd Stokes: 3 iterations
    Double precision GMG: First Stokes: 12 iterations, 2nd Stokes: 0 iterations

  • Nsinker, no advection iterated Stokes, 150K DoFs, 1 processor
    Single precision GMG: First Stokes: 23 iterations, 2nd Stokes: 1 iteration
    Double precision GMG: First Stokes: 23 iterations, 2nd Stokes: 1 iteration

  • Nsinker, first timestep only, single Stokes, 6.8M DoFs, 8 processors
    Single precision GMG: 28 iterations, 10.5s solve time
    Double precision GMG: 28 iterations, 14.9s solve time

There were 2 types of failing tests before:

  1. Tests which were failing due to slight differences in solver performance (the preconditioner is slightly less effective, so we expect higher iteration counts; the nonlinear channel flow test is in this category) or due to different memory statistics, since the viscosity tables in single precision require less memory (matrix_nonzeros_7).
  2. Tests which were failing due to major differences in solver performance, specifically when using "no advection, iterated Stokes". All these tests were on very small problems (<1000 unknowns), so we are not worried about the differences. I tested the same benchmarks with a larger number of DoFs and we get the expected output. For all of these (except prescribed_dilation_gmg, since its .prm is based on a different test) I decided to change to "first timestep only, single Stokes" for simplicity.

Other tests will fail now with the changes and so I will be sure to correct these when the tester is finished.

@tjhei
Member

tjhei commented Jul 25, 2020

Thanks, Conrad. I will take a look at the tests in more detail. Would you mind squashing your commits down and rebasing to the current master? That would make it a lot easier to inspect the changes in the test results.

Member

@tjhei tjhei left a comment

I went over the code again and things look good. I just pushed a commit with updated test results. While some problems have different number of linear/nonlinear iterations, I think we are good here.

@tjhei
Member

tjhei commented Jul 28, 2020

@gassmoeller can you please take another look?

@gassmoeller
Member

I am ok with the slight increase in linear iterations (expected due to lower precision of preconditioner), but can you explain your reasoning here again:

> Tests which were failing due to major differences in solver performance, specifically when using "no advection, iterated Stokes". All these tests were on very small problems (<1000 unknowns) and so we are not worried about the differences. I tested the same benchmarks with larger number of DoFs and we get expected output. For all of these (except prescribed_dilation_gmg since its .prm is based on a different test) I decided to change to "first timestep only, single Stokes" for simplicity.

I ran the sol_cx_2_gmg test with double and single precision with the iterated Stokes scheme and, like you, found significant differences in nonlinear solver behavior. The old version converges to 1e-8 in the second iteration (expected, because this is actually a linear problem: viscosity and density are independent of the solution), while in single precision I get this:

  Solving Stokes system... 16+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 1: 1

   Solving Stokes system... 14+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 2: 4.08645

   Solving Stokes system... 14+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 3: 2.29006

   Solving Stokes system... 16+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 4: 1.01736

   Solving Stokes system... 14+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 5: 0.644649

   Solving Stokes system... 14+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 6: 0.132881

   Solving Stokes system... 12+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 7: 0.0387798

   Solving Stokes system... 9+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 8: 0.00600291

   Solving Stokes system... 8+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 9: 0.00220304

   Solving Stokes system... 8+0 iterations.
      Relative nonlinear residual (Stokes system) after nonlinear iteration 10: 0.000221702

Also when I decrease the linear solver tolerance (from default 1e-7 to 1e-9 or 1e-11) the nonlinear solver behavior becomes worse (higher residual after 10 nonlinear iterations).

This suggests the solution in the first linear solve is actually worse than before (i.e. either the tolerance computation is different, or the solver stops although the tolerance is not reached, maybe because some residual computation is done in single precision?), and additionally it does not converge in the expected number of nonlinear iterations. Your changed test does not show this anymore because it uses a single Stokes solve, but that does not solve or explain the problem. Why do you think it is ok, because this is a small test?

@tjhei
Member

tjhei commented Jul 29, 2020

This is a good point, @gassmoeller . I think we need to look at this in more detail. Conrad somehow convinced me that the change is acceptable in our last conversation.

> This suggests the solution in the first linear solve is actually worse than before (i.e. either the tolerance computation is different, or the solver stops although the tolerance is not reached, maybe because some residual computation is done in single precision?),

Tolerance and residual computations are done in double precision as before (only the preconditioner is different), so I am not sure what could be different. I will need to take a look as well, I think.
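
To make the "only the preconditioner is different" structure concrete, here is a toy sketch. It is not the ASPECT Stokes solver: it uses a Richardson iteration with a Jacobi preconditioner on an illustrative matrix tridiag(-1, 4, -1), and all names (apply_A, solve, PrecNumber) are made up for this example. The point it shows is that the residual and the stopping test stay in double while the preconditioner runs in a template precision.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// y = A x for the illustrative matrix A = tridiag(-1, 4, -1), in double.
inline std::vector<double> apply_A(const std::vector<double> &x)
{
  const std::size_t n = x.size();
  std::vector<double> y(n);
  for (std::size_t i = 0; i < n; ++i)
    {
      y[i] = 4.0 * x[i];
      if (i > 0)
        y[i] -= x[i - 1];
      if (i + 1 < n)
        y[i] -= x[i + 1];
    }
  return y;
}

// Preconditioned Richardson iteration. The residual and the stopping test
// use double; only the Jacobi preconditioner runs in PrecNumber.
// Returns the number of updates performed, or -1 if not converged.
template <typename PrecNumber>
int solve(const std::vector<double> &b,
          std::vector<double>       &x,
          const double               relative_tolerance,
          const int                  max_iterations = 200)
{
  const std::size_t n = b.size();
  x.assign(n, 0.0);
  double initial_norm = 0.0;

  for (int it = 0; it < max_iterations; ++it)
    {
      // Residual r = b - A x and its norm, both in double.
      const std::vector<double> Ax = apply_A(x);
      std::vector<double>       r(n);
      double                    norm_squared = 0.0;
      for (std::size_t i = 0; i < n; ++i)
        {
          r[i] = b[i] - Ax[i];
          norm_squared += r[i] * r[i];
        }
      const double norm = std::sqrt(norm_squared);
      if (it == 0)
        initial_norm = norm;
      if (norm <= relative_tolerance * initial_norm)
        return it;

      // Preconditioner in PrecNumber: divide by the diagonal entry 4.
      for (std::size_t i = 0; i < n; ++i)
        {
          const PrecNumber z = static_cast<PrecNumber>(r[i]) / PrecNumber(4);
          x[i] += static_cast<double>(z);
        }
    }
  return -1;
}
```

In this toy setting, solve<float> and solve<double> both reach the same double-precision tolerance, because the stopping test never sees single-precision data. That is the behavior one would expect from the design described above, which is why the differences reported in this thread are surprising.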

@bangerth
Contributor

bangerth commented Aug 3, 2020

What happens if you put the new typedef to double? You should be getting the exact same results as are currently in the testsuite. If you don't, then that's a lead worth pursuing.

@tjhei
Member

tjhei commented Aug 6, 2020

> What happens if you put the new typedef to double?

After rebase to current master and setting things back to double, all tests pass without changes to test output.

@bangerth
Contributor

bangerth commented Aug 6, 2020

So then at least you haven't introduced a bug :-)

@tjhei
Member

tjhei commented Aug 6, 2020

I made some progress with my research, but I will have to talk to Conrad. It looks like the resulting constant pressure is vastly different between the float and double preconditioner.

@gassmoeller
Member

Are we running into the issue that the dynamic pressure is usually 3-7 orders of magnitude smaller than the static pressure and the dynamic pressure is the only part that affects the velocities? This is why many other codes solve for dynamic pressure only.
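
A small sketch of why this separation of scales matters in single precision. It only illustrates the resolution of float; it is not a diagnosis of the behavior reported above. The function name and the pressure values mentioned afterwards are assumptions chosen for illustration.

```cpp
// Add a small dynamic pressure to a large static pressure in precision
// Number, then subtract the static part again. Any difference between the
// result and p_dynamic is rounding error in the stored total pressure.
template <typename Number>
Number recovered_dynamic_pressure(const Number p_static, const Number p_dynamic)
{
  const Number p_total = p_static + p_dynamic;
  return p_total - p_static;
}
```

For example, with an illustrative static pressure of 1e9 Pa and a dynamic pressure of 100 Pa (seven orders of magnitude apart), recovered_dynamic_pressure<double>(1e9, 100.0) returns exactly 100, while recovered_dynamic_pressure<float>(1e9f, 100.0f) does not, because float carries only about seven significant decimal digits.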
