automatic tuning of (QUDA)-MG parameters [WIP, DO NOT MERGE] #537

kostrzewa · 2022-03-24T19:50:25Z

started work on a simple algorithm to automatically tune the (QUDA)-MG parameters which can be tuned without rebuilding the setup


kostrzewa · 2022-03-24T19:56:59Z

The preliminary idea for the input is as follows, but this will have to be fine-tuned depending on how the algorithm turns out in the end:

BeginExternalInverter QUDA
  Pipeline = 24
  gcrNkrylov = 24
  MGNumberOfLevels = 3
  MGNumberOfVectors = 24, 32
  MGSetupSolver = cg
  MGSetup2KappaMu = 0.000224102400
  MGVerbosity = summarize, silent, silent
  MGSetupSolverTolerance = 5e-7, 5e-7
  MGSetupMaxSolverIterations = 1500, 1500
  MGCoarseSolverType = gcr, gcr, cagcr
  MGSmootherType = cagcr, cagcr, cagcr
  MGBlockSizesX = 4,3
  MGBlockSizesY = 4,3
  MGBlockSizesZ = 3,2
  MGBlockSizesT = 4,2
  
  MGCoarseMuFactor = 1.0, 1.0, 20.0
  MGCoarseMaxSolverIterations = 50, 50, 50
  MGCoarseSolverTolerance = 0.1, 0.1, 0.1
  MGSmootherPostIterations = 2, 2, 2
  MGSmootherPreIterations = 0, 0, 0
  MGSmootherTolerance = 0.1, 0.1, 0.1
  MGOverUnderRelaxationFactor = 0.85, 0.85, 0.85
  
EndExternalInverter

BeginTuneMGParams QUDA
  MGCoarseMuFactorSteps = 10, 10, 10
  MGCoarseMuFactorDelta = 0.1, 0.2, 5

  MGCoarseMaxSolverIterationsSteps = 10, 10, 10
  MGCoarseMaxSolverIterationsDelta = -5, -5, -5

  MGCoarseSolverToleranceSteps = 10, 10, 10
  MGCoarseSolverToleranceDelta = 0.05, 0.05, 0.05

  MGSmootherPreIterationsSteps = 4, 4, 4
  MGSmootherPreIterationsDelta = 1, 1, 1

  MGSmootherPostIterationsSteps = 4, 4, 4
  MGSmootherPostIterationsDelta = 1, 1, 1

  MGSmootherToleranceSteps = 4, 4, 4
  MGSmootherToleranceDelta = 0.1, 0.1, 0.1

  MGOverUnderRelaxationFactorSteps = 4, 4, 4
  MGOverUnderRelaxationFactorDelta = 0.05, 0.05, 0.05

  MGTuningIterations = 1000

  # when in a particular tuning step the improvement is less than 1%, we
  # move on to the next parameter to be tuned
  MGTuningTolerance = 0.99
EndTuneMGParams

An adaptive process may be added to dynamically reduce the search space if certain parameter changes don't affect the time to solution (tts).
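The stopping rule described in the input comment above (move on to the next parameter once a step improves the time to solution by less than 1%) can be sketched as follows. This is a minimal illustration under assumptions, not the tuner's actual code: `tune_one_parameter`, `measure` and `toy_time` are hypothetical names, and `measure` stands in for running a solve and returning its time to solution.

```python
def tune_one_parameter(measure, start, delta, steps, tolerance=0.99):
    """Walk one parameter in increments of `delta` for at most `steps` steps.

    A step is accepted only if the new time is below `tolerance` times the
    best time so far (tolerance = 0.99 means "at least 1% better").
    The walk stops at the first step that fails this test.
    """
    best_value = start
    best_time = measure(start)
    for i in range(1, steps + 1):
        candidate = start + i * delta
        t = measure(candidate)
        if t < tolerance * best_time:
            best_value, best_time = candidate, t
        else:
            # improvement below 1%: give up on this parameter
            break
    return best_value, best_time


def toy_time(x):
    # Hypothetical stand-in for a timed solve, with a minimum at x = 3.
    return (x - 3) ** 2 + 1


print(tune_one_parameter(toy_time, start=0, delta=1, steps=10))
```

In this sketch a non-converging solve could be modelled by letting `measure` return `float("inf")`; how the real tuner treats non-convergence is not specified here.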


kostrzewa · 2022-03-25T19:15:36Z

I will probably change the input format so that, instead of min/max and a number of steps, one specifies a "delta" for each parameter and level together with the number of steps for which this delta should be applied.
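As a rough illustration of the delta/steps format, the values visited for one parameter on one level can be enumerated as below. This is a sketch under assumptions: `candidate_values` is a hypothetical helper, the rule "start + i * delta" is my reading of the description, and stopping once a value would go negative is an assumed guard rather than documented behaviour.

```python
def candidate_values(start, delta, steps):
    """Values reached by applying `delta` to `start` up to `steps` times.

    Assumption: enumeration stops as soon as a value would become negative.
    """
    values = []
    for i in range(1, steps + 1):
        value = start + i * delta
        if value < 0:
            break
        values.append(value)
    return values


# Per-level settings taken from the MGCoarseMuFactor lines in the input above.
starts = [1.0, 1.0, 20.0]
deltas = [0.1, 0.2, 5]
steps = [10, 10, 10]

per_level = [candidate_values(s, d, n) for s, d, n in zip(starts, deltas, steps)]
for level, values in enumerate(per_level):
    print(level, values)
```

With a negative delta, for example a start of 15 iterations and a delta of -5, the enumeration in this sketch ends at 0 rather than continuing to negative iteration counts.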

The current "algorithm" (I use the word very cautiously) can start with a completely useless setup which doesn't converge and finds something which does. Unfortunately, it doesn't yet find a better minimum than I can find by hand. However, I've tested this only on small lattices (16c32 and 24c48, albeit at the physical point) and I suspect that it will work better on larger lattices.

kostrzewa · 2022-03-27T10:27:00Z

Funnily enough, this actually works and seems to find parameter sets that I would have never considered. For example, on cA211.12.48, this is a parameter set that it evolves to:

QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 3.000000, 27.000000)
 mg_coarse_solver_maxiter: (20, 10, 50)
     mg_coarse_solver_tol: (0.200000, 0.400000, 0.200000)
               mg_nu_post: (6, 6, 8)
                mg_nu_pre: (0, 4, 2)
          mg_smoother_tol: (0.200000, 0.200000, 0.100000)
                 mg_omega: (0.950000, 1.050000, 0.850000)
Timing: 1.989135, Iters: 51
-------------------------------------------
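For comparing tuning results across runs, a log block like the one above can be read into a dictionary. The following is a minimal sketch that assumes the log format is exactly as printed here; `parse_best_params` and the regular expressions are my own and not part of tmLQCD or QUDA.

```python
import re

# One "name: (v1, v2, v3)" line per parameter, as printed by the tuner.
PARAM_RE = re.compile(r"^\s*(mg_\w+):\s*\(([^)]*)\)\s*$")
# The summary line "Timing: <seconds>, Iters: <count>".
TIMING_RE = re.compile(r"^\s*Timing:\s*([0-9.eE+-]+),\s*Iters:\s*(\d+)\s*$")


def parse_best_params(text):
    """Return (params, timing, iters) parsed from one BEST SET block."""
    params = {}
    timing = None
    iters = None
    for line in text.splitlines():
        m = PARAM_RE.match(line)
        if m:
            # All per-level values are stored as floats, including counts.
            params[m.group(1)] = tuple(float(x) for x in m.group(2).split(","))
            continue
        m = TIMING_RE.match(line)
        if m:
            timing, iters = float(m.group(1)), int(m.group(2))
    return params, timing, iters


LOG = """\
QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 3.000000, 27.000000)
 mg_coarse_solver_maxiter: (20, 10, 50)
     mg_coarse_solver_tol: (0.200000, 0.400000, 0.200000)
               mg_nu_post: (6, 6, 8)
                mg_nu_pre: (0, 4, 2)
          mg_smoother_tol: (0.200000, 0.200000, 0.100000)
                 mg_omega: (0.950000, 1.050000, 0.850000)
Timing: 1.989135, Iters: 51
-------------------------------------------
"""

params, timing, iters = parse_best_params(LOG)
print(params["mg_mu_factor"], timing, iters)
```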


kostrzewa · 2023-03-17T08:05:38Z

First experience on a large volume (64c128) at the physical point suggests that this tuner, surprisingly, really seems to work.

Setting

BeginTuneMGParams QUDA
  MGCoarseMuFactorSteps = 10, 10, 11
  MGCoarseMuFactorDelta = 0.25, 0.5, 5

  MGCoarseMaxSolverIterationsSteps = 10, 10, 10
  MGCoarseMaxSolverIterationsDelta = 5, 5, 5

  MGCoarseSolverToleranceSteps = 10, 10, 10
  MGCoarseSolverToleranceDelta = 0.05, 0.05, 0.05

  MGSmootherPreIterationsSteps = 2, 2, 2
  MGSmootherPreIterationsDelta = 1, 1, 1

  MGSmootherPostIterationsSteps = 2, 2, 2
  MGSmootherPostIterationsDelta = 2, 2, 2

  MGSmootherToleranceSteps = 4, 4, 4
  MGSmootherToleranceDelta = 0.1, 0.1, 0.1

  MGOverUnderRelaxationFactorSteps = 3, 3, 3
  MGOverUnderRelaxationFactorDelta = 0.05, 0.05, 0.05

  MGTuningIterations = 1000

  # when in a particular tuning step the improvement is less than 1%, we
  # move on to the next parameter to be tuned
  MGTuningTolerance = 0.99
EndTuneMGParams

and starting from

BeginExternalInverter QUDA
  Pipeline = 24
  gcrNkrylov = 24
  MGNumberOfLevels = 3
  MGNumberOfVectors = 24, 32
  MGSetupSolver = cg
  MGSetup2KappaMu = 0.000215613244
  MGVerbosity = silent, silent, silent
  MGSetupSolverTolerance = 5e-7, 5e-7
  MGSetupMaxSolverIterations = 1500, 1500
  MGCoarseSolverType = gcr, gcr, cagcr
  MGSmootherType = cagcr, cagcr, cagcr
  MGBlockSizesX = 4,2
  MGBlockSizesY = 4,2
  MGBlockSizesZ = 4,2
  MGBlockSizesT = 4,2
  MGResetSetupMDUThreshold = 1.0
  MGRefreshSetupMDUThreshold = 0.0263
  MGRefreshSetupMaxSolverIterations = 30, 30
 
  MGCoarseMuFactor = 1.0, 1.0, 20.0
  MGCoarseMaxSolverIterations = 15, 15, 15
  MGCoarseSolverTolerance = 0.1, 0.1, 0.1
  MGSmootherPostIterations = 2, 2, 2
  MGSmootherPreIterations = 0, 0, 0
  MGSmootherTolerance = 0.1, 0.1, 0.1
  MGOverUnderRelaxationFactor = 0.90, 0.90, 0.90  
EndExternalInverter

the tuner takes the solver from non-convergence through a successful solve in around 9 seconds (on Meluxina)

QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 1.000000, 65.000000)
 mg_coarse_solver_maxiter: (15, 15, 15)
     mg_coarse_solver_tol: (0.100000, 0.100000, 0.100000)
               mg_nu_post: (2, 2, 2)
                mg_nu_pre: (0, 0, 0)
          mg_smoother_tol: (0.100000, 0.100000, 0.100000)
                 mg_omega: (0.900000, 0.900000, 0.900000)
Timing: 8.628203, Iters: 112
-------------------------------------------

down to a solve in 2.5 seconds with parameters that I would not have thought to choose by hand:

QUDA-MG param tuner: BEST SET OF PARAMETERS
-------------------------------------------
             mg_mu_factor: (1.000000, 4.000000, 120.000000)
 mg_coarse_solver_maxiter: (15, 25, 30)
     mg_coarse_solver_tol: (0.100000, 0.200000, 0.150000)
               mg_nu_post: (2, 6, 10)
                mg_nu_pre: (0, 0, 6)
          mg_smoother_tol: (0.200000, 0.200000, 0.200000)
                 mg_omega: (0.900000, 0.900000, 0.950000)
Timing: 2.501800, Iters: 64
-------------------------------------------
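The gain between the two log blocks above can be put into numbers with a few lines of arithmetic. The timings and iteration counts are copied from the logs; the variable names are only illustrative.

```python
# Values copied from the two "BEST SET OF PARAMETERS" blocks above.
first_converging = {"time_s": 8.628203, "iters": 112}
final_tuned = {"time_s": 2.501800, "iters": 64}

# Overall speedup in time to solution (about 3.4).
speedup = first_converging["time_s"] / final_tuned["time_s"]

# Reduction in outer iterations.
iter_ratio = first_converging["iters"] / final_tuned["iters"]

# Cost per outer iteration before and after tuning, in seconds.
cost_before = first_converging["time_s"] / first_converging["iters"]
cost_after = final_tuned["time_s"] / final_tuned["iters"]

print(f"speedup: {speedup:.2f}x, iteration ratio: {iter_ratio:.2f}")
print(f"time per iteration: {cost_before:.4f} s -> {cost_after:.4f} s")
```

The speedup is larger than the reduction in iteration count, so on these numbers the tuned setup also spends less time per outer iteration.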

kostrzewa · 2023-03-17T08:08:35Z

Comparing these parameters in practice, with the "hand-tuned" setup on the left and the auto-tuned setup on the right:

MGCoarseMuFactor = 1.0, 1.0, 80.0              ->  MGCoarseMuFactor = 1.0, 4.0, 120.0                                                                  
MGCoarseMaxSolverIterations = 30, 30, 30       ->  MGCoarseMaxSolverIterations = 15, 25, 30
MGCoarseSolverTolerance = 0.3, 0.2, 0.15       ->  MGCoarseSolverTolerance = 0.1, 0.2, 0.15
MGSmootherPostIterations = 4, 4, 6             ->  MGSmootherPostIterations = 2, 6, 10
MGSmootherPreIterations = 0, 0, 1              ->  MGSmootherPreIterations = 0, 0, 6
MGSmootherTolerance = 0.2, 0.2, 0.2            ->  MGSmootherTolerance = 0.2, 0.2, 0.2 
MGOverUnderRelaxationFactor = 1.00, 0.90, 0.90 ->  MGOverUnderRelaxationFactor = 0.90, 0.90, 0.95

I seem to obtain very stable timings so far (red is the auto-tuned MG setup):

kostrzewa · 2023-03-18T11:02:54Z

After some more runtime, extracting the time to solution of the two MG setups, I get the following histograms after resampling to get the same number of solver calls in both cases (logarithmic count axis):
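The resampling step could look roughly like the sketch below. The exact procedure is not stated here, so this assumes plain random subsampling without replacement down to the size of the smaller sample; `subsample_to_common_size` and `histogram_counts` are hypothetical helpers, and the logarithmic count axis is left to the plotting tool.

```python
import random


def subsample_to_common_size(a, b, seed=0):
    """Randomly subsample both lists (without replacement) to the same length."""
    rng = random.Random(seed)
    n = min(len(a), len(b))
    return rng.sample(a, n), rng.sample(b, n)


def histogram_counts(samples, bin_edges):
    """Count samples in half-open bins [edge_i, edge_{i+1}); others are ignored."""
    counts = [0] * (len(bin_edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Made-up solve times in seconds, purely to exercise the functions.
hand_tuned = [3.1, 3.4, 2.9, 5.2, 3.0, 3.3, 7.8, 3.2]
auto_tuned = [2.5, 2.6, 2.4, 2.5, 2.7]

hand_sub, auto_sub = subsample_to_common_size(hand_tuned, auto_tuned)
edges = [2.0, 3.0, 4.0, 6.0, 8.0]
print(histogram_counts(hand_sub, edges))
print(histogram_counts(auto_sub, edges))
```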

kostrzewa · 2023-03-21T12:50:05Z

Doing the same on an L=48 simulation at the physical point similarly leads to a very nice improvement. Below, untuned refers to a hand-selected MG setup, mk1tuned to the auto-tuning result after about 100 tuning iterations, and mk2tuned to the setup reached at the end of the tuning procedure.

The two "peaks" correspond to inversions related to cloverdetratio2light (below and around 1 second in the tuned setups) and cloverdetratio3light (from 1.5 seconds and up). The histograms include timings from the HB/ACC steps as well as from the derivative.

The final setup is:

  MGCoarseMuFactor = 1.0, 2.5, 105.0
  MGCoarseMaxSolverIterations = 15, 15, 15
  MGCoarseSolverTolerance = 0.1, 0.35, 0.25
  MGSmootherPostIterations = 2, 2, 4
  MGSmootherPreIterations = 0, 0, 1
  MGSmootherTolerance = 0.2, 0.1, 0.2
  MGOverUnderRelaxationFactor = 0.90, 0.90, 1.00

kostrzewa · 2023-03-23T14:20:20Z

Note to self from the meeting just now: it should be possible to integrate this directly into the HMC:

- define a time-to-solution threshold deemed unacceptable
- when the solve time of monomials using the MG goes above this threshold more than N times -> enter the MG tuning loop for k iterations in an attempt to stabilize the MG
- this would allow the HMC to adapt automatically to changes in the behaviour of the MG as the simulation progresses
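The trigger logic in this note can be sketched as a small counter. This is only an illustration of the idea: `MGRetuneTrigger` and its fields are hypothetical, nothing like it is confirmed to exist in the code, and resetting the counter after a trigger is my own assumption.

```python
class MGRetuneTrigger:
    """Signal a re-tune once solve times exceed a threshold more than N times."""

    def __init__(self, threshold_s, max_violations):
        self.threshold_s = threshold_s        # time to solution deemed unacceptable
        self.max_violations = max_violations  # the "N" from the note above
        self.violations = 0

    def record(self, solve_time_s):
        """Record one MG solve time; return True if tuning should be entered."""
        if solve_time_s > self.threshold_s:
            self.violations += 1
        if self.violations > self.max_violations:
            # Assumed behaviour: start counting afresh after triggering.
            self.violations = 0
            return True
        return False


trigger = MGRetuneTrigger(threshold_s=3.0, max_violations=2)
decisions = [trigger.record(t) for t in [2.5, 3.5, 3.1, 2.0, 4.0]]
print(decisions)
```

A caller would run the tuning loop for k iterations whenever `record` returns True and then continue the trajectory with the updated parameters.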


skeleton for automatic tuning of (QUDA)-MG parameters for usage in fermionic derivative

f38f097

kostrzewa added the WIP DO NOT MERGE label Mar 24, 2022

kostrzewa added 3 commits March 24, 2022 21:03

fix a few typos and a forgotten newline processor

49ae26b

first concrete implementation of a search algorithm (which kinda breaks down when the solver does not converge at any point...)

c0081a3

first kind of working implementation of QUDA-MG parameter tuner

daa4a53

fix tuning logic and use multiple passes to improve on tuning result

42d9848

kostrzewa added 4 commits March 28, 2022 21:41

deriv_mg_tune: simplify input format and temporarily remove support for ratios

76b5fbe

refine tuning logic to always start from the best set of parameters when the tuning direction is changed or the outer iteration of the tuning loop is reset

ec02c47

suppress one instance of console output by 'find_best_params'

55bd8b2

we were stopping early in tuning due to a logic bug

28e9caf

kostrzewa force-pushed the deriv_mg_tune branch from 497ae34 to 28e9caf Compare April 4, 2022 15:09

print number of tuning iterations and total

dfc93ce

kostrzewa changed the title ~~skeleton for automatic tuning of (QUDA)-MG parameters [WIP, DO NOT MERGE]~~ automatic tuning of (QUDA)-MG parameters [WIP, DO NOT MERGE] Apr 7, 2022

kostrzewa added 7 commits April 25, 2022 12:21

Merge branch 'quda_work_add_actions' of github.com:etmc/tmLQCD into deriv_mg_tune

d5d681a

Merge remote-tracking branch 'origin/quda_work' into deriv_mg_tune

a008824

ansatz to provide some explanatory strings for MG tuning directions

293315b

extraneous curly brace

1870d09

Merge remote-tracking branch 'origin/quda_work' into deriv_mg_tune

6d9e92a

Merge remote-tracking branch 'origin/quda_work' into deriv_mg_tune

2efe069

typo in sample_deriv_mg_tune_tmclover.input

e24efb4

kostrzewa changed the base branch from quda_work_add_actions to quda_work March 17, 2023 08:09

kostrzewa added 2 commits March 22, 2023 09:33

first stab at a multi-config MG tuner

783af10

fix some issues with multi-config MG auto-tuner and test it successfully

e017035

kostrzewa added 10 commits March 25, 2023 10:37

add treshold below which time-to-solution improvements are ignored in the MG autotuner (default 5 per-mille)

41ed324

tuning_plam is a pointer

378dad4

Merge remote-tracking branch 'origin/quda_work' into deriv_mg_tune

9f7a328

install benchmark, offline_measurement and deriv_mg_tune too

c3248ba

set good defaults for MGTuningIgnoreThreshold and MGTuningTolerance based on current experience

afd73e9

support 0 tuning iterations and reset tuning plan between calls of tuner

eeb5c43

only replace current tuning params with 'best' params if the 'best' setup was actually able to make the problem converge

f7c21be

Merge remote-tracking branch 'origin/quda_work' into deriv_mg_tune

2fe1762

Merge remote-tracking branch 'origin/quda_work' into deriv_mg_tune

8f74a25

document the MG autotuner

bb1c8bf

kostrzewa force-pushed the deriv_mg_tune branch from a279261 to bb1c8bf Compare October 15, 2023 15:15

kostrzewa added 6 commits October 16, 2023 11:43

clarify balance between MGTuningTolerance and MGTuningIgnoreThreshold

2f144d0

add example log from tuning procedure

3798564

typo in docs

a1971be

Merge remote-tracking branch 'origin/quda_work' into deriv_mg_tune

2fed63c

merge with current quda_work

405f70c

Merge remote-tracking branch 'origin/quda_work' into deriv_mg_tune

2812388

kostrzewa changed the base branch from quda_work to master March 13, 2024 12:53

kostrzewa added 5 commits June 14, 2024 17:49

Merge remote-tracking branch 'origin/master' into deriv_mg_tune

7f90d21

fix undefined behaviour in part of the MG autotuner (without consequences)

7d30f29

interchange ordering of coarse solver tolerance and coarse solver max iterations, prevent parameters going negative when tuning with negative delta

66fa8e8

add MPI_Barrier in mg_tune before and after MG update and after inversion: this seems to help with MPI errors (truncated messages)

0c8d86f

more tuning params that can go negative but shouldn't

d2d1848
