# Adaptive PDE discretizations on cartesian grids 
## Volume : GPU accelerated methods
## Part : Eikonal equations, acceleration and reproducibility
## Chapter : Riemannian metrics

In this notebook, we solve Riemannian eikonal equations on the CPU and the GPU, and check that they produce consistent results.

In [1]:
large_instances = False # Set to True to show off GPU acceleration (CPU times may become a big long.)

[**Summary**](Summary.ipynb) of volume GPU accelerated methods, this series of notebooks.

[**Main summary**](../Summary.ipynb) of the Adaptive Grid Discretizations 
	book of notebooks, including the other volumes.

# Table of contents
  * [1. Two dimensions](#1.-Two-dimensions)
    * [1.1 Isotropic metric](#1.1-Isotropic-metric)
    * [1.2 Smooth anisotropic metric](#1.2-Smooth-anisotropic-metric)
  * [2. Three dimensions](#2.-Three-dimensions)
    * [2.1 Smooth anisotropic metric](#2.1-Smooth-anisotropic-metric)



**Acknowledgement.** The experiments presented in these notebooks are part of ongoing research.
The author would like to acknowledge fruitful informal discussions with L. Gayraud on the 
topic of GPU coding and optimization.

Copyright Jean-Marie Mirebeau, University Paris-Sud, CNRS, University Paris-Saclay

## 0. Importing the required libraries

In [5]:
import sys; sys.path.insert(0,"..")
#from Miscellaneous import TocTools; print(TocTools.displayTOC('Riemann_Repro','GPU'))

In [6]:
import cupy as cp
import numpy as np
import itertools
from matplotlib import pyplot as plt
np.set_printoptions(edgeitems=30, linewidth=100000, formatter=dict(float=lambda x: "%5.3g" % x))

In [7]:
from agd import HFMUtils
from agd import AutomaticDifferentiation as ad
from agd import Metrics
from agd import FiniteDifferences as fd
from agd import LinearParallel as lp
import agd.AutomaticDifferentiation.cupy_generic as cugen

norm_infinity = ad.Optimization.norm_infinity
from agd.HFMUtils import RunGPU,RunSmart

In [8]:
def ReloadPackages():
    from Miscellaneous.rreload import rreload
    global HFMUtils,ad,cugen,RunGPU,RunSmart,Metrics
    HFMUtils,ad,cugen,RunGPU,Metrics = rreload([HFMUtils,ad,cugen,RunGPU,Metrics],"../..")    
    RunSmart = cugen.cupy_get_args(HFMUtils.RunSmart,dtype64=True,iterables=(dict,Metrics.Base))

In [9]:
cp = ad.functional.decorate_module_functions(cp,cugen.set_output_dtype32) # Use float32 and int32 types in place of float64 and int64
plt = ad.functional.decorate_module_functions(plt,cugen.cupy_get_args)
RunSmart = cugen.cupy_get_args(RunSmart,dtype64=True,iterables=(dict,Metrics.Base))

### 0.1 Utilities

In [10]:
#from Notebooks_GPU.ExportedCode.Isotropic_Repro import RunCompare
def RunCompare(gpuIn,check=True):
    gpuOut = RunGPU(gpuIn)
    if gpuIn.get('verbosity',1): print(f"--- gpu done, turning to cpu ---, large_instances={large_instances}")
    cpuIn = gpuIn.copy(); cpuIn.pop('traits',None)
    cpuOut = RunSmart(cpuIn)
    print("Max |gpuValues-cpuValues| : ", norm_infinity(gpuOut['values'].get()-cpuOut['values']))
    cpuTime = cpuOut['FMCPUTime'] + cpuOut['StencilCPUTime']; gpuTime = gpuOut['solverGPUTime'];
    print(f"Solver time (s). GPU : {gpuTime}, CPU : {cpuTime}. Device acceleration : {cpuTime/gpuTime}")
    assert not check or cp.allclose(gpuOut['values'],cpuOut['values'],atol=1e-5,rtol=1e-4)
    return gpuOut,cpuOut

In [11]:
factor_variants = [
    {}, # Default
    {"seedRadius":2}, # Spread seed information
    {"factorizationRadius":10,'factorizationPointChoice':'Key'} # Source factorization
]
multip_variants = [
    {'multiprecision':False,'tol':1e-5}, # Default, with smaller error tolerance for reproducibility check
    {'multiprecision':True}, # Reduces roundoff errors
]
order_variants = [
    {'order':1}, # Default
    {'order':2}, # More accurate on smooth instances
]

## 1. Two dimensions

### 1.1 Isotropic metric

In [12]:
n=4000 if large_instances else 1000
hfmIn = HFMUtils.dictIn({
    'model':'Riemann2',
    'metric':Metrics.Riemann(cp.eye(2)),
    'seeds':cp.array([[0.5,0.5]]),
    'exportValues':1,
    'bound_active_blocks':True,
    'traits':{
        'niter_i':24,'shape_i':(12,12), # Best
#        'pruning_macro':1,
    }
})
hfmIn.SetRect([[0,1],[0,1]],dimx=n+1,sampleBoundary=True)

Casting output of function eye from float64 to float32
Casting output of function array from float64 to float32


In [9]:
_,cpuOut = RunCompare(hfmIn,check=False)

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
GPU kernel eikonal ran for 0.3769969940185547 seconds,  and 336 iterations.
Post-Processing
---
Field verbosity defaults to 1
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 18.507 s.
Unused fields from user: bound_active_blocks 
********************
Max |gpuValues-cpuValues| :  1.5338908722906108e-05
Solver time (s). GPU : 0.3769969940185547, CPU : 18.507. Device acceleration : 49.09057709645595


In [13]:
n=200; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[0,1],[0,1]],dimx=n+1,sampleBoundary=True)
X = hfmInS.Grid()
cost = np.prod(np.sin(2*np.pi*X))+1.1
hfmInS.update({
    'metric': Metrics.Riemann(cost**2*fd.as_field(cp.eye(2),X.shape[1:])), # Isotropic but non-constant metric
    'verbosity':0,
})

Casting output of function eye from float64 to float32


In [14]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  4.017569794623199e-07
Solver time (s). GPU : 0.12299823760986328, CPU : 0.05. Device acceleration : 0.4065098896668295

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  4.940263631514341e-08
Solver time (s). GPU : 0.012986898422241211, CPU : 0.049. Device acceleration : 3.7730332837656735

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  4.202669309227858e-07
Solver time (s). GPU : 0.01300501823425293, CPU : 0.051000000000000004. Device acceleration : 3.9215631290446775

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  9.997386174465106e-08
Solver time (s). GPU : 0.011998414993286133, CPU : 0.05. Device acceleration : 4.1672170889220075

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice

In [15]:
hfmInS.update({
    'seeds':[[0.,1.]],
    'order':2,
    'traits':{'decreasing_macro':0,},
})

In [16]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  4.88107854068609e-06
Solver time (s). GPU : 0.04500007629394531, CPU : 0.063. Device acceleration : 1.3999976264146146

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  8.07258024870805e-08
Solver time (s). GPU : 0.036496877670288086, CPU : 0.066. Device acceleration : 1.8083738723142952

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  4.88107854068609e-06
Solver time (s). GPU : 0.035475969314575195, CPU : 0.064. Device acceleration : 1.8040380921658368

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  8.07258024870805e-08
Solver time (s). GPU : 0.03400087356567383, CPU : 0.062. Device acceleration : 1.8234825608302363


### 1.2 Smooth anisotropic metric

In [172]:
n=4000 if large_instances else 1000
hfmIn = HFMUtils.dictIn({
    'model':'Riemann2',
    'seeds':cp.array([[0.,0.]]),
    'exportValues':1,
    'bound_active_blocks':True,
    'traits':{
        'niter_i':16,'shape_i':(8,8), # Best
    },
})
hfmIn.SetRect([[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)

Casting output of function array from float64 to float32


In [173]:
def height(x): return np.sin(x[0])*np.sin(x[1])
def surface_metric(x,z,mu=10.):
    ndim,shape = x.ndim-1,x.shape[1:]
    x_ad = ad.Dense.identity(constant=x,shape_free=(ndim,))
    tensors = lp.outer_self( z(x_ad).gradient() ) + mu**-2 * fd.as_field(cp.eye(ndim),shape)
    return Metrics.Riemann(tensors)

In [174]:
hfmIn['metric'] = surface_metric(hfmIn.Grid(),height)

Casting output of function eye from float64 to float32


In [116]:
gpuOut,cpuOut = RunCompare(hfmIn,check=False)

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
GPU kernel eikonal ran for 0.18547463417053223 seconds,  and 217 iterations.
Post-Processing
--- gpu done, turning to cpu ---, large_instances=False
Field verbosity defaults to 1
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 1.516 s.
Max |gpuValues-cpuValues| :  0.0010567203521119062
Solver time (s). GPU : 0.18547463417053223, CPU : 2.661. Device acceleration : 14.346975325765454


In [175]:
n=200; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)
hfmInS.update({
    'metric' : surface_metric(hfmInS.Grid(),height), 
    'verbosity':0,
})

Casting output of function eye from float64 to float32


In [176]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  2.4202271889928184e-05
Solver time (s). GPU : 0.08350133895874023, CPU : 0.094. Device acceleration : 1.1257304514176398

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  2.1523372906173677e-07
Solver time (s). GPU : 0.05951857566833496, CPU : 0.093. Device acceleration : 1.5625373919940393

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  2.423421811181825e-05
Solver time (s). GPU : 0.04352450370788574, CPU : 0.091. Device acceleration : 2.090776281120758

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  2.2754650697009993e-07
Solver time (s). GPU : 0.060980796813964844, CPU : 0.09. Device acceleration : 1.4758744506826391

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'

Due to the different switching criteria of the second order scheme, we do not have bit consistency in that case. The results are nevertheless quite close. Note also that we do not deactivate the `decreasing` trait here, contrary to the isotropic case, because the scheme often does not converge without it.

**Bottom line.** Second order accuracy for anisotropic metrics on the GPU is very experimental, and not much reliable, at this stage. Further investigation is needed on the matter.

In [177]:
hfmInS.update({
    'seeds':[[0.,1.]],
    'order':2,
})

In [178]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip},check=False)


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  0.08327210144215402
Solver time (s). GPU : 0.05751752853393555, CPU : 0.11699999999999999. Device acceleration : 2.034162506321348

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.08325955429342868
Solver time (s). GPU : 0.0664987564086914, CPU : 0.119. Device acceleration : 1.7895071491058239

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  0.08324077883032444
Solver time (s). GPU : 0.055999040603637695, CPU : 0.113. Device acceleration : 2.0178917135351697

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.08325955429342868
Solver time (s). GPU : 0.06600832939147949, CPU : 0.12. Device acceleration : 1.8179523873162873


If one removes enforced monotonicity, obtaining the scheme convergence is harder, and requires setting some other parameters carefully and conservatively.
<!---
hfmInS.update({
    'order2_threshold':0.03,
    'verbosity':1,
    'traits':{'decreasing_macro':0,'order2_threshold_weighted_macro':1},
    'metric' : surface_metric(hfmInS.Grid(),height),
    'multiprecision':False,
    'tol':1e-6
})
--->

In [197]:
hfmInS.update({
    'tol':1e-6, # Tolerance for the convergence of the fixed point solver
    'order2_threshold':0.03, # Use first order scheme if second order difference is too large
    'traits':{'decreasing_macro':0}, # Do not enforce monotonicity
})

In [200]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip},check=False)


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  0.13612470285256717
Solver time (s). GPU : 0.056998252868652344, CPU : 0.11499999999999999. Device acceleration : 2.017605702143323

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.13615051166375491
Solver time (s). GPU : 0.06450557708740234, CPU : 0.12. Device acceleration : 1.8603042623338606

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  0.13613274947961185
Solver time (s). GPU : 0.05651998519897461, CPU : 0.11499999999999999. Device acceleration : 2.034678522918055

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.13615051166375491
Solver time (s). GPU : 0.06451916694641113, CPU : 0.11599999999999999. Device acceleration

In [None]:
# TODO : discontinuous metric

## 2. Three dimensions

### 2.1 Smooth anisotropic metric

We generalize the two dimensional test case, although it does not much make geometrical sense anymore: we are computing geodesics in a three dimensional volume viewed as an hypersurface embedded in four dimensional Euclidean space.

In [77]:
n=200 if large_instances else 100
hfmIn = HFMUtils.dictIn({
    'model':'Riemann3',
    'seeds':cp.array([[0.,0.,0.]]),
    'exportValues':1,
#    'tol':5e-3,
#    'multiprecision':True,
#    'bound_active_blocks':True,
#    'nitermax_o':200,
#    'raiseOnNonConvergence':0,
})
hfmIn.SetRect([[-np.pi,np.pi],[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)

Casting output of function array from float64 to float32


In [78]:
def height3(x): return np.sin(x[0])*np.sin(x[1])*np.sin(x[2])

In [79]:
hfmIn['metric'] = surface_metric(hfmIn.Grid(),height3)

Casting output of function eye from float64 to float32


In [81]:
gpuOut,cpuOut = RunCompare(hfmIn,check=False)

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
GPU kernel eikonal ran for 0.32101917266845703 seconds,  and 82 iterations.
Post-Processing
---
Field verbosity defaults to 1
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 50.128 s.
Max |gpuValues-cpuValues| :  0.00029890265088505785
Solver time (s). GPU : 0.32101917266845703, CPU : 75.92. Device acceleration : 236.49677796163547


In [82]:
n=20; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[-np.pi,np.pi],[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)
hfmInS.update({
    'metric' : surface_metric(hfmInS.Grid(),height), 
    'verbosity':0,
})

Casting output of function eye from float64 to float32


In [83]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  2.8622223124941115e-07
Solver time (s). GPU : 0.009001493453979492, CPU : 0.037000000000000005. Device acceleration : 4.110429029267647

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  1.3236979510278246e-06
Solver time (s). GPU : 0.010003328323364258, CPU : 0.038. Device acceleration : 3.7987356579355054

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  6.626114279484341e-07
Solver time (s). GPU : 0.009004592895507812, CPU : 0.036000000000000004. Device acceleration : 3.997959754289346

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  9.817316104498985e-07
Solver time (s). GPU : 0.007997751235961914, CPU : 0.037. Device acceleration : 4.62630043225518

Reproducibility with options : {'factorizationRadius': 10, 'factori

Due to the different switching criteria of the second order scheme, we do not have bit consistency in that case. The results are nevertheless quite close.

In [84]:
hfmInS.update({
    'seeds':[[0.,1.,1.]],
    'order':2,
})

In [85]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip},check=False)


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  0.09287448188858416
Solver time (s). GPU : 0.01151275634765625, CPU : 0.052000000000000005. Device acceleration : 4.516728959575879

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.09287448188858416
Solver time (s). GPU : 0.012499094009399414, CPU : 0.048. Device acceleration : 3.8402783404864094

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  0.09287448188858416
Solver time (s). GPU : 0.00899815559387207, CPU : 0.048. Device acceleration : 5.334426538777457

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.09287448188858416
Solver time (s). GPU : 0.012500762939453125, CPU : 0.049. Device acceleration : 3.919760756789747
