# Adaptive PDE discretizations on cartesian grids 
## Volume : GPU accelerated methods
## Part : Reproducibility
## Chapter : Riemannian metrics

In this notebook, we solve Riemannian eikonal equations on the CPU and the GPU, and check that they produce consistent results.

*Note* : we use fairly mild anisotropy for now...

## 0. Importing the required libraries

In [1]:
import sys; sys.path.insert(0,"../..")
#from Miscellaneous import TocTools; print(TocTools.displayTOC('Isotropic_Repro','GPU'))

In [2]:
import cupy as cp
import numpy as np
import itertools
from matplotlib import pyplot as plt
np.set_printoptions(edgeitems=30, linewidth=100000, formatter=dict(float=lambda x: "%5.3g" % x))

In [3]:
from agd import HFMUtils
from agd import AutomaticDifferentiation as ad
from agd import Metrics
from agd import FiniteDifferences as fd
from agd import LinearParallel as lp
import agd.AutomaticDifferentiation.cupy_generic as cugen

norm_infinity = ad.Optimization.norm_infinity
from agd.HFMUtils import RunGPU,RunSmart

In [4]:
def ReloadPackages():
    from Miscellaneous.rreload import rreload
    global HFMUtils,ad,cugen,RunGPU,RunSmart,Metrics
    HFMUtils,ad,cugen,RunGPU,Metrics = rreload([HFMUtils,ad,cugen,RunGPU,Metrics],"../..")    
    RunSmart = cugen.cupy_get_args(HFMUtils.RunSmart,dtype64=True,iterables=(dict,Metrics.Base))

In [5]:
cp = ad.functional.decorate_module_functions(cp,cugen.set_output_dtype32) # Use float32 and int32 types in place of float64 and int64
plt = ad.functional.decorate_module_functions(plt,cugen.cupy_get_args)
RunSmart = cugen.cupy_get_args(RunSmart,dtype64=True,iterables=(dict,Metrics.Base))

### 0.1 Utilities

In [6]:
#from Notebooks_GPU.ExportedCode.Isotropic_Repro import RunCompare
def RunCompare(gpuIn,check=True):
    gpuOut = RunGPU(gpuIn)
    if gpuIn.get('verbosity',1): print("---")
    cpuIn = gpuIn.copy(); cpuIn.pop('traits',None)
    cpuOut = RunSmart(cpuIn)
    print("Max |gpuValues-cpuValues| : ", norm_infinity(gpuOut['values'].get()-cpuOut['values']))
    cpuTime = cpuOut['FMCPUTime']; gpuTime = gpuOut['solverGPUTime'];
    print(f"Solver time (s). GPU : {gpuTime}, CPU : {cpuTime}. Device acceleration : {cpuTime/gpuTime}")
    assert not check or cp.allclose(gpuOut['values'],cpuOut['values'],atol=1e-5,rtol=1e-4)
    return gpuOut,cpuOut

In [7]:
factor_variants = [
    {}, # Default
    {"seedRadius":2}, # Spread seed information
    {"factorizationRadius":10,'factorizationPointChoice':'Key'} # Source factorization
]
multip_variants = [
    {'multiprecision':False}, # Default
    {'multiprecision':True}, # Reduces roundoff errors
]
order_variants = [
    {'order':1}, # Default
    {'order':2}, # More accurate on smooth instances
]

## 1. Two dimensions

### 1.1 Isotropic metric

In [8]:
n=4000
hfmIn = HFMUtils.dictIn({
    'model':'Riemann2',
    'metric':Metrics.Riemann(cp.eye(2)),
    'seeds':cp.array([[0.5,0.5]]),
    'exportValues':1,
    'bound_active_blocks':True,
    'traits':{
        'niter_i':24,'shape_i':(12,12), # Best
#        'pruning_macro':1,
    }
})
hfmIn.SetRect([[0,1],[0,1]],dimx=n+1,sampleBoundary=True)

Casting output of function eye from float64 to float32
Casting output of function array from float64 to float32


In [12]:
ReloadPackages()

In [14]:
RunGPU(hfmIn);

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the values array (setting seeds,...)
Preparing the GPU kernel
Setup and run the eikonal solver
GPU solve took 0.39899373054504395 seconds, in 336 iterations.
Post-Processing


In [9]:
_,cpuOut = RunCompare(hfmIn,check=False)

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the values array (setting seeds,...)
Preparing the GPU kernel
Setup and run the eikonal solver
GPU solve took 0.38199853897094727 seconds, in 336 iterations.
Post-Processing
---
Field verbosity defaults to 1
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 17.677 s.
Max |gpuValues-cpuValues| :  1.532186014752135e-05
Solver time (s). GPU : 0.38199853897094727, CPU : 17.677. Device acceleration : 46.27504609734755


In [9]:
n=200; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[0,1],[0,1]],dimx=n+1,sampleBoundary=True)
X = hfmInS.Grid()
cost = np.prod(np.sin(2*np.pi*X))+1.1
hfmInS.update({
    'metric': Metrics.Riemann(cost**2*fd.as_field(cp.eye(2),X.shape[1:])), # Isotropic but non-constant metric
    'verbosity':0,
})

Casting output of function eye from float64 to float32


In [12]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False}
2.009999994925238e-05
Max |gpuValues-cpuValues| :  4.017569794623199e-07
Solver time (s). GPU : 0.0780031681060791, CPU : 0.027. Device acceleration : 0.34613978708251697

Reproducibility with options : {}, {'multiprecision': True}
4.999999987376214e-08
Max |gpuValues-cpuValues| :  6.320754353250635e-08
Solver time (s). GPU : 0.010974407196044922, CPU : 0.025. Device acceleration : 2.2780273734520966

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': False}
2.009999994925238e-05
Max |gpuValues-cpuValues| :  4.202669309227858e-07
Solver time (s). GPU : 0.010500669479370117, CPU : 0.026. Device acceleration : 2.4760326045001473

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': True}
4.999999987376214e-08
Max |gpuValues-cpuValues| :  1.0889257084922832e-07
Solver time (s). GPU : 0.010999202728271484, CPU : 0.026. Device acceleration : 2.3638076906403085

Reproducibility with options : {'fa

In [12]:
hfmInS.update({
    'seeds':[[0.,1.]],
    'order':2,
})

In [13]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  4.88107854068609e-06
Solver time (s). GPU : 0.023000717163085938, CPU : 0.039. Device acceleration : 1.695599303424828

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  1.291459792440719e-07
Solver time (s). GPU : 0.023998737335205078, CPU : 0.041. Device acceleration : 1.7084232152436964

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  4.88107854068609e-06
Solver time (s). GPU : 0.02249908447265625, CPU : 0.04. Device acceleration : 1.7778501186842999

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  1.291459792440719e-07
Solver time (s). GPU : 0.02397894859313965, CPU : 0.039. Device acceleration : 1.6264266070096942


### 1.2 Smooth anisotropic metric

In [26]:
ReloadPackages()

In [27]:
n=4000
hfmIn = HFMUtils.dictIn({
    'model':'Riemann2',
    'seeds':cp.array([[0.,0.]]),
    'exportValues':1,
    'traits':{
        'niter_i':16,'shape_i':(8,8), # Best
    },
})
hfmIn.SetRect([[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)

Casting output of function array from float64 to float32


In [28]:
def height(x): return np.sin(x[0])*np.sin(x[1])
def surface_metric(x,z):
    ndim,shape = x.ndim-1,x.shape[1:]
    x_ad = ad.Dense.identity(constant=x,shape_free=(ndim,))
    tensors = lp.outer_self( z(x_ad).gradient() ) + fd.as_field(cp.eye(ndim),shape)*0.1**2
    return Metrics.Riemann(tensors)

In [29]:
hfmIn['metric'] = surface_metric(hfmIn.Grid(),height)

Casting output of function eye from float64 to float32


In [47]:
hfmIn.pop('tol',None)
hfmIn.update({
    'multiprecision':False,
#    'nitermax_o':10000,
#    'tol':1.2*1e-4,
    'bound_active_blocks':False, #(4000/8)*6,
})
#hfmIn['traits'].update ({'decreasing_macro':1})
gpuOut=RunGPU(hfmIn);
print(np.max(np.abs(gpuOut['values'].get()-cpuOut['values'])))
print(gpuOut['keys']['defaulted'].get('tol',None))

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the values array (setting seeds,...)
Preparing the GPU kernel
Setup and run the eikonal solver
GPU solve took 2.0164945125579834 seconds, in 521 iterations.
Post-Processing
0.009657959028676277
4.364296539960117e-05


In [44]:
cpuOut

NameError: name 'cpuOut' is not defined

In [78]:
gpuOut['keys']['defaulted']['tol']

0.00012569512175277794

In [20]:
np.ndim(cp.array(1))

TypeError: no implementation found for 'numpy.ndim' on types that implement __array_function__: [<class 'cupy.core.core.ndarray'>]

In [76]:
gpuOut['keys']['defaulted']

OrderedDict([('verbosity', 1),
             ('help', []),
             ('values_float64', False),
             ('factoringRadius', 0),
             ('order', 1),
             ('periodic', (False, False)),
             ('drift', None),
             ('dualMetric', None),
             ('overwriteMetric', False),
             ('cost_magnitude_bound', 10),
             ('tol', 0.00012569512175277794),
             ('seedValues', array([    0], dtype=float32)),
             ('seedRadius', 0.0),
             ('dummy_kernel', False),
             ('cuoptions', ()),
             ('solver', 'AGSI'),
             ('nitermax_o', 2000),
             ('raiseOnNonConvergence', True)])

In [45]:
gpuOut,cpuOut = RunCompare(hfmIn,check=False)

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the values array (setting seeds,...)
Preparing the GPU kernel
Setup and run the eikonal solver
GPU solve took 3.8419852256774902 seconds, in 1072 iterations.
Post-Processing
---
Field verbosity defaults to 1
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 29.647 s.
Unused fields from user: bound_active_blocks multiprecision nitermax_o 
********************
Max |gpuValues-cpuValues| :  2.971724162303957e-07
Solver time (s). GPU : 3.8419852256774902, CPU : 29.647. Device acceleration : 7.716583552132762


In [24]:
ReloadPackages()

In [25]:
n=200; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)
hfmInS.update({
    'metric' : surface_metric(hfmInS.Grid(),height), 
    'verbosity':0,
})

Casting output of function eye from float64 to float32


In [26]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  1.619302737232431e-05
Solver time (s). GPU : 0.10149931907653809, CPU : 0.038. Device acceleration : 0.3743867480662127

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  3.045376777421893e-07
Solver time (s). GPU : 0.020489215850830078, CPU : 0.038. Device acceleration : 1.8546341781284181

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  1.7339657104820105e-05
Solver time (s). GPU : 0.01699352264404297, CPU : 0.037. Device acceleration : 2.1773001851955778

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  3.206113821097034e-07
Solver time (s). GPU : 0.019967317581176758, CPU : 0.038. Device acceleration : 1.9031099117601404

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False}
Max

Due to the different switching criteria of the second order scheme, we do not have bit consistency in that case. The results are nevertheless quite close.

In [20]:
hfmInS.update({
    'seeds':[[0.,1.]],
    'order':2,
})

In [21]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip},check=False)


Reproducibility with options : {}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  0.0021567369962434135
Solver time (s). GPU : 0.023497343063354492, CPU : 0.058. Device acceleration : 2.4683641824362033

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.002156498577664312
Solver time (s). GPU : 0.03049945831298828, CPU : 0.058. Device acceleration : 1.9016731184140585

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  0.0021567369962434135
Solver time (s). GPU : 0.023000001907348633, CPU : 0.059. Device acceleration : 2.5652171785755007

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.002156498577664312
Solver time (s). GPU : 0.030999183654785156, CPU : 0.061. Device acceleration : 1.9677937548069528


In [22]:
# TODO : discontinuous metric

## 2. Three dimensions

## 2.1 Smooth anisotropic metric

We generalize the two dimensional test case, although it does not much make geometrical sense anymore: we are computing geodesics in a three dimensional volume viewed as an hypersurface embedded in four dimensional Euclidean space.

In [8]:
n=200
hfmIn = HFMUtils.dictIn({
    'model':'Riemann3',
    'seeds':cp.array([[0.,0.,0.]]),
    'exportValues':1,
    'multiprecision':0,
    'nitermax_o':200,
    'raiseOnNonConvergence':0,
})
hfmIn.SetRect([[-np.pi,np.pi],[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)

Casting output of function array from float64 to float32


In [24]:
def height3(x): return np.sin(x[0])*np.sin(x[1])*np.sin(x[2])

In [25]:
hfmIn['metric'] = surface_metric(hfmIn.Grid(),height3)

Casting output of function eye from float64 to float32


In [26]:
gpuOut,cpuOut = RunCompare(hfmIn,check=False)

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the values array (setting seeds,...)
Preparing the GPU kernel
Setup and run the eikonal solver
GPU solve took 0.2800016403198242 seconds, in 79 iterations.
Post-Processing
---
Field verbosity defaults to 1
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 47.496 s.
Unused fields from user: multiprecision nitermax_o raiseOnNonConvergence 
********************
Max |gpuValues-cpuValues| :  0.00010251036877306774
Solver time (s). GPU : 0.2800016403198242, CPU : 47.496. Device acceleration : 169.62757770186272


In [27]:
n=20; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[-np.pi,np.pi],[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)
hfmInS.update({
    'metric' : surface_metric(hfmInS.Grid(),height), 
    'verbosity':0,
})

Casting output of function eye from float64 to float32


In [28]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  7.807706189755237e-05
Solver time (s). GPU : 0.007501125335693359, CPU : 0.017. Device acceleration : 2.266326616235459

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  5.211637139623804e-07
Solver time (s). GPU : 0.007997512817382812, CPU : 0.018. Device acceleration : 2.2506997376580014

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  0.0001812249710688718
Solver time (s). GPU : 0.005997419357299805, CPU : 0.016. Device acceleration : 2.6678141125024846

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  7.500026648621372e-07
Solver time (s). GPU : 0.007500171661376953, CPU : 0.017. Device acceleration : 2.2666147879712635

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False}
Max

Due to the different switching criteria of the second order scheme, we do not have bit consistency in that case. The results are nevertheless quite close.

In [29]:
hfmInS.update({
    'seeds':[[0.,1.,1.]],
    'order':2,
})

In [30]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip},check=False)


Reproducibility with options : {}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  0.056333140343209465
Solver time (s). GPU : 0.010499954223632812, CPU : 0.028. Device acceleration : 2.6666782924613988

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.05633361718036767
Solver time (s). GPU : 0.013499975204467773, CPU : 0.029. Device acceleration : 2.148152093672183

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  0.056333140343209465
Solver time (s). GPU : 0.01049947738647461, CPU : 0.028. Device acceleration : 2.6667994005177347

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  0.05633361718036767
Solver time (s). GPU : 0.01349949836730957, CPU : 0.028. Device acceleration : 2.0741511453347696
