# Adaptive PDE discretizations on cartesian grids 
## Volume : GPU accelerated methods
## Part : Reproducibility
## Chapter : Riemannian metrics

In this notebook, we solve Riemannian eikonal equations on the CPU and the GPU, and check that they produce consistent results.

[**Summary**](Summary.ipynb) of volume GPU accelerated methods, this series of notebooks.

[**Main summary**](../Summary.ipynb) of the Adaptive Grid Discretizations 
	book of notebooks, including the other volumes.

# Table of contents
  * [1. Two dimensions](#1.-Two-dimensions)
    * [1.1 Isotropic metric](#1.1-Isotropic-metric)
    * [1.2 Smooth anisotropic metric](#1.2-Smooth-anisotropic-metric)
  * [2. Three dimensions](#2.-Three-dimensions)
    * [2.1 Smooth anisotropic metric](#2.1-Smooth-anisotropic-metric)



**Acknowledgement.** The experiments presented in these notebooks are part of ongoing research.
The author would like to acknowledge fruitful informal discussions with L. Gayraud on the 
topic of GPU coding and optimization.

Copyright Jean-Marie Mirebeau, University Paris-Sud, CNRS, University Paris-Saclay

## 0. Importing the required libraries

In [1]:
import sys; sys.path.insert(0,"../..")
#from Miscellaneous import TocTools; print(TocTools.displayTOC('Riemann_Repro','GPU'))

In [2]:
import cupy as cp
import numpy as np
import itertools
from matplotlib import pyplot as plt
np.set_printoptions(edgeitems=30, linewidth=100000, formatter=dict(float=lambda x: "%5.3g" % x))

In [3]:
from agd import HFMUtils
from agd import AutomaticDifferentiation as ad
from agd import Metrics
from agd import FiniteDifferences as fd
from agd import LinearParallel as lp
import agd.AutomaticDifferentiation.cupy_generic as cugen

norm_infinity = ad.Optimization.norm_infinity
from agd.HFMUtils import RunGPU,RunSmart

In [4]:
def ReloadPackages():
    from Miscellaneous.rreload import rreload
    global HFMUtils,ad,cugen,RunGPU,RunSmart,Metrics
    HFMUtils,ad,cugen,RunGPU,Metrics = rreload([HFMUtils,ad,cugen,RunGPU,Metrics],"../..")    
    RunSmart = cugen.cupy_get_args(HFMUtils.RunSmart,dtype64=True,iterables=(dict,Metrics.Base))

In [5]:
cp = ad.functional.decorate_module_functions(cp,cugen.set_output_dtype32) # Use float32 and int32 types in place of float64 and int64
plt = ad.functional.decorate_module_functions(plt,cugen.cupy_get_args)
RunSmart = cugen.cupy_get_args(RunSmart,dtype64=True,iterables=(dict,Metrics.Base))

### 0.1 Utilities

In [6]:
#from Notebooks_GPU.ExportedCode.Isotropic_Repro import RunCompare
def RunCompare(gpuIn,check=True):
    gpuOut = RunGPU(gpuIn)
    if gpuIn.get('verbosity',1): print("---")
    cpuIn = gpuIn.copy(); cpuIn.pop('traits',None)
    cpuOut = RunSmart(cpuIn)
    print("Max |gpuValues-cpuValues| : ", norm_infinity(gpuOut['values'].get()-cpuOut['values']))
    cpuTime = cpuOut['FMCPUTime'] + cpuOut['StencilCPUTime']; gpuTime = gpuOut['solverGPUTime'];
    print(f"Solver time (s). GPU : {gpuTime}, CPU : {cpuTime}. Device acceleration : {cpuTime/gpuTime}")
    assert not check or cp.allclose(gpuOut['values'],cpuOut['values'],atol=1e-5,rtol=1e-4)
    return gpuOut,cpuOut

In [7]:
factor_variants = [
    {}, # Default
    {"seedRadius":2}, # Spread seed information
    {"factorizationRadius":10,'factorizationPointChoice':'Key'} # Source factorization
]
multip_variants = [
    {'multiprecision':False,'tol':1e-5}, # Default, with smaller error tolerance for reproducibility check
    {'multiprecision':True}, # Reduces roundoff errors
]
order_variants = [
    {'order':1}, # Default
    {'order':2}, # More accurate on smooth instances
]

## 1. Two dimensions

### 1.1 Isotropic metric

In [8]:
n=4000
hfmIn = HFMUtils.dictIn({
    'model':'Riemann2',
    'metric':Metrics.Riemann(cp.eye(2)),
    'seeds':cp.array([[0.5,0.5]]),
    'exportValues':1,
    'bound_active_blocks':True,
    'traits':{
        'niter_i':24,'shape_i':(12,12), # Best
#        'pruning_macro':1,
    }
})
hfmIn.SetRect([[0,1],[0,1]],dimx=n+1,sampleBoundary=True)

Casting output of function eye from float64 to float32
Casting output of function array from float64 to float32


In [9]:
_,cpuOut = RunCompare(hfmIn,check=False)

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
GPU kernel eikonal ran for 0.3769969940185547 seconds,  and 336 iterations.
Post-Processing
---
Field verbosity defaults to 1
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 18.507 s.
Unused fields from user: bound_active_blocks 
********************
Max |gpuValues-cpuValues| :  1.5338908722906108e-05
Solver time (s). GPU : 0.3769969940185547, CPU : 18.507. Device acceleration : 49.09057709645595


In [10]:
n=200; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[0,1],[0,1]],dimx=n+1,sampleBoundary=True)
X = hfmInS.Grid()
cost = np.prod(np.sin(2*np.pi*X))+1.1
hfmInS.update({
    'metric': Metrics.Riemann(cost**2*fd.as_field(cp.eye(2),X.shape[1:])), # Isotropic but non-constant metric
    'verbosity':0,
})

Casting output of function eye from float64 to float32


In [11]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  4.017569794623199e-07
Solver time (s). GPU : 0.011497974395751953, CPU : 0.026. Device acceleration : 2.2612678638079045

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  4.940263631514341e-08
Solver time (s). GPU : 0.013996601104736328, CPU : 0.026. Device acceleration : 1.8575938404933054

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  4.202669309227858e-07
Solver time (s). GPU : 0.013505220413208008, CPU : 0.028. Device acceleration : 2.0732723453085002

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  9.997386174465106e-08
Solver time (s). GPU : 0.013484477996826172, CPU : 0.026. Device acceleration : 1.9281428621945613

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False}
Ma

In [18]:
hfmInS.update({
    'seeds':[[0.,1.]],
    'order':2,
    'traits':{'decreasing_macro':0,},
})

In [19]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  4.88107854068609e-06
Solver time (s). GPU : 0.03699827194213867, CPU : 0.067. Device acceleration : 1.810895387351626

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  8.07258024870805e-08
Solver time (s). GPU : 0.03701019287109375, CPU : 0.064. Device acceleration : 1.7292533498247784

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False}
Max |gpuValues-cpuValues| :  4.88107854068609e-06
Solver time (s). GPU : 0.03249859809875488, CPU : 0.067. Device acceleration : 2.0616273907078773

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  8.07258024870805e-08
Solver time (s). GPU : 0.03749847412109375, CPU : 0.067. Device acceleration : 1.7867393692777214


### 1.2 Smooth anisotropic metric

In [8]:
n=4000
hfmIn = HFMUtils.dictIn({
    'model':'Riemann2',
    'seeds':cp.array([[0.,0.]]),
    'exportValues':1,
#    'bound_active_blocks':True,
    'traits':{
        'niter_i':16,'shape_i':(8,8), # Best
    },
})
hfmIn.SetRect([[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)

Casting output of function array from float64 to float32


In [9]:
def height(x): return np.sin(x[0])*np.sin(x[1])
def surface_metric(x,z):
    ndim,shape = x.ndim-1,x.shape[1:]
    x_ad = ad.Dense.identity(constant=x,shape_free=(ndim,))
    tensors = lp.outer_self( z(x_ad).gradient() ) + fd.as_field(cp.eye(ndim),shape)*0.2**2
    return Metrics.Riemann(tensors)

In [89]:
hfmIn['metric'] = surface_metric(hfmIn.Grid(),height)

Casting output of function eye from float64 to float32


In [24]:
gpuOut,cpuOut = RunCompare(hfmIn,check=False)

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
GPU kernel eikonal ran for 1.6865017414093018 seconds,  and 701 iterations.
Post-Processing
---
Field verbosity defaults to 1
Field order defaults to 1
Field seedRadius defaults to 0
Fast marching solver completed in 40.391 s.
Unused fields from user: bound_active_blocks 
********************
Max |gpuValues-cpuValues| :  0.00895856545502749
Solver time (s). GPU : 1.6865017414093018, CPU : 58.649. Device acceleration : 34.77553480080654


In [51]:
ReloadPackages()

In [52]:
n=200; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)
hfmInS.update({
    'metric' : surface_metric(hfmInS.Grid(),height), 
    'verbosity':0,
})

Casting output of function eye from float64 to float32


In [11]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  1.4248064864608168e-05
Solver time (s). GPU : 0.06002187728881836, CPU : 0.08399999999999999. Device acceleration : 1.3994897159880832

Reproducibility with options : {}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  2.0135803979748346e-07
Solver time (s). GPU : 0.026521682739257812, CPU : 0.08199999999999999. Device acceleration : 3.0918098525710174

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': False, 'tol': 1e-05}
Max |gpuValues-cpuValues| :  1.401993381255906e-05
Solver time (s). GPU : 0.015485763549804688, CPU : 0.08399999999999999. Device acceleration : 5.424336987313708

Reproducibility with options : {'seedRadius': 2}, {'multiprecision': True}
Max |gpuValues-cpuValues| :  2.1635085523108444e-07
Solver time (s). GPU : 0.025997638702392578, CPU : 0.08. Device acceleration : 3.077202545808037

Reproducibility with options : {'factorizationRadius':

Due to the different switching criteria of the second order scheme, we do not have bit consistency in that case. The results are nevertheless quite close.

In [58]:
hfmInS.update({
    'seeds':[[0.,1.]],
    'order':2,
    'solver':'global_iteration',
    'verbosity':1,
    'nitermax_o':200,
    'traits':{
        'strict_iter_i_macro':0,
        'strict_iter_o_macro':0,
    },
})

In [59]:
gpuOut = RunGPU({**hfmInS,'multiprecision':False,'raiseOnNonConvergence':False})

Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
inf
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
GPU kernel eikonal ran for 0.05752372741699219 seconds,  and 37 iterations.
Post-Processing


In [27]:
gpuOut['keys']['default']['tol']

4.674579e-05

In [122]:
np.all(np.isfinite(gpuOut['values']))

array(True)

In [134]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),(multip_variants[1],)):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip},check=False)


Reproducibility with options : {}, {'multiprecision': True}
Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
using global iteration
GPU kernel eikonal ran for 0.02498173713684082 seconds,  and 52 iterations.
Post-Processing
---
Field seedRadius defaults to 0
Fast marching solver completed in 0.062 s.
Unused fields from user: multiprecision solver 
********************
Max |gpuValues-cpuValues| :  0.01143910253341518
Solver time (s). GPU : 0.02498173713684082, CPU : 0.1. Device acceleration : 4.002924194271863

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
Setting the kernel traits.
Prepating the domain data (shape,metric,...)
Preparing the problem rhs (cost, seeds,...)
Preparing the GPU kernel
Running the eikonal GPU kernel
using global iteration
GPU kernel eikonal ran for 0.0249993801116943

In [105]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip},check=False)


Reproducibility with options : {}, {'multiprecision': False, 'tol': 1e-05}
using global iteration
Max |gpuValues-cpuValues| :  0.0033670721841656537
Solver time (s). GPU : 0.020982980728149414, CPU : 0.101. Device acceleration : 4.813424808826371

Reproducibility with options : {}, {'multiprecision': True}
using global iteration
Max |gpuValues-cpuValues| :  0.008086381302999657
Solver time (s). GPU : 0.02551555633544922, CPU : 0.097. Device acceleration : 3.8016023920762474

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': False, 'tol': 1e-05}
using global iteration
Max |gpuValues-cpuValues| :  0.0033670721841656537
Solver time (s). GPU : 0.019998550415039062, CPU : 0.097. Device acceleration : 4.850351549833095

Reproducibility with options : {'factorizationRadius': 10, 'factorizationPointChoice': 'Key'}, {'multiprecision': True}
using global iteration
Max |gpuValues-cpuValues| :  0.008086381302999657
Solver time (s). G

In [None]:
# TODO : discontinuous metric

## 2. Three dimensions

### 2.1 Smooth anisotropic metric

We generalize the two dimensional test case, although it does not much make geometrical sense anymore: we are computing geodesics in a three dimensional volume viewed as an hypersurface embedded in four dimensional Euclidean space.

In [None]:
n=200
hfmIn = HFMUtils.dictIn({
    'model':'Riemann3',
    'seeds':cp.array([[0.,0.,0.]]),
    'exportValues':1,
#    'tol':5e-3,
#    'multiprecision':True,
#    'bound_active_blocks':True,
#    'nitermax_o':200,
#    'raiseOnNonConvergence':0,
})
hfmIn.SetRect([[-np.pi,np.pi],[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)

In [None]:
def height3(x): return np.sin(x[0])*np.sin(x[1])*np.sin(x[2])

In [None]:
hfmIn['metric'] = surface_metric(hfmIn.Grid(),height3)

In [None]:
"""
gpuOut = RunGPU(hfmIn)
print(np.max(np.abs(cpuOut['values']-gpuOut['values'].get())))
"""

In [None]:
gpuOut,cpuOut = RunCompare(hfmIn,check=False)

In [None]:
n=20; hfmInS = hfmIn.copy() # Define a small instance for bit-consistency validation
hfmInS.SetRect([[-np.pi,np.pi],[-np.pi,np.pi],[-np.pi,np.pi]],dimx=n+1,sampleBoundary=True)
hfmInS.update({
    'metric' : surface_metric(hfmInS.Grid(),height), 
    'verbosity':0,
})

In [None]:
for fact,multip in itertools.product(factor_variants,multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip})

Due to the different switching criteria of the second order scheme, we do not have bit consistency in that case. The results are nevertheless quite close.

In [None]:
hfmInS.update({
    'seeds':[[0.,1.,1.]],
    'order':2,
})

In [None]:
for fact,multip in itertools.product((factor_variants[0],factor_variants[2]),multip_variants):
    print(f"\nReproducibility with options : {fact}, {multip}")
    RunCompare({**hfmInS,**fact,**multip},check=False)