# Adaptive PDE discretizations on cartesian grids 
## Volume : Algorithmic tools
## Part : Automatic differentiation
## Chapter : Known bugs and incompatibilities

Automatic differentiation techniques play an essential role in the notebooks presented in this repository. 
Our library is based on subclassing the `numpy.ndarray` class, and is written entirely in Python. This allows for a simple and powerful implementation, which benefits from the high performance of the numpy module. It does however suffer from a few pitfalls, briefly described below, and illustrated in more detail in the body of the document.

**! Caution with the functions `np.sort`, `np.where`, `np.stack`, `np.broadcast_to` !**
* Problem : the arguments are silently cast to `np.ndarray`, losing autodiff information.
* Solution : use similarly named replacements from the AutomaticDifferentiation (ad) library, which also apply to `np.ndarray`.

**! Caution with numpy scalars and array scalars !**
* Problem : in an expression `a+b` where the l.h.s is a *numpy scalar*, and the r.h.s an *array scalar of autodiff type*, the r.h.s is silently cast, losing autodiff information.
* Recommended solution : use `a,b = ad.common_cast(a,b)` before `a+b`
* Alternative solution : use `b+a`, or `ad.left_operand(a)+b`. 

[**Summary**](Summary.ipynb) of volume Algorithmic tools, this series of notebooks.

[**Main summary**](../Summary.ipynb) of the Adaptive Grid Discretizations 
	book of notebooks, including the other volumes.

# Table of contents
  * [1. The problem with numpy scalars on the left of array scalars](#1.-The-problem-with-numpy-scalars-on-the-left-of-array-scalars)
    * [1.1 Description of the issue](#1.1-Description-of-the-issue)
    * [1.2 Alternative solutions](#1.2-Alternative-solutions)
    * [1.3 Unexpected occurences](#1.3-Unexpected-occurences)
    * [1.4 Matrix multiplication and inversion](#1.4-Matrix-multiplication-and-inversion)
  * [2. In place modifications and aliasing](#2.-In-place-modifications-and-aliasing)
    * [2.1 Aliasing of the AD information](#2.1-Aliasing-of-the-AD-information)
    * [2.2 Non writeable AD information](#2.2-Non-writeable-AD-information)



**Acknowledgement.** The experiments presented in these notebooks are part of ongoing research, 
some of it with PhD student Guillaume Bonnet, in co-direction with Frederic Bonnans.

Copyright Jean-Marie Mirebeau, University Paris-Sud, CNRS, University Paris-Saclay

## 0. Importing the required libraries

In [2]:
import sys; sys.path.insert(0,"..") # Allow importing agd from parent directory
#from Miscellaneous import TocTools; TocTools.displayTOC('ADBugs','Algo')

In [3]:
import numpy as np
import scipy.sparse.linalg

In [4]:
import agd.AutomaticDifferentiation as ad

In [5]:
def reload_packages():
    from Miscellaneous.rreload import rreload
    global ad
    ad, = rreload([ad],rootdir='..',verbose=True)

## 1. The problem with numpy scalars on the left of array scalars


**TL;DR.** When using array scalars (zero dimensional arrays) with AD information, you should cast all variables to a common AD type using the `ad.common_cast` function. Array scalars typically arise from a reduction operation (e.g. `np.sum()`) or from extracting a single coefficient of an AD variable.

In [16]:
a = np.ones((3))
b = ad.Dense.identity(constant=np.ones(3))

print(f"Error : AD array scalar silently downcasted to np.float64.")
print(a.sum()+b.sum(),a[0]+b[0])

Error : AD array scalar silently downcasted to np.float64.
6.0 2.0


The solution is a one liner : cast all numpy variables to a common type whenever array scalars are or might be involved in the computation.

In [17]:
a,b = ad.common_cast(a,b)
print(f"Fine (with ad.common_cast) : AD array scalar keeps its information.")
print(a.sum()+b.sum(),a[0]+b[0])

Fine (with ad.common_cast) : AD array scalar keeps its information.
denseAD(6.0,[1. 1. 1.]) denseAD(2.0,[1. 0. 0.])


The computational overhead should be minimal, because the non-ad variables are merely augmented with a few empty arrays (from one to four depending on the AD type).

In [18]:
a

denseAD(array([1., 1., 1.]),array([], shape=(3, 0), dtype=float64))
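
As a quick sanity check of the overhead claim, the sketch below uses plain numpy (not the `ad` library) to build an array with the same shape `(3, 0)` as the empty coefficient array displayed above, and confirms that it holds no elements and occupies no data memory.

```python
import numpy as np

# Same shape as the empty coefficient array shown in the output above
coef = np.zeros((3, 0))

print(coef.shape)   # (3, 0)
print(coef.size)    # 0 : no element is stored
print(coef.nbytes)  # 0 : no data buffer is needed
```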

This solution, and the issue without it, are common to all the AD types.

In [19]:
for ad_module in [ad.Dense,ad.Dense2,ad.Sparse,ad.Sparse2]:
    a = np.ones((3))
    b = ad_module.identity(constant=np.ones(3)) # Use the AD type of the current iteration
    
    # AD array scalar issue without ad.common_cast
    assert not ad.is_ad(a.sum()+b.sum()) and not ad.is_ad(a[0]+b[0])
    
    # AD array scalar fixed with ad.common_cast
    a,b = ad.common_cast(a,b)
    assert ad.is_ad(a.sum()+b.sum()) and ad.is_ad(a[0]+b[0])

### 1.1 Description of the issue

**! Caution with numpy scalars and autodiff array scalars !**

The type `numpy.float64` often causes trouble due to bad operator priority. Specifically, the problem arises when it is the left operand of an arithmetic operation (e.g. a multiplication) whose right operand is an array of shape `()` containing automatic differentiation information. We circumvent this issue using the function `ad.to_array`, which casts any value to a numpy array, in this case to an array containing a single element and of shape $()$ (the empty tuple).

**Context** In order to discuss this issue, which occurs in very specific circumstances, we need to introduce a few concepts.
* A numpy scalar is a variable of type `numpy.float64`, or possibly some other integer or floating point type defined in the numpy module. Standard python scalars, such as `float` and `int`, are not affected by the issue below.
* An array scalar is an array whose shape is the empty tuple `()`. Such arrays contain a single element, and for most purposes behave like a scalar variable.
* Operator resolution is the process by which Python selects the appropriate function to compute `a+b` where `a` and `b` are two variables. In practice: 
 * Python first calls `a.__add__(b)`. 
 * The result is returned, except if it is the special value `NotImplemented`.
 * In that case Python calls `b.__radd__(a)` (note the 'r' which stands for 'right' side operator).
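
The operator resolution steps listed above can be observed with plain Python classes. The sketch below is independent of numpy and of the `ad` library, and the class names `Left`, `GreedyLeft` and `Right` are hypothetical. It shows that the reflected method `__radd__` is only consulted when the left operand returns `NotImplemented`; a left operand that handles every type never gives the right operand a chance, which is the mechanism behind the issue discussed here.

```python
class Left:
    """Left operand whose __add__ declines to handle unknown types."""
    def __add__(self, other):
        if isinstance(other, Left):
            return "Left.__add__"
        return NotImplemented  # Tells Python to try the right operand

class GreedyLeft:
    """Left operand which accepts anything: __radd__ is never consulted."""
    def __add__(self, other):
        return "GreedyLeft.__add__"

class Right:
    """Right operand implementing the reflected operator."""
    def __radd__(self, other):
        return "Right.__radd__"

print(Left() + Right())        # Right.__radd__
print(GreedyLeft() + Right())  # GreedyLeft.__add__
```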

**The problem.**
If `a` is of type `numpy.float64`, and `b` is an instance of a subclass of `np.ndarray`, then `a.__add__(b)` usually returns `NotImplemented`, and is superseded by the adequate `b.__radd__(a)`. The exception to the rule is when `b` is an array scalar. In that case, `b` is cast to the base class `np.ndarray`, losing all AD information, and its (single) value is added to `a`.

**Solution.**
The idea is to avoid the previous situation, either by exchanging the lhs and rhs, or by using `ad.left_operand`, which casts numpy scalars into (better behaved) array scalars.

Let us illustrate the problem in its most basic form, with simple scalars.

In [27]:
a = np.float64(1.)
b = ad.Dense.identity(constant=1.) 
print("a =",a,", b =",b)
print("Error (cast to numpy scalar). a+b =",a+b)

a = 1.0 , b = denseAD(1.0,[1.])
Error (cast to numpy scalar). a+b = 2.0


The following conditions must all hold for the issue to arise.

In [45]:
assert (np.isscalar(a) and 'numpy' in str(type(a)) and not ad.is_ad(a)) and (np.ndim(b)==0 and ad.is_ad(b))
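
The condition asserted above can be wrapped in a small diagnostic helper. The sketch below is a numpy-only approximation, not part of the `ad` library: the names `TaggedArray` and `is_risky_pair` are hypothetical, and the test "strict subclass of `np.ndarray`" stands in for `ad.is_ad`, under the assumption (stated in the introduction) that AD types subclass `numpy.ndarray`.

```python
import numpy as np

class TaggedArray(np.ndarray):
    """Hypothetical stand-in for an AD type: a strict subclass of np.ndarray."""

def is_risky_pair(a, b):
    """True if `a op b` matches the pattern described above: a numpy scalar
    on the left, a zero dimensional ndarray subclass instance on the right."""
    left_is_numpy_scalar = isinstance(a, np.generic)
    right_is_subclass_array_scalar = (
        isinstance(b, np.ndarray) and type(b) is not np.ndarray and b.ndim == 0)
    return left_is_numpy_scalar and right_is_subclass_array_scalar

b0 = np.zeros(()).view(TaggedArray)    # array scalar, shape ()
b1 = np.zeros((1,)).view(TaggedArray)  # shape (1,)

print(is_risky_pair(np.float64(1.), b0))  # True
print(is_risky_pair(1., b0))              # False : standard Python float
print(is_risky_pair(np.array(1.), b0))    # False : array scalar on the left
print(is_risky_pair(np.float64(1.), b1))  # False : right operand has shape (1,)
```

Such a check could be placed in an `assert not is_risky_pair(a, b)` while debugging code where array scalars may appear.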

In particular, numpy *array scalars* are better behaved than *numpy scalars* w.r.t. operator priority. Likewise for standard floats, or AD variables.

In [54]:
assert ad.is_ad(type(b)(a) + b) # -> ad.common_cast
assert ad.is_ad(np.array(a) + b) # -> ad.left_operand
assert ad.is_ad(float(a) + b)
assert ad.is_ad(b+a) # -> swapping the operands

The `ad.common_cast` solution given above is based on the first cast, the `ad.left_operand` solution on the second cast, and the operand swapping method on the last observation.

### 1.2 Alternative solutions

Let us first recall that *the recommended solution is: use `ad.common_cast` whenever AD array scalars are involved*. There are other options, described below, but they might be more error prone in practical scenarios.

In [48]:
a = np.float64(1.) # Numpy scalar (with bad operator priority)
b = ad.Dense.identity(constant=1.) # AD array scalar

The `ad.left_operand` function casts numpy scalars into array scalars, which are safe as left operands in the operations $+,-,*,/$.

In [51]:
lo = ad.left_operand
loa = lo(a)
print(f"numpy scalar {a} {type(a)} is cast into array scalar {loa} {type(loa)}")

numpy scalar 1.0 <class 'numpy.float64'> is cast into array scalar 1.0 <class 'numpy.ndarray'>


However, numpy quickly casts array scalars back into numpy scalars, which must be kept in mind.

In [53]:
print(f"Square of {loa} {type(loa)} is {loa**2} {type(loa**2)}")

Square of 1.0 <class 'numpy.ndarray'> is 1.0 <class 'numpy.float64'>


The issue and its solutions are described further below.

In [55]:
assert ad.is_ad(b+a) # Put the AD variable left
assert ad.is_ad(lo(a)+b) # Cast the left variable
assert ad.is_ad(lo(a)+lo(b)) # Cast both variables

The same issue arises with the other arithmetic operators.

In [20]:
print("Error (cast to numpy scalar). a-b =",a-b)
print("Error (cast to numpy scalar). a*b =",a*b)
print("Error (cast to numpy scalar). a/b =",a/b)

Error (cast to numpy scalar). a-b = 0.0
Error (cast to numpy scalar). a*b = 1.0
Error (cast to numpy scalar). a/b = 1.0


The same solutions apply. For non-symmetric operators, which one is the simplest is debatable.

In [56]:
assert ad.is_ad(-(b-a)) 
assert ad.is_ad(lo(a)-b)
assert ad.is_ad(lo(a)-lo(b)) 

assert ad.is_ad(b*a)
assert ad.is_ad(lo(a)*b) 
assert ad.is_ad(lo(a)*lo(b)) 

assert ad.is_ad(1./(b/a))
assert ad.is_ad(lo(a)/b)  
assert ad.is_ad(lo(a)/lo(b)) 

All these problems disappear if `b` is anything other than an array scalar, in other words if `b.shape!=()`.

In [58]:
a = np.float64(1.)
b = ad.Dense.identity(constant=np.array([1.])) 
print("b is not an array scalar : b.shape =", b.shape)

b is not an array scalar : b.shape = (1,)


In [59]:
assert ad.is_ad(a+b)
assert ad.is_ad(a-b)
assert ad.is_ad(a*b)
assert ad.is_ad(a/b)

### 1.3 Unexpected occurences

The problem depicted above may unfortunately occur in a slightly hidden form, where one may not be thinking about numpy scalars and array scalars.

In [61]:
a = np.array([1.,2.,3.])
b = ad.Dense.identity(constant=np.array([4.,5.,6.]))

In [62]:
print("a[0] is a numpy scalar.", type(a[0]))
print("a.sum() is a numpy scalar.", type(a.sum()))
print()

print("b[0] is an array scalar.", b[0].shape)
print("b.sum() is an array scalar.",b.sum().shape)

a[0] is a numpy scalar. <class 'numpy.float64'>
a.sum() is a numpy scalar. <class 'numpy.float64'>

b[0] is an array scalar. ()
b.sum() is an array scalar. ()


In [63]:
print("Error (cast to numpy scalar).",a[0]+b[0])
print("Error (cast to numpy scalar).",a.sum()+b.sum())

Error (cast to numpy scalar). 5.0
Error (cast to numpy scalar). 21.0


In the following example, an incorrect value is assigned.

In [64]:
B=b.copy(); B[0]=a[0]+B[0]; print("Incorrect (AD information lost).", B[0])

Incorrect (AD information lost). denseAD(5.0,[0. 0. 0.])


The previous solutions apply. (Recall that we truly recommend `a,b = ad.common_cast(a,b)`.)

In [65]:
assert ad.is_ad(b[0]+a[0]) 
assert ad.is_ad(lo(a[0])+b[0])
assert ad.is_ad(lo(a[0])+lo(b[0])) 

assert ad.is_ad(lo(a.sum())+b.sum())

B=b.copy(); B[0]=lo(a[0])+B[0]; print(B[0])

denseAD(5.0,[1. 0. 0.])


Other alternative approaches can be considered too, for instance in the assignment case.

In [29]:
B=b.copy(); B[0]+=a[0]; print(B[0]) # Using in place assignment
B=b.copy(); B[[0]]=a[[0]]+B[[0]]; print(B[0]) # Using non-scalar arrays

denseAD(5.0,[1. 0. 0.])
denseAD(5.0,[1. 0. 0.])


Yet another solution is to fully eliminate array scalars of AD type, by introducing a (e.g. trailing) singleton dimension. This solution requires a bit of code refactoring, but should be transparent in most places.

In [67]:
a = np.float64(1)
b = ad.Dense.identity(constant=np.array([4.,5.,6.]))
b = np.expand_dims(b,axis=-1) # Add a trailing singleton dimension

In [68]:
a+b[0] # Problem solved, but with a trailing singleton dimension

denseAD(array([5.]),array([[1., 0., 0.]]))
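
The effect of the trailing singleton dimension on shapes can be checked with plain numpy arrays. The sketch below does not use the `ad` library; it only illustrates that, once the extra axis is present, indexing and reductions along the leading axis return arrays of shape `(1,)` rather than zero dimensional objects, and that the axis can be removed at the end of the computation.

```python
import numpy as np

v = np.array([4., 5., 6.])
print(type(v[0]), np.ndim(v[0]))     # a numpy scalar, zero dimensional

w = np.expand_dims(v, axis=-1)       # Add a trailing singleton dimension
print(w.shape)                       # (3, 1)
print(w[0].shape)                    # (1,) : not an array scalar
print(w.sum(axis=0).shape)           # (1,) : reduction along the leading axis

print(np.squeeze(w, axis=-1).shape)  # (3,) : drop the singleton when done
```

Note that reductions must specify `axis=0` (or the relevant leading axes), since a full reduction such as `w.sum()` would again produce a zero dimensional result.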

### 1.4 Matrix multiplication and inversion

A similar issue arises with matrix multiplication and inversion : the AD information is lost. An appropriate syntax, presented below, preserves it.


In [174]:
v = ad.Dense.denseAD( np.random.standard_normal((4,)),np.random.standard_normal((4,4)))
m0 = np.random.standard_normal((4,4))
m1 = scipy.sparse.coo_matrix( ([1.,2.,3.,4.,5.],([0,2,1,2,3],[0,1,2,2,3]))).tocsr()

In [169]:
print("np.dot loses AD:",np.dot(m0,v))
print("scipy '*' loses AD:",m1*v.value)

np.dot loses AD: [ 1.70530847 -0.11456795 -0.73066176  1.48267191]
scipy '*' loses AD: [1.67581874 2.97884129 3.97178838 0.20522961]


In [180]:
print("np.dot with AD:\n",ad.apply_linear_mapping(m0,v))
print("scipy '*' with AD:\n",ad.apply_linear_mapping(m1,v))

np.dot with AD:
 denseAD([-1.89150544  0.3796543   1.30376865 -0.98564952],
[[-1.6848455  -2.83764912  1.77247252 -2.77079864]
 [-2.20391131 -0.56107039  1.68847524 -2.58272018]
 [ 1.01843512 -1.36934636  0.89630998 -1.31008306]
 [-2.03349406 -1.29100599  1.54521456 -2.07800914]])
scipy '*' with AD:
 denseAD([ 0.8778869   1.1385308   2.85995405 -7.16854102],
[[-0.34111709 -0.14956627  0.64471031 -1.03636975]
 [ 3.67738763 -0.04489892 -4.54696905 -1.30243209]
 [ 5.89038497 -2.86303733 -4.29064566 -4.47909757]
 [-5.90443925 -4.00817189 -1.08642606 -7.62422685]])


In [179]:
print("scipy solve with AD :\n",ad.apply_linear_inverse(scipy.sparse.linalg.spsolve,m1,v))

scipy solve with AD :
 denseAD([ 0.8778869  -0.25754919  0.22365216 -0.28674164],
[[-0.34111709 -0.14956627  0.64471031 -1.03636975]
 [ 0.28383078  0.92690755 -1.34848809  0.6971018 ]
 [ 0.16453358 -0.46719535  0.29532996 -0.45708691]
 [-0.23617757 -0.16032688 -0.04345704 -0.30496907]])
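
The principle behind these calls can be sketched with plain numpy. For a first order expansion $v = v_0 + C h$, a linear map $m$ acts on both terms, $m v = m v_0 + (m C) h$, and likewise $m^{-1} v = m^{-1} v_0 + (m^{-1} C) h$. The code below is a minimal illustration under that assumption, with the value and coefficients stored as separate arrays; it is not necessarily how `ad.apply_linear_mapping` and `ad.apply_linear_inverse` are implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.eye(4) + 0.1 * rng.standard_normal((4, 4))  # close to the identity, invertible here
value = rng.standard_normal(4)       # plays the role of v.value, shape (4,)
coef = rng.standard_normal((4, 3))   # plays the role of v.coef, last axis = independent variables

# Product m v : apply m to the value and to each column of coefficients
prod_value, prod_coef = m @ value, m @ coef

# Solve m x = v : solve for the value and for each column of coefficients
sol_value, sol_coef = np.linalg.solve(m, value), np.linalg.solve(m, coef)

# Consistency checks along an arbitrary perturbation h of the independent variables
h = rng.standard_normal(3)
print(np.allclose(m @ (value + coef @ h), prod_value + prod_coef @ h))  # True
print(np.allclose(m @ (sol_value + sol_coef @ h), value + coef @ h))    # True
```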


## 2. In place modifications and aliasing

The AD information often consists of very large arrays. In order to save time and memory, this information is not systematically copied and/or stored fully. It can take the form of a broadcasted array, or of an alias to another array. In that case a copy is necessary to enable modifications.

### 2.1 Aliasing of the AD information

When an operation leaves the AD information untouched, an alias is used. This can lead to bugs if in place modifications are used afterward.

In [36]:
x=ad.Dense.identity(constant=np.array([1.,2.]))
y=x+1 # Only affects the value, not the AD information

In [37]:
print("Values are distinct :", x.value is y.value)
print("AD information is shared :", y.coef is x.coef)

Values are distinct : False
AD information is shared : True


A modification of the aliased variable will impact the original one.

In [38]:
print(x[0])
y[0]*=2
print("Caution ! Shared AD information is affected :", x[0])

denseAD(1.0,[1. 0.])
Caution ! Shared AD information is affected : denseAD(1.0,[2. 0.])


Avoid this effect by making a copy.

In [39]:
x=ad.Dense.identity(constant=np.array([1.,2.]))
y=(x+1).copy()
print("AD information is distinct :", y.coef is x.coef)

AD information is distinct : False


Note that a similar effect arises with the `-` binary operator, but not with `*` or `/`. That is because the latter modify the AD information, which therefore must be copied anyway.

In [40]:
x=ad.Dense.identity(constant=np.array([1.,2.]))
print("AD information is shared :", (x-1).coef is x.coef)
print("AD information is distinct :", (x*2).coef is x.coef)
print("AD information is distinct :", (x/2).coef is x.coef)

AD information is shared : True
AD information is distinct : False
AD information is distinct : False
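
The aliasing behavior described in this section can be reproduced with a toy class. The sketch below is independent of the `ad` library: `ToyAD` is a hypothetical minimal pair (value, coefficients), written only to show why adding a constant can share the coefficient array while multiplying by a constant cannot.

```python
import numpy as np

class ToyAD:
    """Hypothetical minimal first order AD pair (value, coef), for illustration only."""
    def __init__(self, value, coef):
        self.value, self.coef = value, coef
    def __add__(self, constant):
        # Adding a constant leaves the derivative unchanged : coef is reused, not copied
        return ToyAD(self.value + constant, self.coef)
    def __mul__(self, constant):
        # Multiplying by a constant rescales the derivative : a new coef array is created
        return ToyAD(self.value * constant, self.coef * constant)
    def copy(self):
        return ToyAD(self.value.copy(), self.coef.copy())

x = ToyAD(np.array([1., 2.]), np.eye(2))
y = x + 1           # shares x.coef
z = x * 2           # owns a new coef array
w = (x + 1).copy()  # explicit copy, safe to modify in place
print(y.coef is x.coef, z.coef is x.coef, w.coef is x.coef)  # True False False

y.coef[0] *= 2      # In place change through the alias...
print(x.coef[0])    # ...is visible from x : [2. 0.]
print(w.coef[0])    # The copy is unaffected : [1. 0.]
```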


### 2.2 Non writeable AD information

When creating a dense AD variable, the coefficients may be non writeable (e.g. broadcasted) arrays.

In [41]:
x=ad.Dense.identity(constant=np.array([[1.,2.],[3.,4.]]),shape_bound=(2,))

In [42]:
x.coef.flags.writeable

False

In [43]:
# x+=1 # Fails because non-writeable

Make a copy to solve the issue.

In [44]:
y=x.copy()

In [45]:
y.coef.flags.writeable

True

In [46]:
y+=1
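
The same behavior can be observed with plain numpy, independently of the `ad` library. The sketch below shows that `np.broadcast_to` returns a read-only view which does not duplicate data (the broadcast axis has stride zero), that in place updates on it are refused, and that a copy is writeable.

```python
import numpy as np

base = np.eye(2)
view = np.broadcast_to(base, (2, 2, 2))  # Read-only view, no data is duplicated
print(view.flags.writeable)              # False
print(view.strides[0])                   # 0 : the leading axis reuses the same memory

try:
    view += 1
except ValueError as err:
    print("In place update refused :", err)

writeable = view.copy()                  # The copy owns its data
writeable += 1
print(writeable.flags.writeable)         # True
```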