# Enzyme.jl Demo for Speedy (CPU)

This notebook outlines the basic functionality of Enzyme with examples that use structures similar to those in Speedy, starting with simpler cases.

## Scalar Function - Scalar Output



In [2]:
import Pkg 
Pkg.activate("..") # make sure this is really the right environment
using Enzyme, Test

[32m[1m  Activating[22m[39m project at `~/Nextcloud/SpeedyExperiments`


In [3]:
f1(x) = x*x
∂ = autodiff(f1, Active(1.0))
@test first(autodiff(f1, Active(1.0))) ≈ 2.0


[32m[1mTest Passed[22m[39m
  Expression: first(autodiff(f1, Active(1.0))) ≈ 2.0
   Evaluated: 2.0 ≈ 2.0

The first example is a simple scalar function. Here we use Enzyme's `autodiff` function for the first time. It is Enzyme's central function and computes gradients of the function it is given. In this example we don't have to do much: we hand over the function `f1` and, as the second argument, the input `1.0` wrapped in `Active`, which marks it as a scalar that we want to differentiate with respect to. The derivative is returned directly, which is why the test reads it with `first`. For arrays and other mutable objects, Enzyme instead expects the user to allocate the memory for the gradients beforehand, as we will see in the next example. The results are then multiplied with the values already stored in this memory. This makes sense when we think about the long chain of operations that AD usually differentiates: we already have an incoming gradient with a certain value that is then propagated further through the chain (i.e. backpropagation).
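
As a sanity check that does not rely on Enzyme, the value tested above can be compared with a central finite difference. This is only a sketch: the helper `central_diff` and its step size `h` are assumptions made for illustration and are not part of Enzyme.

```julia
using Test

f1(x) = x * x

# Hypothetical helper: central finite-difference approximation of f'(x).
central_diff(f, x; h=1e-6) = (f(x + h) - f(x - h)) / (2h)

# f1'(x) = 2x, so at x = 1.0 we expect a value close to 2.0.
@test isapprox(central_diff(f1, 1.0), 2.0; atol=1e-8)
```

Finite differences are only approximate and scale poorly with the number of inputs, so they serve here as a cross-check and not as a replacement for AD.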

## Mutating Function 1

Enzyme can handle mutation! This is the main reason why we are interested in it. However, when working with arrays and other non-scalar objects, Enzyme expects the functions to work in-place and return `nothing`. 

In [4]:
function f2!(y)
    y[1] *= 3 
    y[2] /= 2
    y[3] *= 2.5  
    # nothing   # we skip this here, but it could also be written explicitly
end

inp = [2., 2, 2]
∂f∂inp = fill!(similar(inp), 1) # y is an output of this mutating function, so we set the shadow to 1
∂2 = autodiff(f2!, Const, Duplicated(inp, ∂f∂inp))
@test ∂f∂inp ≈ [3, 0.5, 2.5]


[32m[1mTest Passed[22m[39m
  Expression: ∂f∂inp ≈ [3, 0.5, 2.5]
   Evaluated: [3.0, 0.5, 2.5] ≈ [3.0, 0.5, 2.5]

In this example we defined `f2!` to update an array in-place. The arguments of `autodiff` are:
1. The function.
1. The behaviour of the return value. Here the return value is not differentiated, so its behaviour is constant (`Const`).
1. Every further argument: the input arguments of the function. For each of them we indicate whether we want to differentiate with respect to it (`Duplicated`) or not (`Const`).

`Duplicated` takes the actual input as its first argument and a pre-allocated array for the gradient as its second. For input arguments this "shadow" should be all zeros. For output arguments the gradient is multiplied with the shadow, so in most cases we will want it to be all ones. If an argument is both input and output, as is the case here, it is treated like an output.
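
Written out for `f2!`, the multiplication with the shadow is a vector-Jacobian product. The symbols $J$, $\bar y_{\text{seed}}$ and $\bar y_{\text{result}}$ are notation introduced for this sketch and are not Enzyme names:

$$
f_2(y) = \begin{pmatrix} 3\,y_1 \\ y_2 / 2 \\ 2.5\,y_3 \end{pmatrix}, \qquad
J = \frac{\partial f_2}{\partial y} = \operatorname{diag}(3,\ 0.5,\ 2.5), \qquad
\bar y_{\text{result}} = J^\top \bar y_{\text{seed}} = J^\top \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 0.5 \\ 2.5 \end{pmatrix}
$$

With a shadow of all ones, the result is the diagonal of the Jacobian, which matches the value checked in the test above.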

## Mutating Function 2 

Now we add a separate input argument `x`, so that `y` is purely an output.

In [16]:
function f3!(y, x)
    y[1] = 3*x[1] 
    y[2] = x[2]/2
    y[3] = x[3]*2.5  
    # nothing
end


f3! (generic function with 1 method)

In [6]:
x1 = [2., 2, 2]
∂f∂x1 = zero(x1) # that's an input to the function

y1 = zeros(3)
∂f∂y1 = fill!(similar(y1), 1) 

3-element Vector{Float64}:
 1.0
 1.0
 1.0

In [17]:
∂2 = autodiff(f3!, Const, Duplicated(y1, ∂f∂y1), Duplicated(x1, ∂f∂x1))

()

The forward pass runs through the function: the input remains unchanged and the output is computed:

In [9]:
@test x1 ≈ x1 # that's the forward pass, the input should be unchanged


Test Passed
  Expression: x1 ≈ x1
   Evaluated: [2.0, 2.0, 2.0] ≈ [2.0, 2.0, 2.0]

In [10]:
@test y1 ≈ [3*x1[1], x1[2]/2, x1[3]*2.5]

[32m[1mTest Passed[22m[39m
  Expression: y1 ≈ [3 * x1[1], x1[2] / 2, x1[3] * 2.5]
   Evaluated: [6.0, 1.0, 5.0] ≈ [6.0, 1.0, 5.0]

Now, $\frac{\partial(f3!)}{\partial (x1)}$, the gradient that we want, is stored in the shadow `∂f∂x1` that we handed over together with `x1`:

In [12]:
@test ∂f∂x1 ≈ [3, 0.5, 2.5] # this is the gradient with respect to the input that we (probably) want


[32m[1mTest Passed[22m[39m
  Expression: ∂f∂x1 ≈ [3, 0.5, 2.5]
   Evaluated: [3.0, 0.5, 2.5] ≈ [3.0, 0.5, 2.5]

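The shadow `∂f∂y1` that we handed over to `autodiff` for the output `y1` is set to zero by the call. Because `f3!` overwrites every entry of `y`, the incoming gradient is consumed during the reverse pass and nothing of it remains in the shadow.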

In [13]:
@test ∂f∂y1 ≈ zero(y1) # this is just zero, it is not used in this case 

[32m[1mTest Passed[22m[39m
  Expression: ∂f∂y1 ≈ zero(y1)
   Evaluated: [0.0, 0.0, 0.0] ≈ [0.0, 0.0, 0.0]
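
To illustrate why the output shadow ends up as zero, here is a hand-written reverse pass for `f3!`. It is a sketch of the general reverse-mode rule for an overwriting assignment, not of Enzyme's actual generated code. The function `f3_reverse!` and the variable names are hypothetical.

```julia
using Test

# Hand-written reverse pass for f3!(y, x), walking the statements backwards.
# For an assignment y[i] = g(x[i]) the rule is:
#   ∂x[i] += g'(x[i]) * ∂y[i]   (propagate the incoming gradient)
#   ∂y[i]  = 0                  (the old value of y[i] had no influence)
function f3_reverse!(∂y, ∂x)
    ∂x[3] += 2.5 * ∂y[3]; ∂y[3] = 0.0   # y[3] = x[3] * 2.5
    ∂x[2] += ∂y[2] / 2;   ∂y[2] = 0.0   # y[2] = x[2] / 2
    ∂x[1] += 3 * ∂y[1];   ∂y[1] = 0.0   # y[1] = 3 * x[1]
    return nothing
end

∂y = ones(3)   # output shadow, seeded with ones
∂x = zeros(3)  # input shadow, starts at zero
f3_reverse!(∂y, ∂x)

@test ∂x == [3.0, 0.5, 2.5]  # gradient accumulated into the input shadow
@test ∂y == zeros(3)         # output shadow has been consumed
```

This also shows why the input shadow should start at zero: the reverse pass accumulates into it with `+=`.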

## Structs 1

Amazingly, Enzyme can handle `struct`s and functions defined on them very well. In this case we have to pass instances of the struct to `Duplicated`, both for the actual input and for the pre-allocated gradients / shadows. The resulting gradients are stored in the shadow instance of the struct, here `∂X`.

In [22]:
using LinearAlgebra
struct PreComputeMul1
    X_1
    X_2
end

function compute!(y, a::PreComputeMul1) 
    mul!(y, a.X_1, a.X_2)
end

X_1 = rand(5,3)
∂X_1 = zero(X_1) # input, hence zero 
X_2 = rand(3,7)
∂X_2 = zero(X_2) # input, hence zero
Y = zeros(size(X_1,1), size(X_2,2))
∂Y = fill!(similar(Y), 1) # output, hence a non-zero seed (here all ones)
∂Y_copy = deepcopy(∂Y) # keep a copy, because the autodiff call mutates ∂Y

X = PreComputeMul1(X_1, X_2)
∂X = PreComputeMul1(∂X_1, ∂X_2)
∂2 = autodiff(compute!, Const, Duplicated(Y, ∂Y), Duplicated(X, ∂X))

()

In [20]:
@test Y ≈ X_1 * X_2

[32m[1mTest Passed[22m[39m
  Expression: Y ≈ X_1 * X_2
   Evaluated: [0.37451230688640424 0.37013664876879626 … 0.28743103558218003 0.45581274946163713; 1.0760458134025792 0.5713326693875351 … 0.4901343571343515 0.9115786976252107; … ; 1.3265754343580327 0.6486778663911275 … 0.5713374723816322 1.0803217918665573; 0.6384798797590644 0.19644649855263036 … 0.22822479002486376 0.43609764357660485] ≈ [0.37451230688640424 0.37013664876879626 … 0.28743103558218003 0.45581274946163713; 1.0760458134025792 0.5713326693875351 … 0.49013435713435144 0.9115786976252106; … ; 1.3265754343580327 0.6486778663911275 … 0.5713374723816322 1.0803217918665573; 0.6384798797590644 0.19644649855263038 … 0.22822479002486376 0.43609764357660485]

The gradient is the derivative multiplied with the input gradient `∂Y`. Informally, $\frac{\partial(Y)}{\partial(X_1)} = X_2$ and $\frac{\partial(Y)}{\partial(X_2)} = X_1$, so we expect `∂Y * X_2'` for `X_1` and `X_1' * ∂Y` for `X_2`:
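
The expressions used in the following tests can be derived index by index. The bar notation ($\bar Y$ for the incoming gradient `∂Y`, $\bar X_1$ and $\bar X_2$ for the resulting shadows) is introduced here for illustration only:

$$
Y_{ij} = \sum_k (X_1)_{ik}\,(X_2)_{kj}
\quad\Longrightarrow\quad
(\bar X_1)_{ik} = \sum_j \bar Y_{ij}\,(X_2)_{kj},
\qquad
(\bar X_2)_{kj} = \sum_i (X_1)_{ik}\,\bar Y_{ij}
$$

In matrix form this reads $\bar X_1 = \bar Y X_2^\top$ and $\bar X_2 = X_1^\top \bar Y$.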

In [23]:
@test ∂X.X_1 ≈ ∂Y_copy * X_2'

Test Passed
  Expression: ∂X.X_1 ≈ ∂Y_copy * X_2'
   Evaluated: [2.9949364057740606 2.7954736223384082 4.082773106787766; 2.9949364057740606 2.7954736223384082 4.082773106787766; … ; 2.9949364057740606 2.7954736223384082 4.082773106787766; 2.9949364057740606 2.7954736223384082 4.082773106787766] ≈ [2.99493640577406 2.7954736223384082 4.082773106787766; 2.99493640577406 2.7954736223384082 4.082773106787766; … ; 2.99493640577406 2.7954736223384082 4.082773106787766; 2.99493640577406 2.7954736223384082 4.082773106787766]

In [24]:
@test ∂X.X_2 ≈ X_1' * ∂Y_copy

[32m[1mTest Passed[22m[39m
  Expression: ∂X.X_2 ≈ X_1' * ∂Y_copy
   Evaluated: [1.7933866736456028 1.7933866736456028 … 1.7933866736456028 1.7933866736456028; 2.0492396970934452 2.0492396970934452 … 2.0492396970934452 2.0492396970934452; 2.7195120833556357 2.7195120833556357 … 2.7195120833556357 2.7195120833556357] ≈ [1.7933866736456028 1.7933866736456028 … 1.7933866736456028 1.7933866736456028; 2.0492396970934452 2.0492396970934452 … 2.0492396970934452 2.0492396970934452; 2.7195120833556357 2.7195120833556357 … 2.7195120833556357 2.7195120833556357]
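
The two shadow values tested above can also be cross-checked without Enzyme. With a seed of all ones, they are the gradients of the scalar `sum(X_1 * X_2)`, which can be approximated by finite differences. This is a minimal sketch. The names `fd_grad`, `loss`, `seed`, `A` and `B` are hypothetical and chosen so that they do not overwrite the notebook's variables.

```julia
using Test

# Hypothetical helper: central finite-difference gradient of a scalar
# function f with respect to every entry of the matrix M.
function fd_grad(f, M; h=1e-4)
    G = zero(M)
    for i in eachindex(M)
        Mp = copy(M); Mp[i] += h
        Mm = copy(M); Mm[i] -= h
        G[i] = (f(Mp) - f(Mm)) / (2h)
    end
    return G
end

A = rand(5, 3)
B = rand(3, 7)
seed = ones(5, 7)                    # plays the role of ∂Y
loss(A, B) = sum(seed .* (A * B))    # scalar whose gradient is the seeded product

@test isapprox(fd_grad(M -> loss(M, B), A), seed * B'; rtol=1e-6)
@test isapprox(fd_grad(M -> loss(A, M), B), A' * seed; rtol=1e-6)
```

Because `loss` is linear in each argument, the central difference has no truncation error here and only rounding limits the agreement.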

## Structs 2 

A similar example, but now the result is stored in the struct itself.

In [30]:
struct PreComputeMul2
    Y
    X_1
    X_2
end

function compute!(a::PreComputeMul2) 
    mul!(a.Y, a.X_1, a.X_2)
    # nothing
end

X_1 = rand(3,3)
∂X_1 = zero(X_1) # input, hence zero 
X_2 = rand(size(X_1)...)
∂X_2 = zero(X_2) # input, hence zero
Y = zeros(size(X_1)...)
∂Y = fill!(similar(Y), 1) # output, hence a non-zero seed (here all ones)
∂Y_copy = deepcopy(∂Y)



X = PreComputeMul2(Y, X_1, X_2)
∂X = PreComputeMul2(∂Y, ∂X_1, ∂X_2)
∂2 = autodiff(compute!, Const, Duplicated(X, ∂X))

()

In [28]:
@test X.Y ≈ X.X_1 * X.X_2

[32m[1mTest Passed[22m[39m
  Expression: X.Y ≈ X.X_1 * X.X_2
   Evaluated: [0.8624866974075824 0.6596724597891246 0.9929891832729016; 0.5475132598976726 0.2768682368620906 0.3616559734025895; 0.9850263938422328 0.7515854712020125 1.5176797640735056] ≈ [0.8624866974075824 0.6596724597891246 0.9929891832729016; 0.5475132598976726 0.2768682368620906 0.3616559734025895; 0.9850263938422328 0.7515854712020125 1.5176797640735056]

Again, the results are saved in the shadow struct `∂X`:

In [31]:
∂X.X_1

3×3 Matrix{Float64}:
 1.1786  1.44219  1.82784
 1.1786  1.44219  1.82784
 1.1786  1.44219  1.82784

In [32]:
∂Y_copy * X.X_2'

3×3 Matrix{Float64}:
 1.1786  1.44219  1.82784
 1.1786  1.44219  1.82784
 1.1786  1.44219  1.82784

In [33]:
@test ∂X.X_1 ≈ ∂Y_copy * X.X_2' # gradient of the output w.r.t. X_1; we use the copy because the original ∂Y is mutated by the autodiff call


Test Passed
  Expression: ∂X.X_1 ≈ ∂Y_copy * (X.X_2)'
   Evaluated: [1.1786044248967888 1.442189791306392 1.8278402065528419; 1.1786044248967888 1.442189791306392 1.8278402065528419; 1.1786044248967888 1.442189791306392 1.8278402065528419] ≈ [1.1786044248967888 1.442189791306392 1.8278402065528419; 1.1786044248967888 1.442189791306392 1.8278402065528419; 1.1786044248967888 1.442189791306392 1.8278402065528419]

In [None]:

@test ∂X.X_2 ≈ X.X_1' * ∂Y_copy # gradient of the output w.r.t. X_2, again using the unmutated copy of ∂Y



