# Introduction To Basic API Operations for DiffKt

**Copyright (c) Meta Platforms, Inc. and affiliates.**

This source code is licensed under the MIT license found in the
LICENSE file in the root directory of this source tree.

## Introduction

Welcome to the Introduction To Basic API Operations for **diffkt** (pronounced "diff kit").

Differentiable programming is the process of automatically computing the derivatives of functions. These functions can operate on floating point values, tensors, and user-defined data structures containing them. This tutorial shows you how to use the **diffkt** API to write functions whose derivatives are computed automatically, so that you can incorporate them into numerical algorithms for scientific computing, optimization, machine learning, and statistics.

## Background on Differentiable Programming

Below are some review papers and a book on differentiable programming. The field is also called automatic differentiation or algorithmic differentiation.

__[A Review of Automatic Differentiation and its Efficient Implementation, (2019)](https://arxiv.org/pdf/1811.05031.pdf)__

__[Automatic Differentiation in Machine Learning: A Survey, (2018)](https://www.jmlr.org/papers/volume18/17-468/17-468.pdf)__

__[Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2nd Ed., (2008)](https://my.siam.org/Store/Product/viewproduct/?ProductId=1005)__

### Some Housekeeping
This notebook uses `api.jar` from the **diffkt** project.<br>

`@file:DependsOn("...")` tells the Kotlin Jupyter notebook the path to a jar that it needs.

```kotlin
@file:DependsOn("../kotlin/api/build/libs/api.jar")
```

The notebook uses the following imports:

```kotlin
import org.diffkt.*
```

## Tensors

In **diffkt** there are many different types of differentiable tensors. A tensor is a multi-dimensional array. A float scalar is a 0D tensor, a vector is a 1D tensor, a 2D array is a 2D tensor, a 3D array is a 3D tensor, and so on.

__[DTensor](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/)__ is the interface for all differentiable tensors in **diffkt**. A differentiable tensor can be a scalar, a 1D tensor, a 2D tensor, a 3D tensor, or have even more dimensions. Scalars also inherit from __[DTensor](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/)__. The interface defines a number of properties, functions, and extensions. This tutorial covers the __[DTensor](http://www.diffkt.org/api/api/org.diffkt/-d-tensor)__ properties size, rank, shape, and isScalar, as well as indexing.

A tensor has a __[size](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/size.html)__, which is the number of elements in the tensor.

A tensor has a __[rank](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/rank.html)__, which is the number of dimensions: rank 0 is a scalar, rank 1 a 1D tensor, rank 2 a 2D tensor, rank 3 a 3D tensor, and so on.

A tensor has a __[shape](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/shape.html)__, which gives the number of axes and the length of each axis of the tensor.

A tensor has a Boolean property, __[isScalar](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/is-scalar.html)__, that indicates whether it is a scalar.

To retrieve an element of a tensor, use indexing, with the indices giving the location of the element. For example, [0,0] retrieves the first element of a 2D tensor.

__[FloatTensor](http://www.diffkt.org/api/api/org.diffkt/-float-tensor/index.html)__ is an abstract class that implements __[DTensor](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/index.html)__ for floating point numbers. There are multiple implementations, such as scalar, dense, and sparse tensors.

__[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ is the interface for all differentiable scalars.

__[FloatScalar](http://www.diffkt.org/api/api/org.diffkt/-float-scalar/index.html)__ is a scalar implementation of __[FloatTensor](http://www.diffkt.org/api/api/org.diffkt/-float-tensor/index.html)__ that also implements the __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ interface.

__[tensorOf](http://www.diffkt.org/api/api/org.diffkt/tensor-of.html)__ is a factory function that creates a FloatTensor from a set of float numbers. The initial tensor is a 1D array. After creating a tensor with __[tensorOf](http://www.diffkt.org/api/api/org.diffkt/tensor-of.html)__, you may need to __[reshape](http://www.diffkt.org/api/api/org.diffkt/reshape.html)__ the tensor to the shape you want.


### Scalar Example

```kotlin
// Scalar Example

val constant = 1f
val fs = FloatScalar(constant)

println(fs.toString())
println("fs.size = ${fs.size}")
println("fs.rank = ${fs.rank}")
println("fs.shape = ${fs.shape}")
println("fs.isScalar = ${fs.isScalar}")
```



#### The output should be:

`1.0`<br>
`fs.size = 1`<br>
`fs.rank = 0`<br>
`fs.shape = Shape()`<br>
`fs.isScalar = true`<br>

#### Observations

The value is 1.0.<br>
The tensor has only one element, so the `size` is one.<br>
This is a scalar, so the `rank` is zero.<br>
For a scalar, there are no arguments to `Shape()` indicating the length of the dimensions because the rank is 0. <br>
`isScalar` is true because it is a scalar.

### 1D Tensor Example

```kotlin
// 1D Tensor Example

val ft1 = tensorOf(1f, 2f, 3f)

println(ft1)
println("ft1.size = ${ft1.size}")
println("ft1.rank = ${ft1.rank}")
println("ft1.shape = ${ft1.shape}")
println("ft1.isScalar = ${ft1.isScalar}")
println()
// print the contents of the tensor
println("ft1[0] = ${ft1[0]}")
println("ft1[1] = ${ft1[1]}")
println("ft1[2] = ${ft1[2]}")
```



#### The output should be:

`[1.0, 2.0, 3.0]`<br>
`ft1.size = 3`<br>
`ft1.rank = 1`<br>
`ft1.shape = Shape(3)`<br>
`ft1.isScalar = false`<br>

`ft1[0] = 1.0`<br>
`ft1[1] = 2.0`<br>
`ft1[2] = 3.0`<br>

#### Observations

The tensor is a 1D tensor with three elements.<br>
The tensor has three elements so the `size` is 3.<br>
This is a 1D tensor, so the `rank` is 1.<br>
The shape is `Shape(3)`, which means 1 dimension with a length of 3.<br>
`isScalar` is false because it is not a scalar.<br>
The indexing gets the values at the locations indicated by the index.<br>

### 2D Tensor Example

```kotlin
// 2D Tensor Example

val ft2 = tensorOf(1f, 2f, 3f, 4f).reshape(2,2)

println(ft2)
println("ft2.size = ${ft2.size}")
println("ft2.rank = ${ft2.rank}")
println("ft2.shape = ${ft2.shape}")
println("ft2.isScalar = ${ft2.isScalar}")
println()
// print the contents of the tensor
println("ft2[0,0] = ${ft2[0,0]}")
println("ft2[0,1] = ${ft2[0,1]}")
println("ft2[1,0] = ${ft2[1,0]}")
println("ft2[1,1] = ${ft2[1,1]}")
```



#### The output should be:

`[[1.0, 2.0], [3.0, 4.0]]` <br>
`ft2.size = 4` <br>
`ft2.rank = 2` <br>
`ft2.shape = Shape(2, 2)` <br>
`ft2.isScalar = false` <br>

`ft2[0,0] = 1.0`<br>
`ft2[0,1] = 2.0`<br>
`ft2[1,0] = 3.0`<br>
`ft2[1,1] = 4.0`<br>

#### Observations

The tensor is created as a 1D tensor with 4 elements and then reshaped into a 2x2 tensor.<br>
The tensor has 4 elements, so the `size` is 4. <br>
The tensor is a 2D tensor, so the `rank` is 2.<br>
The `Shape(2,2)` indicates the tensor is 2 dimensional with the length of each axis equal to 2. <br>
`isScalar` is false since this is a 2D tensor.<br>
The multi-dimensional indexing gets the value of a tensor element at the location specified with a 2D index.<br>


### 3D Tensor Example

```kotlin
// 3D Tensor Example

val ft3 = tensorOf(1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f, 10f, 11f, 12f).reshape(2,2,3)

println(ft3)
println("ft3.size = ${ft3.size}")
println("ft3.rank = ${ft3.rank}")
println("ft3.shape = ${ft3.shape}")
println("ft3.isScalar = ${ft3.isScalar}")

// print the contents of the tensor
println()

for (i in 0..1) {
    for (j in 0..1) {
        for (k in 0..2) {
            println("ft3[${i},${j},${k}] = ${ft3[i,j,k]}")
        }
    }
}
```



#### The output should be

`[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]` <br>
`ft3.size = 12`<br>
`ft3.rank = 3`<br>
`ft3.shape = Shape(2, 2, 3)`<br>
`ft3.isScalar = false`<br>

`ft3[0,0,0] = 1.0`<br>
`ft3[0,0,1] = 2.0`<br>
`ft3[0,0,2] = 3.0`<br>
`ft3[0,1,0] = 4.0`<br>
`ft3[0,1,1] = 5.0`<br>
`ft3[0,1,2] = 6.0`<br>
`ft3[1,0,0] = 7.0`<br>
`ft3[1,0,1] = 8.0`<br>
`ft3[1,0,2] = 9.0`<br>
`ft3[1,1,0] = 10.0`<br>
`ft3[1,1,1] = 11.0`<br>
`ft3[1,1,2] = 12.0`<br>

#### Observations

The tensor was created with 12 elements, so the `size` is 12.<br>
The tensor is a 3D tensor, so the `rank` is 3.<br>
The `Shape(2,2,3)` indicates the tensor has 3 dimensions. The lengths of the axes are 2, 2, and 3.<br>
`isScalar` is false since this is a 3D tensor.<br>
The multi-dimensional indexing gets the value of a tensor element at the location specified with a 3D index.<br>
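The printed elements above are consistent with a row-major layout, where the last index varies fastest. Under that layout, `ft3[i,j,k]` is element `(i * 2 + j) * 3 + k` of the flat list passed to `tensorOf`. The plain-Kotlin sketch below makes no **diffkt** calls and assumes that layout. `sizeOf` and `flatIndex` are hypothetical helpers written for illustration; they are not part of the **diffkt** API.

```kotlin
// Plain-Kotlin sketch (not diffkt code): assumes a row-major layout in which
// the last index varies fastest, as the 3D example's printed output suggests.

// size is the product of the axis lengths in the shape
fun sizeOf(shape: IntArray): Int = shape.fold(1) { acc, len -> acc * len }

// Hypothetical helper: maps a multi-dimensional index to its position
// in the flat list of values given to tensorOf.
fun flatIndex(shape: IntArray, index: IntArray): Int {
    require(shape.size == index.size) { "index rank must match shape rank" }
    var flat = 0
    for (axis in shape.indices) {
        require(index[axis] in 0 until shape[axis]) { "index out of bounds on axis $axis" }
        flat = flat * shape[axis] + index[axis]
    }
    return flat
}

val shape = intArrayOf(2, 2, 3)
// 1.0 .. 12.0, the same values used in the 3D example
val values = FloatArray(sizeOf(shape)) { (it + 1).toFloat() }

println("size = ${sizeOf(shape)}, rank = ${shape.size}")
println("element at [1,0,2] = ${values[flatIndex(shape, intArrayOf(1, 0, 2))]}")
```

This prints `size = 12, rank = 3` and `element at [1,0,2] = 9.0`, which agrees with `ft3[1,0,2] = 9.0` above.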


## Scalar Operations

The __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ interface has many operations that can be applied to a differentiable scalar. Click on the **Extensions** tab in the Kotlin docs of __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ to see all the available operations. Some, but not all, operations let you use traditional arithmetic notation through operator overloading. We will look at a few of the operations:<br>

__['+'](http://www.diffkt.org/api/api/org.diffkt/plus.html)__ or __[plus](http://www.diffkt.org/api/api/org.diffkt/plus.html)__,<br>
__['-'](http://www.diffkt.org/api/api/org.diffkt/minus.html)__ or __[minus](http://www.diffkt.org/api/api/org.diffkt/minus.html)__,<br>
__['*'](http://www.diffkt.org/api/api/org.diffkt/times.html)__ or __[times](http://www.diffkt.org/api/api/org.diffkt/times.html)__,<br>
__['/'](http://www.diffkt.org/api/api/org.diffkt/div.html)__ or __[div](http://www.diffkt.org/api/api/org.diffkt/div.html)__, and<br>
__[pow](http://www.diffkt.org/api/api/org.diffkt/pow.html)__.



### Simple Polynomial over a Scalar
We will create a simple polynomial over $x$, a differentiable scalar variable. Constants $a$ and $b$ are `Float`. Constant $c$ is a __[FloatScalar](http://www.diffkt.org/api/api/org.diffkt/-float-scalar/index.html)__. You can mix `Float` and __[FloatScalar](http://www.diffkt.org/api/api/org.diffkt/-float-scalar/index.html)__ values in the arithmetic operations. The equation is $f(x) = (a + bx^2 - cx^3) / c$.



```kotlin
// scalar polynomial

val a = 1f
val b = 2f
val c = FloatScalar(3f)

fun f(x : DScalar) : DScalar {
    val y = (a + b * x.pow(2f) - c * x.pow(3f)) / c
    return y
}

// Evaluation of the scalar polynomial

val x = FloatScalar(3f)
val f = f(x)

println("x = ${x}")
println("f(x) = ${f}")
```



#### The output should be:

`x = 3.0`<br>
`f(x) = -20.666666`

#### Observations

Both `Float` and __[FloatScalar](http://www.diffkt.org/api/api/org.diffkt/-float-scalar/index.html)__ were mixed in the arithmetic operations.

The following scalar operations were used:

__['+'](http://www.diffkt.org/api/api/org.diffkt/plus.html)__ or __[plus](http://www.diffkt.org/api/api/org.diffkt/plus.html)__,<br>
__['-'](http://www.diffkt.org/api/api/org.diffkt/minus.html)__ or __[minus](http://www.diffkt.org/api/api/org.diffkt/minus.html)__,<br>
__['*'](http://www.diffkt.org/api/api/org.diffkt/times.html)__ or __[times](http://www.diffkt.org/api/api/org.diffkt/times.html)__,<br>
__['/'](http://www.diffkt.org/api/api/org.diffkt/div.html)__ or __[div](http://www.diffkt.org/api/api/org.diffkt/div.html)__, and<br>
__[pow](http://www.diffkt.org/api/api/org.diffkt/pow.html)__.

## Calculating the Derivative of a Scalar Function

There are two different algorithms for calculating the derivative of a function over a __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ variable: the forward derivative algorithm and the reverse derivative algorithm. The forward derivative algorithm is more efficient when a function has more output variables than input variables. The reverse derivative algorithm is more efficient when a function has more input variables than output variables. When optimizing a function with many inputs and a single scalar output, the reverse derivative algorithm is usually the more efficient choice.
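The sketch below makes the forward derivative algorithm concrete by implementing forward mode with dual numbers, a standard textbook technique, in plain Kotlin. It is an illustration under stated assumptions, not code from **diffkt**. `Dual`, `constant`, `variable`, and `poly` are hypothetical names. Each value carries its derivative with respect to one input (the tangent) alongside it, so a single forward pass differentiates with respect to one input. This is why forward mode needs one pass per input and suits functions with few inputs.

```kotlin
// Minimal forward-mode sketch using dual numbers (plain Kotlin, not diffkt code).
// Each Dual carries a value (primal) and its derivative with respect to x (tangent).
data class Dual(val primal: Double, val tangent: Double) {
    operator fun plus(o: Dual) = Dual(primal + o.primal, tangent + o.tangent)
    operator fun minus(o: Dual) = Dual(primal - o.primal, tangent - o.tangent)
    // Product rule: (uv)' = u'v + uv'
    operator fun times(o: Dual) = Dual(primal * o.primal, tangent * o.primal + primal * o.tangent)
    // Quotient rule: (u/v)' = (u'v - uv') / v^2
    operator fun div(o: Dual) = Dual(
        primal / o.primal,
        (tangent * o.primal - primal * o.tangent) / (o.primal * o.primal)
    )
}

// Constants have a zero tangent; the input variable is seeded with tangent 1.
fun constant(c: Double) = Dual(c, 0.0)
fun variable(x: Double) = Dual(x, 1.0)

// f(x) = (a + b x^2 - c x^3) / c with a = 1, b = 2, c = 3, as in the notebook
fun poly(x: Dual): Dual {
    val a = constant(1.0)
    val b = constant(2.0)
    val c = constant(3.0)
    return (a + b * x * x - c * x * x * x) / c
}

val result = poly(variable(3.0))
println("f(3) = ${result.primal}, df/dx(3) = ${result.tangent}")
```

At $x = 3$ this gives a primal of about $-20.667$ and a tangent of $-23$, the same values the **diffkt** cells in this section report.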

To call the functions below, pass the __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ variable at which to evaluate the derivative, along with the function to differentiate. In Kotlin, if you declare a function `fun f(x: DScalar): DScalar`, you can pass it as the function reference `::f`.

__[forwardDerivative](http://www.diffkt.org/api/api/org.diffkt/forward-derivative.html)__ calculates the derivative of a function over a __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ evaluated at the __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ `x` using the forward derivative algorithm.

__[reverseDerivative](http://www.diffkt.org/api/api/org.diffkt/reverse-derivative.html)__ calculates the derivative of a function over a __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ evaluated at the __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ `x` using the reverse derivative algorithm.

In many cases it is more efficient to calculate the original scalar function and its derivative at the same time. The functions below return a `Pair<DTensor, DTensor>`. The first value, called the `primal`, is the value of the function evaluated at `x`. The second value, called the `tangent`, is the derivative of the function evaluated at `x`.

__[primalAndForwardDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-forward-derivative.html)__ calculates a function over a __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ and its derivative evaluated at the __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ `x` using the forward derivative algorithm.

__[primalAndReverseDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-reverse-derivative.html)__ calculates a function over a __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ and its derivative evaluated at the __[DScalar](http://www.diffkt.org/api/api/org.diffkt/-d-scalar/index.html)__ `x` using the reverse derivative algorithm.

### Derivative of a Simple Polynomial Function over a Scalar

We will calculate the derivative of a simple polynomial over $x$, a differentiable scalar variable. The constants $a$ and $b$ are `Float`, and $c$ is a __[FloatScalar](http://www.diffkt.org/api/api/org.diffkt/-float-scalar/index.html)__. The equation is $f(x) = (a + bx^2 - cx^3) / c$, and the derivative is $\frac{df(x)}{dx} = \frac{2b}{c}x - 3x^2$.

The derivative will be computed with the __[forwardDerivative](http://www.diffkt.org/api/api/org.diffkt/forward-derivative.html)__ function. Since the function $f(x)$ returns a scalar, the derivative $\frac{df(x)}{dx}$ is a scalar.
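One way to sanity-check the analytic derivative $\frac{df(x)}{dx} = \frac{2b}{c}x - 3x^2$ is to compare it with a central finite-difference approximation. The sketch below does this in plain Kotlin with `Double` arithmetic and no **diffkt** calls. `fPoly`, `dfAnalytic`, and `centralDifference` are hypothetical helpers, and the step size `h = 1e-5` is an assumption chosen for illustration.

```kotlin
import kotlin.math.abs

// Plain-Kotlin check (no diffkt): compares the analytic derivative with a
// central finite-difference approximation.
val a = 1.0
val b = 2.0
val c = 3.0

fun fPoly(x: Double) = (a + b * x * x - c * x * x * x) / c
fun dfAnalytic(x: Double) = (2.0 * b / c) * x - 3.0 * x * x

// Central difference: (f(x + h) - f(x - h)) / (2h), with error O(h^2)
fun centralDifference(f: (Double) -> Double, x: Double, h: Double = 1e-5) =
    (f(x + h) - f(x - h)) / (2.0 * h)

val x0 = 3.0
println("analytic  df/dx(3) = ${dfAnalytic(x0)}")
println("numerical df/dx(3) = ${centralDifference({ fPoly(it) }, x0)}")
```

Both values should be close to $-23$. Finite differences are only approximate and cost two extra function evaluations per input. Automatic differentiation avoids both drawbacks.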


```kotlin
// scalar polynomial

val a = 1f
val b = 2f
val c = FloatScalar(3f)

fun f(x : DScalar) : DScalar {
    val y = (a + b * x.pow(2f) - c * x.pow(3f)) / c
    return y
}

// evaluation of the scalar polynomial and its derivative

val x = FloatScalar(3f)
val f = f(x)
val df = forwardDerivative(x, ::f)

println("x = ${x}")
println("f(x) = ${f}")
println("df(x)/dx = ${df}")
```



#### The output should be

`x = 3.0`<br>
`f(x) = -20.666666`<br>
`df(x)/dx = -23.0`<br>

#### Observations

Both `Float` and __[FloatScalar](http://www.diffkt.org/api/api/org.diffkt/-float-scalar/index.html)__ were mixed in the arithmetic operations.

The following scalar operations were used:

__['+'](http://www.diffkt.org/api/api/org.diffkt/plus.html)__ or __[plus](http://www.diffkt.org/api/api/org.diffkt/plus.html)__,<br>
__['-'](http://www.diffkt.org/api/api/org.diffkt/minus.html)__ or __[minus](http://www.diffkt.org/api/api/org.diffkt/minus.html)__,<br>
__['*'](http://www.diffkt.org/api/api/org.diffkt/times.html)__ or __[times](http://www.diffkt.org/api/api/org.diffkt/times.html)__,<br>
__['/'](http://www.diffkt.org/api/api/org.diffkt/div.html)__ or __[div](http://www.diffkt.org/api/api/org.diffkt/div.html)__, and<br>
__[pow](http://www.diffkt.org/api/api/org.diffkt/pow.html)__.

The following function was used to calculate the derivative:

__[forwardDerivative](http://www.diffkt.org/api/api/org.diffkt/forward-derivative.html)__

### Using the `primalAndForwardDerivative` Function

We will calculate the derivative of a simple polynomial over $x$, a differentiable scalar variable. The variables $a$ and $b$ are constants. The equation is $f(x) = a + bx^2$, and the derivative is $\frac{df(x)}{dx} = 2bx$.

We will use the __[primalAndForwardDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-forward-derivative.html)__ function to calculate both the value of the function and its derivative at the same time.

Both $f(x)$ and $ \frac{df(x)}{dx}$ are scalars.

```kotlin
// scalar polynomial

val a = 1f
val b = 2f

fun f(x : DScalar) : DScalar {
    return a + b * x.pow(2f)
}

// evaluation of scalar polynomial

val x = FloatScalar(3f)
val (fx, df) = primalAndForwardDerivative(x, ::f)

println("x = ${x}")
println("f(x) = ${fx}")
println("df(x)/dx = ${df}")
```



#### The output should be
`x = 3.0`<br>
`f(x) = 19.0`<br>
`df(x)/dx = 12.0`<br>

#### Observations

The following scalar operations were used:

__[+](http://www.diffkt.org/api/api/org.diffkt/plus.html)__ or __[plus](http://www.diffkt.org/api/api/org.diffkt/plus.html)__<br>
__[*](http://www.diffkt.org/api/api/org.diffkt/times.html)__ or __[times](http://www.diffkt.org/api/api/org.diffkt/times.html)__<br>
__[pow](http://www.diffkt.org/api/api/org.diffkt/pow.html)__<br>

The following function was used to calculate both the function and the derivative:

__[primalAndForwardDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-forward-derivative.html)__

## Tensor Operations


The __[DTensor](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/index.html)__ interface has many operations that can be applied to a tensor. Click on the **Extensions** tab in the Kotlin docs of __[DTensor](http://www.diffkt.org/api/api/org.diffkt/-d-tensor/index.html)__ to see all the operations. Some of the operations support traditional arithmetic notation through operator overloading. We will look at a few of the operations in the examples below:<br>

__['+'](http://www.diffkt.org/api/api/org.diffkt/plus.html)__ or __[plus](http://www.diffkt.org/api/api/org.diffkt/plus.html)__,<br>
__['-'](http://www.diffkt.org/api/api/org.diffkt/minus.html)__ or __[minus](http://www.diffkt.org/api/api/org.diffkt/minus.html)__,<br>
__['*'](http://www.diffkt.org/api/api/org.diffkt/times.html)__ or __[times](http://www.diffkt.org/api/api/org.diffkt/times.html)__,<br>
__['/'](http://www.diffkt.org/api/api/org.diffkt/div.html)__ or __[div](http://www.diffkt.org/api/api/org.diffkt/div.html)__, <br>
__[pow](http://www.diffkt.org/api/api/org.diffkt/pow.html)__,<br>
__[sin](http://www.diffkt.org/api/api/org.diffkt/sin.html)__,<br>
__[cos](http://www.diffkt.org/api/api/org.diffkt/cos.html)__,<br>
__[matmul](http://www.diffkt.org/api/api/org.diffkt/matmul.html)__,<br>
__[sum](http://www.diffkt.org/api/api/org.diffkt/sum.html)__, and<br>
__[innerProduct](http://www.diffkt.org/api/api/org.diffkt/inner-product.html)__

## Derivatives of a Function over a Tensor

The symbol nabla, $\nabla$, is an inverted Greek capital letter delta, $\Delta$. The gradient of a function of a vector of variables, $\nabla f(\mathbf x)$, is the vector of partial derivatives of the function with respect to each variable. The Jacobian of a vector-valued function, written $J(\mathbf f(\mathbf x))$ or $\mathbf \nabla \mathbf f(\mathbf x)$, collects the gradient of each component of the function, that is, the partial derivatives of each component with respect to each variable.

The partial derivatives of a function with N inputs and 1 output at a point $ \mathbf x $, where $ \mathbf x $ is a vector of size N, or a function $ f(\mathbf x):R^N \rightarrow R^1 $, is the gradient of the function, which is a function $ \nabla f(\mathbf x): R^N \rightarrow R^N $. The gradient of a function of N variables, where $ \mathbf x = \left [ x_1, x_2, \cdots, x_n \right ] $ is 

$ \nabla f(\mathbf x) = \left [ \frac {\partial f(\mathbf x)} {\partial x_1}, \frac {\partial f(\mathbf x)} {\partial x_2}, \cdots, \frac {\partial f(\mathbf x)} {\partial x_n} \right ]^T$. 

For example, if $ f(x,y) = 4x^2 + 2y $ then $ \nabla f(x, y) = \left [ 8x, 2 \right ]^T $, where $ \nabla f(x,y) = \left [\frac {\partial f(x,y)} {\partial x}, \frac {\partial f(x,y)} {\partial y} \right ]^T$.


The partial derivatives of a function with N inputs and M outputs at a point $\mathbf x$, where $\mathbf x$ is of size N, or a function $\mathbf f(\mathbf x): R^N \rightarrow R^M$, form the Jacobian of the function. At each point the Jacobian is an M x N matrix, so $\mathbf \nabla \mathbf f(\mathbf x): R^N \rightarrow R^{M \times N}$. The point $\mathbf x$ is a vector of variables, $\mathbf x = \left[ x_1, x_2, \cdots, x_n \right]$. The function $\mathbf f(\mathbf x)$ is a vector of functions evaluated at $\mathbf x$, $\mathbf f(\mathbf x) = \left[ f_1(\mathbf x), f_2(\mathbf x), \cdots, f_m(\mathbf x) \right]^T$.

The Jacobian contains the partial derivative of each component function with respect to each variable.

$\mathbf \nabla \mathbf f(\mathbf x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$.

For example, if $\mathbf f(x,y) = \left[ 4x^2 + 2y, 2x + 4y^2 \right]$, then the Jacobian is $\mathbf \nabla \mathbf f(x,y) = \begin{bmatrix} 8x & 2 \\ 2 & 8y \end{bmatrix}$.
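The Jacobian example above can be checked numerically. The plain-Kotlin sketch below makes no **diffkt** calls. It approximates each column of the Jacobian by perturbing one input at a time with central differences, which produces an M x N matrix with one row per output. `numericJacobian` and `fxy` are hypothetical names, and the step size is an assumption. At $(x, y) = (1, 2)$ the analytic Jacobian is $\begin{bmatrix} 8 & 2 \\ 2 & 16 \end{bmatrix}$.

```kotlin
import kotlin.math.abs

// Plain-Kotlin sketch (no diffkt): approximates the Jacobian of a vector-valued
// function with central differences. Row i holds the partials of output i and
// column j the partials with respect to input j, so the result is M x N.
fun numericJacobian(
    f: (DoubleArray) -> DoubleArray,
    x: DoubleArray,
    h: Double = 1e-5
): Array<DoubleArray> {
    val m = f(x).size
    val result = Array(m) { DoubleArray(x.size) }
    for (j in x.indices) {
        // perturb only input j, up and down
        val xPlus = x.copyOf().also { it[j] += h }
        val xMinus = x.copyOf().also { it[j] -= h }
        val fPlus = f(xPlus)
        val fMinus = f(xMinus)
        for (i in 0 until m) {
            result[i][j] = (fPlus[i] - fMinus[i]) / (2.0 * h)
        }
    }
    return result
}

// f(x, y) = [4x^2 + 2y, 2x + 4y^2]; analytic Jacobian [[8x, 2], [2, 8y]]
val fxy: (DoubleArray) -> DoubleArray = { v ->
    doubleArrayOf(4 * v[0] * v[0] + 2 * v[1], 2 * v[0] + 4 * v[1] * v[1])
}

val jacAt12 = numericJacobian(fxy, doubleArrayOf(1.0, 2.0))
jacAt12.forEach { row -> println(row.joinToString(prefix = "[", postfix = "]")) }
```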

__[forwardDerivative](http://www.diffkt.org/api/api/org.diffkt/forward-derivative.html)__ calculates the derivative of a function over a tensor, evaluated at the tensor `x`, using the forward derivative algorithm.

__[reverseDerivative](http://www.diffkt.org/api/api/org.diffkt/reverse-derivative.html)__ calculates the derivative of a function over a tensor, evaluated at the tensor `x`, using the reverse derivative algorithm. When the result is a Jacobian (a 2D tensor), the reverse derivative algorithm returns the transpose of the result that the forward derivative algorithm returns.

In many cases it is more efficient to calculate the original function and its partial derivatives at the same time. The functions below return a `Pair<DTensor, DTensor>`. The first value, called the `primal`, is the value of the function evaluated at the tensor `x`. The second value, called the `tangent`, is the derivative of the function evaluated at `x`.

__[primalAndForwardDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-forward-derivative.html)__ calculates a function over a tensor `x` and its derivative, evaluated at the tensor `x`, using the forward derivative algorithm.

__[primalAndReverseDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-reverse-derivative.html)__ calculates a function over a tensor `x` and its derivative, evaluated at the tensor `x`, using the reverse derivative algorithm. When the result is a Jacobian (a 2D tensor), it is the transpose of the result of the forward derivative algorithms.

### Derivatives of a Polynomial Function over a 1D Tensor

We will calculate the partial derivatives of a polynomial over $ {\mathbf x} $, a 1D __[FloatTensor](http://www.diffkt.org/api/api/org.diffkt/-float-tensor/index.html)__ differentiable tensor. The constants, $\mathbf a $ and $\mathbf b $, are 1D __[FloatTensor](http://www.diffkt.org/api/api/org.diffkt/-float-tensor/index.html)__ tensors. In this example, the addition and multiplication are element-wise.

We will use the __[primalAndForwardDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-forward-derivative.html)__ function to calculate both the value of the function and its derivative at the same time.

The Jacobian, the matrix of partial derivatives of a vector-valued function, is a 2D tensor.

Recall, $\mathbf \nabla \mathbf f(\mathbf x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$.

For this example,

$\mathbf f(\mathbf x) = \left[ a_1 + b_1x_1^2,\ a_2 + b_2x_2^2,\ a_3 + b_3x_3^2 \right]$,

so

$\mathbf \nabla \mathbf f(\mathbf x) = \begin{bmatrix} 2b_1x_1 & 0 & 0 \\ 0 & 2b_2x_2 & 0 \\ 0 & 0 & 2b_3x_3 \end{bmatrix}$
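The sketch below is a plain-Kotlin check of this example with no **diffkt** calls. It evaluates $f_i(\mathbf x) = a_i + b_i x_i^2$ and builds the diagonal Jacobian with entries $2b_ix_i$ for $\mathbf a = \mathbf b = \mathbf x = [1, 2, 3]$, the same values used in the cell that follows. The variable names are illustrative only.

```kotlin
// Plain-Kotlin check (no diffkt) of the element-wise example:
// f_i(x) = a_i + b_i * x_i^2, so the Jacobian is diagonal with entries 2 * b_i * x_i.
val a = floatArrayOf(1f, 2f, 3f)
val b = floatArrayOf(1f, 2f, 3f)
val x = floatArrayOf(1f, 2f, 3f)

val fx = FloatArray(x.size) { i -> a[i] + b[i] * x[i] * x[i] }

// Off-diagonal entries are zero because f_i depends only on x_i.
val jacobian = Array(x.size) { i ->
    FloatArray(x.size) { j -> if (i == j) 2f * b[i] * x[i] else 0f }
}

println("f(x) = ${fx.joinToString(prefix = "[", postfix = "]")}")
println("Jacobian = ${jacobian.joinToString(prefix = "[", postfix = "]") { it.joinToString(prefix = "[", postfix = "]") }}")
```

The diagonal entries come out as 2, 8, and 18.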

```kotlin
// polynomial over a 1D tensor

val a = tensorOf(1f, 2f, 3f)
val b = tensorOf(1f, 2f, 3f)

fun f(x : DTensor) : DTensor {
    val y = a + (b * (x.pow(2f)))
    return y
}

// evaluation of the polynomial

val x = tensorOf(1f, 2f, 3f)
val (fx, jacobian) = primalAndForwardDerivative(x, ::f)

println("x = ${x}")
println("f(x) = ${fx}")
println("Jacobian(f(x)) = ${jacobian}")
```




#### The output should be 

`x = [1.0, 2.0, 3.0]` <br>
`f(x) = [2.0, 10.0, 30.0]` <br>
`Jacobian(f(x)) = [[2.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 18.0]]`

#### Observations

The tensor variables are created with __[tensorOf](http://www.diffkt.org/api/api/org.diffkt/tensor-of.html)__<br>

The following tensor operations were used:

__['+'](http://www.diffkt.org/api/api/org.diffkt/plus.html)__ or __[plus](http://www.diffkt.org/api/api/org.diffkt/plus.html)__<br>
__['*'](http://www.diffkt.org/api/api/org.diffkt/times.html)__ or __[times](http://www.diffkt.org/api/api/org.diffkt/times.html)__<br>
__[pow](http://www.diffkt.org/api/api/org.diffkt/pow.html)__<br>

The following function was used to calculate both the function and the derivative:

__[primalAndForwardDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-forward-derivative.html)__

The Jacobian is a 2D tensor. Because each $f_i$ depends only on $x_i$, the nonzero partial derivatives lie on the diagonal.<br>

$\mathbf \nabla \mathbf f(\left[ 1, 2, 3 \right]) = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 18 \end{bmatrix}$

### Difference Between Forward Derivative and Reverse Derivative Algorithms

For a vector-valued function with N inputs and M outputs, the Jacobian returned by __[forwardDerivative](http://www.diffkt.org/api/api/org.diffkt/forward-derivative.html)__ and __[primalAndForwardDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-forward-derivative.html)__ is an M x N matrix. The reverse derivative algorithms, __[reverseDerivative](http://www.diffkt.org/api/api/org.diffkt/reverse-derivative.html)__ and __[primalAndReverseDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-reverse-derivative.html)__, return the transpose of the Jacobian, which is an N x M matrix.

For this example there are 3 inputs and 2 outputs.

Let $\mathbf x $ be a 1D tensor with three elements.

Let $g(\mathbf x) = 2 x_0 + 2 x_1^2 + 3 x_1 x_2^3$

Let $h(\mathbf x) = 3 x_0^3 x_1 + 2 x_1^2 + 3 x_2^3$

Then $\mathbf f(\mathbf x) = \left[ g(\mathbf x), h(\mathbf x) \right] = \left[ 2x_0 + 2x_1^2 + 3x_1x_2^3,\ 3x_0^3x_1 + 2x_1^2 + 3x_2^3 \right]$

The Jacobian is $\mathbf \nabla \mathbf f(\mathbf x) = \begin{bmatrix} 2 & 4x_1 + 3x_2^3 & 9x_1x_2^2 \\ 9x_0^2x_1 & 3x_0^3 + 4x_1 & 9x_2^2 \end{bmatrix}$

Its transpose is $(\mathbf \nabla \mathbf f(\mathbf x))^T = \begin{bmatrix} 2 & 9x_0^2x_1 \\ 4x_1 + 3x_2^3 & 3x_0^3 + 4x_1 \\ 9x_1x_2^2 & 9x_2^2 \end{bmatrix}$

If $\mathbf x = \left[ 1, 2, 3 \right]$,

then $\mathbf f(\mathbf x) = \left[ 172, 95 \right]$

and

$\mathbf \nabla \mathbf f(\mathbf x) = \begin{bmatrix} 2 & 89 & 162 \\ 18 & 11 & 81 \end{bmatrix}$

$(\mathbf \nabla \mathbf f(\mathbf x))^T = \begin{bmatrix} 2 & 18 \\ 89 & 11 \\ 162 & 81 \end{bmatrix}$
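The values above can be checked numerically. The plain-Kotlin sketch below makes no **diffkt** calls. It approximates the M x N (2 x 3) Jacobian of $\mathbf f = (g, h)$ at $\mathbf x = [1, 2, 3]$ with central differences, then transposes it into the N x M (3 x 2) layout that the reverse derivative functions return. `f2`, `jacobianMxN`, and `transpose` are hypothetical helpers, and the step size is an assumption.

```kotlin
import kotlin.math.abs

// Plain-Kotlin sketch (no diffkt): central-difference Jacobian of f = (g, h)
// at x = [1, 2, 3], plus its transpose, to show the two layouts.
fun f2(x: DoubleArray): DoubleArray {
    val g = 2 * x[0] + 2 * x[1] * x[1] + 3 * x[1] * x[2] * x[2] * x[2]
    val h = 3 * x[0] * x[0] * x[0] * x[1] + 2 * x[1] * x[1] + 3 * x[2] * x[2] * x[2]
    return doubleArrayOf(g, h)
}

// M x N Jacobian: one row per output, one column per input (the forward layout)
fun jacobianMxN(f: (DoubleArray) -> DoubleArray, x: DoubleArray, h: Double = 1e-5): Array<DoubleArray> {
    val m = f(x).size
    val jac = Array(m) { DoubleArray(x.size) }
    for (j in x.indices) {
        val up = x.copyOf().also { it[j] += h }
        val down = x.copyOf().also { it[j] -= h }
        val fUp = f(up)
        val fDown = f(down)
        for (i in 0 until m) jac[i][j] = (fUp[i] - fDown[i]) / (2.0 * h)
    }
    return jac
}

// N x M transpose: one row per input (the layout the reverse functions return)
fun transpose(a: Array<DoubleArray>): Array<DoubleArray> =
    Array(a[0].size) { j -> DoubleArray(a.size) { i -> a[i][j] } }

val x0 = doubleArrayOf(1.0, 2.0, 3.0)
val forwardLayout = jacobianMxN({ f2(it) }, x0)
val reverseLayout = transpose(forwardLayout)

println("f(x) = ${f2(x0).joinToString()}")
println("M x N: ${forwardLayout.joinToString { it.joinToString(prefix = "[", postfix = "]") { v -> "%.3f".format(v) } }}")
println("N x M: ${reverseLayout.joinToString { it.joinToString(prefix = "[", postfix = "]") { v -> "%.3f".format(v) } }}")
```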

```kotlin
fun f(x: DTensor) : DTensor {
    val g = 2f * x[0] + 2f * x[1].pow(2f) + 3f * x[1] * x[2].pow(3f)
    val h = 3f * x[0].pow(3f) * x[1] + 2f * x[1].pow(2f) + 3f * x[2].pow(3f)

    // combine the two scalar outputs into a 1D tensor with two elements
    return tensorOf(g as DScalar, h as DScalar)
}

val x = tensorOf(1f, 2f, 3f)
val (forwardFx, forwardJacobian) = primalAndForwardDerivative(x, ::f)
val (reverseFx, reverseJacobian) = primalAndReverseDerivative(x, ::f)

println("x = ${x}")
println("forward f(x) = ${forwardFx}")
println("forward Jacobian(f(x)) = ${forwardJacobian}")
println("reverse f(x) = ${reverseFx}")
println("reverse Jacobian(f(x)) = ${reverseJacobian}")
```



#### The output should be

`x = [1.0, 2.0, 3.0]` <br>
`forward f(x) = [172.0, 95.0]` <br>
`forward Jacobian(f(x)) = [[2.0, 89.0, 162.0], [18.0, 11.0, 81.0]]` <br>
`reverse f(x) = [172.0, 95.0]` <br>
`reverse Jacobian(f(x)) = [[2.0, 18.0], [89.0, 11.0], [162.0, 81.0]]` <br>

#### Observations

The reverse derivative algorithm returns the transpose of the Jacobian.

### Derivative of a Dot Product Function

Given a 1D tensor of constants, $\mathbf c = \left[ c_1, c_2, \cdots, c_n \right]$, and a 1D tensor of variables, $\mathbf x = \left[ x_1, x_2, \cdots, x_n \right]$, define $f(\mathbf x)$ as the dot product of $\mathbf c$ with the element-wise square of $\mathbf x$: $f(\mathbf x) = \sum_{i=1}^{n} c_i x_i^2 = c_1 x_1^2 + c_2 x_2^2 + \cdots + c_n x_n^2$.

Because this function has a single, scalar output, its derivative is the gradient $\nabla f(\mathbf x) = \left[ \frac{\partial f(\mathbf x)}{\partial x_1}, \frac{\partial f(\mathbf x)}{\partial x_2}, \cdots, \frac{\partial f(\mathbf x)}{\partial x_n} \right]$, where $\frac{\partial f(\mathbf x)}{\partial x_i} = 2 c_i x_i$.

In our case where $ \mathbf c = \left [ 1, 2, 3 \right ]$ and $ \mathbf x = \left [1, 2, 3 \right ]$ then $ \nabla f(\mathbf x) = \left[ 2, 8, 18 \right]$.
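The sketch below illustrates why the reverse derivative algorithm suits a function with many inputs and one output. It implements a minimal reverse mode in plain Kotlin. The forward evaluation records each operation and its local partial derivatives. A single backward pass from the output then applies the chain rule to fill in every partial derivative at once. This is a common textbook construction, not code from **diffkt**; `Node` and `backward` are hypothetical names.

```kotlin
// Minimal reverse-mode sketch (plain Kotlin, not diffkt code).
// Each node stores its value and, for each input node, the local partial derivative.
class Node(val value: Double, val parents: List<Pair<Node, Double>> = emptyList()) {
    var grad = 0.0
}

operator fun Node.plus(o: Node) = Node(value + o.value, listOf(this to 1.0, o to 1.0))
operator fun Node.times(o: Node) = Node(value * o.value, listOf(this to o.value, o to value))

// Reverse pass: visit nodes in reverse topological order and push each node's
// gradient to its parents using the chain rule.
fun backward(output: Node) {
    val order = mutableListOf<Node>()
    val seen = HashSet<Node>()
    fun visit(n: Node) {
        if (seen.add(n)) {
            n.parents.forEach { visit(it.first) }
            order.add(n)
        }
    }
    visit(output)
    output.grad = 1.0
    for (n in order.asReversed()) {
        for ((p, localGrad) in n.parents) p.grad += n.grad * localGrad
    }
}

// f(x) = sum_i c_i * x_i^2 with constants c = [1, 2, 3] and x = [1, 2, 3]
val c = listOf(1.0, 2.0, 3.0).map { Node(it) }
val xs = listOf(1.0, 2.0, 3.0).map { Node(it) }

var y = Node(0.0)
for (i in xs.indices) y = y + c[i] * (xs[i] * xs[i])

backward(y) // one reverse pass yields all three partial derivatives
println("f(x) = ${y.value}")
println("grad = ${xs.map { it.grad }}")
```

It prints `f(x) = 36.0` and `grad = [2.0, 8.0, 18.0]`, matching the analytic gradient, from a single backward pass.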

We will demonstrate calculating the derivative of this dot product in three different ways: using a loop, using the `sum` function, and using the `innerProduct` function.

```kotlin
// Dot Product using a loop

val c = tensorOf(1f, 2f, 3f)

fun f(x: DTensor) : DTensor {
    // Declare y as DTensor: arithmetic operations on tensors return DTensor,
    // so the accumulator needs that type rather than FloatScalar.
    var y : DTensor = FloatScalar(0f)

    for (i in 0 until x.size) {
        y = y + c[i] * x[i].pow(2f)
    }
    return y
}

val x = tensorOf(1f, 2f, 3f)

val (fx, grad) = primalAndReverseDerivative(x, ::f)

println("c = ${c}")
println("x = ${x}")
println("f(x) = ${fx}")
println("grad(f(x)) = ${grad}")
```

c = [1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0]
f(x) = 36.0
grad(f(x)) = [2.0, 8.0, 18.0]


In [13]:
// Dot Product using sum

val c = tensorOf(1f, 2f, 3f)  
    
fun f(x: DTensor) : DTensor {
    
    val y = (c * x.pow(2f)).sum()
    return y
}

val x = tensorOf(1f, 2f, 3f)

val (fx, grad) = primalAndReverseDerivative(x, ::f)

println("c = ${c}")
println("x = ${x}")
println("f(x) = ${fx}")
println("grad(f(x)) = ${grad}")


c = [1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0]
f(x) = 36.0
grad(f(x)) = [2.0, 8.0, 18.0]


In [14]:
// Dot Product using innerProduct

val c = tensorOf(1f, 2f, 3f)  
    
fun f(x: DTensor) : DTensor {
    
    val y = c.innerProduct(Shape(3), x.pow(2f))
    return y
}

val x = tensorOf(1f, 2f, 3f)

val (fx, grad) = primalAndReverseDerivative(x, ::f)

println("c = ${c}")
println("x = ${x}")
println("f(x) = ${fx}")
println("grad(f(x)) = ${grad}")


c = [1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0]
f(x) = 36.0
grad(f(x)) = [2.0, 8.0, 18.0]


#### The output should be:

`c = [1.0, 2.0, 3.0]` <br>
`x = [1.0, 2.0, 3.0]` <br>
`f(x) = 36.0` <br>
`grad(f(x)) = [2.0, 8.0, 18.0]`

#### Observations

The tensor variables are created with __[tensorOf](http://www.diffkt.org/api/api/org.diffkt/tensor-of.html)__<br>

The following tensor operations were used:

__['+'](http://www.diffkt.org/api/api/org.diffkt/plus.html)__ or __[plus](http://www.diffkt.org/api/api/org.diffkt/plus.html)__<br>
__['*'](http://www.diffkt.org/api/api/org.diffkt/times.html)__ or __[times](http://www.diffkt.org/api/api/org.diffkt/times.html)__<br>
__[pow](http://www.diffkt.org/api/api/org.diffkt/pow.html)__<br>
__[sum](http://www.diffkt.org/api/api/org.diffkt/sum.html)__<br>
__[innerProduct](http://www.diffkt.org/api/api/org.diffkt/inner-product.html)__<br>

The following function was used to calculate both the function value (the primal) and its derivative:

__[primalAndReverseDerivative](http://www.diffkt.org/api/api/org.diffkt/primal-and-reverse-derivative.html)__
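To make the idea behind reverse mode concrete, the sketch below hand-codes a forward pass and a backward (adjoint) pass for this one function in plain Kotlin. It only illustrates the general technique of reverse-mode differentiation on a fixed expression; it is not how diffkt is implemented, and the name `primalAndGradientByHand` is hypothetical.

```kotlin
// Hypothetical, hand-written reverse mode for f(x) = sum_i c_i * x_i^2.
// Returns the primal value f(x) and the gradient, mirroring the
// (value, derivative) pair that primalAndReverseDerivative returns.
fun primalAndGradientByHand(c: DoubleArray, x: DoubleArray): Pair<Double, DoubleArray> {
    require(c.size == x.size) { "c and x must have the same length" }
    val n = x.size

    // Forward pass: record the intermediates s_i = x_i^2 and t_i = c_i * s_i.
    val s = DoubleArray(n) { i -> x[i] * x[i] }
    val t = DoubleArray(n) { i -> c[i] * s[i] }
    val y = t.sum()

    // Backward pass: propagate adjoints (dy / d node) from the output back to x.
    val yBar = 1.0                                          // dy/dy = 1
    val tBar = DoubleArray(n) { yBar }                      // y = sum t_i       => dy/dt_i = 1
    val sBar = DoubleArray(n) { i -> tBar[i] * c[i] }       // t_i = c_i * s_i   => dt_i/ds_i = c_i
    val xBar = DoubleArray(n) { i -> sBar[i] * 2.0 * x[i] } // s_i = x_i^2       => ds_i/dx_i = 2 x_i

    return Pair(y, xBar)
}
```

A single backward pass yields every component of the gradient at once, which is why reverse mode suits functions with many inputs and one output.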

Worked by hand:

$ f(\mathbf x) = 1 \cdot 1^2 + 2 \cdot 2^2 + 3 \cdot 3^2 = 1 + 8 + 27 = 36 $<br>
$ \nabla f(\mathbf x) = \left[ 2 \cdot 1 \cdot 1, 2 \cdot 2 \cdot 2, 2 \cdot 3 \cdot 3 \right] = \left[ 2, 8, 18 \right] $


### A More Complex Example

This example uses a wider range of tensor operations:

__['+'](http://www.diffkt.org/api/api/org.diffkt/plus.html)__ or __[plus](http://www.diffkt.org/api/api/org.diffkt/plus.html)__,<br>
__['-'](http://www.diffkt.org/api/api/org.diffkt/minus.html)__ or __[minus](http://www.diffkt.org/api/api/org.diffkt/minus.html)__,<br>
__['*'](http://www.diffkt.org/api/api/org.diffkt/times.html)__ or __[times](http://www.diffkt.org/api/api/org.diffkt/times.html)__,<br>
__['/'](http://www.diffkt.org/api/api/org.diffkt/div.html)__ or __[div](http://www.diffkt.org/api/api/org.diffkt/div.html)__,<br>
__[pow](http://www.diffkt.org/api/api/org.diffkt/pow.html)__,<br>
__[sin](http://www.diffkt.org/api/api/org.diffkt/sin.html)__,<br>
__[cos](http://www.diffkt.org/api/api/org.diffkt/cos.html)__, and<br>
__[matmul](http://www.diffkt.org/api/api/org.diffkt/matmul.html)__.<br>

In [15]:
// A more complex example
 
val c = tensorOf(1f, 2f, 3f, 4f).reshape(2, 2)

fun f(x:DTensor) : DTensor {
        
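    // Start from a 2x2 zero tensor; declare DTensor because arithmetic returns DTensor
    var y : DTensor = tensorOf(0f, 0f, 0f, 0f).reshape(2, 2)
    
    // i = 0 adds c * sin(x)^0 = c, i = 1 subtracts c * cos(x), i = 2 adds c * sin(x)^2
    for (i in 0..2) {
        if (i % 2 == 0) 
            y = y + c * sin(x).pow(i)
        else
            y = y - c * cos(x).pow(i)
    }
    
    y = y / 2f
    
    // Multiply by the 2x2 matrix 2I, which undoes the division by 2
    val scale : DTensor = tensorOf(2f, 0f, 0f, 2f).reshape(2, 2)
    y = y.matmul(scale)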
    
    return y
    
}

val x = tensorOf(1f, 2f, 3f, 4f).reshape(2, 2)
val (fx, jacobian) = primalAndReverseDerivative(x, ::f)

println("x = ${x}")
println("f(x) = ${fx}")
println("Jacobian(f(x)) = ${jacobian}")

x = [[1.0, 2.0], [3.0, 4.0]]
f(x) = [[1.1677711, 4.485937], [6.0297217, 8.905575]]
Jacobian(f(x)) = [[[[1.7507683, 0.0], [0.0, 0.0]], [[0.0, 0.3049898], [0.0, 0.0]]], [[[0.0, 0.0], [-0.41488642, 0.0]], [[0.0, 0.0], [0.0, 0.930223]]]]


#### The output should be:

`x = [[1.0, 2.0], [3.0, 4.0]]` <br>
`f(x) = [[1.1677711, 4.485937], [6.0297217, 8.905575]]` <br>
`Jacobian(f(x)) = [[[[1.7507683, 0.0], [0.0, 0.0]], [[0.0, 0.3049898], [0.0, 0.0]]], [[[0.0, 0.0], [-0.41488642, 0.0]], [[0.0, 0.0], [0.0, 0.930223]]]]`

#### Observations

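Here $\mathbf f$ maps a $2 \times 2$ tensor to a $2 \times 2$ tensor, so its derivative is a Jacobian, returned as a 4D tensor of shape $2 \times 2 \times 2 \times 2$. Because $\mathbf f$ acts element-wise, each output entry depends only on the input entry at the same position, so the only nonzero values are those where the input and output indices match.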
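Expanding the loop explains both outputs. For $i = 0$ the loop adds $\mathbf c \cdot \sin(\mathbf x)^0 = \mathbf c$, for $i = 1$ it subtracts $\mathbf c \cdot \cos(\mathbf x)$, and for $i = 2$ it adds $\mathbf c \cdot \sin^2(\mathbf x)$, all element-wise. Dividing by 2 and then multiplying by the scaling matrix $2I$ cancel, so each entry is $f_{jk} = c_{jk}\left(1 - \cos x_{jk} + \sin^2 x_{jk}\right)$ with $\frac{\partial f_{jk}}{\partial x_{jk}} = c_{jk}\left(\sin x_{jk} + \sin 2x_{jk}\right)$. The plain-Kotlin sketch below does not use diffkt, and its helper names are our own. It reproduces the notebook's function on a flattened row-major array and checks the closed form against central finite differences.

```kotlin
import kotlin.math.cos
import kotlin.math.pow
import kotlin.math.sin

// A 2x2 tensor is stored row-major as a DoubleArray of length 4: [a00, a01, a10, a11].
val cValues = doubleArrayOf(1.0, 2.0, 3.0, 4.0)

// Plain 2x2 matrix product, standing in for matmul.
fun matmul2x2(a: DoubleArray, b: DoubleArray): DoubleArray = doubleArrayOf(
    a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
    a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]
)

// Mirrors the notebook's f: the same loop, the division by 2, and the product with 2I.
fun complexF(x: DoubleArray): DoubleArray {
    val y = DoubleArray(4) // starts as the 2x2 zero tensor
    for (i in 0..2) {
        for (k in 0 until 4) {
            if (i % 2 == 0) y[k] += cValues[k] * sin(x[k]).pow(i)
            else y[k] -= cValues[k] * cos(x[k]).pow(i)
        }
    }
    for (k in 0 until 4) y[k] /= 2.0
    return matmul2x2(y, doubleArrayOf(2.0, 0.0, 0.0, 2.0))
}

// Closed-form diagonal of the Jacobian: c * (sin x + sin 2x).
fun diagonalDerivative(x: DoubleArray): DoubleArray =
    DoubleArray(4) { k -> cValues[k] * (sin(x[k]) + sin(2.0 * x[k])) }

// Central-difference Jacobian indexed jac[input][output], input first like the reverse-mode result.
fun finiteDifferenceJacobian(x: DoubleArray, h: Double = 1e-6): Array<DoubleArray> =
    Array(4) { p ->
        val fPlus = complexF(x.copyOf().also { it[p] += h })
        val fMinus = complexF(x.copyOf().also { it[p] -= h })
        DoubleArray(4) { q -> (fPlus[q] - fMinus[q]) / (2 * h) }
    }
```

For `x = [1.0, 2.0, 3.0, 4.0]`, `complexF` agrees with the printed `f(x)` and `diagonalDerivative` with the nonzero Jacobian entries to roughly single-precision accuracy, while every off-diagonal finite difference is zero.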

## The End

This notebook demonstrated how to create a differentiable tensor, how to write a function that applies operations to that tensor, and how to compute the function's derivative with **diffkt**.