# References

1. Intro to autodifferentiation:<br>
- https://www.tensorflow.org/guide/autodiff
- https://www.neidinger.net/SIAMRev74362.pdf
- https://jmlr.org/papers/volume18/17-468/17-468.pdf
2. Symbolic differentiation:<br>https://en.wikipedia.org/wiki/Computer_algebra
3. Numerical differentiation:<br>https://en.wikipedia.org/wiki/Numerical_differentiation

# Necessary imports

In [1]:
import tensorflow as tf # For accessing TensorFlow library

# Definition

**(Includes discussions on surrounding and related concepts)**

Machine learning methods require the evaluation of derivatives _(e.g. gradient descent requires the gradient of a model's loss function)_. In the context of computers, a function is a program or subroutine that takes some input and produces a corresponding output. Here, differentiation means calculating the gradient of a computation with respect to some of its inputs. $n$th-order differentiation of a function can be done in the following ways:
- Symbolic differentiation
- Numerical differentiation
- Automatic differentiation (also called algorithmic differentiation)

**Symbolic differentiation** is the differentiation of a symbolic mathematical expression using mathematical techniques such as differentiation rules, factorization and simplification. Converting a computer program into a viable mathematical expression is itself a major challenge, which makes this method undesirable for differentiating functions in the context of computers. However, this method offers more mathematical rigour, making it more useful in scientific computations.
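
As a rough illustration of differentiation rules applied to an expression, here is a minimal sketch in plain Python. The tuple-based expression format and the helper names `diff` and `evaluate` are assumptions made for this example only; a real computer algebra system would also simplify the result.

```python
# Expressions are nested tuples: ("add", a, b), ("mul", a, b), ("pow", a, n),
# the string "x" for the variable, or a plain number for a constant.
# This representation is hypothetical and only serves the illustration.

def diff(expr):
    """Return the derivative of expr with respect to x as a new expression."""
    if expr == "x":
        return 1
    if isinstance(expr, (int, float)):
        return 0
    op, a, b = expr
    if op == "add":  # sum rule
        return ("add", diff(a), diff(b))
    if op == "mul":  # product rule
        return ("add", ("mul", diff(a), b), ("mul", a, diff(b)))
    if op == "pow":  # power rule plus chain rule (b is a constant exponent)
        return ("mul", ("mul", b, ("pow", a, b - 1)), diff(a))
    raise ValueError(f"unknown operation: {op}")

def evaluate(expr, x):
    """Evaluate an expression at a numeric value of x."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    op, a, b = expr
    if op == "add":
        return evaluate(a, x) + evaluate(b, x)
    if op == "mul":
        return evaluate(a, x) * evaluate(b, x)
    if op == "pow":
        return evaluate(a, x) ** b
    raise ValueError(f"unknown operation: {op}")

y = ("add", ("pow", "x", 2), ("pow", "x", 3))  # y = x**2 + x**3
dy = diff(y)
print(dy)                                       # unsimplified derivative expression
print([evaluate(dy, v) for v in (1, 2, 3)])     # [5, 16, 33]
```

The printed derivative is noticeably larger than the input expression, which hints at why simplification matters for this approach.
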
<br><br>
**Numerical differentiation**, i.e. the method of finite differences, is the differentiation of a function using a finite number of difference quotients i.e. $\frac{f(x + h)-f(x)}{h}$, or other related formulae (here, $x$ refers to the input, $f$ refers to the function, $h$ refers to the step-value). For example, to calculate the gradient of a function at a point (i.e. for a given input value), difference quotients are taken a finite number of times for a certain range of values around the input value, using a fixed step-value i.e. $h$.
<br><br>
These difference quotients are then used to provide an estimate for the gradient at the desired point (either through averaging the difference quotients, or through some other method). The main problems with this method are truncation error, because the step-value $h$ is finite rather than infinitesimal, and round-off error, because subtracting nearly equal floating-point values loses precision when $h$ is made very small. As a result, the range of values around the required point cannot be made arbitrarily small.
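
The trade-off in the step-value can be seen in a small sketch. It uses the forward difference quotient from the text and, as one of the "other related formulae", a central difference; the function names and the chosen values of $h$ are assumptions made for this example.

```python
def f(x):
    return x**2 + x**3

def forward_difference(f, x, h):
    # The difference quotient (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

def central_difference(f, x, h):
    # A related formula that samples on both sides of x
    return (f(x + h) - f(x - h)) / (2 * h)

exact = 16.0  # f'(2) = 2*2 + 3*2**2

# Large h: the estimate is dominated by truncation error.
# Tiny h: the estimate is dominated by floating-point round-off.
for h in (1e-1, 1e-4, 1e-8, 1e-12):
    fwd = forward_difference(f, 2.0, h)
    ctr = central_difference(f, 2.0, h)
    print(f"h={h:.0e}  forward error={abs(fwd - exact):.2e}  "
          f"central error={abs(ctr - exact):.2e}")
```
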
<br><br>
**Automatic differentiation**, also called computational differentiation, algorithmic differentiation, or simply autodifferentiation or autodiff, is a set of techniques to evaluate the derivative of a function specified by a computer program. It exploits the fact that every computer program (of any complexity) executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division...) and elementary functions (exp, log, sin, cos...). By applying the chain rule repeatedly to these operations, derivatives of any order can be computed automatically, i.e. as the function itself is being evaluated. The results are accurate to working precision, and use at most a small constant factor more arithmetic operations than the original program.
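
One way to apply the chain rule while the function is being evaluated is to carry a derivative alongside every value (often called forward-mode autodiff with dual numbers). The sketch below is illustrative only: the `Dual` class and the `sin` helper are names invented for this example and are not part of TensorFlow.

```python
import math

class Dual:
    """A value paired with its derivative with respect to the input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0)
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule applied to one elementary multiplication
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def sin(d):
    # Elementary function: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

def f(x):
    return x * x + x * x * x  # x**2 + x**3 written with elementary operations

for v in (1.0, 2.0, 3.0):
    out = f(Dual(v, 1.0))          # seed the input with derivative 1
    print(out.value, out.deriv)    # derivatives are 5.0, 16.0 and 33.0

print(sin(Dual(0.0, 1.0)).deriv)   # cos(0) = 1.0
```
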

_**In the context of a neural network**_, to differentiate the loss function automatically, the program needs to remember which operations happen in which order during the forward pass (i.e. the computation that produces the output from a given input by traversing the network's layers from the input layer to the output layer). Then, during the backward pass (i.e. backpropagation), the program can traverse this list of operations in reverse order to compute gradients for each layer and hence for the whole model.
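
The record-then-reverse idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not how TensorFlow is implemented: `Var`, `recorded_ops`, `add`, `mul` and `backward` are hypothetical names chosen for the illustration.

```python
class Var:
    """A scalar value that accumulates a gradient during the backward pass."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

# Operations recorded during the forward pass, in execution order.
# Each entry is (output, [(input, local_derivative), ...]).
recorded_ops = []

def add(a, b):
    out = Var(a.value + b.value)
    recorded_ops.append((out, [(a, 1.0), (b, 1.0)]))
    return out

def mul(a, b):
    out = Var(a.value * b.value)
    recorded_ops.append((out, [(a, b.value), (b, a.value)]))
    return out

def backward(output):
    # Traverse the recorded operations in reverse order and apply the
    # chain rule, accumulating gradients into each input.
    output.grad = 1.0
    for out, inputs in reversed(recorded_ops):
        for inp, local in inputs:
            inp.grad += local * out.grad

# Forward pass: y = x**2 + x**3 at x = 2
x = Var(2.0)
x2 = mul(x, x)
x3 = mul(x2, x)
y = add(x2, x3)

backward(y)
print(y.value, x.grad)  # 12.0 16.0
```
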

# Gradient tape

TensorFlow provides the **GradientTape** API _(that creates an interface between the program and the TensorFlow library)_ for autodifferentiation, which involves the calculation of the gradient of a computation process with respect to some inputs, usually TensorFlow **variables**.
<br><br>Note that a **GradientTape** does not compute derivatives while the computation runs. Rather, it keeps track of the relevant operations that constitute a function (computation), so that gradients can be calculated from them afterwards.
<br><br>
TensorFlow contains the means to record relevant operations executed inside the context (i.e. scope) of a **GradientTape** onto a data structure called **tape**. TensorFlow then uses the **tape** to calculate the gradients of a recorded computation process.

In [3]:
# Independent variable of the computation
x = tf.Variable([1, 2, 3], dtype=tf.float32)

# Computations recorded inside the context of a GradientTape
with tf.GradientTape() as tape:
    y = x**2 + x**3

# Evaluating the gradient of y = x**2 + x**3 for each value of x (i.e. 1, 2 and 3).
# The gradient of y with respect to x in this case is given by
#     y' = 2*x + 3*x**2
# Using this derivative, we can confirm the results of autodifferentiation below.
print(tape.gradient(y, x))

tf.Tensor([ 5. 16. 33.], shape=(3,), dtype=float32)


**NOTE 1: INDEPENDENT VARIABLE REQUIREMENTS**
<br>**1.1.**<br>
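The order of the arguments in the **.gradient** function determines the independent variable of the function: the first argument is the target (the dependent variable) and the second is the source (the independent variable), as in **tape.gradient(y, x)**.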
<br>**1.2.**<br>
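Note that the **.gradient** function looks for the attribute **\_id** in the independent variable. Not all object types or collection types have this attribute. Hence, we must use TensorFlow object types such as **Variable** or **Tensor**, which do have this attribute. For a **Module**, pass its **trainable_variables** as the sources. A **Tensor** that is not a trainable **Variable** is only recorded if it is explicitly watched with **tape.watch**.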
<br>**1.3.**<br>
Autodifferentiation is only performed for floating point values. Hence, ensure that the **dtype** attribute of the independent variable is some form of floating point (e.g. **tf.float32**). For an integer dtype, **.gradient** returns **None**.
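
The requirements above can be checked with a short sketch, assuming TensorFlow 2.x in eager mode. The variable names are chosen for this example only.

```python
import tensorflow as tf

# A constant Tensor is not watched automatically, so it is watched explicitly.
c = tf.constant([1.0, 2.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(c)
    y = c**2 + c**3
grad_c = tape.gradient(y, c)
print(grad_c)  # same values as with a Variable: 5, 16, 33

# An integer Tensor is not differentiable, even when watched.
n = tf.constant(10)
with tf.GradientTape() as tape:
    tape.watch(n)
    y = n * n
grad_n = tape.gradient(y, n)
print(grad_n)  # None
```

TensorFlow may also log a warning about the integer dtype when this runs.
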
<br><br>
**NOTE 2: PERSISTENCE OF RESULT**<br>
Note that a **GradientTape** is not persistent by default, i.e. the resources it holds are released as soon as **.gradient** is called, so **.gradient** can only be called once per tape and its result is not stored anywhere. To keep the result, assign it to an identifier, as in **d = tape.gradient(y, x)**. To compute several gradients over the same computation, create the tape with **tf.GradientTape(persistent=True)**.
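
A brief sketch of both behaviours, assuming TensorFlow 2.x in eager mode. The names `d`, `ptape`, `dy`, `dz` and `second_call_failed` are illustrative.

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

# Default (non-persistent) tape: a single call to gradient is allowed.
with tf.GradientTape() as tape:
    y = x**2 + x**3
d = tape.gradient(y, x)  # the stored result can be reused freely

second_call_failed = False
try:
    tape.gradient(y, x)  # second call on the same non-persistent tape
except RuntimeError:
    second_call_failed = True
print("second call failed:", second_call_failed)

# Persistent tape: gradient can be called several times.
with tf.GradientTape(persistent=True) as ptape:
    y = x**2
    z = x**3
dy = ptape.gradient(y, x)  # 2*x
dz = ptape.gradient(z, x)  # 3*x**2
del ptape  # drop the reference so the tape's resources can be released
print(dy, dz)
```
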
