# Demystifying Neural Networks 

---

# Exercises - Autograd

Let's do some differentiation!

To start, install `autograd`.

In [None]:
%%bash
conda install -y -c conda-forge autograd

Or

In [None]:
%%bash
pip install autograd

The main function in `autograd` is `grad`.
Given a function, `grad` returns a new function that computes its derivative
with respect to its first argument
(or another positional argument, chosen with `argnum=`),
evaluated at whatever point you call it with.
`autograd` also wraps `numpy`, so functions written with `autograd.numpy`
can be differentiated with respect to `numpy` arrays.

In [1]:
from autograd import numpy as np
from autograd import grad
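To make the call pattern concrete, here is a minimal stdlib-only sketch of a `grad`-like function. It is *not* how `autograd` works internally (`autograd` uses reverse-mode automatic differentiation and returns exact derivatives); this sketch approximates the derivative with a central finite difference. The name `numeric_grad` and the step size `h` are hypothetical choices for illustration; `argnum` mirrors the keyword described above.

```python
def numeric_grad(fun, argnum=0, h=1e-6):
    """Return a function approximating the derivative of fun w.r.t. args[argnum].

    Hypothetical helper: uses the central difference
    (f(x + h) - f(x - h)) / (2h), not automatic differentiation.
    """
    def grad_fun(*args):
        # Perturb only the selected argument; keep the others fixed.
        plus = list(args)
        minus = list(args)
        plus[argnum] = args[argnum] + h
        minus[argnum] = args[argnum] - h
        return (fun(*plus) - fun(*minus)) / (2 * h)
    return grad_fun


def fun(x):
    return (x - 3)**2


def g(x, y):
    return x * y**2


print(numeric_grad(fun)(7.))              # close to 2 * (7 - 3) = 8
print(numeric_grad(g, argnum=1)(2., 3.))  # close to 2 * x * y = 12
```

A numerical version like this depends on the choice of `h`; `autograd`'s `grad` avoids that trade-off, which is why the sketch is mainly useful for sanity-checking derivatives.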

Let's start with an example. Take the function

$$
f(x) = (x - 3)^2
$$

And we know that

$$
\frac{df}{dx} = 2(x - 3)
$$

Now we will use `grad` to do this for us.
Note that `grad` does not produce a formula: it returns a function that evaluates the derivative at a given point, so we will use $x = 7$.

Note: **`autograd` only differentiates with respect to floating point values**,
so don't get caught out by writing integer literals: use `7.` rather than `7`.

In [2]:
def fun(x):
    return (x - 3)**2

fun_grad = grad(fun)
x = 7.
print(fun(x))
print(fun_grad(7.))

16.0
8.0


This is exactly right: $(7 - 3)^2 = 16$ and $2(7 - 3) = 8$.
Now try it yourself.

#### 1. Evaluate the function $f(x) = (x - 10)^2$ and its derivative (using `autograd`) at

A) $x = 3$

B) $x = 10$

C) $x = 12$

In [3]:
def f(x):
    return (x - 10)**2

f_grad = grad(f)
print(f_grad(3.))
print(f_grad(10.))
print(f_grad(12.))

-14.0
0.0
4.0
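The cell above prints only the derivatives, although the exercise also asks for $f(x)$ itself. A minimal plain-Python check, using the closed form $f'(x) = 2(x - 10)$ that exercise 2 derives on paper, gives both values at each point; the derivatives should agree with the `autograd` output.

```python
def f(x):
    return (x - 10)**2


def f_prime(x):
    # Power rule: d/dx (x - 10)^2 = 2(x - 10)
    return 2 * (x - 10)


for x in (3., 10., 12.):
    print(x, f(x), f_prime(x))
# 3.0 49.0 -14.0
# 10.0 0.0 0.0
# 12.0 4.0 4.0
```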


#### 2. Evaluate, on paper, the derivative of $f(x) = (x - 10)^2$

Note that in this case we can mostly ignore the $-10$
and use the power rule for derivatives:

$$
\frac{dx^n}{dx} = n \cdot x^{(n - 1)}
$$

Do it by analogy with the derivative of $(x - 3)^2$ above.

$$
\frac{d(x - 10)^2}{dx} = 2(x - 10)
$$
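The $-10$ can be ignored because of the chain rule: with the inner function $u = x - 10$, whose derivative with respect to $x$ is $1$,

```latex
\frac{d(x - 10)^2}{dx}
  = \frac{du^2}{du} \cdot \frac{du}{dx}
  = 2u \cdot 1
  = 2(x - 10)
```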

We can see that $2(3 - 10) = -14$, $2(10 - 10) = 0$ and $2(12 - 10) = 4$, matching the `autograd` results.

#### 3. Use `autograd` to evaluate the (slightly more complex) derivatives of the following functions

A) $f(x) = xe^x$ at $x = 7$

B) $f(x) = \frac{x + 2^x}{e^x}$ at $x = 3.5$

C) $f(x) = \ln(2 + x) + \cos^2(x)$ at $x = -0.5$

Note: $e$ is Euler's number, available in `numpy` as `np.e` (and `np.exp(x)` computes $e^x$);
$\ln$ is the natural logarithm (base $e$), which `numpy` provides as `np.log`.
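For checking the `autograd` results by hand, the derivatives follow from the product rule, the quotient rule, $\frac{d\,2^x}{dx} = 2^x \ln 2$ and the chain rule (with $\cos^2(x)$ read as $(\cos x)^2$):

```latex
\begin{aligned}
\text{A)}\quad \frac{d}{dx}\, x e^x &= e^x + x e^x = (1 + x) e^x \\
\text{B)}\quad \frac{d}{dx}\, \frac{x + 2^x}{e^x} &= \frac{(1 + 2^x \ln 2) - (x + 2^x)}{e^x} \\
\text{C)}\quad \frac{d}{dx}\left[\ln(2 + x) + \cos^2(x)\right] &= \frac{1}{2 + x} - 2\cos(x)\sin(x) = \frac{1}{2 + x} - \sin(2x)
\end{aligned}
```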

In [4]:
def f1(x):
    return x * np.e**x

def f2(x):
    return (x + 2**x)/(np.e**x)

def f3(x):
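    # cos^2(x) means (cos x)^2, not cos(cos(x))
    return np.log(2 + x) + np.cos(x)**2

f1_grad = grad(f1)
f2_grad = grad(f2)
f3_grad = grad(f3)
print(f1(7.), f1_grad(7.))
print(f2(3.5), f2_grad(3.5))
print(f3(-0.5), f3_grad(-0.5))

7676.432108999208 8773.065267427666
0.44733523545030124 -0.18032800393862408

The third printed line (for `f3`) should be approximately `1.1756 1.5081`:
$f(-0.5) = \ln(1.5) + \cos^2(-0.5) \approx 1.1756$ and
$f'(-0.5) = \frac{1}{1.5} - \sin(-1) \approx 1.5081$,
since $\frac{d}{dx}\cos^2(x) = -2\cos(x)\sin(x) = -\sin(2x)$.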
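As a cross-check that does not rely on `autograd`, here is a minimal sketch using only the `math` module. It compares hand-derived derivatives of the three functions as stated in the exercise (reading $\cos^2(x)$ as $(\cos x)^2$) against a numerical approximation from a hypothetical central-difference helper, `central_diff`.

```python
import math


def central_diff(f, x, h=1e-6):
    # Hypothetical helper: numerical derivative (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)


# (function, hand-derived derivative, evaluation point)
cases = [
    (lambda x: x * math.exp(x),
     lambda x: (1 + x) * math.exp(x),
     7.0),
    (lambda x: (x + 2**x) / math.exp(x),
     lambda x: (1 + 2**x * math.log(2) - x - 2**x) / math.exp(x),
     3.5),
    (lambda x: math.log(2 + x) + math.cos(x)**2,
     lambda x: 1 / (2 + x) - math.sin(2 * x),
     -0.5),
]

for f, df, x in cases:
    # point, f(x), exact derivative, numerical derivative
    print(x, f(x), df(x), central_diff(f, x))
```

The numerical column should agree with the exact one to several significant digits; it is an approximation, so small differences in the last digits are expected.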


---

`grad` differentiates with respect to only one argument (by default the first).
However, that argument can be a list, tuple or array,
which lets us differentiate functions of several variables.
For example, we may have

$$
f(x, y) = x^2 + 3y
$$

And we want to differentiate with respect to (w.r.t.) both $x$ and $y$.
We can do the following:

In [5]:
def f(args):
    x, y = args
    return x**2 + 3*y

f_grad = grad(f)
args = [3., 2.]
print(f(args))
print(f_grad(args))

15.0
[array(6.), array(3.)]


The result comes wrapped in some extra `numpy` structure
(each partial derivative is a zero-dimensional array), but `autograd` did compute

$$
\frac{\partial f}{\partial x} = 2x
$$

and

$$
\frac{\partial f}{\partial y} = 3
$$

It got both right: $2 \cdot 3 = 6$ and $3 = 3$.
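Differentiating with respect to a list amounts to taking one partial derivative per element while holding the others fixed. A minimal plain-Python sketch of that idea using central differences (the helper name `numeric_gradient` is hypothetical; `autograd` computes these partials exactly rather than numerically):

```python
def numeric_gradient(f, args, h=1e-6):
    """Approximate [df/d args[0], df/d args[1], ...] by central differences."""
    partials = []
    for i in range(len(args)):
        plus = list(args)    # copies, so the caller's list is not modified
        minus = list(args)
        plus[i] += h         # nudge only the i-th coordinate
        minus[i] -= h
        partials.append((f(plus) - f(minus)) / (2 * h))
    return partials


def f(args):
    x, y = args
    return x**2 + 3*y


print(numeric_gradient(f, [3., 2.]))  # approximately [6.0, 3.0]
```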

#### 4. Evaluate all three partial derivatives of $f(x, y, z) = x^2 \, y \, 3^z$ at $x = 3$, $y = -1$ and $z = 2$

i.e. evaluate $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$
and $\frac{\partial f}{\partial z}$ using `autograd`.

In [6]:
def f(args):
    x, y, z = args
    return x**2*y*3**z

f_grad = grad(f)
print(f([3., -1., 2.]))
print(f_grad([3., -1., 2.]))

-81.0
[array(-54.), array(81.), array(-88.98759538)]


#### 5. (optional) Verify the results from the previous exercise on paper
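For reference, treating the other variables as constants and using $\frac{d\,3^z}{dz} = 3^z \ln 3$, at $x = 3$, $y = -1$, $z = 2$:

```latex
\begin{aligned}
\frac{\partial f}{\partial x} &= 2xy\,3^z = 2 \cdot 3 \cdot (-1) \cdot 9 = -54 \\
\frac{\partial f}{\partial y} &= x^2\,3^z = 9 \cdot 9 = 81 \\
\frac{\partial f}{\partial z} &= x^2 y\,3^z \ln 3 = -81 \ln 3 \approx -88.988
\end{aligned}
```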

#### 6. Evaluate the derivative of matrix multiplication

Given

$$
W = \left[
\begin{matrix}
0.3 & -0.2 & 0.1 \\
0.1 &  0.2 & 0.1 \\
\end{matrix}
\right],
V = \left[
\begin{matrix}
0.1 & -0.3 \\
0.1 &  0.5 \\
0.2 &  0.7 \\
\end{matrix}
\right]
$$

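Use `autograd` to evaluate $\frac{\partial f}{\partial W}$
and $\frac{\partial f}{\partial V}$ for the functions below,
where $\cdot$ denotes matrix multiplication (`@` in `numpy`):

A) $f(W, V) = \operatorname{mean}(W \cdot V)$

B) $f(W, V) = \operatorname{mean}(V \cdot W)$

Question: are the results the same?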

In [7]:
W = np.array([
    [.3, -.2, .1],
    [.1,  .2, .1],
])
V = np.array([
    [.1, -.3],
    [.1,  .5],
    [.2,  .7],
])

In [8]:
def f1(args):
    W, V = args
    return np.mean(W @ V)

def f2(args):
    W, V = args
    return np.mean(V @ W)

f1_grad = grad(f1)
f2_grad = grad(f2)
print(f1([W, V]))
print(f2([W, V]))
print(f1_grad([W, V]))
print(f2_grad([W, V]))

0.025
0.048888888888888885
[array([[-0.05 ,  0.15 ,  0.225],
       [-0.05 ,  0.15 ,  0.225]]), array([[0.1 , 0.1 ],
       [0.  , 0.  ],
       [0.05, 0.05]])]
[array([[0.04444444, 0.04444444, 0.04444444],
       [0.1       , 0.1       , 0.1       ]]), array([[0.02222222, 0.04444444],
       [0.02222222, 0.04444444],
       [0.02222222, 0.04444444]])]
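Why the results differ: for $A$ of shape $m \times n$ and $B$ of shape $n \times p$, $\operatorname{mean}(A \cdot B) = \frac{1}{mp}\sum_{i,j,k} A_{ij} B_{jk}$, so $\frac{\partial f}{\partial A_{ij}} = \frac{1}{mp}\sum_k B_{jk}$ (row sums of $B$, repeated over $i$) and $\frac{\partial f}{\partial B_{jk}} = \frac{1}{mp}\sum_i A_{ij}$ (column sums of $A$, repeated over $k$). Swapping the order changes both the shape of the product ($2 \times 2$ versus $3 \times 3$) and which sums appear. A minimal plain-Python sketch of these closed forms (the helper name `mean_matmul_grads` is hypothetical) should reproduce the `autograd` output above:

```python
def mean_matmul_grads(A, B):
    """Closed-form gradients of mean(A @ B) w.r.t. A and B (lists of rows).

    A is m x n, B is n x p; mean(A @ B) = sum_{i,j,k} A[i][j] * B[j][k] / (m * p).
    """
    m, n, p = len(A), len(B), len(B[0])
    scale = 1.0 / (m * p)
    row_sums_B = [sum(B[j]) for j in range(n)]                       # sum over k
    col_sums_A = [sum(A[i][j] for i in range(m)) for j in range(n)]  # sum over i
    dA = [[scale * row_sums_B[j] for j in range(n)] for _ in range(m)]
    dB = [[scale * col_sums_A[j] for _ in range(p)] for j in range(n)]
    return dA, dB


W = [[.3, -.2, .1],
     [.1,  .2, .1]]
V = [[.1, -.3],
     [.1,  .5],
     [.2,  .7]]

dW1, dV1 = mean_matmul_grads(W, V)  # f1 = mean(W @ V)
dV2, dW2 = mean_matmul_grads(V, W)  # f2 = mean(V @ W): V plays the role of A
print(dW1, dV1)
print(dW2, dV2)
```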
