# Problem 4.3.1 
For Problem 4.2.2, compute the derivatives $\frac{\partial L}{\partial W_0}, \frac{\partial L}{\partial W_1}, \frac{\partial L}{\partial b}$ using the centred difference method with $\epsilon=0.1$, and compare them with the derivatives found in 4.2.2(c).

## Recall:

$X = \begin{bmatrix} 3 & 1 & -1 \\ 1 & -2 & 2 \end{bmatrix}, Y = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$


$W = \begin{bmatrix} W_0 & W_1 \end{bmatrix} = \begin{bmatrix} 2 & -1 \end{bmatrix}, \quad b = -1$


$L = \frac{1}{m} \sum\limits_{j=0}^{m-1} \log(\cosh(A_j-Y_j))$, where $A = WX + b$ (linear activation) and $m = 3$ is the number of samples.

$\frac{\partial L}{\partial W_0} \approx \frac{L(W_0 + \epsilon) -L(W_0 - \epsilon)}{2 \epsilon} = \frac{L(2 + 0.1) -L(2 - 0.1)}{2 (0.1)}$ 

$\frac{\partial L}{\partial W_1} \approx \frac{L(W_1 + \epsilon) -L(W_1 - \epsilon)}{2 \epsilon} = \frac{L(-1 + 0.1) -L(-1 - 0.1)}{2 (0.1)}$ 

$\frac{\partial L}{\partial b} \approx \frac{L(b + \epsilon) -L(b - \epsilon)}{2 \epsilon} = \frac{L(-1 + 0.1) -L(-1 - 0.1)}{2 (0.1)}$ 
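
The three formulas above share one pattern: perturb a single parameter by $\pm\epsilon$, hold the others fixed, and divide the change in loss by $2\epsilon$. A minimal sketch of that pattern as a reusable helper follows. The names `central_difference` and `loss_from_params`, and the packing of the parameters into one vector `[W0, W1, b]`, are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def central_difference(f, theta, eps=0.1):
    """Estimate the gradient of scalar function f at theta, one entry at a time."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = eps  # perturb only this parameter
        grad[idx] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Data from Problem 4.2.2.
X = np.array([[3, 1, -1], [1, -2, 2]])
Y = np.array([[1, 2, 3]])

def loss_from_params(theta):
    # Hypothetical packing: theta = [W0, W1, b].
    W = theta[:2].reshape(1, 2)
    b = theta[2]
    A = np.dot(W, X) + b  # linear activation, A = Z
    return np.mean(np.log(np.cosh(A - Y)))

grad = central_difference(loss_from_params, [2.0, -1.0, -1.0], eps=0.1)
print("Estimated [dL/dW0, dL/dW1, dL/db]: {}".format(grad))
```

The cells below carry out the same computation by hand, one parameter at a time, which makes the intermediate activations and losses visible.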

In [5]:
import numpy as np

In [9]:
X = np.array([[3, 1, -1], [1, -2, 2]])
Y = np.array([[1, 2, 3]])
W = np.array([[2, -1]])
b = np.array([[-1]])
eps = 0.1

In [10]:
# define loss function
def Loss(A, Y):
    return np.mean(np.log(np.cosh(A-Y)))

In [11]:
# estimated dLdW0
print("dLdW0 ****")
W = np.array([[2+eps,-1]])
Z = np.dot(W,X) + b
A = Z
print("A plus: {}".format(A))
Lossp = Loss(A,Y)
print("Loss plus: {}".format(Lossp))
W = np.array([[2-eps,-1]])
Z = np.dot(W,X) + b
A = Z
print("A minus: {}".format(A))
Lossm = Loss(A,Y)
print("Loss minus: {}".format(Lossm))
dLdW0 = (Lossp - Lossm)/2/eps
print("Estimated dL/dW0: {}".format(dLdW0))
print("dL/dW0 from 4.2.2(c): {}".format(1.58225273))

dLdW0 ****
A plus: [[ 4.3  3.1 -5.1]]
Loss plus: 3.509000437933375
A minus: [[ 3.7  2.9 -4.9]]
Loss minus: 3.1926808726987503
Estimated dL/dW0: 1.5815978261731223
dL/dW0 from 4.2.2(c): 1.58225273


In [15]:
# estimated dLdW1
print("dLdW1 ****")
W = np.array([[2,-1+eps]])
Z = np.dot(W,X) + b
A = Z
print("A plus: {}".format(A))
Lossp = Loss(A,Y)
print("Loss plus: {}".format(Lossp))
W = np.array([[2,-1-eps]])
Z = np.dot(W,X) + b
A = Z
print("A minus: {}".format(A))
Lossm = Loss(A,Y)
print("Loss minus: {}".format(Lossm))
dLdW1 = (Lossp - Lossm)/2/eps
print("Estimated dL/dW1: {}".format(dLdW1))
print("dL/dW1 from 4.2.2(c): {}".format(-0.84271104))

dLdW1 ****
A plus: [[ 4.1  2.8 -4.8]]
Loss plus: 3.2688289137383593
A minus: [[ 3.9  3.2 -5.2]]
Loss minus: 3.4368058889465085
Estimated dL/dW1: -0.8398848760407462
dL/dW1 from 4.2.2(c): -0.84271104


In [18]:
# estimated dLdb
print("dLdb ****")
W = np.array([[2, -1]])
b = np.array([[-1+eps]])
Z = np.dot(W,X) + b
A = Z
print("A plus: {}".format(A))
Lossp = Loss(A,Y)
print("Loss plus: {}".format(Lossp))
b = np.array([[-1-eps]])
Z = np.dot(W,X) + b
A = Z
print("A minus: {}".format(A))
Lossm = Loss(A,Y)
print("Loss minus: {}".format(Lossm))
dLdb = (Lossp - Lossm)/2/eps
print("Estimated dL/db: {}".format(dLdb))
print("dL/db from 4.2.2(c): {}".format(0.25221638))

dLdb ****
A plus: [[ 4.1  3.1 -4.9]]
Loss plus: 3.375889763221154
A minus: [[ 3.9  2.9 -5.1]]
Loss minus: 3.3255197139710244
Estimated dL/db: 0.2518502462506489
dL/db from 4.2.2(c): 0.25221638
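
To make the comparison quantitative, the sketch below recomputes the analytic derivatives and reports the relative error of each centred-difference estimate. It assumes the linear activation used in the cells above ($A = Z = WX + b$) and uses the identity $\frac{d}{dx}\log(\cosh x) = \tanh x$. The dictionary names `analytic`, `estimated` and `rel_error` are illustrative, and the estimated values are copied from the outputs above.

```python
import numpy as np

X = np.array([[3, 1, -1], [1, -2, 2]], dtype=float)
Y = np.array([[1, 2, 3]], dtype=float)
W = np.array([[2, -1]], dtype=float)
b = -1.0

# Forward pass with linear activation (A = Z).
A = np.dot(W, X) + b
# d/dA of log(cosh(A - Y)) is tanh(A - Y); the 1/m factor comes from the mean.
dLdA = np.tanh(A - Y) / Y.shape[1]

analytic = {
    "W0": float(np.sum(dLdA * X[0])),
    "W1": float(np.sum(dLdA * X[1])),
    "b": float(np.sum(dLdA)),
}

# Centred-difference estimates copied from the cell outputs above.
estimated = {
    "W0": 1.5815978261731223,
    "W1": -0.8398848760407462,
    "b": 0.2518502462506489,
}

rel_error = {}
for name in analytic:
    rel_error[name] = abs(estimated[name] - analytic[name]) / abs(analytic[name])
    print("dL/d{}: analytic {:.8f}, estimated {:.8f}, relative error {:.2e}".format(
        name, analytic[name], estimated[name], rel_error[name]))
```

Under these assumptions the analytic values agree with those quoted from 4.2.2(c), and each estimate differs from its analytic value by less than 1%, even with the fairly large step $\epsilon = 0.1$.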
