# Problem 4.2.2 

Consider linear regression using the following training data:

Let $X = \begin{bmatrix} 3 & 1 & -1 \\ 1 & -2 & 2 \end{bmatrix}, Y = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$

Assume that 

$W = \begin{bmatrix} W_0 & W_1 \end{bmatrix} = \begin{bmatrix} 2 & -1 \end{bmatrix}, \quad b = -1$


**Log-Cosh Loss:**

$L = \frac{1}{m} \sum\limits_{j=0}^{m-1} \log\cosh(A_j - Y_j)$
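The cells below evaluate the loss directly as `np.log(np.cosh(A-Y))`. That is fine for this data, but `np.cosh` leaves the float64 range once $|x|$ exceeds roughly 710 and returns `inf`. A minimal sketch of a numerically stable alternative (not used in the original solution) rests on the identity $\log\cosh x = |x| + \log(1 + e^{-2|x|}) - \log 2$; the helper name `log_cosh` is hypothetical.

```python
import numpy as np

def log_cosh(x):
    """Numerically stable log(cosh(x)), element-wise (hypothetical helper).

    Uses log(cosh x) = |x| + log1p(exp(-2|x|)) - log(2), which never
    evaluates cosh directly and therefore cannot overflow.
    """
    ax = np.abs(np.asarray(x, dtype=float))
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)

# Agrees with the direct formula where that formula is safe ...
x = np.array([3.0, 1.0, -8.0])
print(np.allclose(log_cosh(x), np.log(np.cosh(x))))  # True
# ... and stays finite where np.cosh would overflow to inf.
print(log_cosh(np.array([1000.0])))
```

Replacing `np.log(np.cosh(A-Y))` with `log_cosh(A-Y)` gives the same loss here while guarding against large residuals.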

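Use the Log-Cosh Loss function to answer the parts below. The model has a linear activation, $A = Z = WX + b$, and there are $m = 3$ training examples (the columns of $X$).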

```python
import numpy as np
```

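```python
# training data: each column of X is one example with 2 features
X = np.array([[3, 1, -1], [1, -2, 2]])
Y = np.array([[1, 2, 3]])
# initial parameters
W = np.array([[2, -1]])
b = np.array([[-1]])
m = X.shape[1]  # number of training examples (3)
```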

**(a)** Perform forward propagation for the above W,b.

```python
# forward propagation: Z = WX + b with a linear (identity) activation
Z = np.dot(W, X) + b
A = Z
print("Z: {}".format(Z))
print("A: {}".format(A))
```

```
Z: [[ 4  3 -5]]
A: [[ 4  3 -5]]
```

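Because the activation is linear, each prediction is $Z_j = W_0 X_{0j} + W_1 X_{1j} + b$, one per column of $X$. Expanding by hand reproduces the output above:

$$
\begin{aligned}
Z_0 &= 2\cdot 3 + (-1)\cdot 1 + (-1) = 4 \\
Z_1 &= 2\cdot 1 + (-1)\cdot(-2) + (-1) = 3 \\
Z_2 &= 2\cdot(-1) + (-1)\cdot 2 + (-1) = -5
\end{aligned}
$$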

**(b)** Compute the loss after forward propagation.

```python
# mean log-cosh loss over the m training examples
L = np.mean(np.log(np.cosh(A - Y)))
print("Loss: {:.8f}".format(L))
```

```
Loss: 3.34998742
```

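Worked by hand, using $A - Y = [3,\ -1 + 2,\ -5 - 3] = [3,\ 1,\ -8]$ from (a) and the fact that $\log\cosh$ is an even function:

$$
L = \frac{1}{3}\left[\log\cosh 3 + \log\cosh 1 + \log\cosh 8\right] \approx \frac{1}{3}\left(2.3093 + 0.4338 + 7.3069\right) \approx 3.3500
$$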

**(c)** Perform back propagation for the above training data and parameter matrices to determine $\nabla_W L$ and $\nabla_b L$.

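The back-propagation cell below rests on three facts: the derivative of $\log\cosh$ is $\tanh$, the linear activation has unit slope, and $Z = WX + b$ is linear in $W$ and $b$. In the notation of the loss above:

$$
\frac{d}{dx}\log\cosh x = \frac{\sinh x}{\cosh x} = \tanh x, \qquad
\frac{\partial L}{\partial A_j} = \frac{1}{m}\tanh(A_j - Y_j), \qquad
\frac{\partial A_j}{\partial Z_j} = 1
$$

$$
\nabla_W L = \frac{\partial L}{\partial Z}\, X^{T}, \qquad
\nabla_b L = \sum_{j=0}^{m-1} \frac{\partial L}{\partial Z_j}
$$

where $\frac{\partial L}{\partial Z}$ is the $1 \times m$ row vector with entries $\frac{\partial L}{\partial A_j}\frac{\partial A_j}{\partial Z_j}$.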
```python
# back propagation
# dL/dA = tanh(A - Y) / m, since d/dx log(cosh(x)) = tanh(x)
grad_AL = np.tanh(A - Y) / m
print("grad_AL: {}".format(grad_AL))
# linear activation A = Z, so dA/dZ = 1
dAdZ = np.ones(Y.shape)
print("dAdZ: {}".format(dAdZ))
# chain rule: dL/dZ = dL/dA * dA/dZ
grad_ZL = grad_AL * dAdZ
print("grad_ZL: {}".format(grad_ZL))
# Z = WX + b, so dL/dW = dL/dZ X^T and dL/db sums dL/dZ over examples
grad_WL = np.dot(grad_ZL, X.T)
grad_bL = np.sum(grad_ZL, axis=1, keepdims=True)
print("grad_WL: {}".format(grad_WL))
print("grad_bL: {}".format(grad_bL))
```

```
grad_AL: [[ 0.33168492  0.25386472 -0.33333326]]
dAdZ: [[1. 1. 1.]]
grad_ZL: [[ 0.33168492  0.25386472 -0.33333326]]
grad_WL: [[ 1.58225273 -0.84271104]]
grad_bL: [[0.25221638]]
```

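A central finite-difference check is a common way to confirm hand-derived gradients. This sketch (an addition, not part of the original solution) rebuilds the data and initial parameters, perturbs each parameter by an assumed step `h = 1e-6`, and compares the result against the analytic `grad_WL` and `grad_bL` computed as in the cell above.

```python
import numpy as np

# Same data and initial parameters as in the problem statement.
X = np.array([[3.0, 1.0, -1.0], [1.0, -2.0, 2.0]])
Y = np.array([[1.0, 2.0, 3.0]])
W = np.array([[2.0, -1.0]])
b = np.array([[-1.0]])
m = X.shape[1]

def loss(W, b):
    """Log-cosh loss of the linear model A = WX + b."""
    A = W @ X + b
    return np.mean(np.log(np.cosh(A - Y)))

# Analytic gradients, as in the back-propagation cell.
grad_ZL = np.tanh(W @ X + b - Y) / m
grad_WL = grad_ZL @ X.T
grad_bL = np.sum(grad_ZL, axis=1, keepdims=True)

# Central differences: (L(p + h) - L(p - h)) / (2h) for each parameter.
h = 1e-6
num_WL = np.zeros_like(W)
for k in range(W.shape[1]):
    E = np.zeros_like(W)
    E[0, k] = h
    num_WL[0, k] = (loss(W + E, b) - loss(W - E, b)) / (2 * h)
num_bL = (loss(W, b + h) - loss(W, b - h)) / (2 * h)

print("numeric grad_WL:", num_WL)
print("numeric grad_bL:", num_bL)
print("match:", np.allclose(num_WL, grad_WL, atol=1e-6)
      and np.isclose(num_bL, grad_bL[0, 0], atol=1e-6))
```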

**(d)** Perform 1 epoch of training using Gradient Descent with learning rate of 0.1 and recompute the loss function with the updated W,b

```python
alpha = 0.1  # learning rate
# gradient-descent update of W and b
W = W - alpha * grad_WL
b = b - alpha * grad_bL
print("update W and b")
print("W epoch 1: {}".format(W))
print("b epoch 1: {}".format(b))
# recompute the loss with the updated parameters
Z = np.dot(W, X) + b
A = Z
L = np.mean(np.log(np.cosh(A - Y)))
print("Loss epoch 1: {}".format(L))
```

```
update W and b
W epoch 1: [[ 1.84177473 -0.9157289 ]]
b epoch 1: [[-1.02522164]]
Loss epoch 1: 3.0329782311924767
```

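The forward pass, back propagation, and update from parts (a) to (d) can be bundled into a single step. This is a minimal sketch; `gd_step` is a hypothetical helper name, and it assumes the same linear model and log-cosh loss used above.

```python
import numpy as np

def gd_step(W, b, X, Y, alpha):
    """One epoch of full-batch gradient descent on the log-cosh loss.

    Hypothetical helper bundling parts (a)-(d). Returns the updated W and b
    and the loss evaluated with the updated parameters.
    """
    m = X.shape[1]
    # forward pass and back propagation (linear activation, so dA/dZ = 1)
    A = W @ X + b
    grad_ZL = np.tanh(A - Y) / m
    grad_WL = grad_ZL @ X.T
    grad_bL = np.sum(grad_ZL, axis=1, keepdims=True)
    # gradient-descent update
    W = W - alpha * grad_WL
    b = b - alpha * grad_bL
    # loss recomputed with the updated parameters
    A = W @ X + b
    return W, b, np.mean(np.log(np.cosh(A - Y)))

X = np.array([[3, 1, -1], [1, -2, 2]])
Y = np.array([[1, 2, 3]])
W, b, L = gd_step(np.array([[2.0, -1.0]]), np.array([[-1.0]]), X, Y, alpha=0.1)
print("W:", W)
print("b:", b)
print("loss:", L)
```

Calling `gd_step` again with the returned `W` and `b` would continue training for further epochs.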

**(e)** Compute the prediction based on the input feature matrix $X$ above and the updated $W$, $b$ from (d).

```python
# prediction: forward pass with the updated parameters
Z = np.dot(W, X) + b
A = Z  # linear activation
print("prediction")
print("Z: {}".format(Z))
print("A: {}".format(A))
```

```
prediction
Z: [[ 3.58437365  2.64801088 -4.69845416]]
A: [[ 3.58437365  2.64801088 -4.69845416]]
```

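Checking the first prediction by hand, using the rounded parameters printed in (d):

$$
Z_0 = 1.84177473 \cdot 3 + (-0.9157289)\cdot 1 + (-1.02522164) \approx 3.58437365
$$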

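**(f)** Compute the accuracy of the prediction in (e) when compared against the actual $Y$ specified above.

Classification accuracy is not defined for a regression output, so the prediction is scored here by the mean absolute error (MAE), $\frac{1}{m}\sum_{j=0}^{m-1} |A_j - Y_j|$; lower is better.

```python
# accuracy for regression, measured as mean absolute error (lower is better)
mae = np.mean(np.abs(A - Y))
print("Accuracy (mean absolute error): {}".format(mae))
```

```
Accuracy (mean absolute error): 3.6436128953478506
```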

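For context, the same metric at the initial parameters shows whether the single update helped. A minimal sketch that re-derives the predictions from the rounded $W$, $b$ printed in (d); `mae` is a hypothetical helper name.

```python
import numpy as np

X = np.array([[3, 1, -1], [1, -2, 2]])
Y = np.array([[1, 2, 3]])

def mae(W, b):
    """Mean absolute error of the linear model A = WX + b against Y."""
    return np.mean(np.abs(W @ X + b - Y))

# Initial parameters vs. the parameters after one epoch (rounded values from part (d)).
mae_before = mae(np.array([[2.0, -1.0]]), np.array([[-1.0]]))
mae_after = mae(np.array([[1.84177473, -0.9157289]]), np.array([[-1.02522164]]))
print("MAE before update:", mae_before)
print("MAE after update: ", mae_after)
```

With the initial parameters $A - Y = [3, 1, -8]$, so the MAE is $12/3 = 4.0$; after one epoch it falls to about 3.64.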