# CS 4820
# Assignment 7: Understanding how back-propagation works in a Deep Neural Net

In Assignment 3, to understand the feed-forward process in a Deep Neural Net and exactly what happens when `model.predict()` is called, we manually calculated the outputs of all neurons layer by layer.

This assignment continues Assignment 3. Its goal is to demystify the function `model.fit()`, i.e., to break down how a Neural Net learns.

## 0. Selected common activation functions and their derivatives (completed)

### 0.1 Sigmoid

In [1]:
import numpy as np

# Either use the built-in function in Keras
from tensorflow.keras.activations import sigmoid

# Or use our own
# def sigmoid(x):
#    return 1.0 / (1.0 + np.exp(-x))

def derivative_sigmoid(y):
    return y * (1-y)
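`derivative_sigmoid` expects the sigmoid's *output* $y=\sigma(x)$, not its input, because $\sigma'(x)=\sigma(x)(1-\sigma(x))$. A minimal finite-difference check of that identity is sketched below. It uses a plain-NumPy sigmoid (`sigmoid_np`, a name introduced here for illustration) so that it runs without TensorFlow.

```python
import numpy as np

def sigmoid_np(x):
    # plain-NumPy sigmoid, same formula as the commented-out version above
    return 1.0 / (1.0 + np.exp(-x))

def derivative_sigmoid(y):
    # derivative expressed in terms of the OUTPUT y = sigmoid(x)
    return y * (1 - y)

x = np.linspace(-3.0, 3.0, 7)
h = 1e-6
# central finite difference of the sigmoid with respect to its input x
numeric = (sigmoid_np(x + h) - sigmoid_np(x - h)) / (2 * h)
# analytic derivative, fed with the sigmoid's output
analytic = derivative_sigmoid(sigmoid_np(x))
print(np.allclose(numeric, analytic, atol=1e-8))  # True
```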

### 0.2 Hyperbolic Tangent

In [2]:
from tensorflow.keras.activations import tanh

def derivative_tanh(y):
    return 1 - y * y
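The same convention holds for `derivative_tanh`: since $\tanh'(x)=1-\tanh^2(x)$, it must be given the layer's output. The sketch below (using `np.tanh` in place of the Keras `tanh`) evaluates it at $x=0.5$, which is the pre-activation of the second neuron of hidden layer 1 in Section 1.1, and shows what goes wrong if the input is passed by mistake.

```python
import numpy as np

def derivative_tanh(y):
    # derivative of tanh written in terms of its OUTPUT y = tanh(x)
    return 1 - y * y

x = 0.5
y = np.tanh(x)                       # about 0.4621
slope_correct = derivative_tanh(y)   # 1 - tanh(0.5)**2
slope_wrong = derivative_tanh(x)     # mistakenly passes the input x

# central finite difference of tanh at x, for reference
h = 1e-6
slope_numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)

print(round(float(slope_correct), 4))            # 0.7864
print(round(float(slope_wrong), 4))              # 0.75
print(np.isclose(slope_correct, slope_numeric))  # True
```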

### 0.3 ReLU

In [3]:
from tensorflow.keras.activations import relu

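def derivative_relu(y):
    # ReLU'(x) is 1 where x > 0 and 0 elsewhere; since y = relu(x) > 0 exactly
    # when x > 0, the output y can be used directly (0 is used at the kink)
    return np.where(np.asarray(y) > 0, 1.0, 0.0)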
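Written in terms of its output $y$, the derivative of ReLU is a step function: 1 where $y>0$ and 0 elsewhere. The derivative at exactly 0 is undefined; 0 is a common convention and is assumed here. A small sketch with hypothetical helper names:

```python
import numpy as np

def relu_np(x):
    # plain-NumPy ReLU
    return np.maximum(0.0, x)

def derivative_relu_from_output(y):
    # 1.0 where the neuron was active (y > 0), 0.0 otherwise
    return (y > 0).astype(y.dtype)

x = np.array([-2.0, -0.5, 0.3, 1.7])
y = relu_np(x)
print(derivative_relu_from_output(y))  # [0. 0. 1. 1.]
```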

### 0.4 Softmax and Cross-Entropy Loss

In [4]:
from tensorflow.keras.activations import softmax
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.losses import binary_crossentropy

def derivative_softmax_ce_loss(y_hat, y):
    return y_hat - y
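`derivative_softmax_ce_loss` returns the gradient of the cross-entropy loss with respect to the output layer's *pre-activations* $z$, not with respect to $\hat y$: for a one-hot target, $\partial L/\partial z = \mathrm{softmax}(z) - y$. The sketch below checks that shortcut with central finite differences on arbitrary, made-up logits; `softmax_np` and `ce_loss` are hypothetical NumPy helpers, not Keras functions.

```python
import numpy as np

def softmax_np(z):
    # numerically stable softmax over each row
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ce_loss(z, y):
    # categorical cross-entropy of softmax(z) against a one-hot target y
    return -np.sum(y * np.log(softmax_np(z)))

z = np.array([[-0.2, 1.5, -1.1, 0.3]])   # arbitrary logits, made up for the check
y = np.array([[0.0, 1.0, 0.0, 0.0]])
h = 1e-6

# finite-difference gradient of the loss, one logit at a time
numeric = np.zeros_like(z)
for k in range(z.shape[1]):
    dz = np.zeros_like(z)
    dz[0, k] = h
    numeric[0, k] = (ce_loss(z + dz, y) - ce_loss(z - dz, y)) / (2 * h)

analytic = softmax_np(z) - y   # what derivative_softmax_ce_loss(y_hat, y) computes
print(np.allclose(numeric, analytic, atol=1e-6))  # True
```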

## 1. Manual Calculations 

### 1.1 Manually calculate the outputs from all neurons layer by layer (completed)

In [5]:
import numpy as np
import tensorflow as tf

X_train = np.array([[0.9,1.2,0.7,0.8]]) 
print('Inputs:\n', X_train, '\n')
print(type(X_train))

Inputs:
 [[0.9 1.2 0.7 0.8]] 

<class 'numpy.ndarray'>


In [6]:
a=np.array([[1,0,-1,0]]).T
weights_0 = np.concatenate(
    (
        a,
        np.roll(a,1,axis=0),
        np.roll(a,3,axis=0)
    ), 
    axis=1
)
thetas_0 = np.full((3,),0.1)
print('Weights between input and hidden layer 1:\n', weights_0)
print('Thetas in hidden layer 1:\n', thetas_0)

y_1=tanh(X_train.dot(weights_0)+thetas_0)
print('Outputs from hidden layer 1:\n', y_1.numpy(), '\n')

Weights between input and hidden layer 1:
 [[ 1  0  0]
 [ 0  1 -1]
 [-1  0  0]
 [ 0 -1  1]]
Thetas in hidden layer 1:
 [0.1 0.1 0.1]
Outputs from hidden layer 1:
 [[ 0.29131261  0.46211716 -0.29131261]] 



In [7]:
b=np.array([[-1], [1.3], [-0.8]])
weights_1 = np.concatenate(
    (
        b,
        np.roll(b,1,axis=0),
        np.roll(b,2,axis=0)
    ), 
    axis=1
)

thetas_1 = np.full((3,),0.1)
print('Weights between hidden layer 1 and hidden layer 2:\n', weights_1)
print('Thetas in hidden layer 2:\n', thetas_1)

y_2=tanh(y_1.numpy().dot(weights_1)+thetas_1)
print('Outputs from hidden layer 2:\n', y_2.numpy(), '\n')

Weights between hidden layer 1 and hidden layer 2:
 [[-1.  -0.8  1.3]
 [ 1.3 -1.  -0.8]
 [-0.8  1.3 -1. ]]
Thetas in hidden layer 2:
 [0.1 0.1 0.1]
Outputs from hidden layer 2:
 [[ 0.56659243 -0.7504016   0.38022725]] 



In [8]:
c=np.array([[-1], [0], [1.1]])
weights_2 = np.concatenate(
    (
        c,
        np.roll(c,1,axis=0),
        np.roll(c,2,axis=0),
        np.roll(c,3,axis=0)
    ), 
    axis=1
)

thetas_2 = np.full((4,),0.1)
print('Weights between hidden layer 2 and output layer:\n', weights_2)
print('Thetas in output layer:\n', thetas_2)

# output=Activation('softmax')(y_2.numpy().dot(weights_2)+thetas_2).numpy()

y_hat=softmax(tf.convert_to_tensor((np.dot(y_2, weights_2)+thetas_2))).numpy()
print('Outputs from output layer:\n', y_hat, '\n')

Weights between hidden layer 2 and output layer:
 [[-1.   1.1  0.  -1. ]
 [ 0.  -1.   1.1  0. ]
 [ 1.1  0.  -1.   1.1]]
Thetas in output layer:
 [0.1 0.1 0.1 0.1]
Outputs from output layer:
 [[0.14432633 0.66121078 0.05013656 0.14432633]] 
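For reference, the Keras `softmax` call above boils down to exponentiating the logits and normalizing each row. A NumPy-only sketch, with the values copied from the printed outputs above (subtracting the row maximum is a standard stability trick that does not change the result):

```python
import numpy as np

# hidden-layer-2 outputs and output-layer parameters copied from the cells above
y_2 = np.array([[0.56659243, -0.7504016, 0.38022725]])
weights_2 = np.array([[-1.0, 1.1, 0.0, -1.0],
                      [0.0, -1.0, 1.1, 0.0],
                      [1.1, 0.0, -1.0, 1.1]])
thetas_2 = np.full((4,), 0.1)

z = y_2.dot(weights_2) + thetas_2               # pre-activations (logits), shape (1, 4)
e = np.exp(z - z.max(axis=1, keepdims=True))    # subtract the row max for stability
y_hat_manual = e / e.sum(axis=1, keepdims=True)
print(y_hat_manual)  # should agree with y_hat above to about 8 decimals
```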



### 1.2 Manually calculate the cross-entropy loss (completed)

In [9]:
y_train = np.array([[0, 1, 0, 0]])  # feel free to change these values
loss = categorical_crossentropy(y_train, y_hat)
# print(loss)
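With a one-hot target, categorical cross-entropy $L=-\sum_k y_k \log \hat y_k$ reduces to $-\log \hat y_{\text{true}}$. A NumPy sketch using the printed `y_hat`:

```python
import numpy as np

y_train = np.array([[0, 1, 0, 0]])
y_hat = np.array([[0.14432633, 0.66121078, 0.05013656, 0.14432633]])  # from Section 1.1

# -sum_k y_k * log(y_hat_k); with a one-hot target only the true class survives
loss_manual = -np.sum(y_train * np.log(y_hat), axis=1)
print(loss_manual)  # approximately [0.4137]
```

To the printed precision, this matches the loss that `model.fit()` reports in Section 2.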

### 1.3 Manually calculate the new weights and thetas as the loss/error gradient back-propagates through the layers <span style="color:red">(to be completed)</span>

__IMPORTANT__: In this section, you must perform all calculations as __NumPy matrix operations__; otherwise, no points will be awarded.

Note that instead of actually updating the weights and thetas once we have computed all the deltas (i.e., the changes we want to apply to them), we store the updated values in separate variables, because the original values are still needed for validation in Sections 2 and 3 below.
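The cells in this section implement the update rules summarized below. The notation is introduced here only as a summary, in the row-vector convention used by the code: $x$ is `X_train`, $y_1$ and $y_2$ are the tanh outputs of the hidden layers, $\hat y$ is `y_hat`, $y$ is `y_train`, $\eta$ is `lr`, $\odot$ is the element-wise product, and $\delta_{\text{out}}$, $\delta_2$, $\delta_1$ correspond to `error_gradient_output_layer`, `aggregated_error_gradient_layer_2` and `aggregated_error_gradient_layer_1`.

$$
\begin{aligned}
\delta_{\text{out}} &= \hat{y} - y \\
\Delta W_2 &= \eta\, y_2^{\top} \delta_{\text{out}}, \qquad \Delta\theta_2 = \eta\, \delta_{\text{out}} \\
\delta_2 &= \left(\delta_{\text{out}} W_2^{\top}\right) \odot \left(1 - y_2 \odot y_2\right) \\
\Delta W_1 &= \eta\, y_1^{\top} \delta_2, \qquad \Delta\theta_1 = \eta\, \delta_2 \\
\delta_1 &= \left(\delta_2 W_1^{\top}\right) \odot \left(1 - y_1 \odot y_1\right) \\
\Delta W_0 &= \eta\, x^{\top} \delta_1, \qquad \Delta\theta_0 = \eta\, \delta_1 \\
W_l &\leftarrow W_l - \Delta W_l, \qquad \theta_l \leftarrow \theta_l - \Delta\theta_l
\end{aligned}
$$

Each $\delta$ is propagated through the *original* weights, which is another reason the updated values are kept in separate variables.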

In [10]:
lr = 0.1 # learning rate

error_gradient_output_layer = derivative_softmax_ce_loss(y_hat, y_train)

print(error_gradient_output_layer)

[[ 0.14432633 -0.33878922  0.05013656  0.14432633]]


#### 1.3.1 _Weights_ between hidden layer 2 and output layer as well as _Thetas_ in the output layer

In [11]:
delta_weights_output_layer = error_gradient_output_layer.T.dot(y_2).T * lr

weights_2_updated = weights_2 - delta_weights_output_layer

print(weights_2_updated)

[[-1.00817742  1.11919554 -0.0028407  -1.00817742]
 [ 0.01083027 -1.0254228   1.10376226  0.01083027]
 [ 1.09451232  0.01288169 -1.00190633  1.09451232]]
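The expression `error_gradient_output_layer.T.dot(y_2).T` is the outer product $y_2^{\top}\delta_{\text{out}}$ written in a roundabout way. A quick sketch, with values copied from the outputs above, confirming the equivalent forms:

```python
import numpy as np

lr = 0.1
y_2 = np.array([[0.56659243, -0.7504016, 0.38022725]])                    # shape (1, 3)
delta_out = np.array([[0.14432633, -0.33878922, 0.05013656, 0.14432633]])  # shape (1, 4)
weights_2 = np.array([[-1.0, 1.1, 0.0, -1.0],
                      [0.0, -1.0, 1.1, 0.0],
                      [1.1, 0.0, -1.0, 1.1]])

# the notebook's form: (delta^T . y_2)^T, shape (3, 4)
form_a = delta_out.T.dot(y_2).T * lr
# equivalent forms: y_2^T . delta, or an explicit outer product
form_b = y_2.T.dot(delta_out) * lr
form_c = np.outer(y_2, delta_out) * lr

print(np.allclose(form_a, form_b) and np.allclose(form_a, form_c))  # True
print(weights_2 - form_b)  # should match weights_2_updated above
```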


In [12]:
delta_thetas_output_layer = error_gradient_output_layer * lr

thetas_2_updated = thetas_2 - delta_thetas_output_layer

print(thetas_2_updated)

[[0.08556737 0.13387892 0.09498634 0.08556737]]
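`thetas_2_updated` prints as a 2-D row of shape `(1, 4)`, because subtracting a `(1, 4)` gradient from a `(4,)` vector broadcasts to 2-D. The Section 3 comparison still works since `np.isclose` broadcasts too, but Keras stores a Dense bias as a 1-D vector of length `units`, so the row would need flattening before being passed back to `set_weights`. A short sketch:

```python
import numpy as np

lr = 0.1
thetas_2 = np.full((4,), 0.1)                                              # shape (4,)
delta_out = np.array([[0.14432633, -0.33878922, 0.05013656, 0.14432633]])  # shape (1, 4)

thetas_2_updated = thetas_2 - delta_out * lr
print(thetas_2_updated.shape)      # (1, 4): broadcasting promoted the result to 2-D

# flatten to the 1-D layout a Dense bias uses
thetas_2_updated_1d = thetas_2_updated.ravel()
print(thetas_2_updated_1d.shape)   # (4,)
```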


Propagate the output-layer error gradients back through `weights_2` and the tanh derivative to get the aggregated error gradients for hidden layer 2:

In [13]:
aggregated_error_gradient_layer_2 = error_gradient_output_layer.dot(weights_2.T * derivative_tanh(y_2).numpy())
print(aggregated_error_gradient_layer_2)

[[-0.44901899  0.17211113  0.22872531]]
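The cell above multiplies the tanh derivative into the columns of `weights_2.T` before the dot product. This is equivalent to the more common textbook form, which back-propagates through the weights first and then gates element-wise by $\tanh'$. The sketch below (values copied from the printed outputs) checks the equivalence:

```python
import numpy as np

delta_out = np.array([[0.14432633, -0.33878922, 0.05013656, 0.14432633]])
y_2 = np.array([[0.56659243, -0.7504016, 0.38022725]])
weights_2 = np.array([[-1.0, 1.1, 0.0, -1.0],
                      [0.0, -1.0, 1.1, 0.0],
                      [1.1, 0.0, -1.0, 1.1]])
tanh_grad = 1 - y_2 * y_2   # tanh'(x) expressed through the outputs y_2

# notebook form: fold tanh' into the columns of W2^T, then one dot product
form_a = delta_out.dot(weights_2.T * tanh_grad)
# textbook form: back-propagate through W2^T, then gate element-wise
form_b = delta_out.dot(weights_2.T) * tanh_grad

print(np.allclose(form_a, form_b))  # True
print(form_b)  # approximately [[-0.44901899  0.17211113  0.22872531]]
```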


#### 1.3.2 _Weights_ between hidden layer 1 and hidden layer 2 as well as _Thetas_ in the hidden layer 2

In [15]:
delta_weights_1 = aggregated_error_gradient_layer_2.T.dot(y_1).T * lr

weights_1_updated = weights_1 - delta_weights_1

print(weights_1_updated)

[[-0.98691951 -0.80501381  1.29333694]
 [ 1.32074994 -1.00795355 -0.81056979]
 [-0.81308049  1.30501381 -0.99333694]]
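The weight and theta updates follow the same pattern in every layer, so they can be wrapped in one helper. `dense_sgd_update` below is a hypothetical name introduced for illustration; the sketch applies it to hidden layer 2 with values copied from the outputs above. Summing `delta` over rows is an assumption for batches of more than one sample; with the single sample used here it simply drops the extra dimension, so the bias keeps the 1-D shape Keras uses.

```python
import numpy as np

def dense_sgd_update(x_in, delta, weights, thetas, lr):
    """One plain-SGD step for a single dense layer (hypothetical helper).

    x_in:  activations feeding the layer, shape (1, n_in)
    delta: error gradient at the layer's pre-activations, shape (1, n_out)
    """
    new_weights = weights - lr * x_in.T.dot(delta)   # shape (n_in, n_out)
    new_thetas = thetas - lr * delta.sum(axis=0)     # shape (n_out,)
    return new_weights, new_thetas

y_1 = np.array([[0.29131261, 0.46211716, -0.29131261]])
delta_2 = np.array([[-0.44901899, 0.17211113, 0.22872531]])
weights_1 = np.array([[-1.0, -0.8, 1.3],
                      [1.3, -1.0, -0.8],
                      [-0.8, 1.3, -1.0]])
thetas_1 = np.full((3,), 0.1)

w1_new, t1_new = dense_sgd_update(y_1, delta_2, weights_1, thetas_1, lr=0.1)
print(w1_new)  # should match weights_1_updated above
print(t1_new)  # approximately [0.1449019  0.08278889 0.07712747]
```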


In [16]:
delta_thetas_1 = aggregated_error_gradient_layer_2 * lr

thetas_1_updated = thetas_1 - delta_thetas_1

print(thetas_1_updated)

[[0.1449019  0.08278889 0.07712747]]


Propagate hidden layer 2's error gradients back through `weights_1` and the tanh derivative to get the aggregated error gradients for hidden layer 1:

In [17]:
aggregated_error_gradient_layer_1 = aggregated_error_gradient_layer_2.dot(weights_1.T * derivative_tanh(y_1).numpy())
print(aggregated_error_gradient_layer_1)

[[ 0.55701915 -0.73832976  0.32417294]]
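For intuition only (the graded cells must stay in matrix form), here is the same back-propagation written element by element: each neuron $j$ in hidden layer 1 collects the error of every neuron $k$ it feeds, weighted by `weights_1[j, k]`, and scales the sum by its own $\tanh'$. Values are copied from the printed outputs.

```python
import numpy as np

delta_2 = np.array([-0.44901899, 0.17211113, 0.22872531])   # error at hidden layer 2
y_1 = np.array([0.29131261, 0.46211716, -0.29131261])       # outputs of hidden layer 1
weights_1 = np.array([[-1.0, -0.8, 1.3],
                      [1.3, -1.0, -0.8],
                      [-0.8, 1.3, -1.0]])

delta_1 = np.zeros(3)
for j in range(3):                # neuron j in hidden layer 1
    back = 0.0
    for k in range(3):            # neuron k in hidden layer 2 that j feeds
        back += weights_1[j, k] * delta_2[k]
    delta_1[j] = (1 - y_1[j] ** 2) * back   # gate by tanh' at neuron j
print(delta_1)  # approximately [ 0.55701915 -0.73832976  0.32417294]
```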


#### 1.3.3 _Weights_ between input layer and hidden layer 1 as well as _Thetas_ in the hidden layer 1

In [18]:
delta_weights_0 = aggregated_error_gradient_layer_1.T.dot(X_train).T * lr

weights_0_updated = weights_0 - delta_weights_0

print(weights_0_updated)

[[ 0.94986828  0.06644968 -0.02917556]
 [-0.0668423   1.08859957 -1.03890075]
 [-1.03899134  0.05168308 -0.02269211]
 [-0.04456153 -0.94093362  0.97406616]]
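Likewise, element by element, each weight `weights_0[i, j]` moves by `lr` times input $i$ times the error at neuron $j$. The sketch below compares that double loop with the matrix expression from the cell above, using values copied from the printed outputs; the loop is for illustration and would not satisfy the matrix-operations requirement.

```python
import numpy as np

lr = 0.1
x = np.array([0.9, 1.2, 0.7, 0.8])                       # X_train as a 1-D vector
delta_1 = np.array([0.55701915, -0.73832976, 0.32417294])
weights_0 = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, -1.0],
                      [-1.0, 0.0, 0.0],
                      [0.0, -1.0, 1.0]])

updated_loop = weights_0.copy()
for i in range(4):          # input i
    for j in range(3):      # neuron j in hidden layer 1
        updated_loop[i, j] -= lr * x[i] * delta_1[j]

# matrix form from the cell above, with both vectors reshaped to (1, n) rows
updated_matrix = weights_0 - delta_1[None, :].T.dot(x[None, :]).T * lr
print(np.allclose(updated_loop, updated_matrix))  # True
```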


In [19]:
delta_thetas_0 = aggregated_error_gradient_layer_1 * lr

thetas_0_updated = thetas_0 - delta_thetas_0

print(thetas_0_updated)

[[0.04429809 0.17383298 0.06758271]]
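Putting Sections 1.1 to 1.3 together, one full training step can be sketched in NumPy alone. `sgd_step` and `softmax_np` are hypothetical helpers written for this sketch. It assumes tanh hidden layers, a softmax output, categorical cross-entropy averaged over the batch, and plain SGD without momentum, matching the configuration used in Section 2.

```python
import numpy as np

def softmax_np(z):
    # numerically stable softmax over each row
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sgd_step(x, y, weights, thetas, lr):
    """Hypothetical helper: one plain-SGD step for a tanh ... tanh -> softmax MLP
    trained with categorical cross-entropy (averaged over the batch)."""
    # forward pass, keeping every layer's output for the backward pass
    outputs = [x]
    n_layers = len(weights)
    for l in range(n_layers):
        z = outputs[-1].dot(weights[l]) + thetas[l]
        outputs.append(softmax_np(z) if l == n_layers - 1 else np.tanh(z))
    y_hat = outputs[-1]
    loss = -np.mean(np.sum(y * np.log(y_hat), axis=1))

    # backward pass: softmax + cross-entropy gives (y_hat - y), divided by the
    # batch size because the loss is averaged over the batch
    delta = (y_hat - y) / x.shape[0]
    new_weights, new_thetas = list(weights), list(thetas)
    for l in reversed(range(n_layers)):
        a_in = outputs[l]                         # activations feeding layer l
        new_weights[l] = weights[l] - lr * a_in.T.dot(delta)
        new_thetas[l] = thetas[l] - lr * delta.sum(axis=0)
        if l > 0:
            # back-propagate through the ORIGINAL weights, gated by tanh'
            delta = delta.dot(weights[l].T) * (1.0 - a_in * a_in)
    return new_weights, new_thetas, loss

x = np.array([[0.9, 1.2, 0.7, 0.8]])
y = np.array([[0.0, 1.0, 0.0, 0.0]])
weights = [
    np.array([[1.0, 0.0, 0.0], [0.0, 1.0, -1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 1.0]]),
    np.array([[-1.0, -0.8, 1.3], [1.3, -1.0, -0.8], [-0.8, 1.3, -1.0]]),
    np.array([[-1.0, 1.1, 0.0, -1.0], [0.0, -1.0, 1.1, 0.0], [1.1, 0.0, -1.0, 1.1]]),
]
thetas = [np.full((3,), 0.1), np.full((3,), 0.1), np.full((4,), 0.1)]

new_weights, new_thetas, loss = sgd_step(x, y, weights, thetas, lr=0.1)
print(f"{loss:.4f}")   # 0.4137
print(new_thetas[0])   # approximately [0.04429809 0.17383298 0.06758271]
```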


## 2. Build a model in Keras to verify the calculations above <span style="color:red">(to be completed)</span>

Note that, to make the comparison fair when validating the new weights and thetas, you must use the `SGD` optimizer when compiling your ANN model:

```
model.compile(SGD(learning_rate=0.1, momentum=0.0, nesterov=False), ......)
```

Also, **restart the kernel** every time before you perform a new round of training of the ANN model, so that the model always starts learning from scratch.

In [20]:
# build the model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model = Sequential()

model.add(Dense(3, input_shape=(4,), activation='tanh'))
model.add(Dense(3, activation='tanh'))
model.add(Dense(4, activation='softmax'))
model.compile(SGD(learning_rate=0.1, momentum=0.0, nesterov=False),
             loss='categorical_crossentropy')

In [21]:
# set all weights and thetas with the ones used in Section 1 above
model.layers[0].set_weights([weights_0, thetas_0])
model.layers[1].set_weights([weights_1, thetas_1])
model.layers[2].set_weights([weights_2, thetas_2])

In [22]:
# train the ANN with the single data instance used in Section 1 above

model.fit(X_train, y_train, epochs=1, verbose=2, batch_size=1)

1/1 - 0s - loss: 0.4137


<tensorflow.python.keras.callbacks.History at 0x17015babe08>

In [23]:
# obtain the new weights and thetas

# model.get_weights() returns [kernel, bias] for each Dense layer, in layer order
(weights_0_updated_model, thetas_0_updated_model,
 weights_1_updated_model, thetas_1_updated_model,
 weights_2_updated_model, thetas_2_updated_model) = model.get_weights()

## 3. Validations and Conclusions

### 3.1 The following is what is expected:

```
Checking weights_2 (weights between hidden layer 2 and output layer):

[[-1.0081774   1.1191956  -0.0028407  -1.0081774 ]
 [ 0.01083027 -1.0254228   1.1037623   0.01083027]
 [ 1.0945123   0.01288169 -1.0019063   1.0945123 ]]
[[-1.00817742  1.11919554 -0.0028407  -1.00817742]
 [ 0.01083027 -1.0254228   1.10376226  0.01083027]
 [ 1.09451232  0.01288169 -1.00190633  1.09451232]]
They match!

----------

Checking thetas_2 (thetas in the output layer):

[0.08556737 0.13387892 0.09498635 0.08556737]
[[0.08556737 0.13387892 0.09498634 0.08556737]]
They match!

----------
```

In [24]:
print('Checking weights_2 (weights between hidden layer 2 and output layer):\n')
print(weights_2_updated_model)
print(weights_2_updated)
if (np.isclose(weights_2_updated_model, weights_2_updated).all()):
    print('They match!')
else:
    print('They don\'t match.')
print('\n----------\n')

print('Checking thetas_2 (thetas in the output layer):\n')
print(thetas_2_updated_model)
print(thetas_2_updated)
if (np.isclose(thetas_2_updated_model, thetas_2_updated).all()):
    print('They match!')
else:
    print('They don\'t match.')
print('\n----------\n')

Checking weights_2 (weights between hidden layer 2 and output layer):

[[-1.0081774   1.1191956  -0.0028407  -1.0081774 ]
 [ 0.01083027 -1.0254228   1.1037623   0.01083027]
 [ 1.0945123   0.01288169 -1.0019063   1.0945123 ]]
[[-1.00817742  1.11919554 -0.0028407  -1.00817742]
 [ 0.01083027 -1.0254228   1.10376226  0.01083027]
 [ 1.09451232  0.01288169 -1.00190633  1.09451232]]
They match!

----------

Checking thetas_2 (thetas in the output layer):

[0.08556737 0.13387892 0.09498635 0.08556737]
[[0.08556737 0.13387892 0.09498634 0.08556737]]
They match!

----------
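The comparison cells in this section repeat the same pattern six times. A hypothetical helper such as `check` could replace that repetition. Note that `np.isclose` uses `rtol=1e-05` and `atol=1e-08` by default, which is loose enough to absorb the float32 (Keras) versus float64 (NumPy) differences visible in the printouts, and that it broadcasts a `(4,)` bias against a `(1, 4)` row.

```python
import numpy as np

def check(name, from_model, from_manual):
    """Hypothetical helper wrapping the comparison repeated in Section 3."""
    # np.isclose broadcasts, so a (n,) Keras bias can be compared with a (1, n) row
    match = bool(np.isclose(from_model, from_manual).all())
    verdict = "They match!" if match else "They don't match."
    print(f"Checking {name}: {verdict}")
    return match

# values copied from the thetas_2 printout above
model_bias = np.array([0.08556737, 0.13387892, 0.09498635, 0.08556737], dtype=np.float32)
manual_bias = np.array([[0.08556737, 0.13387892, 0.09498634, 0.08556737]])
check("thetas_2", model_bias, manual_bias)          # Checking thetas_2: They match!
check("perturbed", model_bias, manual_bias + 1e-3)  # Checking perturbed: They don't match.
```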



### 3.2 The following is what is expected:
```
Checking weights_1 (weights between hidden layer 1 and hidden layer 2):

[[-0.9869195  -0.80501384  1.2933369 ]
 [ 1.3207499  -1.0079535  -0.8105698 ]
 [-0.8130805   1.3050138  -0.9933369 ]]
[[-0.98691951 -0.80501381  1.29333694]
 [ 1.32074994 -1.00795355 -0.81056979]
 [-0.81308049  1.30501381 -0.99333694]]
They match!

----------

Checking thetas_1 (thetas in the hidden layer 2):

[0.14490189 0.08278889 0.07712747]
tf.Tensor([[0.1449019  0.08278889 0.07712747]], shape=(1, 3), dtype=float64)
They match!

----------
```

In [25]:
print('Checking weights_1 (weights between hidden layer 1 and hidden layer 2):\n')
print(weights_1_updated_model)
print(weights_1_updated)
if (np.isclose(weights_1_updated_model, weights_1_updated).all()):
    print('They match!')
else:
    print('They don\'t match.')
print('\n----------\n')

print('Checking thetas_1 (thetas in the hidden layer 2):\n')
print(thetas_1_updated_model)
print(thetas_1_updated)
if (np.isclose(thetas_1_updated_model, thetas_1_updated).all()):
    print('They match!')
else:
    print('They don\'t match.')
print('\n----------\n')

Checking weights_1 (weights between hidden layer 1 and hidden layer 2):

[[-0.9869195  -0.80501384  1.2933369 ]
 [ 1.3207499  -1.0079535  -0.8105698 ]
 [-0.8130805   1.3050138  -0.9933369 ]]
[[-0.98691951 -0.80501381  1.29333694]
 [ 1.32074994 -1.00795355 -0.81056979]
 [-0.81308049  1.30501381 -0.99333694]]
They match!

----------

Checking thetas_1 (thetas in the hidden layer 2):

[0.14490189 0.08278889 0.07712747]
[[0.1449019  0.08278889 0.07712747]]
They match!

----------



### 3.3 The following is what is expected:
```
Checking weights_0 (weights between input layer and hidden layer 1):

[[ 0.94986826  0.06644966 -0.02917555]
 [-0.0668423   1.0885996  -1.0389007 ]
 [-1.0389913   0.05168307 -0.0226921 ]
 [-0.04456153 -0.94093364  0.9740662 ]]
[[ 0.94986828  0.06644968 -0.02917556]
 [-0.0668423   1.08859957 -1.03890075]
 [-1.03899134  0.05168308 -0.02269211]
 [-0.04456153 -0.94093362  0.97406616]]
They match!

----------

Checking thetas_0 (thetas in the hidden layer 1):

[0.04429809 0.17383295 0.06758272]
tf.Tensor([[0.04429809 0.17383298 0.06758271]], shape=(1, 3), dtype=float64)
They match!

----------
```

In [26]:
print('Checking weights_0 (weights between input layer and hidden layer 1):\n')
print(weights_0_updated_model)
print(weights_0_updated)
if (np.isclose(weights_0_updated_model, weights_0_updated).all()):
    print('They match!')
else:
    print('They don\'t match.')
print('\n----------\n')

print('Checking thetas_0 (thetas in the hidden layer 1):\n')
print(thetas_0_updated_model)
print(thetas_0_updated)
if (np.isclose(thetas_0_updated_model, thetas_0_updated).all()):
    print('They match!')
else:
    print('They don\'t match.')
print('\n----------\n')

Checking weights_0 (weights between input layer and hidden layer 1):

[[ 0.94986826  0.06644966 -0.02917555]
 [-0.0668423   1.0885996  -1.0389007 ]
 [-1.0389913   0.05168307 -0.0226921 ]
 [-0.04456153 -0.94093364  0.9740662 ]]
[[ 0.94986828  0.06644968 -0.02917556]
 [-0.0668423   1.08859957 -1.03890075]
 [-1.03899134  0.05168308 -0.02269211]
 [-0.04456153 -0.94093362  0.97406616]]
They match!

----------

Checking thetas_0 (thetas in the hidden layer 1):

[0.04429809 0.17383295 0.06758272]
[[0.04429809 0.17383298 0.06758271]]
They match!

----------

