# Explanation of DLND first-neural-network issues related to transpose

### Author: Nahua Kang
**Source**: This file was downloaded from Nahua Kang's repository ([click here](https://github.com/nahuakang/deep-learning/blob/9e38ec1353a129d508202c1db7596964c0ec6de7/DLND-Projects/first-neural-network/DLND%20first-neural-network%20Notes%20on%20Transpose.ipynb)). I added it to this repo to make it easier to find in the future; all credit goes to Nahua Kang.

Since many of us, myself included, have made mistakes with mismatched matrix dimensions when calculating delta_weights_i_h and delta_weights_h_o, I decided to walk through a debugging process full of print() calls to explain why we made these mistakes. Hopefully the insights will help us in the future when we design neural networks with multiple output elements.

The notebook is divided into 4 sections:
1. Data Preparation (feel free to skip)
2. Explanation of issues related to transpose in the train method of the NeuralNetwork class
3. Why must we use x[:, None] instead of x.T for transpose in Project 1?
4. Minor detail in the update of delta_weights for the hidden and output layers

## 1. Data Preparation

This part is exactly the same as in our Project 1 Jupyter notebook. Feel free to skip to the next section.

```python
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

```python
data_path = 'Bike-Sharing-Dataset/hour.csv'

rides = pd.read_csv(data_path)
```

```python
dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
for each in dummy_fields:
    # dtype=float keeps the one-hot columns numeric (pandas >= 2.0 defaults to bool,
    # which would turn the .values arrays used below into object arrays)
    dummies = pd.get_dummies(rides[each], prefix=each, drop_first=False, dtype=float)
    rides = pd.concat([rides, dummies], axis=1)

fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                  'weekday', 'atemp', 'mnth', 'workingday', 'hr']
data = rides.drop(fields_to_drop, axis=1)
```

```python
quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
# Store scalings in a dictionary so we can convert back later
scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    # Assigning the whole column replaces it with a float column
    data[each] = (data[each] - mean) / std
```
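The comment above says the scalings are stored so we can convert back later. Here is a minimal sketch of that round trip. It uses a made-up array of counts (the name counts is hypothetical, not a column of the dataset). Standardizing and then applying value * std + mean should recover the original values:

```python
import numpy as np

# Hypothetical ride counts standing in for a quantitative column such as 'cnt'
counts = np.array([16.0, 40.0, 32.0, 13.0, 1.0])

# Same scaling as above. pandas' Series.std() uses ddof=1 by default,
# so ddof=1 is passed to NumPy to match it.
mean, std = counts.mean(), counts.std(ddof=1)
scaled = (counts - mean) / std

# Convert back with the stored [mean, std] pair
restored = scaled * std + mean

print(np.isclose(scaled.mean(), 0.0), np.isclose(scaled.std(ddof=1), 1.0))  # → True True
print(np.allclose(restored, counts))  # → True
```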

```python
# Save data for approximately the last 21 days
test_data = data[-21*24:]

# Now remove the test data from the data set
data = data[:-21*24]

# Separate the data into features and targets
target_fields = ['cnt', 'casual', 'registered']
features, targets = data.drop(target_fields, axis=1), data[target_fields]
test_features, test_targets = test_data.drop(target_fields, axis=1), test_data[target_fields]
```

```python
# Hold out the last 60 days or so of the remaining data as a validation set
train_features, train_targets = features[:-60*24], targets[:-60*24]
val_features, val_targets = features[-60*24:], targets[-60*24:]
```

## 2. Explanation of issues related to transpose

Specifically, I will run the body of the train method of the NeuralNetwork class on a single sample of data. Therefore, I will not write out the complete class, only a single pass through the for loop inside the train method.

```python
# Define the sigmoid activation function for the hidden layer
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```
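Later in the train loop, hidden_error is multiplied by hidden_outputs * (1 - hidden_outputs). That factor is the derivative of the sigmoid: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). The quick sketch below checks this identity with a central finite difference. The grid of test points and the step size h are arbitrary choices:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)  # arbitrary test points
h = 1e-5                        # arbitrary small step

# Central finite-difference approximation of the derivative
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# Closed form used in the backward pass: sigmoid(x) * (1 - sigmoid(x))
s = sigmoid(x)
analytic = s * (1 - s)

print(np.allclose(numeric, analytic, atol=1e-8))  # → True
```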

```python
# Set the number of input, hidden, and output nodes to match the dataset's features,
# the desired number of hidden nodes, and the desired number of outputs
input_nodes, hidden_nodes = train_features.shape[1], 50
output_nodes = 1
```

```python
# Declare the weights and delta_weights accordingly, just like in Project 1
weights_input_to_hidden = np.random.normal(0.0, input_nodes**-0.5, (input_nodes, hidden_nodes))
weights_hidden_to_output = np.random.normal(0.0, hidden_nodes**-0.5, (hidden_nodes, output_nodes))
delta_weights_i_h = np.zeros(weights_input_to_hidden.shape)
delta_weights_h_o = np.zeros(weights_hidden_to_output.shape)

# Check the dimensions of the weights and delta_weights in both the hidden and output layers
print("Dims of weights for hidden and output layers:", weights_input_to_hidden.shape, weights_hidden_to_output.shape)
print("Dims of deltas for hidden and output layers:", delta_weights_i_h.shape, delta_weights_h_o.shape)
```

```
Dims of weights for hidden and output layers: (56, 50) (50, 1)
Dims of deltas for hidden and output layers: (56, 50) (50, 1)
```


Since the debugging prints make the for loop below messy, please read through the printed output and refer back to the code as you go. My code is rather verbose, but I hope it is easy to follow.

```python
# Running the for loop of the NeuralNetwork class' train method once to demonstrate the dimension changes:
for X, y in zip(train_features.values[:1], train_targets["cnt"].values[:1]):
    print("Dim means dimension. Dot means dot product.")

    print()
    print("X should be dim (1, 56), so basically 1 row with 56 features...")
    print("But in Numpy, X is actually of dim", X.shape)
    print("By matrix dot product, matrix A of (n, p) dot product matrix B of (p, m) results in matrix C of (n, m)")
    print("Therefore, X (1, 56) dot weights (56, 50) should result in dim (1, 50)")
    hidden_inputs = np.dot(X, weights_input_to_hidden)
    print("But the result in Numpy is actually of dim", hidden_inputs.shape)
    hidden_outputs = sigmoid(hidden_inputs)

    print()
    final_inputs = np.dot(hidden_outputs, weights_hidden_to_output)
    final_outputs = final_inputs  # Assuming the output activation function is f(x) = x
    print("hidden_outputs (1, 50) dot weights (50, 1) should result in dim (1, 1)...")
    print("But the result in Numpy is actually of dim", final_outputs.shape)

    print()
    output_error = y - final_outputs
    output_error_term = output_error * 1  # The derivative of the output activation f(x) = x is f'(x) = 1
    print("The dim of output_error_term in Numpy should be (1, 1), but is in fact:", output_error_term.shape)
    hidden_error = np.dot(weights_hidden_to_output, output_error_term)
    hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)
    print("The dim of weights_hidden_to_output is", weights_hidden_to_output.shape)
    print("The dim of hidden_error_term in Numpy should be (50, 1), but is in fact:", hidden_error_term.shape)

    print()
    print("Remember again that the dims of delta_weights_i_h, delta_weights_h_o are:", delta_weights_i_h.shape, delta_weights_h_o.shape)
    print("Remember again that the dims of inputs X and final_outputs should be (1, 56), (1, 1)")
    print("So delta_weights_i_h's dim should be the same as the dot product between some form of X and hidden_error_term")
    print("So delta_weights_h_o's dim should be the same as the dot product between some form of hidden_outputs and output_error_term")

    print()
    print("(X' means X transpose) X' of dim(56, 1) dot hidden_error_term of dim(1, 50) would result in dim(56, 50)")
    print("This dim(56, 50) is aligned with delta_weights_i_h's dimension.")
    print("(hidden_outputs' means hidden_outputs transpose) On paper, the dim of hidden_outputs is (1, 50)")
    print("The same applies to delta_weights_h_o: hidden_outputs' of dim(50, 1) dot output_error_term of dim(1, 1) would result in dim(50, 1)")
    print("This dim(50, 1) is aligned with delta_weights_h_o's dimension.")

    print()
    print("But in Numpy, output_error_term does not have dim(1, 1) but", output_error_term.shape)
    print("And in Numpy, hidden_error_term does not have dim(50, 1) but", hidden_error_term.shape)
    print("So we can instead perform hidden_error_term * X' or, more precisely, 'hidden_error_term * X[:, None]'")
    delta_weights_i_h += hidden_error_term * X[:, None]
    print(f"Because hidden_error_term * X[:, None] has dim {(hidden_error_term * X[:, None]).shape}, just what we wanted from above")
    delta_weights_h_o += output_error_term * hidden_outputs[:, None]
    print("The same applies to delta_weights_h_o: output_error_term * hidden_outputs[:, None] achieves the same")
    print(f"Because output_error_term * hidden_outputs[:, None] has dim {(output_error_term * hidden_outputs[:, None]).shape}, just what we wanted from above")

    print()
    print("SO WHY CAN'T WE USE X.T FOR TRANSPOSE BUT MUST USE X[:, None]?")
```

```
Dim means dimension. Dot means dot product.

X should be dim (1, 56), so basically 1 row with 56 features...
But in Numpy, X is actually of dim (56,)
By matrix dot product, matrix A of (n, p) dot product matrix B of (p, m) results in matrix C of (n, m)
Therefore, X (1, 56) dot weights (56, 50) should result in dim (1, 50)
But the result in Numpy is actually of dim (50,)

hidden_outputs (1, 50) dot weights (50, 1) should result in dim (1, 1)...
But the result in Numpy is actually of dim (1,)

The dim of output_error_term in Numpy should be (1, 1), but is in fact: (1,)
The dim of weights_hidden_to_output is (50, 1)
The dim of hidden_error_term in Numpy should be (50, 1), but is in fact: (50,)

Remember again that the dims of delta_weights_i_h, delta_weights_h_o are: (56, 50) (50, 1)
Remember again that the dims of inputs X and final_outputs should be (1, 56), (1, 1)
So delta_weights_i_h's dim should be the same as the dot product between some form of X and hidden_error_term
So delta_weights_h_o's dim should be the same as the dot product between some form of hidden_outputs and output_error_term

(X' means X transpose) X' of dim(56, 1) dot hidden_error_term of dim(1, 50) would result in dim(56, 50)
This dim(56, 50) is aligned with delta_weights_i_h's dimension.
(hidden_outputs' means hidden_outputs transpose) On paper, the dim of hidden_outputs is (1, 50)
The same applies to delta_weights_h_o: hidden_outputs' of dim(50, 1) dot output_error_term of dim(1, 1) would result in dim(50, 1)
This dim(50, 1) is aligned with delta_weights_h_o's dimension.

But in Numpy, output_error_term does not have dim(1, 1) but (1,)
And in Numpy, hidden_error_term does not have dim(50, 1) but (50,)
So we can instead perform hidden_error_term * X' or, more precisely, 'hidden_error_term * X[:, None]'
Because hidden_error_term * X[:, None] has dim (56, 50), just what we wanted from above
The same applies to delta_weights_h_o: output_error_term * hidden_outputs[:, None] achieves the same
Because output_error_term * hidden_outputs[:, None] has dim (50, 1), just what we wanted from above

SO WHY CAN'T WE USE X.T FOR TRANSPOSE BUT MUST USE X[:, None]?
```
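The sketch below replays the same shape bookkeeping without the bike-sharing CSV. It runs one forward and backward pass on random data. The helper name one_sample_deltas and the random inputs are hypothetical. The layer sizes default to the 56/50/1 used above. Passing output_nodes=3 suggests that the same X[:, None] pattern also carries over to a network with multiple outputs:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def one_sample_deltas(input_nodes=56, hidden_nodes=50, output_nodes=1, seed=0):
    """Hypothetical helper: one pass of the train loop on random data.

    Returns the weight-update increments for a single sample together with
    the weight matrices, so their shapes can be compared.
    """
    rng = np.random.default_rng(seed)
    w_i_h = rng.normal(0.0, input_nodes**-0.5, (input_nodes, hidden_nodes))
    w_h_o = rng.normal(0.0, hidden_nodes**-0.5, (hidden_nodes, output_nodes))

    X = rng.normal(size=input_nodes)   # one sample: shape (input_nodes,), like a row of train_features.values
    y = rng.normal(size=output_nodes)  # target(s): shape (output_nodes,)

    # Forward pass (identity activation on the output layer)
    hidden_outputs = sigmoid(np.dot(X, w_i_h))     # (hidden_nodes,)
    final_outputs = np.dot(hidden_outputs, w_h_o)  # (output_nodes,)

    # Backward pass; f(x) = x has derivative 1, so the error term equals the error
    output_error_term = y - final_outputs                    # (output_nodes,)
    hidden_error = np.dot(w_h_o, output_error_term)          # (hidden_nodes,)
    hidden_error_term = hidden_error * hidden_outputs * (1 - hidden_outputs)

    # Broadcasting (n,) * (m, 1) -> (m, n) yields increments shaped like the weights
    delta_i_h = hidden_error_term * X[:, None]               # (input_nodes, hidden_nodes)
    delta_h_o = output_error_term * hidden_outputs[:, None]  # (hidden_nodes, output_nodes)
    return delta_i_h, delta_h_o, w_i_h, w_h_o

for n_out in (1, 3):
    d_ih, d_ho, _, _ = one_sample_deltas(output_nodes=n_out)
    print(n_out, d_ih.shape, d_ho.shape)
# → 1 (56, 50) (50, 1)
# → 3 (56, 50) (50, 3)
```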

## 3. Why we must use X[:, None] instead of X.T?

This is because in NumPy, a single sample with 3 input features does not come with the shape (1, 3) that we use when describing the matrices of the neural network on paper. Instead, it is a 1-D array of shape (3,). Look at the NumPy array x as an example:

```python
# An example
x = np.array([1, 2, 3])
# On paper, x should have dim (1, 3), but in NumPy it has dim (3,)
x.shape
```

```
(3,)
```
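Where does the 1-D shape come from in the train loop? There, X is one row obtained by iterating over train_features.values, which is a 2-D array. Iterating over (or integer-indexing) the first axis of a 2-D NumPy array drops that axis, while slicing with a range keeps it. Here is a small sketch with a hypothetical 2 x 3 feature matrix named features_2d:

```python
import numpy as np

# Hypothetical 2-D feature matrix: 2 samples (rows) x 3 features (columns)
features_2d = np.array([[1, 2, 3],
                        [4, 5, 6]])

row_by_index = features_2d[0]                  # integer index drops the row axis
row_by_slice = features_2d[:1]                 # slice keeps the row axis
rows_from_loop = [row for row in features_2d]  # iterating also yields 1-D rows

print(row_by_index.shape, row_by_slice.shape, rows_from_loop[0].shape)
# → (3,) (1, 3) (3,)
```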

Now, if we take the transpose of x using the .T attribute, we would expect the resulting x' to have dimension (3, 1) as opposed to (1, 3):

```python
x.T.shape
```

```
(3,)
```

Ah! Despite the transpose, x.T still has dimension (3,). Transposing reverses the order of an array's axes, and a 1-D array has only one axis, so it comes back unchanged. This means .T cannot turn x into the column vector we need when applying * or np.dot to two arrays, as we did in the train method of the NeuralNetwork class!
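The sketch below makes the problem concrete. It uses random stand-ins for X and hidden_error_term with the shapes printed in the loop above, (56,) and (50,). With X.T we are still multiplying two 1-D arrays of different lengths, which NumPy cannot broadcast, so it raises a ValueError. On a 2-D array, by contrast, .T does what we expect:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=56)                  # stand-in for one sample, shape (56,)
hidden_error_term = rng.normal(size=50)  # stand-in for the hidden error term, shape (50,)

print(X.T.shape)  # → (56,)  transposing a 1-D array changes nothing

try:
    hidden_error_term * X.T  # (50,) * (56,): trailing dimensions 50 and 56 do not match
except ValueError as err:
    print("ValueError:", err)

# A genuinely 2-D row vector of shape (1, 56) does transpose into a (56, 1) column
X_row = X.reshape(1, -1)
print(X_row.T.shape)  # → (56, 1)
```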

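Using x[:, None], however, solves this issue: indexing with None (NumPy's np.newaxis) inserts a new axis of length 1, so the (3,) array becomes a (3, 1) column. This is probably why we use x[:, None] in this course: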

```python
x[:, None].shape
```

```
(3, 1)
```
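x[:, None] is not the only way to do this. As a sketch, each of the following standard NumPy idioms should also turn the (3,) array into a (3, 1) column. Choosing between them is mostly a matter of readability:

```python
import numpy as np

x = np.array([1, 2, 3])

columns = {
    "x[:, None]": x[:, None],
    "x[:, np.newaxis]": x[:, np.newaxis],      # np.newaxis is an alias for None
    "x.reshape(-1, 1)": x.reshape(-1, 1),      # -1 lets NumPy infer the number of rows
    "np.expand_dims(x, 1)": np.expand_dims(x, 1),
    "np.atleast_2d(x).T": np.atleast_2d(x).T,  # atleast_2d gives (1, 3), then .T works
}

for name, col in columns.items():
    print(f"{name:22} -> {col.shape}")  # each line reports (3, 1)
```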

## 4. Minor detail in the update of delta_weights in hidden and output layers

You might also wonder why we used

```python
delta_weights_i_h += hidden_error_term * X[:, None]
delta_weights_h_o += output_error_term * hidden_outputs[:, None]
```

for updating the delta_weights. There's no np.dot() involved here! Why?

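Well, hidden_error_term and output_error_term are 1-D arrays whose shapes are not suitable for np.dot(), so we use the asterisk \* instead. Note that \* is element-wise multiplication, not matrix multiplication. Thanks to NumPy broadcasting, multiplying an array of shape (n,) by an array of shape (m, 1) produces an (m, n) array, which is exactly the outer product we need for our implementation. See the example below for what the asterisk \* does.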
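In other words, \* multiplies entry by entry after broadcasting. A (3,) array is treated as a (1, 3) row, and any axis of length 1 is stretched to match the other operand. The sketch below uses a non-square case, a row of shape (3,) times a column of shape (2, 1), and checks the result against an explicit double loop:

```python
import numpy as np

row = np.array([1, 2, 3])             # shape (3,): broadcast as a (1, 3) row
column = np.array([10, 20])[:, None]  # shape (2, 1)

result = row * column                 # (3,) * (2, 1) -> (2, 3)
print(result)
# → [[10 20 30]
#     [20 40 60]]

# Same thing built by hand: entry (i, j) is column[i, 0] * row[j]
manual = np.array([[column[i, 0] * row[j] for j in range(3)] for i in range(2)])
print(np.array_equal(result, manual))  # → True
```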

```python
np.array([1, 2, 3]) * np.array([1, 2, 3])[:, None]
```

```
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
```

The result is the same as in the cell below. There, we deliberately give the first array dimension (3, 1) and the second array dimension (1, 3) instead of (3,), and then multiply them with np.dot. This is how we would update the delta_weights on paper, but not how we do it in Python here.

```python
np.dot(np.array([1, 2, 3])[:, None], np.array([[1, 2, 3]]))
```

```
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
```
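Both cells compute an outer product, and NumPy's np.outer computes the same thing directly. The sketch below uses random stand-ins of the real sizes: (56,) for X and (50,) for hidden_error_term. The broadcast update used in the train loop, the on-paper np.dot version, and np.outer should all agree:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=56)                  # stand-in for one input sample
hidden_error_term = rng.normal(size=50)  # stand-in for the hidden error term

via_broadcast = hidden_error_term * X[:, None]            # what the train loop uses
via_dot = np.dot(X[:, None], hidden_error_term[None, :])  # (56, 1) dot (1, 50), as on paper
via_outer = np.outer(X, hidden_error_term)                # explicit outer product

print(via_broadcast.shape)  # → (56, 50)
print(np.allclose(via_broadcast, via_dot), np.allclose(via_broadcast, via_outer))  # → True True
```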

I hope this notebook has helped you! Feel free to give me feedback on Slack or at kangnahua(at)gmail.com! Best of luck with your learning :)