# PYTORCH WALKTHROUGH


PyTorch is a powerful framework for building and training neural networks. It was developed by Facebook and is optimised to run on GPUs, although a GPU is not required.

## Theory - Neural Networks


Neural networks are so called because they are loosely modelled on the networks of biological neurons that do the learning in our brains. Computationally they share only a few core features with their biological counterparts, such as signals propagating from one unit to the next. Each input is multiplied by a weight, the weighted inputs are summed together with an additional bias (not shown in the diagram), and the result is passed through an activation function.


![image](https://upload.wikimedia.org/wikipedia/commons/6/60/ArtificialNeuronModel_english.png)  

Mathematically this looks like: 

$$
\begin{align}
y &= f(w_1 x_1 + w_2 x_2 + b) \\  
y &= f\left(\sum_i w_i x_i +b \right)
\end{align}
$$

Written with vectors, the weighted sum $\sum_i w_i x_i$ is the dot (inner) product of the input vector and the weight vector:

$$
h = \begin{bmatrix}
x_1 & x_2 & \cdots & x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_1 \\
           w_2 \\
           \vdots \\
           w_n
\end{bmatrix}
$$
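As a quick check of the equation above, the sketch below computes the weighted sum of a single neuron three ways (element-wise multiply then sum, `torch.dot`, and a row-times-column `torch.mm`) and confirms they agree before applying the sigmoid. The input, weight and bias values are made up for illustration and are not the walkthrough's data.

```python
import torch

# Illustrative values (not from the walkthrough): 3 inputs, 3 weights, 1 bias
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -0.25, 0.1])
b = torch.tensor(0.2)

# 1) Element-wise multiply, then sum: sum_i w_i * x_i
h_sum = (x * w).sum()

# 2) Dot (inner) product of two 1-D tensors
h_dot = torch.dot(x, w)

# 3) Row vector (1 x n) times column vector (n x 1) gives a 1 x 1 matrix
h_mm = torch.mm(x.view(1, 3), w.view(3, 1))

# Apply the sigmoid activation f(h + b) = 1 / (1 + e^-(h + b))
y = 1 / (1 + torch.exp(-(h_dot + b)))

print(h_sum.item(), h_dot.item(), h_mm.item())
print(y.item())
```

All three forms compute the same number; the matrix form is the one that scales up once there are several neurons.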

# KEY COMMANDS

`torch.sum(features*weights) + bias`  
`torch.exp(-x)`
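`torch.exp(-x)` is the building block of the sigmoid activation used in this walkthrough. As a sanity check, a sigmoid written out with `torch.exp` can be compared against PyTorch's built-in `torch.sigmoid`; the input values here are arbitrary.

```python
import torch

def manual_sigmoid(x: torch.Tensor) -> torch.Tensor:
    # Sigmoid written out with torch.exp: 1 / (1 + e^(-x))
    return 1 / (1 + torch.exp(-x))

x = torch.tensor([-2.0, 0.0, 2.0])

manual = manual_sigmoid(x)
builtin = torch.sigmoid(x)  # PyTorch's built-in sigmoid

print(manual)
print(builtin)
print(torch.allclose(manual, builtin))
```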

## RESHAPING

There are a few options here:   


[`weights.reshape()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape)  
[`weights.resize_()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.resize_)  
[`weights.view()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view)  **preferred**
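The three options behave differently. In a minimal sketch using an arbitrary random tensor: `view` returns a tensor that shares memory with the original and only works when the memory layout allows it, `reshape` returns a view when it can and a copy otherwise, and `resize_` changes the tensor itself in place (if the element count changes, some values become inaccessible or new memory is left uninitialised).

```python
import torch

torch.manual_seed(7)
weights = torch.randn(1, 5)

# view: same underlying memory, new shape (needs a compatible memory layout)
w_view = weights.view(5, 1)

# reshape: a view when possible, otherwise a copy
w_reshape = weights.reshape(5, 1)

# resize_: modifies the tensor itself in place (clone first to keep the original)
w_resized = weights.clone()
w_resized.resize_(5, 1)

print(w_view.shape, w_reshape.shape, w_resized.shape)

# Because view shares memory, writing through it changes the original tensor
w_view[0, 0] = 100.0
print(weights[0, 0].item())  # 100.0

# A transposed tensor is not contiguous in memory: view fails, reshape copies
t = torch.randn(2, 3).t()
print(t.is_contiguous())  # False
view_failed = False
try:
    t.view(6)
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)
print(t.reshape(6).shape)  # torch.Size([6])
```

For the simple row-to-column reshapes in this walkthrough the tensors are contiguous, so `view` and `reshape` give the same result.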

## MATRIX MULTIPLICATION  

`torch.mm()`  **preferred**  
`torch.matmul()`
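For 2-D tensors `torch.mm` and `torch.matmul` give the same answer. The difference is that `torch.matmul` (and the `@` operator) also accepts 1-D and batched inputs and broadcasts over them, while `torch.mm` only accepts two 2-D matrices. One plausible reason to prefer `mm` here is that strictness: a shape mistake raises an error instead of being silently broadcast. A small sketch with random tensors:

```python
import torch

torch.manual_seed(7)
features = torch.randn(1, 5)
weights = torch.randn(5, 1)

out_mm = torch.mm(features, weights)          # strictly 2-D x 2-D, no broadcasting
out_matmul = torch.matmul(features, weights)  # same result for 2-D inputs
out_op = features @ weights                   # the @ operator calls matmul

print(out_mm.shape, torch.allclose(out_mm, out_matmul), torch.allclose(out_mm, out_op))

# matmul broadcasts over a leading (batch) dimension
batch = torch.randn(4, 1, 5)
print(torch.matmul(batch, weights).shape)  # torch.Size([4, 1, 1])

# mm rejects anything that is not a 2-D matrix
mm_failed = False
try:
    torch.mm(batch, weights)
except RuntimeError:
    mm_failed = True
print("mm rejected 3-D input:", mm_failed)
```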

```python
# First, import PyTorch
import torch
```

```python
def activation(x):
    """ Sigmoid activation function

        Arguments
        ---------
        x: torch.Tensor
    """
    return 1/(1+torch.exp(-x))
```

```python
torch.manual_seed(7)                  # Set the random seed so the results are reproducible
features = torch.randn((1, 5))        # Features are 5 random normal numbers, shape (1, 5)
weights = torch.randn_like(features)  # randn_like: random normals with the same shape as features
bias = torch.randn((1, 1))            # A single random normal bias value
```

# WITHOUT MATRIX MULTIPLICATION

### Summing the weighted features and the bias, then applying the activation function

```python
## Calculate the output of this network using the weights and bias tensors
y1 = activation((features*weights).sum() + bias)  # element-wise product, sum, then add the bias
```

### TORCH METHOD

```python
y = activation(torch.sum(features*weights) + bias)
```

```python
print(y)
print(y1)
```

```
tensor([[0.1595]])
tensor([[0.1595]])
```


# WITH MATRIX MULTIPLICATION

In general, `features` has many rows (one per sample) and `weights` has many columns (one per output unit). `torch.mm` needs the number of columns in the first tensor to equal the number of rows in the second, but here both `features` and `weights` have shape `[1, 5]`, so the weights must be reshaped from a row into a column:

`weights` before reshaping, shape `[1, 5]`: `[[-0.8948, -0.3556, 1.2324, 0.1382, -1.6822]]`  
`weights` after reshaping, shape `[5, 1]`: `[[-0.8948], [-0.3556], [1.2324], [0.1382], [-1.6822]]`

```python
# torch.mm(features, weights)  # FAILS DUE TO SHAPE MISMATCH: [1 x 5] and [1 x 5]
weights_transposed = weights.view(5, 1)  # Reshape weights into a [5 x 1] column to fix it
```

```python
weights_transposed.shape
```

```
torch.Size([5, 1])
```

# Apply activation function

```python
y = activation(torch.mm(features, weights_transposed) + bias)
print(y)
```

```
tensor([[0.1595]])
```


# FULL TORCH CODE

```python
# First, import PyTorch
import torch

def activation(x):
    """ Sigmoid activation function

        Arguments
        ---------
        x: torch.Tensor
    """
    return 1/(1+torch.exp(-x))

torch.manual_seed(7)                  # Set the random seed so the results are reproducible
features = torch.randn((1, 5))        # Features are 5 random normal numbers
weights = torch.randn_like(features)  # randn_like: random normals with the same shape as features
bias = torch.randn((1, 1))            # A single random normal bias value

print('shape of features: ' + str(features.shape))
print('shape of weights: ' + str(weights.shape))

weights_transposed = weights.view(5, 1)  # Reshape weights into a [5 x 1] column
print('shape of weights is now: ' + str(weights_transposed.shape))
y = activation(torch.mm(features, weights_transposed) + bias)
print(y)
```

```
shape of features: torch.Size([1, 5])
shape of weights: torch.Size([1, 5])
shape of weights is now: torch.Size([5, 1])
tensor([[0.1595]])
```


# STACKING MORE DIMENSIONS

The code above multiplied the inputs by a single set of weights, added a bias and applied the activation function to produce one output.
But we can add a second set of weights, $w_2$, to give a second output, so we then have two neurons.

### SINGLE NEURON 

`output = activation(sum(inputs * weights) + bias)`




### ADD MORE NEURONS

What has been done so far, whilst awesome, is just one neuron (*tip: number of neurons = number of activations*). To give the network more capacity to learn, we add more neurons. This means expressing our weights as matrices and adding more of them, so we get something (excluding bias and activation) that looks like:

`inputs * weights1 = hidden_layer`  
`hidden_layer * weights2 = output`

We express the weights as a matrix because the inputs are **one row** by **three columns**. For the matrix multiplication to work, the first weight matrix must therefore have **three rows** (one per input) and **two columns**, each column producing one hidden node. Since we now have **2 hidden nodes**, we run the process again with weights that are **two rows** by **one column**, the single column producing the final output node.

#### EACH WEIGHT MATRIX HAS AS MANY ROWS AS THERE ARE NODES IN THE INCOMING LAYER
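The rule generalises to any number of layers: the weight matrix between two layers has one row per node in the incoming layer and one column per node in the outgoing layer. A minimal sketch for the 3-2-1 network described above, where `layer_sizes` and `init_weights` are hypothetical names chosen for this illustration and `torch.sigmoid` stands in for the activation:

```python
import torch

def init_weights(layer_sizes):
    """Hypothetical helper: one (incoming x outgoing) weight matrix and one
    (1 x outgoing) bias row per pair of adjacent layers, drawn from a standard normal."""
    weights, biases = [], []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights.append(torch.randn(n_in, n_out))  # rows = incoming nodes, columns = outgoing nodes
        biases.append(torch.randn(1, n_out))
    return weights, biases

torch.manual_seed(7)
layer_sizes = [3, 2, 1]  # inputs, hidden nodes, outputs
weights, biases = init_weights(layer_sizes)

x = torch.randn(1, layer_sizes[0])  # one row, three columns
for W, b in zip(weights, biases):
    x = torch.sigmoid(torch.mm(x, W) + b)  # each step maps (1 x n_in) -> (1 x n_out)

print([tuple(W.shape) for W in weights])  # [(3, 2), (2, 1)]
print(x.shape)  # torch.Size([1, 1])
```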

### THREE NEURONS PSEUDOCODE

**All values are drawn from a random normal distribution**

`inputs_shape = [1,3]`  
`w1_shape = [3,2]`  
`hidden_layer = activate[matrixmultiply[inputs,w1] + bias1]`  
`w2_shape = [2,1]`  
`output = activate[matrixmultiply[hidden_layer,w2] + bias2]`






![md](multilayer_diagram_weights.png)

The first layer, shown at the bottom here, holds the inputs and is understandably called the **input layer**. The middle layer is called the **hidden layer**, and the final layer (on the right) is the **output layer**. We can express this network mathematically with matrices again and use matrix multiplication to get the linear combinations for every unit in one operation. For example, the hidden layer ($h_1$ and $h_2$ here) can be calculated as

$$
\vec{h} = [h_1 \, h_2] = 
\begin{bmatrix}
x_1 & x_2 & \cdots & x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_{11} & w_{12} \\
           w_{21} &w_{22} \\
           \vdots &\vdots \\
           w_{n1} &w_{n2}
\end{bmatrix}
$$
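Each column of the weight matrix holds the weights feeding one hidden unit, so a single matrix multiplication computes every hidden unit's weighted sum at once. The sketch below uses random values for three inputs and, as in the equation, leaves out the bias and activation; it checks that $h_1$ and $h_2$ from the matrix product match separate dot products with each column.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 3)  # row vector [x_1, x_2, x_3]
W = torch.randn(3, 2)  # column j holds w_1j, w_2j, w_3j for hidden unit h_j

h = torch.mm(x, W)  # shape (1, 2): [h_1, h_2] in one operation

# The same values, one hidden unit at a time
h1 = torch.dot(x[0], W[:, 0])
h2 = torch.dot(x[0], W[:, 1])

print(h)
print(h1, h2)
print(torch.allclose(h, torch.stack([h1, h2]).view(1, 2)))
```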

The output of this small network is found by treating the hidden layer as the inputs to the output unit. The network output is then simply

$$
y =  f_2 \! \left(\, f_1 \! \left(\vec{x} \, \mathbf{W_1}\right) \mathbf{W_2} \right)
$$
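The equation above leaves out the bias terms. Writing them in as row vectors $\vec{b}_1$ and $\vec{b}_2$ (symbols introduced here to mirror `B1` and `B2` in the code that follows), with $f_1$ and $f_2$ both the sigmoid, gives the form the code computes:

$$
\begin{align}
\vec{h} &= f_1\!\left(\vec{x}\,\mathbf{W_1} + \vec{b}_1\right) \\
y &= f_2\!\left(\vec{h}\,\mathbf{W_2} + \vec{b}_2\right)
\end{align}
$$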

```python
import torch

def activation(x):
    """Sigmoid activation function (same as defined earlier)."""
    return 1/(1+torch.exp(-x))

### Generate some data
torch.manual_seed(7)  # Set the random seed so things are predictable

# Features are 3 random normal variables
features = torch.randn((1, 3))
no_of_features_columns = features.shape[1]

# Define the size of each layer in our network
number_inputs = no_of_features_columns  # must match the number of input features
number_hidden = 2                       # number of hidden units
number_outputs = 1                      # number of output units

# Weights for inputs to hidden layer: rows = number of inputs, columns = number of hidden units
W1 = torch.randn(number_inputs, number_hidden)

# Weights for hidden to output layer: rows = number of hidden units, columns = number of outputs
W2 = torch.randn(number_hidden, number_outputs)

# and bias terms for hidden and output layers
B1 = torch.randn((1, number_hidden))
B2 = torch.randn((1, number_outputs))

print("Shape of Features : " + str(features.shape))
print(str(features))
print("Shape of Weights 1 : " + str(W1.shape))
print("Shape of weights 2: " + str(W2.shape))

hidden_layer = activation(torch.mm(features, W1) + B1)
output = activation(torch.mm(hidden_layer, W2) + B2)

print('Hidden layer units h1 & h2 are : ' + str(hidden_layer))
print('The final value: ' + str(output))
```

```
Shape of Features : torch.Size([1, 3])
tensor([[-0.1468,  0.7861,  0.9468]])
Shape of Weights 1 : torch.Size([3, 2])
Shape of weights 2: torch.Size([2, 1])
Hidden layer units h1 & h2 are : tensor([[0.6813, 0.4355]])
The final value: tensor([[0.3171]])
```
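Because the network is just matrix multiplications, the same weights can process many samples at once: stacking samples as the rows of `features` (shape `[N, 3]`) gives one hidden-layer row and one output row per sample, and the `[1, n]` bias rows are broadcast across all of them. A sketch with a hypothetical batch of four random samples, using the built-in `torch.sigmoid` in place of the `activation` function above (they compute the same thing):

```python
import torch

torch.manual_seed(7)
n_samples = 4  # hypothetical batch size for illustration

features = torch.randn(n_samples, 3)  # each row is one sample with 3 features
W1 = torch.randn(3, 2)                # inputs -> 2 hidden units
W2 = torch.randn(2, 1)                # hidden units -> 1 output
B1 = torch.randn(1, 2)                # broadcast across all rows
B2 = torch.randn(1, 1)

hidden_layer = torch.sigmoid(torch.mm(features, W1) + B1)  # shape (4, 2)
output = torch.sigmoid(torch.mm(hidden_layer, W2) + B2)     # shape (4, 1)

print(hidden_layer.shape)  # torch.Size([4, 2])
print(output.shape)        # torch.Size([4, 1])

# Row i of the batched result matches running sample i through the network alone
single = torch.sigmoid(torch.mm(torch.sigmoid(torch.mm(features[1:2], W1) + B1), W2) + B2)
print(torch.allclose(single, output[1:2]))
```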
