# Using PyTorch Tensors to build Neural Networks

In this notebook, you'll get introduced to [PyTorch](http://pytorch.org/), a framework for building and training neural networks.

- PyTorch in a lot of ways behaves like the arrays you love from Numpy. These Numpy arrays, after all, are just tensors.
- PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks.
- It also provides a module that automatically calculates gradients (for backpropagation!) and another module specifically for building neural networks.

All together, PyTorch ends up being more coherent with Python and the Numpy/Scipy stack compared to TensorFlow and other frameworks.



## Single Layer Neural Networks

**Deep Learning**
- based on artificial neural networks, which have been around in some form since the late 1950s.

  **Neurons**
  - the "units", or individual parts approximating neurons, from which networks are built.
  - each unit has some number of **weighted inputs**.

      These weighted inputs are summed together (a linear combination), then passed through an activation function to get the unit's output.



### Conceptually

**Big Picture**

<img src="https://github.com/lbleal1/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/assets/simple_neuron.png?raw=1" width=400px>



**Mathematically**

We can represent this in two ways, which I see as sequential steps. I learned this from Andrew Ng's Machine Learning course on Coursera: a summation (usually coded as a for loop) can be rewritten as a tensor operation.

*From Summation*
$$
\begin{align}
y &= f(w_1 x_1 + w_2 x_2 + b) \\
y &= f\left(\sum_i w_i x_i +b \right)
\end{align}
$$

*to Tensor Language*

$$
h = \begin{bmatrix}
x_1 \, x_2 \cdots  x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_1 \\
           w_2 \\
           \vdots \\
           w_n
\end{bmatrix}
$$

$$
y = f(h + b)
$$

***Remark: Why do this translation?***                    
Tensor operations run as optimized, vectorized routines, so they are typically much faster than explicit loops. The notation is also kinda neat.
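
To make the translation concrete, here is a minimal sketch comparing the two forms. The seed, the length of 5, and the names `x`, `w`, `h_loop`, and `h_dot` are assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)   # arbitrary seed, only for repeatability
x = torch.randn(5)     # hypothetical inputs x_1 ... x_n
w = torch.randn(5)     # hypothetical weights w_1 ... w_n

# Summation form: accumulate w_i * x_i one term at a time
h_loop = torch.tensor(0.0)
for w_i, x_i in zip(w, x):
    h_loop = h_loop + w_i * x_i

# Tensor form: a single dot product replaces the loop
h_dot = torch.dot(x, w)

# Both forms should agree up to floating point rounding
print(torch.allclose(h_loop, h_dot, atol=1e-6))
```

Both versions compute the same weighted sum; the tensor form just hands the whole computation to one library call.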

### Programming

#### Tensors

***Remark:***                                                
It turns out neural network computations are just a bunch of linear algebra operations on *tensors*.

**Tensors**
- a generalization of matrices. 
  
  **Vector**
   - a 1-dimensional tensor
   
  **Matrix**
   - a 2-dimensional tensor. An array with three indices is a 3-dimensional tensor (RGB color images, for example).
   
***Remark:***     
The fundamental data structure for neural networks is the tensor, and PyTorch (as well as pretty much every other deep learning framework) is built around tensors.

<img src="https://github.com/lbleal1/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/assets/tensor_examples.svg?raw=1" width=600px>
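
As a quick check of these definitions, the sketch below builds one tensor of each kind and prints its number of dimensions and shape. The variable names and sizes are made up for illustration.

```python
import torch

vector = torch.tensor([1.0, 2.0, 3.0])  # 1-dimensional tensor
matrix = torch.zeros(2, 3)              # 2-dimensional tensor
image = torch.zeros(3, 4, 4)            # 3-dimensional tensor, e.g. 3 color channels of a 4x4 image

for name, t in [("vector", vector), ("matrix", matrix), ("image", image)]:
    # ndim is the number of indices needed to address one element
    print(name, t.ndim, tuple(t.shape))
```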

With the basics covered, it's time to explore how we can use PyTorch to build a simple neural network.

In [0]:
# First, import PyTorch
import torch

#### Building Single Layer Neural Networks

So, using what we know about tensors, we need to code the following:
1. The sum of the features multiplied by the weights, plus the bias
2. The activation function (our only function for now)


***Remark: PyTorch and Numpy***
- PyTorch tensors can be added, multiplied, subtracted, etc., just like Numpy arrays.
- In general, you'll use PyTorch tensors pretty much the same way you'd use Numpy arrays.
- They come with some nice benefits though, such as GPU acceleration, which we'll get to later.

For now, use the generated data to calculate the output of this simple single layer network.



In [0]:
### Generate some data for the features, weights and bias
torch.manual_seed(7) # Set the random seed so things are predictable

features = torch.randn((1, 5)) # Features are 5 random normal variables
weights = torch.randn_like(features) # True weights for our data, random normal variables again
bias = torch.randn((1, 1)) # and a true bias term

In [3]:
print(weights)
print(bias)

tensor([[-0.8948, -0.3556,  1.2324,  0.1382, -1.6822]])
tensor([[0.3177]])


In [0]:
def activation(x):
    """ Sigmoid activation function 
        x: torch.Tensor
    """
    return 1/(1+torch.exp(-x))
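
PyTorch also ships a built-in `torch.sigmoid`. As a sanity check, the sketch below compares it with the hand-written `activation` on a few arbitrary inputs (the test values are made up for illustration).

```python
import torch

def activation(x):
    """Sigmoid activation function, same formula as the cell above."""
    return 1 / (1 + torch.exp(-x))

x = torch.tensor([-2.0, 0.0, 2.0])  # arbitrary test inputs

print(activation(x))      # hand-written sigmoid
print(torch.sigmoid(x))   # built-in sigmoid, should match up to rounding
```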

In [5]:
# 1. multiply the features and the weights then add the bias
result = torch.matmul(weights, features.t()) + bias

# 2. use activation function to get the output
activation(result)

tensor([[0.1595]])
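
The matrix product computes the same weighted sum as the summation form. A minimal sketch, regenerating the data with the same seed and comparing an element-wise multiply-and-sum against `torch.matmul` (the names `h_sum` and `h_mm` are introduced here for illustration):

```python
import torch

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

# Summation form: multiply element-wise, then add everything up
h_sum = torch.sum(features * weights) + bias

# Tensor form: a (1, 5) by (5, 1) matrix multiplication
h_mm = torch.matmul(features, weights.t()) + bias

print(h_sum.shape, h_mm.shape)
print(torch.allclose(h_sum, h_mm, atol=1e-6))
```

Adding the `(1, 1)` bias to the scalar sum broadcasts it back to shape `(1, 1)`, so both results have the same shape.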

##### PyTorch Implementation Details

**Why use `torch.matmul()`?**

`matmul` stands for matrix multiplication. As stated earlier, we can replace the summation with a tensor operation.

To emphasize the point made earlier: you'll want to use matrix multiplications since they are more efficient, and are accelerated by modern libraries and high-performance computing on GPUs.



##### Shape Problems

Notice that we used `features.t()` above: we transposed `features` so that the matrix multiplication would work. There are more flexible ways of doing this.

**Note:** To see the shape of a tensor called `tensor`, use `tensor.shape`. If you're building neural networks, you'll be using this attribute often.

There are a few options here: [`weights.reshape()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape), [`weights.resize_()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.resize_), and [`weights.view()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view).

* `weights.reshape(a, b)` returns a tensor of size `(a, b)`: sometimes it shares the same data as `weights`, and sometimes it is a clone, meaning the data is copied to another part of memory.
* `weights.resize_(a, b)` returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, new elements will be uninitialized in memory. Here I should note that the underscore at the end of the method denotes that this method is performed **in-place**. Here is a great forum thread to [read more about in-place operations](https://discuss.pytorch.org/t/what-is-in-place-operation/16244) in PyTorch.
* `weights.view(a, b)` will return a new tensor with the same data as `weights` with size `(a, b)`.

Matt recommends using `.view()`, but any of the three methods will work for this. So, now we can reshape `weights` to have five rows and one column with something like `weights.view(5, 1)`.
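
A small sketch of `.shape` and `.view()` in action. The stand-in `weights` tensor (values 0 to 4) and the name `col` are assumptions for illustration, not the random weights generated earlier.

```python
import torch

weights = torch.arange(5.0).reshape(1, 5)  # stand-in tensor with shape (1, 5)
print(weights.shape)                       # shape is read as an attribute

col = weights.view(5, 1)                   # same data, now five rows and one column
print(col.shape)

# view() shares memory with the original tensor,
# so a change made through col is visible in weights too
col[0, 0] = 100.0
print(weights[0, 0].item())
```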

So applying these, we modify our solution:

In [6]:
# 1. multiply the features and the weights then add the bias
result = torch.matmul(features, weights.view(5, 1)) + bias

# 2. use activation function to get the output
activation(result)

tensor([[0.1595]])

## Extending to Multi-Layer Neural Networks



### Conceptually

***Remark:***                                             
That's how you can calculate the output for a single neuron. The real power of this algorithm happens with multi-layer neural networks.

**Big Picture**
<img src='https://github.com/lbleal1/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/assets/multilayer_diagram_weights.png?raw=1' width=450px>

***Remark:***                                             This is read from bottom to top.

*Parts*
1. **input layer**                                        
the first layer, containing the inputs (the "features"), shown at the bottom

2. **hidden layer**                                         
the middle layer, which is why the notation uses $h_1$ and $h_2$

3. **output layer**                                      
the last layer 


*Computation Note*

When you start stacking these individual units into layers and stacks of layers, into a network of neurons, the output of one layer of neurons becomes the input for the next layer. 

**Mathematically**

*Hidden Layers*
$$
\vec{h} = [h_1 \, h_2] = 
\begin{bmatrix}
x_1 \, x_2 \cdots \, x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_{11} & w_{12} \\
           w_{21} &w_{22} \\
           \vdots &\vdots \\
           w_{n1} &w_{n2}
\end{bmatrix}
$$

The output for this small network is found by treating the hidden layer as inputs for the output unit. The network output is expressed simply as

$$
y =  f_2 \! \left(\, f_1 \! \left(\vec{x} \, \mathbf{W_1}\right) \mathbf{W_2} \right)
$$

***Remark:***
Note that $\mathbf{W_1}$ is the first set of weights, between the input and the h's, while $\mathbf{W_2}$ is the second set of weights, between the h's and the output layer.

### Coding

In [0]:
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are 3 random normal variables
features = torch.randn((1, 3)) 

# Define the size of each layer in our network

# Number of input units, must match number of input features.
# features.shape is torch.Size([1, 3]), so the second element
# gives the number of input units
n_input = features.shape[1]

n_hidden = 2                    # Number of hidden units 
n_output = 1                    # Number of output units

# Weights for inputs to hidden layer
W1 = torch.randn(n_input, n_hidden) # size = [3,2]
# Weights for hidden layer to output layer
W2 = torch.randn(n_hidden, n_output) # size = [2, 1]

# and bias terms for hidden and output layers
B1 = torch.randn((1, n_hidden))
B2 = torch.randn((1, n_output))

In [11]:
h = activation(torch.mm(features, W1) + B1)
y =  activation(torch.mm(h,W2) + B2)
y



tensor([[0.3171]])

***Remark:***

The number of hidden units is a parameter of the network, often called a **hyperparameter** to differentiate it from the weight and bias parameters. As you'll see later when we discuss training a neural network, the more hidden units a network has, and the more layers, the better able it is to learn from data and make accurate predictions.
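
To see that the hidden size is a free choice, here is a rough sketch that wraps the forward pass in a helper and tries a few widths. The helper `forward` and the widths 2, 4, and 8 are hypothetical, chosen only for illustration; the weights are random, so the outputs themselves carry no meaning.

```python
import torch

def activation(x):
    """Sigmoid activation function."""
    return 1 / (1 + torch.exp(-x))

def forward(features, n_hidden, n_output=1):
    """Hypothetical helper: one hidden layer of n_hidden units with random weights."""
    n_input = features.shape[1]
    W1 = torch.randn(n_input, n_hidden)   # input to hidden weights
    B1 = torch.randn(1, n_hidden)         # hidden layer bias
    W2 = torch.randn(n_hidden, n_output)  # hidden to output weights
    B2 = torch.randn(1, n_output)         # output layer bias
    h = activation(torch.mm(features, W1) + B1)
    return activation(torch.mm(h, W2) + B2)

torch.manual_seed(7)
features = torch.randn((1, 3))

for n_hidden in (2, 4, 8):
    y = forward(features, n_hidden)
    # Only the weight shapes change; the output stays (1, n_output)
    print(n_hidden, tuple(y.shape))
```

Changing `n_hidden` only changes the shapes of `W1`, `B1`, and `W2`; the input and output shapes stay the same.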

## BONUS: Numpy to Torch and back

Special bonus section! PyTorch has a great feature for converting between Numpy arrays and Torch tensors. To create a tensor from a Numpy array, use `torch.from_numpy()`. To convert a tensor to a Numpy array, use the `.numpy()` method.

In [13]:
import numpy as np
a = np.random.rand(4,3)
a

array([[0.54603373, 0.37691907, 0.74092096],
       [0.06198962, 0.01454321, 0.10520602],
       [0.85584354, 0.5030598 , 0.09519593],
       [0.22553444, 0.78939382, 0.45312411]])

In [14]:
b = torch.from_numpy(a)
b

tensor([[0.5460, 0.3769, 0.7409],
        [0.0620, 0.0145, 0.1052],
        [0.8558, 0.5031, 0.0952],
        [0.2255, 0.7894, 0.4531]], dtype=torch.float64)

In [15]:
b.numpy()

array([[0.54603373, 0.37691907, 0.74092096],
       [0.06198962, 0.01454321, 0.10520602],
       [0.85584354, 0.5030598 , 0.09519593],
       [0.22553444, 0.78939382, 0.45312411]])

The memory is shared between the Numpy array and Torch tensor, so if you change the values of one object in place, the other will change as well.

In [16]:
# Multiply PyTorch Tensor by 2, in place
b.mul_(2)

tensor([[1.0921, 0.7538, 1.4818],
        [0.1240, 0.0291, 0.2104],
        [1.7117, 1.0061, 0.1904],
        [0.4511, 1.5788, 0.9062]], dtype=torch.float64)

In [17]:
# Numpy array matches new values from Tensor
a

array([[1.09206746, 0.75383813, 1.48184193],
       [0.12397924, 0.02908641, 0.21041205],
       [1.71168707, 1.00611959, 0.19039186],
       [0.45106888, 1.57878765, 0.90624823]])
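
The sharing works in the other direction too. The sketch below uses small made-up arrays, and also shows `.clone()` as one way to get an independent copy when sharing is not wanted; the names `a`, `b`, and `c` are reused here only for illustration.

```python
import numpy as np
import torch

a = np.ones((2, 2))
b = torch.from_numpy(a)          # b shares memory with a

a += 1                           # in-place change on the Numpy side
print(b)                         # the tensor reflects the new values

c = torch.from_numpy(a).clone()  # clone() copies the data into new memory
a += 1                           # second in-place change
print(b)                         # still follows a
print(c)                         # unaffected by the second change
```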