# PyTorch Test YouTube Building Models
Notebook for following along with PyTorch model building, using the [PyTorch](https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html) website tutorial. This notebook is similar to the previous PyTorch Test notebooks, as the [YouTube content](https://www.youtube.com/watch?v=OSqIP-mOWOI) covers similar work.

### Choices for data

<br>

### Libraries and Modules
Importing the necessary libraries and modules for the notebook.

In [1]:
#Import cell
import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import math
import numpy as np
import pandas as pd
import pickle as pk
import torch
import torchvision

import torch.nn.functional as F #provides relu, max_pool2d and log_softmax used by the models

print("Imports complete")

Imports complete


<br>

### Importing data sets
Importing the data for the models.

<b>Import sample data set and corresponding time/geo data</b>

In [2]:
#Importing data sets

print("Data sets successfully imported.")

Data sets successfully imported.


In [3]:
#Setting seed value
torch.manual_seed(1247)

<torch._C.Generator at 0x22a9d0be1b0>

<br>

### Class Definitions
<b>Classes:</b><br>
<ul>
<li>TinyModel - 2 linear layers going from 100 -> 200 -> 10, with a ReLU activation between them and a softmax on the output
<li>LeNet - image recognition model with 2 convolution layers
<li>LSTMTagger - long short-term memory model that uses recurrent layers to assign a tag to each word in a sentence
</ul>

In [4]:
#Class definition cell

class TinyModel(torch.nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.softmax = torch.nn.Softmax()
        return None
    
    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.softmax(x)
        return x

    
class LeNet(torch.nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        #1 input image channel (black/white), 6 outputs channels, 5x5 square
        self.conv1 = torch.nn.Conv2d(1, 6, 5) 
        self.conv2 = torch.nn.Conv2d(6, 16, 3)
        self.fc1 = torch.nn.Linear(16*6*6, 120)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)
        return None
    
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:] #all dimensions except batch
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
    
    
class LSTMTagger(torch.nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim
        self.word_embeddings = torch.nn.Embedding(vocab_size, embedding_dim)
        self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)
        return None

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
    
    
    
print("Classes defined.")

Classes defined.


<br>

### Calculation functions
<b>Functions:</b><br>
<ul>
<li>
</ul>

In [5]:
#Calculation functions cell


print("Calculation functions defined.")

Calculation functions defined.


<br>

### Plotting functions
<b>Functions:</b>
<ul>
<li> 
</ul>

In [6]:
#Plotting functions Cell


print("Plotting functions defined.")

Plotting functions defined.


<br>

### Main code
Initialising an instance of a `torch.nn.Module` subclass.

In [7]:
tinymodel = TinyModel()
print(f"The model:{tinymodel}")
print(f"\nJust one layer:{tinymodel.linear2}")
print(f"\nModel params:")
for param in tinymodel.parameters():
    print(param)

print("\nLayer params:")
for param in tinymodel.linear2.parameters():
    print(param)

The model:TinyModel(
  (linear1): Linear(in_features=100, out_features=200, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=200, out_features=10, bias=True)
  (softmax): Softmax(dim=None)
)

Just one layer:Linear(in_features=200, out_features=10, bias=True)

Model params:
Parameter containing:
tensor([[ 0.0951, -0.0762, -0.0486,  ..., -0.0990,  0.0712,  0.0750],
        [-0.0887,  0.0800, -0.0901,  ..., -0.0717,  0.0428,  0.0932],
        [-0.0406,  0.0233,  0.0484,  ..., -0.0287,  0.0457,  0.0862],
        ...,
        [ 0.0942, -0.0757, -0.0316,  ...,  0.0772,  0.0359, -0.0166],
        [-0.0064,  0.0526,  0.0046,  ...,  0.0919,  0.0238,  0.0153],
        [ 0.0382, -0.0931, -0.0061,  ...,  0.0824,  0.0752,  0.0088]],
       requires_grad=True)
Parameter containing:
tensor([-0.0773,  0.0962,  0.0766, -0.0870,  0.0812,  0.0033, -0.0650, -0.0869,
         0.0829, -0.0573, -0.0590,  0.0306, -0.0266, -0.0673,  0.0084,  0.0159,
         0.0078, -0.0771,  0.0153,  0.0554, 

#### Common Layer Types
<ul>
<li>Linear layers - also known as fully connected layers. Every input influences every output according to the layer's weights. For a layer with m inputs and n outputs, the weights form an m x n matrix (PyTorch stores it as <code>weight</code> with shape n x m, i.e. out_features x in_features).</li>
<li>Convolutional layers - these are built to handle data with a high degree of spatial correlation. They are common in computer vision, and also appear in NLP applications, where a word's immediate context can affect the meaning of a sentence.</li>
<li>Recurrent layers - recurrent neural networks (RNNs) are used for sequential data such as time-series measurements. They work by maintaining a hidden state that acts as a sort of memory for what the network has seen in the sequence so far.</li>
<li>Transformers - these are a more complex type of layer, available in PyTorch through <code>torch.nn.Transformer</code> and its encoder/decoder components.</li>
</ul>
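
The first three layer types are demonstrated in the cells that follow. For the fourth, here is a minimal sketch of a small transformer encoder built from `torch.nn.TransformerEncoderLayer`. All sizes (`d_model=16`, `nhead=4`, `dim_feedforward=32`, two layers, and the input shape) are arbitrary values chosen for illustration, not taken from the tutorial.

```python
import torch

# One encoder block: self-attention followed by a feed-forward network.
# All sizes are arbitrary illustration values.
encoder_layer = torch.nn.TransformerEncoderLayer(
    d_model=16, nhead=4, dim_feedforward=32, batch_first=True
)
# Stack two copies of the block.
encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=2)

# With batch_first=True the input is (batch, sequence length, d_model).
x = torch.rand(2, 5, 16)
out = encoder(x)

print(out.shape)  # torch.Size([2, 5, 16]): the encoder preserves the input shape
```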

In [8]:
#Linear layers
lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)
print(f"Input: {x}")

print("\nWeight and Bias parameters:")
for param in lin.parameters():
    print(param)
    
y = lin(x)
print(f"\nOutput: {y}")

Input: tensor([[0.5971, 0.1120, 0.2886]])

Weight and Bias parameters:
Parameter containing:
tensor([[ 0.0505, -0.4112, -0.0971],
        [-0.2712, -0.3064,  0.0698]], requires_grad=True)
Parameter containing:
tensor([-0.5107, -0.3357], requires_grad=True)

Output: tensor([[-0.5546, -0.5118]], grad_fn=<AddmmBackward0>)
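
To connect the printed parameters to the output, the following sketch reproduces the layer's result by hand as `x @ W.T + b`. The seed value is arbitrary and `manual` and `layer_out` are illustrative names.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so repeated runs agree
lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)

# weight has shape (out_features, in_features) = (2, 3); bias has shape (2,)
print(lin.weight.shape)  # torch.Size([2, 3])

# A linear layer computes y = x @ W^T + b
manual = x @ lin.weight.T + lin.bias
layer_out = lin(x)

print(torch.allclose(manual, layer_out, atol=1e-6))  # True
```

The `AddmmBackward0` entry in the `grad_fn` of the output above refers to this same fused add-and-matrix-multiply operation.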


In [9]:
#Convolution layers
convModel = LeNet()

print("Model parameters:")
for param in convModel.parameters():
    print(param)

Model parameters:
Parameter containing:
tensor([[[[ 0.0891, -0.1954, -0.0355,  0.1595, -0.0345],
          [ 0.1008, -0.1689,  0.1628,  0.1368, -0.1587],
          [-0.0903,  0.1025, -0.1409,  0.0107,  0.1492],
          [ 0.0435,  0.0455, -0.1815,  0.1056,  0.1978],
          [-0.1340, -0.1309,  0.1040,  0.0075,  0.1682]]],


        [[[-0.1464,  0.0072,  0.0358, -0.0202, -0.1781],
          [-0.0221, -0.0982, -0.0450, -0.0877,  0.0251],
          [-0.0129,  0.1344,  0.0515, -0.1935,  0.1371],
          [-0.0007, -0.1996, -0.1738, -0.1553, -0.0743],
          [ 0.0329,  0.1731, -0.1686,  0.0893,  0.0125]]],


        [[[-0.0741, -0.0294, -0.0395,  0.1419,  0.0582],
          [ 0.1275, -0.1120, -0.0611, -0.1778,  0.1995],
          [-0.0712, -0.1980, -0.0409,  0.0053, -0.0466],
          [-0.1475, -0.1932,  0.0222,  0.0900, -0.1568],
          [ 0.0362, -0.0657,  0.0073,  0.1162, -0.0804]]],


        [[[-0.1658,  0.0629,  0.0057, -0.1536, -0.1726],
          [ 0.0437, -0.0520, -0.0597

Parameter containing:
tensor([ 0.0394,  0.0917,  0.0719,  0.0327,  0.0745,  0.0202, -0.0327, -0.0780,
        -0.0162,  0.0328], requires_grad=True)
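
The `16*6*6` input size of `fc1` in `LeNet` follows from how the convolution and pooling layers shrink the image. As a sketch, assuming a single 32x32 grayscale input (the input size is an assumption, it is not stated in this notebook), the shapes can be traced step by step:

```python
import torch
import torch.nn.functional as F

# Same convolution settings as LeNet in the class definition cell.
conv1 = torch.nn.Conv2d(1, 6, 5)
conv2 = torch.nn.Conv2d(6, 16, 3)

# Assumed input: a batch of one 32x32 single-channel image.
x = torch.rand(1, 1, 32, 32)

# 5x5 convolution: 32 -> 28, then 2x2 max pooling: 28 -> 14
x = F.max_pool2d(F.relu(conv1(x)), (2, 2))
print(x.shape)  # torch.Size([1, 6, 14, 14])

# 3x3 convolution: 14 -> 12, then 2x2 max pooling: 12 -> 6
x = F.max_pool2d(F.relu(conv2(x)), 2)
print(x.shape)  # torch.Size([1, 16, 6, 6])

# Flatten everything except the batch dimension: 16 * 6 * 6 = 576 features
flat = x.view(-1, 16 * 6 * 6)
print(flat.shape)  # torch.Size([1, 576])
```

A different input size would give a different flattened length, so `fc1` would need to be resized to match.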


In [10]:
#Recurrent Layers
wordModel = LSTMTagger(3, 3, 2, 3)

print("Model parameters:")
for param in wordModel.parameters():
    print(param)

Model parameters:
Parameter containing:
tensor([[-0.1040,  0.2637,  1.3784],
        [ 0.4779, -0.7009, -0.5634]], requires_grad=True)
Parameter containing:
tensor([[-0.5592,  0.0213, -0.4020],
        [ 0.2151,  0.2858,  0.1368],
        [ 0.4651,  0.0345, -0.1933],
        [ 0.1373,  0.5442, -0.3136],
        [ 0.5513,  0.3759,  0.1699],
        [ 0.3860,  0.1759,  0.3351],
        [ 0.5257, -0.2390,  0.5155],
        [-0.3143,  0.3466,  0.2076],
        [ 0.1044,  0.1527, -0.4123],
        [-0.5165, -0.2330,  0.1988],
        [ 0.4263,  0.4991,  0.2078],
        [-0.4073,  0.0891, -0.5220]], requires_grad=True)
Parameter containing:
tensor([[-1.1519e-01,  4.5169e-01, -3.0392e-01],
        [ 3.0917e-01, -3.6760e-01,  2.4568e-01],
        [-8.0687e-02, -2.3125e-01, -3.9486e-01],
        [ 5.1166e-01, -4.8438e-01, -1.5461e-01],
        [ 1.3329e-01, -3.7928e-01, -5.1830e-01],
        [ 4.9181e-01,  3.6203e-01, -5.0394e-01],
        [ 4.3627e-01,  5.6900e-01, -2.4273e-01],
        [-1.3
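
The parameters alone do not show how data flows through the tagger. The sketch below runs the same three layers as `LSTMTagger.forward` on a made-up sentence. The vocabulary size, the word indices, and the seed are hypothetical values chosen for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed
# Illustrative sizes; the notebook cell above uses a vocabulary of 2.
vocab_size, embedding_dim, hidden_dim, tagset_size = 6, 3, 3, 3

word_embeddings = torch.nn.Embedding(vocab_size, embedding_dim)
lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)

# Hypothetical sentence of four words, each encoded as an index < vocab_size.
sentence = torch.tensor([0, 3, 1, 5])

embeds = word_embeddings(sentence)                         # (4, 3)
# The LSTM expects (sequence length, batch, features) by default.
lstm_out, _ = lstm(embeds.view(len(sentence), 1, -1))      # (4, 1, 3)
tag_space = hidden2tag(lstm_out.view(len(sentence), -1))   # (4, 3)
tag_scores = F.log_softmax(tag_space, dim=1)

print(tag_scores.shape)  # torch.Size([4, 3]): one row of tag scores per word
# Exponentiating log-probabilities gives rows that each sum to 1.
print(tag_scores.exp().sum(dim=1))
```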

#### Other Layers and Functions
<b>Data Manipulation Layers</b><br>
There are other layer types that perform important functions in models, but don't participate in the learning process themselves.

In [11]:
#Max pooling - reduces a tensor by grouping adjacent cells and keeping the max value of each group
my_tensor = torch.rand(1, 6, 6)
print(my_tensor)

maxpool_layer = torch.nn.MaxPool2d(3)
print(maxpool_layer(my_tensor))
print("\nThe output will be the max value of each quadrant of the 6x6 input.")

tensor([[[0.9170, 0.3990, 0.1040, 0.1591, 0.7577, 0.3983],
         [0.9367, 0.6180, 0.2156, 0.9999, 0.8917, 0.2504],
         [0.8671, 0.7930, 0.2897, 0.8062, 0.3821, 0.9943],
         [0.6684, 0.0039, 0.2674, 0.7655, 0.3312, 0.5704],
         [0.4121, 0.5864, 0.7071, 0.8731, 0.3085, 0.2736],
         [0.4149, 0.2735, 0.7143, 0.8971, 0.8670, 0.7226]]])
tensor([[[0.9367, 0.9999],
         [0.7143, 0.8971]]])

The output will be the max value of each quadrant of the 6x6 input.
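
As a cross-check, the pooled values can be reproduced by hand. This sketch reshapes the 6x6 input into a 2x2 grid of 3x3 blocks and takes the maximum of each block; `blocks` and `manual` are illustrative names.

```python
import torch

x = torch.rand(1, 6, 6)
pooled = torch.nn.MaxPool2d(3)(x)

# View the 6x6 grid as (channel, block row, row in block, block col, col in block).
blocks = x.view(1, 2, 3, 2, 3)
# Take the max over the two "within block" dimensions.
manual = blocks.amax(dim=(2, 4))

print(pooled.shape)                 # torch.Size([1, 2, 2])
print(torch.equal(pooled, manual))  # True
```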


<b>Normalization layers</b><br>
Normalization layers re-center and rescale the output of one layer before feeding it to the next. This allows for higher learning rates without exploding or vanishing gradients, especially as activation functions tend to work best with inputs near 0.

In [12]:
my_tensor = torch.rand(1, 4, 4)*20 + 5
print(my_tensor)
print(my_tensor.mean())

norm_layer = torch.nn.BatchNorm1d(4)
normed_tensor = norm_layer(my_tensor)
print(normed_tensor)

print(normed_tensor.mean())

tensor([[[12.1837, 11.4260, 17.5870, 16.6960],
         [ 7.9450,  5.4306, 21.0291, 19.5452],
         [15.2861, 15.0910, 11.7967, 17.2329],
         [10.0695,  6.8099, 10.3096, 21.2993]]])
tensor(13.7336)
tensor([[[-0.8479, -1.1285,  1.1532,  0.8232],
         [-0.8059, -1.1715,  1.0966,  0.8808],
         [ 0.2225,  0.1226, -1.5646,  1.2196],
         [-0.3748, -0.9701, -0.3310,  1.6760]]],
       grad_fn=<NativeBatchNormBackward0>)
tensor(0., grad_fn=<MeanBackward0>)
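
What the layer computes can be checked by hand. In training mode (the default for a new module), `BatchNorm1d` subtracts the per-channel mean and divides by the square root of the per-channel biased variance plus a small `eps`. A sketch, with an arbitrary seed and illustrative names `manual` and `layer_out`:

```python
import torch

torch.manual_seed(0)  # arbitrary seed
x = torch.rand(1, 4, 4) * 20 + 5       # (batch, channels, length)

norm_layer = torch.nn.BatchNorm1d(4)   # new modules start in training mode
layer_out = norm_layer(x)

# Statistics are taken per channel, over the batch and length dimensions.
mean = x.mean(dim=(0, 2), keepdim=True)
var = ((x - mean) ** 2).mean(dim=(0, 2), keepdim=True)  # biased variance
manual = (x - mean) / torch.sqrt(var + norm_layer.eps)

# The learnable scale and shift start at 1 and 0, so they do not change the result yet.
print(torch.allclose(layer_out, manual, atol=1e-4))  # True
```

This is why each row of the normalized tensor above has a mean of roughly zero.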


<b>Dropout layers</b><br>
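These are a tool for encouraging sparse representations in your model, pushing it to do inference with less data. Dropout is only active in training mode; in evaluation mode the layer passes its input through unchanged.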

In [13]:
my_tensor = torch.rand(1, 4, 4)
dropout = torch.nn.Dropout(p=0.4) #zeroes each element with probability 0.4 and scales the rest by 1/(1-0.4)
print(dropout(my_tensor))
print(dropout(my_tensor))

tensor([[[0.0000, 0.9826, 0.0000, 0.0000],
         [1.1151, 1.1499, 0.0000, 0.0000],
         [0.9037, 0.0000, 0.9600, 0.0000],
         [0.2011, 0.9801, 0.0000, 0.0000]]])
tensor([[[0.7842, 0.9826, 0.0000, 0.6618],
         [0.0000, 0.0000, 0.0000, 0.6640],
         [0.0000, 0.0000, 0.0000, 0.0412],
         [0.2011, 0.9801, 1.0548, 1.3240]]])
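
Two details of the output above are easier to see with an input of all ones: surviving elements are scaled by `1 / (1 - p)`, and the layer does nothing in evaluation mode. A minimal sketch (the seed and variable names are illustrative):

```python
import torch

torch.manual_seed(0)  # arbitrary seed
x = torch.ones(1, 4, 4)
dropout = torch.nn.Dropout(p=0.4)

dropout.train()          # training mode: random masking is active
train_out = dropout(x)
print(train_out)         # every entry is either 0 or 1 / (1 - 0.4), about 1.6667

dropout.eval()           # evaluation mode: the layer is an identity
eval_out = dropout(x)
print(torch.equal(eval_out, x))  # True
```

The rescaling keeps the expected value of each element the same with and without dropout, which explains the values above 1 in the earlier output.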


<b>Activation Functions</b><br>
Activation functions introduce non-linearity between layers, which is what makes deep learning possible. Without them, a stack of linear layers could be reduced to a single matrix multiplication.
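
The claim about collapsing linear layers can be verified numerically. The sketch below stacks two linear layers with no activation between them and builds one equivalent weight matrix and bias; the layer sizes and seed are arbitrary.

```python
import torch

torch.manual_seed(0)  # arbitrary seed
lin1 = torch.nn.Linear(4, 5)
lin2 = torch.nn.Linear(5, 3)
x = torch.rand(2, 4)

# Two linear layers applied back to back, with no activation in between.
stacked = lin2(lin1(x))

# (x W1^T + b1) W2^T + b2  ==  x (W2 W1)^T + (W2 b1 + b2)
W = lin2.weight @ lin1.weight              # shape (3, 4)
b = lin2.weight @ lin1.bias + lin2.bias    # shape (3,)
collapsed = x @ W.T + b

print(torch.allclose(stacked, collapsed, atol=1e-6))  # True
```

Placing a non-linear function such as ReLU between `lin1` and `lin2` breaks this equivalence, which is what lets extra layers add expressive power.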

<b>Loss Functions</b><br>
These tell us how far the model's prediction is from the correct answer. PyTorch offers a variety of loss functions, including MSE (mean squared error, also called L2 loss), Cross Entropy Loss and Negative Log-Likelihood Loss (both useful for classifiers), and others.
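
As a brief sketch of how these losses relate, the cell below computes MSE by hand and shows that cross entropy on raw scores matches negative log-likelihood applied to log-softmax output. The tensors and labels are made-up values for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed

# Regression: MSE is the mean of the squared differences.
pred = torch.rand(4, 3)
target = torch.rand(4, 3)
mse = torch.nn.MSELoss()(pred, target)
manual_mse = ((pred - target) ** 2).mean()
print(torch.allclose(mse, manual_mse))  # True

# Classification: made-up raw scores for 4 samples and 3 classes.
logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 2])

# CrossEntropyLoss takes raw scores; NLLLoss expects log-probabilities.
ce = torch.nn.CrossEntropyLoss()(logits, labels)
nll = torch.nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, nll))  # True
```

This is why a model that ends in `log_softmax`, like `LSTMTagger`, is usually paired with `NLLLoss`, while a model that returns raw scores, like `LeNet`, is usually paired with `CrossEntropyLoss`.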

<br>