# TOC

__Chapter 4 - Fundamentals of machine learning__

1. [Import](#Import)
1. [Supervised learning](#Supervised-learning)
1. [Unsupervised learning](#Unsupervised-learning)
1. [Reinforcement learning](#Reinforcement-learning)
1. [Overfitting](#Overfitting)
    1. [Reducing the size of the network](#Reducing-the-size-of-the-network)
    1. [Applying weight regularization](#Applying-weight-regularization)
    1. [Dropout](#Dropout)
1. [Learning rate picking strategies](#Learning-rate-picking-strategies)

# Import

<a id = 'Import'></a>

In [1]:
# standard library and settings
import os
import sys
import importlib
import itertools
from PIL import Image
from glob import glob
import warnings

warnings.simplefilter("ignore")
from IPython.display import display, HTML

display(HTML("<style>.container { width:95% !important; }</style>"))

# data extensions and settings
import numpy as np

np.set_printoptions(threshold=np.inf, suppress=True)
import pandas as pd

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)
pd.options.display.float_format = "{:,.6f}".format

# pytorch tools
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.autograd import Variable
from torchvision import datasets, models, transforms

# visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

import mlmachine as mlm
from prettierplot.plotter import PrettierPlot
import prettierplot.style as style

# Supervised learning

PyTorch can be applied to supervised learning problems just like popular machine learning libraries such as scikit-learn. Here is a brief list of problem types and generic examples:

- __Classification__ - classifying dogs and cats based on images.

- __Regression__ - predicting stock prices.

- __Image segmentation__ - pixel-level classification. As a specific example, a self-driving car needs to identify what each pixel belongs to so that it can determine where objects such as other cars, pedestrians and trees begin and end.

- __Speech recognition__ - Alexa, Siri, OK Google.

- __Language translation__ - translating speech from one language to another.
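
As a minimal sketch of the classification case, the loop below fits a linear classifier to labeled points. The data is synthetic and hypothetical (200 random 2-D points, labeled 1 when the two coordinates sum to more than zero), and the learning rate and step count are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# synthetic, hypothetical data: 200 points in 2-D, label 1 when x0 + x1 > 0
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).long()

model = nn.Linear(2, 2)  # produces one logit per class
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = criterion(model(X), y)  # compare predictions with the known labels
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, accuracy {accuracy:.2f}")
```

The labels `y` are what make this supervised: the loss is computed by comparing the model's output against them.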

<a id = 'Supervised-learning'></a>

# Unsupervised learning

When there are no labels in the data, the task is considered to be unsupervised. Two commonly used techniques are:

- __Clustering__ - groups similar data points / observations together
- __Dimensionality reduction__ - reduces the number of dimensions so that we can evaluate / visualize high-dimensional data to discover any hidden patterns.
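
Dimensionality reduction can be sketched with principal component analysis computed through a singular value decomposition. PCA is only one possible technique and is chosen here for illustration; the data is random and hypothetical, and no labels are involved.

```python
import torch

torch.manual_seed(0)

# hypothetical unlabeled data: 100 observations with 5 features
X = torch.randn(100, 5)

# PCA via SVD: center the data, then project onto the top-2 right singular vectors
X_centered = X - X.mean(dim=0, keepdim=True)
U, S, Vh = torch.linalg.svd(X_centered, full_matrices=False)
Z = X_centered @ Vh[:2].T  # 2-D representation, suitable for a scatter plot

print(Z.shape)  # torch.Size([100, 2])
```

Because singular values are returned in descending order, the first column of `Z` carries at least as much variance as the second.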

<a id = 'Unsupervised-learning'></a>

# Reinforcement learning

A well-known application of reinforcement learning that received mainstream news coverage is AlphaGo, the program from Google's DeepMind that defeated the Go world champion. This topic will not be covered in this book.

<a id = 'Reinforcement-learning'></a>

# Overfitting

Overfitting occurs when a model performs well on a training dataset but generalizes poorly to unseen data. Neural networks can be adjusted in several different ways to address overfitting.



<a id = 'Overfitting'></a>

## Reducing the size of the network

Removing certain layers from an unnecessarily deep network can prevent the network from memorizing the training data. Training data memorization typically results in poor performance on validation/unseen data.



<a id = 'Reducing-the-size-of-the-network'></a>

In [2]:
# excessively deep network
class Architecture1(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Architecture1, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        # the middle layer maps hidden -> hidden so that fc3 receives hidden_size features
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.relu(out)
        out = self.fc3(out)
        return out


# same network as above, but one ReLU layer and one linear layer have been removed, which may make it less likely to overfit
class Architecture2(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Architecture2, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
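
One rough way to compare the capacity of two networks is to count their trainable parameters. The sketch below uses `nn.Sequential` stand-ins rather than the classes above, assumes the middle layer of the deeper network maps `hidden_size` to `hidden_size`, and uses the helper name `count_parameters` purely for illustration.

```python
import torch.nn as nn


def count_parameters(model):
    """Total number of trainable scalars in the model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


input_size, hidden_size, num_classes = 10, 20, 2

# three linear layers, standing in for the deeper architecture
deep = nn.Sequential(
    nn.Linear(input_size, hidden_size), nn.ReLU(),
    nn.Linear(hidden_size, hidden_size), nn.ReLU(),
    nn.Linear(hidden_size, num_classes),
)

# two linear layers, standing in for the reduced architecture
shallow = nn.Sequential(
    nn.Linear(input_size, hidden_size), nn.ReLU(),
    nn.Linear(hidden_size, num_classes),
)

print(count_parameters(deep), count_parameters(shallow))  # 682 262
```

Fewer parameters means less capacity to memorize individual training examples, although parameter count alone does not determine whether a model overfits.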

## Applying weight regularization

Regularization reduces the complexity of a model by penalizing large weights. PyTorch provides an easy way to apply L2/ridge regularization through the weight_decay parameter of the optimizer. By default, weight_decay is set to 0.

```python
model = Architecture1(10, 20, 2)
optimizer = torch.optim.Adam(model.parameters(), lr = 1e-4, weight_decay = 1e-5)
```



<a id = 'Applying-weight-regularization'></a>

In [3]:
# instantiate model and create optimizer
model = Architecture1(10, 20, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
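
To see what weight_decay does, the sketch below takes one optimizer step in two ways: once with `weight_decay` set, and once with the penalty `0.5 * wd * ||w||^2` added to the loss by hand. It uses plain SGD instead of Adam so the update is easy to reason about, and the data and hyperparameter values are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x, y = torch.randn(8, 4), torch.randn(8, 1)  # hypothetical toy data
lr, wd = 0.1, 0.01

model_a = nn.Linear(4, 1)
model_b = nn.Linear(4, 1)
model_b.load_state_dict(model_a.state_dict())  # identical starting weights

# (a) let the optimizer apply the penalty through weight_decay
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
F.mse_loss(model_a(x), y).backward()
opt_a.step()

# (b) add the L2 penalty to the loss by hand, with no weight_decay
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = sum((p ** 2).sum() for p in model_b.parameters())
(F.mse_loss(model_b(x), y) + 0.5 * wd * penalty).backward()
opt_b.step()

same = torch.allclose(model_a.weight, model_b.weight, atol=1e-6)
print(same)
```

Both routes add `wd * w` to each parameter's gradient, so the updated weights match. Note that the penalty is applied to every parameter passed to the optimizer, biases included.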

## Dropout

Dropout is a commonly used and powerful regularization technique in deep learning. It is applied to intermediate layers of the model during the training phase. A dropout probability between 0 and 1 is set, and dropout then randomly zeros out that proportion of the layer's activations (not its weights) on each forward pass. A higher dropout probability applies heavier regularization. Common values are between 0.2 and 0.5. Dropout is only active during training. In the classic formulation, activations are scaled down at test time by the keep probability; PyTorch instead scales the surviving activations up by 1 / (1 - p) during training, so no rescaling is needed at test time.

Here is a snippet illustrating how dropout is applied in PyTorch:

```python
# F is torch.nn.functional (imported above); the drop probability p defaults to 0.5
F.dropout(x, training = True)
```
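
The difference between training and evaluation behavior can be checked directly with the `nn.Dropout` module. This is a small sketch on a vector of ones, with p = 0.5 chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()  # training mode: dropout is active
train_out = drop(x)

drop.eval()  # evaluation mode: dropout passes the input through unchanged
eval_out = drop(x)

# in training mode each entry is either zeroed or scaled by 1 / (1 - p) = 2
print(train_out.unique())
print(torch.equal(eval_out, x))  # True
```

Calling `model.train()` and `model.eval()` on a full model switches every dropout layer inside it in the same way.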



<a id = 'Dropout'></a>

# Learning rate picking strategies

The learning rate can be adjusted during training using the schedulers in the torch.optim.lr_scheduler package.

__StepLR__ - This scheduler takes two parameters: step size, which is the number of epochs between learning rate changes, and gamma, the factor by which the learning rate is multiplied at each change. For example, with an initial learning rate of 0.01, a step size of 10 and a gamma of 0.1, the learning rate is 0.01 for the first 10 epochs, 0.001 for the next 10, and 0.0001 for the 10 after that. Here is an example framework:

```python
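from torch.optim.lr_scheduler import StepLR

scheduler = StepLR(optimizer, step_size = 30, gamma = 0.1)
for epoch in range(100):
    train(...)
    validate(...)
    # since PyTorch 1.1.0, step the scheduler after the epoch's optimizer updates
    scheduler.step()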
```
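
With an initial learning rate of 0.01, a step size of 10 and a gamma of 0.1, the resulting schedule can be traced without training anything. In this sketch a single dummy parameter stands in for a model, and `optimizer.step()` stands in for an epoch of training; both are assumptions made for illustration.

```python
import torch
from torch.optim.lr_scheduler import StepLR

param = torch.nn.Parameter(torch.zeros(1))  # hypothetical stand-in for model parameters
optimizer = torch.optim.SGD([param], lr=0.01)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(30):
    lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used in this epoch
    optimizer.step()  # stands in for a full epoch of training
    scheduler.step()

# epochs 0-9 use 0.01, epochs 10-19 use 0.001, epochs 20-29 use 0.0001
print(lrs[0], lrs[10], lrs[20])
```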

__MultiStepLR__ - This works like StepLR, except that the steps are not at regular intervals. The epochs at which the learning rate changes are given as a list of milestones. With milestones [10, 15, 30], the learning rate is multiplied by gamma at epochs 10, 15 and 30. Here is an example:

```python
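from torch.optim.lr_scheduler import MultiStepLR

scheduler = MultiStepLR(optimizer, milestones = [10, 15, 30], gamma = 0.1)
for epoch in range(1000):
    train(...)
    validate(...)
    # step the scheduler after the epoch's optimizer updates
    scheduler.step()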
```

__ExponentialLR__ - This multiplies the learning rate by gamma after every epoch, so the learning rate decays exponentially over the course of training.
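
A short sketch of ExponentialLR, comparing the scheduled rate with the closed form `lr0 * gamma ** epoch`. The dummy parameter and the values of `lr0` and `gamma` are hypothetical choices for illustration.

```python
import torch
from torch.optim.lr_scheduler import ExponentialLR

lr0, gamma = 0.1, 0.9
param = torch.nn.Parameter(torch.zeros(1))  # hypothetical stand-in for model parameters
optimizer = torch.optim.SGD([param], lr=lr0)
scheduler = ExponentialLR(optimizer, gamma=gamma)

lrs = []
for epoch in range(5):
    lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used in this epoch
    optimizer.step()  # stands in for a full epoch of training
    scheduler.step()

expected = [lr0 * gamma ** epoch for epoch in range(5)]
for got, want in zip(lrs, expected):
    print(f"{got:.6f} {want:.6f}")
```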

__ReduceLROnPlateau__ - This is a commonly used learning rate tuning strategy where the learning rate is reduced when a particular metric, such as training loss, validation loss or accuracy, begins to stagnate. It is common practice to reduce the learning rate by a factor of 2 to 10. An example:

```python
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = torch.optim.SGD(model.parameters(), lr = 0.1, momentum = 0.9)
scheduler = ReduceLROnPlateau(optimizer, 'min')
for epoch in range(10):
    train(...)
    val_loss = validate(...)
    # the monitored metric is passed to step()
    scheduler.step(val_loss)
```
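
The plateau behavior can be sketched with a made-up sequence of validation losses that improves and then stalls. The loss values, `factor=0.1` and `patience=2` are assumptions chosen so that the reduction is easy to follow; a dummy parameter stands in for a model.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

param = torch.nn.Parameter(torch.zeros(1))  # hypothetical stand-in for model parameters
optimizer = torch.optim.SGD([param], lr=0.1, momentum=0.9)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=2)

val_losses = [1.0, 0.9, 0.8, 0.8, 0.8, 0.8]  # made-up: improves, then stalls

lrs = []
for val_loss in val_losses:
    optimizer.step()  # stands in for a full epoch of training
    scheduler.step(val_loss)
    lrs.append(optimizer.param_groups[0]["lr"])

# three stalled epochs in a row exceed patience=2, so the last step cuts the rate tenfold
print(lrs)
```

With `mode="min"` the scheduler treats lower values as better, which suits a loss. For a metric such as accuracy, `mode="max"` would be the matching choice.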


<a id = 'Learning-rate-picking-strategies'></a>