[View in Colaboratory](https://colab.research.google.com/github/kenzhemir/kenzhemir.github.io/blob/master/How_to_train_a_dragon.ipynb)

# Why Convolutional Neural Networks? 

AlphaGO

![alt text](https://cdn57.androidauthority.net/wp-content/uploads/2016/03/FACEBOOK-RESULTS-CARD-DAY-5-01-840x440.jpg)

OpenAI and Dota 2

![alt text](https://i2.wp.com/anith.com/wp-content/uploads/2017/08/elon-musks-self-taught-ai-bot-destroyed-an-esports-pro-in-dota-2.jpg?fit=1200%2C630&ssl=1)

# What is classification?
A basic task in computer vision

## Training: 
![alt text](https://cdn-images-1.medium.com/max/1600/1*oB3S5yHHhvougJkPXuc8og.gif)

## Testing: 
![alt text](https://sourcedexter.com/wp-content/uploads/2017/05/tensorflow-1.gif)

# Why is it difficult?

1.   Semantic gap: do machines see the same way we do? Nope
2.   Illumination
3.   View point
4.   Deformation
5.   Occlusion
6.   Background Clutter
7.   Intraclass variation



## How to approach the problem?

Just collect the data and hope the algorithm will figure it out (aka the Data-Driven Approach):

1. Collect a dataset of images and labels (easier said than done)
2. Use Machine Learning to train a classifier
3. Evaluate the classifier on new images

![alt text](https://imgs.xkcd.com/comics/machine_learning.png)



# Classifiers 

1. Nearest Neighbor: remember the existing data and find the closest match at inference. \\
Too slow at inference, because every prediction is compared against the whole dataset (especially painful if the dataset is big and complex)
2. Linear Classifier: f(x, W) = Wx
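
To make the two classifiers above concrete, here is a minimal pure-Python sketch. The toy "images", labels, weights, and helper names (`l1_distance`, `nearest_neighbor_predict`, `linear_scores`) are made up for illustration and are not part of this notebook; the L1 distance is just one possible choice of distance.

```python
# Toy data: each "image" is a flat list of pixel values (hypothetical values).
train_images = [[0.0, 0.1, 0.9], [0.8, 0.9, 0.1]]
train_labels = ["cat", "dog"]


def l1_distance(a, b):
    """Sum of absolute pixel differences between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor_predict(query):
    """Return the label of the closest stored training image.

    Every prediction scans the whole training set, which is why this
    classifier gets slow as the dataset grows.
    """
    distances = [l1_distance(query, img) for img in train_images]
    best = distances.index(min(distances))
    return train_labels[best]


def linear_scores(W, x):
    """Compute f(x, W) = Wx: one score per class (one row of W per class)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


query = [0.1, 0.0, 1.0]
nn_label = nearest_neighbor_predict(query)

# Hypothetical weights: 2 classes x 3 pixels.
W = [[-1.0, 0.0, 1.0],   # scores high when the last pixel is bright
     [1.0, 1.0, -1.0]]   # scores high when the first pixels are bright
scores = linear_scores(W, query)
linear_label = train_labels[scores.index(max(scores))]
```

The linear classifier needs only one matrix-vector product per prediction, no matter how many training images were used to find `W`.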


# Now how can we implement that?
#### What you'll learn

1. What is a neural network and how to train it
2. How to build a basic 1-layer FC neural network using PyTorch
3. How to add more layers
4. How to pick hyperparams for your model
5. How to build convolutional networks and train them

How about overfitting, dropout, learning rate decay? 


# Install PyTorch on the Colab VM
PyTorch is an awesome deep learning framework from Facebook: [pytorch.org](http://pytorch.org) \\
Colab: Google actually gives you a free GPU for a limited time (12 hours), and you can run your code in a Jupyter Notebook in the browser ))) \\
Isn't that GREAAAAT?


In [0]:
# http://pytorch.org/
from os import path
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())

accelerator = 'cu80' if path.exists('/opt/bin/nvidia-smi') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.3.0.post4-{platform}-linux_x86_64.whl torchvision

# Import all necessary packages

In [0]:
import numpy as np
import matplotlib.pyplot as plt
import PIL.Image as image

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.autograd import Variable

# Load data
For these experiments we will be using the MNIST dataset

In [0]:
root = './data'
download = True
batch_size = 64
cuda = torch.cuda.is_available()
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}
# torchvision datasets yield PIL images; ToTensor() converts them to tensors in [0, 1].
# Normalize() then standardizes them with the MNIST mean (0.1307) and std (0.3081).

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root, train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)


test_loader = torch.utils.data.DataLoader(
    datasets.MNIST(root, train=False, transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size = batch_size, shuffle=True, **kwargs)

# Look at your data

In [0]:
def imshow(inp, name, title=None):
    """Show a normalized 1x28x28 tensor as a grayscale image."""
    inp = inp.squeeze().numpy()
    # Undo the Normalize() transform so pixel values are back in [0, 1]
    mean = np.array([0.1307])
    std = np.array([0.3081])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp, cmap='gray')
    # result = image.fromarray((inp * 255).astype(np.uint8))
    # result.save(name)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated
    
for batch_idx, (data, target) in enumerate(train_loader):
  sample = data[20] # change number to any id in the batch 
  imshow(sample, 'sample')
  break
  

![alt text](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/d5222c6e3d15770a.png)

# Create a model for training
Each new model class is defined as a subclass of `nn.Module`

## FC

Let's start with a simple Fully Connected Network

In [0]:
class Simple_FC(nn.Module):
    def __init__(self, input_size=784, num_classes=10):
        super(Simple_FC, self).__init__()
        self.fc1 = nn.Linear(input_size, num_classes)  
    
    def forward(self, x):
        x = x.view(-1, 784)
        out = self.fc1(x) # weighted sum of input and network weights -> Wx+b, b - bias
        return out # return logits - activations before softmax layer 

### Softmax 

Transforms scores (logits) into class probabilities. \\
Entries are normalized so that they sum up to 1

![alt text](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/604a9797da2a48d7.png)
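
As a rough illustration of the softmax step, here is a small pure-Python sketch. The function name `softmax` and the example logits are assumptions for illustration; in the notebook itself this step is handled inside the loss function.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the max does not change the result but avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for 3 classes
probs = softmax(logits)
```

The largest logit gets the largest probability, and the ordering of the classes is preserved.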

In [0]:
class Complex_FC(nn.Module):
    def __init__(self, input_size=784, hidden_size=500, num_classes=10):
        super(Complex_FC, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)  
    
    def forward(self, x):
        x = x.view(-1, 784)
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

In [0]:
class Deep_FC(nn.Module):
    def __init__(self, input_size=784, num_classes=10):
        super(Deep_FC, self).__init__()
        self.fc1 = nn.Linear(input_size, 200) 
        self.activation = nn.Sigmoid()
        self.fc2 = nn.Linear(200, 100)  
        self.fc3 = nn.Linear(100, 60) 
        self.fc4 = nn.Linear(60, 30)  
        self.fc5 = nn.Linear(30, num_classes)  
    
    def forward(self, x):
        x = x.view(-1, 784)
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        x = self.activation(x)
        x = self.fc3(x)
        x = self.activation(x)
        x = self.fc4(x)
        x = self.activation(x)
        x = self.fc5(x)
        return x

## CNN
Now let's see how we can get even more from the network with a better architecture (CNN)

In [0]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(16*20, 50) # Fully Connected Layers 
        self.fc2 = nn.Linear(50, 10)
        self.bn1 = nn.BatchNorm2d(10)
        self.bn2 = nn.BatchNorm2d(20)
        self.bn3 = nn.BatchNorm1d(50)

    def forward(self, x):

        x = F.relu(F.max_pool2d(self.bn1(self.conv1(x)), 2))
        x = F.relu(F.max_pool2d(self.bn2(self.conv2(x)), 2))
        x = x.view(-1, 16*20)
        x = F.relu(self.bn3(self.fc1(x)))
        x = F.dropout(x, p=0.25, training=self.training)  # apply dropout only in training mode
        x = self.fc2(x)
        
        return x
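
Where does the `16*20` in `fc1` come from? Here is a small sketch that tracks the feature map size through the network, assuming 28x28 MNIST inputs, no padding, stride 1 convolutions, and 2x2 max pooling with stride 2 (the `F.max_pool2d` default when only the kernel size is given). The helper `conv_out` is hypothetical and only used for this calculation.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                   # MNIST images are 28x28
size = conv_out(size, kernel=5)             # conv1, 5x5 kernel -> 24
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pool      -> 12
size = conv_out(size, kernel=5)             # conv2, 5x5 kernel -> 8
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pool      -> 4

channels = 20                               # output channels of conv2
flat_features = channels * size * size      # 20 * 4 * 4 = 320 = 16*20
```

If you change a kernel size or the input resolution, this number changes too, and `fc1` and the `view` call have to be updated to match.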

In [0]:
def adjust_learning_rate(lr, optimizer, epoch):
    """Sets the learning rate to the initial LR decayed by 10 every 8 epochs"""
    lr = lr * (0.1 ** (epoch // 8)) # 6

    print('Learning rate: ' + str(lr))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

![alt text](https://i.stack.imgur.com/FjvuN.gif)
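
To see what the formula in `adjust_learning_rate` does over time, here is a tiny standalone sketch of the same step decay. The function name `step_decay`, its parameters, and the initial learning rate of 0.01 are assumptions chosen for illustration.

```python
def step_decay(initial_lr, epoch, drop_every=8, factor=0.1):
    """Multiply the learning rate by `factor` once every `drop_every` epochs."""
    return initial_lr * (factor ** (epoch // drop_every))


# Learning rate at a few selected epochs, starting from a hypothetical 0.01
schedule = {epoch: step_decay(0.01, epoch) for epoch in (1, 7, 8, 15, 16)}
```

The rate stays constant within each block of 8 epochs and drops by a factor of 10 at epochs 8, 16, and so on.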

## Pick model for testing

In [0]:
model = Simple_FC()
#model = Complex_FC()
#model = Deep_FC()
#model = CNN()
if cuda:
    model.cuda()

# Define a Loss Function (Criterion) and an Optimizer


In [0]:
lr = 0.001 # 0.01  
momentum = 0.9
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

## How to choose your learning rate?

A hyperparameter is a non-learnable parameter that is picked manually at the beginning of training. \\
Try and see what works best. \\

Check the behaviour: 
1. Simple_FC with lr = 0.01, 0.001, 0.0001
2. Complex_FC with lr = 0.01, 0.001, 0.0001
3. CNN with lr = 0.1, 0.01 

![alt text](http://cs231n.github.io/assets/nn3/learningrates.jpeg)
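
A toy version of this experiment, as a sketch: plain gradient descent on the one-parameter function f(w) = w², with three made-up learning rates. The function and the specific rates are assumptions for illustration; real networks behave less cleanly, but the three regimes look similar.

```python
def gradient_descent(lr, steps=50, w=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w
        w = w - lr * grad
    return w


final = {
    "too_low": gradient_descent(lr=0.01),   # moves toward 0, but slowly
    "good": gradient_descent(lr=0.4),       # converges quickly
    "too_high": gradient_descent(lr=1.1),   # overshoots more each step and diverges
}
```

With a learning rate that is too high, every step overshoots the minimum by more than the previous one, so the loss grows instead of shrinking.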

## Momentum?
 - a running average of previous updates, which leads to smaller oscillations
 
 `V = momentum*V + (1-momentum)*gradient_of_parameters`
 
![alt text](https://cdn-images-1.medium.com/max/1600/1*5-GPmnonHVQiIj2EPG3Fgw.png)

Without momentum:

```
update = learning_rate * gradient_of_parameters
W = W - update
```

![alt text](https://qph.fs.quoracdn.net/main-qimg-7adad11c6ee947a96e917e2a8205392d)

With momentum:
```
V = momentum*V + (1-momentum)*gradient_of_parameters
W = W - learning_rate * V
```

![alt text](https://deeplearning4j.org/img/updater_2.png)
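
Here is a small sketch of why the running average helps, using the momentum formula above on a made-up sequence of gradients: a small steady slope plus a large zig-zag. The gradient values and the helper name `momentum_updates` are assumptions for illustration.

```python
def momentum_updates(gradients, momentum=0.9):
    """Return V after each step, using V = momentum*V + (1-momentum)*gradient."""
    v = 0.0
    history = []
    for g in gradients:
        v = momentum * v + (1.0 - momentum) * g
        history.append(v)
    return history


# Hypothetical noisy gradients: a steady slope (0.1) plus a zig-zag of +/-1.
gradients = [0.1 + (1.0 if t % 2 == 0 else -1.0) for t in range(40)]
velocity = momentum_updates(gradients)

# Compare the step sizes once the running average has warmed up.
largest_raw_step = max(abs(g) for g in gradients[20:])
largest_smoothed_step = max(abs(v) for v in velocity[20:])
```

The zig-zag mostly cancels out in `V`, while the steady slope survives. Note that PyTorch's `optim.SGD` uses a slightly different convention by default (no `(1-momentum)` factor on the gradient), so its effective step size differs from this formula.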



## Loss? How is cross-entropy loss calculated?
Loss is a way of quantifying what it means to have a “good” classifier
![alt text](https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/img/1d8fc59e6a674f1c.png)


**Note!!!** In PyTorch, log-probabilities are easily obtained by adding a `LogSoftmax` layer as the last layer of your network and using `NLLLoss` (the negative log likelihood loss).
You may use `CrossEntropyLoss` instead if you prefer not to add an extra layer; it combines both steps.
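
The relationship between the two options can be sketched in pure Python for a single example. The function names mirror the PyTorch layers but are plain re-implementations written for illustration, and the logits are made up.

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def nll_loss(log_probs, target):
    """Negative log likelihood: minus the log-probability of the true class."""
    return -log_probs[target]


def cross_entropy(logits, target):
    """Cross-entropy on raw logits = log-softmax followed by NLL."""
    return nll_loss(log_softmax(logits), target)


logits = [2.0, 1.0, 0.1]                         # hypothetical scores for 3 classes
loss_correct = cross_entropy(logits, target=0)   # true class has the top score
loss_wrong = cross_entropy(logits, target=2)     # true class has the lowest score
```

The loss is small when the network gives the true class a high score and grows as the true class is ranked lower.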

## How does optimization essentially work? (in the Perfect World)
![alt text](https://cdn-images-1.medium.com/max/1600/1*bl1EuPH_XEGvMcMW6ZloNg.jpeg)




In [0]:
log_interval = 100
def train(epoch):
    model.train()  # switch to training mode (affects dropout and batch norm)
    for batch_idx, (data, target) in enumerate(train_loader):

        if cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()  # clear gradients left over from the previous step
        output = model(data)
        loss = criterion(output, target)
        loss.backward() # calculate gradients
        optimizer.step() # update network parameters

        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))
            
            # Compute accuracy
            _, argmax = torch.max(output, 1)
            accuracy = (target == argmax.squeeze()).float().mean()

## But real life is not perfect, so your real loss function will probably look like this. Still quite good
In this picture, the loss is represented as a function of 2 parameters. In reality, there are many more
![alt text](https://cdn-images-1.medium.com/max/1200/1*msObu3xbQzSnKvtCW2z6YQ.png)

# Specify the training Procedure

# Check the model accuracy 

In [0]:
def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += criterion(output, target).data[0]
        pred = output.data.max(1)[1] # get the index of the highest score (logit)
        correct += pred.eq(target.data).cpu().sum()

    test_loss /= len(test_loader) # the criterion already averages within each batch, so average over batches
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

# Let's Finally Train Your Model

In [0]:
epochs = 1
for epoch in range(1, epochs + 1):
   # adjust_learning_rate(lr, optimizer, epoch)
    train(epoch)

In [0]:
test()

# What to do next 


1.   More complex datasets (CIFAR, ImageNet). You can even make your own
2.   Transfer learning from pre-trained models. 
3.   What do my filters look like? Feature visualization



# Are we done with AI? Not yet. What are the limits?

1.   No system is secure. Can a classifier easily be tricked? What are adversarial examples?
2.   To train a classifier we need data (much more than humans require). What if we cannot find more? Or we are too lazy to annotate? 
3.   Your algorithm is as good as your data. What if your data is biased? (**Spoiler Alert!!!** Your data is biased)

# Adversarial Examples

Images that appear similar to humans but are misclassified by neural networks. 
Adversarial examples can cause a state-of-the-art neural network to misclassify an input image as whatever class we choose.

**Example:**

![alt text](http://cleverhans.io/assets/adversarial-example.png)


## How can we generate Adversarial Examples?
![alt text](http://dev.wode.ai/repo/TensorFlow-Tutorials-HvassLabs/images/11_adversarial_examples_flowchart.svg)
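
The idea can be sketched on a tiny linear classifier. This notebook does not name a specific attack; the sketch below takes a single signed-gradient step toward a chosen target class, in the spirit of the fast gradient sign method. The weights, input, `epsilon`, and all helper names are made up for illustration.

```python
def scores(W, x):
    """Class scores of a linear classifier: one dot product per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(W, x):
    """Index of the class with the highest score."""
    s = scores(W, x)
    return s.index(max(s))


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def targeted_perturbation(W, x, target, epsilon):
    """Nudge every pixel by at most epsilon toward the target class."""
    source = predict(W, x)
    # For a linear model, the gradient of (score[target] - score[source])
    # with respect to x is simply W[target] - W[source].
    grad = [wt - ws for wt, ws in zip(W[target], W[source])]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical 2-class classifier over 4 "pixels".
W = [[1.0, -1.0, 0.5, -0.5],
     [-1.0, 1.0, -0.5, 0.5]]
x = [0.6, 0.4, 0.5, 0.5]
x_adv = targeted_perturbation(W, x, target=1, epsilon=0.15)
```

Each pixel changes by at most `epsilon`, yet the predicted class flips from 0 to 1. For a deep network the gradient comes from backpropagation instead of a closed-form expression.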



Adversarial Glasses
![alt text](https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/7f57e9939560562727344c1c987416285ef76cda/9-Figure4-1.png)

Figure 4: Examples of successful impersonation and dodging attacks. Fig. (a) shows SA (top) and SB (bottom) dodging against DNNB . Fig. (b)–(d) show impersonations. Impersonators carrying out the attack are shown in the top row and corresponding impersonation targets in the bottom row. Fig. (b) shows SA impersonating Milla Jovovich (by Georges Biard; source: https://goo.gl/GlsWlC); (c) SB impersonating SC ; and (d) SC impersonating Carson Daly (by Anthony Quintano; source: https://goo.gl/VfnDct)

## How to defend against Adversarial Examples? 

1. Include them in the training data 
2. Train a separate network to detect adversarial examples