# Quick Start PyTorch

Welcome to FloydHub! If you're here, then you've just achieved your first success on FloydHub: running your first GPU-powered Job. 🎉 Congrats! Now let's go through the classic 'Hello World' of deep learning: the hand-written digit classification task. In this example, we'll be using PyTorch as our deep learning framework.

## First things first - how to set up for success

Let's first outline which machine learning packages we'll need to run this experiment:

- [Numpy](http://www.numpy.org/) is the fundamental package for scientific computing with Python
- [PyTorch](http://pytorch.org/) is the deep learning framework we'll be using in this example
- [torchvision](http://pytorch.org/docs/master/torchvision/index.html) package consists of popular datasets, model architectures, and common image transformations for computer vision
- [matplotlib](https://matplotlib.org/) is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

Note: the most important thing to learn when running a Jupyter Notebook is the **`Shift + Enter`** shortcut, which runs the code in a Code Cell.

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data.dataloader as dataloader
import torch.optim as optim

from torch.utils.data import TensorDataset
from torch.autograd import Variable
from torchvision import transforms
from torchvision.datasets import MNIST
from torchvision.utils import make_grid

import matplotlib.pyplot as plt
%matplotlib inline

# CUDA?
cuda = torch.cuda.is_available()

# Seed for replicability
torch.manual_seed(1)
if cuda:
    torch.cuda.manual_seed(1)

## The MNIST Dataset
The [MNIST](http://yann.lecun.com/exdb/mnist/) dataset is available for free and provides a great starting point for experimenting with deep learning. Normally, we'd want to create a separate Dataset on FloydHub for our data - but, in this case, the torchvision package actually provides a direct MNIST API that we'll be using.

## Experiment structure
PyTorch experiments are typically structured in this way:

* Create a PyTorch dataset (if not already provided by the framework)
* Create an iterator on the dataset using the DataLoader API. This combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset.

Note below that we have specified the `/MNIST` folder as an absolute path - this is important. If we were to use the local directory, we would be mixing the output of the instance with our input data, which would consume additional space and mix code with data.
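
To make the dataset-plus-sampler idea concrete, here is a minimal pure-Python sketch of what such an iterator does conceptually. It is not PyTorch's implementation; `iterate_batches` and the toy dataset are hypothetical names used only for illustration.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=True, seed=1):
    """Yield (inputs, labels) batches from an indexable dataset of (input, label) pairs."""
    # The "sampler": decides the order in which sample indices are visited
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # The "batcher": groups consecutive sampled indices into mini-batches
    for start in range(0, len(indices), batch_size):
        samples = [dataset[i] for i in indices[start:start + batch_size]]
        inputs = [x for x, _ in samples]
        labels = [y for _, y in samples]
        yield inputs, labels

# Toy dataset: 10 (input, label) pairs
toy_dataset = [([float(i)], i % 2) for i in range(10)]
batches = list(iterate_batches(toy_dataset, batch_size=4))
print([len(labels) for _, labels in batches])  # batch sizes: [4, 4, 2]
```

Every sample is visited exactly once per pass, and the last batch is smaller when the dataset size is not a multiple of the batch size.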


In [None]:
# Hyperparameters
batch_size = 8
num_workers = 1

# Download the MNIST dataset into the /MNIST folder if not mounted via FH --data
train = MNIST('/MNIST', train=True, download=True, transform=transforms.Compose([
    transforms.ToTensor(), # ToTensor does min-max normalization. 
]), )

test = MNIST('/MNIST', train=False, download=True, transform=transforms.Compose([
    transforms.ToTensor(), # ToTensor does min-max normalization. 
]), )

if cuda:
    # Create DataLoader for cuda
    dataloader_args = dict(shuffle=True, batch_size=batch_size, num_workers=num_workers, pin_memory=True)
else:
    dataloader_args = dict(shuffle=True, batch_size=batch_size, num_workers=0, pin_memory=False)
    
train_loader = dataloader.DataLoader(train, **dataloader_args)
test_loader = dataloader.DataLoader(test, **dataloader_args)
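
The comment on `ToTensor` refers to the rescaling it applies to 8-bit images: integer pixel values in [0, 255] become floats in [0.0, 1.0]. A NumPy sketch of that arithmetic follows (an illustration, not torchvision's code; the sample pixel values are made up):

```python
import numpy as np

# Hypothetical 8-bit pixel values: black, mid-gray, white
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Min-max scaling with a fixed input range of [0, 255]
scaled = pixels.astype(np.float32) / 255.0
print(scaled.min(), scaled.max())
```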

Let's take a closer look at MNIST:

In [None]:
train_data = train.train_data
train_data = train.transform(train_data.numpy())

print('[Train]')
print(' - Numpy Shape:', train.train_data.cpu().numpy().shape)
print(' - Tensor Shape:', train.train_data.size())
print(' - Transformed Shape:', train_data.size())
print(' - min:', torch.min(train_data))
print(' - max:', torch.max(train_data))
print(' - mean:', torch.mean(train_data))
print(' - std:', torch.std(train_data))
print(' - var:', torch.var(train_data))

In [None]:
test_data = test.test_data
test_data = test.transform(test_data.numpy())

print('[Test]')
print(' - Numpy Shape:', test.test_data.cpu().numpy().shape)
print(' - Tensor Shape:', test.test_data.size())
print(' - Transformed Shape:', test_data.size())
print(' - min:', torch.min(test_data))
print(' - max:', torch.max(test_data))
print(' - mean:', torch.mean(test_data))
print(' - std:', torch.std(test_data))
print(' - var:', torch.var(test_data))

So we have 60k grayscale images of 28x28 pixels for training, and 10k for testing. Let's visualize what these samples look like:

In [None]:
# Show a sample
plt.imshow(test.test_data.cpu()[0].numpy(), cmap='gray')
plt.title('DIGIT: %i' % test.test_labels.cpu()[0])
plt.axis('off')
plt.show()

In [None]:
# This is what it looks like seen as raw numbers in range 0-255,
# where 0 represents black pixels and 255 white pixels.

print (test.test_data.cpu()[0].numpy())

In [None]:
# More Visualization on grid format

# Plot a grid of images
def imshow(inp_tensor, title):
    """image show for Tensor"""
    img = inp_tensor.numpy().transpose((1, 2, 0))
    plt.imshow(img)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)

# Get a batch (the built-in next() works across PyTorch versions)
trainset_iter = iter(train_loader)
images, labels = next(trainset_iter)
# show images
batch_imgs = make_grid(images, nrow=batch_size)
plt.axis('off')
imshow(batch_imgs, title=[x for x in labels])

Now it's time to prepare the DataLoader for training. We increase the batch size to 64, which reduces the noise in each parameter update.
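
One way to see why a larger batch gives a less noisy training signal: the average of a quantity over a mini-batch fluctuates less as the batch grows, roughly in proportion to 1/sqrt(batch size). The sketch below uses synthetic per-sample values as a stand-in for per-sample losses; the numbers are assumptions and do not come from MNIST.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for per-sample losses (assumption: mean 1.0, std 0.5)
per_sample = rng.normal(loc=1.0, scale=0.5, size=60000)

def batch_mean_std(values, batch_size):
    """Standard deviation of the mini-batch means for a given batch size."""
    usable = (len(values) // batch_size) * batch_size  # drop the incomplete last batch
    means = values[:usable].reshape(-1, batch_size).mean(axis=1)
    return means.std()

std_8 = batch_mean_std(per_sample, 8)
std_64 = batch_mean_std(per_sample, 64)
print(std_8, std_64)  # the batch-size-64 means fluctuate less
```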

In [None]:
# Hyperparameters
batch_size = 64

if cuda:
    # Create DataLoader for cuda
    dataloader_args = dict(shuffle=True, batch_size=batch_size, num_workers=num_workers, pin_memory=True)
else:
    dataloader_args = dict(shuffle=True, batch_size=batch_size, num_workers=0, pin_memory=False)
    
train_loader = dataloader.DataLoader(train, **dataloader_args)
test_loader = dataloader.DataLoader(test, **dataloader_args)

## Model

We use a neural network with two hidden layers, ReLU as the activation function, and Batch Normalization and Dropout as regularizers. Here is what each of these terms means:

- [Neural Networks](https://en.wikipedia.org/wiki/Artificial_neural_network) are a class of statistical models that take inspiration from the human nervous system and represent computation as a DAG (directed acyclic graph). For a great introduction, see this video: [But what *is* a Neural Network? | Deep learning](https://youtu.be/aircAruvnKk).
- [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) is the activation function, a non-linear operation that takes inspiration from biology. It outputs max(0, x), and this non-linearity is what lets stacked layers learn more than a single linear map.
- [Batch Normalization](https://arxiv.org/abs/1502.03167). Training deep neural networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. The paper's authors refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs.
- [Dropout](http://jmlr.org/papers/v15/srivastava14a.html) is a technique that randomly drops units (along with their connections) from the neural network during training. This helps prevent overfitting (poor generalization) in deep neural nets.
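
The three building blocks above reduce to a few lines of array arithmetic. This is a NumPy sketch of their training-time behavior under simplifying assumptions (no learned scale and shift in batch normalization, no running statistics); the function names are illustrative and are not the PyTorch API.

```python
import numpy as np

def relu(x):
    """Zero out negative activations."""
    return np.maximum(x, 0.0)

def batch_norm(x, eps=1e-5):
    """Normalize each feature (column) to zero mean and unit variance over the batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(1)
h = rng.normal(loc=3.0, scale=2.0, size=(64, 5))  # fake batch: 64 samples, 5 features

h_norm = batch_norm(h)                   # per-feature mean ~0, std ~1
h_act = relu(h_norm)                     # negatives clipped to 0
h_drop = dropout(h_act, p=0.5, rng=rng)  # about half the units zeroed, the rest doubled
print(h_norm.mean(axis=0).round(3), h_norm.std(axis=0).round(3))
```

The order used here (normalize, activate, drop) mirrors the order of operations in the model defined in the next cell.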

In [None]:
# Hyperparameters
input_size = 784 # 28 * 28
hidden_size = 548
hidden_size2 = 252
num_classes = 10
learning_rate = 1e-3

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.bc1 = nn.BatchNorm1d(hidden_size)
        
        self.fc2 = nn.Linear(hidden_size, hidden_size2)
        self.bc2 = nn.BatchNorm1d(hidden_size2)
        
        self.fc3 = nn.Linear(hidden_size2, num_classes)
        
        
    def forward(self, x):
        x = x.view((-1, 784))
        h = self.fc1(x)
        h = self.bc1(h)
        h = F.relu(h)
        h = F.dropout(h, p=0.5, training=self.training)
        
        h = self.fc2(h)
        h = self.bc2(h)
        h = F.relu(h)
        h = F.dropout(h, p=0.2, training=self.training)
        
        h = self.fc3(h)
        out = F.log_softmax(h, dim=1)  # normalize over the class dimension
        return out

model = Model()

if cuda:
    model.cuda() # CUDA!

# Cross Entropy as loss function
loss_fn = nn.CrossEntropyLoss()
# If you are running a GPU instance, compute the loss on GPU
if cuda:
    loss_fn.cuda()    

# Adam Optimizer
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
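
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The arithmetic below assumes the default settings of `nn.Linear` (weights plus a bias) and `nn.BatchNorm1d` (a learned scale and shift per feature); the helper names are illustrative.

```python
input_size, hidden_size, hidden_size2, num_classes = 784, 548, 252, 10

def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    """One learned scale and one learned shift per feature."""
    return 2 * n_features

total = (
    linear_params(input_size, hidden_size) + batchnorm_params(hidden_size)
    + linear_params(hidden_size, hidden_size2) + batchnorm_params(hidden_size2)
    + linear_params(hidden_size2, num_classes)
)
print(total)  # 572658
```

Most of the parameters sit in the first linear layer, which maps the 784 input pixels to the first hidden layer.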

## Train

Train for 5 epochs (5 passes over the full dataset) and plot the Loss and Accuracy over the training steps. Why these two? Training is driven by 2 metrics:

- Minimizing the Loss means that the error between the model's prediction and the dataset label must be as small as possible. This metric drives the training, since the model parameters are updated with respect to the Loss to improve the prediction.
- High Accuracy means that the model correctly predicts the class of the samples it is asked to classify.
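
Both metrics can be computed directly from a model's outputs. The sketch below does so in NumPy for a made-up batch of three samples and three classes: cross-entropy is the mean negative log-probability assigned to the correct class, and accuracy is the percentage of samples whose highest-scoring class matches the label. The scores and labels are invented for illustration.

```python
import numpy as np

# Invented raw scores (logits) for 3 samples over 3 classes, and their true labels
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3],
                   [1.2, 0.4, 0.9]])
labels = np.array([0, 1, 2])

# Log-softmax, computed stably by subtracting the row-wise maximum first
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Cross-entropy: mean negative log-probability of the correct class
loss = -log_probs[np.arange(len(labels)), labels].mean()

# Accuracy: percentage of samples where the arg-max class equals the label
predictions = log_probs.argmax(axis=1)
accuracy = (predictions == labels).mean() * 100

print(loss, accuracy)
```

Here the third sample is misclassified, so the accuracy is two out of three, while the loss also reflects how confident each prediction was.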

In [None]:
# Hyperparameters
num_epochs = 5
print_every = 100

# Metrics
train_loss = []
train_accu = []

# Model train mode
model.train()
# Training the Model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # image unrolling
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        if cuda:
            images, labels = images.cuda(), labels.cuda()
        
        # Forward + Backward + Optimize
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        
        # Load loss on CPU
        if cuda:
            loss.cpu()
        loss.backward()
        optimizer.step()
        
        ### Keep track of metric every batch
        # Loss Metric (.item() extracts the Python number from the 0-dim loss tensor)
        train_loss.append(loss.item())
        # Accuracy Metric
        prediction = outputs.data.max(1)[1]   # max returns (values, indices); the indices are the predicted classes
        # Divide by the actual batch length, since the last batch can be smaller than batch_size
        accuracy = prediction.eq(labels.data).sum().item() / labels.size(0) * 100
        train_accu.append(accuracy)
        
        # Log
        if (i+1) % print_every == 0:
            print ('Epoch: [%d/%d], Step: [%d/%d], Loss: %.4f, Accuracy: %.4f' 
                   % (epoch+1, num_epochs, i+1, len(train)//batch_size, loss.item(), accuracy))

In [None]:
# Plot Loss over time
plt.xlabel("Steps")
plt.ylabel("Loss")
plt.title("Loss function over time")
plt.plot(np.arange(len(train_loss)), train_loss)

In [None]:
# Plot Accuracy over time
plt.xlabel("Steps")
plt.ylabel("Accuracy")
plt.title("Accuracy function over time")
plt.plot(np.arange(len(train_accu)), train_accu)

## Evaluate

Evaluate the trained model over the full test set.

In [None]:
model.eval()
correct = 0
# Disable gradient tracking during evaluation (replaces the deprecated volatile=True)
with torch.no_grad():
    for data, target in test_loader:
        data = data.view(-1, 28*28)
        if cuda:
            data, target = data.cuda(), target.cuda()
        output = model(data)
        # The index of the highest log-probability is the predicted class
        prediction = output.max(1)[1]
        correct += prediction.eq(target).sum().item()

print('\nTest set: Accuracy: {:.2f}%'.format(100. * correct / len(test_loader.dataset)))

## Summary
Great! You achieved 98% accuracy and familiarized yourself with Jupyter Notebooks running on FloydHub. This is just the beginning of your deep learning journey on FloydHub! We're excited to see your next experiments.

If you need help with deep learning content or you cannot find an answer to your questions in our [documentation](https://docs.floydhub.com/), then you can open a topic in our community [forum](https://forum.floydhub.com/). We're eager to continue improving FloydHub to help your deep learning journey.

Happy Learning!