**Hello Everyone**
<br/>
I have been using Keras and TensorFlow for a year, and both have served me well, each with its own advantages and limitations.
<br/>
Keras is a high-level API built on top of Theano and TensorFlow and is easy to use, while TensorFlow itself is more complex. In terms of flexibility, however, TensorFlow is much more flexible and offers more advanced operations than Keras, which comes in useful when doing research.
*So I decided to find something that is easy to use, gives an intuition for the entire control flow and offers the flexibility of doing advanced operations.* This is where I started to work with PyTorch and fell in love with it.
For those who haven't heard much about it:
<br/>
**PyTorch is an open-source machine learning library for Python, based on Torch.**
* **The biggest difference between TensorFlow and PyTorch is that in TensorFlow the graph is defined statically before running a model, whereas in PyTorch graph construction is dynamic: nodes can be defined, changed and executed on the go.**
* **Building or binding custom extensions written in C, C++ or CUDA is doable with both frameworks. TensorFlow again requires more boilerplate code, though it is arguably cleaner for supporting multiple types and devices. In PyTorch you simply write an interface and a corresponding implementation for each of the CPU and GPU versions.**

* **Compiling the extension is also straightforward with both frameworks and doesn’t require downloading any headers or source code outside of what’s included with the pip installation.**
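
To make the "dynamic graph" point concrete, here is a minimal sketch. The function `repeated_double` and its `threshold` argument are made up for illustration. The number of loop iterations depends on the value of the tensor, so the graph is only known once the code actually runs, and autograd still differentiates through it.

```python
import torch

def repeated_double(x, threshold=10.0):
    """Double x until it reaches the threshold; the loop count depends on the data."""
    steps = 0
    while x.item() < threshold:
        x = x * 2
        steps += 1
    return x, steps

w = torch.tensor(1.5, requires_grad=True)
y, steps = repeated_double(w)

# Autograd differentiates through whatever graph this particular run built.
y.backward()
print(steps, y.item(), w.grad.item())
```

With a different starting value the loop runs a different number of times, and the gradient changes accordingly, without redefining anything.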

This tutorial focuses on implementing an image classification task in PyTorch using the Sequential API. The difference between torch.nn and torch.nn.functional is a matter of convenience and taste: torch.nn modules hold their own parameters, while torch.nn.functional exposes the same operations as plain functions. *torch.nn is more convenient for methods which have learnable parameters.*
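
As a small illustration of that difference (a sketch with made-up shapes, not part of the classifier built later), the same linear layer can be applied in both styles:

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3)

# Module style: the layer object owns its weight and bias.
layer = nn.Linear(3, 2)
out_module = layer(x)

# Functional style: the same computation, but the parameters are passed in explicitly.
out_functional = F.linear(x, layer.weight, layer.bias)

same = torch.allclose(out_module, out_functional)
n_params = sum(p.numel() for p in layer.parameters())
print(same, n_params)
```

Both calls give the same result; the module version is simply easier to hand to an optimizer because it keeps track of its parameters.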
<br/>
**This tutorial is sufficient to create your first model in PyTorch and learn most of its concepts, and the PyTorch documentation is very straightforward and descriptive.**
<br/>
I have noticed that people initially run into trouble with these parts of PyTorch:
* **Data Loading**
* **Model Evaluation**
<br/>
so I have tried to make these tasks simpler in this tutorial. It covers what you need to build your first classifier using PyTorch and optimize it on your own custom data. It is structured in such a way that you can get hold of the entire pipeline of an image classification task. Just fork the kernel and execute it sequentially and you will get a practical taste of PyTorch.

Training a classifier involves these steps:
* **Data Loading and Preprocessing**
* **Model Selection**
* **Defining a loss function**
* **Training the classifier, validation and Parameter tuning**
<br/>

We are now set to walk through all of these steps using PyTorch. I have designed this tutorial so that any beginner can follow it easily.
<br/>
Let's begin by importing the required libraries.

In [27]:
import numpy as np
import pandas as pd
import torch.nn.functional as F
import math
from torch.optim import lr_scheduler
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn import metrics
import torch
import itertools
from torchvision import models
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.autograd import Variable
from torch import nn
from torch.utils.data import Dataset, DataLoader
import os
from torch.nn import MaxPool2d
from PIL import Image
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
plt.ion()

**The labels shipped with this dataset do not match the images. Please execute this code to get the correct mapping of X and y values.**

In [28]:
data = np.load('../input/Sign-language-digits-dataset/X.npy')
target = np.load('../input/Sign-language-digits-dataset/Y.npy')
Y = np.zeros(data.shape[0])
Y[:204] = 9
Y[204:409] = 0
Y[409:615] = 7
Y[615:822] = 6
Y[822:1028] = 1
Y[1028:1236] = 8
Y[1236:1443] = 4
Y[1443:1649] = 3
Y[1649:1855] = 2
Y[1855:] = 5
X_train, X_test, y_train, y_test = train_test_split(data, Y, test_size = .02, random_state = 2) ## splitting into train and test set

# Custom Data Loading and Processing

Here we are working with NumPy arrays, which can be converted directly to torch tensors for mathematical operations. Many datasets instead come as images, which can be handled with the PIL library; an array is turned into a PIL image with
<br/>
**PIL.Image.fromarray(img_array)**
<br/>
We use PIL instead of OpenCV because it is torchvision's default image loader and is compatible with the ToTensor() method. The tutorials on the PyTorch website all rely on the built-in Dataset classes for this part, so loading a custom dataset can feel like a toilsome task. While loading the data, these points deserve close attention:
* Setting a batch size
* Shuffling the data
* Parallelizing the work using multiprocessing workers.
<br/>

The **DataLoader** class provides all of these features, and you can specify them before heading to the next task. The label or target variable can be an array of class indices or one-hot encoded, depending on the loss function. To have a look at the different loss functions, please head to [PyTorch loss functions](https://pytorch.org/docs/master/nn.html#loss-functions)

Data loading in PyTorch can be separated into two parts:
<br/>
* Data must be wrapped in a subclass of the Dataset parent class, where the methods `__getitem__` and `__len__` must be overridden. Note that the data need not be loaded into memory at this point.
* The DataLoader reads the data through the Dataset and assembles it into batches in memory.
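
The two parts above can be sketched on synthetic data. `ToyDigits` is a hypothetical dataset made up for this illustration (random "images" and labels), and for simplicity it keeps everything in memory; a real dataset could just as well read each sample from disk inside `__getitem__`.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDigits(Dataset):
    """Hypothetical dataset: random 64x64 'images' with integer labels."""
    def __init__(self, n_samples=10):
        rng = np.random.RandomState(0)
        self.images = rng.rand(n_samples, 64, 64).astype(np.float32)
        self.labels = rng.randint(0, 10, size=n_samples)

    def __getitem__(self, index):
        # Return one sample: a (1, 64, 64) tensor and its label.
        image = torch.from_numpy(self.images[index]).unsqueeze(0)
        label = int(self.labels[index])
        return image, label

    def __len__(self):
        return len(self.images)

# The DataLoader handles batching and shuffling on top of the Dataset.
loader = DataLoader(ToyDigits(), batch_size=4, shuffle=True, num_workers=0)
batch_shapes = [tuple(images.shape) for images, labels in loader]
print(batch_shapes)
```

Ten samples with a batch size of 4 give two full batches and one smaller final batch.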

### torchvision
It is used to load and prepare datasets. With it you can apply *transformations* to the input data.
<br/>
#### transforms
It is used for preprocessing images and performing operations sequentially.
<br/>
#### num_workers
It sets the number of worker processes used to load data in parallel. Normally, **num_workers = 4 * (number of gpus)** works well.


In [29]:
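class DatasetProcessing(Dataset):
    def __init__(self, data, target, transform=None):  # store the transform, images and labels
        self.transform = transform
        # reshape to (N, 64, 64, 1) so that ToPILImage receives an H x W x C array
        # caution: if X holds floats in [0, 1] (an assumption worth checking with X.min() and X.max()),
        # casting straight to uint8 truncates almost every pixel to 0; scale by 255 first in that case
        self.data = data.reshape((-1,64,64)).astype(np.uint8)[:,:,:,None]
        self.target = torch.from_numpy(target).long()  # class labels need to be of dtype torch.LongTensor
    def __getitem__(self, index):  # return the (transformed) image and its label
        image = self.data[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, self.target[index]
    def __len__(self):  # number of samples
        return len(self.data)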

In [30]:
transform = transforms.Compose(
    [transforms.ToPILImage(), transforms.ToTensor(), transforms.Normalize(mean=(0.5,), std=(0.5,))])
dset_train = DatasetProcessing(X_train, y_train, transform)
train_loader = torch.utils.data.DataLoader(dset_train, batch_size=4,
                                          shuffle=True, num_workers=4)

In [31]:
dset_test = DatasetProcessing(X_test, y_test, transform)
test_loader = torch.utils.data.DataLoader(dset_test, batch_size=4,
                                          shuffle=True, num_workers=0)

In [32]:
for num, x in enumerate(X_train[0:6]):
    plt.subplot(1,6,num+1)
    plt.axis('off')
    plt.imshow(x)
    plt.title(y_train[num])

# Training

## Define a CNN

![](https://www.researchgate.net/publication/318174473/figure/fig2/AS:614173574701099@1523441799965/The-topology-of-the-event-driven-Convnet-used-for-this-work-Where-n-is-the-size-of-the.png)










Some of the important libraries:
### [nn](https://pytorch.org/docs/master/nn.html)
This library contains functions for building conv nets, as well as loss functions.
Some loss functions (in their torch.nn.functional form) are:
* **binary_cross_entropy**: measures the binary cross entropy between the target and the output.
* **nll_loss**: measures the negative log likelihood loss.
* **cross_entropy**: combines log_softmax and nll_loss in a single function.
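
The relation between the last two losses is easy to check numerically. This is a small sketch on random logits (the shapes and targets are made up):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # 5 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1, 4])   # class indices, not one-hot

# One call ...
combined = F.cross_entropy(logits, targets)
# ... versus the two steps it combines.
two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)

matches = torch.allclose(combined, two_step)
print(combined.item(), two_step.item(), matches)
```

This is also why a network trained with cross_entropy should output raw scores and not apply softmax itself.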

### [optim](https://pytorch.org/docs/master/optim.html)
torch.optim is a package implementing various optimization algorithms. To construct an Optimizer you have to give it an iterable containing the parameters to optimize (all should be Variables; since PyTorch 0.4, plain tensors with requires_grad=True). Then, you can specify optimizer-specific options such as the learning rate, weight decay, etc. Some of the optimizers defined by this library are:
* **Adam**: Proposed in [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)
* **LBFGS**: Implements the L-BFGS algorithm.
* **RMSprop**: Proposed by G. Hinton in his [course](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
* **SGD**: Stochastic gradient descent, optionally with momentum. Nesterov momentum is based on the formula from *On the importance of initialization and momentum in deep learning*.
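
Constructing an optimizer looks roughly like this. The tiny `nn.Linear` model and the option values are placeholders chosen for illustration, not tuned settings:

```python
from torch import nn
import torch.optim as optim

model = nn.Linear(4, 2)  # placeholder model with a weight and a bias

# Pass an iterable of parameters, then the optimizer-specific options.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

# The options are stored per parameter group and can be inspected.
group = optimizer.param_groups[0]
print(group['lr'], group['momentum'], group['weight_decay'], len(group['params']))
```

Swapping in another optimizer such as `optim.Adam(model.parameters(), lr=0.01)` only changes this one line; the rest of the training code stays the same.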
<br/>


Let’s use a **classification cross-entropy loss** and the **Adam** optimizer.

In [33]:
class Net(nn.Module):    
    def __init__(self):
        super(Net, self).__init__()
          
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
          
        self.classifier = nn.Sequential(
            nn.Dropout(p = 0.5),
            nn.Linear(32 * 32 * 32, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p = 0.5),
            nn.Linear(512, 10),
        )
    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        
        return x     
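
If you wonder where the `32 * 32 * 32` input size of the first Linear layer comes from, a quick check is to push a dummy batch through the convolutional part and look at the shape. The sketch below rebuilds only the conv and pooling layers (batch norm is left out since it does not change shapes):

```python
import torch
from torch import nn

features = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),   # 64x64 stays 64x64 with padding=1
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                  # halves height and width: 64 -> 32
)

dummy = torch.zeros(2, 1, 64, 64)  # batch of 2 grayscale 64x64 images
with torch.no_grad():
    out = features(dummy)

flat_size = out.view(out.size(0), -1).size(1)
print(tuple(out.shape), flat_size)
```

So each image leaves the feature extractor as 32 channels of 32x32, which flattens to 32 * 32 * 32 values.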

# How to use optimizers?
To construct an Optimizer you have to give it an iterable containing the parameters to optimize (all should be Variables; since PyTorch 0.4, plain tensors with requires_grad=True). Then, you can specify optimizer-specific options such as the learning rate, weight decay, etc.<br/>
*If you want to move the model and variables to cuda, use **model_object.cuda()** and **variable_name.cuda()***
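
An alternative to calling `.cuda()` everywhere is to pick a device once and move things with `.to(device)`, which also runs on machines without a GPU. A minimal sketch with a placeholder model:

```python
import torch
from torch import nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2).to(device)   # moves the parameters
batch = torch.randn(3, 4).to(device) # moves the data

out = model(batch)
print(out.device.type, tuple(out.shape))
```

The model and its inputs have to live on the same device, otherwise the forward pass raises an error.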

**Follow these steps to build your first model in pytorch**

#### Clearing the Gradients - optimizer.zero_grad()
**Clear out the gradients accumulated for the parameters of the network before calling loss.backward() and optimizer.step()**
#### Compute the loss - criterion( predicted_target, target)
**Compute the loss between the predicted value and the target value using the loss function previously defined**
#### Backpropagation - loss.backward()
**Backpropagate the loss through all the layers of the network to compute the gradients**
#### Taking an optimization step - optimizer.step()
**Update the parameters of the network**
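
Put together, the four steps look like this on a toy problem. The linear model, the random inputs and the learning rate are all made up for illustration; the real training loop follows below.

```python
import torch
from torch import nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(16, 8)
targets = torch.randint(0, 3, (16,))

losses = []
for step in range(20):
    optimizer.zero_grad()                     # 1. clear old gradients
    loss = criterion(model(inputs), targets)  # 2. compute the loss
    loss.backward()                           # 3. backpropagate
    optimizer.step()                          # 4. update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Repeating the same four calls on the same batch should drive the loss down, which is a handy sanity check before training on real data.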


In [34]:
model = Net()

optimizer = optim.Adam(model.parameters(), lr=0.01)

criterion = nn.CrossEntropyLoss()

exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
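
To see what `StepLR(optimizer, step_size=7, gamma=0.1)` does to the learning rate, here is a small sketch that only records the schedule. The model is a placeholder, and `optimizer.step()` stands in for a whole epoch of training:

```python
from torch import nn
import torch.optim as optim
from torch.optim import lr_scheduler

model = nn.Linear(2, 2)
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

lrs = []
for epoch in range(15):
    lrs.append(optimizer.param_groups[0]['lr'])  # learning rate used in this epoch
    optimizer.step()    # stands in for one epoch of parameter updates
    scheduler.step()    # advance the schedule once per epoch

print(lrs)
```

The learning rate stays at 0.01 for the first 7 epochs and is then multiplied by 0.1 every 7 epochs.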

In [35]:
def train(epoch):
    model.train()

    for batch_idx, (data, target) in enumerate(train_loader):
        # since PyTorch 0.4 tensors track gradients themselves, so no Variable wrapper is needed
        if torch.cuda.is_available():
            data = data.cuda()
            target = target.cuda()
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        
        loss.backward()
        optimizer.step()
        
        if (batch_idx + 1)% 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))

    # step the scheduler once per epoch, after the optimizer updates (the expected order since PyTorch 1.1)
    exp_lr_scheduler.step()

## Model Evaluation
Now it's time to evaluate our model. <br/>
**Note that I have used the functional API (F.cross_entropy(args)) to give an idea of how it works.**

In [36]:
def evaluate(data_loader):
    model.eval()
    loss = 0
    correct = 0
    
    with torch.no_grad():  # replaces the removed volatile=True flag: no graph is built during evaluation
        for data, target in data_loader:
            if torch.cuda.is_available():
                data = data.cuda()
                target = target.cuda()
            
            output = model(data)
            loss += F.cross_entropy(output, target, reduction='sum').item()  # summed over the batch
            pred = output.argmax(dim=1, keepdim=True)  # index of the highest score
            correct += pred.eq(target.view_as(pred)).sum().item()
        
    loss /= len(data_loader.dataset)
        
    print('\nAverage loss: {:.4f}, Accuracy: {}/{} ({:.3f}%)\n'.format(
        loss, correct, len(data_loader.dataset),
        100. * correct / len(data_loader.dataset)))
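
The bookkeeping inside an evaluation function like this is easier to follow on a tiny hand-made example. The logits and targets below are invented so the result can be checked by eye:

```python
import torch
import torch.nn.functional as F

# 4 samples, 3 classes; each row holds the raw scores for one sample.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.1, 3.0],
                       [0.3, 1.5, 0.2],
                       [1.0, 0.9, 0.8]])
targets = torch.tensor([0, 2, 1, 2])

with torch.no_grad():
    total_loss = F.cross_entropy(logits, targets, reduction='sum').item()
    pred = logits.argmax(dim=1)                # predicted class = highest score
    correct = (pred == targets).sum().item()   # number of matches

accuracy = 100.0 * correct / len(targets)
print(pred.tolist(), correct, accuracy)
```

The last sample is predicted as class 0 while its target is 2, so three out of four predictions are correct.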

### You can run this model for more epochs

In [37]:
n_epochs = 2

for epoch in range(n_epochs):
    train(epoch)
    evaluate(train_loader)

**One powerful technique for improving the performance of a deep learning model is to add more data. So let's use data augmentation and see if we get some performance improvement.**

## Data Augmentation

In [38]:
train_transform= transforms.Compose([
            transforms.ToPILImage(),
            transforms.RandomHorizontalFlip(), # random horizontal flip
            transforms.Resize(256), # scaling the input (Resize replaces the deprecated Scale)
            transforms.RandomCrop(64, padding=4), # random 64x64 crop after padding by 4 pixels
            transforms.ToTensor(),  # converting the input to a tensor
            transforms.Normalize([0.5], [0.5])
])
dset_train = DatasetProcessing(X_train, y_train, train_transform)
train_loader = torch.utils.data.DataLoader(dset_train, batch_size=4,
                                          shuffle=True, num_workers=4)

In [39]:
n_epochs = 5

for epoch in range(n_epochs):
    train(epoch)


**Please give your feedback on this script and Upvote if you liked it.**