<a href="https://colab.research.google.com/github/olahfemi/HackExpo/blob/master/introtocnns_torch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Convolutional Neural Networks

Much of the success achieved so far in the field of Deep Learning can be attributed to a special class of Artificial Neural Networks called "Convolutional Neural Networks" (CNNs). They were pioneered by Yann LeCun et al., whose LeNet-5 architecture was published in 1998. At inception, they were used primarily for image classification. Since then, their applications have expanded, and they have become a core building block in virtually every field of Deep Learning: Image Recognition, Detection and Segmentation, Speech Synthesis and Recognition, Face Detection and Recognition, Sequence to Sequence Learning, Generative Adversarial Networks, Reinforcement Learning, Variational AutoEncoders, Flow Models and much more.

CNNs are generally well suited to tasks involving high-dimensional unstructured data such as images. However, they are not well suited to structured data such as tables.

While CNNs are applicable to almost any field of Deep Learning, we shall explain both the theory and its practical application from the viewpoint of Image Classification. The lessons learned here transfer easily to most other fields of Artificial Intelligence.


To fully understand how CNNs work, a basic understanding of the structure of images is very helpful.

# Structure of Images
![Arrangement of Pixels in an Image](https://github.com/johnolafenwa/CNNLecture/raw/master/images/img_arrangement.png)

This is a cat. The image looks beautiful to the eye, but in computer memory it is purely a 3-dimensional array of numbers. Images are made up of pixels, and in a standard 8-bit image each pixel is a number ranging from 0 to 255. A standard RGB image is made up of three layers of pixels, as shown above: the Red layer, the Green layer and the Blue layer. Here and in the rest of this document, we shall use the word "channels" instead of layers.
In each channel, the pixels are arranged in a grid, and the arrangement of these pixel values determines how the image looks. One important thing to note is that in each channel, the value of each pixel represents its intensity. For example, in the red channel, a value of 255 represents full-intensity red, while a lower value indicates a less intense red.

Consider a 4 x 4 image made of 3 channels. Each channel will be a 4 x 4 grid of pixels like the one below:

![4 x4 grid of pixels](https://github.com/johnolafenwa/CNNLecture/raw/master/images/pixelgrid.png)

As you can see above, lower values appear darker (tending towards black) while higher values appear lighter. Combining this structure across 3 different channels forms the pictures we see daily.

In summary, an image is made of channels, and each channel is a 2-dimensional grid of pixel values. Grayscale (black and white) pictures have only a single channel, while RGB images have 3 channels.
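A minimal NumPy sketch of this structure, using a hypothetical 4 x 4 RGB image filled with random 8-bit values. It is stored channels-last, as (height, width, channels), which is how NumPy arrays built from PIL images are usually laid out. PyTorch instead expects channels first, (channels, height, width), which is why the model summary later in this notebook uses (3, 224, 224).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A hypothetical 4 x 4 RGB image, channels last: (height, width, channels).
# Each pixel value is an 8-bit unsigned integer between 0 and 255.
rgb_image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Each channel is a 2D grid of pixel intensities
red = rgb_image[:, :, 0]
green = rgb_image[:, :, 1]
blue = rgb_image[:, :, 2]

# A grayscale image has a single channel: just one 2D grid
gray_image = np.zeros((4, 4), dtype=np.uint8)

# PyTorch expects channels first: (channels, height, width)
chw_image = np.transpose(rgb_image, (2, 0, 1))

print(rgb_image.shape)  # (4, 4, 3)
print(red.shape)        # (4, 4)
print(chw_image.shape)  # (3, 4, 4)
```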

Convolutional Neural Networks take great advantage of this structure to detect patterns in an image.


# HOW CONVOLUTIONAL NEURAL NETWORKS WORK

Humans recognize objects through learned patterns. You know a car by patterns such as the wheels, the windscreen, the rear-view mirror, etc.

![Car](https://github.com/johnolafenwa/CNNLecture/raw/master/images/car.jpg)

Humans learn these patterns implicitly from experience, and once we detect these patterns in new scenes, we use them to make decisions.

The traditional approach to building a computer vision system that can tell cars apart from animals is to hardcode these features directly into the system, explicitly defining the different parts of a car and an animal and their pixel arrangements. However, the number of possible patterns is very large, because these objects vary greatly in appearance. Convolutional Neural Networks solve this problem by learning the patterns that distinguish different classes of items directly from data.
By presenting 1000 pictures of cars to a convolutional neural network, it can implicitly learn the intrinsic patterns that make a car look like a car. In this light, convolutional neural networks are "Pattern Detectors". Once they have learned the necessary patterns, they can use these patterns to classify new images correctly.


# CNN KERNELS AND CHANNELS
Kernels are the core learnable parameters of a CNN; they represent the patterns the network learns. A kernel is a small, fixed-size grid of weights; common sizes are 1 x 1, 2 x 2, 3 x 3 and 5 x 5.
To illustrate what kernels are and how they work, let's consider the classic horizontal line detector kernel below.
Here are two images:

![image 1](https://github.com/johnolafenwa/CNNLecture/raw/master/images/img1.png)   
![image 2](https://github.com/johnolafenwa/CNNLecture/raw/master/images/img3.png)

The first image clearly has a horizontal red line; the second image, on the other hand, does not contain a horizontal line.

This is clear to the human eye. Below is a kernel that can detect it automatically.
![kernel](https://github.com/johnolafenwa/CNNLecture/raw/master/images/line_detector.png)

We shall compute the dot product of the kernel with each of the two images: multiply each pixel by the kernel weight at the same position, then sum the results.

## IMAGE 1 ACTIVATION
activation = (200 x 1.5) + (200 x 1.5) + (0 x -1.5) + (0 x -1.5) = 600

The large activation value of 600 indicates that a horizontal line was detected in the image.

## IMAGE 2 ACTIVATION
activation =  (200 x 1.5) + (0 x 1.5) + (200 x -1.5) + (0 x -1.5) = 0

Here the activation value is zero, because no horizontal line was detected: the positive and negative terms cancel out.
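The two activations can be reproduced in a few lines of NumPy. The pixel values below are reconstructed from the arithmetic above (a single channel, 2 x 2 pixels per image), so treat them as an assumption about what the figures contain.

```python
import numpy as np

# The 2 x 2 horizontal line detector: positive weights on the top row,
# negative weights on the bottom row.
kernel = np.array([[1.5, 1.5],
                   [-1.5, -1.5]])

# Pixel values reconstructed from the worked arithmetic (assumed)
image_1 = np.array([[200, 200],
                    [0, 0]])    # bright top row: a horizontal line
image_2 = np.array([[200, 0],
                    [200, 0]])  # bright left column: no horizontal line


def activation(image, kernel):
    """Dot product of two grids: element-wise multiply, then sum."""
    return float(np.sum(image * kernel))


print(activation(image_1, kernel))  # 600.0
print(activation(image_2, kernel))  # 0.0
```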

In a convolution operation, multiple kernels are applied to the same image, which lets the network rely on more than one pattern. In the example above, we hardcoded the horizontal line detector; CNNs, however, learn their own kernel values from data during training. The number of kernels applied at each convolution layer is the number of output channels of that layer.
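In PyTorch, a layer's kernels are stored in its weight tensor. The sketch below uses plain torch.nn.Conv2d (not TorchFusion) with sizes chosen arbitrarily for illustration. With a multi-channel input, each kernel spans all input channels, and each kernel produces one output channel.

```python
import torch
import torch.nn as nn

# 16 kernels, each 3 x 3 and spanning all 3 input (RGB) channels
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# Weight layout: (out_channels, in_channels, kernel_height, kernel_width)
print(tuple(conv.weight.shape))   # (16, 3, 3, 3)

# Kernel values are learnable parameters, updated during training
print(conv.weight.requires_grad)  # True

# One 8 x 8 RGB image in, 16 activation maps out (one per kernel)
x = torch.randn(1, 3, 8, 8)
print(tuple(conv(x).shape))       # (1, 16, 6, 6)
```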


The kernel is slid across the height and width of the image, and a dot product is computed at each position, producing a grid of activation values as shown below:

![convolution operation](http://deeplearning.net/software/theano/_images/no_padding_no_strides.gif)

Here a 3 x 3 kernel is applied with a stride of 1 (moving one pixel at a time) to a 4 x 4 image, producing a 2 x 2 activation map. This activation map is then used as the input to the next layer.
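The sliding operation in the animation can be written as a short NumPy loop. This is an illustrative sketch with a hypothetical image and kernel, not how frameworks implement it efficiently. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np


def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding), recording the dot product at each position."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The image patch currently under the kernel
            patch = image[i * stride:i * stride + k_h, j * stride:j * stride + k_w]
            output[i, j] = np.sum(patch * kernel)
    return output


image = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical 4 x 4 image
kernel = np.ones((3, 3))                           # hypothetical 3 x 3 kernel

activation_map = conv2d_valid(image, kernel)
print(activation_map)
# [[45. 54.]
#  [81. 90.]]
```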

## PADDING
Often, we want the activation map to keep the size of the input image, unlike above, where the 4 x 4 input shrank to a 2 x 2 output. To achieve this, we can pad the border of the image with zeros. Below is an example:

![padding](http://deeplearning.net/software/theano/_images/same_padding_no_strides.gif)

As you can see, the output remains 5 x 5, the same size as the input image.
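A quick way to check the effect of padding is to compare output shapes using plain PyTorch. The 5 x 5 input mirrors the animation; the layer weights are random, since only the shapes matter here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5)  # a batch holding one 5 x 5 single-channel image

# Explicit zero padding: one pixel of zeros on the left, right, top and bottom
padded = F.pad(x, (1, 1, 1, 1))
print(tuple(padded.shape))        # (1, 1, 7, 7)

# Without padding, a 3 x 3 kernel shrinks the 5 x 5 input to 3 x 3
no_pad = nn.Conv2d(1, 1, kernel_size=3, padding=0)
print(tuple(no_pad(x).shape))     # (1, 1, 3, 3)

# With padding=1, the output keeps the 5 x 5 input size
same_pad = nn.Conv2d(1, 1, kernel_size=3, padding=1)
print(tuple(same_pad(x).shape))   # (1, 1, 5, 5)
```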

## STRIDE
Sometimes we want to deliberately reduce the size of the activation map. This can be achieved by using a stride greater than 1, so that the kernel moves more than one pixel at a time. Example below:

![stride](http://deeplearning.net/software/theano/_images/padding_strides_odd.gif)

Here the output is exactly half the input due to a stride of 2.
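The output sizes in these animations follow the standard formula output = floor((input + 2 x padding - kernel) / stride) + 1. A small helper makes it easy to check; the function name is ours, chosen for illustration. The 224 x 224 case matches the first stride-2 layer of the model defined later in this notebook.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension (height or width)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(4, 3))                         # 2: the 4 x 4 -> 2 x 2 example
print(conv_output_size(5, 3, padding=1))              # 5: zero padding keeps the size
print(conv_output_size(224, 3, stride=2))             # 111: stride 2 roughly halves it
print(conv_output_size(224, 3, stride=2, padding=1))  # 112: exactly half with padding
```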

# CODING TIME
The theory of convolutions is extensive; in this notebook, I have covered only the minimum required.
Now, we shall apply convolutions to train Image Classification models.


While the logic of convolutions can be complex, a number of Deep Learning frameworks make them much easier to use.

Here we shall use TorchFusion, a modern Deep Learning framework built on PyTorch. We install both frameworks below.

```
!pip3 install torch --upgrade
!pip3 install torchfusion --upgrade
```

Using TorchFusion and PyTorch, we shall design and train a Convolutional Neural Network that can tell a person's profession from their clothing.

The dataset we shall use to train the model is "IdenProf", a dataset of 11,000 pictures of professionals spread across ten categories: Chef, Doctor, Engineer, Farmer, Firefighter, Judge, Mechanic, Pilot, Police and Waiter.

Download the dataset from https://github.com/OlafenwaMoses/IdenProf/releases/download/v1.0/idenprof-jpg.zip and extract it.
The dataset is composed of two main folders, train and test. Each folder contains one sub-folder for each of the ten classes.

The cell below downloads and extracts the dataset; the cell after it imports the necessary packages.

```
!wget https://github.com/OlafenwaMoses/IdenProf/releases/download/v1.0/idenprof-jpg.zip
!unzip idenprof-jpg.zip
```

```python
import torch
from torchfusion.layers import *
import torch.nn as nn
from torch.optim import Adam
from torchfusion.learners import StandardLearner
from torchfusion.metrics import *
from torchfusion.datasets import *
import torchvision.transforms as transforms

# Paths to the extracted IdenProf dataset
TRAIN_FOLDER = "./idenprof/train"
TEST_FOLDER = "./idenprof/test"
BATCH_SIZE = 32
```

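Below we define the CNN model as a PyTorch nn.Module, built from TorchFusion layers (Conv2d, BatchNorm2d, GlobalAvgPool2d and Linear) and standard PyTorch modules.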

```python
class ConvLayer(nn.Module):
    """3 x 3 convolution, followed by batch normalization and a ReLU activation."""

    def __init__(self, in_filters, out_filters, stride=1):
        super(ConvLayer, self).__init__()

        self.net = nn.Sequential(
            Conv2d(in_filters, out_filters, kernel_size=3, stride=stride),
            BatchNorm2d(out_filters),
            nn.ReLU()
        )

    def forward(self, x):
        return self.net(x)


class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()

        self.net = nn.Sequential(
            # Block 1: the stride-2 layer reduces the spatial size
            ConvLayer(3, 16, stride=2),
            ConvLayer(16, 32),
            ConvLayer(32, 32),

            # Block 2
            ConvLayer(32, 32, stride=2),
            ConvLayer(32, 64),
            ConvLayer(64, 64),

            # Block 3
            ConvLayer(64, 64, stride=2),
            ConvLayer(64, 128),
            ConvLayer(128, 128),
            nn.Dropout(0.5),

            # Average each of the 128 activation maps down to a single value
            GlobalAvgPool2d(),

            # One output score per class
            Linear(128, num_classes)
        )

    def forward(self, x):
        return self.net(x)


model = SimpleNet()

# Move the model to the GPU when one is available
if torch.cuda.is_available():
    model = model.cuda()

learner = StandardLearner(model)

# Summary of the layers for a 3-channel 224 x 224 input
print(learner.summary((3, 224, 224)))
```
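Because the convolutions use no padding, every stride-1 layer trims 2 pixels from each spatial dimension, and every stride-2 layer roughly halves it. The sketch below traces a dummy 224 x 224 image through the same stack of layers, rebuilt with plain torch.nn modules in place of the TorchFusion layers (we assume they produce the same shapes), with nn.AdaptiveAvgPool2d standing in for GlobalAvgPool2d.

```python
import torch
import torch.nn as nn


def conv_layer(in_filters, out_filters, stride=1):
    # 3 x 3 convolution (no padding) -> batch norm -> ReLU, mirroring ConvLayer
    return nn.Sequential(
        nn.Conv2d(in_filters, out_filters, kernel_size=3, stride=stride),
        nn.BatchNorm2d(out_filters),
        nn.ReLU(),
    )


layers = [
    conv_layer(3, 16, stride=2), conv_layer(16, 32), conv_layer(32, 32),
    conv_layer(32, 32, stride=2), conv_layer(32, 64), conv_layer(64, 64),
    conv_layer(64, 64, stride=2), conv_layer(64, 128), conv_layer(128, 128),
]

x = torch.randn(1, 3, 224, 224)  # one dummy RGB image
with torch.no_grad():
    for layer in layers:
        x = layer(x)
        print(tuple(x.shape))  # spatial size: 111, 109, 107, 53, 51, 49, 24, 22, 20

    # Global average pooling: each 20 x 20 activation map becomes one value
    pooled = nn.AdaptiveAvgPool2d(1)(x).flatten(1)
    logits = nn.Linear(128, 10)(pooled)

print(tuple(pooled.shape))  # (1, 128)
print(tuple(logits.shape))  # (1, 10)
```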
  
  

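We need to load the IdenProf dataset. The training images are augmented with random crops and horizontal flips, while the test images are only center-cropped; both are converted to tensors and normalized.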

```python
# Training-time preprocessing, with data augmentation
train_transforms = transforms.Compose([
    transforms.RandomCrop(224, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Test-time preprocessing, without augmentation
test_transforms = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

train_loader = imagefolder_loader(transform=train_transforms, batch_size=BATCH_SIZE, shuffle=True, root=TRAIN_FOLDER)
test_loader = imagefolder_loader(transform=test_transforms, shuffle=False, batch_size=BATCH_SIZE, root=TEST_FOLDER)
```

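Below, the model is trained.

# Note
Due to an issue with Google Colab and PIL, go to Runtime and click Restart runtime before running the training; otherwise you will run into an error. Restarting clears all variables, so re-run the import, model and dataset cells above before training.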

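```python
optimizer = Adam(model.parameters(), lr=0.001)

# Cross-entropy loss for multi-class classification
loss_fn = nn.CrossEntropyLoss()

# Track accuracy on both the training and the test sets
train_metrics = [Accuracy()]
test_metrics = [Accuracy()]


if __name__ == "__main__":
    # Saved models, including the best-performing one, go under model_dir
    learner.train(train_loader, train_metrics=train_metrics, optimizer=optimizer, loss_fn=loss_fn,
                  model_dir="./my-torch-models", test_loader=test_loader, test_metrics=test_metrics,
                  num_epochs=200, batch_log=False)
```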
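StandardLearner.train hides the training loop. For intuition, here is roughly what a single optimization step involves in plain PyTorch. This is a generic sketch with a hypothetical toy model and random data, not TorchFusion's actual implementation; the toy names avoid overwriting the notebook's model and optimizer.

```python
import torch
import torch.nn as nn
from torch.optim import Adam

torch.manual_seed(0)

# Hypothetical toy model and random batch, just to show the mechanics
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
toy_optimizer = Adam(toy_model.parameters(), lr=0.001)
toy_loss_fn = nn.CrossEntropyLoss()

images = torch.randn(32, 3, 8, 8)     # a batch of 32 small RGB images
labels = torch.randint(0, 10, (32,))  # one class index (0-9) per image

toy_model.train()                     # enable training behaviour (e.g. Dropout)
toy_optimizer.zero_grad()             # clear gradients from the previous step
logits = toy_model(images)            # forward pass: raw class scores
loss = toy_loss_fn(logits, labels)    # how wrong the scores are
loss.backward()                       # compute gradients of the loss
toy_optimizer.step()                  # update the weights

# Accuracy: fraction of images whose highest score matches the label
accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
print(f"loss={loss.item():.3f}, accuracy={accuracy:.2f}")
```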


Using the setup above, the model achieves an accuracy of about 80% after 63 epochs. The best model is saved in the model directory, ./my-torch-models. Using this trained model, we can easily predict the class of new images.

Below we download some images to use for testing.

```
!wget https://github.com/johnolafenwa/CNNLecture/raw/master/images/testimages.zip
!unzip testimages.zip
```

Below is the full script to re-define the network, load the trained model and predict the classes of new images.

```python
import torch
from torchfusion.layers import *
import torch.nn as nn
from torchfusion.learners import StandardLearner
import torchvision.transforms as transforms
from PIL import Image
import os


class ConvLayer(nn.Module):
    """3 x 3 convolution, followed by batch normalization and a ReLU activation."""

    def __init__(self, in_filters, out_filters, stride=1):
        super(ConvLayer, self).__init__()

        self.net = nn.Sequential(
            Conv2d(in_filters, out_filters, kernel_size=3, stride=stride),
            BatchNorm2d(out_filters),
            nn.ReLU()
        )

    def forward(self, x):
        return self.net(x)


class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()

        self.net = nn.Sequential(
            ConvLayer(3, 16, stride=2),
            ConvLayer(16, 32),
            ConvLayer(32, 32),

            ConvLayer(32, 32, stride=2),
            ConvLayer(32, 64),
            ConvLayer(64, 64),

            ConvLayer(64, 64, stride=2),
            ConvLayer(64, 128),
            ConvLayer(128, 128),
            nn.Dropout(0.5),

            GlobalAvgPool2d(),

            Linear(128, num_classes)
        )

    def forward(self, x):
        return self.net(x)


model = SimpleNet()

if torch.cuda.is_available():
    model = model.cuda()

# Same preprocessing as the test set during training
image_transforms = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

learner = StandardLearner(model)
# Replace the file name with the best model saved by your own training run
learner.load_model("my-torch-models/best_models/model_43.pth")

# Switch BatchNorm and Dropout to inference behaviour
model.eval()

IMAGES_FOLDER = "./testimages"

IMAGES = os.listdir(IMAGES_FOLDER)
PREDICTIONS = []
class_map = {0: "Chef", 1: "Doctor", 2: "Engineer", 3: "Farmer", 4: "Firefighter",
             5: "Judge", 6: "Mechanic", 7: "Pilot", 8: "Police", 9: "Waiter"}

for image_file in IMAGES:
    image_path = os.path.join(IMAGES_FOLDER, image_file)
    img = Image.open(image_path).convert("RGB")

    # Perform preprocessing
    img = image_transforms(img)

    # Add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
    img = img.unsqueeze(0)

    # Index of the highest-scoring class
    prediction = learner.predict(img).argmax().item()

    PREDICTIONS.append(prediction)

# Print predictions
for image_file, prediction in zip(IMAGES, PREDICTIONS):
    class_name = class_map[prediction]
    print("File: {} , Class Prediction: {} Class Name: {}".format(image_file, prediction, class_name))
```
    
    
    


  

# ABOUT THE WRITER
John Ishola Olafenwa is the CTO of [DeepQuestAI](https://deepquestai.com). He is a Deep Learning researcher, a machine learning engineer and the creator of [TorchFusion](https://github.com/johnolafenwa/TorchFusion)

You can reach him on Twitter [@johnolafenwa](https://twitter.com/johnolafenwa)

Email:  johnolafenwa@gmail.com

![John Olafenwa](https://deepquestai.com/about/john.png)
