<a name='main'></a>
# **AI IN PRACTICE : HOW TO TRAIN AN IMAGE CLASSIFIER**
### **Author: Sheetal Reddy**
### **Contact : sheetal.reddy@ai.se**

---
**Introduction** 

The "AI in Practice" training gives you basic knowledge of how to fine-tune a pre-trained model: the prerequisites, the techniques that are used, and how to keep experimenting with fine-tuning afterwards.

In this training we use image classification, an open dataset, a pre-trained model and Colab* to train your model.

A pre-trained model lets you fine-tune an existing model, already trained on a large amount of data, so that it better fits your purpose, which also saves you time. We will go through more of the advantages later in the training.

There are many pre-trained models available for different purposes. You can find some of them here: https://pytorch.org/docs/stable/torchvision/models.html

**The objective**

The objective of this training is to give you enough knowledge to feel confident when entering an AI project. 
By understanding the steps required to train a model, you will have an advantage when working in AI-related projects.
The training gives you both theoretical knowledge and hands-on practice in how to train a model.

**Learning objectives**

After the training you will be able to: 
* Describe the necessary steps to train a model
* Use a Jupyter Notebook - Google Colab
* Be able to train a model 
  - Prepare datasets
  - Finetune pre-trained models
  - Visualize and quantify results

**Pre-requisites**

To be able to get the most out of this training we expect you to be aware of: 

*   The subject of AI 
*   The importance of data 

**Training instructions**

The training is primarily performed individually, but you will be placed in a group.

There will be some group questions and exercises, but you are expected to perform the tasks yourself.

There is a Common Terminology section at the end of your Colab document. Concepts and terms explained in the Common Terminology section are marked with an (*).

There are also links throughout the document if you want to learn more about a given section.

Let us or your group members know if you have any questions, **but first google it!** "Googling" is one of the most common ways data scientists learn about new techniques and ways of working.

**Duration**
*  Expected time to finish the training is in total 3 hours. 

**The challenge** 

*  The challenge in this training is to fine-tune a pre-trained model, capable of **image classification** (explained below), to our use case and dataset. Later in the training we will go through more of the benefits of working with a pre-trained model.
*   You will improve/train the model using a dataset of images showing different scenes.
*   The outcome of your work will be a model that can classify "nature scenes" with a higher accuracy.

**Image classification**


So why did we choose image classification for this training? 
*  Image classification is a technique used to classify, or predict the class of, a specific object in an image. Image classification is one of the most important applications of computer vision. The main goal of this technique is to accurately identify the features in an image. Its applications range from classifying objects in self-driving cars to identifying blood cells in the healthcare industry, and from identifying defective items in the manufacturing industry to building a system that can tell whether people are wearing masks or not.

*  Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do. To learn more: 
https://en.wikipedia.org/wiki/Computer_vision#:~:text=Computer%20vision%20is%20an%20interdisciplinary,human%20visual%20system%20can%20do.








# **Let's start the training!**

**How to train a pre-trained model**

To train a model you usually need to plan according to the steps below. The first three steps set the foundation for what you will be able to train your model on and what results you can expect.

We will use this structure and go through the steps in the training one by one.

1.  [Define adequately our problem (objective, desired outputs…).](#main) 
2. [Setup the computing environment](#computing_env)
3. [Gather data](#computing_env) 
4. [Prepare the data](#data_preparation)
5. [Train the model and choose a measure of success.](#training) - In this training the measure of success is to have a model with a low error rate.
6. [An overview of how a model learns](#results).


## **1. Define the problem**

A problem well defined is a problem half-solved. 

Understanding the problem and developing the requirements isn't something you typically get right on the first attempt; this is often an iterative process where we initially define a set of rough requirements and refine the detail as we gain more information. 

By asking and answering the following five questions, you are well on your way to defining a problem.

1. What is the nature of the problem that requires solving?
2. Why does the problem require a solution?
3. How should solutions to the problem be approached?
4. What aspect of the problem will a deep learning model solve?
5. How is the solution to the problem intended to be interacted with?

We already have a defined problem, or in this case a challenge: **To fine-tune a pre-trained model with the aim of classifying "scenes" with a high accuracy.**

We will therefore go ahead with setting up the computing environment.

<a name='computing_env'></a>
## **2. Setting up the computing environment**

**Change the runtime setting of your colab notebook to GPU*:**
Graphics Processing Units (GPUs) provide computing power that can significantly accelerate the training process for many deep learning models. Training models for tasks like image classification, video analysis, and natural language processing involves compute-intensive matrix multiplication and other operations that can take advantage of a GPU's massively parallel architecture.

Training a deep learning model that involves intensive compute tasks on extremely large datasets can take days to run on a single processor. However, if you design your program to offload those tasks to one or more GPUs,  you can reduce training time to hours instead of days.

**How to change your runtime setting to GPU* in your environment**

The first thing to do on this Colab page is to go to the menu bar and choose "Runtime > Change runtime type" (in the Swedish interface: "Körning > Ändra körningstyp"), then select "GPU". This sets up the Google Colab environment with a free GPU that will be used to train your models. If you have CPU selected it will still work, only much slower.
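
If you want to confirm that the GPU runtime is active, a small check like the one below can help. It is only a sketch: the helper name describe_device is made up for this illustration, and it assumes PyTorch is the library doing the GPU work (which is the case for fastai).

```python
import importlib.util

def describe_device():
    """Return 'cuda', 'cpu', or a note that PyTorch is missing."""
    # PyTorch is normally preinstalled in Colab; elsewhere it may not be
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    # torch.cuda.is_available() is True only when a GPU runtime is active
    return "cuda" if torch.cuda.is_available() else "cpu"

print(describe_device())
```

If this prints cpu in Colab, the runtime type has most likely not been changed yet.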

## **3. Gather the Dataset**

Gathering and preparing data requires great care. It usually involves taking the steps below into consideration.

1. Determine what information you want or need to collect to solve the problem
2. Set a timeframe for data collection
3. Determine your data collection method
4. Collect the data
5. Analyze the data and implement your findings

The correct gathering of data is completely dependent on the problem you would like or need to solve. 

**Domain of the problem**

Depending on the domain of your problem, you may either use standard datasets collected by others or start collecting your own data. If you intend to use neural networks, be aware that your dataset should be large, or else those techniques may not be very useful.
What is the domain of your problem? Is it related to Computer Vision, Natural Language Processing, sensor data, or something else?

In our case it is related to Computer Vision, so we need to gather a large set of images. There are various ways to gather image data, and you need to specify which images are relevant for solving the problem.

It is important to plan ahead for how much data you may acquire. You cannot just save the data in directories on a hard disk and assume you are ready to go. A lot of effort goes into data storage, organization, annotation and pre-processing.

**Data Privacy** 

Data privacy is important whenever individuals' personal information is stored. Some data can be stored in simple text files, but for other data you may want to set up a database (or a lightweight version of one) for faster access. If the data is too big to fit in memory, big data techniques may be needed (e.g. the Hadoop framework).

For this training we have chosen not to include any personal data, and we have also chosen a fairly small dataset so that it can be stored on a laptop. You will learn more about the data as we go along.


**Instructions to add the dataset to your drive**

1. Download the dataset from the dropbox folder by clicking here
https://www.dropbox.com/s/gf6d2t1zbogjjgg/AI_IN_PRACTICE.zip?dl=1
2. Upload the **AI_IN_PRACTICE.zip** file to your Google Drive.
3. Make sure you have a file called **AI_IN_PRACTICE.zip** in **My Drive** (in Swedish: **Min enhet**) in Google Drive.

You will learn about the data traits later in the training. 

Now you are all set to start running the code cells one by one! The cells are the grey "boxes" that you will find throughout the Colab document. The fast and cool way to run a cell is to press Shift+Enter or Ctrl+Enter.





In [None]:
#The code in this cell connects your google drive space to the jupyter notebook and sets up fastai in your colab environment.
#This will enable the code in your jupyter notebook to access the dataset in your google drive. 

#Install fastbook(contains fastai setup) in the colab environment. 
!pip install -Uqq torchtext==0.8.1
!pip install -Uqq fastbook

#Importing fastai into the jupyter notebook
import fastbook

#setup fastai and mounts your google drive space in /content/gdrive 
fastbook.setup_book()

print('Setup complete')

Now your Google Drive is mounted at /content/gdrive/MyDrive. It is only accessible to you, through your Jupyter notebook.
Click on the link above to make sure your drive is mounted in the right location.

If you experience any error, let the organizer know.


Now you should run the next cell to unzip/extract the dataset. 

In [None]:
#When pressing the run button the code in this cell will unzip the AI_IN_PRACTICE.zip dataset and create a scenes folder in your google drive in MyDrive.

#The code below Unzips the AI_IN_PRACTICE.zip file
!unzip -q '/content/gdrive/MyDrive/AI_IN_PRACTICE.zip' -d '/content/gdrive/MyDrive/'
print('The unzip is complete now and you can move to the next cell !')
#This might take a while - Do not rerun the cell in between
#When the code is executed correctly you will see this message "The unzip is complete now and you can move to the next cell !"
#If you still do a rerun you will get the following message: "replace /content/gdrive/MyDrive/AI_IN_PRACTICE/scenes/train/sea/1.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename:" press "A" and press Enter

Now we have the unzipped dataset in the location /content/gdrive/MyDrive/AI_IN_PRACTICE/scenes

Click on the link above to make sure you have a scenes folder in your MyDrive. You should be able to see the different folders in the scenes dataset, such as models, train, train_medium and valid.

If you experience any error when you click the link, it means that the dataset is not at the right location.
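
The same check can also be done from code. The sketch below is one possible way to do it; the helper name missing_subfolders is hypothetical, and it only looks for the train and valid folders used later in this notebook.

```python
from pathlib import Path

def missing_subfolders(root, expected=("train", "valid")):
    """Return the expected subfolder names that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]

scenes = "/content/gdrive/MyDrive/AI_IN_PRACTICE/scenes"
# An empty list means both folders were found
print(missing_subfolders(scenes))
```

If the list is not empty, the zip file was probably extracted to a different location than expected.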

**Import the necessary packages**

In Python, which fastai* is built on, we bring packages (containing code) into our own code with an import statement, as shown below, e.g. import os

This is a convenient way to use the open source packages that are interesting and important for solving the challenge. Many open source packages are available, and which ones to use for a specific problem needs to be explored.

The purpose of each package we use is described in the code cell below. We are going to work with the fastai library, which sits on top of PyTorch*. The fastai library provides many useful functions that enable us to quickly and easily build neural networks (NN) and train our models. To learn more about NN, please watch the movie at this link: https://www.youtube.com/watch?v=bfmFfD2RIcg


In [None]:

#The code in this cell imports all the necessary packages for training your model.

from fastbook import *
# imports fastai vision package to work with images 
from fastai.vision.all import *

# imports fastai metrics like error_rate
from fastai.metrics import error_rate # 1-accuracy

#import numpy libraries for matrix manipulations 
import numpy as np 

import os
from sklearn.metrics import confusion_matrix
from sklearn.utils import shuffle

#import plotting and visualization libraries
import matplotlib
import matplotlib.pyplot as plt

#import libraries to read and write images
import cv2  
     
matplotlib.rc('image', cmap='Greys')
print('Good Job ! You are on the right track')

<a name='data_preparation'></a>
# **4. Data Preparation**

Data preparation is the process of cleaning and transforming raw data prior to processing and analysis. It is an important step prior to processing and often involves reformatting data, making corrections to data and the combining of data sets to enrich data.

Data preparation is often a lengthy undertaking for data professionals or business users, but it is essential as a prerequisite to put data in context in order to turn it into insights and eliminate bias resulting from poor data quality.

For example, the data preparation process usually includes standardizing data formats, enriching source data, and/or removing outliers.

Dataset preparation can be divided into five steps 

1. [Data Exploration](#data_exploration)
2. [Data Cleaning](#data_cleaning) 
3. [Data Augmentation](#data_augmentation)
4. [Data Splitting](#data_splitting)
5. [Visualize data](#data_visualization) 





<a name='data_exploration'></a>
## **4.1. Data Exploration**

In the data exploration stage, we try to understand the dataset by answering some basic questions about it. The questions are listed below and give you a fast overview of the dataset you are handling. In some cases this will give you enough information to understand whether your dataset will be able to solve your problem or not.

1. How big is the dataset? 
2. How many train files and validation/test files do we have?
3. How many classes are there in the dataset ?
4. How many data samples are there per class ? 

To be able to answer the above questions, we need to let our code know where our dataset is located. We do that by running the below code cell.
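
As a rough sketch of how questions 3 and 4 can be answered once the file locations are known: each image is stored in a folder named after its class, so counting folder names gives the samples per class. The file paths and the helper name count_per_class below are made up for illustration.

```python
from collections import Counter
from pathlib import Path

def count_per_class(image_paths):
    """Count images per class, taking the parent folder name as the label."""
    return Counter(Path(p).parent.name for p in image_paths)

# Hypothetical file paths, only for illustration
example = ["train/sea/1.jpg", "train/sea/2.jpg", "train/forest/7.jpg"]
counts = count_per_class(example)
print(len(counts), dict(counts))
```

The number of keys answers "how many classes", and the values answer "how many samples per class".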

**Location of Scenes Dataset**

In [None]:
#The code in this cell adds the location where the data exists to a path variable.
path = '/content/gdrive/MyDrive/AI_IN_PRACTICE/scenes'
print('Cell execution Completed')

In [None]:
#The code in this cell stores all the locations of the train and test images in the dataset. 

#gets image locations from scenes/train folder and save them to train_files 
train_files=get_image_files(path+'/train')

#get image locations from scenes/valid  folder and save them  to test_files 
test_files=get_image_files(path+'/valid')

print('Cell execution Completed')

If you need more information about the code in the code cells, use doc() to see the documentation. An example of how to use doc() is given below.

In [None]:
doc(get_image_files)

**Number of files in the scenes dataset**

In [None]:
#The code in this cell prints the number of images used for training and test/validation. The numbers are fixed to the dataset.
print('Number of images used for training   '+ str(len(train_files)))
print('Number of images used for validation   '+ str(len(test_files)))

**Number of classes**

In [None]:
#The code in this cell prints the classes in our dataset
labels = os.listdir(path+'/train')
print(labels)

In [None]:
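#The code in this cell counts the number of samples per class in the train dataset. The counts are plotted in the chart below.
counts = [0]*len(labels)
for i in train_files:
  #each image sits in a folder named after its class, so compare the folder name with each label
  for j in range(0,len(labels)):
    if labels[j] == i.parent.name:
      counts[j]= counts[j]+1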
print('Counts extracted')

In [None]:
#The code below defines a function for plotting the number of samples per class
def plot_bar_counts():
    # this is for plotting purpose
    index = np.arange(len(labels))
    plt.bar(labels, counts)
    plt.xlabel('labels', fontsize=15)
    plt.ylabel('No of data samples', fontsize=15)
    plt.xticks(index, labels, fontsize=15, rotation=30)
    plt.title('Train data analysis')
    plt.show()

In [None]:
#Plots a bar chart of the number of training samples per class
plot_bar_counts()


<a name='data_cleaning'></a>
## **4.2. Data Cleaning**

In this training, data cleaning is left for the next pilot session.

<a name='data_augmentation'></a>
## **4.3. Data Augmentation**

Data augmentation is a technique for increasing the amount of data used for training a model, and also for mimicking situations that occur in real life. For reliable predictions, deep learning models often require a lot of training data, which is not always available. Therefore, the existing data is augmented in order to build a model that generalizes better.

Although data augmentation can be applied in various domains, it's commonly used in computer vision. Some of the most common data augmentation techniques used for images are:

**Position augmentation**
*   Scaling
*   Cropping
*   Flipping
*   Padding
*   Rotation 
*   Translation 
*   Affine transformation (e.g. warping)

**Color augmentation**
*   Brightness
*   Contrast 
*   Saturation 
*   Hue

**Fun fact**: Color augmentations are the basis for the  **Instagram filters** we use to make us look picture perfect :) 

Below we go through some of the techniques and visualize different augmentations using one sample image



In [None]:
import random

num = random.randint(0, len(train_files)-1)

#Load a random image to visualize the image augmentations
img = PILImage(PILImage.create(train_files[num]))

#show the image
show_image(img)

## **Random Crop Augmentation**

Random crop is a data augmentation technique in which we create a random subset of an original image. This helps our model generalize better because the object(s) of interest we want our models to learn are not always wholly visible in the image, or at the same scale, in our training data.
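
To make the idea concrete, here is a minimal sketch of a random crop on a tiny toy "image" stored as a list of rows. It only illustrates the cropping step; the function name random_crop is made up, and the fastai transform used in the next cell additionally resizes the crop.

```python
import random

def random_crop(image, size, rng=random):
    """Cut a random size x size window out of a 2D image (list of rows)."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)    # randint includes both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]

# A 4x4 toy "image" whose pixel values are just running numbers
image = [[r * 4 + c for c in range(4)] for r in range(4)]
crop = random_crop(image, 2, random.Random(0))
print(crop)
```

Running it several times with different seeds gives different windows of the same image, which is what the model sees during training.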

In [None]:
# The code in this cell applies a randomized crop to the image loaded above
'''
RandomResizedCrop(n): Picks a random crop of the image and resizes it to size (n x n)
'''
n=224
crop = RandomResizedCrop(n)
_,axs = plt.subplots(3,3,figsize=(9,9))
for ax in axs.flatten():
    cropped = crop(img)
    show_image(cropped, ctx=ax);
  

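## **Crop Pad**

Crop Pad is an additional augmentation technique: it crops an image to the requested size, or pads it when the image is smaller than that size. This gives us further variations of the images in the scenes dataset.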


In [None]:
# The code in this cell applies crop_pad to the image loaded above  
_,axs = plt.subplots(1,3,figsize=(12,4))
for ax,sz in zip(axs.flatten(), [150, 300, 500]):
    show_image(img.crop_pad(sz), ctx=ax, title=f'Size {sz}');

## **Rotation Augmentation**

A source image is randomly rotated clockwise or counterclockwise by some number of degrees, changing the position of the object in the frame.
Random rotation is a particularly useful augmentation because it changes the angles at which objects appear in your dataset during training. Random rotation can improve your model without you having to collect and label more data.

In [None]:
# The code in this cell applies the given rotations to the image.

timg = TensorImage(array(img)).permute(2,0,1).float()/255.
def _batch_ex(bs): return TensorImage(timg[None].expand(bs, *timg.shape).clone())

'''

thetas - Angles which the original image is rotated to.

For ex: thetas = [-15,0,15]

Displays three images rotated to -15 degrees, 0 degrees and 15 degrees respectively

'''
thetas = [-30,-15,0,15,30]
imgs = _batch_ex(5)
deflt = Rotate()
listy = Rotate(p=1.,draw=thetas)
show_images( listy(imgs) ,suptitle='Manual List Rotate',titles=[f'{i} Degrees' for i in thetas])

## **Warping Augmentation**

Applying the warping technique adds distorted versions of the images to the scenes dataset.

In [None]:
scales = [-0.4, -0.2, 0., 0.2, 0.4]
imgs=_batch_ex(5)
vert_warp = Warp(p=1., draw_y=scales, draw_x=0.)
horz_warp = Warp(p=1., draw_x=scales, draw_y=0.)
show_images( vert_warp(imgs) ,suptitle='Vertical warping', titles=[f'magnitude {i}' for i in scales])
show_images( horz_warp(imgs) ,suptitle='Horizontal warping', titles=[f'magnitude {i}' for i in scales])

**Flip**

Flips a batch of images.
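
On a toy image stored as a list of rows, a horizontal flip is simply each row reversed. The sketch below only illustrates the idea; the function name flip_horizontal is made up and is not part of fastai.

```python
def flip_horizontal(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]

image = [[1, 2, 3],
         [4, 5, 6]]
flipped = flip_horizontal(image)
print(flipped)  # [[3, 2, 1], [6, 5, 4]]
```

Flipping twice gives back the original image, so no information is lost by this augmentation.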

In [None]:
with no_random(32):
    imgs = _batch_ex(2)
    deflt = Flip()
    show_images( deflt(imgs) ,suptitle='Default Flip')
    

Let's now batch all these augmentations/transformations together and apply them in the code cell below.

We also resize the images so that every image has the same shape and size. This allows the GPU to apply the same instructions to all the images.

In addition, we normalize the images: the pixel values of each color channel are shifted by the channel mean and divided by the channel standard deviation (here the ImageNet statistics), so that the values end up roughly centered around 0. This helps models train. If you have problems training your model, one thing to check is whether the data has been normalized.
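
As a small worked example of normalization, the sketch below normalizes a single RGB pixel with the commonly published ImageNet channel statistics. The helper name normalize_pixel is made up for this illustration, and pixel values are assumed to be scaled to the range 0 to 1.

```python
# Per-channel ImageNet statistics for pixel values scaled to the range 0..1
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the channel mean and divide by the channel standard deviation."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))

# A pixel equal to the mean maps to zero in every channel
print(normalize_pixel(IMAGENET_MEAN))  # (0.0, 0.0, 0.0)
```

Brighter-than-average pixels become positive numbers and darker ones negative, which keeps the inputs in a range the pretrained model is used to.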

***NOTE: The suitable types of data augmentation are very specific to the dataset. In our case we only rotate the images by a small angle so that they stay representative of the real world. For medical images (e.g. cell images), it is fine to rotate them by a larger angle (e.g. 180 degrees).***


In [None]:
#tfms = None
#The code in this cell collects all the data augmentations into one variable which can be applied to our dataset in the later stages.
tfms =[*aug_transforms(size=224, min_scale=0.75, max_rotate=10, max_zoom=1.05, max_warp=.1, do_flip=True), Normalize.from_stats(*imagenet_stats)]


In [None]:
#The code in this cell applies all the transforms in tfms to copies of the sample image and shows five of the results.
#It uses the GPU when one is available and otherwise falls back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
y = _batch_ex(9)
y = y.to(device=device)
for t in tfms: y = t(y, split_idx=0)
_,axs = plt.subplots(1,5, figsize=(12,3))
for i,ax in enumerate(axs.flatten()):
   show_image(y[i], ctx=ax)


<a name='data_splitting'></a>
## **4.4 Data Splitting**

Now it's time to split your data into a training set and a validation set. The training set usually contains about 70% of the images and the validation set the remaining 30%. In the scenes dataset the images are already stored in separate train and valid folders.
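
When a dataset does not come with separate folders, a 70/30 split can be made by shuffling the file list and cutting it in two. The sketch below is only an illustration of that idea with made-up file names; it is not what the notebook cell further down does.

```python
import random

def split_files(files, valid_fraction=0.3, seed=42):
    """Shuffle a list of files and split it into training and validation parts."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]

files = [f"img_{i}.jpg" for i in range(10)]   # hypothetical file names
train, valid = split_files(files)
print(len(train), len(valid))  # 7 3
```

Shuffling before cutting matters: without it, the validation set could end up containing only one class.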

Run the code below to perform the splitting. 

In [None]:
#The code in this cell loads the whole train and valid images into a data variable. Also applies the tfms variable that we created in the previous cells. 
np.random.seed(42)

'''
The method below loads train and valid subfolders in the code  (data =)

train : name of the  train subfolder 
valid : name of the valid subfolder
item_tfms : transforms performed on the individual image
batch_tfms : transforms performed on the batch 
bs : batch size
 
'''
data = ImageDataLoaders.from_folder(path,train='train', valid ='valid', item_tfms=Resize(224), batch_tfms=tfms, bs=10)

Before we move on to the next code cell, we need to be clear about the question below. Believe me, it's important!

**What Is Batch Size?**

To refresh your memory, please look at the video explaining NN here:  https://www.youtube.com/watch?v=bfmFfD2RIcg

*  The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.

*  Think of a batch as a for-loop iterating over one or more samples and making predictions. At the end of the batch, the predictions are compared to the expected output variables and an error is calculated. From this error, the update algorithm is used to improve the model, e.g. move down along the error gradient.

*  A training dataset can be divided into one or more batches.
*  The batch size (bs) can be changed in this code cell [here](#data_splitting). Its value is currently set to 10.

To get more information about a Batch Size please follow the link: https://www.youtube.com/watch?v=U4WB9p6ODjM
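
The effect of the batch size can be sketched with plain Python: the dataset is cut into chunks of bs samples, and the model is updated once per chunk. The sample count below is made up, and in this simplified sketch the last, smaller batch is kept.

```python
import math

def batches(samples, bs):
    """Yield consecutive chunks of at most bs samples."""
    for start in range(0, len(samples), bs):
        yield samples[start:start + bs]

samples = list(range(25))            # 25 hypothetical training samples
bs = 10
chunks = list(batches(samples, bs))
print([len(c) for c in chunks])      # [10, 10, 5]
print(math.ceil(len(samples) / bs))  # 3 weight updates per epoch
```

A smaller batch size therefore means more, but noisier, updates per epoch, while a larger one needs more GPU memory.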


## **4.5 Visualize Data**

By visualizing the data you can confirm that you are on the right track, e.g. regarding the labeling.

Do your images match the correct labels? 
If yes, then you have succeeded ! 

In [None]:
#The below line of code shows a random batch of images 
data.show_batch(figsize=(10,10))

<a name='training'></a>

# **5. Training/Fine-Tuning the model using Transfer Learning**

**Welcome back!**

Now we will start fine-tuning our pretrained model. This means that we are building a model which takes images as input and outputs a predicted probability for each of the categories. In this case it outputs 6 probabilities, and the class with the maximum probability is chosen as the label. For this task we will use a technique called Transfer Learning. To learn more about transfer learning, please follow this link: https://www.youtube.com/watch?v=5T-iXNNiwIs
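
How 6 raw model outputs become 6 probabilities and one label can be sketched as follows. The class names and scores are placeholders invented for this example, and softmax is assumed here as the usual way of turning scores into probabilities.

```python
import math

def softmax(scores):
    """Turn raw model scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]   # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class names and raw scores for one image
classes = ["class_a", "class_b", "class_c", "class_d", "class_e", "class_f"]
scores = [0.2, 2.5, -1.0, 0.1, 0.0, 1.1]
probs = softmax(scores)
predicted = classes[probs.index(max(probs))]
print(predicted)  # class_b
```

The class with the highest score always ends up with the highest probability, so the predicted label is the same either way; the probabilities just make the model's confidence easier to read.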


**What is Transfer Learning?**


*   Transfer learning is a technique where you use a model trained on a very large dataset (usually ImageNet in computer vision) and then adapt it to your own dataset.

*  The idea is that the model has learned to recognize many features on all of this data, like ImageNet, and that you will benefit from this knowledge, especially if your dataset is small. 

*   In practice, you need to change the last part of the model to be adapted to your own number of classes. 

*   Most convolutional models end with a few linear layers (a part we will call the head).

*   The last convolutional layer will have analyzed features in the image that went through the model, and the job of the head is to convert those into predictions for each of your classes.

*   In transfer learning one keeps all the convolutional layers (called the body or the backbone of the model) with their weights pretrained on ImageNet but will define a new head initialized randomly.

**Two-Phase Training of the model**
*   We will train the model in two phases: first we freeze the body weights and only train the head (to convert those analyzed features into predictions for our own data). In the second phase we unfreeze the layers of the backbone (gradually if necessary) and fine-tune the whole model (possibly using differential learning rates).
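
The two phases can be pictured with a toy model in which every layer carries a flag saying whether it may be updated. This is only a conceptual sketch with made-up names (Layer, freeze_body, unfreeze); fastai handles the freezing internally in its own way.

```python
class Layer:
    """A stand-in for a network layer; only tracks whether it may be updated."""
    def __init__(self, name, part):
        self.name, self.part, self.trainable = name, part, True

def freeze_body(layers):
    # Only layers belonging to the head stay trainable
    for layer in layers:
        layer.trainable = (layer.part == "head")

def unfreeze(layers):
    # Every layer becomes trainable again
    for layer in layers:
        layer.trainable = True

model = [Layer("conv1", "body"), Layer("conv2", "body"), Layer("linear", "head")]
freeze_body(model)                       # phase 1: only the head learns
phase1 = [l.name for l in model if l.trainable]
unfreeze(model)                          # phase 2: everything learns
phase2 = [l.name for l in model if l.trainable]
print(phase1, phase2)
```

In phase 1 only the new head changes, so the pretrained knowledge in the body is protected while the head catches up.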




 

For this training we have chosen a pretrained model called resnet34, which has previously been trained on around 1.3 million images (ImageNet). This means that we don't have to start with a model that knows nothing; we start with a model that already knows something about recognizing images. The 34 stands for the number of layers in the network, and a smaller model trains faster. A bigger version is called resnet50.

With the code below, our model will be trained starting from resnet34.

In [None]:
#The code in this cell will use a cnn_learner method. With this line of code we tell a learner to create a cnn model for us, in this case it's a resnet34. 
 
#The cnn_learner method helps you to automatically get a pretrained model from a given architecture, in this case resnet34
learn = cnn_learner(data, models.resnet34, loss_func=CrossEntropyLossFlat(), metrics=[error_rate, accuracy])


###  **Wait !!!**

The code in the previous cell contains a lot of terms that may seem complicated. Let's review each of them a bit.


*  **CNN**: Convolutional Neural Networks are a class of neural networks widely used for image recognition, image classification, object detection, face recognition, etc. A convolution is the basic operation of a CNN. For more explanation, watch the video below.
https://www.youtube.com/watch?v=YRhxdVk_sIs&t=419s

*  **Cross-Entropy Loss**  : Cross-entropy loss is a loss function used for this dataset. It  has two benefits:

> 1. It works even when our dependent variable has more than two categories.
> 2. It results in faster and more reliable training.

* **Error rate**: 
        error_rate = 1 - accuracy 
        accuracy = no of correctly classified samples / all samples
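
The two formulas can be checked on a handful of made-up labels. The label values below are invented for illustration only.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels for five validation images
actual    = ["sea", "sea", "forest", "forest", "sea"]
predicted = ["sea", "forest", "forest", "forest", "sea"]
acc = accuracy(predicted, actual)
err = 1 - acc
print(acc, round(err, 2))  # 0.8 0.2
```

Four of the five images are classified correctly, so the accuracy is 0.8 and the error rate is 0.2.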

      

The code cell below shows the detailed architecture of the deep neural network model (in our case resnet34) we are training. Knowing the architecture of a DNN (deep neural network) is useful when designing better neural network architectures for more advanced use cases.

In [None]:
#The code in this cell shows the architecture of the model( in our case CNN) that is being trained.
learn.model

## **Phase 1: Finetune the head of the model**

Now we enter the first phase of the training: we freeze the body weights and only train the head (to convert the analyzed features into predictions for our own data). We will train our model by letting it cycle through all our data 6 times. The training loss tells us how well the model is fitting the training data. The validation loss tells us how well the model generalizes to data it has not been trained on.

For both the training and the validation loss, a decreasing trend is a good sign.

1 cycle = 1 epoch

It will take some time to train your model.

Sit back and relax after running the cell below! :) You did a great job!

Or you can read [here](#cycles) about how to choose the number of cycles/epochs.

In [None]:
#The code in this cell will run the training job for 6 epochs.
learn.fit_one_cycle(6)

Ideally if your model is learning something, you should see a certain trend. Your train_loss and valid_loss  and error_rate should be decreasing while accuracy should be increasing.



In [None]:
#Plots the loss for both training  and validation dataset
learn.recorder.plot_loss()

In [None]:
#The code in this code cell is saving the model to the disk with name stage-1
learn.save('stage-1')

Observe the decreasing trend in the plots above !!


<a name='cycles'></a>
### **How do we select the number of epochs?**


*  Often you will find that you are limited by time, rather than generalization and accuracy, when choosing how many epochs to train for. So your first approach to training should be to simply pick a number of epochs that will train in the amount of time that you are happy to wait for. Then look at the training and validation loss plots, as shown above, and in particular your metrics, and if you see that they are still getting better even in your final epochs, then you know that you have not trained for too long. In this situation you can increase the number of epochs you are training for.

*  If you have the time to train for more epochs, you may also want to instead use that time to train more parameters—that is, use a deeper architecture.
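
The "are the metrics still getting better?" check can be written down as a small helper. This is just one possible rule of thumb, with a made-up function name and made-up loss values.

```python
def still_improving(valid_losses, last_n=2):
    """True if each of the last_n epochs lowered the validation loss."""
    recent = valid_losses[-(last_n + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))

# Hypothetical validation losses, one value per epoch
print(still_improving([0.90, 0.60, 0.45, 0.40]))  # True: more epochs may help
print(still_improving([0.90, 0.60, 0.62, 0.65]))  # False: the loss has started rising
```

In the second case the validation loss is going up again, which suggests that training longer would not help.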

We have now successfully fine-tuned the head of our model. So that we do not lose our progress, the trained model was saved above under the name stage-1. The file is stored in the models folder of the dataset location on your Google Drive, i.e. /content/gdrive/MyDrive/AI_IN_PRACTICE/scenes/models

## **Phase 2: Unfreezing and fine-tuning**

As mentioned above, training is a two-phase process. In the first phase we trained only the last layers (the head) of the model. Because so few weights are updated, that phase is unlikely to overfit and usually gives good results already, but to make the best use of the model we now unfreeze and fine-tune all of its layers.

Fine-tuning all the layers lets the weights of every layer adapt to the features of the scenes dataset. This makes the model perform better on the scenes dataset.

In [None]:
#The code in this code cell unfreezes and trains the whole resnet34 model. We now allow for the whole model to be trained, not just the last layer. 
learn.unfreeze()

**Finding the best learning rate**

Finding a good learning rate is an important problem in machine learning. The learning rate decides how fast the model weights are updated. Choosing it is mostly trial and error, but fastai provides a tool called the learning rate finder, which can suggest an appropriate learning rate.

For a more intuitive explanation of how the learning rate finder works, see the link below:
(https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html)
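
The idea behind a learning rate finder can be sketched in a few lines: train briefly while increasing the learning rate exponentially, record the loss at each step, then read suggestions off the resulting curve. The sketch below is not fastai's implementation. The loss values are invented, and the two heuristics (one tenth of the learning rate at the lowest loss, and the learning rate where the loss falls most steeply) are assumptions about how suggestions such as lr_min and lr_steep are typically derived.

```python
import math


def exponential_lrs(start_lr, end_lr, num_steps):
    """Learning rates that grow by a constant factor from start_lr to end_lr."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_steps - 1)) for i in range(num_steps)]


def suggest_lrs(lrs, losses):
    """Return (lr_min, lr_steep) read off a recorded loss curve."""
    # Heuristic 1: one tenth of the learning rate where the loss was lowest.
    lowest = min(range(len(losses)), key=lambda i: losses[i])
    lr_min = lrs[lowest] / 10
    # Heuristic 2: the learning rate where the loss drops fastest
    # (most negative slope against log(lr)).
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        for i in range(len(losses) - 1)
    ]
    steepest = min(range(len(slopes)), key=lambda i: slopes[i])
    return lr_min, lrs[steepest]


# Hypothetical loss recorded at each learning rate from 1e-7 up to 10.
lrs = exponential_lrs(1e-7, 10, 9)
losses = [2.0, 2.0, 1.9, 1.5, 0.9, 0.6, 0.8, 2.5, 9.0]

lr_min, lr_steep = suggest_lrs(lrs, losses)
print(f"lr_min={lr_min:.0e}, lr_steep={lr_steep:.0e}")  # lr_min=1e-03, lr_steep=1e-04
```

Note how the loss explodes at the largest learning rates; a good choice lies well before that point.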

The cell below plots the loss against the learning rate.

In [None]:
#The code in the code cell runs the learning rate finder provided by fastai
learn.lr_find()

The text above the plot suggests learning rates. Change the lr_min value in the code cell below to the suggested lr_min value.

For example, if the suggested learning rates are:

SuggestedLRs(lr_min=0.004786301031708717, lr_steep=0.0014454397605732083)

then change the lr_min value in the code cell below to 0.0047.


In [None]:
#Change the value of lr_min to the value suggested in the previous plot.
lr_min = 1e-4

Now, we train the model again after unfreezing all the layers of the pretrained model and also using the learning rate from the learning rate finder.
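
Passing a range such as slice(1e-6, lr_min) as the maximum learning rate is understood to give the earliest layers the smallest learning rate and the final layers the largest, with the groups in between spaced evenly on a log scale (often called discriminative learning rates). The sketch below illustrates that spacing in plain Python. It is not fastai's code; the function name `spread_learning_rates` and the split into three layer groups are assumptions for illustration.

```python
def spread_learning_rates(lowest, highest, num_groups):
    """Return `num_groups` learning rates, evenly spaced on a log scale."""
    if num_groups == 1:
        return [highest]
    step = (highest / lowest) ** (1 / (num_groups - 1))
    return [lowest * step ** i for i in range(num_groups)]


lr_min = 1e-4  # the value chosen from the learning rate finder
group_names = ["early layers", "middle layers", "head"]  # hypothetical grouping

for name, lr in zip(group_names, spread_learning_rates(1e-6, lr_min, 3)):
    print(f"{name}: {lr:.0e}")
```

The early layers already hold general features from pretraining, so they are changed only slightly, while the later layers are allowed to adapt more.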

In [None]:
#The code in the code cell here runs a training for 5 epochs.
learn.fit_one_cycle(5, lr_max=slice(1e-6,lr_min))

We have now successfully finished the phase-2 training of our model.

To avoid losing our progress, let's save the trained model in a preset location. The model will be saved on your Google Drive at /content/gdrive/MyDrive/scenes/models

In [None]:
#The code in this cell is saving the model to the disk with name stage-2
learn.save('stage-2')

<a name='results'></a>
# **Results Interpretation and Analysis**

*Now comes the most interesting part!*



We will first look at which categories the model confused most often and check whether its predictions are reasonable. We will then plot a confusion matrix, where we can see and learn more about the mistakes the model made. The confusion matrix is explained a bit further down.

In [None]:
#The code in this cell, when executed, analyses the model's performance on all the classes. The results of the analysis are shown in the next code cells.
interp = ClassificationInterpretation.from_learner(learn)

losses,idxs = interp.top_losses()

print('Interpretation and Analysis of Results done ! ')

In [None]:
# The code in this code cell shows some sample images, actual ground truth used for training and the predicted label.
# If the predicted label and the ground truth match, the labels are shown in green.
# If the predicted label and the ground truth do not match, the labels are shown in red.
learn.show_results()

One of the most interesting things we can do is plot the top losses. This shows the images where the model was very certain about a class but was wrong, which results in a high loss. In other words, the model was confident about an answer, but answered wrong. The title of each image shows: prediction, actual, loss, probability of actual class.
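
To see why a confident wrong answer gets a high loss, recall that the cross entropy loss of a single image is the negative logarithm of the probability the model gave to the actual class. The sketch below uses made-up probabilities for three of the scene classes; the helper name `cross_entropy` is chosen for this example.

```python
import math


def cross_entropy(probs, actual):
    """Loss for one image: -log of the probability given to the actual class."""
    return -math.log(probs[actual])


# Hypothetical model outputs for three images whose actual label is 'glacier'.
confident_right = {"glacier": 0.95, "mountain": 0.04, "buildings": 0.01}
unsure_wrong = {"glacier": 0.40, "mountain": 0.45, "buildings": 0.15}
confident_wrong = {"glacier": 0.02, "mountain": 0.97, "buildings": 0.01}

for name, probs in [
    ("confident and right", confident_right),
    ("unsure and wrong", unsure_wrong),
    ("confident and wrong", confident_wrong),
]:
    print(f"{name}: loss = {cross_entropy(probs, 'glacier'):.2f}")
# confident and right: loss = 0.05
# unsure and wrong: loss = 0.92
# confident and wrong: loss = 3.91
```

The third image is the kind that ends up at the top of the top-losses plot.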

In [None]:
#The code in this cell shows the images the model is most confused on.

'''For every image, it shows
1. Prediction: The label predicted by the model.
2. Actual: The actual label in the dataset.
3. Loss: The cross entropy loss of the image. A higher loss means the model was more certain about a wrong prediction.
4. Probability: How certain the model is about its prediction.

'''
interp.plot_top_losses(9, figsize=(15,11))



The confusion matrix is a way to visualize your results and understand where your model makes mistakes and how frequent they are.
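
As a minimal sketch of what such a matrix contains, the example below counts (actual, predicted) pairs for a handful of made-up predictions. Rows are actual classes and columns are predicted classes; the helper name `confusion_matrix` and the data are assumptions for illustration, not fastai code.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


classes = ["buildings", "glacier", "mountain"]
actual = ["glacier", "glacier", "glacier", "mountain", "mountain", "buildings"]
predicted = ["glacier", "mountain", "mountain", "mountain", "glacier", "buildings"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>10}: {row}")
```

Correct predictions end up on the diagonal; every number off the diagonal is a mistake. Here the glacier row shows one correct prediction and two images predicted as mountain.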


The confusion matrix is so interesting that we want everyone to understand it properly. We will gather in the main group to discuss it.

If you see that people are still working, grab a coffee and come back! :)

In [None]:
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

The most_confused function extracts the most common wrong predictions from the confusion matrix. This lets you, for example as a domain expert, judge from your own expertise whether these are mistakes the model could reasonably make. We can all understand that a glacier may often be easy to confuse with a mountain, as glaciers are frequently found in mountains.
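
The ranking itself is simple to reproduce. The sketch below counts only the wrong (actual, predicted) pairs and sorts them by frequency; it is a plain-Python illustration with invented data, and the function defined here is a stand-in for illustration rather than fastai's implementation.

```python
from collections import Counter


def most_confused(actual, predicted, min_val=1):
    """Return (actual, predicted, count) for wrong predictions, most frequent first."""
    errors = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    pairs = [(a, p, n) for (a, p), n in errors.items() if n >= min_val]
    return sorted(pairs, key=lambda item: item[2], reverse=True)


# Hypothetical labels and predictions for six images.
actual = ["glacier", "glacier", "glacier", "mountain", "mountain", "buildings"]
predicted = ["mountain", "mountain", "glacier", "glacier", "mountain", "glacier"]

print(most_confused(actual, predicted, min_val=2))
# [('glacier', 'mountain', 2)]
```

Raising min_val hides rare mistakes so that you can focus on the systematic ones.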


In [None]:
#The code in this code cell gives us the classes on which the model is confused in descending order.
'''
For example:
('glacier', 'mountain', 131)

What we can infer from the above line is that  131 glacier images have been predicted as mountain images.

'''
interp.most_confused(min_val=10)

## **Let's see if you can get better accuracy! Try it out**


<a name='data_cleaning'></a>
## **Data Cleaning**

Oops! It seems the organizers mixed up two datasets in a rush :P

Can you try to clean the data and see if that improves the accuracy?

**TIP**: The mix-up mostly affected the glacier and building classes.

Other suggestions that might improve the accuracy:

*  Use the provided train_medium dataset in /content/gdrive/MyDrive/AI_IN_PRACTICE/scenes, which has more data
*  Increase the batch size 



---

**Congratulations!!**

---



You have completed the training :) 

Please return to the main group. 
Please be ready to let us know your error rate.

# **Common Terminology used in this training**

*   **CPU**: A central processing unit, also called a central processor, main processor or just processor, is the electronic circuitry within a computer that executes instructions that make up a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program.

*   **GPU**: A graphics processing unit, is a specialized, electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel.

*  **fastai** : is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. To learn more follow this link: https://docs.fast.ai/

*   **PyTorch** : Is a Python-based scientific computing and deep learning framework. It's a replacement for NumPy to use the power of GPUs. It's a deep learning research platform that provides maximum flexibility and speed.

*  **Google Colab** : At this moment you are in a Google Colab environment, and you will be using this platform to run code and start learning about AI. Colab is a cloud-based working environment that allows you to collaborate and train your models. It is a great environment for trying things out and testing.

*  **Python** : Python is an interpreted programming language currently used in many machine learning projects. Many open-source machine learning packages are available primarily or only in Python, which has made it a go-to language for machine learning prototyping.

* **epoch** : An epoch refers to one cycle of training through the full training dataset.

*  **Imagenet**: ImageNet is a dataset of 1.3 million images of various sizes, around 500 pixels across, in 1,000 categories. A model trained on it from scratch took a few days to train.

*  **Pretrained model** : A model that has already been trained from scratch on a very large dataset (usually ImageNet in computer vision) is called a pretrained model. To learn more about pretrained models, check the link below.
https://towardsdatascience.com/how-do-pretrained-models-work-11fe2f64eaa2

## **Acknowledgements**

1. A huge thanks to Fastai for providing a framework for fast prototyping.
2. Thanks to Kaggle and Intel for providing the scenes classification dataset.