# Understanding the dataset

Let us list the contents of the dataset directory.

In [None]:
import os

In [None]:
print("The dataset contains the following directories and files:")
print(*os.listdir("../input/cassava-leaf-disease-classification"), sep="\n")

# Choosing the data for model training

We will now count the number of JPEG images in the `train_images` directory:

In [None]:
import glob
train_images_jpg_format = glob.glob('../input/cassava-leaf-disease-classification/train_images/*.jpg')
len(train_images_jpg_format)

# Importing necessary libraries

* **Pandas** - for data analysis.
* **Fastai** - for training the deep learning model and making predictions.

**Note**: We are using Fastai version 2, not the earlier version 1.



In [None]:
import pandas as pd

import fastai
from fastai.vision.all import *


# Defining the variables and assigning the paths for this notebook

In [None]:
path = Path('../input/cassava-leaf-disease-classification')
image_path = path/'train_images'
training_data_file = path/'train.csv'
sample_submission_file = path/'sample_submission.csv'

# Analysing the data


In [None]:
train_df = pd.read_csv(training_data_file)

In [None]:
print("Size of Training data \n", train_df.shape)
print("----------------------------------------------------------")
print("\nFirst few samples of data are \n",train_df.head())

Let us print the number of data samples in each output category.

In [None]:
train_df['label'].value_counts()


# Selecting a subset of data for training purpose

We will keep all the training data in output categories "0", "1", "2" and "4", and a random 19.6% sample (`frac=0.196`) of the training data in output category "3", so that category "3" is brought down to a size comparable to the other categories.

In [None]:
train_df_output_3 = train_df[train_df['label']==3].sample(frac=0.196,random_state=111)
train_df_output_3.shape
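
The fraction passed to `sample` does not have to be hard-coded. One plausible way to derive it is to shrink the largest category to the size of the second-largest one. The sketch below illustrates the arithmetic on made-up labels (the counts are hypothetical, not the competition's), and `balancing_fraction` is a helper name introduced only for this example.

```python
from collections import Counter

def balancing_fraction(labels):
    """Return (majority_label, frac): the fraction of the largest class to keep
    so that it shrinks to the size of the second-largest class."""
    counts = Counter(labels).most_common()  # (label, count) pairs, largest first
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    (majority_label, majority_count), (_, runner_up_count) = counts[0], counts[1]
    return majority_label, runner_up_count / majority_count

# Hypothetical toy labels in which category 3 dominates.
toy_labels = [0] * 10 + [1] * 20 + [2] * 20 + [3] * 100 + [4] * 25
majority_label, frac = balancing_fraction(toy_labels)
print(majority_label, frac)
```

With a DataFrame, the computed `frac` would take the place of the literal `0.196` in the `sample` call.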

Let us concatenate the rows selected from all five categories and call the result "new_df".

In [None]:
new_df = pd.concat([
    train_df[train_df['label']==0],
    train_df[train_df['label']==1],
    train_df[train_df['label']==2],
    train_df_output_3,
    train_df[train_df['label']==4],
]).reset_index(drop=True)
new_df.shape

# Creating the image data loader

In [None]:
image_data_loader = ImageDataLoaders.from_df(new_df, path=image_path,
                               seed=42, fn_col=0, 
                               label_col=1, 
                               item_tfms=Resize(128), 
                               batch_tfms=aug_transforms(flip_vert=True, max_warp=0.), 
                               bs=128, val_bs=None, shuffle_train=True)

Let us check the device of our "ImageDataLoaders" object to make sure that we are using the GPU.

In [None]:
image_data_loader.device

Let us look at a few random images from one batch of our data loader to make sure that the images and labels appear correctly.

In [None]:
image_data_loader.show_batch()

# Training the image recognizer model

We create a CNN (convolutional neural network) with the following specific details:

* What data do we want to train it on? <br> The data used for training is "image_data_loader".

* Which architecture do we use? <br> We are using ResNet34.

* What metric do we use to evaluate training? <br> We have specified "error_rate".

In [None]:
learn = cnn_learner(image_data_loader, resnet34, metrics=error_rate)
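
For reference, `error_rate` is the fraction of validation images whose predicted category differs from the true label, that is, 1 minus accuracy. The sketch below reproduces that calculation in plain Python on made-up labels. `toy_error_rate` is a hypothetical helper for illustration, not the fastai function.

```python
def toy_error_rate(predicted, targets):
    """Fraction of samples whose predicted label differs from the target label."""
    if len(predicted) != len(targets):
        raise ValueError("predicted and targets must have the same length")
    if not targets:
        raise ValueError("need at least one sample")
    wrong = sum(p != t for p, t in zip(predicted, targets))
    return wrong / len(targets)

# Made-up predictions and true labels for four images.
predicted = [0, 1, 2, 0]
targets = [0, 1, 2, 1]
rate = toy_error_rate(predicted, targets)
print(rate)  # one of the four predictions is wrong
```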

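Let us train the model: 4 epochs with the pretrained layers frozen (`freeze_epochs = 4`), followed by 2 epochs with all layers unfrozen.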

In [None]:
learn.fine_tune(2,freeze_epochs = 4)

In [None]:
learn.fit_one_cycle(4,slice(1e-5,1e-3))
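
Passing `slice(1e-5,1e-3)` requests discriminative learning rates: the earliest layer group trains with the smallest rate, the final group (the new head) with the largest, and any groups in between are spaced evenly on a log scale. The sketch below mimics that spacing in plain Python. `spread_learning_rates` is a hypothetical helper, and the use of three layer groups is an assumption about how the learner splits a ResNet.

```python
def spread_learning_rates(start, stop, n_groups):
    """Return n_groups learning rates from start to stop, evenly spaced on a log scale."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))  # constant multiplier between groups
    return [start * step ** i for i in range(n_groups)]

# Assumed: three layer groups, from the earliest layers to the head.
lrs = spread_learning_rates(1e-5, 1e-3, 3)
print(lrs)
```

Under these assumptions the middle group trains at roughly 1e-4, ten times faster than the earliest layers and ten times slower than the head.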

> # Bring it on - Test data!!

Defining the variables and assigning the paths for the **test dataset**

In [None]:
test_image_files = Path('../input/cassava-leaf-disease-classification/test_images')

Let us create a test **DataLoader** for our test dataset, reusing the transforms of the training data loader

In [None]:
image_data_loader_test = image_data_loader.test_dl(get_image_files(test_image_files))

Make the predictions using our trained model called "**learn**".
* Ignoring the first two outputs from the model, we keep the decoded predictions (the predicted category for each image) in the variable "**results**"

In [None]:
_,_,results = learn.get_preds(dl = image_data_loader_test, with_decoded = True)
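
With `with_decoded = True`, `get_preds` returns three values: the per-category probabilities, the targets (not available for an unlabelled test set), and the decoded predictions. For a single-label classifier, decoding amounts to taking the index of the most probable category. A minimal sketch of that step on made-up probabilities, where `decode` is a hypothetical helper:

```python
def decode(probabilities):
    """Map each row of category probabilities to the index of its largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]

# Made-up probabilities for three images over the five categories (0-4).
toy_probs = [
    [0.05, 0.10, 0.05, 0.70, 0.10],
    [0.60, 0.10, 0.10, 0.10, 0.10],
    [0.10, 0.15, 0.20, 0.25, 0.30],
]
decoded = decode(toy_probs)
print(decoded)
```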

In [None]:
sub = pd.read_csv(sample_submission_file)
sub.head()

In [None]:
sub['label'] = results

In [None]:
sub.head()

In [None]:
sub.to_csv('my_submission_file.csv', index=False)
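
Assigning `results` straight into `sub['label']` relies on the test files being read in the same order as the rows of the sample submission. A more defensive option is to pair each prediction with its own file name. The sketch below shows the idea with the standard library only. The file names and labels are made up, `build_submission_rows` is a hypothetical helper, and the `image_id` column name is an assumption about the submission format.

```python
import csv
import io
from pathlib import Path

def build_submission_rows(files, labels):
    """Pair each image file name with its predicted label, sorted by file name."""
    if len(files) != len(labels):
        raise ValueError("files and labels must have the same length")
    rows = [(Path(f).name, int(label)) for f, label in zip(files, labels)]
    return sorted(rows)

# Made-up test files and predictions, deliberately out of alphabetical order.
toy_files = ["test_images/b.jpg", "test_images/a.jpg"]
toy_labels = [4, 1]
rows = build_submission_rows(toy_files, toy_labels)

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["image_id", "label"])  # assumed column names
writer.writerows(rows)
print(buffer.getvalue())
```

In the notebook, `files` would be the list returned by `get_image_files(test_image_files)` and `labels` the values in `results`.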