<a href="https://colab.research.google.com/github/owenpb/Kaggle-Cassava-Leaf-Classification/blob/main/kaggle-cassava-leaf-classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Kaggle competition: Cassava Leaf Disease Classification](https://www.kaggle.com/c/cassava-leaf-disease-classification/)
#### *Identifying the type of disease present on a Cassava Leaf image*

In this competition, we explore a dataset consisting of 21,367 labeled photographs of Cassava leaves collected during a regular survey in Uganda. Images were crowdsourced from farmers and annotated by experts at the National Crops Resources Research Institute (NaCRRI) in collaboration with the AI lab at Makerere University, Kampala.

Our task is to classify each image into one of 5 categories: one indicating a **healthy** leaf, and the remaining four indicating different diseases. These are **Cassava Bacterial Blight** (CBB), **Cassava Brown Streak Disease** (CBSD), **Cassava Green Mottle** (CGM), and **Cassava Mosaic Disease** (CMD).
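
As a quick illustration of the five categories, the sketch below maps integer labels to readable class names. The label order is an assumption: it follows the competition's `label_num_to_disease_map.json` convention and is not stated in this notebook, so verify it against that file before relying on it. `LABEL_NAMES` and `label_to_name` are hypothetical helper names.

```python
# Assumed label order (not confirmed by this notebook): it follows the
# competition's label_num_to_disease_map.json convention.
LABEL_NAMES = {
    0: "Cassava Bacterial Blight (CBB)",
    1: "Cassava Brown Streak Disease (CBSD)",
    2: "Cassava Green Mottle (CGM)",
    3: "Cassava Mosaic Disease (CMD)",
    4: "Healthy",
}


def label_to_name(label: int) -> str:
    """Return the readable class name for an integer label (hypothetical helper)."""
    return LABEL_NAMES[label]


print(label_to_name(3))
```

A lookup like this is mainly useful for titling plots during EDA, where raw integer labels are hard to read.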

This Notebook contains:

1. Exploratory Data Analysis (EDA) of the Cassava leaf image dataset, and a demonstration of image augmentation techniques using the Albumentations library.

2. Training and fine-tuning a ResNet-152 model from torchvision (PyTorch) on a P100 GPU accelerator.

3. Ensembling out-of-fold (OOF) predictions with test time augmentations (TTA).

4. Preparing our final submission for this competition.

## Import libraries:

First, we import all necessary libraries and select CUDA as our device when a GPU is available (falling back to the CPU otherwise), since we will be training our model on a GPU.

In [1]:
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader, Dataset

from torchvision.models import resnet152, ResNet152_Weights

import albumentations
from albumentations.pytorch.transforms import ToTensorV2

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split, StratifiedKFold

import os
import copy
import glob
import json
import random
import pathlib
from PIL import Image
import pickle

In [2]:
%pip install torchinfo
from torchinfo import summary

Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0


In [3]:
BASE_PATH = '/content/drive/MyDrive/Cassava-Leaf/'
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f'Device: {DEVICE}')

Device: cuda


# 1. Exploratory Data Analysis (EDA) and Image Augmentations

## Loading images:

First, let's read our training data (consisting of image IDs and their corresponding labels) into a pandas DataFrame, and display the first 10 entries:


In [4]:
df = pd.read_csv(BASE_PATH + 'train.csv')
df.head(10)

|   | image_id | label |
|---|---|---|
| 0 | 1000015157.jpg | 0 |
| 1 | 1000201771.jpg | 3 |
| 2 | 100042118.jpg | 1 |
| 3 | 1000723321.jpg | 1 |
| 4 | 1000812911.jpg | 3 |
| 5 | 1000837476.jpg | 3 |
| 6 | 1000910826.jpg | 2 |
| 7 | 1001320321.jpg | 0 |
| 8 | 1001723730.jpg | 4 |
| 9 | 1001742395.jpg | 3 |
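
A natural next step in EDA is to count how many images fall into each label. The sketch below does this on a hand-entered copy of the ten rows displayed above, so it runs without the dataset. The name `sample` is hypothetical, and these ten rows say nothing reliable about the class balance of the full training set.

```python
import pandas as pd

# Hand-entered copy of the ten rows shown above, for illustration only.
sample = pd.DataFrame({
    "image_id": [
        "1000015157.jpg", "1000201771.jpg", "100042118.jpg",
        "1000723321.jpg", "1000812911.jpg", "1000837476.jpg",
        "1000910826.jpg", "1001320321.jpg", "1001723730.jpg",
        "1001742395.jpg",
    ],
    "label": [0, 3, 1, 1, 3, 3, 2, 0, 4, 3],
})

# Count images per label and order the result by label id.
label_counts = sample["label"].value_counts().sort_index()
print(label_counts)
```

Applying the same `value_counts()` call to `df["label"]` would give the per-class counts for the whole training set.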
