<h1>Cassava leaf identification - Visually inspecting images</h1>
    
In this notebook I'll:

<ul>
    <li>Inspect images of all the categories (different diseases and healthy plants) to get a sense of how difficult it is even for a human to visually classify an image</li>
    <li>Try different data augmentations and their parameters to find realistic ones that help improve the model</li>
</ul>

In [None]:
# Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os
import cv2

import albumentations as A

In [None]:
# Definitions

COMPETITION_DIR = '../input/cassava-leaf-disease-classification'
TRAIN_DIR = '../input/cassava-leaf-disease-classification/train_images'

<h2>Read data</h2>

In [None]:
# Train data

train = pd.read_csv(os.path.join(COMPETITION_DIR, 'train.csv'))
train.head()

In [None]:
# Create a series to map labels to labels' descriptions

disease_labels = pd.read_json(os.path.join(COMPETITION_DIR, 'label_num_to_disease_map.json'), typ='series')
disease_labels

<h2>Basic numbers</h2>

In [None]:
# How many images do we have for every disease?

frequency = train.groupby('label', as_index=False).agg(count=('image_id', 'count'))
frequency['disease'] = frequency.label.map(disease_labels)
frequency['fraction'] = frequency['count'] / frequency['count'].sum()
frequency = frequency[['label', 'disease', 'count', 'fraction']]
frequency

The train set is imbalanced. More than 60% of the images correspond to <b>Cassava Mosaic Disease (CMD)</b>. The least frequent disease is <b>Cassava Bacterial Blight (CBB)</b>, with roughly 5% of the images. The other two diseases, CBSD and CGM, have similar shares of around 10% each. Healthy images represent 12% of the total.
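
As a quick cross-check on the table above, the same fractions can be obtained with `value_counts(normalize=True)`. The sketch below uses a small made-up label column (`toy`, which is not the competition data) purely to illustrate the call. It also reports the ratio between the most and least frequent classes as a one-number summary of the imbalance.

```python
import pandas as pd

# Hypothetical toy labels standing in for train['label']; not real competition data.
toy = pd.DataFrame({'label': [3, 3, 3, 3, 3, 3, 0, 1, 2, 4]})

# Fraction of rows per class, sorted by label for readability.
fractions = toy['label'].value_counts(normalize=True).sort_index()

# Ratio between the most and least frequent class.
imbalance_ratio = fractions.max() / fractions.min()

print(fractions)
print(f'imbalance ratio: {imbalance_ratio:.1f}')
```

On the real data, `toy['label']` would simply be replaced by `train['label']`.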

<h2>Show images</h2>

In [None]:
# Utility functions

def show_single_image(idx):
    
    img = plt.imread(os.path.join(TRAIN_DIR, train.loc[idx,'image_id']))
    label = train.loc[idx, 'label']
    plt.figure(figsize=(10,8))
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    plt.title(f'{idx}: {disease_labels[label]}')
    plt.tight_layout()
    
    
def show_random_images(disease=None, nrows=3, ncols=5):   
    
    # Sample from a single disease, or from all images when disease is None
    if disease is None:
        population = train.index.values
        suptitle = 'All categories'
    else:
        population = train.loc[train.label == disease].index.values
        suptitle = disease_labels[disease]
    
    indices = np.random.choice(population, nrows * ncols)
    
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(5*ncols, 4*nrows), constrained_layout=True)
    ax = ax.reshape(-1)
                           
    fig.suptitle(suptitle)
    
    # Iterate and plot random images, each titled with its row index in train
    for i in range(nrows*ncols):
        img = plt.imread(os.path.join(TRAIN_DIR, train.loc[indices[i],'image_id']))
        ax[i].imshow(img, cmap='gray')
        ax[i].set_title(f'{indices[i]}')
        ax[i].axis('off')
    
    plt.show()
    

def show_images(images, labels, nrows, ncols, suptitle=''):   
    
    assert len(images) == nrows * ncols
    assert len(labels) == nrows * ncols
    
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(5*ncols, 4*nrows), constrained_layout=True)
    ax = ax.reshape(-1)
                           
    fig.suptitle(suptitle)
    
    # Iterate and plot the given images with their labels
    for i in range(nrows*ncols):
        ax[i].imshow(images[i], cmap='gray')
        ax[i].set_title(f'{labels[i]}')
        ax[i].axis('off')
    
    plt.show()

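def read_train_image(idx):
    # Read a train image by its row index and return it as an RGB array
    file_path = os.path.join(TRAIN_DIR, train.loc[idx, 'image_id'])
    image = cv2.imread(file_path)
    # cv2.imread returns None instead of raising when the file cannot be read
    if image is None:
        raise FileNotFoundError(file_path)
    # OpenCV loads images as BGR; convert to RGB for matplotlib and albumentations
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return image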
    

<h2>Disease inspection</h2>

Let's show some images to check how difficult it is to diagnose the different diseases by visual inspection.

In [None]:
# Seed for reproducibility

np.random.seed(42)
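
The seed matters because `show_random_images` draws its indices with `np.random.choice`. Here is a minimal sketch of the effect, using a made-up `population` array instead of the real index: re-seeding with the same value reproduces the same draw.

```python
import numpy as np

# Hypothetical stand-in for train.index.values.
population = np.arange(1000)

np.random.seed(42)
first_draw = np.random.choice(population, 12)

np.random.seed(42)
second_draw = np.random.choice(population, 12)

# Same seed and same sequence of calls give identical indices.
print(np.array_equal(first_draw, second_draw))
```

Note that the seed only fixes the sequence of draws. Re-running a single plotting cell without re-running the seed cell continues the sequence, so it shows different images.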

<h3>Cassava Bacterial Blight</h3>

At first, angular, water-soaked spots occur on the leaves which are restricted by the veins; the spots are more clearly seen on the lower leaf surface. The spots expand rapidly, join together, especially along the margins of the leaves, and turn brown with yellow borders. Droplets of a creamy-white ooze occur at the centre of the spots; later, they turn yellow. 

In [None]:
# Cassava Bacterial Blight (CBB)
show_random_images(disease=0, nrows=3, ncols=4)

<h3>Cassava Brown Streak Disease</h3>

Symptoms of cassava brown streak disease appear as patches of yellow areas mixed with normal green colour. This phenomenon is commonly referred to as chlorosis. It produces characteristic yellow or necrotic vein banding on leaves which may enlarge and join to form comparatively large yellow or necrotic patches. The yellow patches are more prominent on mature (bottom) leaves than younger ones. The infected leaves do not become distorted in shape as occurs with leaves infected by Cassava mosaic disease. Advanced symptoms on the leaves become an irregular yellow blotchy chlorosis that is most pronounced in the periphery (margins or edge) of lower leaves. 

In [None]:
# Cassava Brown Streak Disease (CBSD)
show_random_images(disease=1, nrows=3, ncols=4)

<h3>Cassava Green Mottle</h3>

Young leaves are puckered with faint to distinct yellow spots, green patterns (mosaics), and twisted margins.
    

In [None]:
# Cassava Green Mottle (CGM)
show_random_images(disease=2, nrows=3, ncols=4)

<h3>Cassava Mosaic Disease (CMD)</h3>

Characteristic leaf mosaic patterns affect discrete areas and are determined at an early stage of leaf development. The chlorotic areas fail to expand fully, so the stresses set up by unequal expansion of the lamina cause malformation and distortion. Severely affected leaves are reduced in size, misshapen and twisted, with yellow areas separated by areas of normal green colour. The leaf chlorosis may be pale yellow or nearly white, or just discernibly paler than normal. The chlorotic areas are usually clearly demarcated and vary in size from the whole leaflet to small flecks or spots. Leaflets may show a uniform mosaic pattern, or the pattern may be localised to a few areas, often at the bases of the leaflets.

In [None]:
# Cassava Mosaic Disease (CMD)
show_random_images(disease=3, nrows=3, ncols=4)

In [None]:
# Inspect a single image

show_single_image(17173)

<h2>Albumentations transformations</h2>

Here I play with different Albumentations transformations and their parameters to choose the most appropriate ones for the dataset.

<h2>HorizontalFlip</h2>

In [None]:
img = read_train_image(4573)
transform = A.HorizontalFlip(p=1)
transformed = transform(image=img)['image']
show_images([img, transformed], ['Original', 'Flipped'], nrows=1, ncols=2)
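
For intuition, a horizontal flip just reverses the column (width) axis of the image array. The sketch below shows this on a tiny synthetic array with plain NumPy. It illustrates the operation itself and is not Albumentations' implementation.

```python
import numpy as np

# Tiny synthetic "image": height 2, width 3, 1 channel (values are arbitrary).
img = np.arange(6, dtype=np.uint8).reshape(2, 3, 1)

# Reverse the width axis (axis 1) to mirror the image left-to-right.
flipped = img[:, ::-1, :]

print(img[..., 0])
print(flipped[..., 0])
```

Since a diseased leaf looks equally plausible when mirrored, this is usually a safe augmentation for this kind of data.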

<h2>RandomSizedCrop</h2>

In [None]:
img = read_train_image(1924)
images = [img]
labels = ['Original']
transform = A.RandomSizedCrop(min_max_height=[256, 600], height=512, width=512, p=1)
for i in range(8):
    transformed = transform(image=img)['image']
    images.append(transformed)
    labels.append(str(i + 1))

show_images(images, labels, nrows=3, ncols=3)
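
To make sense of `min_max_height`: as far as I understand it, the transform picks a random crop height in that range, cuts a window of that size at a random position, and then resizes it to `height` x `width`. Smaller windows therefore end up more "zoomed in". Below is a simplified NumPy sketch of the window-sampling step only. The function name `sample_square_crop`, the square window and the 600x800 image size are assumptions for illustration, not the library's code.

```python
import numpy as np

def sample_square_crop(image, min_side, max_side, rng):
    """Pick a random square window; a simplified stand-in for the crop step."""
    height, width = image.shape[:2]
    # The window can never be larger than the image itself.
    max_side = min(max_side, height, width)
    side = int(rng.integers(min_side, max_side + 1))
    top = int(rng.integers(0, height - side + 1))
    left = int(rng.integers(0, width - side + 1))
    return image[top:top + side, left:left + side]

rng = np.random.default_rng(0)
# Synthetic 600x800 RGB image (assumed size, for illustration only).
image = np.zeros((600, 800, 3), dtype=np.uint8)
crop = sample_square_crop(image, 256, 600, rng)
print(crop.shape)
```

The upper bound of the range cannot exceed the image height, which is why the sketch clamps `max_side`.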

<h2>GaussianBlur</h2>

In [None]:
img = read_train_image(17665)
images = [img]
labels = ['Original']
# p=0.5: on average about half of the samples below are left unblurred
transform = A.GaussianBlur(blur_limit=[5,11], p=0.5)
for i in range(8):
    transformed = transform(image=img)['image']
    images.append(transformed)
    labels.append(str(i + 1))

show_images(images, labels, nrows=3, ncols=3)

<h2>Rotate</h2>

In [None]:
img = read_train_image(17665)
images = [img]
labels = ['Original']
transform = A.Rotate(limit=20, p=1)
for i in range(8):
    transformed = transform(image=img)['image']
    images.append(transformed)
    labels.append(str(i + 1))

show_images(images, labels, nrows=3, ncols=3)

<h2>RandomBrightnessContrast</h2>

In [None]:
img = read_train_image(17665)
images = [img]
labels = ['Original']
transform = A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3, p=1)
for i in range(11):
    transformed = transform(image=img)['image']
    images.append(transformed)
    labels.append(str(i + 1))

show_images(images, labels, nrows=3, ncols=4)
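
To read these limits: as far as I understand the transform, for 8-bit images a contrast factor `alpha` is drawn from `1 ± contrast_limit` and a brightness offset `beta` from `± brightness_limit`, and pixels are then mapped roughly as `alpha * pixel + beta * 255` before clipping. Here is a simplified NumPy sketch of that mapping. It is an assumption about the behaviour, not the library's code, and `adjust_brightness_contrast` is a hypothetical helper.

```python
import numpy as np

def adjust_brightness_contrast(image, alpha=1.0, beta=0.0):
    """Apply alpha * pixel + beta * 255 to a uint8 image and clip to [0, 255]."""
    out = image.astype(np.float32) * alpha + beta * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

# Synthetic mid-grey image (illustrative only).
image = np.full((4, 4, 3), 100, dtype=np.uint8)

brighter = adjust_brightness_contrast(image, alpha=1.0, beta=0.3)
more_contrast = adjust_brightness_contrast(image, alpha=1.3, beta=0.0)
print(brighter[0, 0, 0], more_contrast[0, 0, 0])
```

Under this reading, a limit of 0.3 can shift brightness by up to about 30% of the full range, which is a fairly strong change for leaf photos.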

<h2>CenterCrop</h2>


In [None]:
img = read_train_image(16705)
transform = A.CenterCrop(width=512, height=512, p=1)
transformed = transform(image=img)['image']

# CenterCrop is deterministic, so a single result is enough
show_images([img, transformed], ['Original', 'Center crop'], nrows=1, ncols=2)
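
A centre crop has no randomness: it always keeps the same central window of the image. Here is a minimal NumPy sketch of the slicing involved. The helper `center_crop` and the 600x800 image size are assumptions for illustration.

```python
import numpy as np

def center_crop(image, crop_height, crop_width):
    """Return the central crop_height x crop_width window of an image array."""
    height, width = image.shape[:2]
    if crop_height > height or crop_width > width:
        raise ValueError('crop size must not exceed image size')
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return image[top:top + crop_height, left:left + crop_width]

# Synthetic 600x800 RGB image (assumed size, for illustration only).
image = np.zeros((600, 800, 3), dtype=np.uint8)
cropped = center_crop(image, 512, 512)
print(cropped.shape)
```

Because the window is fixed, symptoms near the image borders are always cut away, which is worth keeping in mind when lesions sit at the leaf margins.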

<h2>Several transforms</h2>

In [None]:
img = read_train_image(16705)
images = [img]
labels = ['Original']

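transform = A.Compose([
    A.RandomResizedCrop(512, 512),
    A.Transpose(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.1),
    A.ShiftScaleRotate(p=0.5),
    A.GaussianBlur(blur_limit=[5,11], p=0.5),
    # Note: for uint8 images these limits seem to be in OpenCV HSV units (the defaults
    # are 20, 30, 20), so 0.2 probably has almost no visible effect
    A.HueSaturationValue(hue_shift_limit=0.2, sat_shift_limit=0.2, val_shift_limit=0.2, p=0.5),
    A.RandomBrightnessContrast(brightness_limit=(-0.1,0.1), contrast_limit=(-0.1,0.1), p=0.5),
    A.CoarseDropout(p=0.5),
    # Note: Cutout overlaps with CoarseDropout and is deprecated in later
    # Albumentations releases
    A.Cutout(p=0.5)
], p=1.0)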


for i in range(11):
    transformed = transform(image=img)['image']
    images.append(transformed)
    labels.append('T' + str(i + 1))

show_images(images, labels, nrows=3, ncols=4)
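
The cells above all repeat the same "apply the transform n times and collect the results" loop. One way to factor it out is a small helper like the sketch below. `augment_n_times` is a hypothetical name, and `fake_transform` is only a stand-in that follows the same call convention as the Albumentations transforms (keyword `image` in, dict with key `'image'` out), so the example runs without the library.

```python
def augment_n_times(transform, image, n, prefix=''):
    """Apply `transform` n times and return ([original, *augmented], labels)."""
    images = [image]
    labels = ['Original']
    for i in range(n):
        images.append(transform(image=image)['image'])
        labels.append(f'{prefix}{i + 1}')
    return images, labels


# Stand-in transform with the same call convention as the ones above;
# it just mirrors each row of a nested-list "image".
def fake_transform(image):
    return {'image': [row[::-1] for row in image]}


images, labels = augment_n_times(fake_transform, [[1, 2, 3]], n=3, prefix='T')
print(labels)
```

With a real transform this would be called as `images, labels = augment_n_times(transform, img, 11, prefix='T')`, followed by `show_images(images, labels, nrows=3, ncols=4)`.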