## General description

In this competition we develop solutions for identifying common diseases of cassava plants. There are 4 different diseases and, of course, a plant could be healthy. As a result, we have a classification problem with 5 classes.

In this notebook I'll explore the data, train a model using PyTorch Lightning and analyse the predictions.

![](http://www.naro.go.ug/files/images/crops-naro.jpg)

#### import libraries

In [None]:
package_paths = [
                 '../input/efficientnet-pytorrch-offline',
                 '../input/mlflow/mlflow-master'
                ]
import sys
for package_path in package_paths:
    sys.path.append(package_path)

In [None]:
#!pip install mlflow
#!pip install efficientnet_pytorch
#!pip install --quiet /kaggle/input/EfficientNet-PyTorch/EfficientNet-PyTorch-master
#!pip install --quiet /kaggle/input/mlflow/mlflow-master

In [None]:
from PIL import Image
from albumentations.core.composition import Compose
from albumentations.pytorch import ToTensorV2
from collections import defaultdict, deque
from efficientnet_pytorch import EfficientNet
from pytorch_lightning import Callback
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from torch import nn
from torch.autograd import Variable
from torch.utils.data import Dataset, TensorDataset, DataLoader
from torchvision.transforms import functional as F
from typing import Any, Dict, List, Union, Optional
import albumentations as A
import ast
import collections
import copy
import cv2
import datetime
import importlib
import json
import math
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import pickle
# import pretrainedmodels
import pytorch_lightning as pl
%matplotlib inline
import random
import seaborn as sns
import shutil
import tempfile
import time
import torch
import torch.distributed as dist
import torch.nn.functional as F
import torch.utils.data
import torchvision
from tqdm.notebook import tqdm
sns.set_style('darkgrid')

In [None]:
#from pytorch_lightning.loggers.mlflow import MLFlowLogger
#import mlflow
#mlflow.__version__

#### helper functions

In [None]:
def visualize(images, transform):
    """
    Plot images and their transformations
    """
    fig = plt.figure(figsize=(32, 16))
    
    for i, im in enumerate(images):
        ax = fig.add_subplot(2, 5, i + 1, xticks=[], yticks=[])
        plt.imshow(im)
        
    for i, im in enumerate(images):
        ax = fig.add_subplot(2, 5, i + 6, xticks=[], yticks=[])
        plt.imshow(transform(image=im)['image'])

def set_seed(seed: int = 42) -> None:
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
set_seed()

## Data Exploration

In [None]:
path = "../input/cassava-leaf-disease-classification/"

In [None]:
train = pd.read_csv(f'{path}train.csv')

In [None]:
with open(f'{path}/label_num_to_disease_map.json', 'r') as f:
    name_mapping = json.load(f)
    
name_mapping = {int(k): v for k, v in name_mapping.items()}
name_mapping

As per the competition description, there are 4 diseases and one class for healthy plants.

In [None]:
train.head()

In [None]:
train.shape

In [None]:
sns.countplot(y=train['label'].map(name_mapping), orient='v')
plt.title('Target distribution');

Interestingly, the most common class is one of the diseases, not healthy plants.
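
To put a number on the imbalance, one option is to compute each class's share of the training set. This is a minimal sketch, assuming integer labels like those in `train['label']`; the `toy_labels` array is made up for illustration and is not the competition data, and `class_shares` is a hypothetical helper.

```python
import numpy as np

# Hypothetical labels for illustration only (not the real training data).
toy_labels = np.array([3, 3, 3, 3, 3, 3, 1, 4, 2, 0])

def class_shares(labels: np.ndarray, n_classes: int = 5) -> np.ndarray:
    """Return the fraction of samples that fall into each class."""
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

shares = class_shares(toy_labels)
for class_id, share in enumerate(shares):
    print(f"class {class_id}: {share:.0%}")
```

On the real data the same function could be called as `class_shares(train['label'].values)`.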

### Let's have a look at cassava

First, let's have a look at images belonging to different classes.

In [None]:
selected_images = []
fig = plt.figure(figsize=(16, 16))
for class_id, class_name in name_mapping.items():
    for i, (idx, row) in enumerate(train.loc[train['label'] == class_id].sample(4).iterrows()):
        ax = fig.add_subplot(5, 4, class_id * 4 + i + 1, xticks=[], yticks=[])
        img = cv2.imread(f"{path}train_images/{row['image_id']}")
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        plt.imshow(img)
        ax.set_title(f"Image: {row['image_id']}. Label: {row['label']}")
        if i == 0:
            selected_images.append(img)

As far as I can see, one of the common symptoms of disease is a change in color, usually yellowing with different patterns. Since color carries diagnostic information, we will need to be careful with color-changing augmentations.

By the way, let's see how different augmentations change the images.

In the first row there are original images, in the second row there are augmented images. I selected one random image from each class.

In [None]:
#visualize(selected_images, A.HorizontalFlip(p=1))

In [None]:
#visualize(selected_images, A.ShiftScaleRotate(p=1))

In [None]:
#visualize(selected_images, A.Cutout(max_h_size=64, max_w_size=64, p=1))

## Bi-Tempered Logistic Loss function
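
As a reading aid for the code in this section, here are the tempered logarithm and exponential as I read them from the implementation below (for $t \neq 1$; at $t = 1$ they reduce to the ordinary $\log$ and $\exp$), with $[\cdot]_+ = \max(\cdot, 0)$:

$$
\log_t(u) = \frac{u^{1-t} - 1}{1 - t}, \qquad \exp_t(u) = \left[1 + (1 - t)\,u\right]_+^{1/(1-t)}
$$

With one-hot (or smoothed) labels $y$ and tempered-softmax probabilities $\hat{y} = \exp_{t_2}\!\left(a - \lambda_{t_2}(a)\right)$, where $a$ are the activations and $\lambda_{t_2}(a)$ is the normalization value, the per-example loss computed by `bi_tempered_logistic_loss` is

$$
L = \sum_{i} \left[ y_i \left( \log_{t_1} y_i - \log_{t_1} \hat{y}_i \right) - \frac{1}{2 - t_1} \left( y_i^{2 - t_1} - \hat{y}_i^{2 - t_1} \right) \right]
$$

The code additionally adds a small constant ($10^{-10}$) to $y_i$ inside the first logarithm for numerical stability.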

In [None]:
def log_t(u, t):
    """Compute log_t for `u`."""

    if t == 1.0:
        return torch.log(u)
    else:
        return (u ** (1.0 - t) - 1.0) / (1.0 - t)


def exp_t(u, t):
    """Compute exp_t for `u`."""

    if t == 1.0:
        return torch.exp(u)
    else:
        return torch.relu(1.0 + (1.0 - t) * u) ** (1.0 / (1.0 - t))


def compute_normalization_fixed_point(activations, t, num_iters=5):
    """Returns the normalization value for each example (t > 1.0).
    Args:
    activations: A multi-dimensional tensor with last dimension `num_classes`.
    t: Temperature 2 (> 1.0 for tail heaviness).
    num_iters: Number of iterations to run the method.
    Return: A tensor of same rank as activation with the last dimension being 1.
    """

    mu = torch.max(activations, dim=-1).values.view(-1, 1)
    normalized_activations_step_0 = activations - mu

    normalized_activations = normalized_activations_step_0
    i = 0
    while i < num_iters:
        i += 1
        logt_partition = torch.sum(exp_t(normalized_activations, t), dim=-1).view(-1, 1)
        normalized_activations = normalized_activations_step_0 * (logt_partition ** (1.0 - t))

    logt_partition = torch.sum(exp_t(normalized_activations, t), dim=-1).view(-1, 1)

    return -log_t(1.0 / logt_partition, t) + mu


def compute_normalization(activations, t, num_iters=5):
    """Returns the normalization value for each example.
    Args:
    activations: A multi-dimensional tensor with last dimension `num_classes`.
    t: Temperature 2 (< 1.0 for finite support, > 1.0 for tail heaviness).
    num_iters: Number of iterations to run the method.
    Return: A tensor of same rank as activation with the last dimension being 1.
    """

    if t < 1.0:
        # Not implemented: these values do not occur in the authors' experiments.
        raise NotImplementedError('compute_normalization only supports t >= 1.0')
    else:
        return compute_normalization_fixed_point(activations, t, num_iters)


def tempered_softmax(activations, t, num_iters=5):
    """Tempered softmax function.
    Args:
    activations: A multi-dimensional tensor with last dimension `num_classes`.
    t: Temperature tensor > 0.0.
    num_iters: Number of iterations to run the method.
    Returns:
    A probabilities tensor.
    """

    if t == 1.0:
        normalization_constants = torch.log(torch.sum(torch.exp(activations), dim=-1))
        activations = activations.transpose(0, 1)
    else:
        normalization_constants = compute_normalization(activations, t, num_iters)
        normalization_constants = normalization_constants.transpose(0, 1)
        activations = activations.transpose(0, 1)


    return exp_t(activations - normalization_constants, t)


def bi_tempered_logistic_loss(activations, labels, t1, t2, label_smoothing=0.0, num_iters=5):

    """Bi-Tempered Logistic Loss with custom gradient.
    Args:
    activations: A multi-dimensional tensor with last dimension `num_classes`.
    labels: A tensor with shape and dtype as activations.
    t1: Temperature 1 (< 1.0 for boundedness).
    t2: Temperature 2 (> 1.0 for tail heaviness, < 1.0 for finite support).
    label_smoothing: Label smoothing parameter between [0, 1).
    num_iters: Number of iterations to run the method.
    Returns:
    A loss tensor.
    """

    if label_smoothing > 0.0:
        num_classes = labels.shape[-1]
        labels = (1 - num_classes / (num_classes - 1) * label_smoothing) * labels + label_smoothing / (num_classes - 1)

    probabilities = tempered_softmax(activations, t2, num_iters)
    probabilities = probabilities.transpose(0, 1)

    temp1 = (log_t(labels + 1e-10, t1) - log_t(probabilities, t1)) * labels
    temp2 = (1 / (2 - t1)) * (torch.pow(labels, 2 - t1) - torch.pow(probabilities, 2 - t1))
    loss_values = temp1 - temp2

    return torch.sum(loss_values, dim=-1)
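
To make the two temperatures a little more concrete, here is a scalar re-implementation of `log_t` and `exp_t` in plain Python, together with the label-smoothing formula from `bi_tempered_logistic_loss`. This is my own sketch using `math` instead of `torch`, not part of the original loss code, and the name `smooth_labels` is hypothetical. It illustrates that `exp_t` inverts `log_t`, and that for `t < 1` the tempered logarithm is bounded below by `-1 / (1 - t)`, which is what keeps the loss bounded.

```python
import math

def log_t(u: float, t: float) -> float:
    """Scalar tempered logarithm; equals math.log(u) when t == 1."""
    if t == 1.0:
        return math.log(u)
    return (u ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(u: float, t: float) -> float:
    """Scalar tempered exponential; equals math.exp(u) when t == 1."""
    if t == 1.0:
        return math.exp(u)
    base = max(0.0, 1.0 + (1.0 - t) * u)
    exponent = 1.0 / (1.0 - t)
    if base == 0.0 and exponent < 0:
        # Plain Python raises on 0.0 ** negative; return infinity instead.
        return math.inf
    return base ** exponent

def smooth_labels(one_hot, label_smoothing):
    """Apply the label-smoothing formula used in the loss to a single row."""
    n = len(one_hot)
    scale = 1 - n / (n - 1) * label_smoothing
    return [scale * y + label_smoothing / (n - 1) for y in one_hot]

# exp_t undoes log_t for the temperatures used in this notebook (0.4 and 2.0)
for t in (0.4, 1.0, 2.0):
    print(t, exp_t(log_t(0.25, t), t))

# For t < 1 the tempered log stays above -1 / (1 - t), even for tiny inputs
print(log_t(1e-12, 0.4), -1 / (1 - 0.4))

# With 5 classes, the smoothed row still sums to 1
smoothed = smooth_labels([0, 0, 1, 0, 0], 0.1)
print(smoothed)
```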

## Preparing classes for pytorch-lightning

Training neural nets in PyTorch Lightning requires writing several classes. Some of them are plain PyTorch classes (the dataset and the network), others come from PyTorch Lightning (`pl`): the datamodule and the training module.

### Dataset class

In [None]:
class ImageClassificationDataset(Dataset):
    def __init__(
        self,
        image_names: List,
        transforms: Compose,
        labels: Optional[List[int]],
        img_path: str = '',
        mode: str = 'train',
        labels_to_ohe: bool = False,
        n_classes: int = 5,
    ):
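        """
        Image classification dataset.

        Args:
            image_names: image file names, relative to img_path
            transforms: albumentations
            labels: integer class labels, or None if there are none
            img_path: path to images
            mode: train/val/test
            labels_to_ohe: whether to one-hot encode the labels
            n_classes: number of classes (used for one-hot encoding)
        """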

        self.mode = mode
        self.transforms = transforms
        self.img_path = img_path
        self.image_names = image_names
        if labels is not None:
            if not labels_to_ohe:
                self.labels = np.array(labels)
            else:
                self.labels = np.zeros((len(labels), n_classes))
                self.labels[np.arange(len(labels)), np.array(labels)] = 1

    def __getitem__(self, idx: int) -> Dict[str, np.array]:
        image_path = self.img_path + self.image_names[idx]
        image = cv2.imread(f'{image_path}', cv2.IMREAD_COLOR)
        # cv2.imread returns None instead of raising, so check before converting
        if image is None:
            raise FileNotFoundError(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        target = self.labels[idx]

        img = self.transforms(image=image)['image']
        sample = {'image_path': image_path, 'image': img, 'target': np.array(target).astype('int64')}

        return sample

    def __len__(self) -> int:
        return len(self.image_names)
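
The `labels_to_ohe=True` branch builds one-hot rows with NumPy fancy indexing. A small standalone sketch of that step, using made-up labels, shows what the resulting array looks like and that `argmax` recovers the original class ids.

```python
import numpy as np

# Hypothetical labels for illustration only.
labels = [3, 0, 4, 3]
n_classes = 5

# Same construction as in ImageClassificationDataset.__init__:
# start from zeros, then set column `label` to 1 in each row.
one_hot = np.zeros((len(labels), n_classes))
one_hot[np.arange(len(labels)), np.array(labels)] = 1

print(one_hot)
print(one_hot.argmax(axis=1))  # recovers the integer labels
```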

### Augmentations

For now I chose some augmentations at random.

In [None]:
train_augs = A.Compose([
        A.RandomResizedCrop(512, 512),
        A.Transpose(p=0.5),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.ShiftScaleRotate(p=0.5),
        A.HueSaturationValue(hue_shift_limit=0.2, sat_shift_limit=0.2, val_shift_limit=0.2, p=0.5),
        A.RandomBrightnessContrast(brightness_limit=(-0.1,0.1), contrast_limit=(-0.1, 0.1), p=0.5),
        A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], max_pixel_value=255.0, p=1.0),
        A.CoarseDropout(p=0.5),
        A.Cutout(p=0.5),
        ToTensorV2(p=1.0),
    ], p=1.)
valid_augs = A.Compose([
        A.CenterCrop(512, 512, p=1.),
        A.Resize(512, 512),
        A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], max_pixel_value=255.0, p=1.0),
        ToTensorV2(p=1.0),
    ], p=1.)
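
Both pipelines end with `A.Normalize` using the ImageNet mean and standard deviation. As far as I understand it, this step scales pixels by `max_pixel_value` and then standardizes each channel. The NumPy sketch below reproduces that arithmetic by hand; `normalize_like_albumentations` is a hypothetical helper written for illustration, not a library function, and the formula in its docstring is my assumption about what the transform computes.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_like_albumentations(image, max_pixel_value=255.0):
    """Assumed formula: (image / max_pixel_value - mean) / std, per channel."""
    image = image.astype(np.float32)
    return (image / max_pixel_value - MEAN) / STD

# A tiny 1x2 RGB "image": one black pixel and one white pixel.
toy_image = np.array([[[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
normalized = normalize_like_albumentations(toy_image)
print(normalized)
```

After this step black pixels map to negative values and white pixels to positive ones, which is the input range ImageNet-pretrained networks were trained on.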

### PL datamodule

This class prepares the data: it creates the datasets and defines the dataloaders. Note that `setup` splits the data into training and validation sets, using the first of five stratified folds.

In [None]:
class CassavaDataModule(pl.LightningDataModule):
    def __init__(self,
                 df,
                 train_augs,
                 valid_augs,
                 path):
        super().__init__()
        self.df = df
        self.train_augs = train_augs
        self.valid_augs = valid_augs
        self.path = path

    def prepare_data(self):
        pass

    def setup(self, stage=None):
        
        folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
        
        train_indexes, valid_indexes = list(folds.split(self.df, self.df['label']))[0]
        
        train_df = self.df.iloc[train_indexes]
        valid_df = self.df.iloc[valid_indexes]

        
        self.train_dataset = ImageClassificationDataset(image_names=train_df['image_id'].values,
                                                        transforms=self.train_augs,
                                                        labels=train_df['label'].values,
                                                        img_path=self.path,
                                                        mode='train',
                                                        labels_to_ohe=False,
                                                        n_classes=5)
        self.valid_dataset = ImageClassificationDataset(image_names=valid_df['image_id'].values,
                                                        transforms=self.valid_augs,
                                                        labels=valid_df['label'].values,
                                                        img_path=self.path,
                                                        mode='valid',
                                                        labels_to_ohe=False,
                                                        n_classes=5)

    def train_dataloader(self):
        train_loader = torch.utils.data.DataLoader(
            self.train_dataset,
            batch_size=16,
            num_workers=4,
            shuffle=True,
        )
        return train_loader

    def val_dataloader(self):
        valid_loader = torch.utils.data.DataLoader(
            self.valid_dataset,
            batch_size=32,
            num_workers=4,
            shuffle=False,
        )

        return valid_loader

    def test_dataloader(self):
        return None
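
`setup` relies on `StratifiedKFold` to keep class proportions similar in the training and validation parts. The sketch below shows this on a made-up, imbalanced label vector (50 samples of class 0 and 10 of class 1); only the first fold is used, as in the datamodule.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels for illustration only.
toy_labels = np.array([0] * 50 + [1] * 10)
toy_features = np.zeros((len(toy_labels), 1))  # placeholder; only the labels drive the split

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Take the first of the five (train, validation) index pairs.
train_idx, valid_idx = next(folds.split(toy_features, toy_labels))

train_counts = np.bincount(toy_labels[train_idx], minlength=2)
valid_counts = np.bincount(toy_labels[valid_idx], minlength=2)
print("train class counts:", train_counts)
print("valid class counts:", valid_counts)
```

Both parts keep the 5:1 ratio of the full label vector, which a plain random split would not guarantee for rare classes.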

### Defining the model

In [None]:
class CassvaImgClassifier(nn.Module):
    def __init__(self, model_arch, n_class):
        super().__init__()
        self.model = EfficientNet.from_pretrained(model_arch)
        n_features = self.model._fc.in_features
        self.model._fc = nn.Linear(n_features, n_class)
    
    def forward(self, x, targets):
        logits = self.model(x)
        batch_size = targets.size()[0]
        targets = nn.functional.one_hot(targets, num_classes=5)
        loss = bi_tempered_logistic_loss(activations=logits, labels=targets, t1=0.4, t2=2.0)
        loss = loss.sum() / batch_size
        return logits, loss

### Main pl training class

In this class we define optimizers, schedulers and training itself.

In [None]:
class LitCassava(pl.LightningModule):
    def __init__(self, model):
        super(LitCassava, self).__init__()
        self.model = model
        self.metric = pl.metrics.Accuracy()
        self.learning_rate = 1e-4

    def forward(self, x, targets, *args, **kwargs):
        return self.model(x, targets)

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.model.parameters(), lr=self.learning_rate, weight_decay=0.001)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2)

        return (
            [optimizer],
            [{'scheduler': scheduler, 'interval': 'epoch', 'monitor': 'val_loss'}],
        )

    def training_step(
        self, batch: torch.Tensor, batch_idx: int
    ) -> Union[int, Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]]:
        image = batch['image']
        target = batch['target']
        logits, loss = self(image, target)
        score = self.metric(logits.argmax(1), target)
        self.log('train_loss', loss, on_step=False, on_epoch=True, logger=True)
        logs = {'train_loss': loss, f'train_accuracy': score}
        return {
            'loss': loss,
            'log': logs,
            'progress_bar': logs,
            'logits': logits,
            'target': target,
            f'train_accuracy': score,
        }

    def training_epoch_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        y_true = torch.cat([x['target'] for x in outputs])
        y_pred = torch.cat([x['logits'] for x in outputs])
        score = self.metric(y_pred.argmax(1), y_true)
        self.log('train_score', score, logger=True)
        logs = {'train_loss': avg_loss, 'train_accuracy': score}
        return {'log': logs, 'progress_bar': logs}

    def validation_step(
        self, batch: torch.Tensor, batch_idx: int
    ) -> Union[int, Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]]:
        image = batch['image']
        target = batch['target']
        logits, loss = self(image, target)
        score = self.metric(logits.argmax(1), target)
        self.log('val_loss', loss, on_step=False, on_epoch=True, logger=True)
        logs = {'valid_loss': loss, f'valid_accuracy': score}

        return {
            'loss': loss,
            'log': logs,
            'progress_bar': logs,
            'logits': logits,
            'target': target,
            f'valid_accuracy': score,
        }

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
        y_true = torch.cat([x['target'] for x in outputs])
        y_pred = torch.cat([x['logits'] for x in outputs])
        score = self.metric(y_pred.argmax(1), y_true)
        self.log('val_score', score, logger=True)
        logs = {'valid_loss': avg_loss, f'valid_accuracy': score, 'accuracy': score}
        return {'valid_loss': avg_loss, 'log': logs, 'progress_bar': logs}
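
`configure_optimizers` pairs `AdamW` with `ReduceLROnPlateau`, which cuts the learning rate once the monitored value stops improving. The standalone sketch below uses a dummy one-layer model and a made-up, constant validation loss to show when the reduction kicks in with `factor=0.1, patience=2`. It calls the scheduler directly instead of going through Lightning, so it only illustrates the scheduler itself.

```python
import torch
from torch import nn

dummy_model = nn.Linear(4, 5)  # stand-in for the real network
optimizer = torch.optim.AdamW(dummy_model.parameters(), lr=1e-4, weight_decay=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=2)

lr_history = []
# Hypothetical validation losses: no improvement after the first epoch.
for fake_val_loss in [1.0, 1.0, 1.0, 1.0]:
    optimizer.step()  # would normally follow a backward pass
    scheduler.step(fake_val_loss)
    lr_history.append(optimizer.param_groups[0]['lr'])

print(lr_history)
```

The rate stays at its initial value while the number of stagnant epochs is within `patience`, and is multiplied by `factor` once that is exceeded.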

## Training the model

In [None]:
#mlflow_logger = MLFlowLogger(experiment_name='cassava-hatanor',
#                             tracking_uri="http://54.238.161.51:5000",
#                             tags={"machine":"kaggle-notebook",
#                                   "argu":"v2",
#                                   "pretrain":"yes",
#                                   "arch":"efficientnet-b3"
#                                 })
#tracking_uri = mlflow.get_tracking_uri()
#print("Current tracking uri: {}".format(tracking_uri))

In [None]:
model = CassvaImgClassifier(model_arch="efficientnet-b3", n_class=5)

In [None]:
dm = CassavaDataModule(train, train_augs, valid_augs, f'{path}train_images/')

In [None]:
trainer = pl.Trainer(
        checkpoint_callback=ModelCheckpoint(monitor='val_loss',
                                            save_top_k=1, filepath='{epoch}_{val_loss:.4f}_{val_score:.4f}', mode='min'),
        gpus=1,
        max_epochs=50,
        num_sanity_val_steps=0,
        weights_summary='top',
        callbacks = [EarlyStopping(monitor='val_loss', patience=10, mode='min')],
        #logger=mlflow_logger,
        fast_dev_run=False,
)

In [None]:
lit_model = LitCassava(model)

In [None]:
trainer.fit(lit_model, dm)

In [None]:
# ---------------------
# Log best scores
# ---------------------
#val_losses = mlflow_logger.experiment.get_metric_history(run_id=mlflow_logger.run_id, key="val_loss")
#best_val_loss = np.min([v.value for v in val_losses])
#best_epoch = np.argmin([v.value for v in val_losses])
#best_val_loss = val_losses[best_epoch].value
#mlflow_logger.experiment.log_metric(key="best_val_loss", value=best_val_loss, run_id=mlflow_logger.run_id)

## Analyzing the predictions

First of all, let's make predictions on validation data and collect them.

In [None]:
lit_model.model.eval();

In [None]:
image_paths = []
targets = []
predictions = []
for i, batch in tqdm(enumerate(dm.val_dataloader())):
    with torch.no_grad():
        targets.append(batch['target'].detach().cpu().numpy())
        image_paths.append(batch['image_path'])
        pred = lit_model.model(batch['image'], batch['target'])[0]
        predictions.append(pred.detach().cpu().numpy())

In [None]:
preds_df = pd.DataFrame({'target': np.concatenate(targets),
              'prediction': np.concatenate(predictions).argmax(1),
              'logits': np.concatenate(predictions).max(1),
              'image_paths': [i for j in image_paths for i in j]})
preds_df.head()

In [None]:
sns.countplot(y=preds_df['prediction'].map(name_mapping), orient='v')
plt.title('Prediction distribution');

In [None]:
print(metrics.classification_report(preds_df['target'], preds_df['prediction']))

In [None]:
metrics.confusion_matrix(preds_df['target'], preds_df['prediction'])

In [None]:
selected_images = []
fig = plt.figure(figsize=(16, 16))
c = 1
for class_id1, class_name1 in name_mapping.items():
    for class_id2, class_name2 in name_mapping.items():
        if class_id1 != class_id2:
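            # Misclassified images for this (true, predicted) pair, most confident first
            errors = preds_df.loc[(preds_df['target'] == class_id1)
                                  & (preds_df['prediction'] == class_id2)].sort_values('logits', ascending=False)
            if errors.empty:
                # No such mistake on the validation fold: leave this subplot slot blank
                c += 1
                continue
            img_path = errors['image_paths'].values[0]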

            ax = fig.add_subplot(5, 4, c, xticks=[], yticks=[])
            img = cv2.imread(img_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            plt.imshow(img)
            ax.set_title(f"Correct class: {class_id1}. Predicted class: {class_id2}")
            c += 1

We can see that the distribution of predictions is similar to the distribution of the original classes.

Rare classes have more errors in predictions; we will need to find a way to tackle that.
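
One way to quantify which classes suffer most is per-class recall, read off the confusion matrix (with scikit-learn's convention that rows are true classes and columns are predictions). The matrix below is invented for illustration and is not a result from this notebook; `per_class_recall` is a hypothetical helper.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class.
toy_cm = np.array([
    [8, 2, 0],
    [1, 18, 1],
    [0, 5, 95],
])

def per_class_recall(cm: np.ndarray) -> np.ndarray:
    """Fraction of each true class that was predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)

recall = per_class_recall(toy_cm)
print(recall)
```

With the real predictions, the output of `metrics.confusion_matrix(preds_df['target'], preds_df['prediction'])` could be passed in directly.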

## Prediction

In [None]:
sub = pd.read_csv(f'{path}/sample_submission.csv')
sub.head()

In [None]:
test_dataset = ImageClassificationDataset(image_names=sub['image_id'].values,
                                                        transforms=valid_augs,
                                                        labels=sub['label'].values,
                                                        img_path=f'{path}test_images/',
                                                        mode='test',
                                                        labels_to_ohe=False,
                                                        n_classes=5)

test_loader = torch.utils.data.DataLoader(
            test_dataset,
            batch_size=4,
            num_workers=4,
            shuffle=False,
        )



In [None]:
lit_model.model.cuda()

In [None]:
predictions = []

for batch in test_loader:

    image = batch['image'].to('cuda')
    target = batch['target'].to('cuda')
    with torch.no_grad():
        outputs = lit_model.model(image, target)[0]
        preds = outputs.argmax(1).detach().cpu().numpy()

        predictions.append(preds)

In [None]:
sub['label'] = np.concatenate(predictions)

In [None]:
sub

In [None]:
sub.to_csv('submission.csv', index=False)