<a href="https://colab.research.google.com/github/pgurazada/explore-dinov2/blob/main/bean_leaf_classification_dinov2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In this exercise, we train a classifier on top of image embeddings extracted with a pretrained [DINOv2](https://arxiv.org/pdf/2304.07193.pdf) model.

# Imports

In [1]:
import os
import cv2
import json
import glob
import torch
import random

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import torchvision.transforms as T

from scipy.stats import reciprocal, uniform

from PIL import Image

from tqdm import tqdm
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

from sklearn.svm import SVC

In [2]:
!pip show torch torchvision

Name: torch
Version: 2.1.0+cu118
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, triton, typing-extensions
Required-by: fastai, torchaudio, torchdata, torchtext, torchvision
---
Name: torchvision
Version: 0.16.0+cu118
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, pillow, requests, torch
Required-by: fastai


# Data

The dataset used in this exercise is the [Ibean Leaf dataset](https://github.com/AI-Lab-Makerere/ibean/).

The dataset contains leaf images taken in the field in different districts in Uganda by the Makerere AI lab in collaboration with the National Crops Resources Research Institute (NaCRRI), the national body in charge of agricultural research in Uganda. Beans are an important legume food crop in Africa, grown by many smallholder farmers, and a significant source of protein for school-going children in East Africa.

There are 3 classes of leaf images in the dataset:
 - Healthy
 - Angular Leaf Spot disease
 - Bean Rust disease

The input data includes example leaf images taken in the field/garden using a basic smartphone. Some sample images are presented below.

![Image of IBean](https://github.com/AI-Lab-Makerere/ibean/raw/master/bean-example-data.png)

The images were then annotated by experts from NaCRRI who determined for each image which disease was manifested. The experts were part of the data collection team and images were annotated directly during the data collection process in the field.

---

Our aim in this exercise is to build a model that distinguishes healthy bean leaves from the two diseased classes, using classification **accuracy** as the evaluation metric.

Image data sets are typically organized into training, validation and testing directories, where each of these directories contains subfolders with images corresponding to each of the classes in the data set.

```
project
│   README.md
└───train
│   └───class1
│       │   tr_image11.jpg
│       │   tr_image12.jpg
│       │   ...
│   └───class2
│       │   tr_image21.jpg
│       │   tr_image22.jpg
│       │   ...
└───validation
│   └───class1
│       │   val_image11.jpg
│       │   val_image12.jpg
│       │   ...
│   └───class2
│       │   val_image21.jpg
│       │   val_image22.jpg
│       │   ...
```
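Before training, it can help to confirm that a split directory really follows this layout and to see how many images each class holds. The following is a minimal sketch using only the standard library; the helper name `count_images_per_class` is an assumption made for illustration and is not part of the original notebook.

```python
import os
from collections import Counter


def count_images_per_class(split_dir: str, extension: str = ".jpg") -> Counter:
    """Count files with the given extension in each class subfolder of split_dir."""
    counts = Counter()
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files such as README.md
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir) if name.lower().endswith(extension)
        )
    return counts
```

Calling it on a split directory such as `/tmp/bean/train` (once the data has been extracted) returns one count per class folder, which makes a missing class or a strongly imbalanced split easy to spot.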

In [3]:
train_data_url = 'https://storage.googleapis.com/ibeans/train.zip'
validation_data_url = 'https://storage.googleapis.com/ibeans/validation.zip'
test_data_url = 'https://storage.googleapis.com/ibeans/test.zip'

In [4]:
with urlopen(train_data_url) as zipped_file:
    with ZipFile(BytesIO(zipped_file.read())) as zfile:
        zfile.extractall('/tmp/bean')

In [5]:
with urlopen(validation_data_url) as zipped_file:
    with ZipFile(BytesIO(zipped_file.read())) as zfile:
        zfile.extractall('/tmp/bean')

In [6]:
with urlopen(test_data_url) as zipped_file:
    with ZipFile(BytesIO(zipped_file.read())) as zfile:
        zfile.extractall('/tmp/bean')
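
The three download cells above repeat the same pattern: read the archive into memory, wrap it in `BytesIO`, and extract it. As a sketch, the pattern could be folded into a pair of helpers; the names `extract_zip_bytes` and `download_and_extract` are assumptions introduced here, not functions from the notebook.

```python
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile


def extract_zip_bytes(data: bytes, dest_dir: str) -> list:
    """Extract an in-memory zip archive into dest_dir and return the member names."""
    with ZipFile(BytesIO(data)) as zfile:
        zfile.extractall(dest_dir)
        return zfile.namelist()


def download_and_extract(url: str, dest_dir: str) -> list:
    """Download a zip archive from url and extract it into dest_dir."""
    with urlopen(url) as response:
        return extract_zip_bytes(response.read(), dest_dir)


# Possible usage with the URLs defined earlier (not executed here):
# for url in (train_data_url, validation_data_url, test_data_url):
#     download_and_extract(url, '/tmp/bean')
```

Splitting the extraction step from the download step keeps the part that touches the network small, so the extraction logic can be checked with an archive built in memory.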

In [7]:
TRAINING_DIR = '/tmp/bean/train'
VALIDATION_DIR = '/tmp/bean/validation'
TEST_DIR = '/tmp/bean/test'

In [8]:
train_labels = {}

for folder in os.listdir(TRAINING_DIR):
    for file in os.listdir(os.path.join(TRAINING_DIR, folder)):
        if file.endswith(".jpg"):
            full_name = os.path.join(TRAINING_DIR, folder, file)
            train_labels[full_name] = folder

In [9]:
val_labels = {}

for folder in os.listdir(VALIDATION_DIR):
    for file in os.listdir(os.path.join(VALIDATION_DIR, folder)):
        if file.endswith(".jpg"):
            full_name = os.path.join(VALIDATION_DIR, folder, file)
            val_labels[full_name] = folder

In [10]:
test_labels = {}

for folder in os.listdir(TEST_DIR):
    for file in os.listdir(os.path.join(TEST_DIR, folder)):
        if file.endswith(".jpg"):
            full_name = os.path.join(TEST_DIR, folder, file)
            test_labels[full_name] = folder

In [11]:
dinov2_vits14 = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")

Downloading: "https://github.com/facebookresearch/dinov2/zipball/main" to /root/.cache/torch/hub/main.zip
Downloading: "https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth" to /root/.cache/torch/hub/checkpoints/dinov2_vits14_pretrain.pth
100%|██████████| 84.2M/84.2M [00:01<00:00, 69.0MB/s]


In [12]:
device = torch.device('cuda' if torch.cuda.is_available() else "cpu")
dinov2_vits14.to(device)

DinoVisionTransformer(
  (patch_embed): PatchEmbed(
    (proj): Conv2d(3, 384, kernel_size=(14, 14), stride=(14, 14))
    (norm): Identity()
  )
  (blocks): ModuleList(
    (0-11): 12 x NestedTensorBlock(
      (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
      (attn): MemEffAttention(
        (qkv): Linear(in_features=384, out_features=1152, bias=True)
        (attn_drop): Dropout(p=0.0, inplace=False)
        (proj): Linear(in_features=384, out_features=384, bias=True)
        (proj_drop): Dropout(p=0.0, inplace=False)
      )
      (ls1): LayerScale()
      (drop_path1): Identity()
      (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
      (mlp): Mlp(
        (fc1): Linear(in_features=384, out_features=1536, bias=True)
        (act): GELU(approximate='none')
        (fc2): Linear(in_features=1536, out_features=384, bias=True)
        (drop): Dropout(p=0.0, inplace=False)
      )
      (ls2): LayerScale()
      (drop_path2): Identity()
    )
  )
  (n

In [13]:
transform_image = T.Compose(
    [
        T.ToTensor(),
        T.Resize(244, antialias=True),
        T.CenterCrop(224),
        T.Normalize([0.5], [0.5])
    ]
)
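
To make the geometry of this transform concrete: `T.Resize(244)` scales the shorter image side to 244 pixels, `T.CenterCrop(224)` keeps the central 224×224 region, and the model's patch embedding (the `Conv2d` with kernel size and stride 14 in the printout above) then splits that crop into patches. The sketch below reproduces the arithmetic in plain Python. The helper names are hypothetical, and the rounding of the longer side is an approximation of what torchvision does, so results may differ by a pixel for some aspect ratios.

```python
def resized_shape(height: int, width: int, size: int = 244) -> tuple:
    """Approximate (height, width) after scaling the shorter side to `size`."""
    if height <= width:
        return size, int(size * width / height)
    return int(size * height / width), size


def patch_grid(crop: int = 224, patch: int = 14) -> tuple:
    """Return (patches per side, total patches) for a square crop."""
    if crop % patch != 0:
        raise ValueError("crop size must be a multiple of the patch size")
    per_side = crop // patch
    return per_side, per_side * per_side


# A hypothetical 500x750 input is resized to 244x366 before cropping.
print(resized_shape(500, 750))
# A 224x224 crop with 14x14 patches gives a 16x16 grid, i.e. 256 patches.
print(patch_grid())
```

The crop size matters because it must be a multiple of the patch size (224 = 16 × 14); the resize value only controls how much of the image border is discarded by the crop.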

In [14]:
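def load_image(path: str) -> torch.Tensor:
    """
    Load an image and return a tensor that can be used as an input to DINOv2.
    """
    img = Image.open(path)

    # Keep only the first three channels (drops an alpha channel, if any)
    # and add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224).
    transformed_img = transform_image(img)[:3].unsqueeze(0)

    return transformed_img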

In [15]:
def compute_embeddings(files) -> dict:
    """
    Compute a DINOv2 embedding for every image path in `files`.

    Returns a dict mapping each file path to its embedding, stored as a
    nested list of shape (1, embedding_dim).
    """
    all_embeddings = {}

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        for file in tqdm(files):
            embeddings = dinov2_vits14(load_image(file).to(device))

            # Move the embedding to the CPU and store it as a plain list.
            all_embeddings[file] = embeddings[0].cpu().numpy().reshape(1, -1).tolist()

    return all_embeddings

In [16]:
train_embeddings = compute_embeddings(train_labels.keys())

100%|██████████| 1034/1034 [00:25<00:00, 40.72it/s]


In [17]:
val_embeddings = compute_embeddings(val_labels.keys())

100%|██████████| 133/133 [00:01<00:00, 67.03it/s]


In [18]:
test_embeddings = compute_embeddings(test_labels.keys())

100%|██████████| 128/128 [00:01<00:00, 67.18it/s]


In [19]:
ytrain = [train_labels[file] for file in train_labels.keys()]
train_embedding_list = list(train_embeddings.values())
Xtrain = np.array(train_embedding_list).reshape(-1, 384)

yval = [val_labels[file] for file in val_labels.keys()]
val_embedding_list = list(val_embeddings.values())
Xval = np.array(val_embedding_list).reshape(-1, 384)

ytest = [test_labels[file] for file in test_labels.keys()]
test_embedding_list = list(test_embeddings.values())
Xtest = np.array(test_embedding_list).reshape(-1, 384)

In [20]:
n_iter = 200

model_performance = []

In [21]:
for _ in tqdm(range(n_iter)):
    C_ = uniform(1, 100).rvs()

    model_svm = SVC(
        gamma='scale',
        C=C_,
        random_state=42
    )

    model_svm.fit(Xtrain, ytrain)

    predictions = model_svm.predict(Xval)
    val_accuracy = (predictions == np.array(yval)).mean()

    model_performance.append([C_, val_accuracy])

100%|██████████| 200/200 [00:16<00:00, 11.85it/s]
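
Two details of the search loop above are worth spelling out. First, `scipy.stats.uniform(loc, scale)` covers the interval `[loc, loc + scale]`, so `uniform(1, 100)` draws `C` from `[1, 101]`. Second, the draws are not seeded, so each run explores different values. The following sketch shows one way to draw a reproducible set of candidates with the standard library, optionally on a log scale (which is what the imported but unused `reciprocal` distribution would provide). The function name, the seed, and the `log_scale` flag are assumptions, not part of the notebook.

```python
import math
import random


def sample_c_values(n_iter, low=1.0, high=101.0, log_scale=False, seed=42):
    """Draw n_iter candidate values for C from [low, high], reproducibly."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    if log_scale:
        # Sample uniformly in log space, then map back to the original scale.
        log_low, log_high = math.log(low), math.log(high)
        return [math.exp(rng.uniform(log_low, log_high)) for _ in range(n_iter)]
    return [rng.uniform(low, high) for _ in range(n_iter)]


candidates = sample_c_values(200)
```

Log-scale sampling spends more of the budget on small values of `C`, which is often useful when the sensible range spans several orders of magnitude; over `[1, 101]` the difference is modest.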


In [22]:
(
    pd.DataFrame(model_performance, columns=['C', 'validation_accuracy'])
      .sort_values(by='validation_accuracy', ascending=False)
      .head(5)
)

|     | C         | validation_accuracy |
|----:|----------:|--------------------:|
| 0   | 82.703849 | 0.977444            |
| 118 | 95.430154 | 0.977444            |
| 121 | 68.610050 | 0.977444            |
| 122 | 85.303672 | 0.977444            |
| 124 | 53.097181 | 0.977444            |
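
All five rows shown above tie at a validation accuracy of 0.977444, which is consistent with 130 of the 133 validation images being classified correctly, so validation accuracy alone does not single out one value of `C`. One possible tie-breaking rule, sketched below, is to prefer the smallest tied `C`, since a smaller `C` means stronger regularization in an SVM. This rule and the helper name are assumptions for illustration; the notebook itself simply fixes `C=50` in the next cell.

```python
def pick_smallest_best_c(results, tol=1e-9):
    """Return (C, accuracy) for the smallest C among the best-scoring runs.

    `results` is a list of [C, accuracy] pairs, like `model_performance`.
    """
    best_accuracy = max(accuracy for _, accuracy in results)
    tied = [c for c, accuracy in results if best_accuracy - accuracy <= tol]
    return min(tied), best_accuracy


# The five rows shown above, copied by hand.
top_rows = [
    [82.703849, 0.977444],
    [95.430154, 0.977444],
    [68.61005, 0.977444],
    [85.303672, 0.977444],
    [53.097181, 0.977444],
]
chosen_c, chosen_accuracy = pick_smallest_best_c(top_rows)
```

Only the top five of the 200 runs are displayed, so other runs may share the same score; applying the helper to the full `model_performance` list would take those into account.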


In [23]:
model_svm = SVC(
    gamma='scale',
    C=50
)

model_svm.fit(Xtrain, ytrain)

In [24]:
test_predictions = model_svm.predict(Xtest)

In [25]:
(test_predictions == np.array(ytest)).mean()

0.9453125
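
A test accuracy of 0.9453125 corresponds to 121 of the 128 test images being classified correctly. A single accuracy figure does not show which classes account for the 7 errors, so a per-class breakdown is a natural next step. The sketch below computes confusion counts and per-class recall with the standard library only; the function names and the toy labels are illustrative assumptions, and the printed values are not results from this notebook.

```python
from collections import Counter


def confusion_counts(y_true, y_pred) -> Counter:
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_recall(y_true, y_pred) -> dict:
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels for illustration only.
toy_true = ["healthy", "healthy", "rust", "rust", "spot", "spot"]
toy_pred = ["healthy", "healthy", "rust", "spot", "spot", "spot"]

print(confusion_counts(toy_true, toy_pred))
print(per_class_recall(toy_true, toy_pred))
```

With the notebook's variables, `per_class_recall(ytest, test_predictions)` would report the recall for each of the three leaf classes.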