<a href="https://colab.research.google.com/github/hey-sid29/paddy-disease/blob/main/Nb_1_Small_Image_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notebook-1: Training Smaller Models

- This notebook explores PyTorch Image Models (timm) on the Paddy Disease Classification dataset. The models trained here are relatively small and are fine-tuned for only a few epochs, so the notebook is an exploratory step to understand the capabilities of timm models.

# **Title**: Paddy Disease Classification

**Description**:<br>


> Rice (Oryza sativa) is one of the staple foods worldwide. Paddy, the raw grain before removal of husk, is cultivated in tropical climates, mainly in Asian countries. Paddy cultivation requires consistent supervision because several diseases and pests might affect the paddy crops, leading to up to 70% yield loss. (Source: Kaggle competition)



**Project Description**:<br>
1. Classify and predict the type of paddy disease using computer vision techniques.
2. The dataset is obtained from a Kaggle competition.<br>
> (source link: [Paddy Doctor: Paddy Disease Classification](https://www.kaggle.com/competitions/paddy-disease-classification/data))

**Dataset Description**:<br>

* __train.csv__ - The training set

> __image_id__ - Unique image identifier; corresponds to the image file names (.jpg) found in the train_images directory.<br>
> __label__ - Type of paddy disease, also the target class. There are *ten categories*, including the normal leaf.<br>
> __variety__ - The name of the paddy variety.<br>
> __age__ - Age of the paddy in days.<br>

* __sample_submission.csv__ - Sample submission file.
<br>
* __train_images__ - This directory contains 10,407 training images stored in sub-directories corresponding to the ten target classes. Filenames correspond to the image_id column of train.csv.
<br>
* __test_images__ - This directory contains 3,469 test set images.
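To make the `train.csv` layout concrete, here is a minimal standard-library sketch that tallies images per label and averages the `age` column. The rows are hypothetical placeholders in the documented column order (`image_id`, `label`, `variety`, `age`), not real competition data; with the real file you could pass its text instead, e.g. `(path/'train.csv').read_text()`.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the documented train.csv layout (NOT real competition data)
SAMPLE_TRAIN_CSV = """image_id,label,variety,age
000001.jpg,blast,VarietyA,45
000002.jpg,normal,VarietyA,60
000003.jpg,blast,VarietyB,50
"""

def label_counts(csv_text: str) -> Counter:
    """Count how many images belong to each label (the target class)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["label"] for row in reader)

def mean_age(csv_text: str) -> float:
    """Average paddy age in days across all rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ages = [int(row["age"]) for row in reader]
    return sum(ages) / len(ages)

print(label_counts(SAMPLE_TRAIN_CSV))  # Counter({'blast': 2, 'normal': 1})
print(mean_age(SAMPLE_TRAIN_CSV))
```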

## I. Setting up:

In [None]:
## Installing fastkaggle module

try:
  import fastkaggle
except ModuleNotFoundError:
  !pip install -Uq fastkaggle

from fastkaggle import *

In [None]:
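# Setting up the Colab environment: move the uploaded Kaggle API token into ~/.kaggle
# and make it readable only by you (the Kaggle API warns if other users can read it).

!mkdir -p ~/.kaggle
!mv /content/kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json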
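
The same three shell steps can be written in plain Python, which is handy outside a notebook. This is a minimal sketch; `install_kaggle_credentials` is a hypothetical helper, and `/content/kaggle.json` is the Colab upload location this notebook assumes.

```python
import os
import shutil
import stat
from pathlib import Path

def install_kaggle_credentials(src: Path, kaggle_dir: Path) -> Path:
    """Move a kaggle.json token into kaggle_dir and restrict it to the owner (hypothetical helper)."""
    kaggle_dir.mkdir(parents=True, exist_ok=True)   # like `mkdir -p`: safe to rerun
    dest = kaggle_dir / "kaggle.json"
    shutil.move(str(src), str(dest))                # like `mv`
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)     # like `chmod 600`: owner read/write only
    return dest

# Usage in Colab (assumed paths, not run here):
# install_kaggle_credentials(Path("/content/kaggle.json"), Path.home() / ".kaggle")
```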

In [None]:
data_path = 'paddy-disease-classification'

path = setup_comp(data_path, install = 'fastai "timm>=0.6.2.dev0" ')

In [None]:
# timm should already be installed by setup_comp above; this is a fallback
!pip install -q timm

In [None]:
import timm
from fastai.vision.all import *

# Fix the random seed so results are reproducible across runs
set_seed(100)

path.ls()
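
Seeding fixes the sequence of pseudo-random numbers, so shuffles and augmentations repeat from run to run. fastai's `set_seed` seeds Python's `random`, NumPy and PyTorch globally; the sketch below shows the idea with a local `random.Random` generator only, so it is an illustration of the principle rather than what `set_seed` does internally.

```python
import random

def shuffled(items, seed):
    """Return a shuffled copy of items using a generator seeded with `seed`."""
    rng = random.Random(seed)   # local generator; set_seed seeds the global ones instead
    out = list(items)
    rng.shuffle(out)
    return out

run_1 = shuffled(range(10), seed=100)
run_2 = shuffled(range(10), seed=100)
print(run_1 == run_2)  # True: same seed, same order
```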

## II. Reading the Image data:

In [None]:
train_path = path/'train_images'
files = get_image_files(train_path)

In [None]:
image = PILImage.create(files[0])
print(image.size)
image.to_thumb(128)

In [None]:
#Checking the sizes of all available images in the `train_images` folder:


from fastcore.parallel import *

def size(img):
  return PILImage.create(img).size


img_sizes = parallel(size, files, n_workers = 5)
pd.Series(img_sizes).value_counts()

- Most images (10,403) have size (480, 640). PIL reports `size` as (width, height), so these are portrait images 480 px wide and 640 px tall.<br>
- A few images (4) have size (640, 480), i.e. landscape.<br>
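
Because PIL's `.size` is `(width, height)`, the tallies above can be turned into orientation counts. A minimal sketch, fed with the counts reported above rather than the real files; `orientation` is a hypothetical helper.

```python
from collections import Counter

def orientation(size):
    """Classify a PIL-style (width, height) tuple."""
    width, height = size
    if width == height:
        return "square"
    return "portrait" if height > width else "landscape"

# Sizes reconstructed from the reported counts (10,403 + 4 = 10,407 images)
sizes = [(480, 640)] * 10_403 + [(640, 480)] * 4

size_counts = Counter(sizes)
orientation_counts = Counter(orientation(s) for s in sizes)
print(size_counts.most_common())   # [((480, 640), 10403), ((640, 480), 4)]
print(orientation_counts)          # Counter({'portrait': 10403, 'landscape': 4})
```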

**Thus we need to resize the pictures to a consistent size**
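
The dataloader in the next cell uses `Resize(480, method='squish')`, which, as I understand it, rescales width and height independently to 480 px instead of cropping. For a 480×640 image that means the height is compressed by 25% while the width is unchanged. A small sketch of the arithmetic, with hypothetical helper names:

```python
def squish_scale(size, target):
    """Per-axis scale factors when squishing a (width, height) image to target x target."""
    width, height = size
    return target / width, target / height

def aspect_distortion(size, target):
    """Ratio of the larger to the smaller scale factor (1.0 means no distortion)."""
    sx, sy = squish_scale(size, target)
    return max(sx, sy) / min(sx, sy)

print(squish_scale((480, 640), 480))       # (1.0, 0.75)
print(squish_scale((640, 480), 480))       # (0.75, 1.0)
print(aspect_distortion((480, 640), 480))
```

Squishing keeps the whole leaf in frame at the cost of some stretching; a crop would keep proportions but could cut off diseased regions near the edges.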

In [None]:
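# Create the DataLoaders:
# - hold out 25% of the images for validation (seeded so the split is repeatable)
# - squish every image to 480x480 so all items share one size
# - apply GPU batch augmentations, including a random resized crop (min_scale=0.70) to 128x128

dls = ImageDataLoaders.from_folder(path=train_path, valid_pct=0.25, seed=200,
                                   item_tfms=Resize(480, method='squish'),
                                   batch_tfms=aug_transforms(size=128, min_scale=0.70))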

dls.show_batch(max_n=4)
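
`valid_pct=0.25, seed=200` holds out a random quarter of the images for validation. The sketch below mimics that with the standard library. fastai's own splitter shuffles with PyTorch, so the exact indices will differ; to my understanding it takes the first `int(valid_pct * n)` shuffled items as the validation set, which is what the sketch does too.

```python
import random

def random_split(n_items, valid_pct=0.25, seed=200):
    """Return (train_idx, valid_idx) after a seeded shuffle; a sketch of a random splitter."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    cut = int(valid_pct * n_items)     # size of the validation set
    return idx[cut:], idx[:cut]

train_idx, valid_idx = random_split(10_407)
print(len(train_idx), len(valid_idx))  # 7806 2601
```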

## III. Creating the first base model:

To create a model, we pick a few candidate architectures and compare their performance against one another.<br>


> Here's the fantastic notebook I often refer to when choosing a model: [The best vision models for fine-tuning](https://www.kaggle.com/code/jhoward/the-best-vision-models-for-fine-tuning)<br>
> Credits: [Jeremy Howard](https://www.kaggle.com/jhoward)



From that notebook, I selected two models to experiment with:<br>


1.   *resnet26d*
2.   *convnext_tiny*




In [None]:
resnet_learner = vision_learner(dls, 'resnet26d', metrics=error_rate, path='.').to_fp16()
convnext_tiny_learner = vision_learner(dls, 'convnext_tiny', metrics=error_rate, path='.').to_fp16()

- For the `resnet26d` model:

In [None]:
#Finding an optimal Learning Rate using the `lr_find()`:
resnet_learner.lr_find(suggest_funcs=(slide, valley))

- For the `convnext_tiny` model:

In [None]:
convnext_tiny_learner.lr_find(suggest_funcs=(slide, valley))
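
`lr_find` performs a learning-rate range test: it trains briefly while increasing the learning rate exponentially and records the loss, and `slide` and `valley` are heuristics for picking a point on that curve. The sketch below shows an exponential schedule (start 1e-7, end 10, 100 steps, which I believe are fastai's defaults) and a simple "steepest descent" pick. It is an illustration of the idea, not a reimplementation of `slide` or `valley`.

```python
import math

def lr_schedule(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Exponentially spaced learning rates from start_lr to end_lr."""
    return [start_lr * (end_lr / start_lr) ** (i / (num_it - 1)) for i in range(num_it)]

def steepest_lr(lrs, losses):
    """Return the lr at the end of the segment where loss drops fastest per decade of lr."""
    best_i, best_slope = 0, math.inf
    for i in range(1, len(lrs)):
        slope = (losses[i] - losses[i - 1]) / (math.log10(lrs[i]) - math.log10(lrs[i - 1]))
        if slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]

# Synthetic (made-up) loss curve: falls fastest between 1e-3 and 1e-2, then diverges
print(steepest_lr([1e-4, 1e-3, 1e-2, 1e-1], [2.0, 1.9, 1.0, 3.0]))
```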

#### **Training the models**:

1. `resnet26d`

In [None]:
resnet_learner.fine_tune(epochs=7, base_lr=0.001)

2. `convnext_tiny`

In [None]:
convnext_tiny_learner.fine_tune(epochs=7, base_lr=0.001)

- `convnext_tiny` performs slightly better than `resnet26d`
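
As far as I know, `fine_tune` (with its defaults `freeze_epochs=1` and `lr_mult=100`) first trains only the new head for one epoch at `base_lr`, then halves `base_lr`, unfreezes the whole network, and trains for `epochs` more epochs with discriminative learning rates from `base_lr/lr_mult` (earliest layers) up to `base_lr` (head). The sketch below only computes that plan for the calls above; it trains nothing and ignores the one-cycle details. `Phase` and `fine_tune_plan` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Phase:
    name: str
    epochs: int
    head_lr: float                # learning rate of the final (head) layer group
    body_lr: Optional[float]      # lr of the earliest layer group; None while the body is frozen

def fine_tune_plan(epochs: int, base_lr: float = 2e-3,
                   freeze_epochs: int = 1, lr_mult: float = 100) -> List[Phase]:
    """Describe the two phases of a freeze-then-unfreeze fine-tune (no training happens)."""
    # Phase 1: pretrained body frozen, only the new head is trained at base_lr
    frozen = Phase("frozen body, train head", freeze_epochs, base_lr, None)
    # Phase 2: halve base_lr, unfreeze everything, spread lrs from base_lr/lr_mult to base_lr
    base_lr = base_lr / 2
    unfrozen = Phase("unfrozen, all layers", epochs, base_lr, base_lr / lr_mult)
    return [frozen, unfrozen]

plan = fine_tune_plan(epochs=7, base_lr=0.001)
for phase in plan:
    print(phase)
print("total epochs:", sum(p.epochs for p in plan))  # total epochs: 8
```

Under this reading, each `fine_tune(epochs=7, ...)` call above runs 8 epochs in total.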

In [None]:
sample_sub = pd.read_csv(path/'sample_submission.csv')

In [None]:
sample_sub

In [None]:
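# Sort the test files so predictions line up with the row order of sample_submission.csv
# (assumed to be sorted by image_id)
test_files = get_image_files(path/'test_images').sorted()
test_dls = dls.test_dl(test_files)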

### Predicting using test images:

- For `Resnet26d` model:

In [None]:
probs, _, idxs = resnet_learner.get_preds(dl=test_dls, with_decoded=True)
idxs

- For `convnext_tiny` model:

In [None]:
probs_conv, _, idxs_conv = convnext_tiny_learner.get_preds(dl=test_dls, with_decoded=True)
idxs_conv

- Mapping the predicted indices to disease names:

In [None]:
dls.vocab

In [None]:
mapper = dict(enumerate(dls.vocab))
results = pd.Series(idxs.numpy(), name='idxs').map(mapper)
print(results)
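
`dict(enumerate(dls.vocab))` turns the vocab into an index-to-label lookup, and the decoded predictions from `get_preds(..., with_decoded=True)` are, as I understand it for this single-label setup, the argmax of each row of `probs`. A standard-library sketch of both steps, with a hypothetical three-class vocab standing in for `dls.vocab`:

```python
def argmax(row):
    """Index of the largest value in a sequence of class probabilities."""
    return max(range(len(row)), key=row.__getitem__)

def decode_predictions(pred_idxs, vocab):
    """Map predicted class indices to label names via an index -> label dict."""
    mapper = dict(enumerate(vocab))   # e.g. {0: 'blast', 1: 'hispa', 2: 'normal'}
    return [mapper[i] for i in pred_idxs]

vocab = ["blast", "hispa", "normal"]   # hypothetical; the real vocab has ten classes
probs = [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
idxs = [argmax(p) for p in probs]
print(decode_predictions(idxs, vocab))  # ['normal', 'blast', 'blast', 'hispa']
```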

In [None]:
#Creating submission file for the resnet model:
sample_sub['label']=results
sample_sub.to_csv('sub_resnet.csv', index=False)


In [None]:
!head sub_resnet.csv
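
Before uploading, it helps to check that the file has the same rows, in the same order, as `sample_submission.csv` and that every label is a known class. A minimal standard-library sketch, assuming the sample file's columns are `image_id` and `label`; `check_submission` is a hypothetical helper, not part of fastai or fastkaggle.

```python
import csv

def read_rows(path):
    """Read a CSV file into a list of dicts keyed by header names."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def check_submission(sub_path, sample_path, vocab):
    """Raise ValueError if the submission doesn't line up with the sample file."""
    sub, sample = read_rows(sub_path), read_rows(sample_path)
    if len(sub) != len(sample):
        raise ValueError(f"expected {len(sample)} rows, got {len(sub)}")
    for i, (row, ref) in enumerate(zip(sub, sample)):
        if row["image_id"] != ref["image_id"]:
            raise ValueError(f"row {i}: image_id {row['image_id']!r} != {ref['image_id']!r}")
        if row["label"] not in vocab:
            raise ValueError(f"row {i}: unknown label {row['label']!r}")
    return True

# Usage in this notebook (not run here):
# check_submission('sub_resnet.csv', path/'sample_submission.csv', set(dls.vocab))
```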



---



- Creating submission file for the `convnext_tiny` model:

In [None]:
res = pd.Series(idxs_conv.numpy(), name='idxs_conv').map(mapper)
print(res)

In [None]:
sample_sub['label'] = res
sample_sub.to_csv('sub_convnext.csv', index=False)
!head sub_convnext.csv

### Results of the models:

- `resnet26d`<br>

> Public Score: 0.86; Private Score: 0.87

- `convnext_tiny`<br>

> Public Score: 0.92; Private Score: 0.93