# Paddy Doctor Competition Late Sub

## About the Data

### What is paddy?

Rice is one of the world's staple foods. Paddy is the raw grain of rice and is farmed in tropical climates, predominantly in Asian countries. Paddy crops have to be checked constantly for diseases and pests, which can cause yield losses of up to 70%. Diagnosing the plants manually and reducing crop losses requires the supervision of experts, which is expensive and tedious.

### Objective

The goal is to develop a deep-learning model that can accurately classify paddy leaves. The dataset contains close to 10 000 labelled images, which make up 75% of the data, and 3 469 unlabelled images, which make up the other 25%. The model has to distinguish between 10 classes: 9 disease categories and 1 for normal leaves.

### Installing relevant libraries

In [1]:
import os
os.system('pip install fastkaggle')

Collecting fastkaggle
  Downloading fastkaggle-0.0.7-py3-none-any.whl (11 kB)
Installing collected packages: fastkaggle
Successfully installed fastkaggle-0.0.7




0

In [2]:
import fastkaggle

comp = 'paddy-disease-classification'
path = fastkaggle.setup_comp(comp, install='"fastcore>=1.4.5" "fastai>=2.7.1" "timm>=0.6.2.dev0"')

from fastai.vision.all import *

train_path = path/'train_images'
files = get_image_files(train_path)

## Observe the data

Let's get more familiar with the data we'll be dealing with by checking the sizes of the images.

In [None]:
from fastcore.parallel import *

# Return the size (width, height) of the image stored at file_path
def f(file_path):
    return PILImage.create(file_path).size

# Get sizes of all images
img_sizes = parallel(f, files, n_workers=0)
# Count and display the number of occurrences of each size
pd.Series(img_sizes).value_counts()

From the output of the cell above, we can see that 10 403 images are 480x640 and only 4 are 640x480, so almost all images share the same size. Before we start developing the deep learning model, we need to bring every image to the same size. Luckily, fastai's `ImageDataLoaders` can do this for us while it loads the training and validation sets. We will transform the images to 480x480. Fastai lets us do this in a few different ways, but for now we will simply squish the images to this size.
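
To make the difference between the resize strategies concrete, here is a minimal sketch in plain Python of what squishing, cropping, and padding each do geometrically to a 480x640 image when the target is a 480x480 square. This is not fastai's implementation; the function name `resize_geometry` and its return format are hypothetical and exist only for illustration.

```python
def resize_geometry(width, height, target, method):
    """Describe how a width x height image is fitted into a target x target square.

    Returns (scale_x, scale_y, fraction_of_image_kept, fraction_of_output_padded).
    """
    if method == "squish":
        # Scale each axis independently: nothing is lost, but shapes are distorted.
        return target / width, target / height, 1.0, 0.0
    if method == "crop":
        # Scale so the shorter side fits, then cut off the overflow on the longer side.
        scale = target / min(width, height)
        kept = (target / scale) ** 2 / (width * height)
        return scale, scale, kept, 0.0
    if method == "pad":
        # Scale so the longer side fits, then fill the remaining area with a border.
        scale = target / max(width, height)
        padded = 1.0 - (width * scale) * (height * scale) / target ** 2
        return scale, scale, 1.0, padded
    raise ValueError(f"unknown method: {method}")


for method in ("squish", "crop", "pad"):
    print(method, resize_geometry(480, 640, 480, method))
```

For these images, squishing keeps the whole leaf but compresses the height to 75%, cropping keeps the proportions but discards a quarter of the image, and padding keeps everything at the cost of a quarter of the output being border.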

## Load training and validation dataset into notebook

Let's load the dataset. Two things to note before we do so:

1. We will be using an 80/20 training-validation split
2. We will be augmenting the data using `aug_transforms`

For the data augmentation, we will randomly crop each image, with the crop covering at least 75% of the image, and then rescale the cropped image down to 128x128 pixels.

In [None]:
dls = ImageDataLoaders.from_folder(train_path, valid_pct=0.2,
                                   item_tfms=Resize(480, method='squish'),
                                   batch_tfms=aug_transforms(size=128, min_scale=0.75))
dls.show_batch(max_n=6)
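
The random crop described above can be sketched in plain Python as follows. This is a simplified illustration rather than what `aug_transforms` does internally: it assumes the crop keeps the image's aspect ratio, and the helper name `random_crop_box` is hypothetical. After choosing the box, the real pipeline would resize the cropped region to 128x128.

```python
import random


def random_crop_box(width, height, min_scale=0.75, rng=random):
    """Pick a random crop covering between min_scale and 100% of the image area."""
    area_fraction = rng.uniform(min_scale, 1.0)
    # Keeping the aspect ratio fixed, each side shrinks by sqrt(area_fraction).
    side_fraction = area_fraction ** 0.5
    crop_w = int(round(width * side_fraction))
    crop_h = int(round(height * side_fraction))
    # Place the crop anywhere it still fits inside the image.
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return left, top, crop_w, crop_h


rng = random.Random(0)  # seeded only so the example is repeatable
boxes = [random_crop_box(480, 480, rng=rng) for _ in range(5)]
for box in boxes:
    print(box)
```

Because a different box is drawn every time an image is loaded, the model rarely sees exactly the same pixels twice, which is the point of the augmentation.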

## Let's define our first model

We'll be using [this notebook](https://www.kaggle.com/code/jhoward/the-best-vision-models-for-fine-tuning) to pick an architecture. Right now, our priority is speed over accuracy since we want to iterate quickly, so we will take the fastest model. The resnet seems to be the fastest in the top 15, so we will use this architecture for now.

In [None]:
model = vision_learner(dls, 'resnet26d', metrics=error_rate, path='.').to_fp16()

## Finding best learning rate

Fastai has a function called `lr_find` that can help us find a good starting learning rate. Let's use it!

In [None]:
model.lr_find(suggest_funcs=(valley, slide))

`lr_find` recommends conservative learning rates at which the model's training loss should converge reliably. Let's go with a learning rate that's a bit riskier so that we can hopefully train a better model faster.
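
The idea behind a learning rate finder can be sketched on a toy problem: take one training step per learning rate while the rate grows exponentially, record the loss, and look at where the loss stops improving and starts to blow up. The sketch below is an assumption-laden illustration, not fastai's code: the loss `0.5 * w**2` and the names `lr_schedule` and `range_test` are made up for this example, whereas `lr_find` runs on real mini-batches of the training data.

```python
def lr_schedule(start_lr, end_lr, num_steps):
    """Learning rates spaced evenly on a log scale from start_lr to end_lr."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_steps - 1)) for i in range(num_steps)]


def range_test(start_lr=1e-4, end_lr=10.0, num_steps=50):
    """Take one gradient step per learning rate on the toy loss 0.5 * w**2."""
    w = 1.0
    history = []
    for lr in lr_schedule(start_lr, end_lr, num_steps):
        grad = w               # derivative of 0.5 * w**2 with respect to w
        w = w - lr * grad      # plain gradient descent step
        history.append((lr, 0.5 * w ** 2))
    return history


history = range_test()
best_lr, best_loss = min(history, key=lambda pair: pair[1])
print(f"lowest loss {best_loss:.3e} at lr={best_lr:.3f}")
print(f"final loss  {history[-1][1]:.3e} at lr={history[-1][0]:.3f}")
```

On this toy loss the steps start to overshoot once the learning rate exceeds 2, so the loss falls up to that point and then explodes. A sensible choice sits comfortably below the rate where the loss is lowest, which is why picking a value by eye from the plot is always a trade-off between speed and stability.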

In [None]:
model.fine_tune(3, 0.01)

In [None]:
ss = pd.read_csv(path/'sample_submission.csv')
ss

In [None]:
tst_files = get_image_files(path/'test_images').sorted()
tst_dl = dls.test_dl(tst_files)

probs,_,idxs = model.get_preds(dl=tst_dl, with_decoded=True)
idxs

In [3]:
dls.vocab

In [None]:
mapping = dict(enumerate(dls.vocab))
results = pd.Series(idxs.numpy(), name="idxs").map(mapping)
results

In [None]:
ss['label'] = results
ss.to_csv('/kaggle/working/submission.csv', index=False)
ss.head()

## Initial submission to Kaggle

We can now submit `submission.csv` to Kaggle and see how it scores. We get a score of 86.9%, which puts us in the bottom 50% of the leaderboard. This is fine, since we just wanted a model that we can iterate on quickly.

# Speeding up the model

Each epoch took about 90 seconds, and training took about 3 minutes on Kaggle. We can reduce the computational cost of training by reducing the height and width of the images to roughly half of the original's. Halving both dimensions leaves a quarter of the pixels, which should speed up the process by a factor of roughly 4 as well. All of this is so that we can iterate quickly towards the best model possible.
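
The arithmetic behind this estimate is easy to check. The sketch below assumes a 480x640 source image and a hypothetical helper `resized_dims` that shrinks an image so its longer side matches a given maximum, keeping the aspect ratio. It compares the pixel count for an exact halving with the one for a longest side of 256 pixels.

```python
def resized_dims(width, height, max_size):
    """Shrink an image so its longer side equals max_size, keeping the aspect ratio."""
    scale = max_size / max(width, height)
    return round(width * scale), round(height * scale)


orig_w, orig_h = 480, 640

# Exact halving of both sides
half_ratio = (orig_w * orig_h) / ((orig_w // 2) * (orig_h // 2))

# Longest side limited to 256 pixels
new_w, new_h = resized_dims(orig_w, orig_h, max_size=256)
pixel_ratio = (orig_w * orig_h) / (new_w * new_h)

print(f"halved: {half_ratio:.2f}x fewer pixels")
print(f"{new_w}x{new_h}: {pixel_ratio:.2f}x fewer pixels")
```

Fewer pixels mostly reduce the time spent decoding and resizing images, so the real speedup depends on how much of each epoch was spent on that work rather than on the GPU.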

In [None]:
train_path = '/kaggle/working/sml'
resize_images(path/'train_images', dest=train_path, max_size=256, recurse=True)

This will give us images of 192x256. Here's a batch of images to visualise our new training data:

In [None]:
dls = ImageDataLoaders.from_folder(train_path,
                                   valid_pct=0.2,
                                   item_tfms=Resize((256, 192)))
dls.show_batch(max_n=3)

We'll be trying out a bunch of different architectures, item transforms, and batch transforms, so let's write a function that makes it easier to test these combinations and iterate faster.

In [None]:
def train(arch, item, batch, epochs=5):
    dls = ImageDataLoaders.from_folder(train_path,
                                       valid_pct=0.2,
                                       item_tfms=item,
                                       batch_tfms=batch)
    model = vision_learner(dls, arch, metrics=error_rate).to_fp16()
    model.fine_tune(epochs, 0.01)
    return model

We've already preprocessed the images by reducing their size, so any `item_tfms` won't have much impact on performance. Let's test the `resnet26d` architecture to see if our speed has improved.

In [None]:
model = train('resnet26d', item=Resize(192),
              batch=aug_transforms(size=128, min_scale=0.75))

The speed has increased immensely! Training is more than twice as fast as in the previous test.
Now that we have a fast training setup, we can start iterating. Let's begin by trying a different, more capable architecture.

## ConvNext Architecture

Let's try out `convnext_small`. It ranks as the best on the speed/performance metric in [this notebook](https://www.kaggle.com/code/jhoward/the-best-vision-models-for-fine-tuning). Note that it is also resolution independent.

In [None]:
arch = 'convnext_small_in22k'

model = train(arch, item=Resize(192),
              batch=aug_transforms(size=128, min_scale=0.75))

## Preprocessing experiments

Let's try out different batch and dataset transformations. We'll look at two dataset transformation methods: the first squishes the image down to the preferred size, and the second adds black bars to the image when it doesn't match the desired size.

In [None]:
model = train(arch, item=Resize(192, method='squish'),
              batch=aug_transforms(size=128, min_scale=0.75))

In [None]:
model = train(arch, item=Resize((256, 192), method=ResizeMethod.Pad, pad_mode=PadMode.Zeros),
              batch=aug_transforms(size=(171, 128), min_scale=0.75), epochs=12)

It seems like there's some improvement, so we'll be using padding from now on.

## Test time augmentation

We will now implement Test Time Augmentation (TTA). TTA is defined as:

>During inference or validation, creating multiple versions of each image, using data augmentation, and then taking the average or maximum of the predictions for each augmented version of the image.

Let's get our validation accuracy for our model before TTA so we can compare

In [None]:
valid = model.dls.valid
pred, targs = model.get_preds(dl=valid)

In [None]:
error_rate(pred, targs)

Now let's add TTA. We can do this by calling the `tta()` method supplied by the fastai library.

In [None]:
tta_preds, _ = model.tta(dl=valid)

Now let's check the error rate with TTA

In [None]:
error_rate(tta_preds, targs)

That's a massive improvement! We'll definitely be including this in our model
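
To see why averaging over augmented views can change a prediction, here is a minimal sketch in plain Python. The probabilities are invented for illustration, and the plain mean used here is an assumption: fastai's `tta()` has its own defaults for how many views it creates and how it combines them.

```python
def average_predictions(per_view_probs):
    """Average class probabilities over several views of the same image."""
    n_views = len(per_view_probs)
    n_classes = len(per_view_probs[0])
    return [sum(view[c] for view in per_view_probs) / n_views
            for c in range(n_classes)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical probabilities for one image, three classes, four views
views = [
    [0.40, 0.45, 0.15],  # original image: narrowly prefers class 1
    [0.60, 0.30, 0.10],  # augmented view 1
    [0.55, 0.35, 0.10],  # augmented view 2
    [0.50, 0.40, 0.10],  # augmented view 3
]

tta_probs = average_predictions(views)
print("single view predicts class", argmax(views[0]))
print("averaged views predict class", argmax(tta_probs))
```

A single view can be unlucky, for example when the diseased part of the leaf is distorted or cut off, while the average over several views is less sensitive to any one of them.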

## Second submission

Now that we have a good model and know which data preprocessing methods to use, we can put it all together and make another submission. Let's switch back to the original training dataset (the one before it was resized).

In [None]:
train_path = path/'train_images'
arch = 'convnext_small_in22k'

model = train(arch,
              epochs=12,
              item=Resize((480, 360), method=ResizeMethod.Pad, pad_mode=PadMode.Zeros),
              batch=aug_transforms(size=(256, 192), min_scale=0.75))

In [None]:
tta_preds,targs = model.tta(dl=model.dls.valid)
error_rate(tta_preds, targs)

In [4]:
test_files = get_image_files(path/'test_images').sorted()
test_dl = model.dls.test_dl(test_files)

In [None]:
preds,_ = model.tta(dl=test_dl)

In [None]:
idxs = preds.argmax(dim=1)

In [None]:
vocab = np.array(model.dls.vocab)
results = pd.Series(vocab[idxs], name="idxs")

In [None]:
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = results
ss.to_csv('/kaggle/working/submission_final.csv', index=False)
!head submission_final.csv

# Scaling up the model

In this step, we'll be scaling up our model into an ensemble of bigger models with larger inputs. One of the main challenges with this type of model is GPU memory. We'll need to design this model around this GPU memory limitation (Kaggle's GPU memory is 16GB)

It will help a lot to try a few models and image sizes to see what will run successfully. To speed up the process, we will take a small subset of the data so that we can run short epochs for testing. Memory usage should stay the same, but each run should be much faster.

One easy way to do this is to use a category with few files in it. Let's look at our options:

In [3]:
df = pd.read_csv(path/'train.csv')
df.label.value_counts()

normal                      1764
blast                       1738
hispa                       1594
dead_heart                  1442
tungro                      1088
brown_spot                   965
downy_mildew                 620
bacterial_leaf_blight        479
bacterial_leaf_streak        380
bacterial_panicle_blight     337
Name: label, dtype: int64

Let's use `bacterial_panicle_blight` since it's the smallest
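
As a quick check on how much smaller this subset is, the sketch below copies the label counts from the output above into a plain dictionary and computes the share of the smallest class. Only the variable names are new.

```python
label_counts = {
    "normal": 1764,
    "blast": 1738,
    "hispa": 1594,
    "dead_heart": 1442,
    "tungro": 1088,
    "brown_spot": 965,
    "downy_mildew": 620,
    "bacterial_leaf_blight": 479,
    "bacterial_leaf_streak": 380,
    "bacterial_panicle_blight": 337,
}

smallest = min(label_counts, key=label_counts.get)
total = sum(label_counts.values())
share = label_counts[smallest] / total

print(f"{smallest}: {label_counts[smallest]} of {total} images ({share:.1%})")
```

Training on this folder alone touches only about 3% of the images, so each test epoch should be correspondingly short. Since the subset contains a single class, the reported losses and error rates are not meaningful; only the memory readings are.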

In [3]:
train_path = path/'train_images'/'bacterial_panicle_blight'
test_files = get_image_files(path/'test_images').sorted()

Let's set up a new `train()` function with a few changes:

1. We'll add a `finetune` argument to specify whether to use the `fine_tune()` method or the `fit_one_cycle()` method. `fit_one_cycle()` is faster since it skips the initial step of fine-tuning the head.
2. If we use `fine_tune()`, we will calculate and return the TTA predictions.
3. It's important that we don't set the seed, so that the ensembled models train on slightly different train/validation sets.
4. Lastly, we'll add a gradient accumulation parameter, `accum`. This lets us train with smaller batches that fit in GPU memory while keeping the optimization as smooth as with a full batch of 64.

In [4]:
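def train(arch, size, item=Resize(480, method='squish'), accum=1, finetune=True, epochs=12):
    # No seed is set, so every call gets a different random train/validation split
    dls = ImageDataLoaders.from_folder(train_path,
                                       valid_pct=0.2,
                                       item_tfms=item,
                                       batch_tfms=aug_transforms(size=size, min_scale=0.75),
                                       bs=64//accum)  # smaller batches when accumulating
    # Accumulate gradients until 64 images have been seen, then update the weights
    cbs = GradientAccumulation(64) if accum else []
    model = vision_learner(dls, arch, metrics=error_rate, cbs=cbs).to_fp16()
    if finetune:
        model.fine_tune(epochs, 0.01)
        # Return the (predictions, targets) pair from TTA on the test set
        return model.tta(dl=dls.test_dl(test_files))
    else:
        # Quick run for memory testing; nothing is returned
        model.unfreeze()
        model.fit_one_cycle(epochs, 0.01)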
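
Gradient accumulation works because the gradient of a loss averaged over a batch is the sum of the contributions of its samples, so it can be collected in several small pieces before a single weight update. The sketch below demonstrates this on a toy one-parameter model `y = w * x` with made-up data. It is an illustration of the principle under those assumptions, not of fastai's callback.

```python
def mse_grad(w, batch):
    """Gradient of the summed squared error of y = w * x over a batch, w.r.t. w."""
    return sum(2 * (w * x - y) * x for x, y in batch)


# Hypothetical (x, y) samples
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5),
        (5.0, 9.5), (6.0, 12.5), (7.0, 13.5), (8.0, 16.5)]
w = 0.5

# One big batch of 8 samples
full_grad = mse_grad(w, data) / len(data)

# Four micro-batches of 2 samples, accumulated before a single optimizer step
accum = 4
micro_bs = len(data) // accum
accumulated = 0.0
for start in range(0, len(data), micro_bs):
    accumulated += mse_grad(w, data[start:start + micro_bs])
accum_grad = accumulated / len(data)

print(full_grad, accum_grad)
```

Both approaches give the same gradient here, but the accumulated version only ever holds two samples in memory at once. In a real network the match is not always exact, since layers that compute per-batch statistics see the smaller batches.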

## Impact of gradient accumulation

To see the impact of gradient accumulation, let's train a small model

In [8]:
train('convnext_small_in22k', size=128, epochs=1, accum=1, finetune=False)

Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_small_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_small_22k_224.pth


epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:12


To keep track of the memory usage, we'll create a function to display the memory usage as well as clear the memory used for the next run

In [5]:
import gc
def report_gpu():
    print(torch.cuda.list_gpu_processes())
    gc.collect()
    torch.cuda.empty_cache()

In [7]:
report_gpu()

GPU:0
no processes are running


In [11]:
train('convnext_small_in22k', size=128, epochs=1, accum=2, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:06


GPU:0
process       2776 uses     3013.000 MB GPU memory


The VRAM usage is now about 3GB. Let's set `accum=4` and see how low it gets after that.

In [12]:
train('convnext_small_in22k', size=128, epochs=1, accum=4, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:07


GPU:0
process       2776 uses     2497.000 MB GPU memory


The memory usage has dropped by about 500MB
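
The relationship between `accum` and the batch that has to fit in memory can be written out explicitly. This sketch assumes, as is our understanding of fastai's `GradientAccumulation` callback, that its argument counts samples rather than batches, so the weights are updated once 64 images have been processed. The helper `accumulation_plan` is hypothetical.

```python
EFFECTIVE_BATCH = 64  # samples per weight update, matching bs=64//accum in train()


def accumulation_plan(accum):
    """Return (micro-batch size, micro-batches per weight update) for a given accum."""
    micro_bs = EFFECTIVE_BATCH // accum
    # Ceiling division: micro-batches needed to reach at least EFFECTIVE_BATCH samples
    steps = -(-EFFECTIVE_BATCH // micro_bs)
    return micro_bs, steps


plans = {accum: accumulation_plan(accum) for accum in (1, 2, 4)}
for accum, (micro_bs, steps) in plans.items():
    print(f"accum={accum}: batch size {micro_bs}, {steps} batch(es) per update")
```

Halving the batch size does not halve the memory reading, presumably because the model weights and optimizer state take the same space regardless of batch size and only the activations shrink.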

## Check memory usage

Now we'll check that each architecture and image size will fit into the memory available (16GB)

Let's try `convnext_large_in22k` with an image size of 224x224 first. We'll start with `accum=1` and double it until the model fits.

In [13]:
train('convnext_large_in22k', size=224, epochs=1, accum=2, finetune=False)
report_gpu()

Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_large_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_large_22k_224.pth


epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:13


GPU:0
process       2776 uses    10935.000 MB GPU memory


Now that we've fitted `convnext_large_in22k` into the GPU memory, we'll try a different architecture. Let's try `vit_large_patch16_224` with an image size of 224x224.

In [14]:
train('vit_large_patch16_224', size=224, epochs=1, accum=2, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:19


GPU:0
process       2776 uses    15255.000 MB GPU memory


And finally repeat the process with the swinv2 and swin architectures:

In [15]:
train('swinv2_large_window12_192_22k', size=192, epochs=1, accum=2, finetune=False)
report_gpu()

Downloading: "https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12_192_22k.pth" to /root/.cache/torch/hub/checkpoints/swinv2_large_patch4_window12_192_22k.pth


epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:14


GPU:0
process       2776 uses    13411.000 MB GPU memory


In [16]:
train('swin_large_patch4_window7_224', size=224, epochs=1, accum=2, finetune=False)
report_gpu()

Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window7_224_22kto1k.pth" to /root/.cache/torch/hub/checkpoints/swin_large_patch4_window7_224_22kto1k.pth


epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:14


GPU:0
process       2776 uses    11777.000 MB GPU memory


## Running the models

Now let's run all the models that were previously tested. Note that these models were chosen because tests on their smaller variants showed promising results.

In [6]:
res = 640, 480
models = {
    'convnext_large_in22k': {
        (Resize(res), (320, 224)),
    },
    'vit_large_patch16_224': {
        (Resize(480, method='squish'), 224),
        (Resize(res), 224),
    },
    'swinv2_large_window12_192_22k': {
        (Resize(480, method='squish'), 192),
        (Resize(res), 192),
    },
    'swin_large_patch4_window7_224': {
        (Resize(res), 224),
    }
}

Set our training path back to the whole training dataset

In [7]:
train_path = path/'train_images'

Now we're ready to train all the models. Note that each model gets a different training and validation set, so their results aren't directly comparable.
We'll be adding the TTA predictions to a list called `tta_results`.

In [8]:
tta_results = []
torch.cuda.empty_cache()
for arch, details in models.items():
    for item, size in details:
        print('---', arch)
        print(size)
        print(item.name)
        tta_results.append(train(arch, size, item=item, accum=2))
        gc.collect()
        torch.cuda.empty_cache()

--- convnext_large_in22k
(320, 224)
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_large_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_large_22k_224.pth


epoch,train_loss,valid_loss,error_rate,time
0,0.872314,0.598905,0.176358,05:46


epoch,train_loss,valid_loss,error_rate,time
0,0.368224,0.244081,0.072561,07:44
1,0.309664,0.267226,0.079289,07:44
2,0.308013,0.293702,0.078328,07:44
3,0.210747,0.184096,0.045651,07:44
4,0.16701,0.159295,0.037001,07:44
5,0.141864,0.161172,0.039885,07:44
6,0.106586,0.167182,0.037482,07:44
7,0.086767,0.153313,0.032196,07:44
8,0.053264,0.159257,0.028832,07:43
9,0.042405,0.136714,0.02643,07:43


--- vit_large_patch16_224
224
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


epoch,train_loss,valid_loss,error_rate,time
0,1.010208,0.599176,0.190293,06:22


epoch,train_loss,valid_loss,error_rate,time
0,0.415092,0.209281,0.063431,08:46
1,0.357152,0.303912,0.103796,08:47
2,0.378315,0.297094,0.095147,08:47
3,0.279452,0.249924,0.068717,08:47
4,0.260639,0.16617,0.05334,08:47
5,0.189785,0.22716,0.065353,08:47
6,0.091498,0.137822,0.031716,08:47
7,0.091633,0.128244,0.030274,08:47
8,0.061324,0.11099,0.024507,08:47
9,0.036739,0.092021,0.023066,08:47


--- vit_large_patch16_224
224
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


epoch,train_loss,valid_loss,error_rate,time
0,0.970771,0.668443,0.201346,06:20


epoch,train_loss,valid_loss,error_rate,time
0,0.392599,0.225035,0.069678,08:45
1,0.321996,0.297305,0.087458,08:45
2,0.337138,0.349695,0.0889,08:45
3,0.276741,0.271388,0.070159,08:45
4,0.239266,0.276081,0.077367,08:45
5,0.143083,0.241358,0.049495,08:45
6,0.111401,0.208522,0.041326,08:45
7,0.075432,0.171988,0.033638,08:45
8,0.058192,0.147642,0.029793,08:45
9,0.039965,0.148389,0.023546,08:45


--- swinv2_large_window12_192_22k
192
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


Downloading: "https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12_192_22k.pth" to /root/.cache/torch/hub/checkpoints/swinv2_large_patch4_window12_192_22k.pth


epoch,train_loss,valid_loss,error_rate,time
0,0.926375,0.543835,0.160019,04:28


epoch,train_loss,valid_loss,error_rate,time
0,0.432434,0.22259,0.074483,05:43
1,0.34598,0.216458,0.067756,05:43
2,0.329633,0.265195,0.078328,05:44
3,0.260048,0.202407,0.057184,05:44
4,0.239109,0.207757,0.057184,05:44
5,0.140929,0.140889,0.035079,05:43
6,0.127671,0.150949,0.035079,05:43
7,0.100948,0.100023,0.024988,05:44
8,0.073523,0.097781,0.023546,05:44
9,0.04966,0.082431,0.019222,05:43


--- swinv2_large_window12_192_22k
192
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


epoch,train_loss,valid_loss,error_rate,time
0,0.907158,0.645986,0.198462,04:28


epoch,train_loss,valid_loss,error_rate,time
0,0.43981,0.247517,0.081211,05:43
1,0.363201,0.239293,0.073522,05:43
2,0.373965,0.268732,0.08025,05:43
3,0.325439,0.192823,0.06247,05:43
4,0.236287,0.165258,0.049015,05:43
5,0.146949,0.157314,0.045651,05:44
6,0.148755,0.097451,0.031716,05:43
7,0.080974,0.10631,0.028832,05:43
8,0.079647,0.087776,0.023066,05:43
9,0.05335,0.061899,0.019222,05:43


--- swin_large_patch4_window7_224
224
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window7_224_22kto1k.pth" to /root/.cache/torch/hub/checkpoints/swin_large_patch4_window7_224_22kto1k.pth


epoch,train_loss,valid_loss,error_rate,time
0,0.979339,0.528617,0.165305,04:32


epoch,train_loss,valid_loss,error_rate,time
0,0.46299,0.22462,0.072081,06:01
1,0.352011,0.208433,0.058626,06:01
2,0.333435,0.240751,0.068236,06:01
3,0.286459,0.212072,0.063912,06:01
4,0.223839,0.139611,0.039404,06:01
5,0.173465,0.128462,0.037001,06:01
6,0.156365,0.095416,0.027871,06:01
7,0.107965,0.082431,0.020663,06:01
8,0.072021,0.068457,0.016819,06:01
9,0.057434,0.071402,0.016819,06:01


In [9]:
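# Save the raw TTA results so the ensemble can be rebuilt without retraining
save_pickle('tta_results.pkl', tta_results)
# Each result is a (predictions, targets) pair; keep only the predictions
tta_prs = first(zip(*tta_results))
# Count the results at indices 1 and 2 twice, giving them double weight in the mean
tta_prs += tta_prs[1:3]

# Average the class probabilities across all entries
avg_pr = torch.stack(tta_prs).mean(0)
avg_pr.shape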

torch.Size([3469, 10])
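
Repeating entries before averaging is a simple way of giving some models more weight in the ensemble. The sketch below shows the effect on invented predictions for a single image. The model names and numbers are hypothetical, and plain lists stand in for the prediction tensors.

```python
def mean_probs(prob_lists):
    """Element-wise mean of several equally shaped probability vectors."""
    n = len(prob_lists)
    return [sum(p[c] for p in prob_lists) / n for c in range(len(prob_lists[0]))]


# Hypothetical predictions for one image from three models, two classes
model_a = [0.5, 0.5]
model_b = [0.25, 0.75]
model_c = [0.75, 0.25]

preds = (model_a, model_b, model_c)
unweighted = mean_probs(preds)

preds += preds[1:2]            # repeat model_b so that it counts twice
ensemble = mean_probs(preds)

print("equal weights:     ", unweighted)
print("model_b counted x2:", ensemble)
```

With equal weights the two classes tie, while counting `model_b` twice is the same as averaging with weights of 1/4, 1/2 and 1/4 and tips the result towards its prediction.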

In [14]:
# Rebuild the DataLoaders only to recover the class vocabulary (index -> label)
dls = ImageDataLoaders.from_folder(train_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))

# Pick the most probable class for each test image and map it to its label
idxs = avg_pr.argmax(dim=1)
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('submission_ensemble.csv', index=False)
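
The last step, turning averaged probabilities into a submission file, can be sketched with the standard library alone. Everything in this example is made up for illustration: the shortened vocabulary, the file names, the probabilities, and the `image_id` column name, which is assumed here rather than taken from the sample submission.

```python
import csv
import io

vocab = ["bacterial_leaf_blight", "blast", "normal"]     # shortened, hypothetical
image_ids = ["200001.jpg", "200002.jpg", "200003.jpg"]   # hypothetical file names
probs = [
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
    [0.8, 0.1, 0.1],
]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Most probable class index per image, translated to its label
labels = [vocab[argmax(row)] for row in probs]

# Write the rows to an in-memory CSV instead of a real file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["image_id", "label"])
writer.writerows(zip(image_ids, labels))
submission_csv = buffer.getvalue()
print(submission_csv)
```

The labels are matched to the images purely by position, which is why the test files are sorted before predicting: the predictions have to come out in the same order as the rows of the sample submission.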