<a href="https://colab.research.google.com/github/hey-sid29/paddy-disease/blob/main/Nb_3_Scaling_Up_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@title Uploading Kaggle Api
!mkdir -p ~/.kaggle
!mv /content/kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json

# Notebook-3: Training Larger Models with Larger Inputs

- The previous two notebooks:

> [Nb-1: Small Image Models](https://github.com/hey-sid29/Image-Models/blob/main/Nb_1_Small_Image_Models.ipynb)

> [Nb-2: Optimizing Training Time](https://github.com/hey-sid29/Image-Models/blob/main/Nb2_Optimizing_Training_Time.ipynb)
<br>

explored fine-tuning small image models from the timm library (PyTorch Image Models), improving accuracy, lowering training time per epoch, and adding preprocessing techniques to boost the models' overall accuracy.


---

- This notebook explores larger image models: how they consume GPU memory and how they perform on the same [Paddy Disease Classification Dataset](https://www.kaggle.com/competitions/paddy-disease-classification/data).

## I. Setting Up Environment

In [None]:
try:
  import fastkaggle
except ModuleNotFoundError:
  !pip install -q fastkaggle


In [None]:
from fastkaggle import *

# Quote the version specifiers so the shell does not treat ">" as a redirect
!pip install -Uq "fastcore>=1.4.5"
!pip install -Uq "fastai>=2.7.1"
!pip install -Uq timm==0.6.13


In [None]:
#getting the data:

data_path = "paddy-disease-classification"
path = setup_comp(data_path)

Downloading paddy-disease-classification.zip to /content


100%|██████████| 1.02G/1.02G [00:34<00:00, 32.1MB/s]





In [None]:
import timm
from fastai.vision.all import *
from pathlib import Path
print(timm.__version__)

0.6.13


## 1. Saving GPU memory [using Gradient Accumulation]:

In [None]:
train_path = path/'train_images'

***Gradient Accumulation***:<br>
 This technique makes training more memory-efficient. Larger models need more GPU memory than is always available, so gradient accumulation splits the global batch into smaller mini-batches [*in code: batch size=64, accumulation=2, so the mini-batch size=32* {batch size//accumulation}] and runs them sequentially on the GPU. The gradients are accumulated across these mini-batches before a single parameter update.
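
To make the idea concrete, here is a minimal pure-Python sketch (not fastai code) showing that summing scaled mini-batch gradients reproduces the full-batch gradient. The toy loss, the data, and the helper names `mse_grad` and `accumulated_grad` are illustrative assumptions, and the sketch assumes the batch size is divisible by the accumulation factor.

```python
# Illustrative sketch: gradient accumulation on a one-parameter linear model.
# Nothing here is a fastai API; all names are made up for the example.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, batch_size=64, accum=2):
    """Process the global batch as `accum` mini-batches and accumulate gradients."""
    mini = batch_size // accum            # e.g. 64 // 2 = 32 samples per pass
    total = 0.0
    for start in range(0, batch_size, mini):
        g = mse_grad(w, xs[start:start + mini], ys[start:start + mini])
        total += g / accum                # scale so the sum equals the full-batch mean
    return total                          # a single parameter update would use this

xs = [i / 64 for i in range(64)]
ys = [3 * x + 1 for x in xs]

full = mse_grad(0.5, xs, ys)
acc2 = accumulated_grad(0.5, xs, ys, batch_size=64, accum=2)
acc4 = accumulated_grad(0.5, xs, ys, batch_size=64, accum=4)
print(abs(full - acc2) < 1e-9, abs(full - acc4) < 1e-9)
```

Only one mini-batch needs to be held in memory at a time, which is where the saving comes from; the cost is more sequential passes per update.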

In [None]:
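#Defining a train function with gradient accumulation:

def train_models(architecture, size, item=Resize(480, method='squish'), accum=1, finetune=True, epochs=15):
  # Mini-batch size shrinks as accum grows: 64//accum images per GPU pass
  dls = ImageDataLoaders.from_folder(train_path, valid_pct=0.2, item_tfms=item, batch_tfms=aug_transforms(size=size, min_scale=0.7),
                                     bs=64//accum)

  # Accumulate gradients until 64 samples have been seen, then update the weights
  callback = GradientAccumulation(64) if accum else []
  learner = vision_learner(dls, architecture, metrics=error_rate, cbs=callback).to_fp16()
  if finetune:
    learner.fine_tune(epochs, 0.01)
    # Return TTA predictions so callers can collect them for ensembling
    # (assumption: the competition's test images live in path/'test_images')
    test_files = get_image_files(path/'test_images').sorted()
    return learner.tta(dl=dls.test_dl(test_files))

  else:
    # Memory-check mode: train all layers directly; nothing is returned
    learner.unfreeze()
    learner.fit_one_cycle(epochs, 0.01)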


In [None]:
train_models('convnext_small_in22k', 128, epochs=1, accum=1, finetune=False)


Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_small_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_small_22k_224.pth


epoch,train_loss,valid_loss,error_rate,time
0,2.587733,7.147458,0.846708,01:46


In [None]:
!pip install -Uq pynvml


In [None]:
#checking the memory use for the above model
import gc
def report_gpu_memory_use():
  # Print per-process GPU memory first, then free cached memory for the next run
  print(torch.cuda.list_gpu_processes())
  gc.collect()
  torch.cuda.empty_cache()


In [None]:
report_gpu_memory_use()

GPU:0
process       2191 uses     3246.000 MB GPU memory


In [None]:
train_models('convnext_small_in22k', 128, epochs=1, accum=4, finetune=False)


epoch,train_loss,valid_loss,error_rate,time
0,2.307289,2.073117,0.739548,02:26


In [None]:
report_gpu_memory_use()

GPU:0
process       2191 uses     1664.000 MB GPU memory
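
The two `convnext_small_in22k` runs above can be compared with a little arithmetic. This is only a sketch that reuses the figures printed by the notebook (memory in MB, epoch time in seconds); the variable names are illustrative.

```python
# Figures copied from the notebook outputs above for convnext_small_in22k at size 128.
mem_accum1 = 3246.0        # MB with accum=1 (mini-batch of 64)
mem_accum4 = 1664.0        # MB with accum=4 (mini-batch of 16)
time_accum1 = 1 * 60 + 46  # 01:46 per epoch, in seconds
time_accum4 = 2 * 60 + 26  # 02:26 per epoch, in seconds

saved_mb = mem_accum1 - mem_accum4
memory_reduction_pct = 100 * saved_mb / mem_accum1
time_increase_pct = 100 * (time_accum4 - time_accum1) / time_accum1

print(f"memory saved: {saved_mb:.0f} MB ({memory_reduction_pct:.1f}% lower)")
print(f"epoch time:   {time_increase_pct:.1f}% longer")
```

In this single measurement, accumulating over four mini-batches roughly halves the reported GPU memory while making the epoch noticeably slower.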


### Checking memory use with large vision models:

In [None]:
train_models("convnext_large_in22k", 224, epochs=1, accum=2, finetune=False)

Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_large_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_large_22k_224.pth


epoch,train_loss,valid_loss,error_rate,time
0,2.43947,3.606422,0.844786,03:23


In [None]:
report_gpu_memory_use()

GPU:0
process       2191 uses    10082.000 MB GPU memory


In [None]:
train_models("convnext_large_in22k", (320, 240), epochs=1, accum=2, finetune=False)

epoch,train_loss,valid_loss,error_rate,time
0,2.418951,2.395293,0.77655,04:34


In [None]:
report_gpu_memory_use()

GPU:0
process       2191 uses    13450.000 MB GPU memory
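
A rough way to read the two `convnext_large_in22k` measurements above is to compare how the pixel count and the reported memory scale. The sketch below only reuses the numbers printed by the notebook; the interpretation in the comments is an assumption, not a measured breakdown.

```python
# Figures copied from the convnext_large_in22k runs above (accum=2 in both).
pixels_224 = 224 * 224        # 50,176 pixels per image
pixels_320x240 = 320 * 240    # 76,800 pixels per image
mem_224 = 10082.0             # MB reported after the size-224 run
mem_320x240 = 13450.0         # MB reported after the (320, 240) run

pixel_ratio = pixels_320x240 / pixels_224
memory_ratio = mem_320x240 / mem_224

# Memory grows more slowly than the pixel count here. One plausible reason
# (an assumption) is that weights and optimizer state do not depend on image size.
print(f"pixels x{pixel_ratio:.2f}, memory x{memory_ratio:.2f}")
```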


In [None]:
train_models('swin_large_patch4_window7_224', 224, epochs=1, accum=4, finetune=False)
report_gpu_memory_use()

Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window7_224_22kto1k.pth" to /root/.cache/torch/hub/checkpoints/swin_large_patch4_window7_224_22kto1k.pth


epoch,train_loss,valid_loss,error_rate,time
0,2.607357,2.173585,0.839981,04:05


GPU:0
process       2191 uses     7416.000 MB GPU memory


## Running a set of models:

In [None]:
reso = 640,480

**Models to be used**:<br>

1.   ConvNeXt Large
2.   ViT Large
3.   Swin V2 Large transformer
4.   Swin Large transformer

In [None]:
models = {
    'convnext_large_in22k' : {(Resize(reso), (320,224))},
    'vit_large_patch16_224' : {(Resize(480, method='squish'), 224), (Resize(reso), 224)},
    'swinv2_large_window12_192_22k' : {(Resize(480, method='squish'), 192), (Resize(reso), 192)},
    'swin_large_patch4_window7_224' : {(Resize(reso), 224)}
}
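
The dictionary above maps each architecture to a set of `(item transform, final size)` pairs, so it describes six training runs in total. The sketch below mirrors that structure with plain strings standing in for the fastai `Resize` objects (the labels are hypothetical), so it runs without fastai installed.

```python
# Stand-in for the `models` dict: strings replace the Resize transforms.
models_sketch = {
    "convnext_large_in22k": {("resize_640x480", (320, 224))},
    "vit_large_patch16_224": {("squish_480", 224), ("resize_640x480", 224)},
    "swinv2_large_window12_192_22k": {("squish_480", 192), ("resize_640x480", 192)},
    "swin_large_patch4_window7_224": {("resize_640x480", 224)},
}

# Flatten into one (architecture, item transform, size) tuple per training run.
# Sets have no guaranteed iteration order, so sort for a reproducible sequence.
runs = [
    (arch, item, size)
    for arch, configs in models_sketch.items()
    for item, size in sorted(configs, key=str)
]

for arch, item, size in runs:
    print(arch, item, size)
print(len(runs), "runs in total")
```

Because the inner containers are sets, the order of the two configurations for an architecture can differ between sessions; a list of tuples would be one way to pin it down.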

In [None]:
#we will also use TTA, so collect the result returned by each training run:

tta_res = []


#going over the models dict, training each configuration, and freeing GPU memory after each run:
for arch, resizing in models.items():
  for function, size in resizing:
    print("_______", arch)
    print(size)
    print(function.name)
    tta_res.append(train_models(arch, size, item=function, accum=2))
    gc.collect()
    torch.cuda.empty_cache()

_______ convnext_large_in22k
(320, 224)
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


epoch,train_loss,valid_loss,error_rate,time
0,0.869282,0.572958,0.171072,03:25


epoch,train_loss,valid_loss,error_rate,time
0,0.399285,0.214569,0.064873,04:28
1,0.298218,0.195351,0.057665,04:29
2,0.303873,0.196167,0.059106,04:28
3,0.251087,0.200498,0.052379,04:29
4,0.230011,0.239151,0.066314,04:29
5,0.157937,0.209875,0.048054,04:27
6,0.140141,0.13668,0.034599,04:27
7,0.090786,0.154041,0.03556,04:27
8,0.084829,0.111407,0.02691,04:26
9,0.064804,0.112729,0.029313,04:26


_______ vit_large_patch16_224
224
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


epoch,train_loss,valid_loss,error_rate,time
0,0.973429,0.597319,0.185968,03:54


epoch,train_loss,valid_loss,error_rate,time
0,0.436029,0.245278,0.077847,05:06
1,0.306151,0.25192,0.068236,05:07
2,0.320521,0.276375,0.079289,05:05
3,0.298511,0.307692,0.083614,05:04
4,0.243731,0.234655,0.070159,05:04
5,0.187122,0.215465,0.056223,05:03
6,0.145284,0.17167,0.042287,05:03
7,0.106196,0.175308,0.038443,05:03
8,0.119916,0.148347,0.030754,05:01
9,0.064023,0.128019,0.031235,05:01


_______ vit_large_patch16_224
224
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}


epoch,train_loss,valid_loss,error_rate,time


### Ensembling the results: