# What & Why
This notebook is inspired by the [third part](https://www.kaggle.com/code/jhoward/scaling-up-road-to-the-top-part-3) of Jeremy's solution to the [Paddy Doctor: Paddy Disease Classification](https://www.kaggle.com/competitions/paddy-disease-classification) competition.

It is used to find batch size, gradient accumulation, and image size settings that let each model fit into GPU memory.
In the current implementation, the search still has to be manual: if a check fails with a CUDA out-of-memory (OOM) error, running garbage collection and emptying the CUDA cache do not release the memory, so you need to restart the notebook to continue.
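A possible way around the restart, not used in this notebook, is to run every check in a separate Python process: when that process exits, the CUDA driver releases all of its GPU memory, including memory stranded by an OOM. Below is a minimal sketch under that assumption. `run_in_fresh_process` is a hypothetical helper, and the code string passed to it would have to rebuild everything the check needs (imports, `trn_path`, the dataloaders), because the child process shares nothing with the notebook kernel.

```python
import subprocess
import sys
from typing import Tuple


def run_in_fresh_process(code: str, timeout: float = 3600.0) -> Tuple[bool, str]:
    """Run `code` in a new Python interpreter and report whether it exited cleanly.

    GPU memory held by the child (including memory stranded by a CUDA OOM)
    is released by the driver when the child process exits.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
        timeout=timeout,
    )
    # a non-zero exit code means the child raised (e.g. an OOM) or crashed
    return proc.returncode == 0, proc.stdout + proc.stderr
```

For example, `ok, log = run_in_fresh_process(check_code)` with a hypothetical `check_code` string that calls `check_model_runs` for one configuration: `ok` is `False` after an OOM, and the notebook kernel itself does not need a restart.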

# Python packages setup & data download

This part is copied from [01-baseline.ipynb](./01-baseline.ipynb).

In [1]:
USE_LATEST_PIP_PACKAGES = True

In [2]:
import os
is_kaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')
if is_kaggle:
    print("Running notebook in kaggle mode")
else:
    print("Running notebook in Paperspace mode")

Running notebook in Paperspace mode


In [3]:
# install Jeremy's fastkaggle package with helpers for Kaggle API
try:
    import fastkaggle
except ModuleNotFoundError:
    if is_kaggle:
        print("installing fastkaggle into system folder on Kaggle")
        !pip install -q fastkaggle
    else:
        print("installing fastkaggle into user (/root/.local/) folder on Paperspace")
        # we are installing into local folder
        !pip install --user -q fastkaggle

from fastkaggle import setup_comp, push_notebook

In [4]:
# Jeremy's philosophy is to always work with the latest versions of all packages
# and not bother creating and tracking Python environments with a fixed set of packages.
# I still like to have a full replica of the environment, so I plan to add it as a fallback.
if not is_kaggle:
    if USE_LATEST_PIP_PACKAGES:
        # timm is pinned to 0.6.13: the swinv2 model checked below needed this version
        !pip install --user -Uqq timm==0.6.13 huggingface_hub fastai pynvml
    else:
        !pip install --user  --no-cache-dir -r ../requirements.txt

## Get competition data

In [5]:
# setup_comp downloads the competition dataset archive into the current folder and extracts it into a subfolder
# it also pip-installs the listed libraries if we are on Kaggle
COMPETITION_NAME = 'paddy-disease-classification'

path = setup_comp(COMPETITION_NAME, install='timm huggingface_hub fastai')

In [6]:
import pandas as pd

In [7]:
# import fastai only after we updated all packages on Kaggle or locally
from fastai.vision.all import *
from fastcore.parallel import *
set_seed(42)

In [8]:
path.ls()

(#4) [Path('paddy-disease-classification/sample_submission.csv'),Path('paddy-disease-classification/test_images'),Path('paddy-disease-classification/train_images'),Path('paddy-disease-classification/train.csv')]

In [9]:
SUBM_FILES = get_image_files(path/'test_images').sorted()

In [10]:
SUBM_FILES

(#3469) [Path('paddy-disease-classification/test_images/200001.jpg'),Path('paddy-disease-classification/test_images/200002.jpg'),Path('paddy-disease-classification/test_images/200003.jpg'),Path('paddy-disease-classification/test_images/200004.jpg'),Path('paddy-disease-classification/test_images/200005.jpg'),Path('paddy-disease-classification/test_images/200006.jpg'),Path('paddy-disease-classification/test_images/200007.jpg'),Path('paddy-disease-classification/test_images/200008.jpg'),Path('paddy-disease-classification/test_images/200009.jpg'),Path('paddy-disease-classification/test_images/200010.jpg')...]

# Setup experiment

In [11]:
import gc
import re

In [12]:
df = pd.read_csv(path / "train.csv")
df.label.value_counts()

normal                      1764
blast                       1738
hispa                       1594
dead_heart                  1442
tungro                      1088
brown_spot                   965
downy_mildew                 620
bacterial_leaf_blight        479
bacterial_leaf_streak        380
bacterial_panicle_blight     337
Name: label, dtype: int64

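Our goal is to check whether the model fits into GPU memory, so we don't care about the actual training and will use only a small portion of the data.
Interestingly, I would probably have gone with a random sample, but Jeremy uses the smallest class. Because all of these images sit in a single folder, `from_folder` gives them all the same label, which is why the losses reported below are 0.0; that doesn't matter for a memory check.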
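The two subset choices can be compared with a small stdlib sketch. The counts are copied from the `value_counts()` output above; `smallest_class` and `random_subset` are hypothetical helpers, and in the notebook `items` could be the list returned by `get_image_files(path/'train_images')`. A random sample would contain several labels instead of one. That should make little difference to memory use, since only the size of the final layer changes.

```python
import random

# Label counts copied from the df.label.value_counts() output above
LABEL_COUNTS = {
    "normal": 1764,
    "blast": 1738,
    "hispa": 1594,
    "dead_heart": 1442,
    "tungro": 1088,
    "brown_spot": 965,
    "downy_mildew": 620,
    "bacterial_leaf_blight": 479,
    "bacterial_leaf_streak": 380,
    "bacterial_panicle_blight": 337,
}


def smallest_class(counts):
    """Return the label with the fewest images (the subset Jeremy uses)."""
    return min(counts, key=counts.get)


def random_subset(items, n, seed=42):
    """Alternative: a reproducible random sample of `n` items drawn across all classes."""
    return random.Random(seed).sample(list(items), n)


label = smallest_class(LABEL_COUNTS)
print(label, LABEL_COUNTS[label], sum(LABEL_COUNTS.values()))
# prints: bacterial_panicle_blight 337 10407
```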

In [13]:
trn_path = path / "train_images" / "bacterial_panicle_blight" 

In [14]:
get_image_files(trn_path)

(#337) [Path('paddy-disease-classification/train_images/bacterial_panicle_blight/109162.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/109183.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/101765.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/101592.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/104770.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/110190.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/106643.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/104269.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/108023.jpg'),Path('paddy-disease-classification/train_images/bacterial_panicle_blight/107676.jpg')...]

In [15]:
# questions
#  - why batch size 64? 
#  - how do we run learning rate finder with gradient accumulation?
#  - we probably do need to run LR finder when changing architecture
#  - why don't we use smaller batch size instead of gradient accumulation?

LEARNING_RATE = 0.01

In [16]:
def report_and_free_gpu(verbose=False):
    # Read the driver's view of GPU usage, then try to release Python objects and cached CUDA memory
    gpu_info = torch.cuda.list_gpu_processes()
    gc.collect()
    if verbose:
        print("Reported:", gpu_info)
    if "no processes are running" in gpu_info:
        return 0
    # e.g. "process    2598611 uses     4268.000 MB GPU memory"; the dot is escaped to match a literal "."
    used_mb = int(round(float(re.search(r"uses\s+(\d+\.?\d*)\s*MB", gpu_info).group(1))))
    torch.cuda.empty_cache()
    if verbose:
        print(f"Extracted usage: {used_mb} MB")
    return used_mb
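`report_and_free_gpu` reads only the first "uses ... MB" match, but `torch.cuda.list_gpu_processes()` lists every process on the device, so a second kernel sharing the GPU could be reported instead of this one. A minimal variant that sums all listed processes is sketched below; it assumes the report format recorded in this notebook's comments, and `parse_gpu_usage_mb` is a hypothetical name. It can be tested without a GPU by feeding it strings in that format.

```python
import re

# One line per process, e.g. "process    2598611 uses     4268.000 MB GPU memory"
USAGE_RE = re.compile(r"uses\s+(\d+(?:\.\d+)?)\s*MB")


def parse_gpu_usage_mb(gpu_info: str) -> int:
    """Total MB over all processes listed in a torch.cuda.list_gpu_processes() string.

    Returns 0 when the report says "no processes are running" (nothing matches).
    """
    return round(sum(float(mb) for mb in USAGE_RE.findall(gpu_info)))
```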

In [17]:
report_and_free_gpu()

0

In [18]:
def check_model_runs(arch, size, item=Resize(480, method='squish'), accum=1, epochs=1, logical_batch_size=64):
    "Briefly train `arch` and return (arch, size, GPU MB before, GPU MB after)."
    before_mb = report_and_free_gpu()
    dls = ImageDataLoaders.from_folder(
        trn_path,
        valid_pct=0.2,
        item_tfms=item,
        batch_tfms=aug_transforms(size=size, min_scale=0.75),
        bs=logical_batch_size // accum,  # per-step batch size that has to fit on the GPU
    )
    # step the optimizer only once at least `logical_batch_size` samples have been accumulated
    callbacks = GradientAccumulation(logical_batch_size) if accum > 1 else []
    learn = vision_learner(dls, arch, metrics=error_rate, cbs=callbacks).to_fp16()
    learn.unfreeze()  # train all layers so gradients for the whole network count towards memory
    learn.fit_one_cycle(epochs, LEARNING_RATE)
    after_mb = report_and_free_gpu()
    return (arch, size, before_mb, after_mb)
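To see how `bs=logical_batch_size // accum` interacts with `GradientAccumulation(logical_batch_size)`, here is a simplified model of the callback's counter, based on my reading of the fastai source: it adds each batch's size to a running count and cancels the optimizer step until the count reaches `n_acc`, then resets the count to zero. `samples_per_step` is a hypothetical helper, not fastai code.

```python
def samples_per_step(batch_sizes, n_acc=64):
    """Simplified model of fastai's GradientAccumulation counter.

    Adds each batch size to a running count; once the count reaches `n_acc`
    the optimizer steps and the count resets to zero. Returns the number of
    samples that contributed to each optimizer step.
    """
    steps, count = [], 0
    for bs in batch_sizes:
        count += bs
        if count >= n_acc:  # enough samples seen: step and reset
            steps.append(count)
            count = 0
    return steps


for accum in (1, 2, 3, 4, 5):
    bs = 64 // accum  # same per-step batch size as check_model_runs
    print(accum, bs, samples_per_step([bs] * 12))
```

When `accum` divides 64, every step sees exactly 64 samples. With `accum=3` (bs=21) a step happens every 4 batches, after 84 samples, and with `accum=5` (bs=12) every 6 batches, after 72 samples, so those logical batches end up larger than 64.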

## Explore effect of accum on convnext_small

In [19]:
check_model_runs('convnext_small_in22k', 128, accum=1)

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:04


('convnext_small_in22k', 128, 0, 4268)

In [20]:
# results on Paperspace:
#process    2598611 uses     4268.000 MB GPU memory

In [21]:
check_model_runs('convnext_small_in22k', 128, accum=2)

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:03


('convnext_small_in22k', 128, 2364, 3188)

In [22]:
# results on Paperspace:
#process    2598611 uses     3188.000 MB GPU memory

In [23]:
# increasing the number of epochs shouldn't change peak memory usage
check_model_runs('convnext_small_in22k', 128, accum=2, epochs=4)

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:03
1,0.0,0.0,0.0,00:03
2,0.0,0.0,0.0,00:03
3,0.0,0.0,0.0,00:03


('convnext_small_in22k', 128, 2096, 3188)

In [24]:
# results on Paperspace:
#process    2598611 uses     3188.000 MB GPU memory

In [25]:
check_model_runs('convnext_small_in22k', 128, accum=4)

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:04


('convnext_small_in22k', 128, 2096, 2670)

In [26]:
# results on Paperspace:
#process    2598611 uses     2670.000 MB GPU memory

## Manually check accum size to fit larger models

Start by calling `check_model_runs` with `accum=1` and increase `accum` until training completes without an out-of-memory error.
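The procedure above is a linear search over `accum`. A minimal sketch, assuming a hypothetical `fits(accum)` callable that runs one check and returns `True` if it completed without an OOM (given the restart problem described at the top, each attempt would realistically need its own process):

```python
from typing import Callable, Iterable, Optional


def find_min_accum(fits: Callable[[int], bool], candidates: Iterable[int] = range(1, 9)) -> Optional[int]:
    """Return the first (smallest) accum value for which `fits(accum)` is True, else None."""
    for accum in candidates:
        if fits(accum):  # e.g. one training run completed without a CUDA OOM
            return accum
    return None


# Toy stand-in for a real check: pretend only per-step batches of <= 21 images fit
print(find_min_accum(lambda accum: 64 // accum <= 21))
# prints 3
```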

### model: convnext_large_in22k

In [27]:
check_model_runs('convnext_large_in22k', 224, accum=2)

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:05


('convnext_large_in22k', 224, 2124, 11082)

In [28]:
# size x accum
# 480 x 5
# process    2694520 uses    15334.000 MB GPU memory
# 224 x 4
# process    2694520 uses     8086.000 MB GPU memory
# 224 x 2
# process    2694520 uses    11082.000 MB GPU memory
# (320,240) x 2
# process    2694520 uses    14426.000 MB GPU memory
# -> we need to adjust image size and accum parameters to fit into memory
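One way to turn the measurements above into a choice is sketched below. The numbers are the convnext_large_in22k readings recorded in the comments; the memory budget and the policy "largest image first, then fewest accumulation steps" are my assumptions, not rules stated in this notebook. `best_config` and `n_pixels` are hypothetical helpers.

```python
# (size, accum) -> MB reported for convnext_large_in22k in the comments above
CONVNEXT_LARGE_MB = {
    (480, 5): 15334,
    (224, 4): 8086,
    (224, 2): 11082,
    ((320, 240), 2): 14426,
}


def n_pixels(size):
    """Pixel count for an int (square) size or a two-element size tuple."""
    return size * size if isinstance(size, int) else size[0] * size[1]


def best_config(measurements, budget_mb):
    """Largest image that fits under `budget_mb`; ties broken by fewer accumulation steps."""
    fitting = [cfg for cfg, mb in measurements.items() if mb <= budget_mb]
    if not fitting:
        return None
    return max(fitting, key=lambda cfg: (n_pixels(cfg[0]), -cfg[1]))


print(best_config(CONVNEXT_LARGE_MB, 15000))
# prints ((320, 240), 2)
```

The readings are whole-process usage as seen by the driver (CUDA context and cached blocks included), so they are a conservative guide rather than the exact memory the model needs.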

In [29]:
check_model_runs('convnext_large_in22k', (320, 240), accum=2)

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:06


('convnext_large_in22k', (320, 240), 4644, 14426)

### model: vit_large_patch16_224

In [30]:
# running this model with a size different from 224 fails with an assertion error
# because it doesn't support custom image sizes
check_model_runs('vit_large_patch16_224', 224, accum=2)

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:06


('vit_large_patch16_224', 224, 5126, 15344)

In [31]:
# 224 x 4
# process    2729516 uses    11232.000 MB GPU memory
# 224 x 3 
# process    2729516 uses    12432.000 MB GPU memory
# 224 x 2 
# process    2729516 uses    15344.000 MB GPU memory

### model: swinv2_large_window12_192.ms_in22k

In [32]:
report_and_free_gpu()

6570

In [33]:
# this model expects images of a fixed size (192, 192)
# I had to downgrade timm to 0.6.13 to make it work
check_model_runs('swinv2_large_window12_192_22k', 192, accum=2)

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]


epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:06


('swinv2_large_window12_192_22k', 192, 1210, 13528)

### model: swin_large_patch4_window7_224

In [34]:
check_model_runs('swin_large_patch4_window7_224', 224, accum=2)

epoch,train_loss,valid_loss,error_rate,time
0,0.0,0.0,0.0,00:06


('swin_large_patch4_window7_224', 224, 4696, 11934)

In [35]:
# 224 x 2
# process    2759986 uses    11934.000 MB GPU memory

In [36]:
models_of_interest = [
    ("convnext_large_in22k",         (320,240), 2),
    ("vit_large_patch16_224",         224,      2),
    ("swinv2_large_window12_192_22k", 192,      2),
    ("swin_large_patch4_window7_224", 224,      2),
]