<a href="https://colab.research.google.com/github/lolikgiovi/ConvNeXt-Repro/blob/main/Training_ConvNeXt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installing the Dependencies

In [None]:
!pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
!pip install timm==0.3.2 tensorboardX six

In [None]:
!git clone https://github.com/facebookresearch/ConvNeXt

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Optional: Using Weights & Biases Dashboard
I found it convenient to monitor training performance through the W&B dashboard. Run the commands below and follow the prompts to log in to W&B.

In [None]:
# Install the W&B client, then log in with your own API key when prompted
!pip install wandb
!wandb login

## Dataset
I am using the [Imagenette dataset](https://github.com/fastai/imagenette). It is a **subset of ImageNet**, the dataset used in [the official ConvNeXt implementation](https://github.com/facebookresearch/ConvNeXt).

Both datasets described below (Imagenette and Imagewoof) are available in full-size, 320px, and 160px versions. I am using the 160px version to train the ConvNeXt model.

The dataset also comes with a CSV file that provides noisy label sets in which 1%, 5%, 25%, and 50% of the labels have been randomly changed to an incorrect label.
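
To make the noisy-label CSV more concrete, here is a minimal sketch that measures how many labels differ between a clean and a noisy column. The column names (`noisy_labels_0`, `noisy_labels_5`, `is_valid`) are my assumption about the layout of the CSV shipped with Imagenette, and the rows below are made up for illustration, so check both against the real file before relying on them.

```python
import csv
import io

# Hypothetical excerpt: column names are an assumption about the real CSV,
# and the rows are invented purely to demonstrate the calculation.
SAMPLE_CSV = """path,noisy_labels_0,noisy_labels_5,is_valid
train/n01440764/a.JPEG,n01440764,n01440764,False
train/n01440764/b.JPEG,n01440764,n03000684,False
train/n02102040/c.JPEG,n02102040,n02102040,False
train/n02102040/d.JPEG,n02102040,n02102040,False
"""


def noisy_fraction(csv_text, clean_col="noisy_labels_0", noisy_col="noisy_labels_5"):
    """Return the fraction of rows whose noisy label differs from the clean one."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    changed = sum(1 for row in rows if row[clean_col] != row[noisy_col])
    return changed / len(rows)


print(noisy_fraction(SAMPLE_CSV))
```

The training runs in this notebook read labels from the folder names, so this CSV only matters if you want to experiment with label noise.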





### Imagenette
*Imagenette* is a subset of 10 easily classified classes from Imagenet (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute).

In [None]:
# Getting data from Imagenette
!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
!tar -xzf imagenette2-160.tgz

--2023-03-05 15:38:18--  https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.165.160, 52.217.118.160, 52.217.101.110, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.165.160|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 99003388 (94M) [application/x-tar]
Saving to: ‘imagenette2-160.tgz’


2023-03-05 15:38:22 (26.8 MB/s) - ‘imagenette2-160.tgz’ saved [99003388/99003388]
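
After extraction it can be worth checking that the archive has the layout the training script expects, with `train` and `val` folders that each hold one subfolder per class. The helper below is a small sketch for that check; `summarize_split` is a name I made up, and the root path is the one used later in this notebook.

```python
from pathlib import Path


def summarize_split(root, split):
    """Map each class folder under root/split to the number of files it holds."""
    split_dir = Path(root) / split
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
        for class_dir in sorted(split_dir.iterdir())
        if class_dir.is_dir()
    }


root = Path("/content/imagenette2-160")  # extraction path assumed from this notebook
if root.exists():
    for split in ("train", "val"):
        counts = summarize_split(root, split)
        print(split, len(counts), "classes,", sum(counts.values()), "images")
```

For Imagenette and Imagewoof, each split should report 10 classes.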



### Alternative: Imagewoof
*Imagewoof* is a subset of 10 classes from Imagenet **that aren't so easy to classify**, since they're all dog breeds. The breeds are: Australian terrier, Border terrier, Samoyed, Beagle, Shih-Tzu, English foxhound, Rhodesian ridgeback, Dingo, Golden retriever, Old English sheepdog. 

In [None]:
# Getting data from Imagewoof
!wget https://s3.amazonaws.com/fast-ai-imageclas/imagewoof2-160.tgz
!tar -xvzf imagewoof2-160.tgz

## Setting up Model Training in Colab



Original training command [from repo](https://github.com/facebookresearch/ConvNeXt/blob/main/TRAINING.md):


```
python -m torch.distributed.launch --nproc_per_node=8 main.py \
                                   --model convnext_tiny --drop_path 0.1 \
                                   --batch_size 128 --lr 4e-3 --update_freq 4 \
                                   --model_ema true --model_ema_eval true \
                                   --data_path /path/to/imagenet-1k \
                                   --output_dir /path/to/save_results
```

Running this command as-is in Google Colab results in an error like:
```
RuntimeError: CUDA error: invalid device ordinal  File "main.py", line 477, in <module>
```
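
The error is consistent with `--nproc_per_node=8` starting eight processes that each try to claim their own GPU, while a Colab runtime typically exposes only one. A minimal sketch for choosing a safe value is shown below; `pick_nproc_per_node` is a hypothetical helper, not part of the ConvNeXt repository.

```python
def pick_nproc_per_node(requested, available):
    """Clamp the requested process count to the number of visible GPUs."""
    if available < 1:
        raise RuntimeError("no CUDA device is visible to this process")
    return min(requested, available)


try:
    import torch

    visible_gpus = torch.cuda.device_count()
except ImportError:
    visible_gpus = 0  # torch is not installed in this environment

if visible_gpus:
    print("--nproc_per_node =", pick_nproc_per_node(8, visible_gpus))
```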

So I selected the CUDA device explicitly and changed `nproc_per_node` from 8 to 1. My training command becomes:
```
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny --drop_path 0.1 \
                                    --batch_size 128 --lr 4e-3 --update_freq 4 \
                                    --model_ema true --model_ema_eval true \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --output_dir /content/res
```
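
Dropping from 8 processes to 1 also shrinks the effective batch size, which the ConvNeXt training notes define as `nproc_per_node * batch_size * update_freq`. The sketch below works through that arithmetic. The linear learning-rate scaling shown at the end is a commonly used heuristic that I include only for comparison; the runs in this notebook keep `--lr 4e-3` unchanged.

```python
def effective_batch_size(nproc_per_node, batch_size, update_freq):
    """Number of samples that contribute to each optimizer step."""
    return nproc_per_node * batch_size * update_freq


def linearly_scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: keep the ratio lr / batch size constant."""
    return base_lr * new_batch / base_batch


reference = effective_batch_size(8, 128, 4)   # original 8-GPU command
single_gpu = effective_batch_size(1, 128, 4)  # same flags on one Colab GPU
print(reference, single_gpu)
print(linearly_scaled_lr(4e-3, reference, single_gpu))
```

With `--batch_size 32` or `--batch_size 64`, as in the runs below, the same function gives an effective batch size of 128 or 256.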

In [None]:
%cd /content/ConvNeXt

/content/ConvNeXt


I am using ConvNeXt-Tiny as the model architecture. My task requires training on a small dataset, so a smaller architecture is the better fit: it has fewer parameters and needs less data to train.

### ConvNeXt-T -- Batch 32, Augmentation Default
- Batch size: 32
- Epochs: 100
- Update Freq: 4
- Input Size: 160 (Imagenette2-160)
- Learning rate: 0.004
- Drop path: 0.1


This is my first trial at training the model. I ran the 100 epochs in 50-30-20 steps, since I wanted to see the initial performance before committing to the whole 100 epochs.

In [None]:
!mkdir -p /content/result_tiny
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --epochs 100 \
                                    --batch_size 32 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --output_dir /content/result_tiny \
                                    --log_dir /content/result_tiny \
                                    --enable_wandb true --wandb_ckpt true

In [None]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/result_tiny/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160
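
The evaluation reports top-k accuracy such as Acc@1. As a reminder of what that number means, here is a minimal pure-Python sketch, independent of the repository's own implementation; the logits and labels are made-up values.

```python
def top_k_accuracy(logits, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(logits, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        correct += label in ranked[:k]
    return 100.0 * correct / len(labels)


# Made-up scores for three samples over three classes.
logits = [[0.1, 0.9, 0.0], [0.8, 0.15, 0.05], [0.2, 0.3, 0.5]]
labels = [1, 1, 2]
print(top_k_accuracy(logits, labels, k=1))
print(top_k_accuracy(logits, labels, k=2))
```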

### ConvNeXt-T -- Batch 64, Augmentation Default
- Batch size: 64
- Epochs: 100
- Update Freq: 4
- Input Size: 160 (Imagenette2-160)
- Learning rate: 0.004
- Drop path: 0.1


In this run, I increased the batch size to make training more stable. It may indeed be more stable, since the Acc@1 EMA is the highest among all runs, but the Acc@1 is lower than in the runs with the smaller batch size.
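
Acc@1 EMA is measured on an exponential moving average of the weights rather than on the raw weights (this is what `--model_ema true` enables). The sketch below shows the update rule on plain Python lists. The decay of 0.9 and the oscillating values are chosen only for illustration and are not the settings used by the training script.

```python
def ema_update(ema_weights, model_weights, decay):
    """One EMA step: ema = decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)]


# A single "weight" that jumps between 0 and 1 from step to step.
raw_trajectory = [[1.0], [0.0], [1.0], [0.0]]
ema = [0.5]
for weights in raw_trajectory:
    ema = ema_update(ema, weights, decay=0.9)
print(ema)
```

The averaged value moves far less than the raw one, which is one reason the EMA accuracy can look steadier than the plain Acc@1.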

In [None]:
!mkdir -p /content/result_tiny2
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --epochs 100 \
                                    --batch_size 64 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --nb_classes 10 \
                                    --output_dir /content/result_tiny2 \
                                    --log_dir /content/result_tiny2 \
                                    --enable_wandb true --wandb_ckpt true

In [None]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/result_tiny2/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

### ConvNeXt-T -- Batch 32, Augmentation Modified
- Batch size: 32
- Epochs: 100
- Update Freq: 4
- Input Size: 160 (Imagenette2-160)
- Learning rate: 0.004
- Drop path: 0.1

Augmentation Edit:
- color_jitter: 0.5 (default: 0.4)
- smoothing: 0.2 (default: 0.1)



Here, I went back to a batch size of 32 but modified the augmentation settings slightly. The result is the highest among all runs.
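
To illustrate what raising `--smoothing` from 0.1 to 0.2 does, the sketch below builds a smoothed target under the common formulation, where the smoothing mass is spread uniformly over all classes. The 10-class setup is an assumption that matches Imagenette, and `smooth_one_hot` is a hypothetical helper rather than the function the training script calls.

```python
def smooth_one_hot(true_class, num_classes, smoothing):
    """Return a label-smoothed target distribution for one sample."""
    off_value = smoothing / num_classes
    on_value = 1.0 - smoothing + off_value
    return [on_value if i == true_class else off_value for i in range(num_classes)]


for smoothing in (0.1, 0.2):
    target = smooth_one_hot(true_class=3, num_classes=10, smoothing=smoothing)
    print(smoothing, [round(p, 3) for p in target])
```

A larger smoothing value moves more probability away from the true class, which discourages overconfident predictions.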

In [None]:
!mkdir -p /content/result_tiny3
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --epochs 100 \
                                    --batch_size 32 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --color_jitter 0.5 \
                                    --smoothing 0.2 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --nb_classes 10 \
                                    --output_dir /content/result_tiny3 \
                                    --log_dir /content/result_tiny3 \
                                    --enable_wandb true --wandb_ckpt true

In [None]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/result_tiny3/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

### ConvNeXt-S -- Batch 64, Augmentation Default
- Batch size: 64
- Epochs: 100
- Update Freq: 4
- Input Size: 160 (Imagenette2-160)
- Learning rate: 0.004
- Drop path: 0.1

In [None]:
!mkdir -p /content/result_small_1
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_small \
                                    --epochs 100 \
                                    --batch_size 64 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --nb_classes 10 \
                                    --output_dir /content/result_small_1 \
                                    --log_dir /content/result_small_1 \
                                    --enable_wandb true --wandb_ckpt true

In [None]:
# Evaluation
!python main.py --model convnext_small --eval true \
                --resume /content/result_small_1/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160