<a href="https://colab.research.google.com/github/lolikgiovi/ConvNeXt-Repro/blob/main/Training_History.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installing the Dependencies

In [1]:
!pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
!pip install timm==0.3.2 tensorboardX six

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.8.0+cu111
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.0%2Bcu111-cp38-cp38-linux_x86_64.whl (1982.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 GB[0m [31m880.8 kB/s[0m eta [36m0:00:00[0m
[?25hCollecting torchvision==0.9.0+cu111
  Downloading https://download.pytorch.org/whl/cu111/torchvision-0.9.0%2Bcu111-cp38-cp38-linux_x86_64.whl (17.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m17.6/17.6 MB[0m [31m61.8 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: torch, torchvision
  Attempting uninstall: torch
    Found existing installation: torch 1.13.1+cu116
    Uninstalling torch-1.13.1+cu116:
      Successfully uninstalled torch-1.13.1+cu116
  Attempting uninstall: torchvision
    Found existing installation: tor

In [2]:
!git clone https://github.com/facebookresearch/ConvNeXt

Cloning into 'ConvNeXt'...
remote: Enumerating objects: 252, done.
remote: Counting objects: 100% (249/249), done.
remote: Compressing objects: 100% (118/118), done.
remote: Total 252 (delta 129), reused 192 (delta 110), pack-reused 3
Receiving objects: 100% (252/252), 69.63 KiB | 963.00 KiB/s, done.
Resolving deltas: 100% (129/129), done.


In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Optional: Using Weights & Biases Dashboard
I found it convenient to monitor training performance on the W&B dashboard. You can log in to W&B with the command below and follow the instructions.

In [4]:
# Install the W&B client and log in; paste your API key when prompted
!pip install wandb
!wandb login

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


## Dataset
I am using the [Imagenette dataset](https://github.com/fastai/imagenette). It is a **subset of the ImageNet** dataset, which is the dataset used in [the official ConvNeXt implementation](https://github.com/facebookresearch/ConvNeXt).

Both Imagenette and Imagewoof come in full-size, 320px, and 160px versions. I am using the 160px version to train the ConvNeXt model.

The dataset also comes with a CSV file with 1%, 5%, 25%, and 50% of the labels randomly changed to an incorrect label.





### Imagenette
*Imagenette* is a subset of 10 easily classified classes from Imagenet (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute).

In [3]:
# Getting data from Imagenette
!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
!tar -xzf imagenette2-160.tgz

--2023-03-05 15:38:18--  https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.165.160, 52.217.118.160, 52.217.101.110, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.165.160|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 99003388 (94M) [application/x-tar]
Saving to: ‘imagenette2-160.tgz’


2023-03-05 15:38:22 (26.8 MB/s) - ‘imagenette2-160.tgz’ saved [99003388/99003388]
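
Before training, it can help to confirm that the archive unpacked as expected. A minimal sketch, assuming the extracted folder has `train/` and `val/` subfolders with one directory per class; `count_images` is a hypothetical helper:

```python
from pathlib import Path


def count_images(root, split):
    """Return {class_dir_name: number_of_files} for one split ('train' or 'val')."""
    split_dir = Path(root) / split
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


if __name__ == "__main__":
    root = "/content/imagenette2-160"  # path used later in this notebook
    if Path(root).exists():
        for split in ("train", "val"):
            counts = count_images(root, split)
            print(split, len(counts), "classes,", sum(counts.values()), "images")
```

Each split should report 10 classes; a different number would suggest an incomplete download or extraction.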



### Alternative: Imagewoof
*Imagewoof* is a subset of 10 classes from Imagenet **that aren't so easy to classify**, since they're all dog breeds. The breeds are: Australian terrier, Border terrier, Samoyed, Beagle, Shih-Tzu, English foxhound, Rhodesian ridgeback, Dingo, Golden retriever, Old English sheepdog. 

In [None]:
# Getting data from Imagewoof
!wget https://s3.amazonaws.com/fast-ai-imageclas/imagewoof2-160.tgz
!tar -xvzf imagewoof2-160.tgz

## Setting up Model Training in Colab



Original training command [from repo](https://github.com/facebookresearch/ConvNeXt/blob/main/TRAINING.md):


```
python -m torch.distributed.launch --nproc_per_node=8 main.py \
                                   --model convnext_tiny --drop_path 0.1 \
                                   --batch_size 128 --lr 4e-3 --update_freq 4 \
                                   --model_ema true --model_ema_eval true \
                                   --data_path /path/to/imagenet-1k \
                                   --output_dir /path/to/save_results
```

Running this command as-is in Google Colab results in an error like:
```
RuntimeError: CUDA error: invalid device ordinal  File "main.py", line 477, in <module>
```

So I pinned the CUDA device and changed `nproc_per_node` from 8 to 1. My training command became:
```
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --batch_size 128 --lr 4e-3 --update_freq 4 \
                                    --model_ema true --model_ema_eval true \
                                    --input_size 160 --drop_path 0.2 \
                                    --data_path /content/imagenette2-160 \
                                    --output_dir /content/res

```
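
Dropping from 8 GPUs to 1 also shrinks the effective batch size, which the ConvNeXt training notes compute as nodes × GPUs × `batch_size` × `update_freq`. The sketch below works through that arithmetic for the commands in this notebook; `effective_batch_size` is a hypothetical helper, and single-node training is assumed:

```python
def effective_batch_size(batch_size, update_freq, num_gpus=1, num_nodes=1):
    """Number of samples that contribute to one optimizer step."""
    return num_nodes * num_gpus * batch_size * update_freq


original = effective_batch_size(128, 4, num_gpus=8)  # official 8-GPU recipe
colab_128 = effective_batch_size(128, 4)             # adapted single-GPU command
colab_32 = effective_batch_size(32, 4)               # batch-32 runs
colab_64 = effective_batch_size(64, 4)               # batch-64 runs

print(original, colab_128, colab_32, colab_64)  # 4096 512 128 256
```

The runs in this notebook keep `--lr 4e-3` from the 8-GPU recipe, so each optimizer step averages over far fewer samples at the same learning rate.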

In [None]:
%cd /content/ConvNeXt

/content/ConvNeXt


I am using ConvNeXt Tiny as the model architecture. Since my task requires training on a small dataset, a smaller architecture should fit best: it has fewer parameters and needs less data to train.

### ConvNeXt-T -- Batch 32, Augmentation Default
- Batch size: 32
- Epochs: 100
- Update Freq: 4
- Input Size: 160 (Imagenette2-160)
- Learning rate: 0.004
- Drop path: 0.1


This is my first training trial. I split the 100 epochs into three stages of 50, 30, and 20 epochs, since I wanted to see the initial performance before committing to the full 100 epochs.
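
The staged schedule maps onto the `--epochs`, `--start_epoch`, and `--resume` flags used in the cells that follow. A minimal sketch of that mapping, assuming checkpoints are named `checkpoint-<last finished epoch>.pth` as in the resume commands of this notebook; `staged_plan` is a hypothetical helper:

```python
def staged_plan(stage_lengths, output_dir):
    """Turn stage lengths (e.g. [50, 30, 20]) into per-stage launch settings."""
    plan = []
    start = 0
    for length in stage_lengths:
        end = start + length
        # --epochs is the cumulative end epoch, --start_epoch where to pick up
        stage = {"start_epoch": start, "epochs": end}
        if start > 0:
            # resume from the checkpoint written after the previous stage
            stage["resume"] = f"{output_dir}/checkpoint-{start - 1}.pth"
        plan.append(stage)
        start = end
    return plan


for stage in staged_plan([50, 30, 20], "/content/result_tiny"):
    print(stage)
```

Because `--epochs` also sets the length of the learning-rate schedule, a run split this way may not follow the same schedule as one uninterrupted 100-epoch run.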

In [17]:
!mkdir -p /content/result_tiny
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --epochs 50 \
                                    --batch_size 32 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --output_dir /content/result_tiny \
                                    --log_dir /content/result_tiny \
                                    --enable_wandb true --wandb_ckpt true

/content/ConvNeXt
| distributed init (rank 0): env://, gpu 0
Namespace(aa='original', auto_resume=True, batch_size=32, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_backend='nccl', dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=True, drop_path=0.1, enable_wandb=True, epochs=50, eval=False, eval_data_path=None, finetune='', gpu=0, head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=0, log_dir='/content/result_tiny', lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=True, model_ema_decay=0.9999, model_ema_eval=True, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/

In [18]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/result_tiny/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=False, drop_path=0.1, enable_wandb=False, epochs=300, eval=True, eval_data_path=None, finetune='', head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', pin_mem=True, project='convnext', recount=1, remode='pixel', 

Acc@1 after 50 epochs: 78.083

In [24]:
!mkdir -p /content/result_tiny
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --resume /content/result_tiny/checkpoint-49.pth \
                                    --epochs 80 \
                                    --start_epoch 50 \
                                    --batch_size 32 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --output_dir /content/result_tiny \
                                    --log_dir /content/result_tiny \
                                    --enable_wandb true --wandb_ckpt true

/content/ConvNeXt
| distributed init (rank 0): env://, gpu 0
Namespace(aa='original', auto_resume=True, batch_size=32, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_backend='nccl', dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=True, drop_path=0.1, enable_wandb=True, epochs=80, eval=False, eval_data_path=None, finetune='', gpu=0, head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=0, log_dir='/content/result_tiny', lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=True, model_ema_decay=0.9999, model_ema_eval=True, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/

In [25]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/result_tiny/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=False, drop_path=0.1, enable_wandb=False, epochs=300, eval=True, eval_data_path=None, finetune='', head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', pin_mem=True, project='convnext', recount=1, remode='pixel', 

Acc@1 after 80 epochs: 82.268

In [26]:
!mkdir -p /content/result_tiny
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --resume /content/result_tiny/checkpoint-79.pth \
                                    --epochs 100 \
                                    --start_epoch 80 \
                                    --batch_size 32 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --output_dir /content/result_tiny \
                                    --log_dir /content/result_tiny \
                                    --enable_wandb true --wandb_ckpt true

/content/ConvNeXt
| distributed init (rank 0): env://, gpu 0
Namespace(aa='original', auto_resume=True, batch_size=32, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_backend='nccl', dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=True, drop_path=0.1, enable_wandb=True, epochs=100, eval=False, eval_data_path=None, finetune='', gpu=0, head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=0, log_dir='/content/result_tiny', lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=True, model_ema_decay=0.9999, model_ema_eval=True, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='

In [27]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/result_tiny/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=False, drop_path=0.1, enable_wandb=False, epochs=300, eval=True, eval_data_path=None, finetune='', head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', pin_mem=True, project='convnext', recount=1, remode='pixel', 

Acc@1 after 100 epochs: 83.389

### ConvNeXt-T -- Batch 64, Augmentation Default
- Batch size: 64
- Epochs: 100
- Update Freq: 4
- Input Size: 160 (Imagenette2-160)
- Learning rate: 0.004
- Drop path: 0.1


In this run I increased the batch size to 64, hoping for more stable training. Training may indeed be more stable, since the Acc@1 EMA is the highest among all runs, but the Acc@1 itself is lower than in the runs with the smaller batch size.
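
Acc@1 EMA is measured on an exponential moving average of the weights, which the runs enable with `--model_ema true`; the logged arguments show `model_ema_decay=0.9999`. A minimal sketch of the standard EMA update on plain Python lists, not the repository's implementation:

```python
def ema_update(ema_weights, model_weights, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * model."""
    return [decay * e + (1.0 - decay) * m for e, m in zip(ema_weights, model_weights)]


def initial_weight_share(num_updates, decay=0.9999):
    """Fraction of the EMA that still comes from the initial weights."""
    return decay ** num_updates


# Toy example: the "model" weights are fixed at [1.0, 2.0], the EMA starts at zero.
ema = [0.0, 0.0]
for _ in range(1000):
    ema = ema_update(ema, [1.0, 2.0])
print(ema, initial_weight_share(1000))
```

With a decay this close to 1 the averaged weights change slowly, so on short runs they can lag well behind the trained weights.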

In [28]:
!mkdir -p /content/result_tiny2
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --epochs 50 \
                                    --batch_size 64 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --nb_classes 10 \
                                    --output_dir /content/result_tiny2 \
                                    --log_dir /content/result_tiny2 \
                                    --enable_wandb true --wandb_ckpt true

/content/ConvNeXt
| distributed init (rank 0): env://, gpu 0
Namespace(aa='original', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_backend='nccl', dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=True, drop_path=0.1, enable_wandb=True, epochs=50, eval=False, eval_data_path=None, finetune='', gpu=0, head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=0, log_dir='/content/result_tiny2', lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=True, model_ema_decay=0.9999, model_ema_eval=True, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=10, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/c

In [30]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/result_tiny2/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=False, drop_path=0.1, enable_wandb=False, epochs=300, eval=True, eval_data_path=None, finetune='', head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', pin_mem=True, project='convnext', recount=1, remode='pixel', 

In [31]:
!mkdir -p /content/result_tiny2
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --resume /content/result_tiny2/checkpoint-49.pth \
                                    --epochs 100 \
                                    --start_epoch 50 \
                                    --batch_size 64 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --nb_classes 10 \
                                    --output_dir /content/result_tiny2 \
                                    --log_dir /content/result_tiny2 \
                                    --enable_wandb true --wandb_ckpt true

/content/ConvNeXt
| distributed init (rank 0): env://, gpu 0
Namespace(aa='original', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_backend='nccl', dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=True, drop_path=0.1, enable_wandb=True, epochs=100, eval=False, eval_data_path=None, finetune='', gpu=0, head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=0, log_dir='/content/result_tiny2', lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=True, model_ema_decay=0.9999, model_ema_eval=True, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=10, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/

In [6]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/drive/MyDrive/results/tiny2-checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=False, drop_path=0.1, enable_wandb=False, epochs=300, eval=True, eval_data_path=None, finetune='', head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', pin_mem=True, project='convnext', recount=1, remode='pixel', 
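
The evaluation above loads the checkpoint from Google Drive rather than from `/content`, which does not persist across Colab sessions. How the file was copied is not shown; one way to do it, assuming Drive is mounted at `/content/drive` and using a hypothetical `backup_checkpoint` helper:

```python
import shutil
from pathlib import Path


def backup_checkpoint(src, dest_dir, prefix=""):
    """Copy a checkpoint file into dest_dir, optionally prefixing its name."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{prefix}{src.name}"
    shutil.copy2(src, dest)
    return dest


if __name__ == "__main__":
    src = Path("/content/result_tiny2/checkpoint-best.pth")
    if src.exists():  # only copy once the training run has produced the file
        print(backup_checkpoint(src, "/content/drive/MyDrive/results", prefix="tiny2-"))
```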

### ConvNeXt-T -- Batch 32, Augmentation Modified
- Batch size: 32
- Epochs: 100
- Update Freq: 4
- Input Size: 160 (Imagenette2-160)
- Learning rate: 0.004
- Drop path: 0.1

Augmentation Edit:
- color_jitter: 0.5 (default: 0.4)
- smoothing: 0.2 (default: 0.1)



Here I went back to a batch size of 32 and slightly modified the augmentation settings. This run gives the highest result of all.
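
To make the `smoothing` change concrete, the sketch below builds the soft target for one label under a common formulation of label smoothing. Ten classes are used for readability, and `smoothed_target` is a hypothetical helper; the training script also applies mixup and cutmix, which blend targets further and are not modelled here:

```python
def smoothed_target(label, num_classes, smoothing):
    """Soft target: the true class gets 1 - smoothing + smoothing / num_classes,
    every other class gets smoothing / num_classes."""
    off_value = smoothing / num_classes
    on_value = 1.0 - smoothing + off_value
    return [on_value if i == label else off_value for i in range(num_classes)]


default = smoothed_target(label=3, num_classes=10, smoothing=0.1)
modified = smoothed_target(label=3, num_classes=10, smoothing=0.2)
print(default[3], modified[3])  # probability mass left on the true class
```

Raising `smoothing` from 0.1 to 0.2 moves more probability mass away from the true class, which acts as a stronger regularizer.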

In [33]:
!mkdir -p /content/result_tiny3
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_tiny \
                                    --epochs 100 \
                                    --batch_size 32 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --color_jitter 0.5 \
                                    --smoothing 0.2 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --nb_classes 10 \
                                    --output_dir /content/result_tiny3 \
                                    --log_dir /content/result_tiny3 \
                                    --enable_wandb true --wandb_ckpt true

Streaming output truncated to the last 5000 lines.
Epoch: [38]  [110/295]  eta: 0:00:52  lr: 0.003502  min_lr: 0.003502  loss: 3.2387 (3.2897)  weight_decay: 0.0500 (0.0500)  time: 0.2676  data: 0.0010  max mem: 3500
Epoch: [38]  [120/295]  eta: 0:00:49  lr: 0.003500  min_lr: 0.003500  loss: 3.3178 (3.2900)  weight_decay: 0.0500 (0.0500)  time: 0.2710  data: 0.0022  max mem: 3500
Epoch: [38]  [130/295]  eta: 0:00:46  lr: 0.003498  min_lr: 0.003498  loss: 3.3256 (3.2933)  weight_decay: 0.0500 (0.0500)  time: 0.2694  data: 0.0023  max mem: 3500
Epoch: [38]  [140/295]  eta: 0:00:43  lr: 0.003496  min_lr: 0.003496  loss: 3.3386 (3.2940)  weight_decay: 0.0500 (0.0500)  time: 0.2647  data: 0.0015  max mem: 3500
Epoch: [38]  [150/295]  eta: 0:00:40  lr: 0.003495  min_lr: 0.003495  loss: 3.2107 (3.2884)  weight_decay: 0.0500 (0.0500)  time: 0.2610  data: 0.0018  max mem: 3500
Epoch: [38]  [160/295]  eta: 0:00:37  lr: 0.003493  min_lr: 0.003493  loss: 3.2248 (3.2892)  weight_decay

In [38]:
# Evaluation
!python main.py --model convnext_tiny --eval true \
                --resume /content/result_tiny3/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=False, drop_path=0.1, enable_wandb=False, epochs=300, eval=True, eval_data_path=None, finetune='', head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_tiny', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', pin_mem=True, project='convnext', recount=1, remode='pixel', 

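This run reaches 85.885% Acc@1, higher than the 82% the paper reports for ConvNeXt-T. Note that the paper's figure is for the full 1000-class ImageNet-1K, so the two numbers are not directly comparable.

I might have overfit to the small dataset, although I tried to limit overfitting by using the Tiny architecture and a smaller batch size.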

### ConvNeXt-S -- Batch 64, Augmentation Default
- Batch size: 64
- Epochs: 100
- Update Freq: 4
- Input Size: 160 (Imagenette2-160)
- Learning rate: 0.004
- Drop path: 0.1

In [44]:
!mkdir -p /content/result_small_1
%cd /content/ConvNeXt
!CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main.py \
                                    --model convnext_small \
                                    --epochs 100 \
                                    --batch_size 64 \
                                    --lr 4e-3 \
                                    --update_freq 4 \
                                    --model_ema true \
                                    --model_ema_eval true \
                                    --aa original \
                                    --drop_path 0.1 \
                                    --opt adamw \
                                    --train_interpolation bicubic \
                                    --input_size 160 \
                                    --data_path /content/imagenette2-160 \
                                    --nb_classes 10 \
                                    --output_dir /content/result_small_1 \
                                    --log_dir /content/result_small_1 \
                                    --enable_wandb true --wandb_ckpt true

Streaming output truncated to the last 5000 lines.
Epoch: [10]  [146/147]  eta: 0:00:00  lr: 0.002197  min_lr: 0.002197  loss: 2.9072 (2.9511)  weight_decay: 0.0500 (0.0500)  time: 0.7234  data: 0.0002  max mem: 7679
Epoch: [10] Total time: 0:02:08 (0.8729 s / it)
Averaged stats: lr: 0.002197  min_lr: 0.002197  loss: 2.9072 (2.9511)  weight_decay: 0.0500 (0.0500)
Test:  [ 0/41]  eta: 0:02:02  loss: 1.3286 (1.3286)  acc1: 77.0833 (77.0833)  acc5: 91.6667 (91.6667)  time: 2.9963  data: 2.5350  max mem: 7679
Test:  [10/41]  eta: 0:00:20  loss: 1.6801 (1.7207)  acc1: 55.2083 (50.3788)  acc5: 90.6250 (89.4886)  time: 0.6483  data: 0.2396  max mem: 7679
Test:  [20/41]  eta: 0:00:11  loss: 1.9541 (1.9681)  acc1: 31.2500 (35.2679)  acc5: 84.3750 (82.7877)  time: 0.4117  data: 0.0059  max mem: 7679
Test:  [30/41]  eta: 0:00:05  loss: 2.0139 (1.9571)  acc1: 28.1250 (36.2231)  acc5: 82.2917 (83.6694)  time: 0.4061  data: 0.0010  max mem: 7679
Test:  [40/41]  eta: 0:00:00  loss: 1.81

In [46]:
# Evaluation
!python main.py --model convnext_small --eval true \
                --resume /content/result_small_1/checkpoint-best.pth \
                --input_size 160 --drop_path 0.1 \
                --data_path /content/imagenette2-160

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_resume=True, batch_size=64, clip_grad=None, color_jitter=0.4, crop_pct=None, cutmix=1.0, cutmix_minmax=None, data_path='/content/imagenette2-160', data_set='IMNET', device='cuda', disable_eval=False, dist_eval=True, dist_on_itp=False, dist_url='env://', distributed=False, drop_path=0.1, enable_wandb=False, epochs=300, eval=True, eval_data_path=None, finetune='', head_init_scale=1.0, imagenet_default_mean_and_std=True, input_size=160, layer_decay=1.0, layer_scale_init_value=1e-06, local_rank=-1, log_dir=None, lr=0.004, min_lr=1e-06, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convnext_small', model_ema=False, model_ema_decay=0.9999, model_ema_eval=False, model_ema_force_cpu=False, model_key='model|module', model_prefix='', momentum=0.9, nb_classes=1000, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', pin_mem=True, project='convnext', recount=1, remode='pixel',

The result is higher than the paper's result for ConvNeXt-S, which is 83%.

Training took longer than with the Tiny architecture: almost 5 hours.