This `downstream_imagenet` directory is isolated from the pre-training code; you can treat it as an independent codebase 🛠️.
See INSTALL.md to prepare pip
dependencies and the ImageNet dataset.
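After following INSTALL.md, a quick sanity check of the environment and dataset layout can look like this (a sketch only; the standard `train/`/`val/` sub-folder layout is an assumption, and the data path is a placeholder):

```python
# A sanity-check sketch after following INSTALL.md.
# Assumption: the ImageNet folder uses the usual train/ and val/ layout.
import os
import torch
import timm

print(torch.__version__, timm.__version__, torch.cuda.is_available())
data_path = '/path/to/imagenet'  # placeholder, same as in the commands below
print(sorted(os.listdir(data_path)))  # expect something like ['train', 'val']
```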
Note: for network definitions, we directly use `timm.models.ResNet` and the official ConvNeXt implementation.
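For reference, a `timm` ResNet can be instantiated like this (illustrative only; `resnet50` and `num_classes=1000` are example values, not this repo's exact settings):

```python
# Illustrative only: instantiating a ResNet through timm, since this repo
# relies on timm.models.ResNet for its network definition.
# 'resnet50' and num_classes=1000 are example values, not repo settings.
import timm

model = timm.create_model('resnet50', pretrained=False, num_classes=1000)
print(f'{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters')
```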
Run `/downstream_imagenet/main.py` via `torchrun`.
To run fine-tuning, you must specify the ImageNet data folder (`--data_path`), your experiment name and log directory (`--exp_name` and `--exp_dir`, created automatically if they do not exist), the model name (`--model`; for valid choices, see the keys of `HP_DEFAULT_VALUES` in /downstream_imagenet/arg.py line14), and the pre-trained weight file (`--resume_from`).
All other configurations have default values, listed in /downstream_imagenet/arg.py#L13. You can override any default on the command line, e.g. `--bs=1024`.
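If you are unsure which model names are valid, a quick sketch that prints them (assumptions: `arg.py` is importable as a module when run from the `downstream_imagenet/` directory, and `HP_DEFAULT_VALUES` is a plain dict):

```python
# A hedged sketch: print the valid --model choices, which are the keys
# of HP_DEFAULT_VALUES in /downstream_imagenet/arg.py (see above).
# Assumes this is run from the downstream_imagenet directory.
from arg import HP_DEFAULT_VALUES

print(sorted(HP_DEFAULT_VALUES.keys()))
```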
Here is an example of fine-tuning a ConvNeXt-Small on a single machine with 8 GPUs:
$ cd /path/to/SparK/downstream_imagenet
$ torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=<some_port> main.py \
--data_path=/path/to/imagenet --exp_name=<your_exp_name> --exp_dir=/path/to/logdir \
--model=convnext_small --resume_from=/some/path/to/convnextS_1kpretrained_official_style.pth
For multiple machines, change `--nnodes` and `--master_addr` to match your configuration, e.g.:
$ torchrun --nproc_per_node=8 --nnodes=<your_nnodes> --node_rank=<rank_starts_from_0> --master_addr=<some_address> --master_port=<some_port> main.py \
...
See the files under `--exp_dir` to track your experiment:

- `<model>_1kfinetuned_last.pth`: the latest model weights
- `<model>_1kfinetuned_best.pth`: the model weights with the highest accuracy
- `<model>_1kfinetuned_best_ema.pth`: the EMA weights with the highest accuracy
- `finetune_log.txt`: records important information such as:
  - `git_commit_id`: the git version
  - `cmd`: all arguments passed to the script

  It also reports the training loss/accuracy, the best evaluation accuracy, and the remaining time at each epoch.
- `tensorboard_log/`: saves the tensorboard logs; you can visualize accuracies, loss values, learning rates, gradient norms, and more via `tensorboard --logdir /path/to/this/tensorboard_log/ --port 23333`
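If you want to inspect one of these checkpoints manually, here is a minimal sketch (the internal layout of the `.pth` file is an assumption; print the keys before relying on any of them):

```python
# A minimal sketch for peeking inside a saved fine-tuning checkpoint.
# The exact contents of the .pth file are an assumption; on newer
# PyTorch versions you may also need torch.load(..., weights_only=False).
import torch

ckpt = torch.load('/path/to/logdir/<model>_1kfinetuned_best.pth', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model weights, optimizer state, epoch, ...
```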
To resume an interrupted fine-tuning run, pass `--resume_from` again, e.g. `--resume_from=/path/to/<model>_1kfinetuned_last.pth`.