2nd place out of 47 with accuracy score 87.13 (top 1 91.49)
See the presentation for more details.
This is a baseline solution for the CV task in the Yandex ML Cup 2021 competition.
To run training, set up an ordinary PyTorch environment with some extra libraries, which can be installed with pip or conda in a virtual environment.
For the full list of required libraries, see baseline/requirements.txt.
To train the baseline model, you need at least one GPU.
Baseline training can be run with a set of commands similar to:
cd baseline
python3 -m venv venv
source venv/bin/activate
pip3 install torch==1.8.2+cu111 torchvision==0.9.2+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
pip install -r requirements.txt
CUDA_VISIBLE_DEVICES=1,2 python train.py +name=baseline train.trainer_params.gpus=2 _data.paths.images_directory='/home/${oc.env:USER}/path/to/images/dir/' _data.paths.metadata_file='/home/${oc.env:USER}/path/to/metadata.json'
For convenience, a pretrained model checkpoint can be downloaded by running
bash download_checkpoints.sh
A small subset of the Caltech datasets is available in the contest/data directory.
The contest/data/datasets/ directory contains two datasets, each with its class labels and source images.
To run zero-shot class prediction on these datasets, run
cd baseline
python predict.py --ckpt_path /path/to/checkpoint --data_directory ../contest/data/datasets/ --predicts_file ../contest/predictions.json
You may add the --device cuda argument to speed up prediction locally.
To approximate evaluation speed on the Yandex Contest server, add the --device cpu --num_threads 1 arguments to imitate a single available CPU.
The file contest/data/gt.json contains the ground-truth (GT) labels. You can use them to calculate accuracy by running
python ../contest/evaluate_predictions.py --gt_file ../contest/data/gt.json --predicts_file ../contest/predictions.json --average 1 --strict 1
It will print an accuracy like 78.57894736842104. You can also use --average 0 instead to get a separate accuracy for each dataset:
{"caltech101": 74.21052631578947, "caltech256": 82.94736842105263}
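The two example outputs above appear to be related by a plain mean: averaging the two per-dataset numbers reproduces the single score up to floating-point rounding. Below is a minimal sketch of that relation, assuming that --average 1 takes the unweighted mean over datasets. The helper name `average_accuracy` is hypothetical and is not part of evaluate_predictions.py.

```python
import json
import math


def average_accuracy(per_dataset):
    """Unweighted mean of per-dataset accuracies (in percent).

    Hypothetical helper: it only illustrates how the single score could be
    derived from the per-dataset scores, not how the contest script does it.
    """
    if not per_dataset:
        raise ValueError("no datasets to average")
    return sum(per_dataset.values()) / len(per_dataset)


# Per-dataset accuracies copied from the --average 0 example output.
per_dataset = json.loads(
    '{"caltech101": 74.21052631578947, "caltech256": 82.94736842105263}'
)
overall = average_accuracy(per_dataset)

# Matches the --average 1 example value up to floating-point rounding.
assert math.isclose(overall, 78.57894736842104, rel_tol=1e-12)
```

If the mean were weighted by the number of images per dataset instead, the two example numbers would generally not agree this closely, which is why the unweighted reading is assumed here.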
To send the model for online evaluation in the Yandex.Contest system,
you need to make it compact and runnable without a network connection.
To do this, you may use the script convert_checkpoint_for_inference.py,
which disables loading pretrained embeddings and weights from the network and optionally converts the checkpoint
to FP16 (which roughly halves the checkpoint size), while also deleting
optimizer states from it.
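The size reduction described above can be sketched as follows. This is not the code of convert_checkpoint_for_inference.py. It is a minimal illustration that assumes a PyTorch Lightning style checkpoint dictionary with `state_dict`, `optimizer_states` and `lr_schedulers` keys, and it does not cover disabling the download of pretrained weights. The function name `strip_for_inference` is hypothetical.

```python
def strip_for_inference(checkpoint, to_fp16=True):
    """Return a smaller copy of a checkpoint dict, for inference only.

    Assumption: the checkpoint is a plain dict whose training-only entries
    are stored under the keys listed below (PyTorch Lightning style).
    """
    training_only = {"optimizer_states", "lr_schedulers"}
    # Drop optimizer and scheduler state; keep everything else untouched.
    slim = {k: v for k, v in checkpoint.items() if k not in training_only}

    if to_fp16 and "state_dict" in slim:
        weights = {}
        for name, value in slim["state_dict"].items():
            # Duck-typed so that torch.Tensor works: only floating-point
            # tensors are halved; integer buffers are kept as they are.
            is_float = getattr(value, "is_floating_point", None)
            if callable(is_float) and is_float():
                weights[name] = value.half()
            else:
                weights[name] = value
        slim["state_dict"] = weights
    return slim
```

With PyTorch, the dictionary would typically come from `torch.load(path, map_location="cpu")` and be written back with `torch.save(slim, new_path)`. Halving applies only to floating-point weights, so the "roughly half" figure holds when those weights dominate the file.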
setup.sh and predict.sh are the two files used by the Yandex.Contest system to run your submission.
setup.sh should install any extra required libraries missing from the Y.Contest environment, and predict.sh is run with two arguments: the first is the path to the source data (like contest/data/eval/public_subset) and the second is the .json file where the predictions should be written.
To make a submission, simply archive the whole baseline directory and send it to the Y.Contest system for online evaluation:
cd baseline
zip -r ../submission.zip ./*
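Before uploading, it may be worth checking that the archive has the expected layout. The sketch below assumes that setup.sh and predict.sh live directly in the baseline directory and therefore must end up at the top level of the zip, which is what the commands above produce. The helper name `missing_entry_points` is hypothetical.

```python
import zipfile

# Files the contest system runs; assumed to sit at the archive root.
REQUIRED = ("setup.sh", "predict.sh")


def missing_entry_points(archive_path, required=REQUIRED):
    """Return the required files that are not at the top level of the zip."""
    with zipfile.ZipFile(archive_path) as archive:
        # Normalize a possible leading "./" so both spellings are accepted.
        names = {
            name[2:] if name.startswith("./") else name
            for name in archive.namelist()
        }
    return [name for name in required if name not in names]
```

An empty list from `missing_entry_points("../submission.zip")` means both scripts were found at the root. A non-empty list usually indicates that the directory itself was archived (for example `baseline/setup.sh`) instead of its contents.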
The list of all packages installed in Yandex Contest, along with their versions, can be
found in packages-versions.txt