This repository provides code for participating in The Algonauts Project 2023 Challenge. For more details, please refer to our solution paper and the challenge's official page.
[Our paper] [The challenge's official page]
| Model | Size | Num. of parameters | Link |
|---|---|---|---|
| EVA01-CLIP | huge | 1.0B | Paper |
| EVA02-CLIP | base | 0.086B | |
| EVA02-CLIP | large | 0.3B | |
| EVA02-CLIP | enormous | 4.4B | |
| ConvNeXt | xxlarge | 0.85B | Paper |
| ONE-PEACE | N/A | 1.5B | Paper |
| InternImage | giant | 3.0B | Paper |
```bash
git clone https://github.com/suyamat/ScalingVisionEncoder
cd ScalingVisionEncoder
conda create -n scaling_vis_enc python=3.8
conda activate scaling_vis_enc
pip install -r requirements.txt
echo -e 'DATA_DIR=data\nPYTHONPATH=./' > .env
```
Place the challenge's data in the `data` directory, following this structure: `data/resp/subj01/...`
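The scripts can then resolve all paths from the `DATA_DIR` entry written into `.env` above. As a minimal sketch of that mechanism (assuming the code reads `.env` with python-dotenv, which is our assumption rather than something stated here):

```python
# Minimal sketch of resolving DATA_DIR at runtime. Assumes .env is loaded
# with python-dotenv; that assumption is ours, not stated in this README.
import os
from dotenv import load_dotenv

load_dotenv()  # reads DATA_DIR (and PYTHONPATH) from .env in the repo root
data_dir = os.getenv("DATA_DIR", "data")
print(os.path.join(data_dir, "resp", "subj01"))  # -> data/resp/subj01
```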
For instance, to extract features from "EVA02-CLIP-large" with 4 GPUs (1 node x 4 GPUs), you can run:

```bash
python -m encoding.scripts.extract_features \
    --model_name "eva02-clip-large" \
    --subject_name "all" \
    --skip "1" \
    --n_device "4" \
    --batch_size "128"
```
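Layer-wise feature extraction of this kind is typically implemented with forward hooks. The sketch below shows only that generic mechanism, not the actual extractor in `encoding.scripts.extract_features`; torchvision's ResNet-50 stands in for the real encoders, and random weights suffice for the demonstration.

```python
# Generic forward-hook pattern for collecting intermediate activations.
# ResNet-50 is a stand-in encoder; the mechanism itself is model-agnostic.
import torch
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# One hook per top-level block (conv1, layer1, ..., avgpool, fc).
for name, module in model.named_children():
    module.register_forward_hook(save_output(name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # one dummy image

print({name: tuple(t.shape) for name, t in features.items()})
```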
For instance, to search for the optimal combination of layer and kernel size for max pooling, utilizing all layers, all kernel sizes, and 100% of the samples, you can run:

```bash
python -m encoding.scripts.search_hparams \
    --model_name "eva02-clip-large" \
    --subject_name "all" \
    --layer_start "1" \
    --layer_step "1" \
    --layer_end "24" \
    --kernel_start "1" \
    --kernel_step "1" \
    --kernel_end "16" \
    --use_ratio "1.0"
```
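Conceptually, this sweep asks which encoder layer, max-pooled at which kernel size, yields features that best predict fMRI responses under a linear encoding model. A minimal, self-contained sketch of that idea on random stand-in data follows; the shapes, the ridge regressor, and all variable names are illustrative assumptions, not this repo's implementation:

```python
# Layer x kernel-size grid search for max pooling, sketched with random data.
# Assumes per-layer feature maps of shape (N, C, H, W) and responses (N, V).
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
responses = rng.standard_normal((200, 50))            # (N samples, V voxels)
layer_feats = {layer: rng.standard_normal((200, 64, 16, 16)).astype(np.float32)
               for layer in (1, 2, 3)}                # per-layer (N, C, H, W)

best = (None, None, -np.inf)
for layer, feats in layer_feats.items():
    for kernel in (1, 2, 4, 8, 16):                   # subset of 1..16 for brevity
        x = F.max_pool2d(torch.from_numpy(feats), kernel_size=kernel)
        x = x.flatten(1).numpy()                      # (N, C * H' * W')
        score = cross_val_score(Ridge(alpha=1.0), x, responses,
                                cv=3, scoring="r2").mean()
        if score > best[2]:
            best = (layer, kernel, score)

print(f"best layer={best[0]}, kernel={best[1]}, cv r2={best[2]:.3f}")
```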
For example, to make final predictions using EVA02-CLIP-large, you can run:

```bash
python -m encoding.scripts.predict \
    --model_name "eva02-clip-large" \
    --subject_name "all"
```
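The final step then presumably refits the encoding model with the selected hyperparameters on all training data and predicts responses for the test stimuli. A rough sketch on random stand-in data, with hypothetical file and variable names (the actual logic lives in `encoding.scripts.predict`):

```python
# Refit ridge on all training samples with the chosen features, then predict
# the test split. All data is random stand-in; names are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x_train = rng.standard_normal((200, 1024))  # pooled features, train
y_train = rng.standard_normal((200, 50))    # fMRI responses, train
x_test = rng.standard_normal((40, 1024))    # pooled features, test

model = Ridge(alpha=1.0).fit(x_train, y_train)
pred = model.predict(x_test).astype(np.float32)
np.save("subj01_pred.npy", pred)            # hypothetical output file
print(pred.shape)                           # (40, 50)
```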
Our code is built upon the following repositories. We would like to extend our gratitude to the contributors of these excellent codebases.