Huiwon Jang1 · Dongyoung Kim1 · Junsu Kim1 · Jinwoo Shin1 · Pieter Abbeel2 · Younggyo Seo1,3

1KAIST  2UC Berkeley  3Dyson Robot Learning Lab
- We note that torch versions > 2.0 may work, but a conda install with the versions below is recommended.
```bash
conda create -n rsp python=3.9.12 -y
conda activate rsp
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
```
```bash
sh data_preprocessing/download.sh
sh data_preprocessing/extract.sh
```
- We assume the root directory for the data is `$DATA_ROOT = /data/kinetics400`.
- If you want to change the root directory, please change `root_dl` in `download.sh` and `extract.sh`.
- We resize the videos to 256x256 for efficient loading during training.

```bash
python data_preprocessing/make_256scale.py --datadir $DATA_ROOT
```
- We additionally provide code to filter out videos that fail to load.

```bash
python data_preprocessing/make_labels.py --datadir $DATA_ROOT --filedir train2
```
```
/data/kinetics400
|-- train2
|   |-- abseiling
|   |   |-- xx.mp4
|   |   |-- ...
|   |-- air_drumming
|   |   |-- xx.mp4
|   |   |-- ...
|   |-- ...
|-- labels
|   |-- label_full_1.0.pickle
```
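The exact logic lives in `data_preprocessing/make_labels.py`; purely as an illustration, here is a minimal sketch of how such a label mapping could be built from the directory layout above (the `build_labels` helper and the zero-byte check are our own assumptions, not the repository's code):

```python
import os

def build_labels(data_root):
    """Hypothetical sketch: map each .mp4 under train2/<class>/ to an
    integer class label, skipping zero-byte files as a stand-in for
    filtering out videos that fail to load."""
    train_dir = os.path.join(data_root, "train2")
    classes = sorted(
        d for d in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, d))
    )
    class_to_idx = {c: i for i, c in enumerate(classes)}
    labels = {}
    for c in classes:
        class_dir = os.path.join(train_dir, c)
        for name in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, name)
            if name.endswith(".mp4") and os.path.getsize(path) > 0:
                labels[path] = class_to_idx[c]
    return labels
```

The resulting dict could then be serialized with `pickle.dump` into `labels/label_full_1.0.pickle`.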
- Note that `[N_NODE] x [BATCH_SIZE_PER_GPU] x [ACCUM_ITER]` must equal 1536 to reproduce our results.
- Default: `[DATA_PATH]=/data/kinetics400`
```bash
python -m torch.distributed.launch --nproc_per_node=[N_NODE] main_pretrain_rsp.py \
    --batch_size [BATCH_SIZE_PER_GPU] \
    --accum_iter [ACCUM_ITER] \
    --model rsp_vit_small_patch16 \
    --epochs 400 \
    --warmup_epochs 40 \
    --data_path [DATA_PATH] \
    --log_dir [LOG_DIR] \
    --output_dir [LOG_DIR] \
    --norm_pix_loss \
    --repeated_sampling 2
```
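To pick concrete values satisfying the `1536` constraint, a small helper (our own, not part of the repository) can solve for `[ACCUM_ITER]` given the GPU count and per-GPU batch size; for example, 8 GPUs with a per-GPU batch of 96 needs `--accum_iter 2`:

```python
def accum_iter_for(target=1536, n_gpus=8, batch_per_gpu=96):
    """Solve for [ACCUM_ITER] so that
    n_gpus * batch_per_gpu * accum_iter == target (the effective batch)."""
    per_step = n_gpus * batch_per_gpu
    assert target % per_step == 0, "batch config must divide the target"
    return target // per_step
```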
We provide the pre-trained checkpoint below:
The evaluation code is mainly built upon DINO.
- Step 1: Dataset preparation
We note that the default root path is `[DATA_ROOT]=/data`. Additionally, we resize the DAVIS frames (480 pixels tall with varying widths) to 480x880 for a natural evaluation with patches.
```bash
sh data_preprocessing/eval/davis_download.sh
python data_preprocessing/eval/davis_preprocessing.py --data_root [DATA_ROOT]
```
```
[DATA_ROOT]/DAVIS_480_880
|-- Annotations/480p
|   |-- bear
|   |   |-- 00000.png
|   |   |-- ...
|   |-- ...
|-- ImageSets/2017/val.txt
|-- JPEGImages/480p
|   |-- bear
|   |   |-- 00000.jpg
|   |   |-- ...
|   |-- ...
```
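As a sanity check on the 480x880 choice: both dimensions divide evenly by the ViT-S/16 patch size of 16, giving a 30x55 patch grid, which is what makes patch-level evaluation "natural". The helper below is our own illustration of that arithmetic:

```python
def patch_grid(height, width, patch=16):
    """Return the (rows, cols) ViT patch grid; both frame dimensions
    must be divisible by the patch size."""
    assert height % patch == 0 and width % patch == 0, "not patch-aligned"
    return height // patch, width // patch
```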
- Step 2: Video object segmentation
```bash
python eval_video_segmentation_davis.py \
    --finetune [LOG_DIR]/checkpoint-199.pth \
    --output_dir [LOG_DIR]/davis_seg \
    --data_path [DATA_ROOT]/DAVIS_480_880 \
    --topk 7 --size_mask_neighborhood 30 --n_last_frames 30 \
    --model vit_small
```
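As a rough illustration of what `--size_mask_neighborhood` controls: following the DINO evaluation protocol, label propagation is restricted to a local spatial window on the patch grid. The helper below is our own sketch of that restriction (row-major patch indices on the 55-wide grid from the 480x880 frames), not the repository's implementation:

```python
def in_neighborhood(idx_a, idx_b, grid_w=55, radius=30):
    """Illustrative: True if patches idx_a and idx_b (row-major indices
    on the patch grid) are within `radius` patches of each other along
    both axes, mimicking --size_mask_neighborhood."""
    row_a, col_a = divmod(idx_a, grid_w)
    row_b, col_b = divmod(idx_b, grid_w)
    return abs(row_a - row_b) <= radius and abs(col_a - col_b) <= radius
```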
- Step 3: Evaluate the obtained segmentation
```bash
git clone https://github.com/davisvideochallenge/davis2017-evaluation
python ./davis2017-evaluation/evaluation_method.py \
    --task semi-supervised \
    --results_path [LOG_DIR]/davis_seg \
    --davis_path [DATA_ROOT]/DAVIS_480_880
```
TBD
We will update the evaluation code at https://github.com/huiwon-jang/RSP/tree/eval_cortexbench.