
Rethinking matching-based few-shot action recognition

[arXiv] [project page]

This repository contains the official code for our paper Rethinking matching-based few-shot action recognition.

What do we have here?

  1. Installation

  2. Data preparation

  3. Model zoo

  4. Evaluating a pre-trained model

    1. On pre-saved episodes
    2. General use-case
  5. Train a model

  6. Scripts summary

  7. Citation

Installation

This code is based on the TSL [1] and TRX [2] repositories. It requires Python >= 3.8.

You can find the installation script below:

Code
ROOT_REPO_DIR=<path_to_the_root_folder>
cd ${ROOT_REPO_DIR}
git clone https://github.com/xianyongqin/few-shot-video-classification.git
git clone https://github.com/tobyperrett/few-shot-action-recognition.git
git clone https://github.com/jbertrand89/temporal_matching.git
cd temporal_matching

python -m venv ENV
source ENV/bin/activate
pip install torch torchvision==0.12.0  # torchvision 0.12.0 pairs with torch 1.11.x
pip install tensorboard
pip install einops
pip install ffmpeg
pip install pandas

or, equivalently:

Code
pip install -r requirements.txt
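
To check that the environment resolves before running any script, you can run a quick import test (a minimal sanity check, not part of the original setup):

Code
# Sanity check (optional): verify the main dependencies import cleanly.
python - <<'EOF'
import torch, torchvision, einops, pandas
print("torch", torch.__version__, "| torchvision", torchvision.__version__)
EOF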

Data preparation

For more details on the datasets, please refer to DATA_PREPARATION.

Model zoo

We saved the scripts and the pretrained models evaluated in the paper in MODEL_ZOO.

The following sections detail each step.

Evaluating a pre-trained model

On pre-saved episodes

To reproduce the paper numbers, you first need to download the pre-saved test episodes and the pre-trained models (see MODEL_ZOO). Then, to run inference for a given matching function on pre-saved episodes, you need to specify:

  • ROOT_TEST_EPISODE_DIR (as defined in Download test episodes)
  • CHECKPOINT_DIR (as defined in Model ZOO)
  • ROOT_REPO_DIR (as defined in Installation)
  • MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
  • SHOT (number of examples per class, either 1 or 5)
  • DATASET (one of ssv2/kinetics/ucf101)

Then run the evaluation-on-saved-episodes script. The script differs depending on the matching function, so please refer to the scripts summary to find the one you need. For example, with Chamfer++ matching, run:

Code
ROOT_TEST_EPISODE_DIR=<your_path>
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_REPO_DIR=<your_repo_dir>
MATCHING_NAME=chamfer++
SHOT=1
DATASET=ssv2

TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate  # ENV is the name of the environment

for SEED in 1 5 10
do
  MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}.pt
  
  python run_matching.py \
  --num_gpus 1 \
  --num_workers 1 \
  --backbone r2+1d_fc \
  --feature_projection_dimension 1152 \
  --method matching-based \
  --matching_function chamfer \
  --video_to_class_matching joint \
  --clip_tuple_length 3 \
  --shot ${SHOT} \
  --way 5 \
  -c ${CHECKPOINT_DIR} \
  -r -m ${MODEL_NAME} \
  --load_test_episodes \
  --test_episode_dir ${ROOT_TEST_EPISODE_DIR} \
  --dataset_name ${DATASET}
done

python average_multi_seeds.py --result_dir ${CHECKPOINT_DIR} --result_template ${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed --seeds 1 5 10
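
To fill a whole row of the results tables, you can sweep the documented DATASET and SHOT values around the block above. A minimal sketch, assuming you saved that block as a standalone script (hypothetically named eval_from_episodes.sh) that reads DATASET and SHOT from its first two arguments:

Code
# Hypothetical driver: eval_from_episodes.sh is assumed to be the block
# above, modified to read DATASET=$1 and SHOT=$2 instead of hard-coding them.
for DATASET in ssv2 kinetics ucf101
do
  for SHOT in 1 5
  do
    bash eval_from_episodes.sh ${DATASET} ${SHOT}
  done
done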

General use-case

You may want to run inference on a new set of episodes. We provide a script to use the R(2+1)D feature loader.

You first need to download the pre-saved features and the pre-trained models (see MODEL_ZOO). Then, to run inference for a given matching function, you need to specify:

  • ROOT_FEATURE_DIR (as defined in Download pre-saved features)
  • CHECKPOINT_DIR (as defined in Model ZOO)
  • ROOT_REPO_DIR (as defined in Installation)
  • MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
  • SHOT (number of examples per class, either 1 or 5)
  • DATASET (one of ssv2/kinetics/ucf101)
  • TEST_SEED (any seed you like)

Then run the general-case evaluation script. The script differs depending on the matching function, so please refer to the scripts summary to find the one you need. For example, with Chamfer++ matching, run:

Code
ROOT_FEATURE_DIR=<your_path>
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_REPO_DIR=<your_repo_dir>
MATCHING_NAME=chamfer++
SHOT=1
DATASET=ssv2
TEST_SEED=1
TEST_DIR=${ROOT_FEATURE_DIR}/${DATASET}/test

TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate # ENV is the name of the environment

for SEED in 1 5 10
do
  MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}.pt
  
  python run_matching.py \
  --num_gpus 1 \
  --num_workers 1 \
  --backbone r2+1d_fc \
  --feature_projection_dimension 1152 \
  --method matching-based \
  --matching_function chamfer \
  --video_to_class_matching joint \
  --clip_tuple_length 3 \
  --shot ${SHOT} \
  --way 5 \
  -c ${CHECKPOINT_DIR} \
  -r -m ${MODEL_NAME} \
  --split_dirs ${TEST_DIR} \
  --split_names test \
  --split_seeds ${TEST_SEED} \
  --dataset_name ${DATASET}
done

python average_multi_seeds.py --result_dir ${CHECKPOINT_DIR} --result_template ${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed --seeds 1 5 10
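
Since TEST_SEED controls which episodes the feature loader samples, evaluating the same checkpoints on several test seeds hedges against episode-sampling noise. A minimal sketch, assuming you saved the block above as a script (hypothetically named eval_from_loader.sh) that reads TEST_SEED from its first argument:

Code
# Hypothetical driver: eval_from_loader.sh is assumed to be the block above,
# modified to read TEST_SEED=$1 instead of hard-coding it.
for TEST_SEED in 1 2 3
do
  bash eval_from_loader.sh ${TEST_SEED}
done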

Train a model

To fairly compare classifier-based and matching-based approaches, we start from frozen R(2+1)D features.

To train a model for a given matching function, you need to specify:

  • CHECKPOINT_DIR (can be different from the one defined in Model ZOO)
  • ROOT_FEATURE_DIR (as defined in Download pre-saved features)
  • ROOT_REPO_DIR (as defined in Installation)
  • MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
  • DATASET (one of ssv2/kinetics/ucf101)
  • SHOT (number of examples per class, either 1 or 5)
  • SEED (any seed you like; we chose 1/5/10)

The following hyper-parameters were tuned with Optuna; we provide the optimal value found for each method (to re-tune them yourself, see the sweep sketch after the training example below):

  • LR (one of 0.01/0.001/0.0001)
  • GLOBAL_TEMPERATURE
  • TEMPERATURE_WEIGHT

Then run the training script. The script differs depending on the matching function, so please refer to the scripts summary to find the one you need. For example, with Chamfer++ matching, run:

Code
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_FEATURE_DIR=<your_path>
ROOT_REPO_DIR=<your_repo_dir>

MATCHING_NAME=chamfer++
DATASET=ssv2
SHOT=1
SEED=1
LR=0.001  # hyper parameter tuned with optuna
GLOBAL_TEMPERATURE=100  # hyper parameter tuned with optuna
TEMPERATURE_WEIGHT=0.1  # hyper parameter tuned with optuna

TRAIN_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/train
VAL_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/val
TEST_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/test

MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}
CHECKPOINT_DIR_TRAIN=${CHECKPOINT_DIR}/${MODEL_NAME}
rm -r ${CHECKPOINT_DIR_TRAIN}  # remove any previous run with the same name

TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate # ENV is the name of the environment

python run_matching.py \
--dataset_name ${DATASET} \
--tasks_per_batch 1 \
--num_gpus 1 \
--num_workers 1 \
--shot ${SHOT} \
--way 5 \
--query_per_class 1 \
--num_test_tasks 10000 \
--num_val_tasks 10000 \
-c ${CHECKPOINT_DIR_TRAIN} \
--train_split_dir ${TRAIN_FEATURE_DIR} \
--val_split_dir ${VAL_FEATURE_DIR} \
--test_split_dir ${TEST_FEATURE_DIR} \
--train_seed ${SEED} \
--val_seed ${SEED} \
--test_seed 1 \
--seed ${SEED} \
-lr ${LR} \
--matching_global_temperature ${GLOBAL_TEMPERATURE} \
--matching_global_temperature_fixed \
--matching_temperature_weight ${TEMPERATURE_WEIGHT} \
--backbone r2+1d_fc \
--feature_projection_dimension 1152 \
--method matching-based \
--matching_function chamfer \
--video_to_class_matching joint \
--clip_tuple_length 3
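
If you prefer to re-tune the learning rate rather than reuse the provided optimum, below is a minimal sweep sketch over the three candidate values listed above. It reuses the exact command from the training block; the _lr${LR} suffix on the checkpoint directory is an assumption to keep runs separate, not a repository convention:

Code
# Minimal LR sweep sketch; assumes the variables from the block above are set.
for LR in 0.01 0.001 0.0001
do
  # The _lr${LR} suffix is an assumption to keep the runs separate.
  CHECKPOINT_DIR_TRAIN=${CHECKPOINT_DIR}/${MODEL_NAME}_lr${LR}
  python run_matching.py \
  --dataset_name ${DATASET} \
  --tasks_per_batch 1 \
  --num_gpus 1 \
  --num_workers 1 \
  --shot ${SHOT} \
  --way 5 \
  --query_per_class 1 \
  --num_test_tasks 10000 \
  --num_val_tasks 10000 \
  -c ${CHECKPOINT_DIR_TRAIN} \
  --train_split_dir ${TRAIN_FEATURE_DIR} \
  --val_split_dir ${VAL_FEATURE_DIR} \
  --test_split_dir ${TEST_FEATURE_DIR} \
  --train_seed ${SEED} \
  --val_seed ${SEED} \
  --test_seed 1 \
  --seed ${SEED} \
  -lr ${LR} \
  --matching_global_temperature ${GLOBAL_TEMPERATURE} \
  --matching_global_temperature_fixed \
  --matching_temperature_weight ${TEMPERATURE_WEIGHT} \
  --backbone r2+1d_fc \
  --feature_projection_dimension 1152 \
  --method matching-based \
  --matching_function chamfer \
  --video_to_class_matching joint \
  --clip_tuple_length 3
done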

Scripts summary

The following table summarizes the scripts for evaluating and training the following models:

  • our method: Chamfer++
  • prior work: TSL [1], OTAM [3], TRX [2], ViSiL [4]
  • useful baselines:
    • mean
    • max
    • diagonal
    • linear
Table

| Matching method | Evaluation on saved episodes | Evaluation, general case | Training |
| --- | --- | --- | --- |
| tsl | from_episodes | N/A | N/A |
| mean | from_episodes | from_loader | train |
| max | from_episodes | from_loader | train |
| chamfer++ | from_episodes | from_loader | train |
| diagonal | from_episodes | from_loader | train |
| linear | from_episodes | from_loader | train |
| otam | from_episodes | from_loader | train |
| trx | from_episodes | from_loader | train |
| visil | from_episodes | from_loader | train |

Citation

Coming soon.

References

[1] Xian et al. Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation

[2] Perrett et al. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

[3] Cao et al. Few-Shot Video Classification via Temporal Alignment

[4] Kordopatis-Zilos et al. ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning
