sangho-vision/avbert

Parameter Efficient Multimodal Transformers for Video Representation Learning

This repository contains the code and models for our ICLR 2021 paper:

Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song
[paper] [poster] [slides]

@inproceedings{lee2021avbert,
    title="{Parameter Efficient Multimodal Transformers for Video Representation Learning}",
    author={Sangho Lee and Youngjae Yu and Gunhee Kim and Thomas Breuel and Jan Kautz and Yale Song},
    booktitle={ICLR},
    year=2021
}

System Requirements

  • Python >= 3.7.6
  • FFmpeg 4.3.1
  • GPUs supporting CUDA >= 10.1, with at least 24 GB of memory
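You can check the requirements above before installing anything heavy. The following is a minimal sketch and is not part of the repository. The helper names (`version_at_least`, `check_environment`) are hypothetical. It only verifies that an `ffmpeg` binary is on `PATH`, not its exact version. The GPU check runs only if PyTorch is already installed.

```python
import shutil
import sys

# Minimums taken from the System Requirements list; helper names are hypothetical.
MIN_PYTHON = (3, 7, 6)
MIN_GPU_MEMORY_GB = 24


def version_at_least(actual, minimum):
    """Compare version tuples element-wise, e.g. (3, 8, 0) >= (3, 7, 6)."""
    return tuple(actual) >= tuple(minimum)


def check_environment():
    """Return {requirement: True/False}, or None where it cannot be checked yet."""
    results = {
        "python": version_at_least(sys.version_info[:3], MIN_PYTHON),
        # Only checks that some `ffmpeg` binary is on PATH, not that it is 4.3.1.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch  # available only after the Installation step
    except ImportError:
        results["gpu_memory"] = None
    else:
        if torch.cuda.is_available():
            total_bytes = torch.cuda.get_device_properties(0).total_memory
            # Decimal GB: cards sold as "24 GB" report slightly less than 24 GiB.
            results["gpu_memory"] = total_bytes >= MIN_GPU_MEMORY_GB * 10**9
        else:
            results["gpu_memory"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```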

Installation

  1. Install PyTorch 1.6.0, torchvision 0.7.0, and torchaudio 0.6.0 for your environment by following the instructions HERE.

  2. Install other required packages.

pip install -r requirements.txt
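After installation, you can confirm the pinned versions with the helper below. It is hypothetical and not part of the repository. It assumes installed versions may carry a local build suffix such as `+cu101` and ignores that suffix when comparing. `importlib.metadata` requires Python 3.8+, so on Python 3.7 you would need the `importlib_metadata` backport package instead.

```python
# Pinned versions from the Installation step; `find_mismatches` is a hypothetical helper.
PINNED = {"torch": "1.6.0", "torchvision": "0.7.0", "torchaudio": "0.6.0"}


def base_version(version):
    """Drop a local build suffix, e.g. '1.6.0+cu101' -> '1.6.0'."""
    return version.split("+", 1)[0]


def find_mismatches(get_version, pinned=PINNED):
    """Return {package: installed_version_or_None} for packages that differ from the pins."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = get_version(package)
        except Exception:  # e.g. importlib.metadata.PackageNotFoundError
            installed = None
        if installed is None or base_version(installed) != expected:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    from importlib.metadata import version  # Python 3.8+

    print(find_mismatches(version) or "All pinned packages match.")
```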

Download Data

python download_ucf101.py
python download_esc50.py
python download_ks.py
python download_checkpoint.py
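You can also run the four download scripts in one go. The sketch below is a hypothetical convenience wrapper and is not part of the repository. It assumes you launch it from the directory that contains the scripts. It uses the current interpreter and stops at the first script that exits with an error, so later steps never run against partially downloaded data.

```python
import subprocess
import sys

# The four download scripts listed above, in the order given.
DOWNLOAD_SCRIPTS = [
    "download_ucf101.py",
    "download_esc50.py",
    "download_ks.py",
    "download_checkpoint.py",
]


def build_commands(scripts=DOWNLOAD_SCRIPTS):
    """Build one argv list per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_all(scripts=DOWNLOAD_SCRIPTS):
    """Run each script in turn; check=True raises CalledProcessError on the first failure."""
    for cmd in build_commands(scripts):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```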

Experiments

To run experiments on a single GPU:

UCF101 (split: 1, 2 or 3)

cd code
python run_net.py \
    --cfg_file configs/ucf101/config.yaml \
    --configuration ucf101 \
    --pretrain_checkpoint_path checkpoints/checkpoint.pyth \
    TRAIN.DATASET_SPLIT <split> \
    TEST.DATASET_SPLIT <split>
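The trailing `TRAIN.DATASET_SPLIT <split>` and `TEST.DATASET_SPLIT <split>` arguments override values loaded from `config.yaml`. The README does not describe how they are parsed. PySlowFast merges such KEY VALUE pairs with yacs' `CfgNode.merge_from_list`, and the sketch below assumes `run_net.py` does the same. It illustrates the idea on a plain nested dict. `apply_overrides` is a hypothetical name, and unlike yacs it does not type-check values against the existing config.

```python
import ast


def apply_overrides(cfg, opts):
    """Apply trailing KEY VALUE pairs (e.g. ["TRAIN.DATASET_SPLIT", "2"]) to a nested dict.

    Hypothetical, simplified stand-in for yacs' CfgNode.merge_from_list: keys are
    dotted paths into the config, and values are parsed as Python literals when possible.
    """
    if len(opts) % 2 != 0:
        raise ValueError(f"Overrides must come in KEY VALUE pairs, got {opts!r}")
    for key, raw_value in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:
            node = node[part]  # KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"Unknown config key: {key}")
        try:
            value = ast.literal_eval(raw_value)  # "2" -> 2, "True" -> True
        except (ValueError, SyntaxError):
            value = raw_value  # keep plain strings such as paths unchanged
        node[leaf] = value
    return cfg
```

For example, applying `["TRAIN.DATASET_SPLIT", "2", "TEST.DATASET_SPLIT", "2"]` to `{"TRAIN": {"DATASET_SPLIT": 1}, "TEST": {"DATASET_SPLIT": 1}}` sets both splits to the integer `2`. This is why the commands pass the same split to both keys.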

ESC-50 (split: 1, 2, 3, 4 or 5)

cd code
python run_net.py \
    --cfg_file configs/esc50/config.yaml \
    --configuration esc50 \
    --pretrain_checkpoint_path checkpoints/checkpoint.pyth \
    TRAIN.DATASET_SPLIT <split> \
    TEST.DATASET_SPLIT <split>

Kinetics-Sounds

cd code
python run_net.py \
    --cfg_file configs/kinetics-sounds/config.yaml \
    --configuration kinetics-sounds \
    --pretrain_checkpoint_path checkpoints/checkpoint.pyth
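Running every split by hand is repetitive. The sketch below builds the same `run_net.py` commands shown above for each valid split (1 to 3 for UCF101, 1 to 5 for ESC-50, none for Kinetics-Sounds) and prints them. It is a hypothetical helper and is not part of the repository. It assumes the commands are executed from the `code` directory with the downloaded checkpoint at `checkpoints/checkpoint.pyth`, exactly as in the commands above.

```python
import shlex

# Valid splits per dataset, from the headings above; Kinetics-Sounds takes no split.
SPLITS = {"ucf101": [1, 2, 3], "esc50": [1, 2, 3, 4, 5], "kinetics-sounds": [None]}
CHECKPOINT = "checkpoints/checkpoint.pyth"


def build_command(dataset, split=None):
    """Return the run_net.py argv for one dataset/split, mirroring the commands above."""
    if split not in SPLITS[dataset]:
        raise ValueError(f"Invalid split {split!r} for {dataset}")
    cmd = [
        "python", "run_net.py",
        "--cfg_file", f"configs/{dataset}/config.yaml",
        "--configuration", dataset,
        "--pretrain_checkpoint_path", CHECKPOINT,
    ]
    if split is not None:
        # The same split is used for training and testing, as in the commands above.
        cmd += ["TRAIN.DATASET_SPLIT", str(split), "TEST.DATASET_SPLIT", str(split)]
    return cmd


if __name__ == "__main__":
    # Print every command; run them from the `code` directory (e.g. via subprocess.run).
    for dataset, splits in SPLITS.items():
        for split in splits:
            print(" ".join(shlex.quote(arg) for arg in build_command(dataset, split)))
```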

After the paper submission, we further tuned the hyperparameters and obtained the following results.

Dataset           Top-1 Accuracy (%)   Top-5 Accuracy (%)
UCF101            87.5                 97.4
ESC-50            85.9                 96.9
Kinetics-Sounds   85.8                 97.8
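Top-1 and Top-5 accuracy give the percentage of test samples whose ground-truth class is the single highest-scoring prediction, or among the five highest-scoring predictions, respectively. The generic sketch below computes Top-k accuracy from per-class scores. It is not the repository's evaluation code, which may, for example, aggregate scores over several clips per video before ranking. `topk_accuracy` is a hypothetical name.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per sample; labels: list of int class ids.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores; the stable sort breaks ties by lower class index.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)
```

For example, with scores `[[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]` and labels `[2, 0]`, Top-1 accuracy is 50.0 and Top-2 accuracy is 100.0.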

Acknowledgments

This source code is based on PySlowFast.
