ETAD: A Unified Framework for Efficient Temporal Action Detection

This repository holds the official implementation of the paper "ETAD: A Unified Framework for Efficient Temporal Action Detection", accepted at the CVPR 2023 workshop.

Temporal action detection (TAD) with end-to-end training often suffers from a huge demand for computing resources due to the long duration of videos. In this work, we propose an efficient temporal action detector (ETAD) that can be trained directly from video frames with extremely low GPU memory consumption. Our main idea is to minimize and balance the heavy computation on features and gradients in each training iteration. We propose to sequentially forward the snippet frames through the video encoder, and to back-propagate only a small, necessary portion of the gradients to update the encoder. To further alleviate computational redundancy, we dynamically sample only a small subset of proposals during training. Moreover, various sampling strategies and ratios are studied for both the encoder and the detector. ETAD achieves state-of-the-art performance on TAD benchmarks with remarkable efficiency. On ActivityNet-1.3, ETAD reaches 38.25% average mAP after 18 hours of end-to-end training, with only 1.3 GB of memory consumed per video.
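
The sequential-encoding idea can be pictured with a short sketch. The following is a minimal, PyTorch-style illustration only, assuming a generic video_encoder and a list of snippet tensors; the names and the sampling ratio are placeholders and do not correspond to the actual code in this repository.

import torch

def encode_video(video_encoder, snippets, grad_ratio=0.3):
    # Encode snippets one by one; keep gradients for only a sampled subset,
    # so activations for the remaining snippets can be freed immediately.
    num_snippets = len(snippets)
    num_with_grad = max(1, int(grad_ratio * num_snippets))
    grad_idx = set(torch.randperm(num_snippets)[:num_with_grad].tolist())

    features = []
    for i, snippet in enumerate(snippets):              # snippet: (C, T, H, W) frames
        if i in grad_idx:
            feat = video_encoder(snippet.unsqueeze(0))  # keeps the graph for backprop
        else:
            with torch.no_grad():                       # no graph, activations released
                feat = video_encoder(snippet.unsqueeze(0))
        features.append(feat)
    # assuming the encoder returns a (1, feat_dim) vector per snippet
    return torch.cat(features, dim=0)                   # (num_snippets, feat_dim) sequence

# The detector side follows the same spirit: the training loss is computed on
# only a sampled subset of proposals rather than on all of them (not shown here).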

Updates

  • 12/03/2023: We have released our code and pretrained models for the ActivityNet experiments.

Installation

Step 1. Clone the repository

git clone git@github.com:sming256/ETAD.git
cd ETAD

Step 2. Install PyTorch 2.0.1, Python 3.10.12, and CUDA 11.8

conda create -n etad python=3.10.12
source activate etad
conda install pytorch=2.0.1 torchvision=0.15.2 pytorch-cuda=11.8 -c pytorch -c nvidia

Step 3. Install mmaction2 for end-to-end training

pip install openmim
mim install mmcv==2.0.1
mim install mmaction2==1.1.0
pip install numpy==1.23.5
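
A quick sanity check of the environment (not part of the repository) before moving on:

import torch, mmcv, mmaction
print(torch.__version__)             # expect 2.0.1
print(torch.cuda.is_available())     # expect True with the CUDA 11.8 build
print(mmcv.__version__)              # expect 2.0.1
print(mmaction.__version__)          # expect 1.1.0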

To Reproduce Our Results on ActivityNet 1.3

End-to-End Experiment

Download the ActivityNet videos

  • Note that we are not allowed to redistribute the videos without a license agreement. You can download the raw ActivityNet videos from the official website.
  • We downsample the videos to 15 fps and resize the shorter side to 256 (a preprocessing sketch is given after this list). If you find it hard to prepare the videos, you can email shuming.liu@kaust.edu.sa to obtain them under a license agreement.
  • Change VIDEO_PATH to the path of your videos.
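
As a reference for video preparation, here is a small, hedged sketch that calls ffmpeg from Python to resample to 15 fps and resize the shorter side to 256. It is not part of this repository; the folder names are placeholders, ffmpeg must be installed, and the scale options assume a reasonably recent ffmpeg build.

import subprocess
from pathlib import Path

SRC_DIR = Path("raw_videos")        # placeholder: folder with the downloaded videos
DST_DIR = Path("videos_15fps_256")  # placeholder: point VIDEO_PATH at this folder
DST_DIR.mkdir(parents=True, exist_ok=True)

for src in sorted(SRC_DIR.glob("*.mp4")):
    dst = DST_DIR / src.name
    subprocess.run([
        "ffmpeg", "-y", "-i", str(src),
        # scale so the shorter side becomes 256, keep aspect ratio, keep even dimensions
        "-vf", "scale=256:256:force_original_aspect_ratio=increase:force_divisible_by=2",
        "-r", "15",                 # resample to 15 fps
        str(dst),
    ], check=True)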

Download the backbone weights

  • Download the pretrained weights for the R(2+1)D backbone and move them to pretrained/r2plus1d_34-tsp_on_activitynet-max_gvf-backbone_lr_0.0001-fc_lr_0.002-epoch_5-0d2cf854.pth.

Training

  • python tools/train.py configs/anet/e2e_anet_tsp_snippet0.3.py 1
  • The trailing 1 means training with 1 GPU.
  • The end-to-end experiment takes about 18 hours and uses no more than 10 GB of GPU memory for training.

Inference

  • python tools/test.py configs/anet/e2e_anet_tsp_snippet0.3.py 1
  • Testing takes around 45 minutes.

Evaluation

  • python tools/post.py configs/anet/e2e_anet_tsp_snippet0.3.py

Feature-based Experiment

Download the TSP features

Training

  • python tools/train.py configs/anet/feature_anet_tsp.py 1
  • The feature-based experiment is fast (about 6 minutes on our workstation).

Testing and Evaluation

  • python tools/test.py configs/anet/feature_anet_tsp.py 1 && python tools/post.py configs/anet/feature_anet_tsp.py

Pretrained Models

You can download the pretrained models from this link. To run inference with our checkpoint, simply run:

python tools/test.py configs/anet/e2e_anet_tsp_snippet0.3.py 1 --checkpoint e2e_anet_snippet0.3_bs4_92e98.pth.tar
python tools/post.py configs/anet/e2e_anet_tsp_snippet0.3.py

The results on ActivityNet (with the CUHK classifier) should be:

mAP (%) at tIoU         0.5     0.75    0.95    Avg
ETAD - TSP - Feature    54.96   39.06    9.21   37.80
ETAD - TSP - E2E        56.22   39.93   10.23   38.73

You can also download our logs and results from Google Drive.

Contact

If you have any questions about our work, please contact Shuming Liu (shuming.liu@kaust.edu.sa).

References

If you are using our code, please consider citing our paper.

@inproceedings{liu2023etad,
  title={ETAD: Training Action Detection End to End on a Laptop},
  author={Liu, Shuming and Xu, Mengmeng and Zhao, Chen and Zhao, Xu and Ghanem, Bernard},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4524--4533},
  year={2023}
}

If you are using TSP features, please cite

@inproceedings{alwassel2021tsp,
  title={{TSP}: Temporally-sensitive pretraining of video encoders for localization tasks},
  author={Alwassel, Humam and Giancola, Silvio and Ghanem, Bernard},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops},
  pages={3173--3183},
  year={2021}
}
