PointTAD [NeurIPS 2022]

This repository holds the code for the paper "PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points", accepted at NeurIPS 2022.

[Paper Link] [Zhihu]

News

[Jan. 10, 2023] Fixed some bugs and typos; updated best checkpoints for both multi-label benchmarks.

[Dec. 13, 2022] Released the code and checkpoints for MultiTHUMOS and Charades.

Overview

This paper presents a query-based framework for multi-label temporal action detection, namely PointTAD, that leverages a set of learnable query points to handle both boundary frames and action semantic keyframes for finer action representation. Our model takes RGB input only and streamlines an end-to-end trainable framework for easy deployment. PointTAD surpasses previous multi-label TAD works by a large margin under detection-mAP and achieves comparable results under segmentation-mAP.

Dependencies
Data Preparation
Checkpoints
Testing
Training

Dependencies

PyTorch 1.8.1 or higher, opencv-python, scipy, terminaltables, ruamel-yaml, ffmpeg

Run pip install -r requirements.txt to install the dependencies.
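
To confirm that the environment is ready before running anything, a quick check along the following lines may help. This is a minimal sketch and not part of the repository: the import names (cv2 for opencv-python, ruamel.yaml for ruamel-yaml) are the usual ones for these packages, ffmpeg is assumed to be the command-line tool and is looked up on PATH, and the function name find_missing is hypothetical. It does not check the PyTorch version.

```python
import importlib.util
import shutil

# Import names assumed for the packages listed above
# (opencv-python is imported as cv2, ruamel-yaml as ruamel.yaml).
REQUIRED_MODULES = ["torch", "cv2", "scipy", "terminaltables", "ruamel.yaml"]


def find_missing(modules=REQUIRED_MODULES, tools=("ffmpeg",)):
    """Return the modules and command-line tools that cannot be found."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. "ruamel") is absent.
            found = False
        if not found:
            missing.append(name)
    # shutil.which returns None when the executable is not on PATH.
    missing += [tool for tool in tools if shutil.which(tool) is None]
    return missing


if __name__ == "__main__":
    missing = find_missing()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```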

Data Preparation

To prepare the RGB frames and corresponding annotations (paths below are relative to the repository root):

Clone the repository, then cd PointTAD and mkdir data.
For MultiTHUMOS:
- Download the raw videos of THUMOS14 from here and put them into /data/thumos14_videos;
- Extract the RGB frames from the raw videos using util/extract_frames.py. The frames will be placed in /data/multithumos_frames;
- Generate multithumos_frames.json for the extracted frames with /util/generate_frame_dict.py and put the JSON file into the /datasets folder.
For Charades:
- Download the RGB frames of Charades from here and place them in /data/charades_v1_rgb.
Finally, replace the frame folder path or image tensor path in /datasets/dataset_cfg.yml.
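
The schema of multithumos_frames.json is defined by util/generate_frame_dict.py and is not described in this README. Purely to illustrate the idea of a frame dictionary, the sketch below assumes a hypothetical folder layout (one sub-folder of .jpg frames per video, grouped by split) and a hypothetical schema (video name mapped to frame count). The function names are made up for this example; use the repository script for real data.

```python
import json
from pathlib import Path


def build_frame_dict(frames_root, pattern="*.jpg"):
    """Map each video folder name to the number of frames it contains.

    Assumed layout (hypothetical): <frames_root>/<split>/<video_name>/<frame>.jpg
    """
    frame_dict = {}
    for split_dir in sorted(Path(frames_root).iterdir()):
        if not split_dir.is_dir():
            continue
        for video_dir in sorted(split_dir.iterdir()):
            if video_dir.is_dir():
                frame_dict[video_dir.name] = len(list(video_dir.glob(pattern)))
    return frame_dict


def save_frame_dict(frame_dict, out_path):
    """Write the dictionary as JSON (hypothetical schema: name -> frame count)."""
    with open(out_path, "w") as f:
        json.dump(frame_dict, f, indent=2)
```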

The structure of data/ is displayed as follows:

|-- data
|   |-- thumos14_videos
|   |   |-- training
|   |   |-- testing
|   |-- multithumos_frames
|   |   |-- training
|   |   |-- testing
|   |-- charades_v1_rgb
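
Before launching a run, it can be worth checking that this layout is in place. The following is a minimal sketch, assuming it is run from the repository root; the folder names come from the tree above, while the helper name missing_dirs is hypothetical and not part of the repository.

```python
from pathlib import Path

# Folders taken from the tree above, relative to the data/ directory.
EXPECTED_DIRS = [
    "thumos14_videos/training",
    "thumos14_videos/testing",
    "multithumos_frames/training",
    "multithumos_frames/testing",
    "charades_v1_rgb",
]


def missing_dirs(data_root="data", expected=EXPECTED_DIRS):
    """Return the expected sub-folders that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_dirs():
        print("missing:", rel)
```

If you only work with one of the two datasets, pass a shortened list as the expected argument.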

[Optional] Once you have the raw frames, you can convert them into tensors with /util/frames2tensor.py to speed up IO. With --img_tensor enabled in train.sh and test.sh, the model reads image tensors instead of frames.

Checkpoints

The best checkpoint for each benchmark is provided in the links below. We report error bars for each benchmark in the supplementary material of our paper.

Dataset	mAP@0.2	mAP@0.5	mAP@0.7	Avg-mAP	Checkpoint
MultiTHUMOS	39.70%	24.90%	12.04%	23.46%	Link
Charades	17.45%	13.46%	9.14%	12.13%	Link
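
The numbers in the column headers (0.2, 0.5, 0.7) are understood here to be temporal IoU (tIoU) thresholds, as is standard for detection-mAP: a predicted segment can only match a ground-truth action if their tIoU reaches the threshold. As a reminder of the quantity involved, here is a minimal generic sketch of tIoU between two segments. It is an illustration only, not the repository's evaluation code, and the function name is hypothetical.

```python
def temporal_iou(seg_a, seg_b):
    """tIoU of two (start, end) segments given in the same time unit."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Length of the overlap, clipped at zero for disjoint segments.
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0
```

For example, segments (0, 10) and (5, 15) overlap for 5 units out of a 15-unit union, giving a tIoU of about 0.33, which would count as a match at threshold 0.2 but not at 0.5.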

Testing

Use test.sh to evaluate:

MultiTHUMOS:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset multithumos --eval --load multithumos_best.pth

Charades:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset charades --eval --load charades_best.pth

Training

Use train.sh to train PointTAD:

MultiTHUMOS:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset multithumos

Charades:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=11302 --use_env main.py --dataset charades
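
The training and testing commands differ only in the dataset name and the --eval --load pair, so they can be generated from a few parameters. The sketch below does that using only the flags that appear in the commands above; the helper names are hypothetical and not part of the repository. Whether the default configuration behaves well with fewer than 8 GPUs is an assumption to verify, since batch size and learning rate may need adjusting.

```python
import shlex


def build_launch_command(dataset, num_gpus=8, master_port=11302, checkpoint=None):
    """Return (env, argv) for a training run, or for an evaluation run
    when a checkpoint path is given. Flags mirror the commands above."""
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in range(num_gpus))}
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        f"--master_port={master_port}",
        "--use_env", "main.py",
        "--dataset", dataset,
    ]
    if checkpoint is not None:
        argv += ["--eval", "--load", checkpoint]
    return env, argv


def as_shell_line(env, argv):
    """Format the command as a single shell line for copy-pasting."""
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    print(as_shell_line(*build_launch_command("multithumos")))
    print(as_shell_line(*build_launch_command("charades", checkpoint="charades_best.pth")))
```

With the defaults, the first printed line reproduces the MultiTHUMOS training command and the second reproduces the Charades testing command.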

Acknowledgements

The codebase is built on top of RTD-Net, DETR, Sparse R-CNN, AFSD and E2ETAD. We thank the authors for making their code available.

Citations

If you find our work useful, please cite our paper:

@inproceedings{
	tan2022pointtad,
	title={Point{TAD}: Multi-Label Temporal Action Detection with Learnable Query Points},
	author={Jing Tan and Xiaotong Zhao and Xintian Shi and Bin Kang and Limin Wang},
	booktitle={Advances in Neural Information Processing Systems},
	editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
	year={2022},
	url={https://openreview.net/forum?id=_r8pCrHwq39}
}

Contacts

Jing Tan: jtan@smail.nju.edu.cn
