Post-Processing Temporal Action Detection

Sauradip Nag¹,²,⁺, Xiatian Zhu¹,³, Yi-Zhe Song¹,², Tao Xiang¹,²

¹CVSSP, University of Surrey, UK ²iFlyTek-Surrey Joint Research Center on Artificial Intelligence, UK
³Surrey Institute for People-Centred Artificial Intelligence, UK

⁺corresponding author

Accepted to CVPR 2023

Paper | Project Page | Slides | Poster

Updates

(June, 2023) We released the GAP inference code as an IPython notebook for all datasets.
(Mar, 2023) GAP was accepted to CVPR 2023.

Summary

First Non-Learnable Refinement module for Temporal Action Detection.
Gaussian-based post-processing for action start and end points.
Taylor expansion helps the model resolve boundary ambiguity at the sub-snippet level.
Can be used as a plug-and-play module for both training and inference.
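
The Taylor-expansion step can be written out as a short derivation. This is a sketch that assumes the boundary score p(x) around its discrete peak m is Gaussian with unknown mean μ; the symbols p, P, m, μ, σ and c are notation chosen here for illustration and may differ from the paper.

```latex
\begin{aligned}
p(x) &\propto \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad
P(x) = \ln p(x) = c - \frac{(x-\mu)^2}{2\sigma^2}, \\
0 &= P'(\mu) \approx P'(m) + P''(m)\,(\mu - m)
\;\;\Longrightarrow\;\;
\mu \approx m - \frac{P'(m)}{P''(m)} .
\end{aligned}
```

Here m is the integer snippet index with the highest score and μ is the sub-snippet boundary location. Under the Gaussian assumption P is quadratic, so the expansion of P' is exact; in practice P'(m) and P''(m) would have to be estimated from neighbouring scores, for example by finite differences.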

Abstract

Existing Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence, before temporal boundary estimation and action classification. This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution. In essence, this is due to a temporal quantization error introduced during the resolution downsampling and recovery. This could negatively impact the TAD performance, but is largely ignored by existing methods. To address this problem, in this work we introduce a novel model-agnostic post-processing method without model redesign and retraining. Specifically, we model the start and end points of action instances with a Gaussian distribution for enabling temporal boundary inference at a sub-snippet level. We further introduce an efficient Taylor-expansion based approximation, dubbed as Gaussian Approximated Post-processing (GAP). Extensive experiments demonstrate that our GAP can consistently improve a wide variety of pre-trained off-the-shelf TAD models on the challenging ActivityNet (+0.2% to +0.7% in average mAP) and THUMOS (+0.2% to +0.5% in average mAP) benchmarks. Such performance gains are already significant and highly comparable to those achieved by novel model designs. Also, GAP can be integrated with model training for further performance gain. Importantly, GAP enables lower temporal resolutions for more efficient inference, facilitating low-resource applications.
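
To make the sub-snippet idea concrete, here is a minimal Python sketch of a Gaussian/Taylor style refinement of a single boundary peak. It is not the code from this repository: the function name `refine_peak`, its parameters, the use of central finite differences, and the clamping to half a snippet are all assumptions made for illustration.

```python
import math
from typing import Sequence


def refine_peak(scores: Sequence[float], peak: int, eps: float = 1e-10) -> float:
    """Shift an integer peak index to a fractional (sub-snippet) location.

    Assumes the scores around `peak` follow a Gaussian, so their logarithm is
    a parabola whose maximum sits at the Gaussian mean.
    """
    # A peak on the sequence border has no neighbour on one side: leave it.
    if peak <= 0 or peak >= len(scores) - 1:
        return float(peak)

    # Log-scores at the peak and its two neighbours (clamped to avoid log(0)).
    left, centre, right = (
        math.log(max(scores[i], eps)) for i in (peak - 1, peak, peak + 1)
    )

    first = 0.5 * (right - left)          # central difference, 1st derivative
    second = right - 2.0 * centre + left  # central difference, 2nd derivative

    # Only a concave neighbourhood has a well-defined maximum.
    if second >= 0.0:
        return float(peak)

    # One Newton-style step on the log-scores: mu = m - P'(m) / P''(m).
    offset = -first / second

    # Assumption: never move further than half a snippet from the peak.
    return peak + max(-0.5, min(0.5, offset))
```

For scores sampled from an exact Gaussian, the log-scores are quadratic and the finite differences are exact, so the function recovers the true mean up to floating-point error. On real, noisy score sequences it only gives an approximation.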

Architecture

Getting Started

Requirements

Python 3.7
PyTorch 1.9.0 (please make sure your PyTorch version is at least 1.8)
NVIDIA GPU
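
One way to check the PyTorch requirement before running anything is a small version comparison. The helper below is hypothetical (`meets_minimum` is not part of this repository or of PyTorch); it only parses a version string, so it can be tried without PyTorch installed.

```python
import re


def meets_minimum(version: str, minimum=(1, 8)) -> bool:
    """Return True if a version string such as '1.9.0+cu111' is >= `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Compare (major, minor) as integers so that 1.10 ranks above 1.8.
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)
```

With PyTorch installed, the check would be `meets_minimum(torch.__version__)` after `import torch`.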

How to use

Just paste the code from the script into the inference file of any standard Temporal Action Detection codebase, and it should work.
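
As a rough illustration of where such a step could sit in an inference file, the sketch below refines each proposal's start and end index and converts them to seconds. Everything here is an assumption rather than this repository's API: the function name `apply_refinement`, the `(start_idx, end_idx, score)` proposal layout, the `refine` callable (any function mapping a score sequence and an integer index to a fractional index), and the index-to-seconds convention.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical refiner type: (boundary scores, integer index) -> fractional index.
Refiner = Callable[[Sequence[float], int], float]


def apply_refinement(
    proposals: Sequence[Tuple[int, int, float]],
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    duration: float,
    refine: Refiner,
) -> List[Tuple[float, float, float]]:
    """Refine (start_idx, end_idx, score) proposals and map them to seconds."""
    # Assumed convention: snippet index i corresponds to time i * duration / T.
    seconds_per_snippet = duration / len(start_scores)

    refined = []
    for start_idx, end_idx, score in proposals:
        start = refine(start_scores, start_idx)
        end = refine(end_scores, end_idx)
        # Fall back to the unrefined indices if refinement inverts the segment.
        if end <= start:
            start, end = float(start_idx), float(end_idx)
        refined.append(
            (start * seconds_per_snippet, end * seconds_per_snippet, score)
        )
    return refined


# Usage with a placeholder refiner that leaves every index unchanged.
identity = lambda scores, index: float(index)
segments = apply_refinement([(2, 6, 0.9)], [0.0] * 10, [0.0] * 10, 20.0, identity)
```

Passing the refiner as an argument keeps the detector's own code untouched, which matches the plug-and-play intent: the model's proposals and boundary scores go in, refined segments come out.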

TO-DO Checklist

Create a version implemented on BMN
Use GAP in the training code

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{nag2023post,
  title={Post-Processing Temporal Action Detection},
  author={Nag, Sauradip and Zhu, Xiatian and Song, Yi-Zhe and Xiang, Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18837--18845},
  year={2023}
}
