mondalanindya/MSQNet

MSQNet

Official implementation of "Actor-agnostic Multi-label Action Recognition with Multi-modal Query", accepted at ICCV Workshops 2023.

Authors

Anindya Mondal*, Sauradip Nag*, Joaquin M Prada, Xiatian Zhu, Anjan Dutta*.

[CVF Open Access] [Poster] [arXiv] [Video]

Abstract

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%.
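The abstract describes representing each action class by a query built from both visual and textual (class-name) information, decoding those queries DETR-style against video features, and scoring classes independently so that several actions can co-occur. Below is a minimal, dependency-free Python sketch of that general idea, not MSQNet's actual code or architecture. Everything specific here is a hypothetical simplification chosen for illustration: the helper names (`build_queries`, `cross_attend`, `multilabel_predict`), the fusion rule (a fixed `alpha` blend of each class's text embedding with the mean-pooled video embedding), the single attention step without learned projections, the shared linear `scorer`, and the 0.5 threshold. The text and video encoders are replaced by hand-written toy vectors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def build_queries(text_embs: List[Vector], video_tokens: List[Vector],
                  alpha: float = 0.5) -> List[Vector]:
    # HYPOTHETICAL fusion rule: one query per action class, blending that
    # class's text (class-name) embedding with the mean-pooled video embedding.
    n = len(video_tokens)
    pooled = [sum(col) / n for col in zip(*video_tokens)]
    return [add(scale(t, 1.0 - alpha), scale(pooled, alpha)) for t in text_embs]


def cross_attend(queries: List[Vector], tokens: List[Vector]) -> List[Vector]:
    # One single-head scaled dot-product cross-attention step (no learned
    # projections), with a residual connection back to each query.
    d = len(tokens[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        context = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
        out.append(add(q, context))
    return out


def multilabel_predict(text_embs: List[Vector], video_tokens: List[Vector],
                       scorer: Vector, bias: float,
                       threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    decoded = cross_attend(build_queries(text_embs, video_tokens), video_tokens)
    # Independent sigmoid per class (not a softmax), so several actions can be
    # predicted for the same clip.
    probs = [sigmoid(dot(scorer, h) + bias) for h in decoded]
    labels = [c for c, p in enumerate(probs) if p >= threshold]
    return probs, labels


# Toy inputs standing in for encoder outputs (2-D features, 3 action classes).
text_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
video_tokens = [[2.0, 0.0], [2.0, 0.0]]
probs, labels = multilabel_predict(text_embs, video_tokens, scorer=[1.0, 0.0], bias=-2.8)
print([round(p, 3) for p in probs], labels)
```

With these toy inputs, classes 0 and 1 both pass the threshold, and the three probabilities sum to more than 1. Because each class is scored with its own sigmoid rather than a shared softmax, the model is not forced to pick a single action, which is the multi-label setting the abstract targets.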

Implementation

See the linked folder in this repository for implementation details.

If you find our work useful, please consider citing:


@InProceedings{Mondal_2023_ICCV,
    author    = {Mondal, Anindya and Nag, Sauradip and Prada, Joaquin M and Zhu, Xiatian and Dutta, Anjan},
    title     = {Actor-Agnostic Multi-Label Action Recognition with Multi-Modal Query},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {784--794}
}