hanzif1/HM-RAG

Multimodal Long Video Reasoning via Hierarchical Multi-Agent Retrieval-Augmented Generation

Jisheng Dang1, Quan Wan1, Dewei Liu1, Ziyue Wang1, Bimei Wang, Pei Liu3, Hong Peng1, Bin Hu1, Tat-Seng Chua4

1Lanzhou University, 2The Hong Kong University of Science and Technology, 3The Hong Kong University of Science and Technology, 4National University of Singapore

TL;DR: We propose a Hierarchical Multi-Agent RAG framework that decomposes complex video queries, retrieves external knowledge, and aggregates answers for robust video understanding.


VideoRAG addresses the limitations of current LLMs in video-text alignment and long-horizon reasoning. By coordinating specialized agents hierarchically, our framework effectively fuses internal temporal understanding with external knowledge retrieval.

Framework Overview

🤖 Core Agents

Our approach coordinates three specialized agent roles:

  1. Question Decomposition Agent: Reformulates complex/ambiguous queries into structured sub-tasks.
  2. Multi-source Reasoning Agents:
    • Web Agent: Retrieves external open-world knowledge.
    • Memory-based Agent: Captures long-range temporal dependencies within videos.
  3. Answer Aggregation Agent: Synthesizes results, resolves contradictions, and generates the final prediction.
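The control flow implied by the list above can be sketched roughly as follows. This is only an illustrative outline, not the released implementation: every name here (`Evidence`, `run_pipeline`, the `toy_*` stand-ins) is hypothetical, and the stub agents simply echo strings where the real agents would presumably call a language model, a web search backend, or a video memory module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Evidence:
    source: str        # name of the reasoning agent that produced the answer
    sub_question: str  # the sub-task the agent was asked to solve
    answer: str        # the agent's answer for that sub-task


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],
    reasoners: Dict[str, Callable[[str], str]],
    aggregate: Callable[[str, List[Evidence]], str],
) -> str:
    """Decompose the query, query every reasoning agent, then aggregate."""
    sub_questions = decompose(question)
    evidence = [
        Evidence(name, sub_q, agent(sub_q))
        for sub_q in sub_questions
        for name, agent in reasoners.items()
    ]
    return aggregate(question, evidence)


# Toy stand-ins (assumptions for illustration only).
def toy_decompose(question: str) -> List[str]:
    # Split on " and " as a placeholder for real query decomposition.
    return [part.strip() for part in question.split(" and ") if part.strip()]


def toy_web_agent(sub_question: str) -> str:
    return f"web:{sub_question}"


def toy_memory_agent(sub_question: str) -> str:
    return f"memory:{sub_question}"


def toy_aggregate(question: str, evidence: List[Evidence]) -> str:
    # A real aggregator would resolve contradictions; this one just joins.
    return " | ".join(item.answer for item in evidence)


if __name__ == "__main__":
    print(
        run_pipeline(
            "who appears first and what happens next",
            toy_decompose,
            {"web": toy_web_agent, "memory": toy_memory_agent},
            toy_aggregate,
        )
    )
```

Passing the agents in as plain callables keeps the orchestration independent of any particular model or retrieval backend, which is one reasonable way to make each role swappable.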

🔥 News

  • 2025.12.3 🚧 Initial release of the multi-agent framework.

🏆 Performance on Benchmarks

We compare our method with state-of-the-art models on four challenging benchmarks. With only 2B parameters, our framework is smaller than every baseline except FrozenBiLM (1.2B), yet it achieves the highest accuracy on all four benchmarks.

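| Method | Size | Acc@MME | Acc@QA | Acc@MVB | Acc@MLVU |
| --- | --- | --- | --- | --- | --- |
| FrozenBiLM | 1.2B | 32.5 | 48.6 | 31.0 | - |
| Video-ChatGPT | 7B | 38.5 | 55.2 | 33.8 | 39.4 |
| Otter | 9B | 45.3 | 59.1 | 40.5 | 41.2 |
| mPLUG-Owl | 7B | 48.6 | 56.5 | 51.4 | 46.2 |
| MovieChat | 7B | 46.5 | 58.2 | 46.8 | 48.1 |
| LLaMA-VID | 7B | 42.1 | 57.8 | 41.3 | 43.5 |
| TinyLLaVA | 3B | 44.2 | 58.1 | 45.5 | 44.8 |
| LLaVA-Phi | 2.7B | 42.5 | 56.4 | 43.1 | 41.2 |
| ST-LLM | 7B | 50.1 | 59.6 | 51.9 | 49.8 |
| VILA-2.7B | 2.7B | 48.9 | 60.5 | 49.2 | 51.0 |
| Ours | 2B | 53.26 | 66.62 | 52.8 | 62.3 |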
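
To make the size of the gains concrete, the short script below computes, for each benchmark, the strongest baseline and the margin by which our result exceeds it. The numbers are copied from the table above; the variable and function names are illustrative only and are not part of the repository.

```python
# Accuracy values copied from the results table.
# None marks the missing MLVU entry for FrozenBiLM.
BENCHMARKS = ["MME", "QA", "MVB", "MLVU"]

BASELINES = {
    "FrozenBiLM": [32.5, 48.6, 31.0, None],
    "Video-ChatGPT": [38.5, 55.2, 33.8, 39.4],
    "Otter": [45.3, 59.1, 40.5, 41.2],
    "mPLUG-Owl": [48.6, 56.5, 51.4, 46.2],
    "MovieChat": [46.5, 58.2, 46.8, 48.1],
    "LLaMA-VID": [42.1, 57.8, 41.3, 43.5],
    "TinyLLaVA": [44.2, 58.1, 45.5, 44.8],
    "LLaVA-Phi": [42.5, 56.4, 43.1, 41.2],
    "ST-LLM": [50.1, 59.6, 51.9, 49.8],
    "VILA-2.7B": [48.9, 60.5, 49.2, 51.0],
}

OURS = [53.26, 66.62, 52.8, 62.3]


def margins_over_best_baseline():
    """Return {benchmark: (best baseline name, our margin in points)}."""
    result = {}
    for i, benchmark in enumerate(BENCHMARKS):
        # Skip baselines that do not report this benchmark.
        scores = {
            method: values[i]
            for method, values in BASELINES.items()
            if values[i] is not None
        }
        best = max(scores, key=scores.get)
        result[benchmark] = (best, round(OURS[i] - scores[best], 2))
    return result


if __name__ == "__main__":
    for benchmark, (best, margin) in margins_over_best_baseline().items():
        print(f"{benchmark}: +{margin} over {best}")
```

On these numbers the margins over the strongest baseline are 3.16 points on MME (vs. ST-LLM), 6.12 on QA (vs. VILA-2.7B), 0.9 on MVB (vs. ST-LLM), and 11.3 on MLVU (vs. VILA-2.7B).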

Todo

  1. Release the code

🛠️ Installation

  1. Clone the repository
git clone https://github.com/hanzif1/videoRAG.git
cd videoRAG
