StreamingCoT is the first dataset explicitly designed for temporally evolving reasoning in streaming Video Question Answering (VideoQA) and multimodal Chain-of-Thought (CoT) tasks. Addressing critical limitations in current VideoQA benchmarks, StreamingCoT features:
- Dynamic temporal understanding: Captures evolving answers in video streams
- Explicit reasoning chains: Provides annotated multimodal reasoning paths
- Temporal dependency modeling: Tracks semantic evolution across video timelines
- Spatiotemporal grounding: Links reasoning steps to visual evidence
This dataset establishes a new foundation for research in streaming video understanding, complex temporal reasoning, and multimodal inference.
- 5,745 high-quality short videos (≤60 seconds)
- Global representation through stratified geographic sampling
- Rigorous multimodal filtering:
  - Social validation (>5,000 interactions)
  - Lexical density constraints
  - HD resolution (≥720p)
  - Motion dynamics analysis
  - Aesthetic scoring (≥7/10)
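Applied programmatically, the screening above might look like the following sketch. The metadata field names are hypothetical, and the lexical-density and motion-dynamics checks are omitted for brevity:

```python
# Hypothetical screening sketch: thresholds follow the criteria listed
# above, but the metadata schema itself is an assumption.

def passes_screening(video: dict) -> bool:
    """Apply the multimodal filtering criteria to one video's metadata."""
    return (
        video["duration_sec"] <= 60          # short videos only (<=60 s)
        and video["interactions"] > 5000     # social validation
        and video["height"] >= 720           # HD resolution (>=720p)
        and video["aesthetic_score"] >= 7.0  # aesthetic score out of 10
    )
```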
- Per-second dense captions aligned with visual content
- Adaptive temporal segmentation via Dynamic Semantic Fusion (DSF)
- Context-aware narration generation with inter-segment coherence
- Expert-validated semantic completeness and temporal alignment
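The Dynamic Semantic Fusion procedure itself is not specified here. As a rough sketch of adaptive temporal segmentation, one can greedily merge consecutive per-second captions whenever adjacent captions are sufficiently similar, using surface string similarity as a simple stand-in for the semantic comparison a real implementation would use:

```python
from difflib import SequenceMatcher

def segment_captions(captions: list, threshold: float = 0.6) -> list:
    """Greedily merge consecutive per-second captions into segments.

    Sketch only: string similarity substitutes for semantic embeddings.
    Returns a list of (start_sec, end_sec) tuples, end inclusive.
    """
    segments = [[0, 0]]
    for i in range(1, len(captions)):
        sim = SequenceMatcher(None, captions[i - 1], captions[i]).ratio()
        if sim >= threshold:
            segments[-1][1] = i      # same scene: extend current segment
        else:
            segments.append([i, i])  # semantic shift: start a new segment
    return [tuple(s) for s in segments]
```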
- 6 specialized question types:
  - Cumulative counting
  - Periodic pattern recognition
  - Sequential step recognition
  - State duration measurement
  - Object state recognition
  - Clue-revealing responses
- Distractor-aware option design targeting temporal misperceptions
- Human-verified temporal consistency and answer validity
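For concreteness, a question item under this design might be structured as follows. This is a purely hypothetical schema (the released field names may differ); cumulative counting is shown because it is the clearest case of an answer that evolves with the stream:

```python
# Hypothetical item schema, illustrative only.
qa_item = {
    "video_id": "VIDEO_ID",
    "question_type": "cumulative_counting",
    "question": "How many times has the ball been thrown so far?",
    "timestamps": [10, 25, 40],   # seconds at which the answer changes
    "answers": ["2", "4", "5"],   # answer valid from each timestamp on
    "options": ["2", "3", "4", "5"],
    "distractors": ["3"],         # targets a temporal misperception
}
```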
- Spatiotemporally grounded reasoning chains:
  - Temporally-aware CoT initialization
  - Key object extraction and spatial grounding
  - Multimodal reasoning fusion
- Iterative human validation protocol ensuring:
  - Spatiotemporal consistency
  - Temporal causality
  - Evidence completeness
  - Answer derivation soundness
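Two of the validation criteria above lend themselves to automatic pre-checks before human review. The sketch below assumes a hypothetical chain format (a list of steps, each with a timestamp and an evidence list) and checks only temporal causality and evidence completeness:

```python
# Sketch of machine-checkable validation over a hypothetical chain format.
def validate_chain(steps: list) -> list:
    """Return error labels for one reasoning chain (empty if it passes)."""
    errors = []
    times = [s["timestamp"] for s in steps]
    if times != sorted(times):
        errors.append("temporal_causality")     # steps must follow video order
    if any(not s.get("evidence") for s in steps):
        errors.append("evidence_completeness")  # every step needs grounding
    return errors
```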
```
StreamingCoT/
├── bbox/                       # Per-second bounding box annotations
│   └── VIDEO_ID/               # Directory per video (YouTube ID)
│       ├── sec_0_idx_48.json   # Bounding boxes at second 0 (frame 48)
│       ├── sec_1_idx_17.json   # Second 1 annotations
│       └── ...
├── final_cot/                  # Verified reasoning chains
│   ├── VIDEO_ID.jsonl          # Final CoT in JSON Lines format
│   └── ...
├── initial_cot/                # Preliminary reasoning chains
│   ├── VIDEO_ID.jsonl          # Initial CoT annotations
│   └── ...
└── key_frames/                 # Temporally significant frames
    └── VIDEO_ID/               # Directory per video
        └── metadata.json       # Key frame positions and features
```
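Given this layout, the annotations can be loaded with a few lines of Python. The helpers below assume only the `sec_<second>_idx_<frame>.json` naming and the JSON Lines format shown above; the contents of each file are left opaque:

```python
import json
import re
from pathlib import Path

# Filename pattern from the directory tree above.
NAME_RE = re.compile(r"sec_(\d+)_idx_(\d+)\.json")

def load_bboxes(root: str, video_id: str) -> dict:
    """Map each annotated second to (frame_index, parsed annotation)."""
    out = {}
    for path in sorted(Path(root, "bbox", video_id).glob("sec_*_idx_*.json")):
        m = NAME_RE.fullmatch(path.name)
        if m:
            sec, frame = int(m.group(1)), int(m.group(2))
            out[sec] = (frame, json.loads(path.read_text()))
    return out

def load_cot(root: str, video_id: str, stage: str = "final_cot") -> list:
    """Read one reasoning chain per line from a JSON Lines file."""
    path = Path(root, stage, f"{video_id}.jsonl")
    return [json.loads(line) for line in path.open() if line.strip()]
```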
Our hierarchical annotation framework:
1. Video Collection & Filtering: YouTube API → Geographic balancing → Multimodal quality screening
2. Hierarchical Captioning: Per-second captioning → Dynamic segmentation → Context-aware narration
3. Dynamic QA Generation: Question typing → Distractor design → Temporal realignment
4. Multimodal CoT Synthesis: Keyframe selection → Object grounding → Reasoning fusion
5. Iterative Validation: Expert verification → Error taxonomy → Corrective regeneration
StreamingCoT enables research in:
- Temporal reasoning in video understanding
- Multimodal chain-of-thought development
- Streaming video question answering
- Spatiotemporally grounded inference
- Dynamic distractor analysis
- Video-based logical deduction systems
The StreamingCoT dataset and construction toolkit are available at:
https://anonymous.4open.science/
StreamingCoT is released for non-commercial research purposes. All videos are sourced from YouTube and remain subject to original content creators' rights. Users must comply with YouTube's Terms of Service.
```bibtex
@article{streamingcot2024,
  title={StreamingCoT: Advancing Temporal Reasoning in VideoQA through Dynamic Multimodal Chain-of-Thought},
  author={Anonymous},
  journal={Submitted to Preprint},
  year={2024},
  note={Dataset available at \url{https://anonymous.4open.science/}}
}
```

For dataset inquiries, please open an issue on our repository or contact the maintainers through the anonymous submission portal.