# CaptionQA Quickstart

Welcome to the CaptionQA quickstart notebook. This guide helps you verify your development environment, explore the dataset utilities bundled with the repository, and establish a baseline workflow for panoramic captioning + QA research.

## 1. Environment setup

CaptionQA uses [uv](https://docs.astral.sh/uv/) for dependency management and targets Python 3.10+. On Windows 11 PowerShell, the recommended bootstrap sequence is:

```powershell
winget install --id=astral-sh.uv -e
uv venv captionqa
.\captionqa\Scripts\Activate.ps1
uv pip install --editable .
```

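On macOS or Linux, install uv with one of the methods listed in the uv documentation instead of `winget`, and activate the environment with `source captionqa/bin/activate`. The `uv venv` and `uv pip install` commands are unchanged.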

> **Tip:** Ensure that FFmpeg is installed and available on your `PATH` before attempting to process audio/video assets.
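
The tip above can be checked from Python with `shutil.which`, which searches `PATH` for an executable. This is a minimal sketch; the helper name `find_tool` is illustrative and is not part of CaptionQA.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the absolute path of an executable on PATH, or None if absent."""
    return shutil.which(name)


ffmpeg_path = find_tool('ffmpeg')
if ffmpeg_path:
    print(f'FFmpeg found at: {ffmpeg_path}')
else:
    print('FFmpeg not found on PATH; install it before processing audio/video assets.')
```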

Run the following cell to confirm the Python version and virtual environment information inside the notebook kernel.

In [9]:
import os
import platform
import sys

print(f'Python version: {sys.version.split()[0]}')
print(f'Executable: {sys.executable}')
print(f'Platform: {platform.platform()}')
print(f'Virtual env: {os.environ.get("VIRTUAL_ENV", "<none>")}')

Python version: 3.10.1
Executable: c:\Users\willj\Documents\Coding Projects\CaptionQA\captionqa\Scripts\python.exe
Platform: Windows-10-10.0.26100-SP0
Virtual env: C:\Users\willj\Documents\Coding Projects\CaptionQA\captionqa


If the `Virtual env` field shows `<none>`, activate the `captionqa` environment (or your preferred venv) and restart the Jupyter kernel before proceeding.
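
Note that `VIRTUAL_ENV` is normally set by the activation script, so a kernel launched from the environment without activating it may still report `<none>`. A complementary check, sketched below, compares `sys.prefix` with `sys.base_prefix`; the two differ when the interpreter runs inside a venv. The function name `in_virtualenv` is hypothetical.

```python
import os
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


print(f'Inside a virtual environment: {in_virtualenv()}')
print(f'sys.prefix: {sys.prefix}')
print(f'VIRTUAL_ENV: {os.environ.get("VIRTUAL_ENV", "<none>")}')
```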

## 2. Dataset utilities

The `captionqa.data.download` module centralizes dataset metadata and provides a command-line interface. The next cell lists all datasets currently configured. This is a safe operation that does **not** download any data.

In [12]:
from pathlib import Path
import os
from captionqa.data import DATASETS

# Resolve the dataset root to the repository-level 'datasets' directory by default.
# Priority: 1) CAPTIONQA_DATASETS env var  2) repo root discovery  3) current working directory fallback
env_root = os.environ.get('CAPTIONQA_DATASETS')
if env_root:
    DATASET_ROOT = Path(env_root).expanduser().resolve()
else:
    cwd = Path.cwd()
    repo_root = None
    for p in [cwd, *cwd.parents]:
        if (p / 'pyproject.toml').exists() or (p / '.git').exists():
            repo_root = p
            break
    DATASET_ROOT = ((repo_root or cwd) / 'datasets').resolve()

print(f'Dataset root: {DATASET_ROOT}')
print('\nAvailable datasets:')
for name, task in DATASETS.items():
    print(f'{name:<10s} - {task.description}')

Dataset root: D:\CaptionQA\data

Available datasets:
360x       - Panoramic video dataset with scene descriptions, action labels, and binaural audio.
360dvd     - Dense 360° video understanding dataset for video-language modeling.
leader360v - Large-scale 360° dataset for object tracking and viewpoint-aware understanding.
360sr      - Static panoramic scene classification dataset for spatial scene context models.
avqa       - Audio-visual question answering dataset repository with preprocessing utilities.


Use the CLI from a terminal to download assets once you have been granted Hugging Face access where required:

```bash
python -m captionqa.data.download --list
python -m captionqa.data.download 360x --output datasets --dry-run
python -m captionqa.data.download leader360v --output datasets
```

The `--dry-run` option prints the operations without performing any downloads, which is useful for verifying credentials and paths.
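
To run the same commands from a notebook cell, one option is to assemble the argument list and hand it to `subprocess.run`. The sketch below uses only the flags shown above (`--output`, `--dry-run`) and defaults to a dry run so that nothing is downloaded by accident. The helper names `build_download_command` and `run_download` are hypothetical and are not part of CaptionQA.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Union


def build_download_command(
    dataset: str, output: Union[str, Path], dry_run: bool = True
) -> List[str]:
    """Assemble the CLI invocation shown above as an argument list."""
    # sys.executable keeps the call inside the notebook kernel's environment.
    cmd = [sys.executable, '-m', 'captionqa.data.download', dataset, '--output', str(output)]
    if dry_run:
        cmd.append('--dry-run')
    return cmd


def run_download(dataset: str, output: Union[str, Path], dry_run: bool = True) -> int:
    """Run the CLI and return its exit code (requires captionqa to be installed)."""
    cmd = build_download_command(dataset, output, dry_run)
    print('Running:', ' '.join(cmd))
    return subprocess.run(cmd, check=False).returncode


# Building the command has no side effects; call run_download(...) to execute it.
print(build_download_command('360x', 'datasets'))
```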

## 3. Inspecting a downloaded dataset

After downloading, you can explore the file structure programmatically. The example below lists the top-level contents of the 360x high-resolution snapshot (`360x/360x_dataset_HR`). If that directory is missing, the cell prints the download command instead.

In [13]:
from itertools import islice
from pathlib import Path
import os

# Resolve dataset root in this priority order:
# 1) DATASET_ROOT defined earlier in the notebook
# 2) CAPTIONQA_DATASETS environment variable (e.g., set to D:\CaptionQA\data)
# 3) 'datasets' at the repository root (walk up to find pyproject.toml/.git)
dataset_root = globals().get('DATASET_ROOT')
if dataset_root is None:
    env_root = os.environ.get('CAPTIONQA_DATASETS')
    if env_root:
        dataset_root = Path(env_root).expanduser().resolve()
    else:
        cwd = Path.cwd()
        repo_root = None
        for p in [cwd, *cwd.parents]:
            if (p / 'pyproject.toml').exists() or (p / '.git').exists():
                repo_root = p
                break
        dataset_root = ((repo_root or cwd) / 'datasets').resolve()

print(f'Using dataset root: {dataset_root}')

hr_root = dataset_root / '360x' / '360x_dataset_HR'
if hr_root.exists():
    entries = sorted(hr_root.iterdir())
    print(f'Found {len(entries)} items under {hr_root}:')
    for path in islice(entries, 10):
        print(' -', path.relative_to(hr_root))
    if len(entries) > 10:
        print('...')
else:
    print(f'360x high-resolution dataset not found at {hr_root}. If you have access, run:')
    print('  python -m captionqa.data.download 360x --output', dataset_root)

Using dataset root: D:\CaptionQA\data
Found 5 items under D:\CaptionQA\data\360x\360x_dataset_HR:
 - .cache
 - .gitattributes
 - binocular
 - README.md
 - TAL_annotations


Repeat the pattern for other datasets, adjusting the root path and traversal depth to match how each one is delivered (archive, Hugging Face snapshot, Google Drive folder, or Git repository).
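
Because the root-resolution logic recurs in several cells, it may be convenient to factor it into one helper. The sketch below mirrors the priority order used in this notebook (environment variable, then repository root, then current working directory). The names `find_repo_root` and `resolve_dataset_root` are hypothetical and are not part of the `captionqa` package.

```python
import os
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk upward from start until a directory holds pyproject.toml or .git."""
    for candidate in [start, *start.parents]:
        if (candidate / 'pyproject.toml').exists() or (candidate / '.git').exists():
            return candidate
    return None


def resolve_dataset_root(start: Optional[Path] = None) -> Path:
    """Resolve the root: env var first, then <repo>/datasets, then <start>/datasets."""
    env_root = os.environ.get('CAPTIONQA_DATASETS')
    if env_root:
        return Path(env_root).expanduser().resolve()
    start = (start or Path.cwd()).resolve()
    return ((find_repo_root(start) or start) / 'datasets').resolve()


print(f'Dataset root: {resolve_dataset_root()}')
```

With this in place, each later cell could begin with `dataset_root = resolve_dataset_root()` instead of repeating the lookup.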

In [None]:
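# Imports repeated so this cell also runs on its own after a kernel restart.
import os
from itertools import islice
from pathlib import Path

# Resolve dataset root (same priority order as the previous cell)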
dataset_root = globals().get('DATASET_ROOT')
if dataset_root is None:
    env_root = os.environ.get('CAPTIONQA_DATASETS')
    if env_root:
        dataset_root = Path(env_root).expanduser().resolve()
    else:
        cwd = Path.cwd()
        repo_root = None
        for p in [cwd, *cwd.parents]:
            if (p / 'pyproject.toml').exists() or (p / '.git').exists():
                repo_root = p
                break
        dataset_root = ((repo_root or cwd) / 'datasets').resolve()

def show_dir(root: Path, name: str, limit: int = 10):
    print(f'\n{name} root: {root}')
    if not root.exists():
        print(f'[missing] {name} not found. If you have access, run:')
        print('  python -m captionqa.data.download', name.lower(), '--output', dataset_root)
        return
    entries = sorted(root.iterdir())
    print(f'Found {len(entries)} items under {root}:')
    for path in islice(entries, limit):
        print(' -', path.name)
    if len(entries) > limit:
        print('...')

# 360DVD (archive stored in the dataset directory; extracted files appear here once unpacked)
dvd_root = dataset_root / '360dvd'
show_dir(dvd_root, '360dvd')

# Leader360V (HF snapshot under Leader360V subfolder)
leader_root = dataset_root / 'leader360v' / 'Leader360V'
show_dir(leader_root, 'leader360v')

# 360SR (Google Drive folder under 360SR-Challenge)
sr_root = dataset_root / '360sr' / '360SR-Challenge'
show_dir(sr_root, '360sr')

# AVQA (git repo under AVQA folder)
avqa_root = dataset_root / 'avqa' / 'AVQA'
show_dir(avqa_root, 'avqa')


360dvd root: D:\CaptionQA\data\360dvd
Found 1 items under D:\CaptionQA\data\360dvd:
 - 360DVD_dataset.zip

leader360v root: D:\CaptionQA\data\leader360v\Leader360V
Found 8 items under D:\CaptionQA\data\leader360v\Leader360V:
 - .cache
 - .gitattributes
 - assets
 - README.md
 - Sample1
 - Sample2
 - Sample3
 - Sample4

360sr root: D:\CaptionQA\data\360sr\360SR-Challenge
Found 5 items under D:\CaptionQA\data\360sr\360SR-Challenge:
 - 360SR Challenge Results -- Video.xlsx
 - archives
 - NTIRE 2023_ 360° Omnidirectional Super Resolution - Track 1 Image.xlsx
 - Ntire2023-Flickr360
 - Ntire2023-ODV360

avqa root: D:\CaptionQA\data\avqa\AVQA
Found 7 items under D:\CaptionQA\data\avqa\AVQA:
 - .git
 - .gitignore
 - data
 - HAVF
 - pics
 - preprocess
 - README.md
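
In the listing above, the 360dvd directory holds only `360DVD_dataset.zip`, so the archive has not been unpacked yet. One way to preview its members without extracting anything is the standard-library `zipfile` module, sketched below. This assumes the file is a regular zip archive; the helper name `preview_zip` is hypothetical, and the `datasets` prefix is a placeholder for your own dataset root.

```python
import zipfile
from itertools import islice
from pathlib import Path
from typing import List


def preview_zip(archive: Path, limit: int = 10) -> List[str]:
    """Return up to `limit` member names from a zip archive without extracting it."""
    with zipfile.ZipFile(archive) as zf:
        return list(islice(zf.namelist(), limit))


# File name taken from the listing above; 'datasets' stands in for your dataset root.
archive = Path('datasets') / '360dvd' / '360DVD_dataset.zip'
if archive.exists() and zipfile.is_zipfile(archive):
    for member in preview_zip(archive):
        print(' -', member)
else:
    print(f'[missing] no zip archive at {archive}')
```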
