# CaptionQA Quickstart

Welcome to the CaptionQA quickstart notebook. This guide helps you verify your development environment, explore the dataset utilities bundled with the repository, and establish a baseline workflow for panoramic captioning + QA research.

## 1. Environment setup

CaptionQA uses [uv](https://docs.astral.sh/uv/) for dependency management and targets Python 3.10+. On Windows 11 PowerShell, the recommended bootstrap sequence is:

```powershell
winget install --id Astral.Uv -e
uv venv captionqa
.\captionqa\Scripts\Activate.ps1
uv pip install --editable .
```

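If you are working on macOS or Linux, install uv using the method the uv documentation recommends for your platform (`winget` is Windows-only). The remaining commands are identical except for the activation step (`source captionqa/bin/activate`).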

> **Tip:** Ensure that FFmpeg is installed and available on your `PATH` before attempting to process audio/video assets.

Run the following cell to confirm the Python version and virtual environment information inside the notebook kernel.

In [5]:
import os
import platform
import sys

print(f'Python version: {sys.version.split()[0]}')
print(f'Executable: {sys.executable}')
print(f'Platform: {platform.platform()}')
print(f'Virtual env: {os.environ.get("VIRTUAL_ENV", "<none>")}')

Python version: 3.10.1
Executable: c:\Users\willj\Documents\Coding Projects\CaptionQA\captionqa\Scripts\python.exe
Platform: Windows-10-10.0.26100-SP0
Virtual env: C:\Users\willj\Documents\Coding Projects\CaptionQA\captionqa


If the `Virtual env` field shows `<none>`, activate the `captionqa` environment (or your preferred venv) and restart the Jupyter kernel before proceeding.
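
The checks described in this section (Python 3.10+, an active virtual environment, FFmpeg on `PATH`) can also be collected into one small preflight report. The sketch below is only an illustration: `check_environment` is a hypothetical helper, not part of the `captionqa` package, and it uses nothing beyond the standard library.

```python
import os
import shutil
import sys


def check_environment(min_python=(3, 10)):
    """Hypothetical helper: summarize interpreter, venv, and FFmpeg status."""
    return {
        # True when the running interpreter meets the minimum version.
        'python_ok': sys.version_info[:2] >= tuple(min_python),
        # None when no virtual environment is active.
        'virtual_env': os.environ.get('VIRTUAL_ENV'),
        # Full path to the ffmpeg executable, or None if it is not on PATH.
        'ffmpeg_path': shutil.which('ffmpeg'),
    }


report = check_environment()
for key, value in report.items():
    print(f'{key}: {value}')
```

A value of `None` for `virtual_env` or `ffmpeg_path` points at the step to revisit before continuing.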

## 2. Dataset utilities

The `captionqa.data.download` module centralizes dataset metadata and provides a command-line interface. The next cell lists all configured datasets through the `DATASETS` mapping imported from `captionqa.data`. This is a safe operation that does **not** download any data.

In [6]:
from pathlib import Path
from captionqa.data import DATASETS

DATASET_ROOT = Path('datasets').expanduser().resolve()
print(f'Dataset root: {DATASET_ROOT}')
print('\nAvailable datasets:')
for name, task in DATASETS.items():
    print(f'{name:<10s} - {task.description}')

Dataset root: C:\Users\willj\Documents\Coding Projects\CaptionQA\notebooks\datasets

Available datasets:
360x       - Panoramic video dataset with scene descriptions, action labels, and binaural audio.
360dvd     - Dense 360° video understanding dataset for video-language modeling.
leader360v - Large-scale 360° dataset for object tracking and viewpoint-aware understanding.
360sr      - Static panoramic scene classification dataset for spatial scene context models.
avqa       - Audio-visual question answering dataset repository with preprocessing utilities.



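Use the CLI from a terminal to download assets once you have been granted Hugging Face access where required. Because `--output datasets` is a relative path, it resolves against the terminal's working directory, whereas the notebook's `DATASET_ROOT` resolves against the kernel's working directory (`notebooks\datasets` in the output above). Run the commands from the same directory as the notebook, or pass the absolute path, so that later cells find the downloaded files: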

```bash
python -m captionqa.data.download --list
python -m captionqa.data.download 360x --output datasets --dry-run
python -m captionqa.data.download leader360v --output datasets
```

The `--dry-run` option prints the operations without performing any downloads, which is useful for verifying credentials and paths.
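
If you prefer to drive the downloader from a notebook cell or a script, the same invocation can be assembled in Python. This is a minimal sketch that uses only the flags shown above (`--output`, `--dry-run`). `build_download_command` and `run_download` are hypothetical helper names, and running the command assumes the `captionqa` package is installed in the active environment.

```python
import shlex
import subprocess
import sys


def build_download_command(dataset, output='datasets', dry_run=True):
    """Hypothetical helper: build the argument list for the downloader CLI."""
    cmd = [sys.executable, '-m', 'captionqa.data.download',
           dataset, '--output', str(output)]
    if dry_run:
        # Default to a dry run so nothing is fetched by accident.
        cmd.append('--dry-run')
    return cmd


def run_download(dataset, output='datasets', dry_run=True):
    """Run the CLI in a subprocess; raises CalledProcessError on failure."""
    return subprocess.run(build_download_command(dataset, output, dry_run),
                          check=True)


# Show the command that would be executed, without running it.
print(shlex.join(build_download_command('360x')))
```

Using `sys.executable` keeps the subprocess on the same interpreter (and therefore the same virtual environment) as the notebook kernel.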

## 3. Inspecting a downloaded dataset

After downloading, you can explore the file structure programmatically. The example below demonstrates how to enumerate the top-level contents of the 360x dataset. If the dataset is not yet downloaded, the code will notify you.

In [7]:
from itertools import islice

dataset_root = globals().get('DATASET_ROOT')
if dataset_root is None:
    from pathlib import Path
    dataset_root = Path('datasets').expanduser().resolve()

hr_root = dataset_root / '360x' / '360x_dataset_HR'
if hr_root.exists():
    entries = sorted(hr_root.iterdir())
    print(f'Found {len(entries)} items under {hr_root}:')
    for path in islice(entries, 10):
        print(' -', path.relative_to(hr_root))
    if len(entries) > 10:
        print('...')
else:
    print(f'360x high-resolution dataset not found at {hr_root}. Run the downloader once you have access.')

360x high-resolution dataset not found at C:\Users\willj\Documents\Coding Projects\CaptionQA\notebooks\datasets\360x\360x_dataset_HR. Run the downloader once you have access.


Repeat the pattern for other datasets, adjusting the root path and traversal depth to match each dataset's layout (extracted archives vs. cloned Git repositories).
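
One way to reuse the pattern is to wrap it in a small function and loop over the dataset names. The sketch below is an illustration under assumptions: `preview_directory` is a hypothetical helper, and the `datasets/<name>` layout is only confirmed above for `360x`, so the other directory names may differ on disk.

```python
from pathlib import Path


def preview_directory(root, limit=10):
    """Hypothetical helper: return (total, first `limit` names), or None if absent."""
    root = Path(root)
    if not root.is_dir():
        return None
    names = sorted(path.name for path in root.iterdir())
    return len(names), names[:limit]


# Assumption: each dataset is stored under datasets/<name>.
for name in ('360x', '360dvd', 'leader360v', '360sr', 'avqa'):
    preview = preview_directory(Path('datasets') / name)
    if preview is None:
        print(f'{name}: not downloaded')
    else:
        total, names = preview
        print(f'{name}: {total} items, e.g. {names}')
```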

## 4. Next steps

* Prototype captioning models using your framework of choice (e.g., PyTorch + Hugging Face Transformers).
* Ingest panoramic video clips and convert them into frame sequences or features suitable for model training.
* Integrate QA annotations by aligning temporal segments with generated captions.
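
As a starting point for the ingestion step, frames can be sampled from a clip by calling FFmpeg through `subprocess`. This is a rough sketch rather than the project's pipeline: `build_frame_command` and `extract_frames` are hypothetical helpers, the file names are placeholders, and the frames are written in whatever projection the source video uses, with no panoramic reprojection.

```python
import shutil
import subprocess
from pathlib import Path


def build_frame_command(video, out_dir, fps=1.0):
    """Hypothetical helper: ffmpeg arguments that sample `fps` frames per second."""
    pattern = Path(out_dir) / 'frame_%06d.jpg'
    return ['ffmpeg', '-i', str(video), '-vf', f'fps={fps}', str(pattern)]


def extract_frames(video, out_dir, fps=1.0):
    """Run ffmpeg and return the sorted list of frame paths it produced."""
    if shutil.which('ffmpeg') is None:
        raise RuntimeError('ffmpeg not found on PATH')
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_frame_command(video, out_dir, fps), check=True)
    return sorted(Path(out_dir).glob('frame_*.jpg'))


# Example call (placeholder paths):
# frames = extract_frames('clip.mp4', 'frames/clip', fps=2)
```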

This notebook will continue to evolve as the project matures. Feel free to add exploratory experiments, preprocessing utilities, and evaluation routines in subsequent sections.