# **360x Dataset EDA**

Quick exploratory checks on the 360+x panoramic dataset (stored locally under `360x/`): verify the folder structure, inspect sample entries, and capture basic summary statistics.

### Dataset context
- The [360+x project page](https://x360dataset.github.io/) describes the dataset as a **panoptic multi-modal scene understanding** corpus with 2,152 videos (8.579M frames) captured using 360° and Spectacles cameras across 17 cities in 5 countries, covering 28 scene categories spanning indoor and outdoor environments.
- Each video carries **temporal activity localization for 38 action classes**, synchronized **binaural audio** (with published delay statistics), and aligned third-person, panoramic, and binocular viewpoints to encourage cross-modal analysis.
- The downloadable packages on Hugging Face include both **high-resolution assets** (panoramic 5760×2880, binocular 2432×1216, front-view 1920×1080) and **lower-resolution variants** (panoramic/binocular 640×320, front-view 569×320), with shared JSON indices, class maps, and per-clip activity segmentation metadata.


## 1. Resolve dataset root

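Uses the same resolution logic as the quickstart notebook: prefer the `CAPTIONQA_DATASETS` environment variable; otherwise walk upward from the working directory to the repository root (the first folder containing `pyproject.toml` or `.git`) and use its `datasets/` folder; if no repository root is found, fall back to `./datasets`.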

In [1]:
from pathlib import Path
from itertools import islice
import json
import os

def resolve_dataset_root() -> Path:
    env_root = os.environ.get('CAPTIONQA_DATASETS')
    if env_root:
        return Path(env_root).expanduser().resolve()
    cwd = Path.cwd()
    for candidate in [cwd, *cwd.parents]:
        if (candidate / 'pyproject.toml').exists() or (candidate / '.git').exists():
            return (candidate / 'datasets').resolve()
    return (cwd / 'datasets').resolve()

DATASET_ROOT = resolve_dataset_root()
print(f'Dataset root: {DATASET_ROOT}')
HR_ROOT = DATASET_ROOT / '360x' / '360x_dataset_HR'
LR_ROOT = DATASET_ROOT / '360x' / '360x_dataset_LR'
print(f'HR root: {HR_ROOT}')
print(f'LR root: {LR_ROOT}')

Dataset root: D:\CaptionQA\data
HR root: D:\CaptionQA\data\360x\360x_dataset_HR
LR root: D:\CaptionQA\data\360x\360x_dataset_LR


## 2. Validate presence & summarize

Run the next cell to verify the expected folder layout and capture a few file listings. The notebook handles missing data gracefully and prints next steps if the dataset is absent.

In [4]:
def describe_directory(root: Path, label: str, limit: int = 10):
    print(f'\n{label} -> {root}')
    if not root.exists():
        print('[missing] Not found. Download with: python -m captionqa.data.download 360x --output', DATASET_ROOT)
        return
    entries = sorted(root.iterdir())
    print(f'Total entries: {len(entries)}')
    for path in islice(entries, limit):
        print(' -', path.name)
    if len(entries) > limit:
        print(' ...')

describe_directory(HR_ROOT, 'High-resolution split')
describe_directory(LR_ROOT, 'Low-resolution split')


High-resolution split -> D:\CaptionQA\data\360x\360x_dataset_HR
Total entries: 5
 - .cache
 - .gitattributes
 - binocular
 - README.md
 - TAL_annotations

Low-resolution split -> D:\CaptionQA\data\360x\360x_dataset_LR
[missing] Not found. Download with: python -m captionqa.data.download 360x --output D:\CaptionQA\data


## 3. Sample metadata (optional)

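If JSON metadata files are available, the next cell loads the first one it finds (in sorted path order) and prints the first 2,000 characters of its pretty-printed content, so long files appear truncated. Adjust the `suffix` argument if the dataset uses a different naming scheme.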

In [5]:
def load_sample_metadata(root: Path, suffix: str = '.json'):
    if not root.exists():
        print('[skip] Root missing; nothing to load.')
        return
    for path in sorted(root.rglob(f'*{suffix}')):
        print(f'Previewing {path}')
        try:
            with path.open('r', encoding='utf-8') as handle:
                snippet = json.load(handle)
        except Exception as exc:
            print('Failed to parse JSON:', exc)
            return
        print(json.dumps(snippet, indent=2)[:2000])
        return
    print('[skip] No files matching suffix found.')

load_sample_metadata(HR_ROOT)
load_sample_metadata(LR_ROOT)

Previewing D:\CaptionQA\data\360x\360x_dataset_HR\TAL_annotations\019cc67f-512f-4b8a-96ef-81f806c86ce1.json
{
  "file": {
    "1": {
      "fid": "1",
      "fname": "360_panoramic.mp4",
      "type": 4,
      "loc": 1,
      "src": ""
    }
  },
  "metadata": {
    "1_1_1": {
      "duration": [
        5.516,
        7.3905
      ],
      "action": {
        "1": "sitting"
      }
    },
    "1_2_2": {
      "duration": [
        12.312,
        17.47384
      ],
      "action": {
        "1": "drinking"
      }
    },
    "1_2_3": {
      "duration": [
        51.417,
        56.46075
      ],
      "action": {
        "1": "drinking"
      }
    },
    "1_2_4": {
      "duration": [
        235.334,
        237.98142
      ],
      "action": {
        "1": "drinking"
      }
    },
    "1_3_5": {
      "duration": [
        9.558,
        116.33559
      ],
      "action": {
        "1": "speaking"
      }
    },
    "1_3_6": {
      "duration": [
        135.211,
        167.50226

## 4. Detailed roadmap

### A. Data access & integrity audit
1. **Confirm environment authentication**
   - Verify `huggingface-cli whoami` succeeds so gated downloads remain accessible.
   - Mirror the final dataset root (HR vs LR) and capture the exact path in the notebook for reproducibility.
2. **Inventory archives & modality folders**
   - Programmatically list `panoramic/`, `binocular/`, `third_person/`, and `activity_segmentation/` directories; assert counts match published totals (2,152 full videos / 1,380 clips) once decompressed.
   - Parse `index.json` to extract per-item metadata (scene id, city, capture device, clip ids) and ensure all referenced files exist.
3. **Checksum / size validation (spot checks)**
   - Compute file sizes & optional hashes for a random stratified sample to confirm downloads are complete across modalities and resolutions.

### B. Metadata profiling
1. **Scene & geography coverage**
   - Load `index.json` into a DataFrame; summarize counts by `scene_category`, `city`, `country`, indoor/outdoor flag to verify the 28-scene, 17-city distribution.
   - Visualize distributions (bar charts, choropleth-ready tables) and highlight underrepresented categories.
2. **Action label analysis**
   - Iterate `activity_segmentation/*.json`; explode temporal segments to calculate frequency, duration, and co-occurrence of the 38 action classes.
   - Plot duration histograms and per-video action counts to reproduce/double-check dataset charts (e.g., number of actions per clip, time-of-day coverage).
3. **Clip segmentation review**
   - Confirm the 1,380 clips (~10 s each) by aggregating segment metadata, and check the total duration (~244k seconds / 67.78 hours) against the published figures.
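
The action-label step in B.2 could be sketched as follows. This is a minimal illustration, assuming the annotation files follow the structure previewed in section 3 (a `metadata` mapping whose entries hold a two-element `duration` and an `action` mapping) and assuming `duration` means `[start, end]` in seconds. The helper names `iter_segments`, `summarize_actions`, and `load_annotations` are hypothetical, not part of the dataset tooling.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path


def iter_segments(annotation: dict):
    """Yield (action, start, end) for every labelled segment in one annotation."""
    for segment in annotation.get("metadata", {}).values():
        duration = segment.get("duration") or []
        if len(duration) != 2:
            continue  # skip malformed entries rather than guess their meaning
        start, end = duration  # assumption: [start, end] in seconds
        for action in segment.get("action", {}).values():
            yield action, float(start), float(end)


def summarize_actions(annotations):
    """Return per-action segment counts and total labelled seconds."""
    counts = Counter()
    seconds = defaultdict(float)
    for annotation in annotations:
        for action, start, end in iter_segments(annotation):
            counts[action] += 1
            seconds[action] += max(0.0, end - start)
    return counts, dict(seconds)


def load_annotations(folder: Path):
    """Lazily parse every *.json file in a folder such as TAL_annotations."""
    for path in sorted(folder.glob("*.json")):
        with path.open("r", encoding="utf-8") as handle:
            yield json.load(handle)


# In-memory example copied from the first three segments of the previewed file.
sample = {"metadata": {
    "1_1_1": {"duration": [5.516, 7.3905], "action": {"1": "sitting"}},
    "1_2_2": {"duration": [12.312, 17.47384], "action": {"1": "drinking"}},
    "1_2_3": {"duration": [51.417, 56.46075], "action": {"1": "drinking"}},
}}
counts, seconds = summarize_actions([sample])
print(counts.most_common())
```

On the real data, `summarize_actions(load_annotations(HR_ROOT / 'TAL_annotations'))` would give the per-class frequencies and durations to compare against the 38 published action classes.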

### C. Video modality diagnostics
1. **Resolution & bitrate verification**
   - Use `ffprobe` (via `ffmpeg-python` or subprocess) on samples from each modality/resolution to confirm frame size, FPS, codec, audio channels.
   - Tabulate metrics to ensure panoramic videos retain 360° projection metadata (e.g., equirectangular tags).
2. **Temporal alignment checks**
   - For matching clip IDs across panoramic/front-view/binocular videos, compute start/end timestamps and verify synchronization with activity segments.
   - Overlay representative frames to inspect spatial correspondence between modalities.
3. **Quality spotlights**
   - Render thumbnails or short GIFs for a stratified sample (scene type × device) to visually inspect exposure, motion blur, and unique scenarios.
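
The resolution check in C.1 might look like the sketch below. It assumes `ffprobe` is installed and on `PATH`, and it relies on the standard fields of ffprobe's JSON output (`codec_type`, `codec_name`, `width`, `height`, `r_frame_rate`, `channels`, `sample_rate`). The helper names are hypothetical, and the payload at the bottom is hand-written for illustration, not a measurement taken from the dataset.

```python
import json
import subprocess
from fractions import Fraction
from pathlib import Path


def run_ffprobe(path: Path) -> dict:
    """Return ffprobe's JSON description of a media file (needs ffprobe on PATH)."""
    cmd = ["ffprobe", "-v", "error", "-print_format", "json",
           "-show_format", "-show_streams", str(path)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def parse_fps(rate):
    """Convert a rate string such as '30000/1001' to a float, or None if unusable."""
    try:
        return float(Fraction(rate))
    except (TypeError, ValueError, ZeroDivisionError):
        return None


def summarize_streams(probe: dict) -> dict:
    """Keep the first video and first audio stream's key properties."""
    summary = {"video": None, "audio": None}
    for stream in probe.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "video" and summary["video"] is None:
            summary["video"] = {
                "codec": stream.get("codec_name"),
                "width": stream.get("width"),
                "height": stream.get("height"),
                "fps": parse_fps(stream.get("r_frame_rate")),
            }
        elif kind == "audio" and summary["audio"] is None:
            summary["audio"] = {
                "codec": stream.get("codec_name"),
                "channels": stream.get("channels"),
                "sample_rate": stream.get("sample_rate"),
            }
    return summary


# Hand-written payload (illustrative only) shaped like ffprobe's JSON output.
fake_probe = {"streams": [
    {"codec_type": "video", "codec_name": "h264",
     "width": 5760, "height": 2880, "r_frame_rate": "30/1"},
    {"codec_type": "audio", "codec_name": "aac",
     "channels": 2, "sample_rate": "48000"},
]}
summary = summarize_streams(fake_probe)
print(summary)
```

Keeping the parsing separate from the subprocess call makes it possible to tabulate results for many files and to exercise the logic without media files at hand; for a real clip, call `summarize_streams(run_ffprobe(path))`.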

### D. Audio & binaural analysis
1. **Channel inspection**
   - Confirm audio streams are stereo/binaural; measure inter-channel delay statistics to compare with published histograms.
2. **Spectrogram profiling**
   - Generate Mel spectrograms for random clips to assess frequency coverage; store representative figures in the notebook.
3. **Cross-modal cues**
   - Correlate audio energy bursts with action segment timestamps to evaluate labeling quality.
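
For the inter-channel delay measurement in D.1, one possible approach is a bounded cross-correlation search, sketched here in plain Python. It assumes the two channels have already been decoded into equal-rate sample sequences (decoding is not shown), and the function name `estimate_delay` is hypothetical. On full-length clips a vectorized implementation would be far faster; this version only illustrates the idea.

```python
def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) that best aligns right with left.

    A positive result means the right channel trails the left channel.
    Only lags in [-max_lag, max_lag] are searched.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        start = max(0, -lag)       # keep i + lag >= 0
        stop = min(n, n - lag)     # keep i + lag < n
        score = sum(left[i] * right[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: the right channel is the left channel delayed by 3 samples.
left = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = estimate_delay(left, right, max_lag=5)
print("estimated lag in samples:", lag)
```

Dividing the lag by the sample rate converts it to seconds, which is the quantity to compare with the published delay histograms.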

### E. Feature & annotation validation
1. **Pre-computed feature parity**
   - If I3D/VGGish/ResNet-18 features are present, verify dimensionality and sample statistics; confirm number of feature files equals number of clips.
2. **Class mapping sanity checks**
   - Inspect `classes.json` to validate naming consistency between metadata and activity labels; flag missing or duplicate entries.
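
The class-mapping check in E.2 could be sketched as below. Because no `classes.json` was found in the local copy, its schema is unknown; this sketch simply assumes the class names can be extracted into a list of strings. The function name `check_class_map` and the example labels are hypothetical.

```python
from collections import Counter


def check_class_map(class_names, observed_labels):
    """Compare a class list with labels seen in annotations (case-insensitive)."""
    normalized = [name.strip().lower() for name in class_names]
    duplicates = sorted(n for n, c in Counter(normalized).items() if c > 1)
    known = set(normalized)
    observed = {label.strip().lower() for label in observed_labels}
    return {
        "duplicates": duplicates,               # listed more than once in the class map
        "unmapped": sorted(observed - known),   # annotated but absent from the class map
        "unused": sorted(known - observed),     # in the class map but never annotated
    }


# Illustrative inputs only; 'walking' and the duplicate 'Drinking' are made up.
report = check_class_map(
    ["sitting", "drinking", "speaking", "Drinking"],
    ["sitting", "drinking", "walking"],
)
print(report)
```

The `observed_labels` argument could be fed from the action names collected while iterating the annotation files, so that any label outside the published 38 classes shows up under `unmapped`.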

### F. Documentation & reproducibility
1. **Record assumptions & gaps**
   - Maintain a running log within the notebook capturing any anomalies (missing files, corrupted clips) and remediation steps.
2. **Outline next analytical directions**
   - Based on findings, prioritize deeper tasks (e.g., pose estimation feasibility, QA pair synthesis), linking them to concrete dataset evidence.


In [6]:
# A.1 Confirm environment authentication and capture dataset root
from huggingface_hub import HfApi
import os

try:
    info = HfApi().whoami()
    user = info.get('name') or info.get('email') or '<unknown>'
    print('Hugging Face auth: OK -', user)
except Exception as exc:
    # Surface the underlying error so expired tokens and network failures can be told apart.
    print('Hugging Face auth: NOT AUTHENTICATED -', exc)
    print('Hint: run "huggingface-cli login" or set HF_TOKEN before downloads.')

print('CAPTIONQA_DATASETS =', os.environ.get('CAPTIONQA_DATASETS', '<unset>'))
print('Using dataset root =', DATASET_ROOT)
print('HR root =', HR_ROOT)
print('LR root =', LR_ROOT)


Hugging Face auth: OK - Quiixotic
CAPTIONQA_DATASETS = <unset>
Using dataset root = D:\CaptionQA\data
HR root = D:\CaptionQA\data\360x\360x_dataset_HR
LR root = D:\CaptionQA\data\360x\360x_dataset_LR


### A.2 Inventory modality folders

In [7]:
from collections import defaultdict

def inventory_videos(root: Path):
    report = {}
    if not root.exists():
        return report
    candidates = ['binocular', 'panoramic', 'front_view', 'third_person', 'activity_segmentation', 'TAL_annotations']
    for name in candidates:
        path = root / name
        if path.exists():
            if name.lower() in ('tal_annotations', 'activity_segmentation'):
                count = len(list(path.glob('*.json')))
                report[name] = {'type': 'annotations', 'count': count, 'path': str(path)}
            else:
                vids = list(path.rglob('*.mp4'))
                report[name] = {'type': 'video', 'count': len(vids), 'path': str(path)}
    if not report:
        vids = list(root.rglob('*.mp4'))
        report['all_videos'] = {'type': 'video', 'count': len(vids), 'path': str(root)}
    return report

print('HR inventory:')
hr_inv = inventory_videos(HR_ROOT)
for k, v in hr_inv.items():
    print(f" - {k:<18} {v['type']:<10} {v['count']:>6} @ {v['path']}")

print('LR inventory:')
lr_inv = inventory_videos(LR_ROOT)
for k, v in lr_inv.items():
    print(f" - {k:<18} {v['type']:<10} {v['count']:>6} @ {v['path']}")

HR inventory:
 - binocular          video          55 @ D:\CaptionQA\data\360x\360x_dataset_HR\binocular
 - TAL_annotations    annotations    232 @ D:\CaptionQA\data\360x\360x_dataset_HR\TAL_annotations
LR inventory:


### A.3 Parse index/classes (if present)

In [8]:
candidates = list(HR_ROOT.rglob('index.json')) + list(HR_ROOT.rglob('classes.json'))
if not candidates:
    print('[info] No index.json or classes.json found under HR root.')
else:
    for p in candidates[:5]:
        try:
            with p.open('r', encoding='utf-8') as h:
                obj = json.load(h)
        except Exception as exc:
            print('Failed to parse', p, '->', exc)
            continue
        keys = list(obj) if isinstance(obj, dict) else (list(obj[0].keys()) if isinstance(obj, list) and obj else [])
        print(f'Parsed {p} | keys: {keys[:10]}')

[info] No index.json or classes.json found under HR root.


### A.4 Spot-check file sizes and hashes

In [9]:
import hashlib
import random

def sample_files(report: dict, n: int = 3):
    samples = {}
    for name, entry in report.items():
        if entry.get('type') == 'video':
            files = sorted(Path(entry['path']).rglob('*.mp4'))
            if files:
                k = min(n, len(files))
                samples[name] = random.sample(files, k)
    return samples

def sha256_limited(path: Path, max_bytes: int = 5 * 1024 * 1024):
    # Hash only the first max_bytes (5 MiB by default): a quick spot check, not a full-file checksum.
    h = hashlib.sha256()
    total = 0
    with path.open('rb') as f:
        while total < max_bytes:
            chunk = f.read(min(1024 * 1024, max_bytes - total))
            if not chunk:
                break
            h.update(chunk)
            total += len(chunk)
    return h.hexdigest(), total

samples = sample_files(hr_inv, n=3)
for name, files in samples.items():
    print(f'{name} samples ({len(files)}):')
    for p in files:
        try:
            size = p.stat().st_size
        except Exception as exc:
            print(' -', p, '-> stat failed:', exc)
            continue
        digest, scanned = sha256_limited(p)
        print(f' - {p.name:40s} size={size:,}B sha256[:8]={digest[:8]} scanned={scanned:,}B')

binocular samples (3):
 - clip4.mp4                                size=109,447,419B sha256[:8]=bd176b2e scanned=5,242,880B
 - clip3.mp4                                size=8,236,777B sha256[:8]=7a2c5197 scanned=5,242,880B
 - clip1.mp4                                size=217,734,288B sha256[:8]=f6a99fe4 scanned=5,242,880B
