# BraTS Data Ingestion

This notebook ingests BraTS (Brain Tumor Segmentation) sample data into a RadiObject. Run it **once** before notebooks 01-04; later runs skip ingestion if a RadiObject already exists at the configured URI.

## Overview

1. Check whether a RadiObject already exists at the configured URI (skip ingestion if so)
2. Load the raw NIfTI files from the test data
3. Split each 4D volume into one 3D volume per modality
4. Create the RadiObject with subject metadata
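
The steps above amount to a check-then-ingest guard. A minimal sketch of that control flow follows; `ingest_once`, `exists`, and `ingest` are hypothetical stand-ins for the notebook's cells and are not part of the RadiObject API:

```python
from typing import Callable


def ingest_once(
    uri: str,
    exists: Callable[[str], bool],
    ingest: Callable[[str], None],
) -> bool:
    """Run `ingest(uri)` only if `exists(uri)` is False.

    Returns True if ingestion ran, False if it was skipped.
    """
    if exists(uri):
        return False
    ingest(uri)
    return True


# Toy usage: an in-memory set stands in for real storage
store = set()
first = ingest_once("./data/brats-tutorial", store.__contains__, store.add)
second = ingest_once("./data/brats-tutorial", store.__contains__, store.add)
print(first, second)
```

The second call is a no-op, which is why re-running this notebook is safe once the data is in place.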

## Configuration

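The target URI is set in `config.py`, which also defines `S3_REGION` (used only when `BRATS_URI` starts with `s3://`). Edit it to change the target: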

```python
# S3 storage (default)
BRATS_URI = "s3://souzy-scratch/radiobject/brats-tutorial"

# Local storage alternative (uncomment to use instead of S3)
# BRATS_URI = "./data/brats-tutorial"
```
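
If editing `config.py` by hand is inconvenient, one option is to let an environment variable override the default. The sketch below is an assumption about how that could look, not the project's actual `config.py`; the `BRATS_URI` environment variable and the `resolve_brats_uri` and `is_s3_uri` helpers are hypothetical names.

```python
import os
from typing import Mapping

DEFAULT_BRATS_URI = "s3://souzy-scratch/radiobject/brats-tutorial"


def resolve_brats_uri(env: Mapping[str, str] = os.environ) -> str:
    """Return the URI from the (hypothetical) BRATS_URI env var, else the default."""
    return env.get("BRATS_URI") or DEFAULT_BRATS_URI


def is_s3_uri(uri: str) -> bool:
    """Mirror the notebook's prefix check for deciding whether S3 needs configuring."""
    return uri.startswith("s3://")


# Pass explicit mappings so the example does not depend on the real environment
local = resolve_brats_uri({"BRATS_URI": "./data/brats-tutorial"})
default = resolve_brats_uri({})
print(local, is_s3_uri(local))
print(default, is_s3_uri(default))
```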

## Setup

In [None]:
import sys
sys.path.insert(0, '..')

import tempfile
import shutil
from pathlib import Path

import numpy as np
import pandas as pd
import nibabel as nib

from config import BRATS_URI, S3_REGION
from src.radi_object import RadiObject
from src.ctx import configure, S3Config, TileConfig, SliceOrientation, CompressionConfig, Compressor

print(f"Target URI: {BRATS_URI}")

In [None]:
# Configure S3 if using S3 URI
if BRATS_URI.startswith("s3://"):
    configure(s3=S3Config(region=S3_REGION))
    print(f"S3 configured for region: {S3_REGION}")

# Configure TileDB storage
configure(
    tile=TileConfig(orientation=SliceOrientation.AXIAL),
    compression=CompressionConfig(algorithm=Compressor.ZSTD, level=3)
)

## Check if RadiObject Exists

In [None]:
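def uri_exists(uri: str) -> bool:
    """Return True if a RadiObject can be opened and read at `uri`."""
    try:
        radi = RadiObject(uri)
        _ = radi.collection_names  # Force validation by accessing group metadata
        return True
    except Exception:
        # Any failure (missing URI, but also bad credentials or network errors)
        # is treated as "does not exist", so ingestion will be attempted.
        return False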

if uri_exists(BRATS_URI):
    print(f"RadiObject already exists at {BRATS_URI}")
    print("Skipping ingestion. Delete the URI to re-ingest.")
    SKIP_INGESTION = True
else:
    print(f"No RadiObject found at {BRATS_URI}")
    print("Proceeding with ingestion...")
    SKIP_INGESTION = False

## Load Test Data

In [None]:
if not SKIP_INGESTION:
    # Sync test data from S3 (downloads to ~/.cache/radiobject/ if not present)
    from data import get_test_data_path
    from data.sync import get_manifest

    DATA_DIR = get_test_data_path()
    NIFTI_DIR = DATA_DIR / "nifti" / "msd_brain_tumour"

    manifest = get_manifest("nifti")
    print(f"Found {len(manifest)} BraTS samples")

## Prepare NIfTI Files

Each BraTS sample stores four modalities (FLAIR, T1w, T1gd, T2w) along the last axis of a single 4D file. We split each file into four 3D volumes, one per modality, and write them to a temporary directory.
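
The split itself is a slice along the last axis. A minimal sketch on a small synthetic array follows; the shape is made up for illustration, and it assumes the modality axis is last and ordered as listed, which is what the cell below relies on:

```python
import numpy as np

MODALITIES = ["FLAIR", "T1w", "T1gd", "T2w"]

# Synthetic stand-in for one 4D sample with axes (x, y, z, modality)
data_4d = np.arange(4 * 5 * 3 * 4, dtype=np.float32).reshape(4, 5, 3, 4)

# One 3D volume per modality, indexed along the last axis
volumes = {name: data_4d[..., i] for i, name in enumerate(MODALITIES)}

for name, vol in volumes.items():
    print(name, vol.shape)
```

Indexing this way returns views rather than copies, which is fine here because each volume is written to disk right away.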

In [None]:
if not SKIP_INGESTION:
    N_SUBJECTS = 5
    MODALITIES = ["FLAIR", "T1w", "T1gd", "T2w"]

    subjects = manifest[:N_SUBJECTS]
    subject_ids = [s["sample_id"] for s in subjects]

    # Create temp directory for split NIfTIs
    TEMP_DIR = tempfile.mkdtemp(prefix="brats_ingest_")
    split_dir = Path(TEMP_DIR) / "split_niftis"
    split_dir.mkdir(exist_ok=True)

    nifti_list = []
    for entry in subjects:
        # Load the 4D image; the last axis indexes the modality
        img = nib.load(DATA_DIR / entry["image_path"])
        data_4d = np.asarray(img.dataobj, dtype=np.float32)

        # Write one 3D NIfTI per modality, reusing the source affine
        for mod_idx, modality in enumerate(MODALITIES):
            nifti_path = split_dir / f"{entry['sample_id']}_{modality}.nii.gz"
            nib.save(nib.Nifti1Image(data_4d[..., mod_idx], img.affine), nifti_path)
            nifti_list.append((nifti_path, entry["sample_id"]))

    print(f"Prepared {len(nifti_list)} NIfTI files")
    print(f"Subjects: {subject_ids}")
    print(f"Modalities: {MODALITIES}")

## Create Subject Metadata

In [None]:
if not SKIP_INGESTION:
    # Create subject-level metadata. tumor_grade and age are random placeholders
    # (seeded for reproducibility), not real clinical values.
    np.random.seed(42)
    obs_meta_df = pd.DataFrame({
        "obs_subject_id": subject_ids,
        "obs_id": subject_ids,
        "dataset": "BraTS",
        "tumor_grade": np.random.choice(["LGG", "HGG"], N_SUBJECTS),
        "age": np.random.randint(30, 70, N_SUBJECTS),
    })
    print("Subject metadata:")
    display(obs_meta_df)

## Create RadiObject

In [None]:
if not SKIP_INGESTION:
    print(f"Creating RadiObject at: {BRATS_URI}")
    
    radi = RadiObject.from_niftis(
        uri=BRATS_URI,
        niftis=nifti_list,
        obs_meta=obs_meta_df,
    )
    
    print(f"\nCreated: {radi}")

## Validate

In [None]:
if not SKIP_INGESTION:
    radi.validate()
    print("Validation passed")
    
    # Display summary
    print(f"\nCollections: {radi.collection_names}")
    print(f"Subjects: {radi.obs_subject_ids}")
    print("\nobs_meta:")
    display(radi.obs_meta.read())

## Cleanup Temp Files

In [None]:
if not SKIP_INGESTION:
    shutil.rmtree(TEMP_DIR)
    print(f"Cleaned up temp directory: {TEMP_DIR}")

## Verify RadiObject

Load the RadiObject from the URI to verify it was created correctly.

In [None]:
# Load from URI (works whether we just created it or it already existed)
radi = RadiObject(BRATS_URI)

print(f"Loaded: {radi}")
print(f"Collections: {radi.collection_names}")
print(f"Subjects: {len(radi)}")

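# Quick data check: read one axial slice from the first FLAIR volume
# (z=77 assumes the volume has at least 78 axial slices)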
vol = radi.FLAIR.iloc[0]
print(f"\nSample volume: {vol}")
print(f"Axial slice shape: {vol.axial(z=77).shape}")

## Next Steps

The RadiObject is now available at `BRATS_URI`. Proceed to the tutorial notebooks:

- [01_radi_object.ipynb](./01_radi_object.ipynb) - RadiObject operations
- [02_volume_collection.ipynb](./02_volume_collection.ipynb) - Working with volume groups
- [03_volume.ipynb](./03_volume.ipynb) - Single volume operations
- [04_storage_configuration.ipynb](./04_storage_configuration.ipynb) - Tile orientation and compression