# VolumeCollection - Working with Volume Groups

A VolumeCollection organizes multiple Volumes with consistent X/Y/Z dimensions. This notebook covers:

- Accessing collections from RadiObject
- Collection properties and metadata
- Volume-level indexing (`iloc`, `loc`, boolean masks)
- Batch operations across volumes
- **Collection Query API** with lazy `CollectionQuery` builder
- Standalone collection creation

**Prerequisites:** Run [00_ingest_brats.ipynb](./00_ingest_brats.ipynb) first to create the RadiObject.

**Next:** [03_volume.ipynb](./03_volume.ipynb) - Single volume operations

## Setup

In [None]:
import sys
sys.path.insert(0, '..')

import tempfile
import shutil
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from config import BRATS_URI, S3_REGION
from src.radi_object import RadiObject
from src.volume_collection import VolumeCollection
from src.volume import Volume
from src.query import CollectionQuery  # Pipeline mode query builder
from src.ctx import configure, S3Config, TileConfig, SliceOrientation, CompressionConfig, Compressor

print(f"RadiObject URI: {BRATS_URI}")

In [None]:
# Configure S3 if using S3 URI
if BRATS_URI.startswith("s3://"):
    configure(s3=S3Config(region=S3_REGION))

configure(
    tile=TileConfig(orientation=SliceOrientation.AXIAL),
    compression=CompressionConfig(algorithm=Compressor.ZSTD, level=3)
)

## Load RadiObject from URI

In [None]:
radi = RadiObject(BRATS_URI)
radi

## Accessing Collections

In [None]:
# Access via attribute or method
flair = radi.FLAIR
t1w = radi.collection("T1w")

# Display the collection
flair

## Collection Properties

In [None]:
print(f"Shape (X, Y, Z): {flair.shape}")
print(f"Number of volumes: {len(flair)}")
print(f"Volume IDs: {flair.obs_ids}")

## Volume-Level Metadata: `obs`

Each collection has its own `obs` dataframe with per-volume metadata.

In [None]:
# Read all volume metadata
flair.obs.read()

In [None]:
# Available columns (includes NIfTI metadata)
print(f"Columns: {flair.obs.columns}")

In [None]:
# Read only a subset of the metadata columns
flair.obs.read(columns=["obs_id", "obs_subject_id", "series_type", "dimensions"])

## Volume Indexing

Access individual volumes using pandas-like indexing.

In [None]:
# Integer-location indexing
vol = flair.iloc[0]           # First volume
vols = flair.iloc[0:3]        # First 3 volumes

print(f"iloc[0]: {vol}")
print(f"iloc[0:3]: {len(vols)} volumes -> {list(vols)}")

In [None]:
# Label-based indexing by obs_id
first_obs_id = flair.obs_ids[0]
vol_by_id = flair.loc[first_obs_id]
print(f"loc['{first_obs_id}']: {vol_by_id}")

In [None]:
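# Boolean mask indexing (the mask is sized to the collection, one entry per volume)
mask = np.zeros(len(flair), dtype=bool)
mask[::2] = True              # Keep every other volume: 0, 2, 4, ...
filtered_vols = flair.iloc[mask]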

print(f"Boolean mask selected: {len(filtered_vols)} volumes")
print(f"Volume IDs: {[v.obs_id for v in filtered_vols]}")
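
A mask is usually derived from per-volume metadata rather than typed by hand. The sketch below uses plain dictionaries as a stand-in for the rows of `flair.obs.read()`; the `rows` data and the `wanted_subjects` set are made up for illustration.

```python
# Hypothetical metadata rows, one per volume, in collection order.
rows = [
    {"obs_id": "sub-01_FLAIR", "obs_subject_id": "sub-01"},
    {"obs_id": "sub-02_FLAIR", "obs_subject_id": "sub-02"},
    {"obs_id": "sub-03_FLAIR", "obs_subject_id": "sub-03"},
]
wanted_subjects = {"sub-01", "sub-03"}

# One boolean per row, so the mask length always matches the number of volumes.
mask = [row["obs_subject_id"] in wanted_subjects for row in rows]

# Applying the mask keeps the rows whose flag is True.
selected_ids = [row["obs_id"] for row, keep in zip(rows, mask) if keep]
print(mask, selected_ids)
```

In the notebook, a list built this way can be wrapped in `np.array(mask)` and passed to `flair.iloc[...]` as in the cell above.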

In [None]:
# Direct bracket indexing (convenience)
vol_int = flair[0]                    # By position
vol_str = flair[flair.obs_ids[0]]     # By obs_id

print(f"flair[0]: {vol_int.obs_id}")
print(f"flair['{flair.obs_ids[0]}']: {vol_str.obs_id}")

## Batch Operations

Iterate over volumes for batch processing.

In [None]:
# Compute statistics across all volumes
stats = []
for i in range(len(flair)):
    vol = flair.iloc[i]
    data = vol.to_numpy()
    stats.append({
        "obs_id": vol.obs_id,
        "mean": data.mean(),
        "std": data.std(),
        "max": data.max(),
    })

pd.DataFrame(stats)
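
The loop above reports statistics per volume. For a single mean and standard deviation across the whole collection, running sums avoid holding every volume in memory at once. The sketch below uses small synthetic arrays in place of `vol.to_numpy()`; `running_stats` is a hypothetical helper, not part of the library.

```python
import numpy as np


def running_stats(volumes):
    """Return (mean, std) over all voxels of an iterable of arrays.

    Only one array is processed at a time; float64 accumulators limit
    rounding error in the sum-of-squares formula.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for data in volumes:
        data = np.asarray(data, dtype=np.float64)
        count += data.size
        total += data.sum()
        total_sq += np.square(data).sum()
    if count == 0:
        raise ValueError("no volumes supplied")
    mean = total / count
    # Population variance: E[x^2] - E[x]^2, clamped at 0 against rounding.
    var = max(total_sq / count - mean**2, 0.0)
    return mean, var**0.5


# Synthetic stand-ins for three volumes of shape (4, 4, 2).
rng = np.random.default_rng(0)
fake_volumes = [rng.normal(100.0, 15.0, size=(4, 4, 2)) for _ in range(3)]

mean, std = running_stats(fake_volumes)
print(f"mean={mean:.2f}, std={std:.2f}")
```

With the real collection, a generator such as `(flair.iloc[i].to_numpy() for i in range(len(flair)))` could be passed in place of `fake_volumes`.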

In [None]:
# Compare modalities for one subject
subject_ids = radi.obs_subject_ids
subject_id = subject_ids[0]
z_slice = 77
modalities = ["FLAIR", "T1w", "T1gd", "T2w"]

fig, axes = plt.subplots(1, 4, figsize=(14, 3.5))
for i, mod in enumerate(modalities):
    vol = radi.collection(mod).loc[f"{subject_id}_{mod}"]
    axes[i].imshow(vol.axial(z_slice).T, cmap='gray', origin='lower')
    axes[i].set_title(mod)
    axes[i].axis('off')

plt.suptitle(f"{subject_id} - Z={z_slice}")
plt.tight_layout()
plt.show()

## ID Mapping

In [None]:
# Convert between obs_id and index
second_obs_id = flair.obs_ids[1]
idx = flair.obs_id_to_index(second_obs_id)
obs_id = flair.index_to_obs_id(2)

print(f"'{second_obs_id}' -> index {idx}")
print(f"index 2 -> '{obs_id}'")
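
Conceptually, the two lookups are inverse mappings over the ordered `obs_ids` list. The notebook does not show how the library stores this mapping internally, so the following is only a minimal sketch with plain Python containers and made-up IDs.

```python
# Hypothetical ordered IDs; the position in the list is the index.
obs_ids = ["sub-01_FLAIR", "sub-02_FLAIR", "sub-03_FLAIR"]

# Forward map: obs_id -> index. Built once, then O(1) per lookup.
id_to_index = {obs_id: i for i, obs_id in enumerate(obs_ids)}

# The reverse direction is plain list indexing.
idx = id_to_index["sub-02_FLAIR"]
back = obs_ids[idx]

# Round trip: index -> obs_id -> index returns the original index.
round_trip_ok = all(id_to_index[obs_ids[i]] == i for i in range(len(obs_ids)))
print(idx, back, round_trip_ok)
```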

In [None]:
# Get full obs row for a volume
flair.get_obs_row_by_obs_id(flair.obs_ids[0])

## Standalone Collection Creation

Create a VolumeCollection directly without a RadiObject. This requires raw NIfTI files. The example cell for this appears at the end of the Collection Query API section below.

## Collection Query API

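For pipeline mode, call `query()` to create a lazy `CollectionQuery` builder. Steps chained on the builder are only recorded; they run when the query is materialized, for example with `count()`, `to_obs()`, `iter_volumes()`, or `to_numpy_stack()`.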

In [None]:
# Create lazy CollectionQuery
cq = flair.query()
print(f"CollectionQuery type: {type(cq).__name__}")
print(f"Query: {cq}")

In [None]:
# Chain filters - lazy until materialized
filtered_cq = cq.head(3)
print(f"Filtered: {filtered_cq}")
print(f"Count: {filtered_cq.count()}")
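
To make "lazy until materialized" concrete, here is a toy builder that records steps and only runs them on demand. `ToyQuery`, `where`, and `materialize` are hypothetical names for illustration; this is not how `CollectionQuery` is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToyQuery:
    """Hypothetical lazy builder: records steps, runs them only on demand."""

    source: Tuple[str, ...]
    steps: Tuple[Callable[[List[str]], List[str]], ...] = ()

    def head(self, n: int) -> "ToyQuery":
        # Return a new query with one more recorded step; nothing runs yet.
        return ToyQuery(self.source, self.steps + (lambda ids: ids[:n],))

    def where(self, predicate: Callable[[str], bool]) -> "ToyQuery":
        return ToyQuery(
            self.source,
            self.steps + (lambda ids: [i for i in ids if predicate(i)],),
        )

    def materialize(self) -> List[str]:
        # Apply the recorded steps in order, starting from the full ID list.
        ids = list(self.source)
        for step in self.steps:
            ids = step(ids)
        return ids

    def count(self) -> int:
        return len(self.materialize())


base = ToyQuery(("sub-01", "sub-02", "sub-03", "sub-04"))
query = base.where(lambda i: i != "sub-02").head(2)  # still lazy here

print(query.materialize())
```

Because each method returns a new object, `base` stays reusable: other chains can start from it without being affected by `query`.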

In [None]:
# Materialize to obs DataFrame
filtered_cq.to_obs()

In [None]:
# Iterate over matching volumes
for vol in filtered_cq.iter_volumes():
    print(f"Volume: {vol.obs_id}, shape: {vol.shape}")

In [None]:
# Stack to numpy array (N, X, Y, Z)
stack = filtered_cq.to_numpy_stack()
print(f"Stacked shape: {stack.shape}")  # (3, X, Y, Z)
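
The `(N, X, Y, Z)` layout is what NumPy's `np.stack` produces when arrays of identical shape are joined along a new leading axis. Whether `to_numpy_stack()` uses `np.stack` internally is an assumption; the sketch below uses synthetic arrays only to show the resulting shape and why matching dimensions matter.

```python
import numpy as np

# Three synthetic volumes with identical (X, Y, Z) = (4, 5, 6).
volumes = [np.full((4, 5, 6), fill_value=i, dtype=np.float32) for i in range(3)]

# axis=0 adds a new leading axis, giving (N, X, Y, Z).
stack = np.stack(volumes, axis=0)
print(stack.shape)

# stack[k] is the k-th volume.
second_matches = np.array_equal(stack[1], volumes[1])

# Mismatched shapes cannot be stacked; np.stack raises ValueError.
try:
    np.stack([np.zeros((4, 5, 6)), np.zeros((4, 5, 7))])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Note that a stack holds all N volumes in memory at once, so limiting the query first (as with `head(3)` above) keeps the array small.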

In [None]:
# Convenience methods on VolumeCollection return CollectionQuery
first_3 = flair.head(3)    # Returns CollectionQuery
last_3 = flair.tail(3)     # Returns CollectionQuery
sample_3 = flair.sample(3, seed=42)  # Returns CollectionQuery

print(f"head(3): {first_3}")
print(f"tail(3): {last_3}")
print(f"sample(3, seed=42): {sample_3}")
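
A fixed `seed` makes the sample repeatable across runs. The sketch below illustrates the idea with the standard library's `random.Random`. The notebook does not show how `sample()` draws internally, so `sample_ids` is a hypothetical helper and the IDs it picks need not match the library's.

```python
import random
from typing import List, Optional, Sequence


def sample_ids(obs_ids: Sequence[str], n: int, seed: Optional[int] = None) -> List[str]:
    """Draw n distinct IDs without replacement; same seed, same draw."""
    # A private Random instance avoids touching the global random state.
    rng = random.Random(seed)
    return rng.sample(list(obs_ids), n)


ids = [f"sub-{i:02d}_FLAIR" for i in range(1, 11)]

first = sample_ids(ids, 3, seed=42)
second = sample_ids(ids, 3, seed=42)
print(first == second)
```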

In [None]:
# Example: Creating a standalone collection from NIfTIs
# (Requires raw NIfTI files - see 00_ingest_brats.ipynb for ingestion)

print("""To create a standalone VolumeCollection:

from src.volume_collection import VolumeCollection

# List of (nifti_path, subject_id) tuples
nifti_list = [
    (Path("subject1_FLAIR.nii.gz"), "subject1"),
    (Path("subject2_FLAIR.nii.gz"), "subject2"),
]

collection = VolumeCollection.from_niftis(
    uri="./my_collection",
    niftis=nifti_list,
)
""")

## Validation

In [None]:
flair.validate()
print("Collection validation passed")
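
The notebook does not list what `validate()` checks. Since a collection is defined by volumes sharing X/Y/Z dimensions, one plausible check is shape consistency, and the sketch below illustrates that idea on plain shape tuples. `check_uniform_shape` is hypothetical and not the library's implementation.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]


def check_uniform_shape(shapes: Dict[str, Shape], expected: Shape) -> None:
    """Raise ValueError naming every volume whose shape differs from expected."""
    mismatched = {oid: s for oid, s in shapes.items() if s != expected}
    if mismatched:
        raise ValueError(f"shape mismatch (expected {expected}): {mismatched}")


# Illustrative shapes keyed by made-up obs_ids.
good = {"sub-01_FLAIR": (240, 240, 155), "sub-02_FLAIR": (240, 240, 155)}
bad = {**good, "sub-03_FLAIR": (240, 240, 150)}

check_uniform_shape(good, (240, 240, 155))  # passes silently

try:
    check_uniform_shape(bad, (240, 240, 155))
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```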

## Next Steps

- [03_volume.ipynb](./03_volume.ipynb) - Single volume operations and partial reads
- [04_storage_configuration.ipynb](./04_storage_configuration.ipynb) - Tile orientation, compression, S3