# RadiObject - The Top-Level Container

RadiObject is the top-level container for multi-collection radiology data. This notebook covers:

- Loading and exploring RadiObject
- Subject indexing (`iloc`, `loc`, `[]`) and filtering
- Views and writing to storage

**Key terms:** See [Lexicon](https://srdsam.github.io/RadiObject/LEXICON/) for definitions of RadiObject, VolumeCollection, Volume, LazyQuery, and EagerQuery.

**Prerequisites:** Run [00_ingest_brats.ipynb](./00_ingest_brats.ipynb) first.

In [1]:
import shutil
import tempfile
from pathlib import Path

import numpy as np

from radiobject import RadiObject, S3Config, configure

# ── Storage URI ──────────────────────────────────────────────────
# Default: S3 (requires AWS credentials)
BRATS_URI = "s3://souzy-scratch/radiobject/brats-tutorial"
# For local storage, comment out the line above and uncomment:
# BRATS_URI = "./data/brats_radiobject"
# ─────────────────────────────────────────────────────────────────

configure(s3=S3Config(region="us-east-2"))
print(f"RadiObject URI: {BRATS_URI}")

RadiObject URI: s3://souzy-scratch/radiobject/brats-tutorial
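You can switch between S3 and local storage without commenting lines in and out by reading the URI from an environment variable. The sketch below assumes a variable named `BRATS_URI`. That name is a hypothetical convention for this sketch; RadiObject does not read it. The function returns the URI and whether it is an `s3://` path.

```python
import os

DEFAULT_URI = "s3://souzy-scratch/radiobject/brats-tutorial"


def resolve_brats_uri(env=None):
    """Return (uri, is_s3). BRATS_URI is a hypothetical env var chosen for this sketch."""
    env = os.environ if env is None else env
    uri = env.get("BRATS_URI", DEFAULT_URI)
    return uri, uri.startswith("s3://")


uri, is_s3 = resolve_brats_uri({"BRATS_URI": "./data/brats_radiobject"})
print(uri, is_s3)  # ./data/brats_radiobject False
```

The notebook could then call `configure(s3=S3Config(region="us-east-2"))` only when `is_s3` is true, since that configuration concerns S3 access.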


In [2]:
radi = RadiObject(BRATS_URI)
print(radi)

# Quick summary of the RadiObject
print("\n" + radi.describe())

RadiObject(368 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w])



RadiObject Summary
URI: s3://souzy-scratch/radiobject/brats-tutorial
Subjects: 368
Collections: 5

Collections:
  - seg: 368 volumes, shape=240x240x155
  - T2w: 368 volumes, shape=240x240x155
  - FLAIR: 368 volumes, shape=240x240x155
  - T1gd: 368 volumes, shape=240x240x155
  - T1w: 368 volumes, shape=240x240x155
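In this dataset, the summary reports one volume per subject and the same shape for every collection. The sketch below checks that invariant by hand, using the numbers printed above. The dictionary layout is an assumption made for illustration and is not a RadiObject data structure. The check is also not a description of what `validate()` does.

```python
def check_collections(n_subjects, collections):
    """Return a list of problems; empty means one volume per subject and a shared shape."""
    problems = []
    shapes = {info["shape"] for info in collections.values()}
    if len(shapes) > 1:
        problems.append(f"mixed shapes: {sorted(shapes)}")
    for name, info in collections.items():
        if info["n_volumes"] != n_subjects:
            problems.append(f"{name}: {info['n_volumes']} volumes for {n_subjects} subjects")
    return problems


# Numbers copied from the describe() output above
summary = {name: {"n_volumes": 368, "shape": (240, 240, 155)}
           for name in ("seg", "T2w", "FLAIR", "T1gd", "T1w")}
print(check_collections(368, summary))  # []
```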


In [3]:
# Read all subject metadata
radi.obs_meta.read()

|     | obs_subject_id | age | survival_days | resection_status | dataset | obs_ids |
|-----|----------------|-----|---------------|------------------|---------|---------|
| 0 | BraTS20_Training_001 | 60.463 | 289.0 | GTR | BraTS2020 | ["BraTS20_Training_001_FLAIR", "BraTS20_Traini... |
| 1 | BraTS20_Training_002 | 52.263 | 616.0 | GTR | BraTS2020 | ["BraTS20_Training_002_FLAIR", "BraTS20_Traini... |
| 2 | BraTS20_Training_003 | 54.301 | 464.0 | GTR | BraTS2020 | ["BraTS20_Training_003_FLAIR", "BraTS20_Traini... |
| 3 | BraTS20_Training_004 | 39.068 | 788.0 | GTR | BraTS2020 | ["BraTS20_Training_004_FLAIR", "BraTS20_Traini... |
| 4 | BraTS20_Training_005 | 68.493 | 465.0 | GTR | BraTS2020 | ["BraTS20_Training_005_FLAIR", "BraTS20_Traini... |
| ... | ... | ... | ... | ... | ... | ... |
| 363 | BraTS20_Training_365 |  |  |  | BraTS2020 | ["BraTS20_Training_365_FLAIR", "BraTS20_Traini... |
| 364 | BraTS20_Training_366 | 72.000 | 633.0 | GTR | BraTS2020 | ["BraTS20_Training_366_FLAIR", "BraTS20_Traini... |
| 365 | BraTS20_Training_367 | 60.000 | 437.0 | STR | BraTS2020 | ["BraTS20_Training_367_FLAIR", "BraTS20_Traini... |
| 366 | BraTS20_Training_368 | 49.000 | 442.0 | GTR | BraTS2020 | ["BraTS20_Training_368_FLAIR", "BraTS20_Traini... |
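The `obs_ids` column lists the volume IDs that belong to each subject. The display suggests two things: the values are JSON-encoded strings, and each ID ends with its collection name (e.g. `BraTS20_Training_001_FLAIR`). The sketch below assumes both of these hold. Only the FLAIR ID appears in the output above; the `T1w` and `seg` IDs are constructed to follow the same pattern. Splitting on the last underscore would break for collection names that themselves contain underscores.

```python
import json


def volumes_by_collection(obs_ids):
    """Map collection suffix -> obs_id, assuming IDs look like '<subject>_<collection>'."""
    if isinstance(obs_ids, str):  # assumed: stored as a JSON-encoded list
        obs_ids = json.loads(obs_ids)
    return {obs_id.rsplit("_", 1)[1]: obs_id for obs_id in obs_ids}


# Illustrative value; only the FLAIR ID is taken from the table above
raw = '["BraTS20_Training_001_FLAIR", "BraTS20_Training_001_T1w", "BraTS20_Training_001_seg"]'
print(volumes_by_collection(raw)["FLAIR"])  # BraTS20_Training_001_FLAIR
```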


In [4]:
# Read specific columns
radi.obs_meta.read(columns=["obs_subject_id", "resection_status", "age"])

|     | obs_subject_id | resection_status | age |
|-----|----------------|------------------|-----|
| 0 | BraTS20_Training_001 | GTR | 60.463 |
| 1 | BraTS20_Training_002 | GTR | 52.263 |
| 2 | BraTS20_Training_003 | GTR | 54.301 |
| 3 | BraTS20_Training_004 | GTR | 39.068 |
| 4 | BraTS20_Training_005 | GTR | 68.493 |
| ... | ... | ... | ... |
| 363 | BraTS20_Training_365 |  |  |
| 364 | BraTS20_Training_366 | GTR | 72.000 |
| 365 | BraTS20_Training_367 | STR | 60.000 |
| 366 | BraTS20_Training_368 | GTR | 49.000 |


In [5]:
# Filter rows with a QueryCondition expression string
radi.obs_meta.read(value_filter="resection_status == 'GTR'")

|     | obs_subject_id | age | survival_days | resection_status | dataset | obs_ids |
|-----|----------------|-----|---------------|------------------|---------|---------|
| 0 | BraTS20_Training_001 | 60.463 | 289.0 | GTR | BraTS2020 | ["BraTS20_Training_001_FLAIR", "BraTS20_Traini... |
| 1 | BraTS20_Training_002 | 52.263 | 616.0 | GTR | BraTS2020 | ["BraTS20_Training_002_FLAIR", "BraTS20_Traini... |
| 2 | BraTS20_Training_003 | 54.301 | 464.0 | GTR | BraTS2020 | ["BraTS20_Training_003_FLAIR", "BraTS20_Traini... |
| 3 | BraTS20_Training_004 | 39.068 | 788.0 | GTR | BraTS2020 | ["BraTS20_Training_004_FLAIR", "BraTS20_Traini... |
| 4 | BraTS20_Training_005 | 68.493 | 465.0 | GTR | BraTS2020 | ["BraTS20_Training_005_FLAIR", "BraTS20_Traini... |
| ... | ... | ... | ... | ... | ... | ... |
| 114 | BraTS20_Training_360 | 50.000 | 540.0 | GTR | BraTS2020 | ["BraTS20_Training_360_FLAIR", "BraTS20_Traini... |
| 115 | BraTS20_Training_363 | 57.000 | 62.0 | GTR | BraTS2020 | ["BraTS20_Training_363_FLAIR", "BraTS20_Traini... |
| 116 | BraTS20_Training_366 | 72.000 | 633.0 | GTR | BraTS2020 | ["BraTS20_Training_366_FLAIR", "BraTS20_Traini... |
| 117 | BraTS20_Training_368 | 49.000 | 442.0 | GTR | BraTS2020 | ["BraTS20_Training_368_FLAIR", "BraTS20_Traini... |
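Conceptually, a `value_filter` expression keeps the rows for which a per-row condition is true. Joining two conditions with `and` keeps only the rows that satisfy both. The NumPy sketch below mimics this on four rows copied from the tables above, with subject 365's missing values written as `NaN` and an empty string. It illustrates the semantics only and is not how RadiObject evaluates the expression.

```python
import numpy as np

# Rows for subjects 004, 365, 366 and 367 as shown in the outputs above
ids = np.array(["BraTS20_Training_004", "BraTS20_Training_365",
                "BraTS20_Training_366", "BraTS20_Training_367"])
status = np.array(["GTR", "", "GTR", "STR"])
age = np.array([39.068, np.nan, 72.0, 60.0])

gtr = status == "GTR"           # resection_status == 'GTR'
gtr_over_40 = gtr & (age > 40)  # ... and age > 40 (elementwise AND of both masks)

print(ids[gtr].tolist())          # ['BraTS20_Training_004', 'BraTS20_Training_366']
print(ids[gtr_over_40].tolist())  # ['BraTS20_Training_366']
```

The compound mask is always a subset of the first one. A compound filter can therefore never return more subjects than `resection_status == 'GTR'` alone.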


In [6]:
print(f"Collection names: {radi.collection_names}")
print(f"Number of collections: {radi.n_collections}")

Collection names: ('seg', 'T2w', 'FLAIR', 'T1gd', 'T1w')
Number of collections: 5


In [7]:
# Access via attribute or method
flair = radi.FLAIR  # Attribute access
flair_alt = radi.collection("FLAIR")  # Method access

# Display the VolumeCollection
flair

VolumeCollection('FLAIR', 368 volumes, shape=240x240x155)

In [8]:
# Iterate over collection names
for name in radi:
    coll = radi.collection(name)
    print(f"{name}: {coll}")

seg: VolumeCollection('seg', 368 volumes, shape=240x240x155)
T2w: VolumeCollection('T2w', 368 volumes, shape=240x240x155)
FLAIR: VolumeCollection('FLAIR', 368 volumes, shape=240x240x155)
T1gd: VolumeCollection('T1gd', 368 volumes, shape=240x240x155)
T1w: VolumeCollection('T1w', 368 volumes, shape=240x240x155)
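Attribute access (`radi.FLAIR`) and `radi.collection("FLAIR")` return the same collection, and iterating over the object yields collection names. A common way to get this behaviour in Python is to combine `__getattr__` with `__iter__`. The toy class below is a hypothetical sketch of that pattern, not RadiObject's implementation. It also shows why the method form is more general: a name that is not a valid Python identifier can only be reached through `collection(name)`.

```python
class CollectionRegistry:
    """Toy container of named collections (hypothetical, for illustration only)."""

    def __init__(self, collections):
        self._collections = dict(collections)

    def collection(self, name):
        # Method access works for any string key
        return self._collections[name]

    def __getattr__(self, name):
        # Python calls this only after normal attribute lookup fails
        collections = self.__dict__.get("_collections", {})
        if name in collections:
            return collections[name]
        raise AttributeError(name)

    def __iter__(self):
        # Iterating yields collection names, like `for name in radi:`
        return iter(self._collections)


reg = CollectionRegistry({"FLAIR": "flair-volumes", "T1-gd": "t1gd-volumes"})
print(reg.FLAIR == reg.collection("FLAIR"))  # True
print(list(reg))                             # ['FLAIR', 'T1-gd']
print(reg.collection("T1-gd"))               # t1gd-volumes (reg.T1-gd would parse as reg.T1 - gd)
```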


## Subject Indexing

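Subjects are selected pandas-style: `iloc` (by integer position), `loc` (by `obs_subject_id`), and `[]` (shorthand for `loc`). Every selection returns a RadiObject view, even when it contains a single subject.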

In [9]:
# iloc: integer-location indexing
print(f"iloc[0]:       {radi.iloc[0]}")
print(f"iloc[0:3]:     {radi.iloc[0:3]}")
print(f"iloc[[0,2,4]]: {radi.iloc[[0, 2, 4]]}")

# loc: label-based indexing (uses obs_subject_id)
first_id = radi.obs_subject_ids[0]
third_id = radi.obs_subject_ids[2]
print(f"\nloc['{first_id}']:   {radi.loc[first_id]}")
print(f"loc[[first, third]]: {radi.loc[[first_id, third_id]]}")

# Bracket: shorthand for .loc[]
print(f"\nradi['{first_id}']: {radi[first_id]}")

iloc[0]:       RadiObject(1 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
iloc[0:3]:     RadiObject(3 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
iloc[[0,2,4]]: RadiObject(3 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)

loc['BraTS20_Training_001']:   RadiObject(1 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
loc[[first, third]]: RadiObject(2 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)

radi['BraTS20_Training_001']: RadiObject(1 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
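All three indexers select a set of subjects. Even a single integer or label returns a one-subject view rather than a single item (see `iloc[0]` above). The sketch below shows one way such keys could be normalized to a list of subject IDs. The key types come from the cells in this notebook, but the function names and the five-subject ID list are made up for illustration.

```python
import numpy as np


def resolve_iloc(ids, key):
    """Positional key -> list of IDs (always a list, so one subject is still a selection)."""
    if isinstance(key, (int, np.integer)):
        return [ids[key]]
    if isinstance(key, slice):
        return list(ids[key])
    arr = np.asarray(key)
    if arr.dtype == bool:  # boolean mask with one entry per subject
        if arr.shape != (len(ids),):
            raise IndexError(f"mask length {arr.size} != {len(ids)} subjects")
        return [subject for subject, keep in zip(ids, arr) if keep]
    return [ids[int(i)] for i in arr]  # list of positions


def resolve_loc(ids, key):
    """Label key (one ID or a list of IDs) -> list of IDs, rejecting unknown labels."""
    labels = [key] if isinstance(key, str) else list(key)
    known = set(ids)
    missing = [label for label in labels if label not in known]
    if missing:
        raise KeyError(missing)
    return labels


ids = [f"BraTS20_Training_{i:03d}" for i in range(1, 6)]  # hypothetical 5-subject list
print(resolve_iloc(ids, [0, 2, 4]))
print(resolve_loc(ids, "BraTS20_Training_001"))
```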


In [10]:
# Boolean mask indexing
meta = radi.obs_meta.read()
mask = (meta["age"] > 40).values

view_filtered = radi.iloc[mask]
print(f"Subjects with age > 40: {len(view_filtered)} of {len(radi)}")
print(f"First 5: {view_filtered.obs_subject_ids[:5]}")

Subjects with age > 40: 224 of 368
First 5: ['BraTS20_Training_001', 'BraTS20_Training_002', 'BraTS20_Training_003', 'BraTS20_Training_005', 'BraTS20_Training_006']
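Subject `BraTS20_Training_365` has no recorded age, so its `age` value is missing (presumably `NaN` in the DataFrame). Any ordered comparison with `NaN` is false, so such subjects silently drop out of `age > 40`. The complement `~(age <= 40)`, however, keeps them. pandas comparisons on a Series behave the same way. A NumPy sketch of the difference, using ages from the metadata table:

```python
import numpy as np

# Ages of subjects 001, 004, 365 and 366 as shown in the metadata table (365 is missing)
age = np.array([60.463, 39.068, np.nan, 72.0])

over_40 = age > 40             # NaN > 40 is False, so 365 is excluded
not_at_most_40 = ~(age <= 40)  # NaN <= 40 is also False, so ~ turns it into True
missing = np.isnan(age)

print(over_40.tolist())         # [True, False, False, True]
print(not_at_most_40.tolist())  # [True, False, True, True]
print(int(missing.sum()))       # 1
```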


## Filtering: `filter()`, `head()`, `tail()`, `sample()`

In [11]:
# filter(): select subjects with a metadata expression (returns a view)
gtr_filter = "resection_status == 'GTR'"  # GTR = gross total resection
compound_filter = "resection_status == 'GTR' and age > 40"
print(f"filter({gtr_filter!r}): {radi.filter(gtr_filter)}")
print(f"filter('... and age > 40'): {radi.filter(compound_filter)}")

# head/tail/sample
print(f"\nhead(2): {radi.head(2).obs_subject_ids}")
print(f"tail(2): {radi.tail(2).obs_subject_ids}")
print(f"sample(n=3, seed=42): {radi.sample(n=3, seed=42).obs_subject_ids}")

filter("resection_status == 'GTR'"): RadiObject(119 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
filter('... and age > 40'): RadiObject(114 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)

head(2): ['BraTS20_Training_001', 'BraTS20_Training_002']
tail(2): ['BraTS20_Training_368', 'BraTS20_Training_369']
sample(n=3, seed=42): ['BraTS20_Training_033', 'BraTS20_Training_241', 'BraTS20_Training_285']
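`head(n)` and `tail(n)` return the first and last `n` subjects in stored order, and `sample(n, seed)` draws `n` subjects reproducibly. The sketch below models these semantics with NumPy's `default_rng`. That choice of RNG is an assumption for illustration, so the sketch will not reproduce the exact IDs that RadiObject's `sample()` picks. Returning sampled IDs in stored order matches the ascending IDs printed above, but that is also an assumption.

```python
import numpy as np


def head(ids, n):
    """First n subjects in stored order."""
    return list(ids[:n])


def tail(ids, n):
    """Last n subjects in stored order (guarding n == 0, since ids[-0:] is the whole list)."""
    return list(ids[-n:]) if n > 0 else []


def sample(ids, n, seed=None):
    """n distinct subjects drawn at random; the same seed reproduces the same draw."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(ids), size=n, replace=False)
    return [ids[i] for i in sorted(picks)]  # assumed: returned in stored order


ids = [f"BraTS20_Training_{i:03d}" for i in range(1, 11)]  # hypothetical 10-subject list
print(head(ids, 2))  # ['BraTS20_Training_001', 'BraTS20_Training_002']
print(tail(ids, 2))  # ['BraTS20_Training_009', 'BraTS20_Training_010']
print(sample(ids, 3, seed=42) == sample(ids, 3, seed=42))  # True
```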


## Collection Filtering & Chaining

In [12]:

# select_collections() filters to specific modalities
tumor_view = radi.select_collections(["FLAIR", "T2w"])
print(f"Original: {radi.collection_names} -> Filtered: {tumor_view.collection_names}")

# Chain subject + collection filters
chained = radi.iloc[0:3].select_collections(["FLAIR", "T1w"])
print(f"\nChained: subjects={chained.obs_subject_ids}, collections={chained.collection_names}")

Original: ('seg', 'T2w', 'FLAIR', 'T1gd', 'T1w') -> Filtered: ('T2w', 'FLAIR')

Chained: subjects=['BraTS20_Training_001', 'BraTS20_Training_002', 'BraTS20_Training_003'], collections=('FLAIR', 'T1w')
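The output shows that `select_collections(["FLAIR", "T2w"])` returned `('T2w', 'FLAIR')`. The collections come back in the container's original order, not in the requested order. The sketch below reproduces that behaviour on plain tuples. It is a hypothetical helper, not the library code, and raising `KeyError` on unknown names is an assumption.

```python
def select_collections(names, requested):
    """Keep the requested collections, in the container's original order (hypothetical helper)."""
    unknown = set(requested) - set(names)
    if unknown:
        raise KeyError(sorted(unknown))  # assumed behaviour for unknown names
    wanted = set(requested)
    return tuple(name for name in names if name in wanted)


names = ("seg", "T2w", "FLAIR", "T1gd", "T1w")
print(select_collections(names, ["FLAIR", "T2w"]))  # ('T2w', 'FLAIR')
print(select_collections(names, ["FLAIR", "T1w"]))  # ('FLAIR', 'T1w')
```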


## Views & Writing to Storage

All filtering operations return a RadiObject view (`is_view=True`). Views are:

- **Immediate:** data is accessible right away
- **Immutable:** the original RadiObject is unchanged
- **Chainable:** you can filter further (e.g., `view.filter(...)`)

In [13]:
view = radi.iloc[0:2]
print(f"View type: {type(view).__name__}, is_view: {view.is_view}")

# Write to new storage with write()
TEMP_DIR = tempfile.mkdtemp(prefix="radi_demo_")
subset_uri = str(Path(TEMP_DIR) / "subset")
subset_radi = radi.iloc[0:2].select_collections(["FLAIR"]).write(subset_uri)
print(f"Written: {subset_radi}")

# Spot-check the copy: compare the middle axial slice (z=77 of 155) of the first FLAIR volume
orig = radi.FLAIR.iloc[0].axial(z=77)
copy = subset_radi.FLAIR.iloc[0].axial(z=77)
print(f"Data matches: {np.allclose(orig, copy)}")

# Remove the temporary copy
shutil.rmtree(TEMP_DIR)

View type: RadiObject, is_view: True


Written: RadiObject(2 subjects, 1 collections: [FLAIR])


Data matches: True
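The cell above can be hardened in two ways. First, `tempfile.TemporaryDirectory` removes the directory even if a step fails, which the explicit `shutil.rmtree` at the end of the cell does not guarantee. Second, `np.array_equal` demands an exact copy of the whole volume, whereas `np.allclose` on one slice tolerates small differences. The sketch below uses NumPy `.npy` files as a stand-in for `write()` and re-opening; this is an assumption made so that it runs without RadiObject. The array is synthetic and smaller than a real 240x240x155 volume.

```python
import tempfile
from pathlib import Path

import numpy as np

# Synthetic stand-in for one volume (smaller than 240x240x155 to keep the sketch fast)
volume = np.random.default_rng(0).integers(0, 1000, size=(24, 24, 16), dtype=np.int16)

with tempfile.TemporaryDirectory(prefix="radi_demo_") as tmp:
    path = Path(tmp) / "subset.npy"
    np.save(path, volume)  # stand-in for view.write(uri)
    copy = np.load(path)   # stand-in for re-opening the written data
    # Compare the full volume, not just one axial slice
    exact = np.array_equal(volume, copy)  # every voxel identical
    close = np.allclose(volume, copy)     # identical within floating-point tolerance

print(exact, close)        # True True
print(Path(tmp).exists())  # False: the directory was removed on exit
```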


In [14]:
radi.validate()
print("Validation passed")

Validation passed


## Next Steps

- [02_volume_collection.ipynb](./02_volume_collection.ipynb) - Working with volume groups
- [03_volume.ipynb](./03_volume.ipynb) - Single volume operations
- [04_configuration.ipynb](./04_configuration.ipynb) - Tile orientation, compression, S3