# RadiObject - The Top-Level Container

RadiObject is the top-level container for multi-collection radiology data. This notebook covers:

- Loading and exploring RadiObject
- Subject indexing (`iloc`, `loc`, `[]`) and filtering
- Views and materialization
- **Lazy Mode** with `lazy()` for transform pipelines

**Key terms:** See [Lexicon](https://srdsam.github.io/RadiObject/LEXICON/) for definitions of RadiObject, VolumeCollection, Volume, and Query.

**Prerequisites:** Run [00_ingest_brats.ipynb](./00_ingest_brats.ipynb) first.

In [1]:
import shutil
import sys
import tempfile
from pathlib import Path

sys.path.insert(0, "..")

import numpy as np

from radiobject.ctx import S3Config, configure
from radiobject.data import S3_REGION, get_brats_uri
from radiobject.radi_object import RadiObject

BRATS_URI = get_brats_uri()

# Configure S3 if using S3 URI
if BRATS_URI.startswith("s3://"):
    configure(s3=S3Config(region=S3_REGION))

print(f"RadiObject URI: {BRATS_URI}")

RadiObject URI: /Users/samueldsouza/Desktop/Code/RadiObject/notebooks/data/brats_radiobject


In [2]:
radi = RadiObject(BRATS_URI)
print(radi)

# Quick summary of the RadiObject
print("\n" + radi.describe())

RadiObject(368 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w])

RadiObject Summary
URI: /Users/samueldsouza/Desktop/Code/RadiObject/notebooks/data/brats_radiobject
Subjects: 368
Collections: 5

Collections:
  - seg: 368 volumes, shape=240x240x155
  - T2w: 368 volumes, shape=240x240x155
  - FLAIR: 368 volumes, shape=240x240x155
  - T1gd: 368 volumes, shape=240x240x155
  - T1w: 368 volumes, shape=240x240x155


In [3]:
# Read all subject metadata
radi.obs_meta.read()

Unnamed: 0,obs_subject_id,obs_id,age,survival_days,resection_status,dataset
0,BraTS20_Training_001,BraTS20_Training_001,60.463,289.0,GTR,BraTS2020
1,BraTS20_Training_002,BraTS20_Training_002,52.263,616.0,GTR,BraTS2020
2,BraTS20_Training_003,BraTS20_Training_003,54.301,464.0,GTR,BraTS2020
3,BraTS20_Training_004,BraTS20_Training_004,39.068,788.0,GTR,BraTS2020
4,BraTS20_Training_005,BraTS20_Training_005,68.493,465.0,GTR,BraTS2020
...,...,...,...,...,...,...
363,BraTS20_Training_365,BraTS20_Training_365,,,,BraTS2020
364,BraTS20_Training_366,BraTS20_Training_366,72.000,633.0,GTR,BraTS2020
365,BraTS20_Training_367,BraTS20_Training_367,60.000,437.0,STR,BraTS2020
366,BraTS20_Training_368,BraTS20_Training_368,49.000,442.0,GTR,BraTS2020


In [4]:
# Read specific columns
radi.obs_meta.read(columns=["obs_subject_id", "resection_status", "age"])

Unnamed: 0,obs_subject_id,obs_id,resection_status,age
0,BraTS20_Training_001,BraTS20_Training_001,GTR,60.463
1,BraTS20_Training_002,BraTS20_Training_002,GTR,52.263
2,BraTS20_Training_003,BraTS20_Training_003,GTR,54.301
3,BraTS20_Training_004,BraTS20_Training_004,GTR,39.068
4,BraTS20_Training_005,BraTS20_Training_005,GTR,68.493
...,...,...,...,...
363,BraTS20_Training_365,BraTS20_Training_365,,
364,BraTS20_Training_366,BraTS20_Training_366,GTR,72.000
365,BraTS20_Training_367,BraTS20_Training_367,STR,60.000
366,BraTS20_Training_368,BraTS20_Training_368,GTR,49.000


In [5]:
# Filter with QueryCondition
radi.obs_meta.read(value_filter="resection_status == 'GTR'")

Unnamed: 0,obs_subject_id,obs_id,age,survival_days,resection_status,dataset
0,BraTS20_Training_001,BraTS20_Training_001,60.463,289.0,GTR,BraTS2020
1,BraTS20_Training_002,BraTS20_Training_002,52.263,616.0,GTR,BraTS2020
2,BraTS20_Training_003,BraTS20_Training_003,54.301,464.0,GTR,BraTS2020
3,BraTS20_Training_004,BraTS20_Training_004,39.068,788.0,GTR,BraTS2020
4,BraTS20_Training_005,BraTS20_Training_005,68.493,465.0,GTR,BraTS2020
...,...,...,...,...,...,...
114,BraTS20_Training_360,BraTS20_Training_360,50.000,540.0,GTR,BraTS2020
115,BraTS20_Training_363,BraTS20_Training_363,57.000,62.0,GTR,BraTS2020
116,BraTS20_Training_366,BraTS20_Training_366,72.000,633.0,GTR,BraTS2020
117,BraTS20_Training_368,BraTS20_Training_368,49.000,442.0,GTR,BraTS2020


In [6]:
print(f"Collection names: {radi.collection_names}")
print(f"Number of collections: {radi.n_collections}")

Collection names: ('seg', 'T2w', 'FLAIR', 'T1gd', 'T1w')
Number of collections: 5


In [7]:
# Access via attribute or method
flair = radi.FLAIR  # Attribute access
flair_alt = radi.collection("FLAIR")  # Method access

# Display the VolumeCollection
flair

VolumeCollection('FLAIR', 368 volumes, shape=240x240x155)

In [8]:
# Iterate over collection names
for name in radi:
    coll = radi.collection(name)
    print(f"{name}: {coll}")

seg: VolumeCollection('seg', 368 volumes, shape=240x240x155)
T2w: VolumeCollection('T2w', 368 volumes, shape=240x240x155)
FLAIR: VolumeCollection('FLAIR', 368 volumes, shape=240x240x155)
T1gd: VolumeCollection('T1gd', 368 volumes, shape=240x240x155)
T1w: VolumeCollection('T1w', 368 volumes, shape=240x240x155)


## Subject Indexing

RadiObject supports pandas-style subject indexing: `iloc` selects by integer position, `loc` selects by `obs_subject_id`, and `[]` is shorthand for `loc`.
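To make the three access styles concrete, here is a minimal sketch of how positional and label keys could be resolved to subject positions. The helpers `iloc_positions` and `loc_positions` are hypothetical names written for this illustration; they are not part of RadiObject, and the library's internal resolution logic may differ.

```python
def iloc_positions(n_subjects, key):
    """Resolve a positional key (int, slice, list of ints, or boolean mask)."""
    if isinstance(key, slice):
        return list(range(n_subjects)[key])
    if isinstance(key, int):
        # range() indexing handles negative ints and raises IndexError when out of range
        return [range(n_subjects)[key]]
    key = list(key)
    if key and all(isinstance(k, bool) for k in key):
        if len(key) != n_subjects:
            raise ValueError("boolean mask length must equal the number of subjects")
        return [i for i, keep in enumerate(key) if keep]
    return [range(n_subjects)[k] for k in key]


def loc_positions(subject_ids, key):
    """Resolve one subject ID or a list of subject IDs to positions."""
    index = {sid: i for i, sid in enumerate(subject_ids)}
    labels = [key] if isinstance(key, str) else list(key)
    return [index[label] for label in labels]  # KeyError for unknown IDs


# Illustrative IDs only (hypothetical, not read from the dataset)
ids = ["sub-A", "sub-B", "sub-C", "sub-D", "sub-E"]
print(iloc_positions(len(ids), slice(0, 3)))
print(iloc_positions(len(ids), [0, 2, 4]))
print(loc_positions(ids, ["sub-A", "sub-C"]))
```

Under this reading, every indexing style reduces to a list of positions, which is one plausible reason all of them can return the same kind of view.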

In [9]:
# iloc: integer-location indexing
print(f"iloc[0]:       {radi.iloc[0]}")
print(f"iloc[0:3]:     {radi.iloc[0:3]}")
print(f"iloc[[0,2,4]]: {radi.iloc[[0, 2, 4]]}")

# loc: label-based indexing (uses obs_subject_id)
first_id = radi.obs_subject_ids[0]
third_id = radi.obs_subject_ids[2]
print(f"\nloc['{first_id}']:   {radi.loc[first_id]}")
print(f"loc[[first, third]]: {radi.loc[[first_id, third_id]]}")

# Bracket: shorthand for .loc[]
print(f"\nradi['{first_id}']: {radi[first_id]}")

iloc[0]:       RadiObject(1 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
iloc[0:3]:     RadiObject(3 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
iloc[[0,2,4]]: RadiObject(3 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)

loc['BraTS20_Training_001']:   RadiObject(1 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
loc[[first, third]]: RadiObject(2 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)

radi['BraTS20_Training_001']: RadiObject(1 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)


In [10]:
# Boolean mask indexing
meta = radi.obs_meta.read()
mask = (meta["age"] > 40).values

view_filtered = radi.iloc[mask]
print(f"Subjects with age > 40: {view_filtered.obs_subject_ids}")

Subjects with age > 40: ['BraTS20_Training_001', 'BraTS20_Training_002', 'BraTS20_Training_003', 'BraTS20_Training_005', 'BraTS20_Training_006', 'BraTS20_Training_007', 'BraTS20_Training_008', 'BraTS20_Training_009', 'BraTS20_Training_010', 'BraTS20_Training_011', 'BraTS20_Training_012', 'BraTS20_Training_013', 'BraTS20_Training_014', 'BraTS20_Training_015', 'BraTS20_Training_016', 'BraTS20_Training_017', 'BraTS20_Training_018', 'BraTS20_Training_019', 'BraTS20_Training_020', 'BraTS20_Training_021', 'BraTS20_Training_022', 'BraTS20_Training_023', 'BraTS20_Training_024', 'BraTS20_Training_025', 'BraTS20_Training_026', 'BraTS20_Training_027', 'BraTS20_Training_028', 'BraTS20_Training_029', 'BraTS20_Training_030', 'BraTS20_Training_031', 'BraTS20_Training_032', 'BraTS20_Training_033', 'BraTS20_Training_034', 'BraTS20_Training_035', 'BraTS20_Training_036', 'BraTS20_Training_037', 'BraTS20_Training_038', 'BraTS20_Training_039', 'BraTS20_Training_040', 'BraTS20_Training_041', 'BraTS20_Traini
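One detail of the boolean mask above is worth spelling out: some subjects have no recorded age (the metadata table shows empty `age` cells), and a comparison against a missing value evaluates to `False`. The sketch below uses a small hand-written array, not the real metadata, to show that such subjects are silently excluded by `age > 40`.

```python
import numpy as np

# Hand-written stand-in for the "age" column; np.nan marks a missing age
ages = np.array([60.463, np.nan, 39.068, 68.493])

mask = ages > 40            # NaN compares as False
missing = np.isnan(ages)    # explicit mask for subjects without an age

print(mask.tolist())        # [True, False, False, True]
print(int(missing.sum()))   # number of subjects with a missing age
```

If subjects with unknown age should be kept, the two masks can be combined, for example `mask | missing`.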

## Filtering: `filter()`, `head()`, `tail()`, `sample()`

In [11]:
# filter(): metadata expression filtering
gtr_filter = "resection_status == 'GTR'"
compound_filter = "resection_status == 'GTR' and age > 40"
print(f"filter('{gtr_filter}'): {radi.filter(gtr_filter)}")
print(f"filter('... and age > 40'): {radi.filter(compound_filter)}")

# head/tail/sample
print(f"\nhead(2): {radi.head(2).obs_subject_ids}")
print(f"tail(2): {radi.tail(2).obs_subject_ids}")
print(f"sample(n=3, seed=42): {radi.sample(n=3, seed=42).obs_subject_ids}")

filter('resection_status == 'GTR''): RadiObject(119 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)
filter('... and age > 40'): RadiObject(114 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w]) (view)

head(2): ['BraTS20_Training_001', 'BraTS20_Training_002']
tail(2): ['BraTS20_Training_368', 'BraTS20_Training_369']
sample(n=3, seed=42): ['BraTS20_Training_033', 'BraTS20_Training_241', 'BraTS20_Training_285']
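Passing a seed makes a random sample repeatable across runs. The sketch below illustrates that idea with the standard library only. The helper `sample_ids` is hypothetical, and the notebook does not show which random number generator or result ordering RadiObject uses, so the IDs it picks will generally differ from the output above.

```python
import random


def sample_ids(subject_ids, n, seed):
    """Pick n IDs reproducibly and return them in their original order."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    picked = rng.sample(range(len(subject_ids)), n)
    return [subject_ids[i] for i in sorted(picked)]


# Illustrative IDs only (hypothetical, not read from the dataset)
ids = [f"subject_{i:03d}" for i in range(10)]

first = sample_ids(ids, n=3, seed=42)
second = sample_ids(ids, n=3, seed=42)
print(first == second)  # True: same seed, same sample
```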


## Collection Filtering & Chaining

In [12]:
# select_collections() filters to specific modalities
tumor_view = radi.select_collections(["FLAIR", "T2w"])
print(f"Original: {radi.collection_names} -> Filtered: {tumor_view.collection_names}")

# Chain subject + collection filters
chained = radi.iloc[0:3].select_collections(["FLAIR", "T1w"])
print(f"\nChained: subjects={chained.obs_subject_ids}, collections={chained.collection_names}")

Original: ('seg', 'T2w', 'FLAIR', 'T1gd', 'T1w') -> Filtered: ('T2w', 'FLAIR')

Chained: subjects=['BraTS20_Training_001', 'BraTS20_Training_002', 'BraTS20_Training_003'], collections=('FLAIR', 'T1w')
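In the output above, `select_collections(["FLAIR", "T2w"])` reports `('T2w', 'FLAIR')`: the result follows the stored collection order rather than the order of the request. The sketch below reproduces that behaviour with plain tuples. `select_in_stored_order` is a hypothetical helper, and whether RadiObject guarantees this ordering in all cases is an assumption based only on the printed output.

```python
def select_in_stored_order(stored_names, requested):
    """Keep requested names, ordered as they appear in stored_names."""
    requested = set(requested)
    unknown = requested - set(stored_names)
    if unknown:
        raise KeyError(f"unknown collections: {sorted(unknown)}")
    return tuple(name for name in stored_names if name in requested)


stored = ("seg", "T2w", "FLAIR", "T1gd", "T1w")
print(select_in_stored_order(stored, ["FLAIR", "T2w"]))
print(select_in_stored_order(stored, ["FLAIR", "T1w"]))
```

Code that depends on modality order (for example, stacking channels for a model) is safer reading `collection_names` from the view than assuming the request order.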


## Views & Materialization

In [13]:
# All filtering operations return a RadiObject view (is_view=True).
# Views are:
#   - Immediate: data is accessible right away
#   - Immutable: the original RadiObject is unchanged
#   - Chainable: you can filter further (e.g., view.filter(...))

view = radi.iloc[0:2]
print(f"View type: {type(view).__name__}, is_view: {view.is_view}")

# Materialize to new storage with materialize()
TEMP_DIR = tempfile.mkdtemp(prefix="radi_demo_")
subset_uri = str(Path(TEMP_DIR) / "subset")
subset_radi = radi.iloc[0:2].select_collections(["FLAIR"]).materialize(subset_uri)
print(f"Materialized: {subset_radi}")

# Verify data integrity
orig = radi.FLAIR.iloc[0].axial(z=77)
copy = subset_radi.FLAIR.iloc[0].axial(z=77)
print(f"Data matches: {np.allclose(orig, copy)}")

shutil.rmtree(TEMP_DIR)

View type: RadiObject, is_view: True


Materialized: RadiObject(2 subjects, 1 collections: [FLAIR])
Data matches: True
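The integrity check above uses `np.allclose`, which accepts small numerical differences. For a copy that is expected to be bit-for-bit identical, `np.array_equal` is the stricter test. The sketch below contrasts the two on small hand-written arrays. It does not touch RadiObject, and whether materialization is lossless for a given compression setting is not something this notebook establishes.

```python
import numpy as np

original = np.array([100.0, 200.0, 300.0])
exact_copy = original.copy()
perturbed = original + 1e-4  # tiny difference, well inside allclose's default tolerance

print(np.allclose(original, exact_copy), np.array_equal(original, exact_copy))
print(np.allclose(original, perturbed), np.array_equal(original, perturbed))
```

The first pair passes both checks; the perturbed array passes `np.allclose` but fails `np.array_equal`. For integer label volumes such as a segmentation mask, the exact comparison is usually the more meaningful one.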


In [14]:
radi.validate()
print("Validation passed")

Validation passed


## Next Steps

- [02_volume_collection.ipynb](./02_volume_collection.ipynb) - Working with volume groups
- [03_volume.ipynb](./03_volume.ipynb) - Single volume operations
- [04_storage_configuration.ipynb](./04_storage_configuration.ipynb) - Tile orientation, compression, S3

## Two Modes: Filtering vs Lazy

RadiObject offers two approaches to working with data:

| Mode | Method | Returns | Best For |
|------|--------|---------|----------|
| **Filtering** | `filter()`, `iloc[]`, `head()` | `RadiObject` (view) | Interactive analysis, quick previews |
| **Lazy** | `lazy().filter()...` | `Query` | Transform pipelines, ML data prep |

### When to use which?

**Filtering Mode** - Returns RadiObject views immediately:
```python
# Quick filtering for interactive work
view = radi.filter("age > 40")  # Immediate: can access .FLAIR, .obs_meta right away
vol = view.FLAIR.iloc[0]        # Works immediately
view.materialize("./subset")    # Write to storage
```

**Lazy Mode** - Returns a lazy `Query` for building transform pipelines:
```python
# For transforms: build query, apply map(), then materialize
result = (
    radi.lazy()
    .filter("age > 40")
    .select_collections(["FLAIR"])
    .map(normalize_intensity)  # Transform during materialization
    .materialize("./normalized")
)
```

**Rule of thumb:** Use `filter()` for exploration and simple subsetting. Use `lazy()` when you need to apply transforms via `map()`.
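The lazy example above passes `normalize_intensity` to `map()` without defining it. Below is one possible definition, a z-score normalization over non-zero voxels. It is a sketch under assumptions: the function body is illustrative, and it assumes the transform receives and returns a NumPy array. The notebook does not show the exact signature `map()` expects, so check that before using it.

```python
import numpy as np


def normalize_intensity(volume):
    """Z-score normalize non-zero voxels; background (zero) voxels stay zero."""
    data = np.asarray(volume, dtype=np.float32)
    foreground = data != 0
    out = np.zeros_like(data)
    if not foreground.any():
        return out  # empty volume: nothing to normalize
    values = data[foreground]
    std = values.std()
    if std == 0:
        return out  # constant foreground: z-score is undefined, return zeros
    out[foreground] = (values - values.mean()) / std
    return out


# Tiny synthetic volume standing in for a 240x240x155 scan
volume = np.zeros((2, 2, 2), dtype=np.float32)
volume[0, 0, 0], volume[0, 1, 0], volume[1, 1, 1] = 10.0, 20.0, 30.0
normalized = normalize_intensity(volume)
print(normalized.shape, float(normalized[volume != 0].mean()))
```

Restricting the statistics to non-zero voxels is a common choice for skull-stripped brain MRI, where the background is exactly zero. Other schemes, such as min-max or percentile clipping, would slot into the same place.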

In [15]:
# lazy() returns Query - for transform pipelines
q = radi.lazy()
filtered = q.filter("resection_status == 'GTR'").select_collections(["FLAIR", "T1w"]).head(3)
print(f"Query: {filtered}")
print(f"Count: {filtered.count()}")

Query: Query(3 subjects, 6 volumes across [FLAIR, T1w])
Count: QueryCount(n_subjects=3, n_volumes={'FLAIR': 3, 'T1w': 3})


In [16]:
# Streaming iteration (memory-efficient)
for vol in filtered.iter_volumes():
    print(f"Volume: {vol.obs_id}, shape: {vol.shape}")

# Batch iteration for ML
for batch in filtered.iter_batches(batch_size=2):
    print(f"\nBatch: {batch.subject_ids}")
    for name, arr in batch.volumes.items():
        print(f"  {name}: {arr.shape}")
    break

Volume: BraTS20_Training_001_FLAIR, shape: (240, 240, 155)
Volume: BraTS20_Training_003_FLAIR, shape: (240, 240, 155)
Volume: BraTS20_Training_002_FLAIR, shape: (240, 240, 155)
Volume: BraTS20_Training_003_T1w, shape: (240, 240, 155)
Volume: BraTS20_Training_002_T1w, shape: (240, 240, 155)
Volume: BraTS20_Training_001_T1w, shape: (240, 240, 155)



Batch: ('BraTS20_Training_001', 'BraTS20_Training_002')
  FLAIR: (2, 240, 240, 155)
  T1w: (2, 240, 240, 155)
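Each batch array above has shape `(batch_size, 240, 240, 155)`, which is what stacking individual volumes along a new leading axis produces. The sketch below shows that stacking on tiny arrays and estimates the memory one such batch would need per collection. The `float32` dtype is an assumption for the estimate; the notebook does not state the stored dtype.

```python
import numpy as np

volume_shape = (240, 240, 155)
batch_size = 2

# Memory estimate for one collection in one batch, assuming float32 voxels
voxels_per_volume = int(np.prod(volume_shape))
bytes_per_batch = batch_size * voxels_per_volume * np.dtype(np.float32).itemsize
print(f"{bytes_per_batch / 2**20:.1f} MiB per collection per batch")  # 68.1 MiB

# Stacking demo on tiny stand-in volumes: a new leading batch axis is added
tiny_volumes = [np.zeros((4, 4, 3), dtype=np.float32) for _ in range(batch_size)]
batch = np.stack(tiny_volumes, axis=0)
print(batch.shape)  # (2, 4, 4, 3)
```

Multiplying the estimate by the number of selected collections gives a rough lower bound on memory per batch, which helps when choosing `batch_size`.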


In [17]:
# Export query results with streaming
TEMP_DIR_QUERY = tempfile.mkdtemp(prefix="radi_query_")
subset_from_query = filtered.materialize(str(Path(TEMP_DIR_QUERY) / "query_subset"), streaming=True)
print(f"Exported: {subset_from_query}")
shutil.rmtree(TEMP_DIR_QUERY)

Exported: RadiObject(3 subjects, 2 collections: [T1w, FLAIR])
