# RadiObject - The Top-Level Container

RadiObject is the top-level container for multi-collection radiology data. This notebook covers:

- Loading and exploring RadiObject
- Subject indexing (`iloc`, `loc`, `[]`) and filtering
- Views and materialization
- **Lazy Mode** with `lazy()` for transform pipelines

**Key terms:** See [Lexicon.md](../Lexicon.md) for definitions of RadiObject, VolumeCollection, Volume, and Query.

**Prerequisites:** Run [00_ingest_brats.ipynb](./00_ingest_brats.ipynb) first.

In [1]:
import shutil
import sys
import tempfile
from pathlib import Path

sys.path.insert(0, "..")

import numpy as np

from radiobject.ctx import S3Config, configure
from radiobject.data import S3_REGION, get_brats_uri
from radiobject.radi_object import RadiObject

BRATS_URI = get_brats_uri()

# Configure S3 access only when the dataset lives at an s3:// URI
if BRATS_URI.startswith("s3://"):
    configure(s3=S3Config(region=S3_REGION))

print(f"RadiObject URI: {BRATS_URI}")

RadiObject URI: /Users/samueldsouza/Desktop/Code/RadiObject/data/brats_radiobject


In [2]:
radi = RadiObject(BRATS_URI)
print(radi)

# Quick summary of the RadiObject
print("\n" + radi.describe())

RadiObject(5 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR])

RadiObject Summary
URI: /Users/samueldsouza/Desktop/Code/RadiObject/data/brats_radiobject
Subjects: 5
Collections: 4

Collections:
  - T2w: 5 volumes, shape=240x240x155
  - T1gd: 5 volumes, shape=240x240x155
  - T1w: 5 volumes, shape=240x240x155
  - FLAIR: 5 volumes, shape=240x240x155

Label Columns:
  - age: {37: 1, 50: 1, 68: 1, 48: 1, 52: 1}


In [3]:
# Read all subject metadata
radi.obs_meta.read()

Unnamed: 0,obs_subject_id,obs_id,dataset,tumor_grade,age
0,BRATS_001,BRATS_001,BraTS,LGG,37
1,BRATS_002,BRATS_002,BraTS,HGG,50
2,BRATS_003,BRATS_003,BraTS,LGG,68
3,BRATS_004,BRATS_004,BraTS,LGG,48
4,BRATS_005,BRATS_005,BraTS,LGG,52


In [4]:
# Read specific columns
radi.obs_meta.read(columns=["obs_subject_id", "tumor_grade", "age"])

Unnamed: 0,obs_subject_id,obs_id,tumor_grade,age
0,BRATS_001,BRATS_001,LGG,37
1,BRATS_002,BRATS_002,HGG,50
2,BRATS_003,BRATS_003,LGG,68
3,BRATS_004,BRATS_004,LGG,48
4,BRATS_005,BRATS_005,LGG,52


In [5]:
# Filter rows with a QueryCondition-style expression passed as value_filter
radi.obs_meta.read(value_filter="tumor_grade == 'HGG'")

Unnamed: 0,obs_subject_id,obs_id,dataset,tumor_grade,age
0,BRATS_002,BRATS_002,BraTS,HGG,50


In [6]:
print(f"Collection names: {radi.collection_names}")
print(f"Number of collections: {radi.n_collections}")

Collection names: ('T2w', 'T1gd', 'T1w', 'FLAIR')
Number of collections: 4


In [7]:
# Access via attribute or method
flair = radi.FLAIR  # Attribute access
flair_alt = radi.collection("FLAIR")  # Method access

# Display the VolumeCollection
flair

VolumeCollection('FLAIR', 5 volumes, shape=240x240x155)

In [8]:
# Iterate over collection names
for name in radi:
    coll = radi.collection(name)
    print(f"{name}: {coll}")

T2w: VolumeCollection('T2w', 5 volumes, shape=240x240x155)
T1gd: VolumeCollection('T1gd', 5 volumes, shape=240x240x155)
T1w: VolumeCollection('T1w', 5 volumes, shape=240x240x155)
FLAIR: VolumeCollection('FLAIR', 5 volumes, shape=240x240x155)


## Subject Indexing

RadiObject supports pandas-style subject indexing: `iloc` selects by integer position, `loc` selects by subject ID, and `[]` is shorthand for `loc`. Each form returns a view.
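
For readers coming from pandas, the same three selection styles can be tried on a plain DataFrame. This is only an analogy: the table is rebuilt by hand from the `obs_meta` output shown earlier, and nothing here touches RadiObject itself.

```python
import pandas as pd

# Subject metadata copied from the obs_meta output earlier in this notebook.
meta = pd.DataFrame(
    {
        "obs_subject_id": ["BRATS_001", "BRATS_002", "BRATS_003", "BRATS_004", "BRATS_005"],
        "tumor_grade": ["LGG", "HGG", "LGG", "LGG", "LGG"],
        "age": [37, 50, 68, 48, 52],
    }
).set_index("obs_subject_id")

# Position-based selection, analogous to radi.iloc[0:3]
first_three = meta.iloc[0:3]

# Label-based selection, analogous to radi.loc[["BRATS_001", "BRATS_003"]]
by_label = meta.loc[["BRATS_001", "BRATS_003"]]

# Boolean-mask selection, analogous to radi.iloc[mask]
over_40 = meta.iloc[(meta["age"] > 40).values]

print(list(first_three.index))  # ['BRATS_001', 'BRATS_002', 'BRATS_003']
print(list(by_label.index))     # ['BRATS_001', 'BRATS_003']
print(list(over_40.index))      # ['BRATS_002', 'BRATS_003', 'BRATS_004', 'BRATS_005']
```

The key distinction carries over: `iloc` depends on row order, while `loc` depends only on the subject IDs.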

In [9]:
# iloc: integer-location indexing
print(f"iloc[0]:       {radi.iloc[0]}")
print(f"iloc[0:3]:     {radi.iloc[0:3]}")
print(f"iloc[[0,2,4]]: {radi.iloc[[0, 2, 4]]}")

# loc: label-based indexing
print(f"\nloc['BRATS_001']:           {radi.loc['BRATS_001']}")
print(f"loc[['BRATS_001','BRATS_003']]: {radi.loc[['BRATS_001', 'BRATS_003']]}")

# Bracket: shorthand for .loc[]
print(f"\nradi['BRATS_001']: {radi['BRATS_001']}")

iloc[0]:       RadiObject(1 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR]) (view)
iloc[0:3]:     RadiObject(3 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR]) (view)
iloc[[0,2,4]]: RadiObject(3 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR]) (view)

loc['BRATS_001']:           RadiObject(1 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR]) (view)
loc[['BRATS_001','BRATS_003']]: RadiObject(2 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR]) (view)

radi['BRATS_001']: RadiObject(1 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR]) (view)


In [10]:
# Boolean mask indexing
meta = radi.obs_meta.read()
mask = (meta["age"] > 40).values

view_filtered = radi.iloc[mask]
print(f"Subjects with age > 40: {view_filtered.obs_subject_ids}")

Subjects with age > 40: ['BRATS_002', 'BRATS_003', 'BRATS_004', 'BRATS_005']


## Filtering: `filter()`, `head()`, `tail()`, `sample()`

In [11]:
# filter(): metadata expression filtering
hgg_filter = "tumor_grade == 'HGG'"
compound_filter = "tumor_grade == 'HGG' and age > 40"
print(f"filter('{hgg_filter}'): {radi.filter(hgg_filter)}")
print(f"filter('... and age > 40'): {radi.filter(compound_filter)}")

# head/tail/sample: first n subjects, last n subjects, or a seeded random subset
print(f"\nhead(2): {radi.head(2).obs_subject_ids}")
print(f"tail(2): {radi.tail(2).obs_subject_ids}")
print(f"sample(n=3, seed=42): {radi.sample(n=3, seed=42).obs_subject_ids}")

filter('tumor_grade == 'HGG''): RadiObject(1 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR]) (view)
filter('... and age > 40'): RadiObject(1 subjects, 4 collections: [T2w, T1gd, T1w, FLAIR]) (view)

head(2): ['BRATS_001', 'BRATS_002']
tail(2): ['BRATS_004', 'BRATS_005']
sample(n=3, seed=42): ['BRATS_001', 'BRATS_004', 'BRATS_005']
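
To see which rows a filter expression selects, it can help to evaluate the same expression against the metadata table in pandas. The sketch below assumes nothing about how RadiObject evaluates `filter()` internally. It only shows that the string expression and an explicit boolean mask pick the same subjects from the metadata shown earlier.

```python
import pandas as pd

# Subject metadata copied from the obs_meta output earlier in this notebook.
meta = pd.DataFrame(
    {
        "obs_subject_id": ["BRATS_001", "BRATS_002", "BRATS_003", "BRATS_004", "BRATS_005"],
        "tumor_grade": ["LGG", "HGG", "LGG", "LGG", "LGG"],
        "age": [37, 50, 68, 48, 52],
    }
)

# String expression, in the same style as the filter() calls above
expr = "tumor_grade == 'HGG' and age > 40"
by_expression = meta.query(expr)["obs_subject_id"].tolist()

# Equivalent explicit boolean mask
mask = (meta["tumor_grade"] == "HGG") & (meta["age"] > 40)
by_mask = meta[mask]["obs_subject_id"].tolist()

print(by_expression)  # ['BRATS_002']
print(by_mask)        # ['BRATS_002']
```

Only one subject is HGG, which is why both `filter()` calls above report a single-subject view.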


## Collection Filtering & Chaining

In [12]:
# select_collections() filters to specific modalities
tumor_view = radi.select_collections(["FLAIR", "T2w"])
print(f"Original: {radi.collection_names} -> Filtered: {tumor_view.collection_names}")

# Chain subject + collection filters
chained = radi.iloc[0:3].select_collections(["FLAIR", "T1w"])
print(f"\nChained: subjects={chained.obs_subject_ids}, collections={chained.collection_names}")

Original: ('T2w', 'T1gd', 'T1w', 'FLAIR') -> Filtered: ('T2w', 'FLAIR')

Chained: subjects=['BRATS_001', 'BRATS_002', 'BRATS_003'], collections=('T1w', 'FLAIR')


## Views & Materialization

All filtering operations return a RadiObject view (`is_view=True`). Views are:

- Immediate: data is accessible right away
- Immutable: the original RadiObject is unchanged
- Chainable: you can filter further (e.g., `view.filter(...)`)

In [13]:
view = radi.iloc[0:2]
print(f"View type: {type(view).__name__}, is_view: {view.is_view}")

# materialize() writes the view to new storage and returns a standalone RadiObject
TEMP_DIR = tempfile.mkdtemp(prefix="radi_demo_")
subset_uri = str(Path(TEMP_DIR) / "subset")
subset_radi = radi.iloc[0:2].select_collections(["FLAIR"]).materialize(subset_uri)
print(f"Materialized: {subset_radi}")

# Verify data integrity: the same axial slice (z=77) should match in both
orig = radi.FLAIR.iloc[0].axial(z=77)
copy = subset_radi.FLAIR.iloc[0].axial(z=77)
print(f"Data matches: {np.allclose(orig, copy)}")

# Remove the temporary materialized copy
shutil.rmtree(TEMP_DIR)

View type: RadiObject, is_view: True


Materialized: RadiObject(2 subjects, 1 collections: [FLAIR])
Data matches: True
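
The view properties (immutable, chainable, and separate from materialized copies) can be pictured with a toy model. `SubjectView` below is a hypothetical class written for this illustration, not RadiObject's implementation: it holds a reference to the parent's subject IDs plus the selected positions, and every operation returns a new view.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SubjectView:
    """Toy stand-in for a view: parent IDs plus the positions selected from them."""

    parent_ids: Tuple[str, ...]
    positions: Tuple[int, ...]

    @property
    def subject_ids(self) -> List[str]:
        return [self.parent_ids[i] for i in self.positions]

    def head(self, n: int) -> "SubjectView":
        # Returns a new view; the current one is left unchanged.
        return SubjectView(self.parent_ids, self.positions[:n])

    def where(self, predicate: Callable[[str], bool]) -> "SubjectView":
        kept = tuple(i for i in self.positions if predicate(self.parent_ids[i]))
        return SubjectView(self.parent_ids, kept)

    def materialize(self) -> Tuple[str, ...]:
        # Copies the selection into a new, independent container.
        return tuple(self.subject_ids)


all_ids = ("BRATS_001", "BRATS_002", "BRATS_003", "BRATS_004", "BRATS_005")
root = SubjectView(all_ids, tuple(range(len(all_ids))))

# Chain two selections; each step yields a new view.
view = root.head(3).where(lambda sid: sid != "BRATS_002")
copy_of_view = view.materialize()

print(view.subject_ids)       # ['BRATS_001', 'BRATS_003']
print(len(root.subject_ids))  # 5
```

The parent still reports all five subjects after chaining, which is the sense in which views leave the original unchanged.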


In [14]:
radi.validate()
print("Validation passed")

Validation passed


## Next Steps

- [02_volume_collection.ipynb](./02_volume_collection.ipynb) - Working with volume groups
- [03_volume.ipynb](./03_volume.ipynb) - Single volume operations
- [04_storage_configuration.ipynb](./04_storage_configuration.ipynb) - Tile orientation, compression, S3

## Two Modes: Filtering vs Lazy

RadiObject offers two approaches to working with data:

| Mode | Method | Returns | Best For |
|------|--------|---------|----------|
| **Filtering** | `filter()`, `iloc[]`, `head()` | `RadiObject` (view) | Interactive analysis, quick previews |
| **Lazy** | `lazy().filter()...` | `Query` | Transform pipelines, ML data prep |

### When to use which?

**Filtering Mode** - Returns RadiObject views immediately:
```python
# Quick filtering for interactive work
view = radi.filter("age > 40")  # Immediate: can access .FLAIR, .obs_meta right away
vol = view.FLAIR.iloc[0]        # Works immediately
view.materialize("./subset")    # Write to storage
```

**Lazy Mode** - Returns a lazy `Query` for building transform pipelines:
```python
# For transforms: build query, apply map(), then materialize
result = (
    radi.lazy()
    .filter("age > 40")
    .select_collections(["FLAIR"])
    .map(normalize_intensity)  # User-supplied transform, applied during materialization
    .materialize("./normalized")
)
```

**Rule of thumb:** Use `filter()` for exploration and simple subsetting. Use `lazy()` when you need to apply transforms via `map()`.
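
The `normalize_intensity` function in the lazy example is not defined in this notebook. One possible version is sketched below, under a loud assumption: the callback passed to `map()` is taken to receive a single volume as a NumPy array and return an array of the same shape. Check the `Query.map` documentation for the actual signature before relying on this.

```python
import numpy as np


def normalize_intensity(volume: np.ndarray) -> np.ndarray:
    """Z-score a volume using only its nonzero (foreground) voxels.

    Hypothetical transform: the signature expected by map() is an assumption.
    """
    data = volume.astype(np.float32)
    foreground = data != 0
    out = np.zeros_like(data)
    if not foreground.any():
        return out  # Empty volume: nothing to normalize
    mean = data[foreground].mean()
    std = data[foreground].std()
    scale = std if std > 0 else 1.0  # Avoid division by zero on constant volumes
    out[foreground] = (data[foreground] - mean) / scale
    return out


# Small synthetic volume standing in for a 240x240x155 scan
rng = np.random.default_rng(0)
vol = rng.integers(1, 1000, size=(8, 8, 4)).astype(np.int16)
vol[:2] = 0  # Simulate background voxels

normalized = normalize_intensity(vol)
print(normalized.shape, normalized.dtype)
```

Restricting the statistics to nonzero voxels keeps the zero background from dominating the mean, a common choice for skull-stripped MRI.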

In [15]:
# lazy() returns Query - for transform pipelines
q = radi.lazy()
filtered = q.filter("tumor_grade == 'HGG'").select_collections(["FLAIR", "T1w"]).head(3)
print(f"Query: {filtered}")
print(f"Count: {filtered.count()}")

Query: Query(1 subjects, 2 volumes across [T1w, FLAIR])
Count: QueryCount(n_subjects=1, n_volumes={'T1w': 1, 'FLAIR': 1})
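
The deferred behaviour of lazy mode can be pictured with a toy pipeline. `LazyPipeline` below is a hypothetical class for illustration only, not the `Query` implementation: it records `filter` and `map` steps and runs them only when `materialize()` is called.

```python
class LazyPipeline:
    """Toy deferred pipeline: records steps, executes them in materialize()."""

    def __init__(self, items, steps=()):
        self._items = items
        self._steps = tuple(steps)

    def filter(self, predicate):
        return LazyPipeline(self._items, self._steps + (("filter", predicate),))

    def map(self, fn):
        return LazyPipeline(self._items, self._steps + (("map", fn),))

    def materialize(self):
        out = []
        for item in self._items:
            keep = True
            for kind, fn in self._steps:
                if kind == "filter":
                    if not fn(item):
                        keep = False
                        break
                else:
                    item = fn(item)
            if keep:
                out.append(item)
        return out


calls = []


def double(x):
    calls.append(x)  # Record when the transform actually runs
    return x * 2


ages = [37, 50, 68, 48, 52]  # Ages from the metadata table above
pipeline = LazyPipeline(ages).filter(lambda a: a > 40).map(double)

calls_before = list(calls)  # Still empty: building the pipeline ran nothing
result = pipeline.materialize()

print(calls_before)  # []
print(result)        # [100, 136, 96, 104]
```

The transform runs only for items that pass the filter, and only at materialization time.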


In [16]:
# Streaming iteration (memory-efficient)
for vol in filtered.iter_volumes():
    print(f"Volume: {vol.obs_id}, shape: {vol.shape}")

# Batch iteration for ML
for batch in filtered.iter_batches(batch_size=2):
    print(f"\nBatch: {batch.subject_ids}")
    for name, arr in batch.volumes.items():
        print(f"  {name}: {arr.shape}")
    break  # Only inspect the first batch

Volume: BRATS_002_T1w, shape: (240, 240, 155)
Volume: BRATS_002_FLAIR, shape: (240, 240, 155)



Batch: ('BRATS_002',)
  T1w: (1, 240, 240, 155)
  FLAIR: (1, 240, 240, 155)
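
The leading `1` in the batch shapes is the batch axis: the query matched a single subject, so the batch holds one volume even though `batch_size=2` was requested. The sketch below reproduces that shape behaviour with plain NumPy. It assumes batches are formed by stacking per-subject volumes along a new first axis, and `stack_in_batches` is a hypothetical helper, not part of RadiObject.

```python
import numpy as np


def stack_in_batches(volumes, batch_size):
    """Yield arrays of shape (n, X, Y, Z), with n <= batch_size."""
    for start in range(0, len(volumes), batch_size):
        yield np.stack(volumes[start:start + batch_size], axis=0)


# Three small volumes standing in for 240x240x155 scans
volumes = [np.full((4, 4, 3), i, dtype=np.float32) for i in range(3)]

shapes = [batch.shape for batch in stack_in_batches(volumes, batch_size=2)]
print(shapes)  # [(2, 4, 4, 3), (1, 4, 4, 3)]
```

The final batch is smaller whenever the number of volumes is not a multiple of `batch_size`.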


In [17]:
# Export query results with streaming
TEMP_DIR_QUERY = tempfile.mkdtemp(prefix="radi_query_")
subset_from_query = filtered.materialize(str(Path(TEMP_DIR_QUERY) / "query_subset"), streaming=True)
print(f"Exported: {subset_from_query}")
shutil.rmtree(TEMP_DIR_QUERY)

Exported: RadiObject(1 subjects, 2 collections: [FLAIR, T1w])
