# Storage Configuration

Choose storage settings BEFORE ingestion based on your primary use case:

| Your Workflow | Recommended Setting |
|---------------|---------------------|
| Reviewing slices (radiologist) | `SliceOrientation.AXIAL` |
| Sagittal plane analysis | `SliceOrientation.SAGITTAL` |
| Extracting 3D patches for ML | `SliceOrientation.ISOTROPIC` |
| Archival with max compression | `Compressor.GZIP` with `level=9` |
| Fast iteration during training | `Compressor.LZ4` |

**These settings are IMMUTABLE after ingestion.** If you need different tile layouts for the same data, you must re-ingest.

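This notebook covers:
- **Tile orientation** - Match tiles to your access pattern for optimal read performance
- **Compression** - Trade-off between size and speed
- **Fragments & consolidation** - Keep reads fast after many writes
- **Read settings** - Memory budget and I/O concurrency
- **S3 configuration** - Cloud-native storage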

**Prerequisites:** [01_radi_object.ipynb](./01_radi_object.ipynb), [03_volume.ipynb](./03_volume.ipynb)

## Why Tile Orientation Matters

TileDB stores arrays in **tiles** (chunks). A tile is the unit of I/O and compression, so a read loads every tile that overlaps the requested region. For a 240 x 240 x 155 volume:

```
AXIAL tiling (1 slice = 1 tile):       ISOTROPIC tiling (64x64x64 cubes):
+-----------+                          +-------+-------+
|  tile 0   |  <- Z=0                  |       |       |
+-----------+                          |   A   |   B   |  <- 16 tiles (4 x 4) overlap
|  tile 1   |  <- Z=1                  |       |       |     an axial slice
+-----------+                          +-------+-------+
Reading Z=77:                          Reading Z=77:
  -> 1 tile read (fast)                  -> 16 tiles read (slower)
```

**Mismatch = wasted reads.** AXIAL tiling reads 1 tile per axial slice. A single sagittal slice, however, needs all 155 tiles, and from each full 240 x 240 plane it uses only one line of 240 voxels. Choose based on your dominant access pattern.
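The tile counts above follow from simple arithmetic. Along each axis, a read touches every tile whose index range overlaps the requested range, and the per-axis counts multiply. The sketch below is a minimal example, not part of RadiObject. It assumes a regular tile grid aligned to the array origin, and `tiles_touched` is a hypothetical helper name:

```python
def tiles_touched(shape, tile, region):
    """Count tiles overlapped by `region` in an origin-aligned regular tile grid.

    shape:  array dimensions, e.g. (240, 240, 155)
    tile:   tile extents per axis, e.g. (64, 64, 64)
    region: half-open (start, stop) range per axis
    """
    count = 1
    for dim, ext, (start, stop) in zip(shape, tile, region):
        if not (0 <= start < stop <= dim):
            raise ValueError(f"range {start}:{stop} outside axis of length {dim}")
        first = start // ext       # index of the first tile hit on this axis
        last = (stop - 1) // ext   # index of the last tile hit on this axis
        count *= last - first + 1
    return count


SHAPE = (240, 240, 155)
AXIAL = (240, 240, 1)
ISOTROPIC = (64, 64, 64)

axial_slice = ((0, 240), (0, 240), (77, 78))       # Z = 77
sagittal_slice = ((120, 121), (0, 240), (0, 155))  # X = 120
roi = ((10, 74), (10, 74), (10, 74))               # unaligned 64^3 patch

print(tiles_touched(SHAPE, AXIAL, axial_slice))      # 1
print(tiles_touched(SHAPE, ISOTROPIC, axial_slice))  # 16
print(tiles_touched(SHAPE, AXIAL, sagittal_slice))   # 155
print(tiles_touched(SHAPE, ISOTROPIC, roi))          # 8
```

A 64^3 patch aligned to the tile grid touches a single isotropic tile. Shifting it by even one voxel on every axis raises the count to 8.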

## Tile Orientations

RadiObject supports four tile orientations optimized for different access patterns:

| Orientation | Tile Shape (240 x 240 x 155 volume) | Best For |
|-------------|-------------------------------------|----------|
| `AXIAL` | 240 x 240 x 1 | Slice-by-slice viewing (neuroimaging, CT review) |
| `SAGITTAL` | 1 x 240 x 155 | Sagittal plane analysis |
| `CORONAL` | 240 x 1 x 155 | Coronal plane analysis |
| `ISOTROPIC` | 64 x 64 x 64 | 3D ROI extraction (ML training, tumor analysis) |

```
    AXIAL (XY slices)          SAGITTAL (YZ slices)       ISOTROPIC (64^3 cubes)
    +--------------+           +--------------+           +--------------+
    |==============| Z=0       | ||           |           | +--+--+--+   |
    |==============| Z=1       | ||           | X=0       | +--+--+--+   |
    |==============| Z=2       | ||           | X=1       | +--+--+--+   |
    |      ...     |           | || ...       |           |    ...       |
    |==============| Z=n       | ||           |           | (3D chunks)  |
    +--------------+           +--------------+           +--------------+
    
    Reading Z=77:              Reading X=120:             Reading 64^3 ROI:
    Reads 1 tile               Reads 1 tile               Reads 1-8 tiles
```
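The tile shapes in the table depend on the volume shape. The sketch below is a hypothetical re-implementation, not RadiObject's actual `TileConfig.extents_for_shape()`. It reproduces the extents listed for a 240 x 240 x 155 volume. Two details are assumptions: the `Orientation` enum stands in for `SliceOrientation`, and the isotropic cube edge is clamped to the axis length for small volumes.

```python
from enum import Enum


class Orientation(Enum):
    # Hypothetical stand-in for radiobject.ctx.SliceOrientation
    AXIAL = "axial"
    SAGITTAL = "sagittal"
    CORONAL = "coronal"
    ISOTROPIC = "isotropic"


def tile_extents(orientation, shape, cube=64):
    """Sketch: planar layouts use one full plane per tile; isotropic uses cubes."""
    x, y, z = shape
    if orientation is Orientation.AXIAL:
        return (x, y, 1)  # one XY plane per tile
    if orientation is Orientation.SAGITTAL:
        return (1, y, z)  # one YZ plane per tile
    if orientation is Orientation.CORONAL:
        return (x, 1, z)  # one XZ plane per tile
    # ISOTROPIC: cube edge clamped to each axis length (assumed behaviour)
    return (min(cube, x), min(cube, y), min(cube, z))


for o in Orientation:
    print(f"{o.value:10s} -> {tile_extents(o, (240, 240, 155))}")
```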

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

import numpy as np

from radiobject.ctx import (
    CompressionConfig,
    Compressor,
    ReadConfig,
    S3Config,
    SliceOrientation,
    TileConfig,
    WriteConfig,
    configure,
    get_config,
)
from radiobject.data import S3_REGION, get_brats_uri
from radiobject.radi_object import RadiObject
from radiobject.volume import Volume

BRATS_URI = get_brats_uri()
TEMP_DIR = tempfile.mkdtemp(prefix="storage_tutorial_")
print(f"Working directory: {TEMP_DIR}")
```

```
Working directory: /var/folders/dj/0_0s64j55hn0gk7rrvj09zf80000gn/T/storage_tutorial_2xo8l2vv
```


```python
# Configure S3 if using S3 URI
if BRATS_URI.startswith("s3://"):
    configure(s3=S3Config(region=S3_REGION))

# Load RadiObject from configured URI
radi = RadiObject(BRATS_URI)
print(f"Loaded: {radi}")
print(f"Collections: {radi.collection_names}")

# Quick data access example
vol = radi.FLAIR.iloc[0]
print(f"\nSample volume shape: {vol.shape}")
```

```
Loaded: RadiObject(368 subjects, 5 collections: [seg, T2w, FLAIR, T1gd, T1w])
Collections: ('seg', 'T2w', 'FLAIR', 'T1gd', 'T1w')

Sample volume shape: (240, 240, 155)
```


```python
# Configure for axial slice access (default)
# Write settings affect new arrays only
configure(
    write=WriteConfig(
        tile=TileConfig(orientation=SliceOrientation.AXIAL),
        compression=CompressionConfig(algorithm=Compressor.ZSTD, level=3),
    )
)

# View current configuration
config = get_config()
print(f"Tile orientation: {config.write.tile.orientation}")
print(f"Compression: {config.write.compression.algorithm}, level={config.write.compression.level}")
```

```
Tile orientation: SliceOrientation.AXIAL
Compression: Compressor.ZSTD, level=3
```


```python
# See how tile extents are computed for different orientations
shape = (240, 240, 155)

for orient in SliceOrientation:
    tile_cfg = TileConfig(orientation=orient)
    extents = tile_cfg.extents_for_shape(shape)
    print(f"{orient.value:10s} -> tile extents: {extents}")
```

```
axial      -> tile extents: (240, 240, 1)
sagittal   -> tile extents: (1, 240, 155)
coronal    -> tile extents: (240, 1, 155)
isotropic  -> tile extents: (64, 64, 64)
```


```python
# Create test data
test_data = np.random.randn(240, 240, 155).astype(np.float32)

# Create volumes with different tile orientations
volumes = {}
for orient in [SliceOrientation.AXIAL, SliceOrientation.SAGITTAL, SliceOrientation.ISOTROPIC]:
    configure(write=WriteConfig(tile=TileConfig(orientation=orient)))
    uri = str(Path(TEMP_DIR) / f"vol_{orient.value}")
    volumes[orient.value] = Volume.from_numpy(uri, test_data)
    print(f"Created {orient.value}: {volumes[orient.value]}")
```

```
Created axial: Volume(shape=240x240x155, dtype=float32)
Created sagittal: Volume(shape=240x240x155, dtype=float32)
Created isotropic: Volume(shape=240x240x155, dtype=float32)
```


```python
# Benchmark: read axial slices from each volume
z_indices = range(0, 150, 3)  # every 3rd slice
n_reads = len(z_indices)  # 50 reads
results = {}

for name, vol in volumes.items():
    start = time.perf_counter()
    for z in z_indices:
        _ = vol.axial(z)
    elapsed = time.perf_counter() - start
    results[name] = elapsed
    print(f"{name:10s}: {elapsed*1000:.1f}ms for {n_reads} axial reads")

print(
    f"\nAxial-tiled is {results['isotropic'] / results['axial']:.1f}x faster "
    "than isotropic-tiled for axial reads"
)
```

```
axial     : 84.4ms for 50 axial reads
sagittal  : 690.6ms for 50 axial reads
isotropic : 360.0ms for 50 axial reads

Axial-tiled is 4.3x faster than isotropic-tiled for axial reads
```


```python
# Benchmark: read sagittal slices from each volume
x_indices = range(0, 240, 5)  # every 5th slice
n_reads = len(x_indices)  # 48 reads
results_sag = {}

for name, vol in volumes.items():
    start = time.perf_counter()
    for x in x_indices:
        _ = vol.sagittal(x)
    elapsed = time.perf_counter() - start
    results_sag[name] = elapsed
    print(f"{name:10s}: {elapsed*1000:.1f}ms for {n_reads} sagittal reads")

print(
    f"\nSagittal-tiled is {results_sag['axial'] / results_sag['sagittal']:.1f}x faster "
    "than axial-tiled for sagittal reads"
)
```

```
axial     : 592.2ms for 48 sagittal reads
sagittal  : 44.0ms for 48 sagittal reads
isotropic : 216.5ms for 48 sagittal reads

Sagittal-tiled is 13.5x faster than axial-tiled for sagittal reads
```
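Each benchmark above is a single timed pass. Its numbers therefore include one-off costs such as opening the array and cold OS or TileDB caches. A more robust measurement runs an untimed warm-up pass and then reports the median of several repeats. The sketch below uses only the standard library. `median_read_time` is a hypothetical helper, and `read_fn` stands for any slice reader, such as `vol.axial`:

```python
import statistics
import time


def median_read_time(read_fn, indices, repeats=5, warmup=1):
    """Median wall-clock seconds to call read_fn(i) for every i in indices."""
    indices = list(indices)
    for _ in range(warmup):  # untimed passes absorb cold-cache costs
        for i in indices:
            read_fn(i)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for i in indices:
            read_fn(i)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in reader for illustration; with the volumes above, pass vol.axial instead
calls = []
elapsed = median_read_time(calls.append, range(0, 150, 3), repeats=3)
print(f"{elapsed * 1000:.3f} ms per pass, {len(calls)} calls")
```

Skip the warm-up (`warmup=0`) if cold-read latency, such as the first access from S3, is what you want to measure.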


## Fragments & Consolidation

TileDB writes data as **fragments**: each write creates a new, immutable fragment instead of modifying earlier data in place. This keeps writes fast. The cost falls on reads, which must consult every fragment that overlaps the requested region, so many small fragments slow reads down.

```
    Array on disk (simplified):
    +-------------------------------------+
    |  fragment_1/    (initial write)     |
    |  fragment_2/    (update)            |
    |  fragment_3/    (another update)    |
    |  __meta/        (array metadata)    |
    |  __schema       (array schema)      |
    +-------------------------------------+

    After consolidation and vacuuming:
    +-------------------------------------+
    |  merged_fragment/ (fragments 1-3)   |
    |  __meta/                            |
    |  __schema                           |
    +-------------------------------------+
```

**When to consolidate:**
- After bulk ingestion (many writes)
- Before long-term storage
- When read performance degrades

RadiObject's `from_niftis()` creates one fragment per volume, which is efficient. For incremental updates, consider periodic consolidation.
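The toy, in-memory model below makes the read cost of fragments concrete. It is an illustration of the idea, not TileDB's implementation:

- Each write appends a fragment of cell updates.
- A read scans fragments from newest to oldest, and the newest value wins.
- Consolidation merges all fragments into one.

On a real array, the TileDB-Py equivalents are `tiledb.consolidate(uri)` followed by `tiledb.vacuum(uri)`, which deletes the fragments that were merged.

```python
class ToyArray:
    """Toy model of a fragment-based array: writes append, reads merge."""

    def __init__(self):
        self.fragments = []  # list of {coord: value} dicts, oldest first

    def write(self, cells):
        self.fragments.append(dict(cells))  # older fragments are never modified

    def read(self, coord):
        # Scan newest to oldest; every extra fragment is extra work.
        for frag in reversed(self.fragments):
            if coord in frag:
                return frag[coord]
        return None

    def consolidate(self):
        merged = {}
        for frag in self.fragments:  # oldest first, so newer values overwrite
            merged.update(frag)
        self.fragments = [merged]


arr = ToyArray()
arr.write({(0, 0, 0): 1.0, (0, 0, 1): 2.0})  # initial write
arr.write({(0, 0, 1): 5.0})                  # update
arr.write({(0, 0, 2): 7.0})                  # another update
print(len(arr.fragments), arr.read((0, 0, 1)))  # 3 5.0
arr.consolidate()
print(len(arr.fragments), arr.read((0, 0, 1)))  # 1 5.0
```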

```python
# Configure memory budget (read-time settings)
configure(
    read=ReadConfig(
        memory_budget_mb=1024,  # 1GB memory budget
        concurrency=4,  # 4 parallel I/O threads
    )
)

config = get_config()
print(f"Memory budget: {config.read.memory_budget_mb} MB")
print(f"I/O concurrency: {config.read.concurrency} threads")
```

```
Memory budget: 1024 MB
I/O concurrency: 4 threads
```


```
    Read Operation Flow:
    
    +--------------+    +--------------+    +--------------+
    |   Storage    |--->|  I/O Buffer  |--->|    Result    |
    |  (S3/Local)  |    |  (RAM)       |    |  (NumPy)     |
    +--------------+    +--------------+    +--------------+
           |                   |
           |    Decompression  |
           |    happens here   |
           +-------------------+
    
    Memory budget controls the I/O buffer size.
    Larger buffer = more tiles read in parallel.
```
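For a rough sense of scale, you can compare the uncompressed size of one tile per orientation with the budget. The figures below assume `float32` voxels, the 240 x 240 x 155 extents shown earlier, and a budget counted in MiB. This is back-of-the-envelope arithmetic only: it ignores how TileDB actually divides its buffers.

```python
BYTES_PER_VOXEL = 4  # float32
BUDGET_MB = 1024     # memory_budget_mb set above, treated as MiB (assumption)

TILE_EXTENTS = {
    "axial": (240, 240, 1),
    "sagittal": (1, 240, 155),
    "coronal": (240, 1, 155),
    "isotropic": (64, 64, 64),
}

budget_bytes = BUDGET_MB * 1024 * 1024
fits_in_budget = {}
for name, (tx, ty, tz) in TILE_EXTENTS.items():
    tile_bytes = tx * ty * tz * BYTES_PER_VOXEL  # uncompressed size of one tile
    fits_in_budget[name] = budget_bytes // tile_bytes
    print(f"{name:10s} {tile_bytes / 1024:8.1f} KiB/tile  ~{fits_in_budget[name]} tiles in budget")
```

An isotropic tile is exactly 1 MiB uncompressed, so a 1024 MB budget holds about a thousand of them at once.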

```python
# ---------------------------------------------------------------------------
# ADVANCED: S3 Configuration
#
# Only needed if storing data on S3. For local storage, skip this section.
# ---------------------------------------------------------------------------
configure(
    s3=S3Config(
        region="us-east-2",
        max_parallel_ops=16,  # Parallel S3 requests for faster reads
        multipart_part_size_mb=50,  # Chunk size for large uploads
    )
)

print("S3 configuration:")
print(f"  Region: {get_config().s3.region}")
print(f"  Max parallel ops: {get_config().s3.max_parallel_ops}")
print(f"  Multipart size: {get_config().s3.multipart_part_size_mb} MB")
```

```
S3 configuration:
  Region: us-east-2
  Max parallel ops: 16
  Multipart size: 50 MB
```
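`multipart_part_size_mb` sets how many requests a large upload is split into. The sketch below shows the arithmetic, assuming each object is uploaded in parts of exactly that size. The two limits it checks belong to Amazon S3, not RadiObject: at most 10,000 parts per upload, and a 5 MiB minimum for every part except the last. `multipart_parts` is a hypothetical helper.

```python
import math

S3_MAX_PARTS = 10_000  # Amazon S3 limit on parts per multipart upload
S3_MIN_PART_MB = 5     # minimum part size (5 MiB); the last part may be smaller


def multipart_parts(object_mb, part_size_mb=50):
    """Number of parts needed to upload an object of object_mb MB."""
    if part_size_mb < S3_MIN_PART_MB:
        raise ValueError(f"S3 parts must be at least {S3_MIN_PART_MB} MB (except the last)")
    parts = max(1, math.ceil(object_mb / part_size_mb))
    if parts > S3_MAX_PARTS:
        raise ValueError(f"{parts} parts exceeds the S3 limit of {S3_MAX_PARTS}")
    return parts


print(multipart_parts(3))      # 1
print(multipart_parts(2_000))  # 40
print(50 * S3_MAX_PARTS)       # 500000: largest object (MB) with 50 MB parts
```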


## S3 Usage (Advanced)

RadiObject works identically with local and S3 URIs:

```python
from radiobject.radi_object import RadiObject

# Local
radi = RadiObject("/data/study")

# S3 (same API!)
radi = RadiObject("s3://bucket/study")
```

**Credentials:** Set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, or use IAM roles on AWS infrastructure.
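Before opening an `s3://` URI, it can help to check which credential source is likely to be used. The sketch below inspects only the two environment variables named above; `describe_storage` is a hypothetical helper. It cannot detect IAM roles or shared credential profiles, which the underlying AWS SDK may use instead.

```python
import os
from urllib.parse import urlparse


def describe_storage(uri, environ=None):
    """Describe the backend for uri and whether static env credentials are set."""
    environ = os.environ if environ is None else environ
    if urlparse(uri).scheme != "s3":
        return "local"
    has_keys = bool(environ.get("AWS_ACCESS_KEY_ID")) and bool(
        environ.get("AWS_SECRET_ACCESS_KEY")
    )
    return "s3 (env credentials)" if has_keys else "s3 (no env keys: IAM role or profile?)"


print(describe_storage("/data/study"))
print(describe_storage("s3://bucket/study", {"AWS_ACCESS_KEY_ID": "x", "AWS_SECRET_ACCESS_KEY": "y"}))
```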

```python
# Compare on-disk size per compressor.
# Note: Gaussian noise is nearly incompressible, so expect ratios close to 1x here.
test_data = np.random.randn(120, 120, 60).astype(np.float32)
uncompressed_size = test_data.nbytes

for compressor in [Compressor.NONE, Compressor.LZ4, Compressor.ZSTD, Compressor.GZIP]:
    configure(
        write=WriteConfig(
            compression=CompressionConfig(algorithm=compressor, level=3),
            tile=TileConfig(orientation=SliceOrientation.AXIAL),
        )
    )
    uri = str(Path(TEMP_DIR) / f"vol_{compressor.value}")
    vol = Volume.from_numpy(uri, test_data)

    # Get directory size
    total_size = sum(
        os.path.getsize(os.path.join(dp, f))
        for dp, dn, filenames in os.walk(uri)
        for f in filenames
    )
    ratio = uncompressed_size / total_size if total_size > 0 else 0
    print(f"{compressor.value:6s}: {total_size/1024:.1f} KB (ratio: {ratio:.2f}x)")
```

```
none  : 3382.6 KB (ratio: 1.00x)
lz4   : 3396.8 KB (ratio: 0.99x)
zstd  : 3128.4 KB (ratio: 1.08x)
gzip  : 3143.7 KB (ratio: 1.07x)
```
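The ratios above sit near 1x, and LZ4 even adds a little overhead, because `np.random.randn` produces high-entropy noise that no lossless compressor can shrink much. Real scans usually contain large uniform regions, such as the zero background around the skull-stripped BraTS brains, so they compress considerably better. The sketch below illustrates the effect with the standard library's `zlib` (DEFLATE, the algorithm behind GZIP) on float32 data. The synthetic "structured" volume is an assumption that loosely stands in for real MRI data.

```python
import random
import zlib
from array import array

N = 120 * 120 * 60  # same voxel count as the compression test above

random.seed(0)
noise = array("f", (random.gauss(0.0, 1.0) for _ in range(N)))

# Structured stand-in: zero background with one constant-intensity region,
# loosely mimicking a brain surrounded by empty space.
structured = array("f", bytes(4 * N))  # N zeros
structured[N // 4 : N // 2] = array("f", [100.0]) * (N // 4)


def ratio(data):
    raw = data.tobytes()
    return len(raw) / len(zlib.compress(raw, 6))


print(f"noise:      {ratio(noise):.2f}x")
print(f"structured: {ratio(structured):.2f}x")
```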


```python
# Final example: configure for ML training workflow
configure(
    write=WriteConfig(
        tile=TileConfig(orientation=SliceOrientation.ISOTROPIC),  # 64^3 cubes for ROI
        compression=CompressionConfig(algorithm=Compressor.ZSTD, level=3),
    ),
    read=ReadConfig(memory_budget_mb=2048, concurrency=8),  # More memory for training
)

print("Configuration for ML training:")
config = get_config()
print(f"  Tiles: {config.write.tile.orientation.value}")
print(f"  Compression: {config.write.compression.algorithm.value}")
print(f"  Memory: {config.read.memory_budget_mb} MB")
```

```
Configuration for ML training:
  Tiles: isotropic
  Compression: zstd
  Memory: 2048 MB
```


```python
shutil.rmtree(TEMP_DIR)
print(f"Cleaned up: {TEMP_DIR}")
```

```
Cleaned up: /var/folders/dj/0_0s64j55hn0gk7rrvj09zf80000gn/T/storage_tutorial_2xo8l2vv
```


## Summary

| Concept | Key Points |
|---------|------------|
| **Tile Orientation** | Match tiles to your access pattern for best performance |
| **Compression** | ZSTD is default; LZ4 for speed, GZIP for size |
| **Fragments** | Each write creates a fragment; consolidate after bulk ingestion |
| **S3** | Same API as local; configure region and parallelism |
| **Memory** | Increase budget for large datasets or parallel training |
| **Write vs Read** | Write settings (tile, compression) are immutable after creation; read settings can be changed at any time and apply to subsequent reads |

### Quick Reference

```python
from radiobject.ctx import (
    CompressionConfig,
    Compressor,
    ReadConfig,
    SliceOrientation,
    TileConfig,
    WriteConfig,
    configure,
)

# For slice viewing (write-time setting)
configure(write=WriteConfig(tile=TileConfig(orientation=SliceOrientation.AXIAL)))

# For ML training (write-time setting)
configure(write=WriteConfig(tile=TileConfig(orientation=SliceOrientation.ISOTROPIC)))

# For archival (write-time setting)
configure(write=WriteConfig(compression=CompressionConfig(algorithm=Compressor.GZIP, level=9)))

# Increase memory for large reads (read-time setting)
configure(read=ReadConfig(memory_budget_mb=2048, concurrency=8))
```