# Multi-Interval Index Example

This notebook demonstrates how `DimensionInterval` enables automatic cross-slicing between multiple interval types over a shared continuous dimension.

![Diagram of possible sel calls for DimensionInterval](images/generic-intervals.png.excalidraw.png)

When you select a specific word, you want the time and phoneme dimensions to automatically constrain to only the overlapping values. This is exactly what `DimensionInterval` provides.

::::{note}
There are two ways to encode intervals with `DimensionInterval`:
1. **Pandas IntervalIndex** - Used in this notebook. Intervals are encoded directly as `pd.IntervalIndex` objects.
2. **Onset/Duration format** - Intervals are specified as separate onset and duration coordinates. See the [Onset/Duration Example](onset_duration_example.ipynb).
::::
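
The two encodings describe the same intervals. The sketch below uses only pandas and NumPy to show the equivalence. The `onsets` and `durations` arrays are hypothetical values chosen for illustration, and the sketch does not show how `DimensionInterval` itself consumes onset/duration coordinates (the linked notebook covers that).

```python
import numpy as np
import pandas as pd

# Hypothetical onset/duration arrays for three back-to-back intervals.
onsets = np.array([0.0, 40.0, 80.0])
durations = np.array([40.0, 40.0, 40.0])

# Each interval is [onset, onset + duration), closed on the left.
from_onsets = pd.IntervalIndex.from_arrays(onsets, onsets + durations, closed="left")

# The same intervals expressed as shared break points.
from_breaks = pd.IntervalIndex.from_breaks([0.0, 40.0, 80.0, 120.0], closed="left")

print(from_onsets.equals(from_breaks))  # True
```

Break points are compact when intervals tile the axis without gaps, while onset/duration pairs can also describe intervals separated by gaps.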

::::{seealso}
For a comparison of `DimensionInterval` with xarray's built-in `MultiIndex`, see the [MultiIndex Comparison](alt-multiindex.ipynb) notebook.
::::

In [None]:
import numpy as np
import pandas as pd
import xarray as xr

from linked_indices import DimensionInterval


## Use Case: Speech Data

Imagine you have speech data with:
- A **continuous time dimension** (e.g., audio samples at regular intervals)
- **Word intervals** - each word spans a range of time
- **Phoneme intervals** - each phoneme spans a smaller range of time within words


## Creating the Dataset

First, let's create a dataset with:
- 1000 time points from 0 to 120
- 3 words: "run" [0, 40), "quickly" [40, 80), "home" [80, 120)
- 6 phonemes, each spanning 20 time units
- Part of speech labels for each word (verb, adverb, noun)

Note that word intervals are larger and contain multiple phoneme intervals.

This example uses `pd.IntervalIndex` directly. For an alternative approach using onset/duration coordinates, see the [Onset/Duration Example](onset_duration_example.ipynb).
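
The nesting of phonemes inside words can be checked with pandas alone. This is a minimal sketch that rebuilds the two interval sets from the break points listed above. It is independent of `DimensionInterval` and is only meant to make the word/phoneme relationship concrete.

```python
import pandas as pd

word_intervals = pd.IntervalIndex.from_breaks([0.0, 40.0, 80.0, 120.0], closed="left")
phoneme_intervals = pd.IntervalIndex.from_breaks(
    [0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0], closed="left"
)

# For each phoneme, find the position of the word interval containing its start.
parent_word = word_intervals.get_indexer(phoneme_intervals.left)

# A phoneme is nested if it also ends no later than its parent word does.
nested = phoneme_intervals.right <= word_intervals.right[parent_word]

print(parent_word.tolist())  # [0, 0, 1, 1, 2, 2]
print(bool(nested.all()))    # True
```

Each word therefore contains exactly two phonemes in this example.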

In [None]:
C = 2  # number of channels
N = 1000  # number of time points
times = np.linspace(0, 120, N)

# Word intervals - 3 words covering full time range
word_breaks = [0.0, 40.0, 80.0, 120.0]
word_intervals = pd.IntervalIndex.from_breaks(word_breaks, closed="left")
word_labels = ["run", "quickly", "home"]
word_pos = ["verb", "adverb", "noun"]  # part of speech labels

# Phoneme intervals - 6 phonemes, each 20 units
phoneme_breaks = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0]
phoneme_intervals = pd.IntervalIndex.from_breaks(phoneme_breaks, closed="left")
phoneme_labels = ["ah", "ee", "oh", "oo", "eh", "ih"]

data = np.random.rand(C, N)

# Create the dataset
ds = xr.Dataset(
    {"data": (("C", "time"), data)},
    coords={
        "time": times,
        "word_intervals": ("word", word_intervals),
        "word": ("word", word_labels),
        "part_of_speech": ("word", word_pos),
        "phoneme_intervals": ("phoneme", phoneme_intervals),
        "phoneme": ("phoneme", phoneme_labels),
    },
)

ds

## Applying the Multi-Interval Index

Now we apply `DimensionInterval` to link all these coordinates together. This replaces the default indexes with a single custom index that understands the relationships between time, words, and phonemes.

In [None]:
ds = ds.drop_indexes(["time", "word", "phoneme"]).set_xindex(
    [
        "time",
        "word_intervals",
        "phoneme_intervals",
        "word",
        "part_of_speech",
        "phoneme",
    ],
    DimensionInterval,
)

ds

Notice in the `Indexes` section that all coordinates are now grouped under a single `DimensionInterval` index. This means selections on any of these coordinates will automatically propagate constraints to the others.

In [None]:
ds.xindexes

The raw interval data is stored directly in the dataset, so it can be retrieved like any other coordinate:

In [None]:
ds["word_intervals"]

Multiple metadata coordinates can be selected at the same time:

In [None]:
ds.sel(part_of_speech="adverb", phoneme="oh")
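
To see which time span is consistent with both labels, the lookup can be worked through by hand with plain pandas. This sketch assumes that a combined selection corresponds to the intersection of the two matching intervals. That is an illustration of the idea, not a description of how `DimensionInterval` is implemented.

```python
import pandas as pd

word_intervals = pd.IntervalIndex.from_breaks([0.0, 40.0, 80.0, 120.0], closed="left")
word_pos = ["verb", "adverb", "noun"]
phoneme_intervals = pd.IntervalIndex.from_breaks(
    [0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0], closed="left"
)
phoneme_labels = ["ah", "ee", "oh", "oo", "eh", "ih"]

# Map each label to its interval by position.
word_iv = word_intervals[word_pos.index("adverb")]          # [40.0, 80.0)
phoneme_iv = phoneme_intervals[phoneme_labels.index("oh")]  # [40.0, 60.0)

# Assumption: the combined selection is the intersection of both intervals.
left = max(word_iv.left, phoneme_iv.left)
right = min(word_iv.right, phoneme_iv.right)
common = pd.Interval(left, right, closed="left") if left < right else None

print(common)  # [40.0, 60.0)
```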

## Cross-Slicing: Selecting on Time

When we select a time range, both word and phoneme dimensions are automatically constrained to only include intervals that overlap with that time range.

In [None]:
# Time 30-70 overlaps:
# - word: [0,40), [40,80) -> 2 words
# - phoneme: [20,40), [40,60), [60,80) -> 3 phonemes
ds.sel(time=slice(30, 70))
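
The overlap counts in the comments above can be verified with `pd.IntervalIndex.overlaps`. This sketch models `slice(30, 70)` as an interval closed on both ends, which is an assumption based on how label-based slices usually behave in xarray. It does not call `DimensionInterval`.

```python
import pandas as pd

word_intervals = pd.IntervalIndex.from_breaks([0.0, 40.0, 80.0, 120.0], closed="left")
phoneme_intervals = pd.IntervalIndex.from_breaks(
    [0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0], closed="left"
)

# Assumption: the time slice includes both endpoints, i.e. [30, 70].
query = pd.Interval(30, 70, closed="both")

# Boolean masks marking which intervals share at least one point with the query.
word_mask = word_intervals.overlaps(query)
phoneme_mask = phoneme_intervals.overlaps(query)

print(word_intervals[word_mask])        # [0.0, 40.0) and [40.0, 80.0)
print(phoneme_intervals[phoneme_mask])  # [20.0, 40.0), [40.0, 60.0), [60.0, 80.0)
```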

## Cross-Slicing: Selecting on Word Interval

Selecting a point within a word interval selects that word and constrains time and phonemes accordingly.

In [None]:
# Selecting at time=60 picks the word interval [40,80) which is "quickly"
# This constrains:
# - time to 40-80
# - phoneme to [40,60), [60,80) -> 2 phonemes
ds.sel(word_intervals=60)
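
The constraints listed in the comments can be reproduced step by step with pandas and NumPy. This is a sketch of the reasoning only. Treating the selected time range as half-open, matching `closed="left"`, is an assumption and not confirmed behavior of `DimensionInterval`.

```python
import numpy as np
import pandas as pd

times = np.linspace(0, 120, 1000)
word_intervals = pd.IntervalIndex.from_breaks([0.0, 40.0, 80.0, 120.0], closed="left")
phoneme_intervals = pd.IntervalIndex.from_breaks(
    [0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0], closed="left"
)
phoneme_labels = np.array(["ah", "ee", "oh", "oo", "eh", "ih"])

# get_loc returns the position of the single interval containing the point.
word_position = word_intervals.get_loc(60)
word = word_intervals[word_position]  # [40.0, 80.0)

# Time points inside the selected word (assumed half-open, like the interval).
time_mask = (times >= word.left) & (times < word.right)

# Phonemes that overlap the selected word.
phoneme_mask = phoneme_intervals.overlaps(word)

print(word_position)                          # 1
print(phoneme_labels[phoneme_mask].tolist())  # ['oh', 'oo']
```

The same steps apply to a point lookup on `phoneme_intervals`, with the roles of the two interval sets swapped.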

## Cross-Slicing: Selecting on Phoneme Interval

Similarly, selecting a phoneme constrains time and words.

In [None]:
# Selecting at time=70 picks phoneme interval [60,80)
# This constrains:
# - time to 60-80
# - word to [40,80) -> 1 word ("quickly")
ds.sel(phoneme_intervals=70)

## `isel`

You can use `isel` on any of the dimension coordinates, in this case `time`, `word`, and `phoneme`.

This makes positional queries straightforward, such as selecting the second word that was spoken:

In [None]:
ds.isel(word=1)

More complex selections can be chained, such as the second phoneme of the third word:

In [None]:
ds.isel(word=2).isel(phoneme=1)
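
The chained selection works because the first `isel` narrows the phonemes to those within the chosen word, so the second index counts within that word. A plain pandas sketch of the same lookup, independent of `DimensionInterval`, looks like this:

```python
import numpy as np
import pandas as pd

word_intervals = pd.IntervalIndex.from_breaks([0.0, 40.0, 80.0, 120.0], closed="left")
phoneme_intervals = pd.IntervalIndex.from_breaks(
    [0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0], closed="left"
)
phoneme_labels = np.array(["ah", "ee", "oh", "oo", "eh", "ih"])

# Third word by position: [80.0, 120.0), labelled "home" in this notebook.
third_word = word_intervals[2]

# Phonemes overlapping that word, in time order.
phonemes_in_word = phoneme_labels[phoneme_intervals.overlaps(third_word)]

# Second phoneme within the third word.
second_phoneme = str(phonemes_in_word[1])
print(second_phoneme)  # ih
```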

## Summary

`DimensionInterval` provides:

1. **Automatic cross-slicing** - Select on any dimension and all others are constrained appropriately
2. **Multiple interval types** - Support for multiple interval dimensions (words, phonemes, etc.) over a single continuous dimension
3. **Multiple labels per interval** - Each interval dimension can have multiple label coordinates (e.g., word text and part of speech)
4. **Flexible selection** - Use `sel()` with time values, interval coordinates, or label coordinates

This is particularly useful for:
- Speech/audio data with hierarchical annotations
- Time series with multiple granularities of events
- Any data where intervals at different scales need to stay synchronized