
Example script: barcode distances.

This file demonstrates the use of

- :func:`homolipop.distances.bottleneck_distance`
- :func:`homolipop.distances.wasserstein_distance`

on two small barcodes with intervals in dimensions 0 and 1.

# Mathematical conventions

## Intervals and barcodes

A persistence interval is a pair $(b,d)$ with

\begin{align}0 \le b \le d < \infty,\end{align}

representing a feature born at time $b$ and dying at time $d$.
In this file all intervals are finite.

A barcode is a family of multisets of such intervals indexed by dimension.
Formally, for each $n \ge 0$ we have a finite multiset

\begin{align}\mathcal B_n = \{(b_i,d_i)\}_{i=1}^{m_n}.\end{align}

The class :class:`homolipop.barcodes.Barcode` is assumed to store these as a dictionary
``intervals_by_dim`` mapping the integer dimension $n$ to a list of pairs ``(b, d)``.
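A minimal sketch of that storage layout follows. It assumes only that ``intervals_by_dim`` is a plain ``dict`` from dimension to a list of ``(b, d)`` pairs. ``check_finite_barcode`` is a hypothetical helper, not part of :mod:`homolipop`, that enforces the convention $0 \le b \le d < \infty$ used in this file. A list, unlike a set, keeps repeated intervals, which is what the multiset $\mathcal B_n$ requires.

```python
import math

# Hypothetical stand-in for Barcode.intervals_by_dim:
# integer dimension n -> list of (birth, death) pairs (a list keeps duplicates).
IntervalsByDim = dict[int, list[tuple[float, float]]]


def check_finite_barcode(intervals_by_dim: IntervalsByDim) -> None:
    """Raise ValueError unless every interval satisfies 0 <= b <= d < inf."""
    for n, intervals in intervals_by_dim.items():
        if n < 0:
            raise ValueError(f"dimension must be >= 0, got {n}")
        for b, d in intervals:
            # The chained comparison also rejects NaN, since NaN compares False.
            if not (0.0 <= b <= d < math.inf):
                raise ValueError(f"invalid interval {(b, d)} in dimension {n}")


# Dimension 0 repeats (0.0, 1.0) on purpose: multiplicities matter.
example: IntervalsByDim = {0: [(0.0, 1.0), (0.0, 1.0)], 1: [(0.5, 2.5)]}
check_finite_barcode(example)  # passes silently
```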

## Diagonal and matching

Let $\Delta = \{(t,t) \mid t \in \mathbb R\}$ be the diagonal.
Distances between barcodes are defined via partial matchings in which an interval may also be
matched to the diagonal; such a match corresponds to deleting that feature.

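For a finite interval $I=(b,d)$, viewed as the point $(b,d)\in\mathbb R^2$, a nearest point of
$\Delta$ in the $L_\infty$ metric is $\bigl(\tfrac{b+d}{2},\tfrac{b+d}{2}\bigr)$, so its distance to the diagonal is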

\begin{align}\mathrm{dist}_\infty(I,\Delta) = \frac{d-b}{2}.\end{align}

For two intervals $I=(b,d)$ and $J=(b',d')$, define

\begin{align}\mathrm{dist}_\infty(I,J) = \max\{|b-b'|,\ |d-d'|\}.\end{align}
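The two formulas translate directly into code. Below is a small sketch; the names ``dist_to_diagonal`` and ``dist_inf`` are illustrative and not taken from :mod:`homolipop`. It includes a grid check that $(d-b)/2$ is indeed the smallest $L_\infty$ distance from $(b,d)$ to a diagonal point.

```python
def dist_to_diagonal(b: float, d: float) -> float:
    """L_inf distance from the point (b, d), with b <= d, to the diagonal."""
    # The nearest diagonal point is (m, m) with m = (b + d) / 2, where
    # |b - m| = |d - m| = (d - b) / 2.
    return (d - b) / 2.0


def dist_inf(i: tuple[float, float], j: tuple[float, float]) -> float:
    """L_inf distance between two intervals viewed as points of the plane."""
    (b1, d1), (b2, d2) = i, j
    return max(abs(b1 - b2), abs(d1 - d2))


# Numerical check of the closed form: minimise dist_inf over diagonal
# points (t, t) on the grid t = 0.00, 0.01, ..., 3.00.
b, d = 0.5, 2.5
grid_min = min(dist_inf((b, d), (k / 100, k / 100)) for k in range(301))
print(dist_to_diagonal(b, d), grid_min)  # 1.0 1.0
```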

## Bottleneck distance

Given two barcodes $\mathcal B$ and $\mathcal B'$ in a fixed dimension,
their bottleneck distance is

\begin{align}d_B(\mathcal B,\mathcal B')
   =
   \inf_{\Gamma}\ \sup_{(I,J)\in\Gamma}\ \mathrm{dist}_\infty(I,J),\end{align}

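where $\Gamma$ ranges over all partial matchings between the intervals of $\mathcal B$
and $\mathcal B'$, augmented by matches to the diagonal. Every interval left unmatched is
paired with $\Delta$, and such a pair $(I,\Delta)$ contributes $\mathrm{dist}_\infty(I,\Delta)$.
Equivalently, one may add enough copies of the diagonal to each side that both sides have
the same cardinality, and let $\Gamma$ range over bijections. A diagonal copy matched to
another diagonal copy contributes $0$.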
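The bijection formulation turns easily into an exhaustive reference computation for very small barcodes. The sketch below is not a description of how :func:`homolipop.distances.bottleneck_distance` works, which this file does not specify. It pads both sides with diagonal copies, represented here by ``None``, and takes the minimum over all bijections of the largest pair cost. Practical implementations typically combine a search over candidate distance values with bipartite matching instead, since enumeration grows factorially.

```python
from itertools import permutations
from typing import Optional

Interval = tuple[float, float]
Slot = Optional[Interval]  # None stands for one copy of the diagonal


def _cost(i: Slot, j: Slot) -> float:
    """L_inf cost of pairing two slots, either of which may be the diagonal."""
    if i is None and j is None:
        return 0.0  # diagonal paired with diagonal is free
    if i is None:
        i, j = j, i
    if j is None:
        b, d = i
        return (d - b) / 2.0  # dist_inf(I, Delta)
    return max(abs(i[0] - j[0]), abs(i[1] - j[1]))


def bottleneck_bruteforce(xs: list[Interval], ys: list[Interval]) -> float:
    """Exact bottleneck distance in one dimension, by exhaustive search.

    Both sides are padded with diagonal copies to len(xs) + len(ys) slots,
    and every bijection between the padded lists is tried.
    """
    left: list[Slot] = list(xs) + [None] * len(ys)
    right: list[Slot] = list(ys) + [None] * len(xs)
    best = float("inf")
    for perm in permutations(range(len(right))):
        # Largest pair cost of this bijection (0.0 if both sides are empty).
        worst = max(
            (_cost(x, right[k]) for x, k in zip(left, perm)),
            default=0.0,
        )
        best = min(best, worst)
    return best


a0 = [(0.0, 1.0), (2.0, 3.0)]
b0 = [(0.1, 1.1), (2.0, 2.9)]
print(bottleneck_bruteforce(a0, b0))  # approximately 0.1
```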

## Wasserstein distance

Fix $p \in [1,\infty)$.
The $p$-Wasserstein distance between two barcodes in a fixed dimension is

\begin{align}d_{W,p}(\mathcal B,\mathcal B')
   =
   \left(
     \inf_{\Gamma}
     \sum_{(I,J)\in\Gamma} \mathrm{dist}_\infty(I,J)^p
   \right)^{1/p},\end{align}

with the same matching convention as for the bottleneck distance.
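The same padded-bijection idea gives a brute-force reference for $d_{W,p}$: sum the $p$-th powers of the pair costs instead of taking their maximum. As before, this is an illustrative sketch with hypothetical names, not the library's implementation.

```python
from itertools import permutations
from typing import Optional

Interval = tuple[float, float]
Slot = Optional[Interval]  # None stands for one copy of the diagonal


def _cost(i: Slot, j: Slot) -> float:
    """L_inf cost of pairing two slots, either of which may be the diagonal."""
    if i is None and j is None:
        return 0.0  # diagonal paired with diagonal is free
    if i is None:
        i, j = j, i
    if j is None:
        return (i[1] - i[0]) / 2.0  # dist_inf(I, Delta)
    return max(abs(i[0] - j[0]), abs(i[1] - j[1]))


def wasserstein_bruteforce(
    xs: list[Interval], ys: list[Interval], p: float = 2.0
) -> float:
    """Exact p-Wasserstein distance in one dimension, by exhaustive search."""
    if p < 1:
        raise ValueError("p must lie in [1, inf)")
    left: list[Slot] = list(xs) + [None] * len(ys)
    right: list[Slot] = list(ys) + [None] * len(xs)
    best = float("inf")
    for perm in permutations(range(len(right))):
        # Sum of p-th powers of the pair costs for this bijection.
        total = sum(_cost(x, right[k]) ** p for x, k in zip(left, perm))
        best = min(best, total)
    return best ** (1.0 / p)


a0 = [(0.0, 1.0), (2.0, 3.0)]
b0 = [(0.1, 1.1), (2.0, 2.9)]
print(wasserstein_bruteforce(a0, b0, p=2))  # approximately 0.1414, i.e. sqrt(0.02)
```

Beyond a handful of intervals, the same padded matrix of costs raised to the power $p$ can be passed to a linear assignment solver such as ``scipy.optimize.linear_sum_assignment`` instead of enumerating permutations.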

## Across dimensions

This script calls the distance routines with two conventions for aggregating the
per-dimension distances $d$, where $d$ is either $d_B$ or $d_{W,p}$:

- ``aggregate="max"`` returns $\max_n d(\mathcal B_n,\mathcal B'_n)$.
- ``aggregate="sum"`` returns $\sum_n d(\mathcal B_n,\mathcal B'_n)$.

If ``dim`` is specified, each function computes the distance in that single dimension only.
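The two aggregation rules and the ``dim`` shortcut can be sketched as follows. ``aggregate_over_dims`` is a hypothetical helper that works on already computed per-dimension distances. This file does not say how :mod:`homolipop` treats a dimension present in only one barcode. The sketch therefore assumes the caller has an entry for every such dimension, computed against the empty multiset, so a missing key means both barcodes are empty there and contributes $0$.

```python
from typing import Optional


def aggregate_over_dims(
    per_dim: dict[int, float],
    aggregate: str = "max",
    dim: Optional[int] = None,
) -> float:
    """Combine per-dimension distances d(B_n, B'_n) into one number.

    per_dim maps n to the already computed distance in dimension n.
    Assumption: a dimension absent from per_dim has distance 0.0.
    """
    if dim is not None:
        # Single dimension: in this sketch, aggregate is ignored.
        return per_dim.get(dim, 0.0)
    if aggregate == "max":
        return max(per_dim.values(), default=0.0)
    if aggregate == "sum":
        return sum(per_dim.values(), 0.0)
    raise ValueError(f"unknown aggregate {aggregate!r}")


# Hypothetical per-dimension distances, chosen only to show the arithmetic.
per_dim = {0: 0.25, 1: 0.5}
print(aggregate_over_dims(per_dim, aggregate="max"))  # 0.5
print(aggregate_over_dims(per_dim, aggregate="sum"))  # 0.75
print(aggregate_over_dims(per_dim, dim=0))            # 0.25
```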

## Implementation contract

The functions in :mod:`homolipop.distances` are assumed to implement the above definitions
using the $L_\infty$ interval metric and allowing matches to the diagonal.
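As a hand check of these definitions, consider the barcodes ``barcode_a`` and ``barcode_b`` constructed in the script. This is arithmetic, not output from the library. In each dimension the cheapest matching pairs the intervals in the order listed. Each in-order pair costs $0.1$. Every match to the diagonal costs at least $0.45$, the smallest value of $(d-b)/2$ among the intervals involved. The crossed pairing in dimension 0 costs at least $1.9$. Hence

\begin{align}
d_B(\mathcal B_0,\mathcal B'_0) &= \max\{0.1,\ 0.1\} = 0.1,
& d_{W,2}(\mathcal B_0,\mathcal B'_0) &= \left(0.1^2 + 0.1^2\right)^{1/2} = \sqrt{0.02} \approx 0.1414,\\
d_B(\mathcal B_1,\mathcal B'_1) &= 0.1,
& d_{W,2}(\mathcal B_1,\mathcal B'_1) &= \left(0.1^2\right)^{1/2} = 0.1.
\end{align}

If the routines follow the contract above, the four printed values should therefore be close to $0.1$, $\sqrt{0.02} + 0.1 \approx 0.2414$, $0.1$ and $0.1$, up to floating-point rounding.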


In [None]:
from __future__ import annotations

from homolipop.barcodes import Barcode
from homolipop.distances import bottleneck_distance, wasserstein_distance


def main() -> None:
    """
    Construct two barcodes and compute bottleneck and Wasserstein distances.

    Each barcode has intervals in dimensions 0 and 1 only:

    - two finite intervals in dimension 0,
    - one finite interval in dimension 1.

    The script prints

    - bottleneck distance aggregated by max over dimensions
    - Wasserstein distance with ``p=2`` aggregated by sum over dimensions
    - bottleneck distance restricted to dimension 0
    - Wasserstein distance with ``p=2`` restricted to dimension 1

    Notes
    -----
    All intervals in this example are finite. If your implementation supports infinite
    deaths, the distance definitions require an extension of the interval metric and
    matching conventions. This file intentionally avoids that case.
    """
    barcode_a = Barcode(intervals_by_dim={0: [(0.0, 1.0), (2.0, 3.0)], 1: [(0.5, 2.5)]})
    barcode_b = Barcode(intervals_by_dim={0: [(0.1, 1.1), (2.0, 2.9)], 1: [(0.6, 2.6)]})

    print("bottleneck max over dims:", bottleneck_distance(barcode_a, barcode_b, aggregate="max"))
    print(
        "wasserstein p=2 sum over dims:",
        wasserstein_distance(barcode_a, barcode_b, p=2, aggregate="sum"),
    )
    print("bottleneck dim=0:", bottleneck_distance(barcode_a, barcode_b, dim=0))
    print("wasserstein dim=1:", wasserstein_distance(barcode_a, barcode_b, p=2, dim=1))


if __name__ == "__main__":
    main()