# Sparse Implementation of Exact OTC

This notebook introduces the module `src/pyotc/otc_backend/policy_iteration/sparse`, which significantly improves the scalability of the `exact_otc` function, particularly in terms of memory efficiency. 

## Problem:
The original `exact_otc` implementation in `src/pyotc/otc_backend/policy_iteration/dense` suffers from severe memory limitations. When comparing two graphs with $n$ nodes each, the transition coupling matrix has shape $n^2 \times n^2$, i.e. $n^4$ entries, making the function infeasible for graphs with more than ~200 nodes due to excessive memory usage.
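To make the $n^4$ growth concrete, the following back-of-the-envelope sketch estimates the size of a single dense coupling array. It assumes 8-byte `float64` entries; the helper `dense_coupling_bytes` is hypothetical and not part of `pyotc`, and the real implementation may hold several such arrays plus solver workspace at once.

```python
def dense_coupling_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed for one dense (n^2 x n^2) array (assumed float64 entries)."""
    return (n * n) ** 2 * bytes_per_entry


for n in (50, 100, 200, 400):
    gib = dense_coupling_bytes(n) / 2**30
    print(f"n = {n:>3}: {gib:,.2f} GiB for one dense coupling array")
```

Doubling $n$ multiplies the footprint by 16, which is why the dense version stalls at a few hundred nodes even on a large-memory machine.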

## Solution:
We address this bottleneck by leveraging the sparsity often present in transition matrices. Recall that a transition coupling $R$ between $P$ and $Q$ must satisfy: 

$$ \sum_{y \in Y} R(x_1, y \mid x_0, y_0) = P(x_1 \mid x_0) \quad \text{ and } \quad \sum_{x \in X} R(x, y_1 \mid x_0, y_0) = Q(y_1 \mid y_0), $$
for every $x_0, x_1 \in X$ and $y_0, y_1 \in Y$.
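As a quick sanity check of these marginal constraints, the sketch below builds the independent coupling $R = P \otimes Q$ for two small random transition matrices and verifies both sums numerically. The row-major flattening of pairs $(x, y)$ and the helper `random_transition` are assumptions made for illustration; they are not necessarily the layout used inside `pyotc`.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_transition(n):
    """Random row-stochastic matrix (illustrative only)."""
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)


nx, ny = 3, 4
P = random_transition(nx)
Q = random_transition(ny)

# Independent coupling: R[(x0, y0), (x1, y1)] = P[x0, x1] * Q[y0, y1].
# Assumption: a pair (x, y) is flattened row-major to the index x * ny + y.
R = np.kron(P, Q)
R4 = R.reshape(nx, ny, nx, ny)  # axes: x0, y0, x1, y1

# Summing out y1 should recover P(x1 | x0) for every y0.
x_marginal = R4.sum(axis=3)  # axes: x0, y0, x1
# Summing out x1 should recover Q(y1 | y0) for every x0.
y_marginal = R4.sum(axis=2)  # axes: x0, y0, y1

print(np.allclose(x_marginal, P[:, None, :]))
print(np.allclose(y_marginal, Q[None, :, :]))
```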

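If either $(x_0, x_1)$ or $(y_0, y_1)$ is not an edge of its graph, then $P(x_1 \mid x_0) = 0$ or $Q(y_1 \mid y_0) = 0$. Because the entries of $R$ are nonnegative and must sum to these marginals, it follows that $R(x_1, y_1 \mid x_0, y_0) = 0$. This leads to a large number of zero entries, introducing natural sparsity in $R$.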

To exploit this, we replace `numpy` arrays with `scipy.sparse` matrices in the implementation. This change reduces memory usage and enables efficient computation, allowing us to handle larger graphs.
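The effect can be illustrated with a small, self-contained sketch. It uses random walks on two ring graphs (a toy example, not the SBMs used in this notebook) and stores the independent coupling $P \otimes Q$ in CSR format. No transition coupling can have a larger support than this one, so its nonzero count is an upper bound for any $R$. The helper `ring_transition` is hypothetical and not part of `pyotc`.

```python
import numpy as np
import scipy.sparse as sp


def ring_transition(n):
    """Random walk on a cycle of n >= 3 nodes: step to either neighbour with prob. 1/2."""
    idx = np.arange(n)
    rows = np.concatenate([idx, idx])
    cols = np.concatenate([(idx + 1) % n, (idx - 1) % n])
    vals = np.full(2 * n, 0.5)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))


n = 30
P = ring_transition(n)
Q = ring_transition(n)

# Independent coupling stored sparsely; only nnz(P) * nnz(Q) entries are kept.
R = sp.kron(P, Q, format="csr")

dense_bytes = R.shape[0] * R.shape[1] * 8  # float64 dense array of the same shape
sparse_bytes = R.data.nbytes + R.indices.nbytes + R.indptr.nbytes

print("shape:", R.shape)
print("stored nonzeros:", R.nnz, "of", R.shape[0] * R.shape[1], "entries")
print("dense bytes:", dense_bytes, "| CSR bytes:", sparse_bytes)
```

The saving depends entirely on how sparse $P$ and $Q$ are: for dense transition matrices, a sparse format would offer no benefit and could even add overhead.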

## Experimental Results:

We measured the largest graph size that each stationary distribution computation method (the `stat_dist` argument of `exact_otc`) can handle before running out of memory. Experiments were conducted on a server with 256 GB of RAM, using two randomly generated Stochastic Block Models (SBMs), each with 4 blocks and $n$ nodes.

Dense version:
- `best`: $n=200$
- `iterative`, `eigen`: $n=200$

Sparse version:
- `best`: $n=280$
- `iterative`, `eigen`: $n=320$

We expect the performance improvement to be even more pronounced for graphs that are sparser than the 4-block SBMs used above. For example, with 10-block SBMs, the sparse version can handle graphs with up to 430 nodes. In contrast, the dense version still caps out at around 200 nodes, as its implementation does not take advantage of sparsity.
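A rough calculation suggests why more blocks help. The sketch below assumes equal-sized blocks and the same within-block and between-block edge probabilities (0.9 and 0.1) as the example later in this notebook; the source does not state the probabilities used for the 10-block experiment, so treat these numbers as illustrative only. The helper `expected_density` is hypothetical and ignores diagonal effects.

```python
def expected_density(k, p_in, p_out):
    """Approximate edge density of an SBM with k equal-sized blocks."""
    return p_in / k + p_out * (1 - 1 / k)


for k in (4, 10):
    d = expected_density(k, p_in=0.9, p_out=0.1)
    # The nonzero fraction of R is at most density(P) * density(Q).
    print(f"{k} blocks: edge density ~ {d:.3f}, coupling fill bound ~ {d * d:.4f}")
```

Under these assumptions, the fraction of potentially nonzero entries in $R$ drops by roughly a factor of three when going from 4 to 10 blocks, which is consistent in direction with the larger graphs the sparse version can handle.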

## Code Example:

### 1. Import Libraries

In [2]:
import numpy as np
import sys
import os

sys.path.append(os.path.abspath("../src"))

from pyotc.otc_backend.policy_iteration.sparse.exact import exact_otc
from pyotc.otc_backend.graph.utils import adj_to_trans, get_degree_cost
from pyotc.examples.stochastic_block_model import stochastic_block_model

### 2. Load Graphs

In [3]:
# Seed number
np.random.seed(1)

# Generate two stochastic block model graphs, each with 4 blocks of 10 nodes (40 nodes per graph)
m = 10
A1 = stochastic_block_model(
    (m, m, m, m),
    np.array(
        [
            [0.9, 0.1, 0.1, 0.1],
            [0.1, 0.9, 0.1, 0.1],
            [0.1, 0.1, 0.9, 0.1],
            [0.1, 0.1, 0.1, 0.9],
        ]
    ),
)

A2 = stochastic_block_model(
    (m, m, m, m),
    np.array(
        [
            [0.9, 0.1, 0.1, 0.1],
            [0.1, 0.9, 0.1, 0.1],
            [0.1, 0.1, 0.9, 0.1],
            [0.1, 0.1, 0.1, 0.9],
        ]
    ),
)

### 3. Run Sparse `exact_otc`

In [4]:
# Convert adjacency matrices to transition matrices
P1 = adj_to_trans(A1)
P2 = adj_to_trans(A2)

# Define cost function
c = get_degree_cost(A1, A2)

# Run Exact OTC using the sparse implementation
exp_cost, R, stat_dist = exact_otc(P1, P2, c, stat_dist="best")
print("\nExact OTC cost between SBM1 and SBM2:", exp_cost)

Starting exact_otc_sparse...
Iteration: 0
Computing exact TCE...
Computing exact TCI...
Iteration: 1
Computing exact TCE...
Computing exact TCI...
Iteration: 2
Computing exact TCE...
Computing exact TCI...
Iteration: 3
Computing exact TCE...
Computing exact TCI...
Iteration: 4
Computing exact TCE...
Computing exact TCI...
Convergence reached in 5 iterations. Computing stationary distribution...
[exact_otc] Finished. Total time elapsed: 10.326 seconds.

Exact OTC cost between SBM1 and SBM2: 0.9336440288840459
