# Base Matrix Functionality - Functional Tests

This notebook demonstrates and tests the core matrix functionality:
- **MatrixInstance**: Generic matrix wrapper with flexible metadata
- **MatrixSeries**: Collection of MatrixInstance objects with the same label
- **MatrixFrame**: DataFrame-like structure containing multiple series

---

In [1]:
# Imports
import numpy as np
import scipy.sparse as sp
from canonical_toolkit.base.matrix import MatrixInstance, MatrixSeries, MatrixFrame

## 1. MatrixInstance - Basic Matrix Wrapper

The `MatrixInstance` class wraps a sparse or dense matrix with flexible metadata:
- `matrix`: The actual data (sparse or dense)
- `label`: A label for this matrix (e.g., experiment name, space name)
- `index`: A sortable, hashable index (e.g., timepoint, radius, generation)
- `tags`: Additional metadata as a dictionary
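
The four fields above can be pictured as a small frozen record. The following is only a sketch under assumptions: `SketchInstance` is a hypothetical stand-in, not the `canonical_toolkit` implementation, and the two description formats are inferred from the outputs printed in this notebook.

```python
from dataclasses import dataclass, field
from typing import Any, Hashable


@dataclass(frozen=True)
class SketchInstance:
    """Hypothetical stand-in for MatrixInstance; not the library's code."""
    matrix: Any                                # sparse or dense payload
    label: str                                 # e.g. experiment or space name
    index: Hashable                            # sortable, hashable position
    tags: dict = field(default_factory=dict)   # free-form metadata

    @property
    def short_description(self) -> str:
        return f"{self.label} [{self.index}]"

    @property
    def long_description(self) -> str:
        if not self.tags:
            return self.short_description
        tag_text = ", ".join(f"{k}: {v}" for k, v in self.tags.items())
        return f"{self.short_description} ({tag_text})"


inst = SketchInstance(
    matrix=[[1.0, 0.5], [0.5, 1.0]],
    label="similarity_matrix",
    index="gen_100",
    tags={"metric": "cosine", "normalized": True},
)
print(inst.short_description)
print(inst.long_description)
```

Keeping the metadata next to the matrix in one object is what lets a series or frame sort and select instances without inspecting the matrix itself.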

### 1.1 Creating Sparse Matrix Instances

In [2]:
# Create a sparse matrix instance
sparse_mat = sp.random(10, 20, density=0.3, format="csr", random_state=42)

sparse_instance = MatrixInstance(
    matrix=sparse_mat,
    label="experiment_A",
    index=0,
    tags={"condition": "control", "temperature": 37}
)

print(f"Created sparse instance: {sparse_instance.short_description}")
print(f"Shape: {sparse_instance.shape}")
print(f"Tags: {sparse_instance.tags}")
print()
sparse_instance

Created sparse instance: experiment_A [0]
Shape: (10, 20)
Tags: {'condition': 'control', 'temperature': 37}





### 1.2 Creating Dense Matrix Instances

In [3]:
# Create a small dense matrix (similarity-like)
dense_mat = np.array([
    [1.00, 0.85, 0.62, 0.45],
    [0.85, 1.00, 0.73, 0.58],
    [0.62, 0.73, 1.00, 0.81],
    [0.45, 0.58, 0.81, 1.00],
])

dense_instance = MatrixInstance(
    matrix=dense_mat,
    label="similarity_matrix",
    index="gen_100",
    tags={"metric": "cosine", "normalized": True}
)

print(f"Created dense instance: {dense_instance.long_description}")
print()
dense_instance

Created dense instance: similarity_matrix [gen_100] (metric: cosine, normalized: True)





### 1.3 Matrix Indexing Operations

In [4]:
# MatrixInstance supports direct indexing (delegates to underlying matrix)

# Element, row, and column access
print("Dense matrix indexing:")
print(f"  Element [0, 2]: {dense_instance[0, 2]}")
print(f"  Row 1: {dense_instance[1]}")
print(f"  Column 2: {dense_instance[:, 2]}")

print("\nSparse matrix indexing:")
print(f"  Element [0, 5]: {sparse_instance[0, 5]}")
print(f"  Row slice [:3]: shape = {sparse_instance[:3].shape}")

Dense matrix indexing:
  Element [0, 2]: 0.62
  Row 1: [0.850 1.000 0.730 0.580]
  Column 2: [0.620 0.730 1.000 0.810]

Sparse matrix indexing:
  Element [0, 5]: 0.0
  Row slice [:3]: shape = (3, 20)
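
Delegated indexing of this kind is usually a one-line `__getitem__`. The sketch below assumes that is roughly what happens; `IndexableSketch` is a hypothetical class, and the real wrapper may add validation or re-wrap the result.

```python
import numpy as np


class IndexableSketch:
    """Hypothetical wrapper that forwards indexing to its matrix."""

    def __init__(self, matrix):
        self.matrix = matrix

    def __getitem__(self, key):
        # Forward any key (int, slice, tuple) to the wrapped matrix unchanged.
        return self.matrix[key]


dense = np.array([
    [1.00, 0.85, 0.62],
    [0.85, 1.00, 0.73],
    [0.62, 0.73, 1.00],
])
wrapped = IndexableSketch(dense)

element = wrapped[0, 2]   # single element
row = wrapped[1]          # whole row
column = wrapped[:, 2]    # whole column
print(element, row.shape, column.shape)
```

Because the key is passed through untouched, the result type depends on the underlying matrix: a dense array returns scalars and arrays, while a sparse matrix returns sparse slices.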


### 1.4 Matrix Addition

In [5]:
# Create two matrices with same index for addition
mat1 = MatrixInstance(
    matrix=sp.random(5, 10, density=0.3, format="csr", random_state=1),
    label="features",
    index=0,
    tags={"type": "A"}
)

mat2 = MatrixInstance(
    matrix=sp.random(5, 10, density=0.3, format="csr", random_state=2),
    label="features",
    index=0,
    tags={"type": "B"}
)

# Add matrices (as the output shows, the result keeps the left operand's index and tags)
mat_sum = mat1 + mat2
print(f"Matrix addition successful: {mat_sum.shape}")
print(f"Result index: {mat_sum.index}")
print(f"Result tags: {mat_sum.tags}")

Matrix addition successful: (5, 10)
Result index: 0
Result tags: {'type': 'A'}
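
The output shows that the sum carries the left operand's tags. A minimal sketch of an `__add__` with that behavior follows. `AddableSketch` is hypothetical, and the index and shape checks are assumptions rather than confirmed library behavior.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass(frozen=True)
class AddableSketch:
    """Hypothetical instance type; the real __add__ may differ."""
    matrix: np.ndarray
    label: str
    index: int
    tags: dict = field(default_factory=dict)

    def __add__(self, other: "AddableSketch") -> "AddableSketch":
        # Assumption: operands must agree on index and shape.
        if self.index != other.index:
            raise ValueError(f"index mismatch: {self.index} != {other.index}")
        if self.matrix.shape != other.matrix.shape:
            raise ValueError("shape mismatch")
        # Keep the left operand's label and tags, as the output above suggests.
        return AddableSketch(
            matrix=self.matrix + other.matrix,
            label=self.label,
            index=self.index,
            tags=dict(self.tags),
        )


left = AddableSketch(np.ones((2, 3)), "features", 0, {"type": "A"})
right = AddableSketch(np.full((2, 3), 2.0), "features", 0, {"type": "B"})
total = left + right
print(total.matrix, total.tags)
```

Since the right operand's tags are dropped, any metadata that must survive the sum has to be merged explicitly before or after the addition.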


### 1.5 Replace Method (Immutable Updates)

In [6]:
# Use replace() to create modified copies
modified = sparse_instance.replace(
    index=1,
    tags={"condition": "experimental", "temperature": 42}
)

print(f"Original: {sparse_instance.short_description}, tags={sparse_instance.tags}")
print(f"Modified: {modified.short_description}, tags={modified.tags}")
print("\nOriginal unchanged (immutable pattern):")
print(f"  Original index: {sparse_instance.index}")

Original: experiment_A [0], tags={'condition': 'control', 'temperature': 37}
Modified: experiment_A [1], tags={'condition': 'experimental', 'temperature': 42}

Original unchanged (immutable pattern):
  Original index: 0
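
The same copy-on-update pattern exists in the standard library as `dataclasses.replace`. The sketch below uses it on a hypothetical `FrozenSketch` record to show the idea; it is not a claim about how `MatrixInstance.replace()` is implemented.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class FrozenSketch:
    """Hypothetical frozen record used only for illustration."""
    label: str
    index: int
    tags: dict = field(default_factory=dict)


original = FrozenSketch("experiment_A", 0, {"condition": "control"})
modified = replace(original, index=1, tags={"condition": "experimental"})

print(original)
print(modified)

# Direct mutation is rejected on a frozen dataclass.
try:
    original.index = 5
except AttributeError as exc:   # FrozenInstanceError subclasses AttributeError
    print(f"rejected: {type(exc).__name__}")
```

Freezing the record does not freeze the `tags` dictionary inside it, so updates should build a new dictionary rather than edit the existing one in place.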


### 1.6 Save and Load

In [7]:
from pathlib import Path
import tempfile

# Save and load a matrix instance
with tempfile.TemporaryDirectory() as tmpdir:
    save_path = Path(tmpdir)
    
    # Save
    sparse_instance.save(save_path, "test_matrix")
    print(f"Saved to: {save_path}")
    print(f"Files created: {list(save_path.glob('test_matrix*'))}")
    
    # Load
    loaded = MatrixInstance.load(save_path, "test_matrix")
    print(f"\nLoaded: {loaded.short_description}")
    print(f"Shape matches: {loaded.shape == sparse_instance.shape}")
    print(f"Tags match: {loaded.tags == sparse_instance.tags}")

Saved to: /tmp/tmpzuhie94u
Files created: [PosixPath('/tmp/tmpzuhie94u/test_matrix.json'), PosixPath('/tmp/tmpzuhie94u/test_matrix.sparse.npz')]

Loaded: experiment_A [0]
Shape matches: True
Tags match: True
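
The two files listed above suggest a layout with the matrix in an `.npz` file and the metadata in a JSON sidecar. The sketch below reproduces that layout with `scipy.sparse.save_npz` and `load_npz`. The helpers `save_sketch` and `load_sketch` and the JSON schema are assumptions, not the library's actual format.

```python
import json
import tempfile
from pathlib import Path

import scipy.sparse as sp


def save_sketch(matrix, meta: dict, folder: Path, name: str) -> None:
    """Hypothetical helper: write <name>.sparse.npz plus <name>.json."""
    sp.save_npz(str(folder / f"{name}.sparse.npz"), matrix)
    (folder / f"{name}.json").write_text(json.dumps(meta))


def load_sketch(folder: Path, name: str):
    """Hypothetical helper: read the pair written by save_sketch."""
    matrix = sp.load_npz(str(folder / f"{name}.sparse.npz"))
    meta = json.loads((folder / f"{name}.json").read_text())
    return matrix, meta


original = sp.random(10, 20, density=0.3, format="csr", random_state=42)
meta = {"label": "experiment_A", "index": 0, "tags": {"condition": "control"}}

with tempfile.TemporaryDirectory() as tmpdir:
    folder = Path(tmpdir)
    save_sketch(original, meta, folder, "test_matrix")
    written = sorted(p.name for p in folder.iterdir())
    loaded, loaded_meta = load_sketch(folder, "test_matrix")

print(written)
print((loaded != original).nnz)   # number of differing entries
```

Comparing shapes and tags, as the cell above does, confirms the metadata round trip; counting differing entries also confirms the matrix values.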


---
## 2. MatrixSeries - Collection of Instances

The `MatrixSeries` class manages a collection of `MatrixInstance` objects that share the same label.
Think of it like a time series of matrices, or a sequence indexed by some parameter.
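
One way to picture such a collection is a mapping from index to instance, kept in sorted index order. `SeriesSketch` below is a hypothetical model. It assumes scalar keys look up by index value and slices act positionally over the sorted indices, which matches the outputs in this section but is not confirmed beyond them.

```python
class SeriesSketch:
    """Hypothetical stand-in for MatrixSeries, keyed by instance index."""

    def __init__(self, label: str, items: dict):
        self.label = label
        # Store entries in sorted index order so slices are predictable.
        self._items = {idx: items[idx] for idx in sorted(items)}

    @property
    def indices(self) -> list:
        return list(self._items)

    def __iter__(self):
        # Iterating yields indices, not instances.
        return iter(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Assumption: slices are positional over the sorted indices.
            kept = self.indices[key]
            return SeriesSketch(self.label, {i: self._items[i] for i in kept})
        return self._items[key]   # scalar key: look up by index value


series = SeriesSketch("experiment_A", {i: f"matrix_{i}" for i in [3, 0, 4, 1, 2]})
print(series.indices)
print(series[:3].indices)
print(series[2])
```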

### 2.1 Creating a Series from Instance List

In [8]:
# Create a series of matrices with same label but different indices
instances_list = []
for idx in range(5):
    mat = sp.random(10, 50, density=0.2, format="csr", random_state=idx)
    inst = MatrixInstance(
        matrix=mat,
        label="experiment_A",
        index=idx,
        tags={"generation": idx * 10}
    )
    instances_list.append(inst)

series = MatrixSeries(instances_list=instances_list)

print(f"Created series with {len(series.indices)} instances")
print(f"Label: {series.label}")
print(f"Indices: {series.indices}")
print()
series

Created series with 5 instances
Label: experiment_A
Indices: [0, 1, 2, 3, 4]





### 2.2 Accessing Series Elements

In [9]:
# Access by index
instance_2 = series[2]
print(f"Instance at index 2: {instance_2.short_description}")
print(f"Shape: {instance_2.shape}")
print()
instance_2

Instance at index 2: experiment_A [2]
Shape: (10, 50)





### 2.3 Series Slicing

In [10]:
# Slice series to get subset
sliced_series = series[:3]
print(f"Original series indices: {series.indices}")
print(f"Sliced series[:3] indices: {sliced_series.indices}")
print(f"Sliced series label: {sliced_series.label}")
print()
sliced_series

Original series indices: [0, 1, 2, 3, 4]
Sliced series[:3] indices: [0, 1, 2]
Sliced series label: experiment_A





### 2.4 Series Properties

In [11]:
# Access various properties
print(f"Label: {series.label}")
print(f"Indices: {series.indices}")
print(f"Number of matrices: {len(series.matrices)}")
print(f"Labels list (one per instance): {series.labels}")
print(f"First matrix shape: {series.matrices[0].shape}")

Label: experiment_A
Indices: [0, 1, 2, 3, 4]
Number of matrices: 5
Labels list (one per instance): ['experiment_A', 'experiment_A', 'experiment_A', 'experiment_A', 'experiment_A']
First matrix shape: (10, 50)


### 2.5 Series Iteration

In [12]:
# Iterate over indices
print("Iterating over series:")
for idx in series:
    inst = series[idx]
    print(f"  Index {idx}: {inst.shape}, tags={inst.tags}")

Iterating over series:
  Index 0: (10, 50), tags={'generation': 0}
  Index 1: (10, 50), tags={'generation': 10}
  Index 2: (10, 50), tags={'generation': 20}
  Index 3: (10, 50), tags={'generation': 30}
  Index 4: (10, 50), tags={'generation': 40}


### 2.6 Map Operations on Series

In [13]:
# Apply a function to all instances in the series
def add_processing_tag(inst: MatrixInstance) -> MatrixInstance:
    """Add a 'processed' tag to each instance."""
    new_tags = inst.tags.copy()
    new_tags["processed"] = True
    return inst.replace(tags=new_tags)

# Map with inplace=False (creates new series)
processed_series = series.map(add_processing_tag, inplace=False)

print("Original series instance 0 tags:", series[0].tags)
print("Processed series instance 0 tags:", processed_series[0].tags)

Original series instance 0 tags: {'generation': 0}
Processed series instance 0 tags: {'generation': 0, 'processed': True}
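
The `inplace` flag can be understood as a choice between returning a new container and overwriting the current one. The sketch below illustrates that distinction with a hypothetical `MappableSketch`. Returning `None` in the in-place case is an assumption borrowed from common Python practice, not confirmed behavior of `MatrixSeries.map`.

```python
from typing import Callable, Optional


class MappableSketch:
    """Hypothetical container with a map(func, inplace=...) method."""

    def __init__(self, items: dict):
        self.items = dict(items)

    def map(self, func: Callable, inplace: bool = False) -> Optional["MappableSketch"]:
        mapped = {idx: func(value) for idx, value in self.items.items()}
        if inplace:
            # Assumption: in-place mode overwrites contents and returns None.
            self.items = mapped
            return None
        return MappableSketch(mapped)


def add_processed(tags: dict) -> dict:
    # Build a new dict instead of mutating the input.
    return {**tags, "processed": True}


series = MappableSketch({0: {"generation": 0}, 1: {"generation": 10}})

processed = series.map(add_processed, inplace=False)
print(series.items[0])      # source container is untouched
print(processed.items[0])   # new container carries the extra tag

result = series.map(add_processed, inplace=True)
print(result, series.items[0])
```

The mapped function must not mutate its input, which is why `add_processing_tag` above copies the tags before adding a key.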


### 2.7 Series Addition

In [14]:
# Create two series that share the same indices (0, 1, 2)
series_a_list = []
series_b_list = []

for idx in range(3):
    mat_a = MatrixInstance(
        matrix=sp.random(5, 10, density=0.3, format="csr", random_state=idx),
        label="data",
        index=idx,
        tags={"type": "A"}
    )
    mat_b = MatrixInstance(
        matrix=sp.random(5, 10, density=0.3, format="csr", random_state=idx+10),
        label="data",
        index=idx,
        tags={"type": "B"}
    )
    series_a_list.append(mat_a)
    series_b_list.append(mat_b)

series_a = MatrixSeries(instances_list=series_a_list)
series_b = MatrixSeries(instances_list=series_b_list)

# Add series element-wise
series_sum = series_a + series_b
print(f"Series A indices: {series_a.indices}")
print(f"Series B indices: {series_b.indices}")
print(f"Sum series indices: {series_sum.indices}")
print(f"Sum preserves matching indices only")

Series A indices: [0, 1, 2]
Series B indices: [0, 1, 2]
Sum series indices: [0, 1, 2]
Sum preserves matching indices only
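
Both series above have identical indices, so the cell cannot show what happens to unmatched ones. The sketch below illustrates the stated rule with plain dictionaries and a hypothetical `add_matching` helper, assuming the sum is taken over the intersection of the two index sets.

```python
def add_matching(a: dict, b: dict) -> dict:
    """Add values only at indices present in both mappings (hypothetical helper)."""
    shared = sorted(a.keys() & b.keys())
    return {idx: a[idx] + b[idx] for idx in shared}


series_a = {0: 1.0, 1: 2.0, 2: 3.0}
series_b = {1: 10.0, 2: 20.0, 3: 30.0}

series_sum = add_matching(series_a, series_b)
print(series_sum)   # indices 0 and 3 have no partner and are dropped
```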


---
## 3. MatrixFrame - DataFrame-like Container

The `MatrixFrame` class provides a high-level interface for managing multiple `MatrixSeries` objects.
Think of it like a DataFrame where columns are series and rows are indexed by the instance indices.

### 3.1 Creating a Frame from Multiple Series

In [15]:
# Create multiple series with different labels
def create_series(label: str, n_instances: int, seed_offset: int) -> MatrixSeries:
    """Helper to create a series."""
    instances = []
    for idx in range(n_instances):
        mat = sp.random(8, 20, density=0.25, format="csr", random_state=seed_offset+idx)
        inst = MatrixInstance(
            matrix=mat,
            label=label,
            index=idx,
            tags={"experiment": label, "timestep": idx}
        )
        instances.append(inst)
    return MatrixSeries(instances_list=instances)

# Create three different series
series_x = create_series("experiment_X", n_instances=5, seed_offset=0)
series_y = create_series("experiment_Y", n_instances=5, seed_offset=100)
series_z = create_series("experiment_Z", n_instances=5, seed_offset=200)

# Create frame
frame = MatrixFrame(series=[series_x, series_y, series_z])

print(f"Created frame with {len(list(frame.keys()))} series")
print(f"Series labels: {list(frame.keys())}")
print()
frame

Created frame with 3 series
Series labels: ['experiment_X', 'experiment_Y', 'experiment_Z']





### 3.2 Accessing Series by Label

In [16]:
# Access a single series by label
series_x_retrieved = frame["experiment_X"]
print(f"Retrieved series: {series_x_retrieved.label}")
print(f"Number of instances: {len(series_x_retrieved.indices)}")
print()
series_x_retrieved

Retrieved series: experiment_X
Number of instances: 5





### 3.3 Frame Slicing by Index

In [17]:
# Slice frame by index (applies to all series)
sliced_frame = frame[:3]
print(f"Original frame indices per series: {frame.indices[0]}")
print(f"Sliced frame[:3] indices per series: {sliced_frame.indices[0]}")
print()
sliced_frame

Original frame indices per series: [0, 1, 2, 3, 4]
Sliced frame[:3] indices per series: [0, 1, 2]





### 3.4 Multi-Series Selection

In [18]:
# Select specific series by label list
selected_frame = frame[["experiment_X", "experiment_Z"]]
print(f"Original frame series: {list(frame.keys())}")
print(f"Selected frame series: {list(selected_frame.keys())}")
print()
selected_frame

Original frame series: ['experiment_X', 'experiment_Y', 'experiment_Z']
Selected frame series: ['experiment_X', 'experiment_Z']





### 3.5 2D Slicing with .loc

In [19]:
# Use .loc for 2D slicing (index, label)
print("2D slicing examples:\n")

# Get the first two indices (0 and 1) of a single series; the slice end is exclusive
result1 = frame.loc[:2, "experiment_X"]
print(f"frame.loc[:2, 'experiment_X']:")
print(f"  Type: {type(result1).__name__}")
print(f"  Indices: {result1.indices}")
print()

# Get all indices of multiple series
result2 = frame.loc[:, ["experiment_X", "experiment_Y"]]
print(f"frame.loc[:, ['experiment_X', 'experiment_Y']]:")
print(f"  Type: {type(result2).__name__}")
print(f"  Series: {list(result2.keys())}")
print()

# Get single instance
result3 = frame.loc[2, "experiment_X"]
print(f"frame.loc[2, 'experiment_X']:")
print(f"  Type: {type(result3).__name__}")
print(f"  Description: {result3.short_description}")

2D slicing examples:

frame.loc[:2, 'experiment_X']:
  Type: MatrixSeries
  Indices: [0, 1]

frame.loc[:, ['experiment_X', 'experiment_Y']]:
  Type: MatrixFrame
  Series: ['experiment_X', 'experiment_Y']

frame.loc[2, 'experiment_X']:
  Type: MatrixInstance
  Description: experiment_X [2]
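
The three result types above follow from the kinds of keys passed in. The sketch below models that dispatch over nested dictionaries with a hypothetical `LocSketch` accessor. It assumes positional, end-exclusive slices, which agrees with the `[0, 1]` result for `:2` shown above and differs from the end-inclusive label slices of pandas `.loc`.

```python
class LocSketch:
    """Hypothetical 2D accessor over {label: {index: value}}."""

    def __init__(self, data: dict):
        self.data = data

    def __getitem__(self, key):
        index_key, label_key = key
        if isinstance(label_key, list):
            # List of labels: keep a frame-like mapping of series.
            return {lab: self._rows(self.data[lab], index_key) for lab in label_key}
        return self._rows(self.data[label_key], index_key)

    @staticmethod
    def _rows(series: dict, index_key):
        if isinstance(index_key, slice):
            # Assumption: positional, end-exclusive slice over sorted indices.
            kept = sorted(series)[index_key]
            return {i: series[i] for i in kept}
        return series[index_key]   # scalar index: single element


data = {
    "experiment_X": {i: f"X{i}" for i in range(5)},
    "experiment_Y": {i: f"Y{i}" for i in range(5)},
}
loc = LocSketch(data)

print(loc[:2, "experiment_X"])                         # series-like
print(list(loc[:, ["experiment_X", "experiment_Y"]]))  # frame-like
print(loc[2, "experiment_X"])                          # single element
```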


### 3.6 Frame Properties

In [20]:
# Access frame properties
print(f"Frame keys (series labels): {list(frame.keys())}")
print(f"Frame series (list): {len(frame.series)} series")
print(f"Frame matrices (2D list): {len(frame.matrices)} series x {len(frame.matrices[0])} instances")
print(f"Frame indices (2D list): {frame.indices[0]}")
print(f"Frame labels (2D list): {frame.labels[0][:3]}...")

Frame keys (series labels): ['experiment_X', 'experiment_Y', 'experiment_Z']
Frame series (list): 3 series
Frame matrices (2D list): 3 series x 5 instances
Frame indices (2D list): [0, 1, 2, 3, 4]
Frame labels (2D list): ['experiment_X', 'experiment_X', 'experiment_X']...


### 3.7 Map Operations on Frame

In [21]:
# Apply function to all series in frame
def add_series_id_tag(series: MatrixSeries) -> MatrixSeries:
    """Add series label as a tag to each instance."""
    def add_tag(inst: MatrixInstance) -> MatrixInstance:
        new_tags = inst.tags.copy()
        new_tags["series_label"] = series.label
        return inst.replace(tags=new_tags)
    return series.map(add_tag, inplace=False)

tagged_frame = frame.map(add_series_id_tag, inplace=False)

print("Original frame instance tags:")
print(f"  {frame['experiment_X'][0].tags}")
print("\nTagged frame instance tags:")
print(f"  {tagged_frame['experiment_X'][0].tags}")

Original frame instance tags:
  {'experiment': 'experiment_X', 'timestep': 0}

Tagged frame instance tags:
  {'experiment': 'experiment_X', 'timestep': 0, 'series_label': 'experiment_X'}


### 3.8 Frame Save and Load

In [22]:
# Save and load entire frame
with tempfile.TemporaryDirectory() as tmpdir:
    save_path = Path(tmpdir) / "test_frame"
    
    # Save
    frame.save(save_path, tag="v1")
    print(f"Saved frame to: {save_path}")
    print(f"Files created: {len(list(save_path.glob('*')))} files")
    
    # Load
    loaded_frame = MatrixFrame.load(save_path, tag="v1")
    print(f"\nLoaded frame series: {list(loaded_frame.keys())}")
    # Series order is not preserved by this round trip, so compare labels order-insensitively
    print(f"Loaded frame matches original: {sorted(loaded_frame.keys()) == sorted(frame.keys())}")
    print()
    loaded_frame

Saved frame to: /tmp/tmpii3hlzh2/test_frame
Files created: 31 files

Loaded frame series: ['experiment_Z', 'experiment_X', 'experiment_Y']
Loaded frame matches original: True



### 3.9 Frame Description

In [23]:
# Get auto-generated frame description (useful for auto-naming)
print(f"Frame description: {frame.description}")
print("\nThis description is automatically generated based on:")
print("  - Number of series")
print("  - Number of unique indices")
print("  - Series labels (up to 2, then 'etc')")

Frame description: frame_3s_5i_experiment_x_experiment_y_etc

This description is automatically generated based on:
  - Number of series
  - Number of unique indices
  - Series labels (up to 2, then 'etc')
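
The three ingredients listed above are enough to rebuild the printed name. The function below is a hypothetical reconstruction inferred from that single output, including the lowercasing of labels. The library's actual rules may differ in cases this notebook does not exercise.

```python
def describe_frame(labels: list, indices_per_series: list, max_labels: int = 2) -> str:
    """Hypothetical reconstruction of the auto-generated frame name."""
    unique_indices = {idx for indices in indices_per_series for idx in indices}
    parts = [f"frame_{len(labels)}s_{len(unique_indices)}i"]
    parts += [label.lower() for label in labels[:max_labels]]
    if len(labels) > max_labels:
        parts.append("etc")
    return "_".join(parts)


name = describe_frame(
    ["experiment_X", "experiment_Y", "experiment_Z"],
    [list(range(5))] * 3,
)
print(name)
```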


---
## Summary

This notebook demonstrated the core functionality of the base matrix classes:

### MatrixInstance
- Wraps sparse/dense matrices with flexible metadata
- Supports indexing, addition, replace, save/load
- Immutable design pattern for safety

### MatrixSeries
- Collection of instances with same label
- Supports slicing, iteration, map operations
- Element-wise operations at matching indices

### MatrixFrame
- DataFrame-like container for multiple series
- Pandas-style .loc indexing for 2D access
- Batch operations via map
- Persistent storage with save/load

All classes follow consistent patterns:
- Rich pretty-printing for interactive use
- Immutable operations via `.replace()`
- Functional programming style with `.map()`
- Support for both sparse and dense matrices