# Similarity Matrix Functionality - Functional Tests

This notebook demonstrates and tests the similarity-specific matrix functionality:
- **SimilarityMatrix**: Specialized matrix for morphological similarity analysis with typed attributes (space, radius, domain)
- **SimilaritySeries**: Collection of SimilarityMatrix objects across radii
- **SimilarityFrame**: DataFrame-like structure for multiple body parts/spaces

These classes extend the base matrix classes with domain-specific operations like:
- Cosine similarity computation
- Radius normalization
- Aggregation across radii and spaces
- UMAP embeddings

---

In [1]:
# Imports
import numpy as np
import scipy.sparse as sp
from canonical_toolkit.morphology.similarity.sim_matrix import (
    SimilarityMatrix,
    SimilaritySeries,
    SimilarityFrame
)
from canonical_toolkit.morphology.similarity.options import VectorSpace, MatrixDomain

## 1. SimilarityMatrix - Typed Matrix Wrapper

The `SimilarityMatrix` class extends `MatrixInstance` with morphological similarity-specific attributes:
- `space`: VectorSpace enum (e.g., FRONT, BACK, LEFT, RIGHT)
- `radius`: Integer radius for neighborhood analysis
- `domain`: MatrixDomain enum (FEATURES, SIMILARITY, EMBEDDING)
- `tags`: Additional metadata

It also provides domain-specific methods like `cosine_similarity()`, `normalize_by_radius()`, and `umap_embed()`.

### 1.1 Creating Feature Matrices

In [2]:
# Create a sparse feature matrix (robots × features)
feature_mat = sp.random(10, 50, density=0.3, format="csr", random_state=42)

feature_matrix = SimilarityMatrix(
    matrix=feature_mat,
    space=VectorSpace.FRONT,
    radius=2,
    domain=MatrixDomain.FEATURES
)

print(f"Created feature matrix: {feature_matrix.short_description}")
print(f"Space: {feature_matrix.space}")
print(f"Radius: {feature_matrix.radius}")
print(f"Domain: {feature_matrix.domain}")
print(f"Shape: {feature_matrix.shape}")
print()
feature_matrix

Created feature matrix: FRONT [2]
Space: FRONT
Radius: 2
Domain: FEATURES
Shape: (10, 50)





### 1.2 Cosine Similarity Computation

In [3]:
# Compute cosine similarity (converts FEATURES → SIMILARITY)
similarity_matrix = feature_matrix.cosine_similarity()

print(f"Similarity matrix: {similarity_matrix.short_description}")
print(f"Domain changed: {feature_matrix.domain} → {similarity_matrix.domain}")
print(f"Shape changed: {feature_matrix.shape} → {similarity_matrix.shape}")
print(f"Storage changed: sparse → dense")
print()
similarity_matrix

Similarity matrix: FRONT [2]
Domain changed: FEATURES → SIMILARITY
Shape changed: (10, 50) → (10, 10)
Storage changed: sparse → dense
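
For reference, the FEATURES → SIMILARITY step can be reproduced with plain NumPy. This is a minimal sketch of row-wise cosine similarity, not the `SimilarityMatrix.cosine_similarity()` implementation. The helper name `cosine_similarity_dense` and the zero-row guard are assumptions made for illustration.

```python
import numpy as np

def cosine_similarity_dense(features: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity: (n, f) features -> (n, n) similarities."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Guard against all-zero rows so they yield 0 similarity instead of NaN.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit_rows = features / safe_norms
    return unit_rows @ unit_rows.T

rng = np.random.default_rng(42)
# Sparse-like random features: roughly 30% of the entries are non-zero.
features = rng.random((10, 50)) * (rng.random((10, 50)) < 0.3)
sim = cosine_similarity_dense(features)
print(sim.shape)  # (10, 10)
```

The result is symmetric with ones on the diagonal, which is why later cells pass `zero_diagonal=True` when scoring neighbors.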





### 1.3 Radius Normalization

In [4]:
# Normalize by radius (divides by radius + 1)
normalized = similarity_matrix.normalize_by_radius()

print(f"Original values (sample):")
print(f"  {similarity_matrix.matrix[0, 1]:.4f}")
print(f"\nNormalized values (divided by {similarity_matrix.radius + 1}):")
print(f"  {normalized.matrix[0, 1]:.4f}")
print(f"\nVerification: {similarity_matrix.matrix[0, 1]:.4f} / {similarity_matrix.radius + 1} = {normalized.matrix[0, 1]:.4f}")

Original values (sample):
  0.1282

Normalized values (divided by 3):
  0.0427

Verification: 0.1282 / 3 = 0.0427


### 1.4 Similarity Scoring - Mean Scores (Normalized)

The `sum_to_rows()` method aggregates similarity scores per row. By default it returns **mean scores** (the row sum divided by the number of neighbors counted) rather than raw sums. With `zero_diagonal=True` on a 10-robot matrix, that divisor is 9.

In [5]:
# Mean similarity scores per robot (normalized by default)
mean_scores = similarity_matrix.sum_to_rows(zero_diagonal=True)

print(f"Mean similarity scores per robot (excluding self-similarity):")
print(f"  Shape: {mean_scores.shape}")
print(f"  Mean score: {mean_scores.mean():.3f}")
print(f"  Min score: {mean_scores.min():.3f}")
print(f"  Max score: {mean_scores.max():.3f}")
print()
print(f"Scores: {mean_scores}")
print(f"\nThese are AVERAGES (sum divided by {similarity_matrix.shape[0] - 1} neighbors)")

Mean similarity scores per robot (excluding self-similarity):
  Shape: (10,)
  Mean score: 0.230
  Min score: 0.198
  Max score: 0.267

Scores: [0.202 0.198 0.244 0.256 0.215 0.267 0.230 0.256 0.214 0.220]

These are AVERAGES (sum divided by 9 neighbors)


### 1.4b Similarity Scoring - Raw Sums (Unnormalized)

In [6]:
# Raw sum of similarity scores (unnormalized)
sum_scores = similarity_matrix.sum_to_rows(zero_diagonal=True, normalise_by_pop_len=False)

print(f"Raw sum similarity scores per robot:")
print(f"  Mean: {sum_scores.mean():.3f}")
print(f"  Scores: {sum_scores}")
print(f"\nThese are RAW SUMS (not averaged)")
print(f"\nVerification: mean_scores * n_neighbors = sum_scores")
n_neighbors = similarity_matrix.shape[0] - 1  # -1 for zero_diagonal
print(f"  {mean_scores[0]:.4f} * {n_neighbors} = {mean_scores[0] * n_neighbors:.4f}")
print(f"  sum_scores[0] = {sum_scores[0]:.4f}")
print(f"  Match: {np.isclose(mean_scores[0] * n_neighbors, sum_scores[0])}")

Raw sum similarity scores per robot:
  Mean: 2.071
  Scores: [1.816 1.782 2.194 2.300 1.934 2.404 2.068 2.303 1.926 1.981]

These are RAW SUMS (not averaged)

Verification: mean_scores * n_neighbors = sum_scores
  0.2018 * 9 = 1.8161
  sum_scores[0] = 1.8161
  Match: True
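
The relationship checked above (mean × neighbors = sum) follows directly from how the row scores are defined. Below is a minimal NumPy sketch, assuming that `zero_diagonal=True` drops self-similarity and that the mean divides by the remaining n − 1 neighbors. `row_scores` is a hypothetical helper, not the library method.

```python
import numpy as np

def row_scores(sim: np.ndarray, zero_diagonal: bool = True,
               normalise: bool = True) -> np.ndarray:
    """Per-row sum (or mean) of similarity scores."""
    sim = sim.astype(float, copy=True)  # never modify the caller's matrix
    n = sim.shape[0]
    if zero_diagonal:
        np.fill_diagonal(sim, 0.0)      # ignore self-similarity
    totals = sim.sum(axis=1)
    if not normalise:
        return totals
    n_neighbors = n - 1 if zero_diagonal else n
    return totals / n_neighbors

sim = np.array([
    [1.0, 0.2, 0.4],
    [0.2, 1.0, 0.6],
    [0.4, 0.6, 1.0],
])
mean_scores = row_scores(sim)                  # approx. [0.3, 0.4, 0.5]
sum_scores = row_scores(sim, normalise=False)  # approx. [0.6, 0.8, 1.0]
```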


### 1.5 Top-K and Bottom-K Similarity Scoring

In [7]:
# Get mean of top-3 most similar neighbors per robot
top3_mean = similarity_matrix.sum_to_rows(k=3, largest=True, zero_diagonal=True)

print(f"Top-3 mean similarity scores (normalized):")
print(f"  Overall mean: {top3_mean.mean():.3f}")
print(f"  Scores: {top3_mean}")
print(f"  These are AVERAGES of top 3 neighbors (sum / 3)")

# Get sum of top-3 (unnormalized)
top3_sum = similarity_matrix.sum_to_rows(k=3, largest=True, zero_diagonal=True, normalise_by_pop_len=False)
print(f"\nTop-3 sum similarity scores (unnormalized):")
print(f"  Overall mean: {top3_sum.mean():.3f}")
print(f"  Scores: {top3_sum}")
print(f"\nVerification: top3_mean * 3 = top3_sum")
print(f"  {top3_mean[0]:.4f} * 3 = {top3_mean[0] * 3:.4f}")
print(f"  top3_sum[0] = {top3_sum[0]:.4f}")
print(f"  Match: {np.isclose(top3_mean[0] * 3, top3_sum[0])}")

Top-3 mean similarity scores (normalized):
  Overall mean: 0.355
  Scores: [0.315 0.278 0.404 0.334 0.320 0.372 0.421 0.397 0.328 0.386]
  These are AVERAGES of top 3 neighbors (sum / 3)

Top-3 sum similarity scores (unnormalized):
  Overall mean: 1.066
  Scores: [0.946 0.833 1.211 1.001 0.960 1.117 1.262 1.190 0.983 1.159]

Verification: top3_mean * 3 = top3_sum
  0.3155 * 3 = 0.9465
  top3_sum[0] = 0.9465
  Match: True


In [8]:
# Get bottom-3 least similar neighbors
bottom3_mean = similarity_matrix.sum_to_rows(k=3, largest=False, zero_diagonal=True)
print(f"Bottom-3 mean similarity scores:")
print(f"  Overall mean: {bottom3_mean.mean():.3f}")
print(f"  Scores: {bottom3_mean}")
print(f"\nComparison:")
print(f"  Top-3 mean: {top3_mean.mean():.3f} (most similar neighbors)")
print(f"  Bottom-3 mean: {bottom3_mean.mean():.3f} (least similar neighbors)")
print(f"  All neighbors mean: {mean_scores.mean():.3f}")

Bottom-3 mean similarity scores:
  Overall mean: 0.066
  Scores: [0.053 0.049 0.048 0.101 0.066 0.123 0.046 0.070 0.060 0.041]

Comparison:
  Top-3 mean: 0.355 (most similar neighbors)
  Bottom-3 mean: 0.066 (least similar neighbors)
  All neighbors mean: 0.230
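
Top-k and bottom-k scoring can be sketched in NumPy as a per-row sort over the off-diagonal entries. This assumes the diagonal is excluded from the candidates rather than counted as a zero. The helper `top_k_row_mean` is hypothetical and may differ from what `sum_to_rows(k=...)` does internally.

```python
import numpy as np

def top_k_row_mean(sim: np.ndarray, k: int, largest: bool = True) -> np.ndarray:
    """Mean of the k largest (or smallest) off-diagonal entries in each row."""
    n = sim.shape[0]
    # Drop the diagonal: keep the n - 1 neighbor values of every row.
    off_diag = sim[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    ordered = np.sort(off_diag, axis=1)  # ascending within each row
    chosen = ordered[:, -k:] if largest else ordered[:, :k]
    return chosen.mean(axis=1)

sim = np.array([
    [1.0, 0.1, 0.5, 0.9],
    [0.1, 1.0, 0.3, 0.7],
    [0.5, 0.3, 1.0, 0.2],
    [0.9, 0.7, 0.2, 1.0],
])
top2 = top_k_row_mean(sim, k=2, largest=True)      # approx. [0.7, 0.5, 0.4, 0.8]
bottom2 = top_k_row_mean(sim, k=2, largest=False)  # approx. [0.3, 0.2, 0.25, 0.45]
```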


### 1.6 UMAP Embeddings

In [9]:
# Compute UMAP embeddings from features
embedding_from_features = feature_matrix.umap_embed(
    n_neighbors=5,
    n_components=2,
    random_state=42
)

print(f"Embedding from features: {embedding_from_features.short_description}")
print(f"Domain: {embedding_from_features.domain}")
print(f"Shape: {embedding_from_features.shape}")
print()
embedding_from_features

Embedding from features: FRONT [2]
Domain: EMBEDDING
Shape: (10, 2)





In [10]:
# Can also compute UMAP from similarity matrix (uses precomputed metric)
embedding_from_sim = similarity_matrix.umap_embed(
    n_neighbors=5,
    n_components=2,
    random_state=42
)

print(f"Embedding from similarity: {embedding_from_sim.short_description}")
print(f"Domain: {embedding_from_sim.domain}")
print(f"Shape: {embedding_from_sim.shape}")

  warn("using precomputed metric; inverse_transform will be unavailable")


Embedding from similarity: FRONT [2]
Domain: EMBEDDING
Shape: (10, 2)


### 1.7 Replace Method with Typed Parameters

In [11]:
# Replace supports both generic (label, index) and specific (space, radius) keys
modified = feature_matrix.replace(
    space=VectorSpace.BACK,  # Specific key
    radius=5,                 # Specific key
)

print(f"Original: {feature_matrix.short_description}")
print(f"  Space: {feature_matrix.space}, Radius: {feature_matrix.radius}")
print(f"\nModified: {modified.short_description}")
print(f"  Space: {modified.space}, Radius: {modified.radius}")

Original: FRONT [2]
  Space: FRONT, Radius: 2

Modified: BACK [5]
  Space: BACK, Radius: 5


---
## 2. SimilaritySeries - Collection Across Radii

The `SimilaritySeries` class manages a collection of `SimilarityMatrix` objects across different radii.
All instances in a series share the same space but have different radii (0, 1, 2, ...).

### 2.1 Creating a Feature Series

In [12]:
# Create a series of feature matrices across radii
feature_instances = []
for radius in range(5):
    mat = sp.random(10, 50, density=0.3, format="csr", random_state=radius)
    inst = SimilarityMatrix(
        matrix=mat,
        space=VectorSpace.FRONT,
        radius=radius,
        domain=MatrixDomain.FEATURES
    )
    feature_instances.append(inst)

feature_series = SimilaritySeries(instances_list=feature_instances)

print(f"Created feature series: {feature_series.label}")
print(f"Space: {feature_series.space}")
print(f"Radii: {feature_series.radii}")
print(f"Number of instances: {len(feature_series.indices)}")
print()
feature_series

Created feature series: FRONT
Space: FRONT
Radii: [0, 1, 2, 3, 4]
Number of instances: 5





### 2.2 Series-wide Cosine Similarity

In [13]:
# Compute cosine similarity for all radii at once
similarity_series = feature_series.cosine_similarity(inplace=False)

print(f"Converted feature series to similarity series:")
print(f"  Original domain: {feature_series[0].domain}")
print(f"  New domain: {similarity_series[0].domain}")
print(f"  Original shape: {feature_series[0].shape}")
print(f"  New shape: {similarity_series[0].shape}")
print()
similarity_series

Converted feature series to similarity series:
  Original domain: FEATURES
  New domain: SIMILARITY
  Original shape: (10, 50)
  New shape: (10, 10)





### 2.3 Series-wide Normalization

In [14]:
# Normalize all instances by their respective radii
normalized_series = similarity_series.normalize_by_radius(inplace=False)

print("Radius normalization across series:")
for r in [0, 1, 2]:
    orig_val = similarity_series[r].matrix[0, 1]
    norm_val = normalized_series[r].matrix[0, 1]
    print(f"  Radius {r}: {orig_val:.4f} / {r+1} = {norm_val:.4f}")

Radius normalization across series:
  Radius 0: 0.4599 / 1 = 0.4599
  Radius 1: 0.2410 / 2 = 0.1205
  Radius 2: 0.3288 / 3 = 0.1096


### 2.4 Cumulative Series

In [15]:
# Create cumulative sum across radii
cumulative_series = normalized_series.to_cumulative(inplace=False)

print("Cumulative series:")
print(f"  Radius 0: {cumulative_series[0].matrix[0, 1]:.4f}")
print(f"  Radius 1: {cumulative_series[1].matrix[0, 1]:.4f} (sum of 0+1)")
print(f"  Radius 2: {cumulative_series[2].matrix[0, 1]:.4f} (sum of 0+1+2)")
print()
print("Verification:")
expected = normalized_series[0].matrix[0, 1] + normalized_series[1].matrix[0, 1]
actual = cumulative_series[1].matrix[0, 1]
print(f"  Expected (0+1): {expected:.4f}")
print(f"  Actual: {actual:.4f}")
print(f"  Match: {np.isclose(expected, actual)}")

Cumulative series:
  Radius 0: 0.4599
  Radius 1: 0.5804 (sum of 0+1)
  Radius 2: 0.6900 (sum of 0+1+2)

Verification:
  Expected (0+1): 0.5804
  Actual: 0.5804
  Match: True
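
Radius normalization and the cumulative series reduce to a broadcasted division followed by a running sum. The NumPy sketch below assumes the per-radius matrices are stacked along the first axis. The array names are illustrative and this is not the `SimilaritySeries` implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stack of per-radius similarity matrices: shape (n_radii, n, n).
sims = rng.random((5, 4, 4))
radii = np.arange(5)

# Radius normalization: divide the matrix at radius r by (r + 1).
normalized = sims / (radii + 1)[:, None, None]
# Cumulative series: entry r holds the sum of normalized matrices 0..r.
cumulative = np.cumsum(normalized, axis=0)

expected = normalized[0] + normalized[1]
print(np.allclose(cumulative[1], expected))  # True
```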


### 2.5 Aggregating Across Radii

In [16]:
# Aggregate feature matrices across all radii (collapses radius dimension)
aggregated_features = feature_series.aggregate()

print(f"Aggregated features: {aggregated_features.short_description}")
print(f"Original space: {feature_series.space}")
print(f"Aggregated space: {aggregated_features.space}")
print(f"Shape: {aggregated_features.shape}")
print(f"Domain: {aggregated_features.domain}")
print()
aggregated_features

Aggregated features: AGGREGATED [4]
Original space: FRONT
Aggregated space: AGGREGATED
Shape: (10, 50)
Domain: FEATURES





### 2.6 Chaining Operations

In [17]:
# Chain multiple operations: cosine → normalize → cumulative
chained_result = (
    feature_series
    .cosine_similarity(inplace=False)
    .normalize_by_radius(inplace=False)
    .to_cumulative(inplace=False)
)

print(f"Chained operations result:")
print(f"  Space: {chained_result.space}")
print(f"  Domain: {chained_result[0].domain}")
print(f"  Radii: {chained_result.radii}")
print()
print(f"Final cumulative value at radius 4:")
print(f"  {chained_result[4].matrix[0, 1]:.4f}")

Chained operations result:
  Space: FRONT
  Domain: SIMILARITY
  Radii: [0, 1, 2, 3, 4]

Final cumulative value at radius 4:
  0.7723


### 2.7 Series Slicing and Properties

In [18]:
# Slice series to get subset of radii
sliced_series = feature_series[:3]

print(f"Original series radii: {feature_series.radii}")
print(f"Sliced series[:3] radii: {sliced_series.radii}")
print(f"\nSeries properties:")
print(f"  Space: {sliced_series.space}")
print(f"  Label: {sliced_series.label}")
print(f"  Indices (alias for radii): {sliced_series.indices}")
print()
sliced_series

Original series radii: [0, 1, 2, 3, 4]
Sliced series[:3] radii: [0, 1, 2]

Series properties:
  Space: FRONT
  Label: FRONT
  Indices (alias for radii): [0, 1, 2]





---
## 3. SimilarityFrame - Multi-Space Container

The `SimilarityFrame` class provides a high-level interface for managing multiple `SimilaritySeries` objects,
typically representing different body parts or morphological spaces (FRONT, BACK, LEFT, RIGHT, etc.).

### 3.1 Creating a Multi-Space Frame

In [19]:
# Create series for different body parts
def create_feature_series(space: VectorSpace, n_radii: int, seed_offset: int) -> SimilaritySeries:
    """Helper to create a feature series for a given space."""
    instances = []
    for r in range(n_radii):
        mat = sp.random(10, 50, density=0.3, format="csr", random_state=seed_offset+r)
        inst = SimilarityMatrix(
            matrix=mat,
            space=space,
            radius=r,
            domain=MatrixDomain.FEATURES
        )
        instances.append(inst)
    return SimilaritySeries(instances_list=instances)

# Create frame with multiple body parts
front_series = create_feature_series(VectorSpace.FRONT, n_radii=5, seed_offset=0)
back_series = create_feature_series(VectorSpace.BACK, n_radii=5, seed_offset=100)
left_series = create_feature_series(VectorSpace.LEFT, n_radii=5, seed_offset=200)

frame = SimilarityFrame(series=[front_series, back_series, left_series])

print(f"Created frame with {len(list(frame.keys()))} spaces")
print(f"Spaces: {list(frame.keys())}")
print()
frame

Created frame with 3 spaces
Spaces: ['FRONT', 'BACK', 'LEFT']





### 3.2 Accessing Series by Space

In [20]:
# Access individual series by space name
front_retrieved = frame["FRONT"]

print(f"Retrieved series: {front_retrieved.label}")
print(f"Space: {front_retrieved.space}")
print(f"Radii: {front_retrieved.radii}")
print()
front_retrieved

Retrieved series: FRONT
Space: FRONT
Radii: [0, 1, 2, 3, 4]





### 3.3 Frame Slicing by Radius

In [21]:
# Slice frame by radius (applies to all spaces)
sliced_frame = frame[:3]

print(f"Original frame radii: {frame.indices[0]}")
print(f"Sliced frame[:3] radii: {sliced_frame.indices[0]}")
print()
sliced_frame

Original frame radii: [0, 1, 2, 3, 4]
Sliced frame[:3] radii: [0, 1, 2]





### 3.4 Multi-Space Selection

In [22]:
# Select specific spaces
selected_frame = frame[["FRONT", "LEFT"]]

print(f"Original frame spaces: {list(frame.keys())}")
print(f"Selected frame spaces: {list(selected_frame.keys())}")
print()
selected_frame

Original frame spaces: ['FRONT', 'BACK', 'LEFT']
Selected frame spaces: ['FRONT', 'LEFT']





### 3.5 2D Access with .loc

In [23]:
# Use .loc for pandas-style 2D indexing (radius, space)
print("2D access examples:\n")

# Get radii 0-1 of FRONT space (the slice end is exclusive, unlike pandas .loc)
result1 = frame.loc[:2, "FRONT"]
print(f"frame.loc[:2, 'FRONT']:")
print(f"  Type: {type(result1).__name__}")
print(f"  Radii: {result1.radii}")
print()

# Get all radii of multiple spaces
result2 = frame.loc[:, ["FRONT", "BACK"]]
print(f"frame.loc[:, ['FRONT', 'BACK']]:")
print(f"  Type: {type(result2).__name__}")
print(f"  Spaces: {list(result2.keys())}")
print()

# Get single instance at radius 2, FRONT space
result3 = frame.loc[2, "FRONT"]
print(f"frame.loc[2, 'FRONT']:")
print(f"  Type: {type(result3).__name__}")
print(f"  Description: {result3.short_description}")

2D access examples:

frame.loc[:2, 'FRONT']:
  Type: SimilaritySeries
  Radii: [0, 1]

frame.loc[:, ['FRONT', 'BACK']]:
  Type: SimilarityFrame
  Spaces: ['FRONT', 'BACK']

frame.loc[2, 'FRONT']:
  Type: SimilarityMatrix
  Description: FRONT [2]
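
Note that `frame.loc[:2, 'FRONT']` returned radii `[0, 1]`: the slice end is exclusive, as with Python lists, whereas pandas label slicing with `.loc` includes the end label. A `(radius, space)` indexer with that behaviour can be sketched in a few lines. `LocIndexer` is a hypothetical illustration over a dict of lists, not the toolkit's class.

```python
from typing import Dict, List

class LocIndexer:
    """Hypothetical (radius, space) indexer over a dict of per-space lists."""

    def __init__(self, data: Dict[str, List[str]]):
        self._data = data

    def __getitem__(self, key):
        radius_key, space_key = key
        if isinstance(space_key, list):
            # Several spaces -> a smaller frame-like dict.
            return {s: self._data[s][radius_key] for s in space_key}
        # One space -> a series-like list (slice) or a single item (int).
        return self._data[space_key][radius_key]

data = {
    "FRONT": [f"FRONT [{r}]" for r in range(5)],
    "BACK": [f"BACK [{r}]" for r in range(5)],
}
loc = LocIndexer(data)
print(loc[:2, "FRONT"])                 # ['FRONT [0]', 'FRONT [1]']
print(loc[2, "FRONT"])                  # FRONT [2]
print(list(loc[:, ["FRONT", "BACK"]]))  # ['FRONT', 'BACK']
```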


### 3.6 Frame-wide Operations

In [24]:
# Apply cosine similarity to all series in frame
similarity_frame = frame.map(
    lambda series: series.cosine_similarity(inplace=False),
    inplace=False
)

print(f"Frame-wide cosine similarity:")
print(f"  Original domain: {frame['FRONT'][0].domain}")
print(f"  New domain: {similarity_frame['FRONT'][0].domain}")
print()
similarity_frame

Frame-wide cosine similarity:
  Original domain: FEATURES
  New domain: SIMILARITY





### 3.7 Aggregating Across Spaces

In [25]:
# Aggregate frame: sum across all spaces at each radius
aggregated_series = frame.aggregate()

print(f"Aggregated across spaces:")
print(f"  Original spaces: {list(frame.keys())}")
print(f"  Aggregated space: {aggregated_series.space}")
print(f"  Radii: {aggregated_series.radii}")
print(f"  Domain: {aggregated_series[0].domain}")
print()
print(f"Each radius contains sum of FRONT + BACK + LEFT features")
print()
aggregated_series

Aggregated across spaces:
  Original spaces: ['FRONT', 'BACK', 'LEFT']
  Aggregated space: AGGREGATED
  Radii: [0, 1, 2, 3, 4]
  Domain: FEATURES

Each radius contains sum of FRONT + BACK + LEFT features





### 3.8 Full Pipeline: Features → Similarity → Normalized

In [26]:
# Complete analysis pipeline
def analysis_pipeline(series: SimilaritySeries) -> SimilaritySeries:
    """Features → cosine similarity → radius normalization → cumulative sum."""
    return (
        series
        .cosine_similarity(inplace=False)
        .normalize_by_radius(inplace=False)
        .to_cumulative(inplace=False)
    )

# Apply to entire frame
processed_frame = frame.map(analysis_pipeline, inplace=False)

print(f"Processed frame:")
print(f"  Spaces: {list(processed_frame.keys())}")
print(f"  Domain: {processed_frame['FRONT'][0].domain}")
print(f"  Each value is normalized cumulative similarity")
print()
processed_frame

Processed frame:
  Spaces: ['FRONT', 'BACK', 'LEFT']
  Domain: SIMILARITY
  Each value is normalized cumulative similarity





### 3.9 Frame Properties

In [27]:
# Access frame properties
print(f"Frame properties:")
print(f"  Keys (spaces): {list(frame.keys())}")
print(f"  Number of series: {len(frame.series)}")
print(f"  Matrices (2D list): {len(frame.matrices)} spaces × {len(frame.matrices[0])} radii")
print(f"  Indices per series: {frame.indices[0]}")
print(f"  Description: {frame.description}")

print(frame)

Frame properties:
  Keys (spaces): ['FRONT', 'BACK', 'LEFT']
  Number of series: 3
  Matrices (2D list): 3 spaces × 5 radii
  Indices per series: [0, 1, 2, 3, 4]
  Description: frame_3s_5i_back_front_etc






### 3.10 Save and Load

In [29]:
from pathlib import Path
import tempfile

# Save and load frame
with tempfile.TemporaryDirectory() as tmpdir:
    save_path = Path(tmpdir) / "test_similarity_frame"
    
    # Save
    frame.save(save_path, tag="analysis_v1")
    print(f"Saved frame to: {save_path}")
    print(f"Files created: {len(list(save_path.glob('*')))} files")
    
    # Load
    loaded_frame = SimilarityFrame.load(save_path, tag="analysis_v1")
    print(f"\nLoaded frame spaces: {list(loaded_frame.keys())}")
    # Key order is not guaranteed to survive a round trip, so compare as sets
    print(f"Loaded frame matches: {set(loaded_frame.keys()) == set(frame.keys())}")

Saved frame to: /tmp/tmp3omgf3bq/test_similarity_frame
Files created: 31 files

Loaded frame spaces: ['BACK', 'FRONT', 'LEFT']
Loaded frame matches: True


---
## 4. Advanced Patterns

### 4.1 Double Aggregation: Spaces → Radii

In [30]:
# Step 1: Aggregate across spaces (Frame → Series)
space_aggregated = frame.aggregate()
print(f"After space aggregation:")
print(f"  Type: {type(space_aggregated).__name__}")
print(f"  Space: {space_aggregated.space}")
print(f"  Radii: {space_aggregated.radii}")
print()

# Step 2: Aggregate across radii (Series → Matrix)
fully_aggregated = space_aggregated.aggregate()
print(f"After radius aggregation:")
print(f"  Type: {type(fully_aggregated).__name__}")
print(f"  Space: {fully_aggregated.space}")
print(f"  Radius: {fully_aggregated.radius}")
print(f"  Shape: {fully_aggregated.shape}")
print()
print(f"Result: Single matrix with all spaces and radii combined")
fully_aggregated

After space aggregation:
  Type: SimilaritySeries
  Space: AGGREGATED
  Radii: [0, 1, 2, 3, 4]

After radius aggregation:
  Type: SimilarityMatrix
  Space: AGGREGATED
  Radius: 4
  Shape: (10, 50)

Result: Single matrix with all spaces and radii combined
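
The two aggregation steps can be pictured as reductions over a 4D array. The sketch below assumes both `aggregate()` calls are element-wise sums: the notebook states this for spaces, while for radii it is an assumption. The array layout is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spaces, n_radii, n_robots, n_features = 3, 5, 10, 50
# features[s, r] is the (robots x features) matrix for space s at radius r.
features = rng.random((n_spaces, n_radii, n_robots, n_features))

space_aggregated = features.sum(axis=0)          # Frame -> Series
fully_aggregated = space_aggregated.sum(axis=0)  # Series -> Matrix
print(space_aggregated.shape, fully_aggregated.shape)  # (5, 10, 50) (10, 50)
```

Because both steps are sums, their order does not matter: aggregating radii first and spaces second gives the same matrix.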




### 4.2 Similarity Analysis on Aggregated Features

In [31]:
# Common pattern: aggregate features → compute similarity
aggregated_features = space_aggregated

# Compute similarity on aggregated features at each radius
agg_similarity_series = aggregated_features.cosine_similarity(inplace=False)

print(f"Similarity from aggregated features:")
print(f"  Space: {agg_similarity_series.space}")
print(f"  Domain: {agg_similarity_series[0].domain}")
print(f"  Shape per radius: {agg_similarity_series[0].shape}")
print()
agg_similarity_series

Similarity from aggregated features:
  Space: AGGREGATED
  Domain: SIMILARITY
  Shape per radius: (10, 10)





### 4.3 Comparing Individual vs Aggregated Similarity

In [32]:
# Compare FRONT-only vs AGGREGATED similarity at radius 2
front_sim = frame['FRONT'].cosine_similarity(inplace=False)[2]
agg_sim = agg_similarity_series[2]

# Compute mean row scores (normalized by default)
front_scores = front_sim.sum_to_rows(zero_diagonal=True)
agg_scores = agg_sim.sum_to_rows(zero_diagonal=True)

print(f"Mean similarity scores at radius 2:")
print(f"\nFRONT-only:")
print(f"  Mean: {front_scores.mean():.3f}")
print(f"  Std: {front_scores.std():.3f}")
print(f"\nAGGREGATED (FRONT+BACK+LEFT):")
print(f"  Mean: {agg_scores.mean():.3f}")
print(f"  Std: {agg_scores.std():.3f}")
print(f"\nNote: summing spaces makes the random feature vectors denser, which raises cosine similarity.")

Mean similarity scores at radius 2:

FRONT-only:
  Mean: 0.232
  Std: 0.021

AGGREGATED (FRONT+BACK+LEFT):
  Mean: 0.502
  Std: 0.022

Note: summing spaces makes the random feature vectors denser, which raises cosine similarity.
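
One caveat when reading this comparison: the inputs are random sparse matrices, and summing several non-negative sparse matrices produces denser rows, which on its own raises cosine similarity. The NumPy sketch below reproduces the effect without the toolkit. The helpers `sparse_random` and `mean_offdiag_cosine` are hypothetical, and the matrix size is chosen only to make the gap stable.

```python
import numpy as np

def mean_offdiag_cosine(x: np.ndarray) -> float:
    """Mean pairwise cosine similarity between rows, excluding self-pairs."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = sim.shape[0]
    return float(sim[~np.eye(n, dtype=bool)].mean())

def sparse_random(rng, shape, density=0.3):
    """Non-negative random matrix with roughly `density` non-zero entries."""
    return rng.random(shape) * (rng.random(shape) < density)

rng = np.random.default_rng(0)
shape = (50, 200)
front, back, left = (sparse_random(rng, shape) for _ in range(3))

single = mean_offdiag_cosine(front)
aggregated = mean_offdiag_cosine(front + back + left)
print(f"single: {single:.3f}, aggregated: {aggregated:.3f}")
```

On real morphological features, a rise in similarity after aggregation would need to be compared against this density baseline before being interpreted.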


### 4.4 Comparing Normalized vs Unnormalized Scoring

In [33]:
# Demonstrate the difference between normalized and unnormalized scoring
sim_mat = agg_sim

# Normalized (mean)
norm_scores = sim_mat.sum_to_rows(zero_diagonal=True, normalise_by_pop_len=True)
# Unnormalized (sum)
unnorm_scores = sim_mat.sum_to_rows(zero_diagonal=True, normalise_by_pop_len=False)

print(f"Scoring comparison (all neighbors):")
print(f"\nNormalized (mean per neighbor):")
print(f"  Range: [{norm_scores.min():.3f}, {norm_scores.max():.3f}]")
print(f"  Mean: {norm_scores.mean():.3f}")
print(f"  First robot: {norm_scores[0]:.3f}")

print(f"\nUnnormalized (total sum):")
print(f"  Range: [{unnorm_scores.min():.3f}, {unnorm_scores.max():.3f}]")
print(f"  Mean: {unnorm_scores.mean():.3f}")
print(f"  First robot: {unnorm_scores[0]:.3f}")

print(f"\nVerification:")
n_neighbors = sim_mat.shape[0] - 1
print(f"  norm_scores[0] * {n_neighbors} = {norm_scores[0] * n_neighbors:.3f}")
print(f"  unnorm_scores[0] = {unnorm_scores[0]:.3f}")
print(f"  Match: {np.isclose(norm_scores[0] * n_neighbors, unnorm_scores[0])}")

Scoring comparison (all neighbors):

Normalized (mean per neighbor):
  Range: [0.470, 0.540]
  Mean: 0.502
  First robot: 0.503

Unnormalized (total sum):
  Range: [4.228, 4.862]
  Mean: 4.520
  First robot: 4.524

Verification:
  norm_scores[0] * 9 = 4.524
  unnorm_scores[0] = 4.524
  Match: True


---
## Summary

This notebook demonstrated the similarity-specific matrix classes:

### SimilarityMatrix
- Extends MatrixInstance with typed attributes (space, radius, domain)
- Domain-specific operations:
  - `cosine_similarity()`: FEATURES → SIMILARITY
  - `normalize_by_radius()`: Divide by (radius + 1)
  - `sum_to_rows()`: Aggregate similarity scores
    - `normalise_by_pop_len=True` (default): Returns mean scores
    - `normalise_by_pop_len=False`: Returns sum scores
  - `umap_embed()`: Dimensionality reduction
- Three domains: FEATURES (sparse), SIMILARITY (dense), EMBEDDING (dense)

### SimilaritySeries
- Collection of matrices across radii (0, 1, 2, ...)
- Series-wide operations:
  - `cosine_similarity()`: Apply to all radii
  - `normalize_by_radius()`: Normalize each by its radius
  - `to_cumulative()`: Running sum across radii
  - `aggregate()`: Collapse radius dimension (Series → Matrix)
- Supports method chaining

### SimilarityFrame
- Multi-space container (FRONT, BACK, LEFT, RIGHT, ...)
- DataFrame-like structure (rows=radii, cols=spaces)
- Frame-wide operations via `map()`
- `aggregate()`: Collapse space dimension (Frame → Series)
- Pandas-style `.loc` indexing

### Common Patterns
1. **Single-space analysis**: Series → cosine → normalize → cumulative
2. **Multi-space analysis**: Frame → aggregate spaces → cosine on aggregated
3. **Full aggregation**: Frame → aggregate spaces → aggregate radii → single matrix
4. **Comparative analysis**: Compare individual vs aggregated similarities
5. **Scoring options**: Use normalized for comparisons, unnormalized for totals

All classes maintain:
- Non-mutating operations (when called with `inplace=False`)
- Rich pretty-printing
- Type safety with enums
- Efficient sparse/dense storage