# 20d: Motif Discovery

**Goal**: Find recurring strategic patterns in game sequences.

**Method**: aeon StompMotifDiscovery on V trajectories.

**Data Limitation**: The current oracle data consists of per-depth aggregate statistics, not game-level time series.

In [1]:
# === CONFIGURATION ===
PROJECT_ROOT = "/home/jason/v2/mk5-tailwind"

import sys
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

from forge.analysis.utils import viz

viz.setup_notebook_style()
np.random.seed(42)
print("Ready")

Ready


## Data Limitation Analysis

Motif discovery requires:
- Multiple game-level time series (one per game)
- Each series: V values at each ply during actual gameplay
- Format: `games × time_steps` array

Current data provides:
- Aggregated per-depth statistics across many game states
- No tracking of individual game trajectories
- Single mean trajectory, not multiple comparable series
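A small experiment shows why per-depth statistics cannot stand in for individual game series. This is a minimal sketch on synthetic random-walk data, not oracle output. Shuffling values independently within each depth column leaves every per-depth statistic (mean, std, percentiles) unchanged but scrambles the rows. A per-depth table such as `20a_v_trajectory.csv` is therefore consistent with very different sets of game trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for game data: 200 games x 29 steps (depth 28 down to 0),
# each row a random walk of V values. Not real oracle data.
n_games, n_steps = 200, 29
games_a = np.cumsum(rng.normal(0.0, 3.0, size=(n_games, n_steps)), axis=1)

# Second collection: permute each depth column independently across games.
# Each column keeps the same multiset of values, so per-depth summaries match.
games_b = games_a.copy()
for t in range(n_steps):
    games_b[:, t] = rng.permutation(games_b[:, t])

def per_depth_summary(x):
    """Mean, std, 10th and 90th percentile of V at each depth (column)."""
    return np.vstack([
        x.mean(axis=0),
        x.std(axis=0),
        np.percentile(x, 10, axis=0),
        np.percentile(x, 90, axis=0),
    ])

same_summary = np.allclose(per_depth_summary(games_a), per_depth_summary(games_b))

# The rows differ sharply: random walks move in small steps, shuffled rows jump.
mean_abs_step_a = np.abs(np.diff(games_a, axis=1)).mean()
mean_abs_step_b = np.abs(np.diff(games_b, axis=1)).mean()
print(same_summary, round(mean_abs_step_a, 2), round(mean_abs_step_b, 2))
```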

In [2]:
# Load current trajectory data
data_path = Path(PROJECT_ROOT) / "forge/analysis/results/tables/20a_v_trajectory.csv"
df = pd.read_csv(data_path)

print(f"Current data shape: {len(df)} depth levels")
print("\nThis is a SINGLE aggregated trajectory, not multiple game series.")
print("Motif discovery requires N individual game trajectories.")
print("\nColumns available:")
print(df.columns.tolist())

Current data shape: 25 depth levels

This is a SINGLE aggregated trajectory, not multiple game series.
Motif discovery requires N individual game trajectories.

Columns available:
['depth', 'n_states', 'v_mean', 'v_std', 'v_min', 'v_max', 'v_range', 'v_iqr', 'v_p10', 'v_p90', 'delta_std', 'delta_range']


In [3]:
# What we WOULD need for motif discovery:
print("Required data structure for motif discovery:")
print("="*50)
print("""
game_trajectories = [
    [V_d28, V_d27, V_d26, ..., V_d0],  # Game 1 trajectory
    [V_d28, V_d27, V_d26, ..., V_d0],  # Game 2 trajectory
    ...
    [V_d28, V_d27, V_d26, ..., V_d0],  # Game N trajectory
]

Shape: (n_games, 29) where each row is one game's V evolution (depths 28 down to 0).
""")

print("\nWhat current oracle provides:")
print("- Per-seed aggregated statistics")
print("- Mean V at each depth across ALL possible states")
print("- No individual game trajectories")

Required data structure for motif discovery:

game_trajectories = [
    [V_d28, V_d27, V_d26, ..., V_d0],  # Game 1 trajectory
    [V_d28, V_d27, V_d26, ..., V_d0],  # Game 2 trajectory
    ...
    [V_d28, V_d27, V_d26, ..., V_d0],  # Game N trajectory
]

Shape: (n_games, 29) where each row is one game's V evolution (depths 28 down to 0).


What current oracle provides:
- Per-seed aggregated statistics
- Mean V at each depth across ALL possible states
- No individual game trajectories


In [4]:
# Sketch of what motif discovery WOULD look like (illustrative only: verify the class name and arguments against the installed aeon version)
print("\nExample motif discovery setup (hypothetical):")
print("="*50)

print("""
from aeon.transformations.collection import StompMotifDiscovery

# If we had game trajectories:
motif_finder = StompMotifDiscovery(k=5, m=7)  # Find 5 motifs of length 7
motifs = motif_finder.fit_transform(game_trajectories)

# Expected outputs:
# - Recurring patterns in V evolution
# - E.g., "late comeback" motif: [+5, +10, +15, +20]
# - E.g., "early collapse" motif: [-10, -15, -20, -25]
""")


Example motif discovery setup (hypothetical):

from aeon.transformations.collection import StompMotifDiscovery

# If we had game trajectories:
motif_finder = StompMotifDiscovery(k=5, m=7)  # Find 5 motifs of length 7
motifs = motif_finder.fit_transform(game_trajectories)

# Expected outputs:
# - Recurring patterns in V evolution
# - E.g., "late comeback" motif: [+5, +10, +15, +20]
# - E.g., "early collapse" motif: [-10, -15, -20, -25]



## Alternative: Phase Pattern Analysis

Using the phase segmentation from 20c, we can describe typical patterns:

In [5]:
# Load phase data
phase_path = Path(PROJECT_ROOT) / "forge/analysis/results/tables/20c_phase_segmentation.csv"
phase_df = pd.read_csv(phase_path)

print("Phase patterns (from 20c):")
print("="*50)

for phase in ['chaotic', 'transition', 'deterministic']:
    sub = phase_df[phase_df['phase'] == phase]
    if len(sub) > 0:
        depth_range = f"{int(sub['depth'].min())}-{int(sub['depth'].max())}"
        mean_std = sub['v_std'].mean()
        print(f"\n{phase.upper()}:")
        print(f"  Depth range: {depth_range}")
        print(f"  Mean σ(V): {mean_std:.1f}")

Phase patterns (from 20c):

CHAOTIC:
  Depth range: 13-23
  Mean σ(V): 19.1

TRANSITION:
  Depth range: 5-12
  Mean σ(V): 12.5

DETERMINISTIC:
  Depth range: 1-25
  Mean σ(V): 5.1
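The min-max depth range printed above can hide gaps. The DETERMINISTIC range (1-25) spans both the CHAOTIC (13-23) and TRANSITION (5-12) ranges. Assuming `phase_df` holds one phase label per depth, that phase must therefore occur in more than one block. The minimal sketch below reports contiguous runs instead, using only the `depth` and `phase` columns the cell above already reads. The toy labels are made up to show the effect and are not the 20c results.

```python
import pandas as pd

def phase_runs(phase_df):
    """Return one row per contiguous block of depths sharing a phase label."""
    d = phase_df.sort_values("depth")[["depth", "phase"]].reset_index(drop=True)
    # A new run starts when the label changes or depths are not consecutive.
    new_run = (d["phase"] != d["phase"].shift()) | (d["depth"].diff() != 1)
    d["run"] = new_run.cumsum()
    return (
        d.groupby("run")
        .agg(phase=("phase", "first"), start=("depth", "min"), end=("depth", "max"))
        .reset_index(drop=True)
    )

# Toy labels (hypothetical): a per-phase min/max would report deterministic as 1-8.
toy = pd.DataFrame({
    "depth": range(1, 9),
    "phase": ["deterministic"] * 2 + ["transition"] * 2 + ["chaotic"] * 3 + ["deterministic"],
})
runs = phase_runs(toy)
print(runs)
```

On the real table, `phase_runs(phase_df)` would list each block with its start and end depth.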


In [6]:
print("\n" + "="*60)
print("KEY INSIGHTS: Motif Discovery")
print("="*60)

print("\n1. DATA LIMITATION:")
print("   Current oracle data is aggregated, not game-level")
print("   Cannot extract individual game trajectories")

print("\n2. ALTERNATIVE FINDINGS:")
print("   Phase patterns from 20c provide game structure:")
print("   - Chaotic phase: High variance, early/mid game")
print("   - Deterministic phase: Low variance, end game")

print("\n3. FUTURE WORK:")
print("   To enable motif discovery, would need:")
print("   - Game simulator with V tracking per ply")
print("   - Dataset of (game_id, ply, V) records")
print("   - Multiple game trajectories to compare")


KEY INSIGHTS: Motif Discovery

1. DATA LIMITATION:
   Current oracle data is aggregated, not game-level
   Cannot extract individual game trajectories

2. ALTERNATIVE FINDINGS:
   Phase patterns from 20c provide game structure:
   - Chaotic phase: High variance, early/mid game
   - Deterministic phase: Low variance, end game

3. FUTURE WORK:
   To enable motif discovery, would need:
   - Game simulator with V tracking per ply
   - Dataset of (game_id, ply, V) records
   - Multiple game trajectories to compare
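The `(game_id, ply, V)` dataset named under future work would have to be reshaped into the `games × time_steps` array that motif discovery expects. The sketch below is minimal and rests on three assumptions: hypothetical column names `game_id`, `depth`, `v`; records keyed by remaining depth (28 at the start, 0 at the end); and incomplete games being dropped rather than imputed. `DataFrame.pivot` raises a `ValueError` on duplicate `(game_id, depth)` pairs, which doubles as a sanity check on the records.

```python
import numpy as np
import pandas as pd

def records_to_trajectories(records, max_depth=28):
    """Pivot long-format (game_id, depth, v) records into an array of shape
    (n_games, max_depth + 1), columns ordered from depth max_depth down to 0.
    Games missing any depth are dropped. Column names are assumptions."""
    wide = records.pivot(index="game_id", columns="depth", values="v")
    wide = wide.reindex(columns=list(range(max_depth, -1, -1)))  # start of game first
    complete = wide.dropna()  # keep only fully observed games
    return complete.index.to_numpy(), complete.to_numpy(dtype=float)

# Toy records (made up): game 2 is missing depth 1 and gets dropped.
records = pd.DataFrame({
    "game_id": [0, 0, 0, 1, 1, 1, 2, 2],
    "depth":   [2, 1, 0, 2, 1, 0, 2, 0],
    "v":       [10.0, 12.0, 15.0, -3.0, -8.0, -20.0, 5.0, 6.0],
})
ids, X = records_to_trajectories(records, max_depth=2)
print(ids, X.shape)
```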


## Conclusion

**Motif discovery requires game-level time series data** that the current oracle aggregation doesn't provide.

The phase segmentation analysis (20c) provides the relevant strategic insights:
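- As depth decreases, games broadly follow a chaotic → transition → deterministic pattern (chaotic at depths 13-23, transition at 5-12; the deterministic label's reported range of 1-25 spans both, so it likely occurs in more than one block)
- σ(V) is highest in the chaotic phase (19.1 vs. 5.1 in the deterministic phase), which suggests early/mid-game decisions have the largest impact
- End-game outcomes are largely predetermined (low σ(V) in the deterministic phase)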

**Future work**: Generate game-level trajectory data from a game simulator to enable proper motif discovery.