# Texas 42 Oracle Data Analysis - Quickstart

This notebook verifies the analysis toolkit setup and demonstrates basic data loading.

## Goals
1. Verify utils module imports work
2. Load sample shard data
3. Explore basic V and Q-value distributions
4. Test feature extraction

In [None]:
# === CONFIGURATION ===
DATA_DIR = "/mnt/d/shards-standard/"
PROJECT_ROOT = "/home/jason/v2/mk5-tailwind"

# === Setup imports ===
import sys
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from forge.analysis.utils import loading, features, compression, viz
from forge.oracle import schema, tables

viz.setup_notebook_style()
print("✓ Ready")

## 1. Data Discovery

Let's see what shard data is available.

In [None]:
# Find all shard files
shard_files = loading.find_shard_files(DATA_DIR)
print(f"Total shard files: {len(shard_files)}")

# Count by split
counts = loading.count_shards(DATA_DIR)
print("\nBy split:")
for split, count in counts.items():
    print(f"  {split}: {count}")

In [None]:
# Show first few files
print("First 5 shard files:")
for f in shard_files[:5]:
    print(f"  {f.name}")

## 2. Load a Single Shard

Load one shard to understand the data structure.

In [None]:
# Load first available shard
df, seed, decl_id = schema.load_file(shard_files[0])

print(f"Seed: {seed}")
print(f"Declaration: {decl_id} ({schema.DECL_NAMES[decl_id]})")
print(f"States: {len(df):,}")
print(f"\nColumns: {list(df.columns)}")
print("\nData types:")
print(df.dtypes)

In [None]:
# Basic stats
print(f"V range: [{df['V'].min()}, {df['V'].max()}]")
print(f"V mean: {df['V'].mean():.2f}")
print(f"V std: {df['V'].std():.2f}")
print(f"V unique values: {df['V'].nunique()}")

## 3. V Distribution

Visualize the distribution of minimax values.

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))
viz.plot_v_distribution(df['V'].values, ax=ax, title=f"V Distribution (seed={seed}, decl={schema.DECL_NAMES[decl_id]})")
plt.show()

## 4. Feature Extraction

Extract analytical features from packed states.

In [None]:
states = df['state'].values
V = df['V'].values

# Extract basic features
depths = features.depth(states)
teams = features.team(states)
players = features.player(states)

print(f"Depth range: [{depths.min()}, {depths.max()}]")
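# Assumes features.team returns team IDs (0 or 1), so team-0 turns are the zeros
team0 = teams == 0
print(f"Team 0 turns: {team0.sum():,} ({100*team0.mean():.1f}%)")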
print(f"Player distribution: {np.bincount(players)}")

In [None]:
# V by depth
fig, ax = plt.subplots(figsize=(14, 6))
viz.plot_v_by_depth(V, depths, ax=ax, title="V Distribution by Depth")
plt.show()

In [None]:
# State count by depth
depth_counts = pd.Series(depths).value_counts().sort_index()
print("States per depth:")
print(depth_counts)

## 5. Q-Value Structure

Analyze the Q-values: one move evaluation per slot (`q0` to `q6`) for each state.
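
For intuition, here is a minimal sketch of the kind of per-state summary a Q-statistics helper might produce. It is not the implementation of `features.q_stats`: the function `summarize_q` and the `ILLEGAL` sentinel value are hypothetical, and the real shard encoding of unavailable moves may differ.

```python
import numpy as np

# Hypothetical sentinel for an unavailable move slot (an assumption,
# not taken from the shard schema).
ILLEGAL = -128


def summarize_q(q: np.ndarray, illegal: int = ILLEGAL) -> dict:
    """Summarize an (n_states, 7) array of Q-values row by row."""
    legal = q != illegal
    # Mask illegal slots with NaN so they are ignored by nanmax/nanmin.
    masked = np.where(legal, q.astype(np.float64), np.nan)
    best = np.nanmax(masked, axis=1)
    worst = np.nanmin(masked, axis=1)
    return {
        "n_legal": legal.sum(axis=1),  # number of available moves
        "best": best,                  # value of the best move
        "spread": best - worst,        # how much the choice matters
    }


# Two toy states: one with three legal moves, one with a forced move.
q_demo = np.array(
    [
        [4, -2, 10, ILLEGAL, ILLEGAL, ILLEGAL, ILLEGAL],
        [0, ILLEGAL, ILLEGAL, ILLEGAL, ILLEGAL, ILLEGAL, ILLEGAL],
    ],
    dtype=np.int8,
)
q_summary = summarize_q(q_demo)
```

A spread of zero marks a state where the move choice does not change the outcome, which is one reason to look at the spread alongside the best value.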

In [None]:
# Extract Q-values
q_cols = ['q0', 'q1', 'q2', 'q3', 'q4', 'q5', 'q6']
q_values = df[q_cols].values

# Compute Q-statistics
q_stats = features.q_stats(q_values)
print("Q-value statistics:")
print(q_stats.describe())

In [None]:
# Plot Q-structure
viz.plot_q_structure(q_stats, title=f"Q-Value Structure (seed={seed})")
plt.show()

## 6. Count Domino Analysis

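The five "count" dominoes (5-0, 4-1 and 3-2 at 5 points each; 6-4 and 5-5 at 10 points each) carry 35 of the 42 points in a hand, so who holds them is a critical strategic element.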
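
As a sanity check, the count dominoes can be derived from the rule alone: a domino scores its pip total when that total is 5 or 10. The sketch below uses plain `(high, low)` pip pairs and a hypothetical helper `count_points`; it makes no assumption about the toolkit's domino ID numbering.

```python
# All 28 dominoes of a double-six set as (high, low) pip pairs.
DOMINOES = [(hi, lo) for hi in range(7) for lo in range(hi + 1)]


def count_points(hi: int, lo: int) -> int:
    """Points a domino is worth: its pip total if that is 5 or 10, else 0."""
    total = hi + lo
    return total if total in (5, 10) else 0


# Keep only the dominoes that score.
count_dominoes = {d: count_points(*d) for d in DOMINOES if count_points(*d)}
count_total = sum(count_dominoes.values())

# One point per trick (7 tricks) plus the count points gives the hand total.
hand_total = 7 + count_total
```

The output of the cells below should list the same five pip pairs, whatever IDs the toolkit assigns to them.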

In [None]:
# Show count dominoes
print("Count dominoes:")
for d in features.COUNT_DOMINO_IDS:
    pips = schema.domino_pips(d)
    points = tables.DOMINO_COUNT_POINTS[d]
    print(f"  ID {d}: {pips[0]}-{pips[1]} = {points} points")

In [None]:
# Track count locations in this seed
hands = schema.deal_from_seed(seed)
print(f"\nDeal for seed {seed}:")
for p, hand in enumerate(hands):
    hand_str = ", ".join(f"{schema.domino_pips(d)}" for d in hand)
    print(f"  P{p}: {hand_str}")

# Who holds each count at start?
print("\nCount domino locations at deal:")
for d in features.COUNT_DOMINO_IDS:
    pips = schema.domino_pips(d)
    for p, hand in enumerate(hands):
        if d in hand:
            team = "Team 0" if p % 2 == 0 else "Team 1"
            print(f"  {pips[0]}-{pips[1]} ({tables.DOMINO_COUNT_POINTS[d]}): Player {p} ({team})")
            break

In [None]:
# Count points remaining vs V
counts_rem = features.counts_remaining(states, seed)

print(f"Count points remaining: [{counts_rem.min()}, {counts_rem.max()}]")
print(f"\nCorrelation with V: {np.corrcoef(counts_rem, V)[0,1]:.4f}")

## 7. Information Theory Preview

Compute the entropy of V, and how much of it depth explains, as a quick preview of the structure in the data.
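
The quantities used here are the Shannon entropy H(V), the conditional entropy H(V|depth), and their difference, the mutual information I(V; depth). A minimal standard-library sketch of these definitions follows; the function names are hypothetical and this is not necessarily how the `compression` module computes them.

```python
from collections import Counter
from math import log2


def shannon_entropy_bits(values) -> float:
    """H(X) in bits for the empirical distribution of `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy_bits(values, given) -> float:
    """H(X | Y) = sum over y of p(y) * H(X | Y = y), in bits."""
    groups = {}
    for x, y in zip(values, given):
        groups.setdefault(y, []).append(x)
    n = len(values)
    return sum(len(xs) / n * shannon_entropy_bits(xs) for xs in groups.values())


# Toy data in which depth fully determines V.
v_demo = [0, 0, 1, 1]
depth_demo = [0, 0, 1, 1]

h_demo = shannon_entropy_bits(v_demo)                      # two equally likely values
h_given = conditional_entropy_bits(v_demo, depth_demo)     # nothing left to explain
mi_demo = h_demo - h_given                                 # I(V; depth)
```

In this toy case depth removes all uncertainty about V, so the mutual information equals the full entropy. On real shards the reduction is partial, which is what the percentage printed below reports.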

In [None]:
# Entropy of V
h_v = compression.entropy_bits(V)
print(f"H(V) = {h_v:.3f} bits")
print(f"Max possible (85 values): {np.log2(85):.3f} bits")
print(f"Efficiency: {h_v / np.log2(85) * 100:.1f}%")

In [None]:
# Conditional entropy given depth
h_v_depth = compression.conditional_entropy(V, depths)
mi_depth = h_v - h_v_depth

print(f"H(V|depth) = {h_v_depth:.3f} bits")
print(f"I(V; depth) = {mi_depth:.3f} bits")
print(f"Reduction from depth: {100 * mi_depth / h_v:.1f}%")

## 8. Compression Preview

Test how well the data compresses with LZMA under different orderings.
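
To see why ordering matters, the sketch below compresses the same synthetic bytes twice with Python's `lzma` module, once shuffled and once sorted. The data is random stand-in values, not shard data, and `lzma_ratio` is a hypothetical helper; the orderings that `compression.compression_analysis` actually compares may be different.

```python
import lzma
import random


def lzma_ratio(data: bytes) -> float:
    """Compressed size divided by raw size (lower = more compressible)."""
    return len(lzma.compress(data)) / len(data)


rng = random.Random(0)

# Synthetic stand-in for a V column: integers in [-42, 42], one byte each.
values = [rng.randint(-42, 42) for _ in range(20_000)]

# Same multiset of bytes, two different orderings.
shuffled = bytes(v & 0xFF for v in values)
ordered = bytes(v & 0xFF for v in sorted(values))

ratio_shuffled = lzma_ratio(shuffled)
ratio_ordered = lzma_ratio(ordered)
```

Both byte strings have the same per-symbol entropy, yet the sorted one compresses far better because LZMA exploits long runs and repeats. A lower ratio under some ordering of the states therefore suggests that the ordering exposes structure in V.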

In [None]:
# Compare compression under different orderings
comp_results = compression.compression_analysis(states, V.astype(np.int8))

print("LZMA compression ratios (lower = more structure):")
for ordering, ratio in comp_results.items():
    print(f"  {ordering}: {ratio:.4f}")

In [None]:
# Visualize
fig, ax = plt.subplots(figsize=(8, 5))
viz.plot_compression_comparison(comp_results, ax=ax)
plt.show()

## Summary

This notebook verified:
- Analysis utils load correctly
- Shard data is accessible
- Feature extraction works
- Basic entropy and compression metrics run end to end

**Next notebooks:**
- `01a_distribution_profiles.ipynb` - Deep dive into V/Q distributions
- `02a_entropy_decomposition.ipynb` - Full information theory analysis
- `03b_basin_analysis.ipynb` - Count domino basin partitioning