# K₇ Canonical Spectral Estimator

## Objective: First-Principles k-Scaling (No Tuning)

**The Problem**: Previous work used an empirical k ∝ √N scaling. Its coefficient is a free parameter, so it can be "tuned" until the results match a desired value.

**The Solution**: Use Belkin-Niyogi (2008) canonical scaling:

```
For d-dimensional manifold:
  k_canonical = c × N^(6/(d+6))
  
For d=7 (K₇):
  k_canonical = c × N^(6/13) ≈ c × N^0.462
  
Convergence rate: O(N^(-2/(d+6))) = O(N^(-2/13)) ≈ O(N^(-0.154))
```

**Key Test**: If the theory is right, the N→∞ limit should be **independent of the coefficient c**.

---

## References
- Belkin & Niyogi, "Towards a theoretical foundation for Laplacian-based manifold methods" (2008)
- Calder & Garcia Trillos, "Improved spectral convergence rates for graph Laplacians on ε-graphs and k-NN graphs" (2019)

In [None]:
import numpy as np
import json
from datetime import datetime

# Check for GPU
try:
    import cupy as cp
    from cupyx.scipy.sparse import csr_matrix as cp_csr
    from cupyx.scipy.sparse.linalg import eigsh as cp_eigsh
    GPU_AVAILABLE = True
    print("✓ GPU available (CuPy)")
except ImportError:
    GPU_AVAILABLE = False
    print("○ CPU mode (NumPy/SciPy)")
    from scipy.sparse import csr_matrix
    from scipy.sparse.linalg import eigsh

print(f"Started: {datetime.now().strftime('%H:%M:%S')}")

## 1. Theoretical Framework

### Belkin-Niyogi Optimal Scaling

For a k-NN graph Laplacian on a d-dimensional manifold:

| Quantity | Formula | Value for d = 7 |
|----------|---------|-----------------|
| Optimal k exponent | 6/(d+6) | 6/13 ≈ 0.462 |
| Eigenvalue convergence rate | N^(-2/(d+6)) | N^(-2/13) ≈ N^(-0.154) |
| Optimal bandwidth (bias-variance balance) | ε ~ N^(-1/(d+6)) | N^(-1/13) ≈ N^(-0.077) |
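The three rows are linked by exponent bookkeeping. Assuming the standard heuristic that a point has on the order of N·ε^d neighbours inside an ε-ball (constants dropped), choosing ε ~ N^(-1/(d+6)) fixes k ~ N^(6/(d+6)), and the tabulated rate N^(-2/(d+6)) is exactly ε². The sketch below checks this with exact fractions; `exponents_from_bandwidth` is an illustrative helper, not part of the notebook.

```python
from fractions import Fraction


def exponents_from_bandwidth(d):
    """Derive the k and rate exponents of N from the bandwidth exponent.

    Assumes eps ~ N^(-1/(d+6)) and k ~ N * eps^d (expected neighbours in an
    eps-ball, constants dropped). Returns (eps, k, rate) exponents of N.
    """
    eps_exp = Fraction(-1, d + 6)    # eps ~ N^eps_exp
    k_exp = 1 + d * eps_exp          # k ~ N^1 * (N^eps_exp)^d
    rate_exp = 2 * eps_exp           # eps^2 ~ N^(2 * eps_exp)
    return eps_exp, k_exp, rate_exp


for d in (2, 3, 7):
    eps_exp, k_exp, rate_exp = exponents_from_bandwidth(d)
    print(f"d={d}: eps ~ N^({eps_exp}), k ~ N^({k_exp}), rate ~ N^({rate_exp})")
```

For d = 7 this reproduces 6/13 and -2/13, the constants used as `K_EXPONENT` and `CONVERGENCE_RATE` below.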

### Key Insight
The **coefficient c** in k = c × N^0.462 affects:
- Finite-N values (it shifts the curve)
- but not the N→∞ limit (if the theory holds)

We test c ∈ {1, 2, 4, 8} and check whether the extrapolated N→∞ limits agree.
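To make the logic of the test concrete, here is a toy model with made-up numbers (not results from this notebook): if each c produces finite-N means of the form ℓ∞ + b(c)·N^(-2/13) with a c-dependent slope b(c), a linear fit in x = N^(-2/13) recovers the same intercept ℓ∞ for every c. `true_limit` and `slopes` are arbitrary placeholders.

```python
import numpy as np

RATE = 2 / 13                        # convergence exponent for d = 7
N_values = np.array([3000, 5000, 8000, 12000])
x = N_values ** (-RATE)              # x -> 0 as N -> infinity

true_limit = 13.5                    # placeholder limit, not a measured value
slopes = {1.0: 8.0, 2.0: 4.0, 4.0: -2.0, 8.0: -6.0}  # placeholder b(c)

intercepts = {}
for c, b in slopes.items():
    means = true_limit + b * x       # noiseless finite-N curve for this c
    slope_fit, intercept = np.polyfit(x, means, 1)
    intercepts[c] = intercept

spread = max(intercepts.values()) - min(intercepts.values())
print(intercepts, spread)
```

If the real finite-N curves are not linear in N^(-2/13) over the sampled range, the intercepts can differ even when the true limit is c-independent, which is one of the failure modes listed in the conclusion.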

In [None]:
# GIFT Constants
b2, b3 = 21, 77
H_STAR = b2 + b3 + 1  # = 99
DIM_G2 = 14
DIM_K7 = 7

# Belkin-Niyogi theoretical scaling for d=7
K_EXPONENT = 6 / (DIM_K7 + 6)  # = 6/13 ≈ 0.462
CONVERGENCE_RATE = 2 / (DIM_K7 + 6)  # = 2/13 ≈ 0.154

# TCS metric parameters
DET_G = 65/32
RATIO = H_STAR / 84  # ≈ 1.179

print("K₇ Canonical Scaling:")
print(f"  k exponent (theoretical): {K_EXPONENT:.4f}")
print("  k exponent (empirical √N): 0.5000")
print(f"  Difference: {0.5 - K_EXPONENT:.4f}")
print()
print(f"Convergence rate: O(N^{-CONVERGENCE_RATE:.4f})")
print(f"H* = {H_STAR}, target range: [13, 14]")

In [None]:
def sample_S3_quaternion(n, rng):
    """Sample uniformly on S³ using quaternion normalization."""
    x = rng.standard_normal((n, 4))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x

def sample_S1(n, rng):
    """Sample uniformly on S¹."""
    return rng.uniform(0, 2*np.pi, n)

def sample_TCS_K7(N, rng, ratio=RATIO):
    """Sample S¹ × S³ × S³ uniformly (product model of the TCS K₇); ratio is passed through for the metric."""
    theta = sample_S1(N, rng)
    q1 = sample_S3_quaternion(N, rng)
    q2 = sample_S3_quaternion(N, rng)
    return theta, q1, q2, ratio

def geodesic_distance_S3(q1, q2):
    """Quaternion distance 2·arccos(|q₁·q₂|): the rotation angle on SO(3) ≅ S³/{±1}, not the great-circle distance arccos(q₁·q₂) on S³."""
    dot = np.abs(np.sum(q1 * q2, axis=1))
    dot = np.clip(dot, -1, 1)
    return 2 * np.arccos(dot)

def geodesic_distance_S1(t1, t2):
    """Geodesic distance on S¹: d = min(|Δθ|, 2π-|Δθ|)."""
    diff = np.abs(t1 - t2)
    return np.minimum(diff, 2*np.pi - diff)
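The quaternion distance above identifies q and -q, which is why `np.abs` appears: 2·arccos(|q₁·q₂|) is the rotation angle between the rotations the two unit quaternions represent, whereas the great-circle distance on the round S³ is arccos(q₁·q₂) and separates antipodal points. A small self-contained check (both function names are illustrative):

```python
import numpy as np


def rotation_angle_distance(q1, q2):
    """2·arccos(|q1·q2|): distance on SO(3) ≅ S³/{±1}, range [0, π]."""
    dot = np.abs(np.sum(q1 * q2, axis=-1))
    return 2 * np.arccos(np.clip(dot, 0.0, 1.0))


def great_circle_distance_S3(q1, q2):
    """arccos(q1·q2): geodesic distance on the unit S³, range [0, π]."""
    dot = np.sum(q1 * q2, axis=-1)
    return np.arccos(np.clip(dot, -1.0, 1.0))


q = np.array([1.0, 0.0, 0.0, 0.0])
p = np.array([np.cos(0.3), np.sin(0.3), 0.0, 0.0])  # 0.3 rad away on S³

print(rotation_angle_distance(q, -q), great_circle_distance_S3(q, -q))  # 0.0 and π
print(rotation_angle_distance(q, p), great_circle_distance_S3(q, p))    # ≈ 0.6 and ≈ 0.3
```

Both are valid metrics; which one models the S³ factors of K₇ is a modelling choice, and the factor of 2 rescales that factor relative to the S¹ term.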

In [None]:
def compute_distance_matrix_chunked(theta, q1, q2, ratio, chunk_size=2000):
    """Compute the TCS product-metric distance matrix block by block.

    d² = α·d²_S¹ + d²_S³₁ + r²·d²_S³₂, with α = det(g) / r³ and each S³
    factor measured with the quaternion distance 2·arccos(|q_i·q_j|).
    """
    N = len(theta)
    alpha = DET_G / (ratio ** 3)

    # float32 halves the memory of the dense N×N matrix
    D = np.zeros((N, N), dtype=np.float32)

    for i in range(0, N, chunk_size):
        i_end = min(i + chunk_size, N)
        for j in range(0, N, chunk_size):
            j_end = min(j + chunk_size, N)

            # S¹ distances, broadcast to an (i_end-i) × (j_end-j) block
            d_S1 = geodesic_distance_S1(theta[i:i_end, None], theta[None, j:j_end])

            # First S³ factor: compare q1 with q1 (all pairwise dot products at once)
            dot1 = np.abs(q1[i:i_end] @ q1[j:j_end].T)
            d_S3_1 = 2 * np.arccos(np.clip(dot1, 0.0, 1.0))

            # Second S³ factor: compare q2 with q2 (scaled by ratio² below)
            dot2 = np.abs(q2[i:i_end] @ q2[j:j_end].T)
            d_S3_2 = 2 * np.arccos(np.clip(dot2, 0.0, 1.0))

            # TCS metric: d² = α·d²_S¹ + d²_S³₁ + r²·d²_S³₂
            D[i:i_end, j:j_end] = np.sqrt(
                alpha * d_S1**2 + d_S3_1**2 + ratio**2 * d_S3_2**2
            )

    return D
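Because each factor distance is a metric, the weighted combination sqrt(α·d²_S¹ + d²_S³₁ + r²·d²_S³₂) is again a metric, so a cheap sanity check on any implementation is symmetry, a zero diagonal, and the triangle inequality on a small sample. The sketch below re-implements the product distance densely for 100 points; `product_distance` is a hypothetical name, and α and r are passed in explicitly (here set from the notebook's 65/32 and 99/84) rather than read from its globals.

```python
import numpy as np


def product_distance(theta, q1, q2, alpha, r):
    """Dense product-metric distances: sqrt(α·d_S1² + d_S3(q1)² + r²·d_S3(q2)²)."""
    diff = np.abs(theta[:, None] - theta[None, :])
    d_s1 = np.minimum(diff, 2 * np.pi - diff)
    d_a = 2 * np.arccos(np.clip(np.abs(q1 @ q1.T), 0.0, 1.0))
    d_b = 2 * np.arccos(np.clip(np.abs(q2 @ q2.T), 0.0, 1.0))
    return np.sqrt(alpha * d_s1**2 + d_a**2 + r**2 * d_b**2)


rng = np.random.default_rng(0)
n = 100
theta = rng.uniform(0, 2 * np.pi, n)
q1 = rng.standard_normal((n, 4))
q1 /= np.linalg.norm(q1, axis=1, keepdims=True)
q2 = rng.standard_normal((n, 4))
q2 /= np.linalg.norm(q2, axis=1, keepdims=True)

r = 99 / 84
D = product_distance(theta, q1, q2, alpha=(65 / 32) / r**3, r=r)

print("symmetric:", np.allclose(D, D.T))
print("max diagonal:", np.abs(np.diag(D)).max())
# viol[i, j, k] = D[i, k] - D[i, j] - D[j, k]; should never be meaningfully > 0
viol = D[:, None, :] - (D[:, :, None] + D[None, :, :])
print("worst triangle violation:", viol.max())
```

A nonzero diagonal or an asymmetric matrix is a quick sign that two different factors are being compared against each other.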

In [None]:
def compute_normalized_laplacian_knn(D, k):
    """Compute the symmetric normalized Laplacian of a Gaussian k-NN graph.

    L = I - D^(-1/2) W D^(-1/2)

    where W_ij = exp(-d²_ij / 2σ²) for the k nearest neighbors of each point,
    σ is the median over points of the median k-NN distance, and W is
    symmetrized as W + Wᵀ (mutual neighbors get double weight).
    """
    N = D.shape[0]
    k = min(k, N - 1)

    # k nearest neighbors of each point (self excluded) and per-point median distance
    neighbors = np.zeros((N, k), dtype=np.int32)
    knn_distances = np.zeros(N)
    for i in range(N):
        # argpartition with kth=k places the k+1 smallest distances in idx[:k+1]
        idx = np.argpartition(D[i], k)[:k + 1]
        idx = idx[idx != i][:k]  # exclude self
        neighbors[i] = idx
        knn_distances[i] = np.median(D[i, idx])

    sigma = np.median(knn_distances)

    # Directed k-NN edges; weights in float64 so the eigensolver is not limited
    # by float32 precision when separating λ₀ ≈ 0 from λ₁
    rows = np.repeat(np.arange(N), k)
    cols = neighbors.ravel()
    data = np.exp(-D[rows, cols].astype(np.float64)**2 / (2 * sigma**2))

    # Symmetrize: W + Wᵀ (duplicate COO entries are summed on conversion to CSR)
    rows_sym = np.concatenate([rows, cols])
    cols_sym = np.concatenate([cols, rows])
    data_sym = np.concatenate([data, data])

    if GPU_AVAILABLE:
        W = cp_csr((cp.asarray(data_sym), (cp.asarray(rows_sym), cp.asarray(cols_sym))), shape=(N, N))
        d = cp.asarray(W.sum(axis=1)).ravel()  # degrees
        d_inv_sqrt = 1.0 / cp.sqrt(d + 1e-10)
        D_inv_sqrt = cp_csr((d_inv_sqrt, (cp.arange(N), cp.arange(N))), shape=(N, N))
        I = cp_csr((cp.ones(N), (cp.arange(N), cp.arange(N))), shape=(N, N))
        # Normalized Laplacian: L = I - D^(-1/2) W D^(-1/2)
        L = I - D_inv_sqrt @ W @ D_inv_sqrt
    else:
        from scipy.sparse import csr_matrix as sp_csr, diags, eye
        W = sp_csr((data_sym, (rows_sym, cols_sym)), shape=(N, N))
        d = np.asarray(W.sum(axis=1)).ravel()  # degrees
        D_inv_sqrt = diags(1.0 / np.sqrt(d + 1e-10))
        # Normalized Laplacian: L = I - D^(-1/2) W D^(-1/2)
        L = eye(N, format='csr') - D_inv_sqrt @ W @ D_inv_sqrt

    return L, sigma
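Two properties of L = I - D^(-1/2) W D^(-1/2) make useful unit checks for the function above: its spectrum lies in [0, 2], and the vector D^(1/2)·1 (the square roots of the degrees) lies in its null space, which is where λ₀ = 0 comes from. A small dense sketch, with a random symmetric weight matrix standing in for the k-NN graph (an assumption for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.uniform(0.0, 1.0, (n, n))
W = np.triu(A, 1)
W = W + W.T                                   # symmetric, non-negative, zero diagonal

deg = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt   # symmetric normalized Laplacian

eigvals = np.linalg.eigvalsh(L)
null_vec = np.sqrt(deg)                       # D^(1/2) · 1

print("spectrum range:", eigvals.min(), eigvals.max())
print("|L · sqrt(deg)|:", np.linalg.norm(L @ null_vec))
```

The same two checks can be run on the sparse output of `compute_normalized_laplacian_knn` for a small N by converting it to a dense array.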

In [None]:
def compute_lambda1(L, n_eigs=6, zero_tol=1e-6):
    """Return the first non-zero eigenvalue λ₁ of the Laplacian L.

    Eigenvalues below zero_tol are treated as numerical zeros (λ₀ ≈ 0, plus
    one extra zero per additional connected component of the graph).
    """
    if GPU_AVAILABLE:
        eigenvalues = cp_eigsh(L, k=n_eigs, which='SA', return_eigenvectors=False)
        eigenvalues = cp.asnumpy(eigenvalues)
    else:
        eigenvalues = eigsh(L, k=n_eigs, which='SA', return_eigenvectors=False)

    eigenvalues = np.sort(eigenvalues)
    nonzero = eigenvalues[eigenvalues > zero_tol]
    # Fall back to the second-smallest eigenvalue if every computed value looks like zero
    lambda1 = nonzero[0] if nonzero.size > 0 else eigenvalues[1]
    return lambda1
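`compute_lambda1` assumes the k-NN graph is connected: each extra connected component adds another zero eigenvalue, so the value returned would belong to a different mode than the intended λ₁. A cheap guard, sketched here with SciPy's `scipy.sparse.csgraph.connected_components` on a toy graph of two disjoint triangles, is to count components before trusting λ₁:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Two disjoint triangles: vertices {0, 1, 2} and {3, 4, 5}
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
rows = [i for i, j in edges] + [j for i, j in edges]
cols = [j for i, j in edges] + [i for i, j in edges]
W = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(6, 6))

n_components, labels = connected_components(W, directed=False)

# Dense normalized Laplacian of the toy graph
deg = np.asarray(W.sum(axis=1)).ravel()
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(6) - D_inv_sqrt @ W.toarray() @ D_inv_sqrt
eigvals = np.sort(np.linalg.eigvalsh(L))

print("components:", n_components)
print("eigenvalues:", np.round(eigvals, 6))  # two zeros, one per component
```

In the main loop, the `W` built inside `compute_normalized_laplacian_knn` could be checked the same way; that function does not currently return `W`, so this would require a small change.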

## 2. Canonical Estimator: Test c-Independence

We test multiple coefficients c ∈ {1, 2, 4, 8} with the canonical exponent 6/13.

**Hypothesis**: All should converge to the same limit as N→∞.

In [None]:
def canonical_k(N, c=1.0):
    """Belkin-Niyogi canonical k for d=7."""
    return max(10, int(c * N ** K_EXPONENT))

# Test parameters
N_VALUES = [3000, 5000, 8000, 12000]  # Limited by memory
C_VALUES = [1.0, 2.0, 4.0, 8.0]
N_SEEDS = 3

print("Canonical k values (k = c × N^0.462):")
print("-" * 50)
print(f"{'N':>8} | " + " | ".join([f"{'c=' + str(c):>6}" for c in C_VALUES]))
print("-" * 50)
for N in N_VALUES:
    ks = [canonical_k(N, c) for c in C_VALUES]
    print(f"{N:>8} | " + " | ".join([f"{k:>6}" for k in ks]))

In [None]:
%%time
# Main computation: test c-independence
results = {c: [] for c in C_VALUES}

for N in N_VALUES:
    print(f"\n{'='*60}")
    print(f"N = {N}")
    print(f"{'='*60}")
    
    for seed in range(N_SEEDS):
        rng = np.random.default_rng(42 + seed)
        theta, q1, q2, ratio = sample_TCS_K7(N, rng)
        
        print(f"\n  Seed {seed}: Computing distance matrix...", end=" ")
        D = compute_distance_matrix_chunked(theta, q1, q2, ratio)
        print("done")
        
        for c in C_VALUES:
            k = canonical_k(N, c)
            L, sigma = compute_normalized_laplacian_knn(D, k)
            lambda1 = compute_lambda1(L)
            product = float(lambda1 * H_STAR)
            
            results[c].append({
                'N': N,
                'k': k,
                'c': c,
                'seed': seed,
                'lambda1': float(lambda1),
                'product': product,
                'sigma': float(sigma)
            })
            
            print(f"    c={c}: k={k:3d}, λ₁×H* = {product:.3f}")
        
        # Clear memory
        del D
        if GPU_AVAILABLE:
            cp.get_default_memory_pool().free_all_blocks()

print(f"\n\nCompleted: {datetime.now().strftime('%H:%M:%S')}")

## 3. Analysis: Convergence and c-Independence

In [None]:
import matplotlib.pyplot as plt

# Compute means and stds for each (N, c)
summary = {}
for c in C_VALUES:
    summary[c] = {}
    for N in N_VALUES:
        products = [r['product'] for r in results[c] if r['N'] == N]
        summary[c][N] = {'mean': np.mean(products), 'std': np.std(products)}

# Plot convergence curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: λ₁×H* vs N for each c
ax = axes[0]
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(C_VALUES)))
for i, c in enumerate(C_VALUES):
    means = [summary[c][N]['mean'] for N in N_VALUES]
    stds = [summary[c][N]['std'] for N in N_VALUES]
    ax.errorbar(N_VALUES, means, yerr=stds, marker='o', label=f'c={c}', 
                color=colors[i], capsize=3, linewidth=2, markersize=8)

ax.axhline(y=14, color='red', linestyle='--', alpha=0.7, label='Pell (14)')
ax.axhline(y=13, color='blue', linestyle='--', alpha=0.7, label='Spinor (13)')
ax.set_xlabel('N (sample size)', fontsize=12)
ax.set_ylabel('λ₁ × H*', fontsize=12)
ax.set_title('Convergence: Canonical k = c × N^0.462', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

# Right: Richardson extrapolation with theoretical rate
ax = axes[1]

# Use N^(-2/13) as the convergence variable
x_theory = [N ** (-CONVERGENCE_RATE) for N in N_VALUES]

for i, c in enumerate(C_VALUES):
    means = [summary[c][N]['mean'] for N in N_VALUES]
    ax.plot(x_theory, means, 'o-', label=f'c={c}', color=colors[i], 
            linewidth=2, markersize=8)
    
    # Linear fit for extrapolation
    coeffs = np.polyfit(x_theory, means, 1)
    x_extrap = np.linspace(0, max(x_theory), 100)
    y_extrap = np.polyval(coeffs, x_extrap)
    ax.plot(x_extrap, y_extrap, '--', color=colors[i], alpha=0.5)
    
    # Print extrapolated limit
    limit = coeffs[1]  # y-intercept = N→∞ limit
    print(f"c={c}: Extrapolated limit = {limit:.3f}")

ax.axhline(y=14, color='red', linestyle='--', alpha=0.7)
ax.axhline(y=13, color='blue', linestyle='--', alpha=0.7)
ax.axvline(x=0, color='black', linestyle='-', alpha=0.3)
ax.set_xlabel(f'N^(-{CONVERGENCE_RATE:.3f}) → 0 as N→∞', fontsize=12)
ax.set_ylabel('λ₁ × H*', fontsize=12)
ax.set_title('Richardson Extrapolation (Theoretical Rate)', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('canonical_estimator_convergence.png', dpi=150)
plt.show()
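Sampling N from 3000 to 12000 covers only a narrow band of x = N^(-2/13), so the intercept reported as the N→∞ limit is a long extrapolation: the distance from the largest N to x = 0 is roughly four times the width of the sampled window. The sketch below (re-declaring the notebook's N values and rate so it runs on its own) quantifies that ratio and the N needed to halve x relative to N = 12000.

```python
import numpy as np

RATE = 2 / 13
N_values = np.array([3000, 5000, 8000, 12000])
x = N_values ** (-RATE)

window = x.max() - x.min()       # width of the sampled x-range
reach = x.min()                  # distance from the largest N to x = 0 (N -> infinity)
print(f"x range: [{x.min():.3f}, {x.max():.3f}], width {window:.3f}")
print(f"extrapolation distance / window width: {reach / window:.1f}")

# x ∝ N^(-2/13), so halving x multiplies N by 2^(13/2)
print(f"N to halve x from N=12000: {12000 * 2 ** (13 / 2):.0f}")
```

Because of this slow rate, small curvature in the finite-N data can move the intercept substantially, which is worth keeping in mind when reading the extrapolated limits.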

## 4. Critical Test: Do Different c Values Converge to Same Limit?

In [None]:
# Compute extrapolated limits for each c
x_theory = [N ** (-CONVERGENCE_RATE) for N in N_VALUES]
limits = {}

print("\n" + "="*60)
print("CRITICAL TEST: c-Independence of N→∞ Limit")
print("="*60)
print(f"\nTheoretical convergence rate: O(N^{-CONVERGENCE_RATE:.4f})")
print("\nExtrapolated limits (linear fit in N^(-0.154)):")
print("-" * 40)

for c in C_VALUES:
    means = [summary[c][N]['mean'] for N in N_VALUES]
    coeffs = np.polyfit(x_theory, means, 1)
    limits[c] = coeffs[1]
    slope = coeffs[0]
    
    # Compute R²
    y_pred = np.polyval(coeffs, x_theory)
    ss_res = np.sum((np.array(means) - y_pred)**2)
    ss_tot = np.sum((np.array(means) - np.mean(means))**2)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0
    
    print(f"  c = {c}: limit = {limits[c]:.3f}, slope = {slope:.2f}, R² = {r2:.3f}")

# Check spread of limits
limit_values = list(limits.values())
mean_limit = np.mean(limit_values)
std_limit = np.std(limit_values)
spread = max(limit_values) - min(limit_values)

print("\n" + "-" * 40)
print(f"Mean limit:  {mean_limit:.3f}")
print(f"Std:         {std_limit:.3f}")
print(f"Spread:      {spread:.3f}")
print(f"Relative:    {100*spread/mean_limit:.1f}%")

# Verdict
print("\n" + "="*60)
if spread < 1.0:  # Less than 1 unit spread
    print("✓ PASS: Limits are c-INDEPENDENT (spread < 1)")
    print(f"  Canonical limit: λ₁×H* = {mean_limit:.2f} ± {std_limit:.2f}")
else:
    print("✗ FAIL: Limits depend on c (spread ≥ 1)")
    print("  The estimator is NOT canonical.")
print("="*60)
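The PASS/FAIL rule compares point estimates of the intercepts against a fixed threshold of 1, without any uncertainty on each extrapolated limit. One way to attach an error bar, sketched here as an assumption rather than part of the original analysis, is the ordinary-least-squares standard error of the intercept. With four N values there are only two residual degrees of freedom, so the estimate is itself rough. `intercept_with_se` is a hypothetical helper, and the example data are placeholders.

```python
import numpy as np


def intercept_with_se(x, y):
    """OLS fit y = a + b·x; return (a, standard error of a).

    SE(a) = s · sqrt(1/n + x̄² / Sxx), with s² = RSS / (n - 2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b = np.sum((x - x_bar) * (y - y.mean())) / sxx
    a = y.mean() - b * x_bar
    rss = np.sum((y - (a + b * x)) ** 2)
    s = np.sqrt(rss / (n - 2))
    return a, s * np.sqrt(1.0 / n + x_bar**2 / sxx)


# Example with placeholder numbers (not results from this notebook)
x = np.array([3000, 5000, 8000, 12000]) ** (-2 / 13)
y = 13.5 + 5.0 * x + np.array([0.02, -0.03, 0.01, 0.0])
a, se = intercept_with_se(x, y)
print(f"limit = {a:.3f} ± {se:.3f}")
```

With the notebook's data, `intercept_with_se(x_theory, means)` could stand in for `np.polyfit` inside the loop, and the spread of the limits could then be compared with their combined standard errors instead of a fixed threshold.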

## 5. Comparison: Theoretical vs Empirical Scaling

In [None]:
# Compare k-scalings
print("\nk-Scaling Comparison:")
print("=" * 60)
print(f"{'N':>8} | {'Theoretical':>12} | {'Empirical √N':>12} | {'Pell':>12}")
print(f"{'':>8} | {'c×N^0.462':>12} | {'0.74×N^0.5':>12} | {'0.366×N^0.5':>12}")
print("-" * 60)

for N in [5000, 10000, 20000, 50000]:
    k_theory = 2.0 * N ** K_EXPONENT  # c=2, a representative value from C_VALUES
    k_empirical = 0.74 * np.sqrt(N)
    k_pell = 0.366 * np.sqrt(N)
    print(f"{N:>8} | {k_theory:>12.1f} | {k_empirical:>12.1f} | {k_pell:>12.1f}")

print("\nNote: Theoretical k grows SLOWER than √N")
print(f"      N^0.462 / √N = N^(-0.038) → 0 as N→∞")
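Growing more slowly does not make the theoretical k smaller in practice: with c = 2 it exceeds the empirical 0.74·√N at every N in the table, and because the ratio decays only like N^(-1/26), the two rules cross only at an extremely large N. Setting 2·N^(6/13) = 0.74·N^(1/2) and solving gives N = (2 / 0.74)^26:

```python
c_theory, c_emp = 2.0, 0.74
k_exp = 6 / 13

# c_theory · N^k_exp = c_emp · N^(1/2)  ⇒  N^(1/2 - k_exp) = c_theory / c_emp
N_cross = (c_theory / c_emp) ** (1 / (0.5 - k_exp))
print(f"Crossover N ≈ {N_cross:.3g}")

# Both k formulas agree at the crossover
k_at_cross_theory = c_theory * N_cross ** k_exp
k_at_cross_emp = c_emp * N_cross ** 0.5
print(k_at_cross_theory, k_at_cross_emp)
```

So over any feasible sample size, the canonical k with c = 2 uses more neighbours than the √N rule, even though its exponent is smaller.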

## 6. Save Results

In [None]:
# Compile final results
final_results = {
    'metadata': {
        'date': datetime.now().isoformat(),
        'method': 'Belkin-Niyogi canonical k-scaling',
        'k_exponent': float(K_EXPONENT),
        'convergence_rate': float(CONVERGENCE_RATE),
        'H_star': int(H_STAR),
        'N_values': N_VALUES,
        'c_values': C_VALUES,
        'n_seeds': N_SEEDS
    },
    'raw_results': results,
    'summary': {
        c: {
            str(N): {'mean': float(summary[c][N]['mean']), 
                     'std': float(summary[c][N]['std'])}
            for N in N_VALUES
        }
        for c in C_VALUES
    },
    'extrapolated_limits': {str(c): float(limits[c]) for c in C_VALUES},
    'conclusion': {
        'mean_limit': float(mean_limit),
        'std_limit': float(std_limit),
        'spread': float(spread),
        'c_independent': bool(spread < 1.0)
    }
}

with open('canonical_estimator_results.json', 'w') as f:
    json.dump(final_results, f, indent=2)

print("Results saved to canonical_estimator_results.json")
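`json` writes the float c keys of `results` and `summary` as strings such as `"1.0"`, so anything that reloads `canonical_estimator_results.json` has to convert them back. A minimal sketch of the round trip, with placeholder values; `load_extrapolated_limits` is an illustrative helper that assumes the file layout written above.

```python
import json


def load_extrapolated_limits(path="canonical_estimator_results.json"):
    """Return {c: limit} from the saved results, with c restored to float keys."""
    with open(path) as f:
        data = json.load(f)
    return {float(c): v for c, v in data["extrapolated_limits"].items()}


# Float keys are serialized as strings such as "1.0"
payload = json.dumps({1.0: 13.2, 2.0: 13.4})
restored = json.loads(payload)
print(list(restored))  # ['1.0', '2.0']

# Convert back to float keys
limits_by_c = {float(c): v for c, v in restored.items()}
```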

## 7. Conclusion

### Key Findings

1. **Theoretical k-scaling**: k = c × N^0.462 (Belkin-Niyogi for d=7)
2. **Convergence rate**: O(N^-0.154) (theoretical)
3. **c-independence test**: [RESULT ABOVE]

### Interpretation

If limits are **c-independent**:
- The canonical estimator works
- The limit is a candidate geometric invariant (it does not depend on the graph-construction constant c)
- No tuning required

If limits **depend on c**:
- The graph-Laplacian approximation carries a c-dependent systematic bias
- More sophisticated estimators are needed (e.g., heat-kernel methods)
- Or the sampled N range is too small to reach the asymptotic regime