# H-AKORN: Hyperbolic Attention with Kuramoto Oscillator Regularized Networks

**Author:** Éric Gustavo Reis de Sena  
**Date:** January 2026

This notebook demonstrates the H-AKORN transformer, which combines:
1. **Hyperbolic geometry** for hierarchical representation
2. **Kuramoto oscillator dynamics** for phase synchronization
3. **Adaptive coupling** between attention heads

## Architecture Overview

```
Input → Embeddings → H-AKORN Layers → LM Head → Output
                           ↓
                    [Hyperbolic Attention]
                           ↓
                  [Kuramoto Phase Dynamics]
                           ↓
                   [Adaptive Coupling]
```
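
For intuition about the hyperbolic stage of the pipeline, the sketch below computes geodesic distance in the Poincaré ball of curvature $-c$. This notebook does not state whether `HyperbolicKuramotoAttention` works in the Poincaré ball or in the Lorentz model, so the function name `poincare_distance` and the choice of model are illustrative assumptions, not the library's API.

```python
import numpy as np

def poincare_distance(x, y, c=1.0, eps=1e-9):
    """Geodesic distance in the Poincare ball of curvature -c (c > 0).

    Hypothetical helper for illustration; not part of the hakorn package.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    # Conformal factors shrink toward zero near the boundary of the ball
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, eps)
    return np.arccosh(arg) / np.sqrt(c)

origin = np.zeros(2)
near = np.array([0.1, 0.0])   # close to the centre of the ball
far = np.array([0.9, 0.0])    # close to the boundary

d_near = poincare_distance(origin, near)
d_far = poincare_distance(origin, far)
print(f"distance to near point: {d_near:.4f}")
print(f"distance to far point:  {d_far:.4f}")
```

Distances grow rapidly toward the boundary of the ball, which is the property that makes hyperbolic space a natural fit for tree-like hierarchies.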

In [None]:
# Installation (if needed)
# !pip install torch numpy matplotlib tqdm

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

# Import H-AKORN modules
from hakorn import (
    HAKORNTransformer,
    HAKORNLoss,
    KuramotoPhaseEvolution,
    AdaptiveCoupling,
    HyperbolicKuramotoAttention,
)

print("H-AKORN modules loaded successfully!")
print(f"Using device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

## 1. Understanding Kuramoto Dynamics

The Kuramoto model describes how coupled oscillators synchronize:

$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_j A_{ij} \sin(\theta_j - \theta_i)$$

Where:
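- $\theta_i$: phase of oscillator $i$
- $\omega_i$: natural frequency of oscillator $i$
- $K$: global coupling strength
- $A_{ij}$: coupling matrix entry weighting the influence of oscillator $j$ on oscillator $i$
- $N$: number of oscillators (here, one per attention head)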

The **order parameter** $r$ measures synchronization:

$$r = \left| \frac{1}{N} \sum_j e^{i\theta_j} \right|$$

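Here $r \in [0, 1]$: $r = 1$ means all phases coincide (perfect synchronization), while $r = 0$ means the phases are spread out so that they cancel (no net synchronization).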
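
The two formulas above can be checked in a few lines of NumPy, independently of the `hakorn` package. The sketch integrates the Kuramoto equation with forward Euler and tracks the order parameter; the helper names `kuramoto_step` and `order_parameter`, the all-to-all coupling matrix, and the chosen constants are assumptions made for this illustration and are not taken from `KuramotoPhaseEvolution`.

```python
import numpy as np

def order_parameter(theta):
    """Kuramoto order parameter r = |mean(exp(i * theta))|."""
    return np.abs(np.mean(np.exp(1j * theta)))

def kuramoto_step(theta, omega, A, K, dt):
    """One forward-Euler step of the Kuramoto equation."""
    n = theta.shape[0]
    diff = theta[None, :] - theta[:, None]   # diff[i, j] = theta_j - theta_i
    dtheta = omega + (K / n) * np.sum(A * np.sin(diff), axis=1)
    return np.mod(theta + dt * dtheta, 2 * np.pi)

rng = np.random.default_rng(0)
n = 8
theta = rng.uniform(0, 2 * np.pi, size=n)   # random initial phases
omega = rng.normal(0.0, 0.1, size=n)        # slightly different natural frequencies
A = np.ones((n, n))                         # all-to-all coupling (assumption)

r_start = order_parameter(theta)
for _ in range(2000):
    theta = kuramoto_step(theta, omega, A, K=2.0, dt=0.05)
r_end = order_parameter(theta)

print(f"r before: {r_start:.3f}, r after: {r_end:.3f}")
```

With coupling this strong relative to the frequency spread, the oscillators lock and $r$ approaches 1; lowering `K` toward zero leaves the phases drifting independently.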

In [None]:
# Demonstrate Kuramoto dynamics
def visualize_kuramoto_evolution(num_heads=8, num_steps=100, coupling_strength=1.0):
    """
    Visualize phase evolution of Kuramoto oscillators.
    """
    # Create phase evolution module
    phase_evolution = KuramotoPhaseEvolution(
        d_model=768,
        num_heads=num_heads,
        coupling_strength=coupling_strength,
        dt=0.05,
    )
    
    # Create uniform coupling matrix
    coupling_matrix = torch.ones(num_heads, num_heads) / num_heads
    
    # Evolve phases
    phases_history = []
    order_params = []
    
    with torch.no_grad():  # visualization only; no gradients needed
        for _ in range(num_steps):
            phases, order_param = phase_evolution(coupling_matrix, batch_size=1)
            phases_history.append(phases[0].cpu().numpy())
            order_params.append(order_param[0].item())
    
    phases_history = np.array(phases_history)
    order_params = np.array(order_params)
    
    # Plot
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Phase evolution
    for i in range(num_heads):
        axes[0].plot(phases_history[:, i], label=f'Head {i+1}')
    axes[0].set_xlabel('Time Step')
    axes[0].set_ylabel('Phase (radians)')
    axes[0].set_title('Phase Evolution')
    axes[0].legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    axes[0].grid(True, alpha=0.3)
    
    # Order parameter
    axes[1].plot(order_params, linewidth=2, color='red')
    axes[1].set_xlabel('Time Step')
    axes[1].set_ylabel('Order Parameter r')
    axes[1].set_title(f'Synchronization (K={coupling_strength})')
    axes[1].set_ylim([0, 1.1])
    axes[1].grid(True, alpha=0.3)
    axes[1].axhline(y=1.0, color='green', linestyle='--', label='Perfect Sync')
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()
    
    print(f"Final order parameter: {order_params[-1]:.4f}")
    print(f"Average order parameter: {order_params.mean():.4f}")

# Visualize
visualize_kuramoto_evolution(num_heads=8, num_steps=200, coupling_strength=1.5)

## 2. Create H-AKORN Model

Initialize a small H-AKORN transformer for demonstration.
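
A few quantities can be derived from the configuration by hand before building the model. The sketch below assumes a conventional transformer layout (per-head width `d_model / num_heads`, one token-embedding table, one learned position table). `HAKORNTransformer` may organize its parameters differently, so treat these numbers as rough expectations rather than what `get_num_params` will report.

```python
# Stand-alone copy of the relevant settings (hypothetical variable name `cfg`)
cfg = {
    'vocab_size': 50257,
    'd_model': 512,
    'num_heads': 8,
    'max_position_embeddings': 256,
    'curvature': -1.0,
}

# Multi-head attention conventionally splits d_model evenly across heads
assert cfg['d_model'] % cfg['num_heads'] == 0, "d_model must be divisible by num_heads"
head_dim = cfg['d_model'] // cfg['num_heads']

# Hyperbolic space has negative curvature
assert cfg['curvature'] < 0, "curvature should be negative for hyperbolic geometry"

# Embedding tables under the conventional layout (assumption)
token_embedding_params = cfg['vocab_size'] * cfg['d_model']
position_embedding_params = cfg['max_position_embeddings'] * cfg['d_model']

print(f"Per-head width: {head_dim}")
print(f"Token embedding parameters: {token_embedding_params:,}")
print(f"Position embedding parameters: {position_embedding_params:,}")
```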

In [None]:
# Model configuration
config = {
    'vocab_size': 50257,  # GPT-2 vocab size
    'd_model': 512,
    'num_layers': 6,
    'num_heads': 8,
    'd_ff': 2048,
    'max_position_embeddings': 256,
    'dropout': 0.1,
    'curvature': -1.0,
    'coupling_strength': 1.0,
    'use_phase_modulation': True,
}

# Create model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = HAKORNTransformer(**config).to(device)

print(f"Model created with {model.get_num_params():,} parameters")
print(f"Non-embedding parameters: {model.get_num_params(non_embedding=True):,}")

## 3. Test Forward Pass

Perform a forward pass and inspect outputs, including phase dynamics.
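
The cell below passes `labels = input_ids.clone()`, which suggests the GPT-2 convention where the model shifts the labels internally so that position $t$ predicts token $t+1$. The notebook does not confirm this, so the following NumPy sketch of such a shifted cross-entropy is an assumption about the loss, and `causal_lm_loss` is a hypothetical helper. It also gives a useful reference value: a model that predicts uniformly over the vocabulary scores $\ln(50257) \approx 10.82$.

```python
import numpy as np

def causal_lm_loss(logits, labels):
    """Mean next-token cross-entropy: position t is scored against labels[t + 1]."""
    shift_logits = logits[:, :-1, :]   # drop the last position (nothing to predict)
    shift_labels = labels[:, 1:]       # drop the first token (nothing predicts it)
    # Numerically stable log-softmax over the vocabulary axis
    z = shift_logits - shift_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, shift_labels[..., None], axis=-1)
    return -picked.mean()

vocab_size = 50257
rng = np.random.default_rng(0)
labels = rng.integers(0, vocab_size, size=(2, 8))
uniform_logits = np.zeros((2, 8, vocab_size))   # equal score for every token

loss = causal_lm_loss(uniform_logits, labels)
print(f"Loss of a uniform predictor: {loss:.4f}")
print(f"ln(vocab_size):              {np.log(vocab_size):.4f}")
```

An untrained model evaluated on random tokens should therefore report a loss in the neighbourhood of this value; a much larger number usually points to poorly scaled initial logits.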

In [None]:
# Create dummy input
batch_size = 4
seq_length = 32
input_ids = torch.randint(0, config['vocab_size'], (batch_size, seq_length)).to(device)
labels = input_ids.clone()

# Forward pass
with torch.no_grad():
    output = model(
        input_ids=input_ids,
        labels=labels,
        output_attentions=True,
        output_hidden_states=True,
        return_dict=True,
    )

print("\n=== Forward Pass Output ===")
print(f"Logits shape: {output['logits'].shape}")
print(f"Loss: {output['loss'].item():.4f}")
print(f"Hidden states shape: {output['hidden_states'].shape}")
print(f"Number of layers: {len(output['all_phases'])}")

# Analyze phase dynamics
print("\n=== Phase Dynamics ===")
for layer_idx, (phases, order_param) in enumerate(zip(output['all_phases'], output['all_order_params'])):
    print(f"Layer {layer_idx + 1}:")
    print(f"  Phases shape: {phases.shape}")
    print(f"  Order parameter: {order_param.mean().item():.4f}")
    print(f"  Phase range: [{phases.min().item():.2f}, {phases.max().item():.2f}]")

## 4. Visualize Attention and Phase Coupling

Visualize how attention patterns relate to phase synchronization.
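
The phase coherence matrix plotted below, with entries $\cos(\theta_i - \theta_j)$, is tied to the order parameter by the identity $r^2 = \frac{1}{N^2}\sum_{i,j}\cos(\theta_i - \theta_j)$. The short check below uses randomly drawn phases as stand-ins for a layer's head phases.

```python
import numpy as np

rng = np.random.default_rng(0)
phases = rng.uniform(0, 2 * np.pi, size=8)   # stand-in for one layer's head phases

# Pairwise coherence: +1 for aligned heads, -1 for heads in anti-phase
coherence = np.cos(phases[:, None] - phases[None, :])

# Order parameter computed directly from the phases
r = np.abs(np.mean(np.exp(1j * phases)))

# The mean of the coherence matrix equals r squared
r_squared_from_matrix = coherence.mean()

print(f"r^2 from phases: {r ** 2:.6f}")
print(f"mean coherence:  {r_squared_from_matrix:.6f}")
```

A coherence heatmap that is mostly red or mostly blue therefore corresponds directly to a high or low order parameter for that layer.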

In [None]:
def visualize_attention_and_phases(output, layer_idx=0, batch_idx=0):
    """
    Visualize attention patterns and phase values for a specific layer.
    """
    if output['all_attentions'] is None:
        print("No attention weights available. Set output_attentions=True")
        return
    
    attention = output['all_attentions'][layer_idx][batch_idx].cpu().numpy()
    phases = output['all_phases'][layer_idx][batch_idx].cpu().numpy()
    
    num_heads = attention.shape[0]
    
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    
    # Average attention pattern
    avg_attention = attention.mean(axis=0)
    im1 = axes[0].imshow(avg_attention, cmap='viridis', aspect='auto')
    axes[0].set_title(f'Average Attention Pattern (Layer {layer_idx + 1})')
    axes[0].set_xlabel('Key Position')
    axes[0].set_ylabel('Query Position')
    plt.colorbar(im1, ax=axes[0])
    
    # Phase values per head
    axes[1].bar(range(num_heads), phases)
    axes[1].set_title(f'Phase Values (Layer {layer_idx + 1})')
    axes[1].set_xlabel('Head Index')
    axes[1].set_ylabel('Phase (radians)')
    axes[1].set_ylim([0, 2*np.pi])
    axes[1].grid(True, alpha=0.3)
    
    # Phase coherence: cosine of pairwise phase differences
    # (cosine is even, so no absolute value is needed)
    phase_diff = phases[:, None] - phases[None, :]
    phase_coherence = np.cos(phase_diff)
    im3 = axes[2].imshow(phase_coherence, cmap='RdBu', vmin=-1, vmax=1)
    axes[2].set_title(f'Phase Coherence Matrix (Layer {layer_idx + 1})')
    axes[2].set_xlabel('Head Index')
    axes[2].set_ylabel('Head Index')
    plt.colorbar(im3, ax=axes[2])
    
    plt.tight_layout()
    plt.show()

# Visualize the first and fourth layers (indices 0 and 3)
print("=== Layer 1 (index 0) ===")
visualize_attention_and_phases(output, layer_idx=0)

print("\n=== Layer 4 (index 3) ===")
visualize_attention_and_phases(output, layer_idx=3)

## 5. Training Loop with H-AKORN Loss

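Demonstrate training with H-AKORN-specific regularization. The loop below draws fresh random token IDs at every step, so there is nothing to learn beyond the uniform distribution and the language-modeling loss cannot be expected to fall below roughly $\ln(50257) \approx 10.82$; the point is to exercise the synchronization regularizer.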
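
The internals of `HAKORNLoss` are not shown in this notebook. As a rough mental model only, a synchronization-regularized objective could take the form $L = L_{\text{lm}} + \lambda_{\text{sync}}(1 - \bar{r}) + \lambda_{\text{var}}\,\mathrm{Var}(r)$, where $\bar{r}$ is the mean order parameter across layers. Both penalty terms and the helper `hakorn_style_loss` are assumptions made for this sketch; the actual class may define them differently.

```python
import numpy as np

def hakorn_style_loss(lm_loss, order_params, lambda_sync=0.1, lambda_variance=0.05):
    """Hypothetical sync-regularized loss; NOT the actual HAKORNLoss implementation."""
    r = np.asarray(order_params, dtype=float)
    sync = 1.0 - r.mean()   # assumed penalty: zero when every layer is fully synchronized
    variance = r.var()      # assumed penalty: zero when all layers share the same r
    total = lm_loss + lambda_sync * sync + lambda_variance * variance
    return {'total': total, 'lm': lm_loss, 'sync': sync, 'variance': variance}

example = hakorn_style_loss(lm_loss=10.8, order_params=[0.4, 0.6, 0.8])
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Under this assumed form, the weights `lambda_sync` and `lambda_variance` trade language-modeling accuracy against how strongly the heads are pushed toward a common phase.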

In [None]:
# Create loss criterion
criterion = HAKORNLoss(
    lambda_sync=0.1,
    lambda_variance=0.05,
)

# Create optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

# Simple training loop
num_steps = 100
log_interval = 10

model.train()
loss_history = []
sync_history = []
order_param_history = []

print("\n=== Training Loop ===")
for step in tqdm(range(num_steps)):
    # Generate random input
    input_ids = torch.randint(0, config['vocab_size'], (batch_size, seq_length)).to(device)
    labels = input_ids.clone()
    
    # Forward pass
    output = model(
        input_ids=input_ids,
        labels=labels,
        return_dict=True,
    )
    
    lm_loss = output['loss']
    order_parameters = output['all_order_params']
    
    # Compute H-AKORN loss
    loss_dict = criterion(lm_loss, order_parameters)
    loss = loss_dict['total']
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    
    # Log
    loss_history.append(loss.item())
    sync_history.append(float(loss_dict['sync']))  # plain float, whether 'sync' is a tensor or a number
    avg_order_param = torch.stack(order_parameters).mean().item()
    order_param_history.append(avg_order_param)
    
    if step % log_interval == 0:
        print(f"Step {step}: Loss={loss.item():.4f}, LM={loss_dict['lm']:.4f}, "
              f"Sync={loss_dict['sync']:.4f}, Order={avg_order_param:.4f}")

print("\nTraining complete!")

In [None]:
# Plot training curves
fig, axes = plt.subplots(1, 3, figsize=(18, 4))

# Total loss
axes[0].plot(loss_history)
axes[0].set_title('Total Loss')
axes[0].set_xlabel('Step')
axes[0].set_ylabel('Loss')
axes[0].grid(True, alpha=0.3)

# Sync loss
axes[1].plot(sync_history, color='orange')
axes[1].set_title('Synchronization Loss')
axes[1].set_xlabel('Step')
axes[1].set_ylabel('L_sync')
axes[1].grid(True, alpha=0.3)

# Order parameter
axes[2].plot(order_param_history, color='green')
axes[2].set_title('Order Parameter Evolution')
axes[2].set_xlabel('Step')
axes[2].set_ylabel('r')
axes[2].set_ylim([0, 1.1])
axes[2].axhline(y=1.0, color='red', linestyle='--', label='Perfect Sync')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nFinal order parameter: {order_param_history[-1]:.4f}")
print(f"Order parameter improvement: {order_param_history[-1] - order_param_history[0]:.4f}")

## 6. Text Generation with H-AKORN

Test generation on raw token IDs. Decoding the output into text requires a real tokenizer, which this notebook does not load.
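
The `temperature` and `do_sample` arguments used below conventionally mean that the next token is drawn from a softmax over temperature-scaled logits. How `model.generate` implements this is not shown here, so the sketch below illustrates the standard technique with a hypothetical helper, `sample_next_token`, rather than the model's own code.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, rng=None):
    """Draw one token index from softmax(logits / temperature)."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max()   # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs = probs / probs.sum()
    token = int(rng.choice(len(probs), p=probs))
    return token, probs

rng = np.random.default_rng(0)
toy_logits = [1.0, 5.0, 2.0]   # toy three-token vocabulary

for temperature in (2.0, 0.8, 0.1):
    _, probs = sample_next_token(toy_logits, temperature=temperature, rng=rng)
    print(f"T={temperature}: probabilities = {np.round(probs, 4)}")
```

Lower temperatures concentrate probability on the highest-scoring token, and higher temperatures flatten the distribution.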

In [None]:
# Simple generation example
model.eval()

# Create prompt (arbitrary token IDs as a placeholder for tokenized text)
prompt = torch.tensor([[1, 2, 3, 4, 5]], dtype=torch.long).to(device)

# Generate
with torch.no_grad():
    generated = model.generate(
        prompt,
        max_length=50,
        temperature=0.8,
        do_sample=True,
    )

print(f"Generated sequence shape: {generated.shape}")
print(f"Generated tokens: {generated[0].tolist()[:20]}...")  # First 20 tokens

## 7. Phase Dynamics Analysis

Analyze how phases evolve across layers during inference.
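
The heatmap in the next cell fixes its color scale to $[0, 2\pi]$. This notebook does not say whether the model returns phases already wrapped to that interval, so wrapping them first is a cheap safeguard. The sketch below uses two hypothetical helpers, `wrap_phases` and `layer_order_parameters`, to show that wrapping changes the plotted values but not the order parameter.

```python
import numpy as np

def wrap_phases(phases):
    """Map phases to [0, 2*pi) without changing the angles they represent."""
    return np.mod(phases, 2 * np.pi)

def layer_order_parameters(phases):
    """Order parameter for each row of a [num_layers, num_heads] phase array."""
    return np.abs(np.exp(1j * phases).mean(axis=1))

# Toy phases for 2 layers x 4 heads, deliberately outside [0, 2*pi)
raw = np.array([[-0.5, 0.1, 6.5, 0.0],
                [3.0, -3.0, 3.2, 9.5]])

wrapped = wrap_phases(raw)
r_raw = layer_order_parameters(raw)
r_wrapped = layer_order_parameters(wrapped)

print("wrapped phases:\n", np.round(wrapped, 3))
print("r from raw phases:    ", np.round(r_raw, 4))
print("r from wrapped phases:", np.round(r_wrapped, 4))
```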

In [None]:
def analyze_layer_phase_progression():
    """
    Analyze how phases and order parameters evolve across layers.
    """
    model.eval()
    model.reset_phases()
    
    # Forward pass
    input_ids = torch.randint(0, config['vocab_size'], (1, 64)).to(device)
    
    with torch.no_grad():
        output = model(
            input_ids=input_ids,
            return_dict=True,
        )
    
    # Extract phase information
    all_phases = output['all_phases']
    all_order_params = output['all_order_params']
    
    num_layers = len(all_phases)
    num_heads = all_phases[0].shape[1]
    
    # Convert to numpy
    phases_array = np.array([p[0].cpu().numpy() for p in all_phases])  # [L, H]
    order_params_array = np.array([o[0].cpu().item() for o in all_order_params])  # [L]
    
    # Plot
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Phase evolution across layers
    im = axes[0].imshow(phases_array.T, aspect='auto', cmap='twilight', vmin=0, vmax=2*np.pi)
    axes[0].set_title('Phase Values Across Layers')
    axes[0].set_xlabel('Layer Index')
    axes[0].set_ylabel('Head Index')
    plt.colorbar(im, ax=axes[0], label='Phase (radians)')
    
    # Order parameter across layers
    axes[1].plot(range(num_layers), order_params_array, marker='o', linewidth=2)
    axes[1].set_title('Order Parameter Across Layers')
    axes[1].set_xlabel('Layer Index')
    axes[1].set_ylabel('Order Parameter r')
    axes[1].set_ylim([0, 1.1])
    axes[1].grid(True, alpha=0.3)
    axes[1].axhline(y=1.0, color='red', linestyle='--', label='Perfect Sync')
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()
    
    # Statistics
    print("\n=== Phase Statistics ===")
    print(f"Layer 0 order parameter: {order_params_array[0]:.4f}")
    print(f"Layer {num_layers-1} order parameter: {order_params_array[-1]:.4f}")
    print(f"Average order parameter: {order_params_array.mean():.4f}")
    print(f"Order parameter improvement: {order_params_array[-1] - order_params_array[0]:.4f}")

analyze_layer_phase_progression()

## 8. Compare with/without Phase Modulation

Compare the per-layer order parameters of two freshly initialized (untrained) models, one with phase modulation and one without.
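
The two models in the next cell are initialized independently, so part of any difference in their order parameters may come from different random weights rather than from phase modulation. One common way to reduce that confound is to reset the random seed before building each model. The sketch below shows the effect on a small `nn.Linear`; whether two `HAKORNTransformer` instances with different `use_phase_modulation` settings consume random numbers in the same order is not known from this notebook.

```python
import torch
import torch.nn as nn

def build(seed):
    """Build a small layer after fixing the global PyTorch seed (hypothetical helper)."""
    torch.manual_seed(seed)
    return nn.Linear(16, 16)

a = build(0)
b = build(0)   # same seed as `a`
c = build(1)   # different seed

same = torch.equal(a.weight, b.weight)
different = not torch.equal(a.weight, c.weight)

print(f"Same seed gives identical weights:      {same}")
print(f"Different seed gives different weights: {different}")
```

Averaging the comparison over several seeds gives a fairer picture than a single pair of models.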

In [None]:
def compare_phase_modulation():
    """
    Compare models with and without phase modulation.
    """
    # Create two models
    config_with_phase = config.copy()
    config_with_phase['use_phase_modulation'] = True
    config_with_phase['num_layers'] = 4  # Smaller for comparison
    
    config_without_phase = config.copy()
    config_without_phase['use_phase_modulation'] = False
    config_without_phase['num_layers'] = 4
    
    model_with = HAKORNTransformer(**config_with_phase).to(device).eval()
    model_without = HAKORNTransformer(**config_without_phase).to(device).eval()
    
    # Forward pass
    input_ids = torch.randint(0, config['vocab_size'], (1, 32)).to(device)
    
    with torch.no_grad():
        output_with = model_with(input_ids, return_dict=True)
        output_without = model_without(input_ids, return_dict=True)
    
    # Compare order parameters
    order_with = [o[0].item() for o in output_with['all_order_params']]
    order_without = [o[0].item() for o in output_without['all_order_params']]
    
    # Plot
    fig, ax = plt.subplots(figsize=(10, 6))
    
    x = range(len(order_with))
    ax.plot(x, order_with, marker='o', label='With Phase Modulation', linewidth=2)
    ax.plot(x, order_without, marker='s', label='Without Phase Modulation', linewidth=2)
    
    ax.set_title('Order Parameter Comparison')
    ax.set_xlabel('Layer Index')
    ax.set_ylabel('Order Parameter r')
    ax.set_ylim([0, 1.1])
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("\n=== Comparison Statistics ===")
    print("With phase modulation:")
    print(f"  Average order parameter: {np.mean(order_with):.4f}")
    print(f"  Final order parameter: {order_with[-1]:.4f}")
    print("\nWithout phase modulation:")
    print(f"  Average order parameter: {np.mean(order_without):.4f}")
    print(f"  Final order parameter: {order_without[-1]:.4f}")

compare_phase_modulation()

## 9. Save and Load Model
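
Storing the configuration next to the weights lets the model be rebuilt without repeating the hyperparameters by hand. The round trip can be verified on a small stand-in module, as sketched below; `nn.Linear` and the names `model_cfg` and `tiny` are placeholders for this illustration, not part of `hakorn`.

```python
import os
import tempfile

import torch
import torch.nn as nn

model_cfg = {'in_features': 4, 'out_features': 2}   # stand-in for the model config
tiny = nn.Linear(**model_cfg)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tiny.pt')
    torch.save({'model_state_dict': tiny.state_dict(), 'config': model_cfg}, path)
    # map_location lets a checkpoint written on a GPU machine load on CPU
    checkpoint = torch.load(path, map_location='cpu')

restored = nn.Linear(**checkpoint['config'])
restored.load_state_dict(checkpoint['model_state_dict'])

# Every tensor in the restored state dict should match the original exactly
matches = all(
    torch.equal(p, q)
    for p, q in zip(tiny.state_dict().values(), restored.state_dict().values())
)
print(f"Round trip preserved all tensors: {matches}")
```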

In [None]:
# Save model
save_path = "hakorn_model.pt"
torch.save({
    'model_state_dict': model.state_dict(),
    'config': config,
}, save_path)

print(f"Model saved to {save_path}")

# Load model
checkpoint = torch.load(save_path, map_location=device)  # remap tensors to the current device
loaded_model = HAKORNTransformer(**checkpoint['config']).to(device)
loaded_model.load_state_dict(checkpoint['model_state_dict'])
loaded_model.eval()

print(f"Model loaded from {save_path}")

## 10. Summary and Next Steps

This notebook demonstrated:

1. ✅ **Kuramoto phase dynamics** for attention head synchronization
2. ✅ **Hyperbolic attention** with geodesic distances
3. ✅ **Adaptive coupling** between oscillators
4. ✅ **H-AKORN loss** with synchronization regularization
5. ✅ **Phase-modulated attention**, compared against an unmodulated baseline

### Next Steps:

- **Integration with CGT project**: Use `LorentzSubstrateHardened` for hyperbolic operations
- **Distillation from GPT-2**: Use `TeacherDistillationLoss` from `hyperbolic_lm_losses.py`
- **Large-scale training**: Use `train_hakorn.py` script
- **Real tokenization**: Integrate with GPT-2 tokenizer
- **Evaluation**: Test on downstream NLP tasks

### Key References:

- Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear oscillators
- Nickel & Kiela (2017). Poincaré Embeddings for Learning Hierarchical Representations
- Ganea et al. (2018). Hyperbolic Neural Networks

---

**Author:** Éric Gustavo Reis de Sena  
**License:** CC-BY-NC-SA-4.0  
**Contact:** eirikreisena@gmail.com