# 🐉 BDH Interpretability Suite - Training Notebook

This notebook trains BDH (Baby Dragon Hatchling) models on the Europarl parallel corpus for the KRITI 2026 AI Interpretability Challenge.

**What we'll do:**
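1. Download Europarl English-French and English-Portuguese data
2. Train a French specialist model
3. Train a Portuguese specialist model
4. Merge both models into a polyglot
5. Run monosemanticity analysis on the French and merged models
6. Generate visualization data for the frontend
7. Sanity-check the French model (sparsity and generation)
8. Package the results for download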

**Requirements:** Google Colab Pro (for GPU access and extra memory)

In [None]:
# Check GPU availability
!nvidia-smi
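Both training configs below set `dtype: "bfloat16"`. Native bfloat16 arithmetic requires an NVIDIA GPU with compute capability 8.0 or higher (Ampere and newer, such as the A100 or L4), while the T4 that Colab often assigns is 7.5. A minimal sketch of that decision follows; the helper name `pick_dtype` is hypothetical, and it assumes `train.py` accepts `"float32"` as a fallback value.

```python
def pick_dtype(capability: tuple[int, int]) -> str:
    """Return a dtype string for the training config.

    capability is (major, minor), as returned by
    torch.cuda.get_device_capability().
    Assumption: train.py accepts "float32" as a fallback value.
    """
    major, _minor = capability
    # Native bfloat16 support starts with compute capability 8.0 (Ampere).
    return "bfloat16" if major >= 8 else "float32"


# Example capabilities: T4 = (7, 5), A100 = (8, 0), L4 = (8, 9)
for name, cap in [("T4", (7, 5)), ("A100", (8, 0)), ("L4", (8, 9))]:
    print(f"{name}: {pick_dtype(cap)}")
```

In the notebook you would call `pick_dtype(torch.cuda.get_device_capability())` and write the result into the `dtype` field of each config.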

In [None]:
# Clone the repository (or upload files)
!git clone https://github.com/YOUR_USERNAME/bdh-interpretability.git
%cd bdh-interpretability

In [None]:
# Install dependencies
!pip install torch numpy datasets tqdm pyyaml

## 1. Download Europarl Data

In [None]:
# Download Europarl for En-Fr and En-Pt
!python training/download_europarl.py --languages en-fr en-pt --output data/

In [None]:
# Check data files
!ls -lh data/en-fr/
!ls -lh data/en-pt/
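The configs use `vocab_size: 256`, and the quick test later in the notebook feeds raw UTF-8 bytes to the model, which points to byte-level tokens. Assuming `train.bin` stores one token per byte (uint8), a stdlib sketch to peek at the data follows. The real on-disk format is defined by `download_europarl.py` and may differ (nanoGPT-style pipelines, for example, often use uint16); `peek_bin` and `token_count` are hypothetical helpers.

```python
from pathlib import Path


def peek_bin(path: str, n_bytes: int = 300) -> str:
    """Decode the first n_bytes of a byte-level token file as UTF-8.

    Assumption: one token per byte (uint8), i.e. the file is raw UTF-8 text.
    """
    with open(path, "rb") as f:
        head = f.read(n_bytes)
    # A multi-byte character may be cut at the boundary; replace it rather than fail.
    return head.decode("utf-8", errors="replace")


def token_count(path: str) -> int:
    """Number of tokens in the file under the one-byte-per-token assumption."""
    return Path(path).stat().st_size


for split in ("data/en-fr/train.bin", "data/en-pt/train.bin"):
    if Path(split).exists():
        print(f"{split}: {token_count(split):,} tokens")
        print(peek_bin(split))
```

If the preview is unreadable, the file is probably not stored as one byte per token and the assumption above does not hold.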

## 2. Train French Specialist

In [None]:
# Training configuration for French model
french_config = """
train_data: "data/en-fr/train.bin"
val_data: "data/en-fr/val.bin"

# Model architecture (must match for merging!)
n_layer: 8
n_embd: 256
n_head: 4
mlp_multiplier: 128
dropout: 0.1
vocab_size: 256

# Training
batch_size: 32
block_size: 512
max_iters: 10000
learning_rate: 1.0e-3
gradient_accumulation_steps: 4

# Output
output_dir: "checkpoints"
run_name: "french_specialist"

device: "cuda"
dtype: "bfloat16"
compile_model: true
"""

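import os

# Make sure the config directory exists before writing
os.makedirs('training/configs', exist_ok=True)
with open('training/configs/french_colab.yaml', 'w') as f:
    f.write(french_config)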

print("French config saved!")
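With gradient accumulation, each optimizer step sees `batch_size × gradient_accumulation_steps` sequences of `block_size` bytes. A quick sketch of the resulting token budget, assuming `train.py` counts `max_iters` in optimizer steps rather than micro-batches (worth checking in the script):

```python
batch_size = 32
block_size = 512
gradient_accumulation_steps = 4
max_iters = 10_000

sequences_per_step = batch_size * gradient_accumulation_steps  # 128 sequences
tokens_per_step = sequences_per_step * block_size  # 65,536 bytes per step
total_tokens = tokens_per_step * max_iters  # 655,360,000 bytes over the run

print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"tokens over the run:       {total_tokens:,}")
```

Comparing `total_tokens` with the size of `train.bin` gives a rough idea of how many epochs the run covers.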

In [None]:
# Train French model
!python training/train.py --config training/configs/french_colab.yaml

## 3. Train Portuguese Specialist

In [None]:
# Training configuration for Portuguese model
portuguese_config = """
train_data: "data/en-pt/train.bin"
val_data: "data/en-pt/val.bin"

# Model architecture (MUST match French!)
n_layer: 8
n_embd: 256
n_head: 4
mlp_multiplier: 128
dropout: 0.1
vocab_size: 256

# Training
batch_size: 32
block_size: 512
max_iters: 10000
learning_rate: 1.0e-3
gradient_accumulation_steps: 4

# Output
output_dir: "checkpoints"
run_name: "portuguese_specialist"

device: "cuda"
dtype: "bfloat16"
compile_model: true
"""

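import os

# Make sure the config directory exists before writing
os.makedirs('training/configs', exist_ok=True)
with open('training/configs/portuguese_colab.yaml', 'w') as f:
    f.write(portuguese_config)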

print("Portuguese config saved!")
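The two configs are duplicated by hand, so a typo in one architecture field would only surface at merge time. A small stdlib check, assuming the configs stay flat `key: value` YAML as written above and that the fields in `ARCH_KEYS` are the ones that affect parameter shapes. `parse_flat_yaml` is a hypothetical helper that skips comments and blank lines; it is not a general YAML parser.

```python
ARCH_KEYS = ("n_layer", "n_embd", "n_head", "mlp_multiplier", "vocab_size")


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Parse flat 'key: value' lines; comments and blank lines are skipped."""
    result = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip().strip('"')
    return result


def arch_mismatches(cfg_a: str, cfg_b: str) -> dict[str, tuple]:
    """Return {key: (value_a, value_b)} for architecture keys that differ."""
    a, b = parse_flat_yaml(cfg_a), parse_flat_yaml(cfg_b)
    return {k: (a.get(k), b.get(k)) for k in ARCH_KEYS if a.get(k) != b.get(k)}
```

In the notebook, `assert not arch_mismatches(french_config, portuguese_config)` fails early if the two architectures have drifted apart.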

In [None]:
# Train Portuguese model
!python training/train.py --config training/configs/portuguese_colab.yaml

## 4. Merge Models

In [None]:
# Merge the two specialists into a polyglot
!python analysis/merge.py \
    --model1 checkpoints/french_specialist/checkpoint_best.pt \
    --model2 checkpoints/portuguese_specialist/checkpoint_best.pt \
    --output checkpoints/merged_polyglot.pt \
    --name1 french \
    --name2 portuguese
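The merge logic of `analysis/merge.py` is not shown here. One plausible approach, assumed purely for illustration, is to concatenate the two specialists' neuron dimensions so the merged model keeps the neurons of both, and to average parameters that have no neuron axis (such as the byte embedding). A toy pure-Python sketch on small nested lists; the shapes `[n_embd][n_neurons]` (encoder-style) and `[n_neurons][n_embd]` (decoder-style) and all helper names are assumptions.

```python
Matrix = list[list[float]]


def concat_neuron_columns(a: Matrix, b: Matrix) -> Matrix:
    """Encoder-style [n_embd][n_neurons]: join along the neuron (column) axis."""
    assert len(a) == len(b), "n_embd must match"
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def concat_neuron_rows(a: Matrix, b: Matrix) -> Matrix:
    """Decoder-style [n_neurons][n_embd]: join along the neuron (row) axis."""
    assert len(a[0]) == len(b[0]), "n_embd must match"
    return a + b


def average(a: Matrix, b: Matrix) -> Matrix:
    """Parameters without a neuron axis (e.g. embeddings): elementwise mean."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy specialists with n_embd = 2 and 3 neurons each.
enc_fr = [[1, 2, 3], [4, 5, 6]]
enc_pt = [[7, 8, 9], [10, 11, 12]]
merged_enc = concat_neuron_columns(enc_fr, enc_pt)
print(len(merged_enc), "x", len(merged_enc[0]))  # 2 x 6
```

Under this assumption the merged model has twice the per-head neurons of either specialist, and the `n_embd` assertions show why the architecture fields must match exactly.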

## 5. Run Monosemanticity Analysis

In [None]:
# Analyze French model for monosemantic synapses
!python analysis/monosemanticity.py \
    --model checkpoints/french_specialist/checkpoint_best.pt \
    --output analysis_results/french/

In [None]:
# Analyze merged model
!python analysis/monosemanticity.py \
    --model checkpoints/merged_polyglot.pt \
    --output analysis_results/merged/
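The exact metric used by `monosemanticity.py` is not shown. To illustrate the idea, one simple selectivity measure for a single synapse is the share of its mean activation that falls on its most-activating concept: 1.0 means it fires for only one concept, while a value near `1 / n_concepts` means it responds evenly. A hypothetical stdlib sketch, assuming non-negative activations (as after a ReLU) grouped by concept label:

```python
def selectivity(activation_by_concept: dict[str, list[float]]) -> tuple[str, float]:
    """Return (top concept, share of total mean activation on that concept).

    activation_by_concept maps a concept label (e.g. "currency", "country")
    to the synapse's activation values on text samples for that concept.
    """
    means = {
        concept: sum(values) / len(values)
        for concept, values in activation_by_concept.items()
        if values
    }
    total = sum(means.values())
    if total == 0:
        return ("", 0.0)
    top = max(means, key=means.get)
    return (top, means[top] / total)


acts = {
    "currency": [0.9, 0.8, 1.0],
    "country": [0.0, 0.1, 0.0],
    "verb": [0.0, 0.0, 0.1],
}
print(selectivity(acts))
```

A synapse scoring close to 1.0 under a measure like this is a candidate monosemantic unit; comparing scores between the French and merged models shows whether specialization survives the merge.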

## 6. Generate Playback Data for Frontend

In [None]:
# Generate visualization data for the French model
!python scripts/generate_playback.py \
    --model checkpoints/french_specialist/checkpoint_best.pt \
    --output frontend/public/playback/french/ \
    --include-graph

In [None]:
# Generate for merged model
!python scripts/generate_playback.py \
    --model checkpoints/merged_polyglot.pt \
    --output frontend/public/playback/merged/ \
    --include-graph

## 7. Quick Test

In [None]:
import torch
import sys
sys.path.insert(0, 'training')
from bdh import load_model, ExtractionConfig

# Load French model
model = load_model('checkpoints/french_specialist/checkpoint_best.pt', 'cuda')
print(f"Loaded model: {model.config.n_layer}L, {model.config.n_embd}D, {model.config.n_head}H")
print(f"Neurons per head: {model.config.n_neurons}")
print(f"Total neurons: {model.config.total_neurons}")
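The printed values can be checked by hand. This sketch assumes the per-head neuron count is derived as `mlp_multiplier × n_embd / n_head` (the formula used in the public BDH reference code) and that `total_neurons` means `n_head × n_neurons`; both are assumptions about this repository.

```python
n_embd, n_head, mlp_multiplier = 256, 4, 128

n_neurons = mlp_multiplier * n_embd // n_head  # 8,192 neurons per head
total_neurons = n_head * n_neurons  # 32,768 (assumed definition)

print(f"Neurons per head: {n_neurons}")
print(f"Total neurons: {total_neurons}")
```

If the model prints different numbers, the repository defines these fields differently (for example, per layer), and the formula above needs adjusting.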

In [None]:
# Test sparsity
text = "The European Parliament adopted the resolution."
tokens = torch.tensor([list(text.encode('utf-8'))], dtype=torch.long, device='cuda')

config = ExtractionConfig(capture_sparse_activations=True)

with torch.no_grad():
    with model.extraction_mode(config) as buffer:
        logits, _ = model(tokens)
        stats = buffer.get_sparsity_stats()

print(f"\nüìä Sparsity Statistics:")
print(f"   Overall X sparsity: {stats['overall_x_sparsity']:.1%}")
print(f"   Overall Y sparsity: {stats['overall_y_sparsity']:.1%}")
print(f"   Combined sparsity: {stats['overall_sparsity']:.1%}")
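`get_sparsity_stats()` is specific to this repository, and the printout alone does not say whether "sparsity" means the fraction of zero or of non-zero entries. A minimal stdlib sketch of the usual definition, the fraction of activations that are exactly zero, assuming activations are available as nested lists of shape `[tokens][neurons]` (`sparsity` is a hypothetical helper):

```python
def sparsity(activations: list[list[float]], eps: float = 0.0) -> float:
    """Fraction of entries whose magnitude is <= eps (exact zeros by default)."""
    flat = [a for row in activations for a in row]
    if not flat:
        return 0.0
    zeros = sum(1 for a in flat if abs(a) <= eps)
    return zeros / len(flat)


# Two tokens x four neurons: 6 of the 8 activations are zero.
x = [[0.0, 1.3, 0.0, 0.0], [0.0, 0.0, 0.7, 0.0]]
print(f"{sparsity(x):.1%}")  # 75.0%
```

If the repository uses this definition, values close to 100% mean only a small fraction of neurons fire for each token.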

In [None]:
# Test generation
prompt = "<F:en>The European Union<T:fr>"
prompt_tokens = torch.tensor([list(prompt.encode('utf-8'))], dtype=torch.long, device='cuda')

with torch.no_grad():
    output = model.generate(prompt_tokens, max_new_tokens=50, top_k=5)

generated = bytes(output[0].cpu().tolist()).decode('utf-8', errors='backslashreplace')
print(f"Generated: {generated}")
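`model.generate(..., top_k=5)` samples each next byte from the five most likely candidates. The repository's implementation is not shown; this is a sketch of standard top-k sampling on a plain list of logits, stdlib only, with a hypothetical `top_k_sample` helper.

```python
import math
import random


def top_k_sample(logits: list[float], k: int, rng: random.Random) -> int:
    """Sample an index from the k highest logits after a softmax over them."""
    k = max(1, min(k, len(logits)))
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the top-k candidates.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 2.0, -1.0, 3.5, 0.0, 1.2, 2.8]
samples = [top_k_sample(logits, k=3, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only the 3 highest-logit indices: [1, 3, 6]
```

A smaller `top_k` makes output more deterministic (`top_k=1` is greedy decoding). With byte-level tokens, sampling can also produce an incomplete multi-byte UTF-8 sequence, which is why the decode above uses `errors='backslashreplace'`.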

## 8. Download Results

In [None]:
# Zip checkpoints, analysis results, and playback data for download
!zip -r bdh_results.zip checkpoints/ analysis_results/ frontend/public/playback/
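If the monosemanticity or playback steps were skipped, some of these directories may not exist. A pure-Python alternative using the standard `zipfile` module that skips missing directories explicitly (the helper name `zip_dirs` is hypothetical):

```python
import os
import zipfile


def zip_dirs(archive: str, dirs: list[str]) -> list[str]:
    """Add every file under each existing directory to `archive`.

    Missing directories are skipped. Returns the archived file paths.
    """
    added = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for d in dirs:
            if not os.path.isdir(d):
                print(f"skipping missing directory: {d}")
                continue
            for root, _subdirs, files in os.walk(d):
                for name in files:
                    path = os.path.join(root, name)
                    zf.write(path)  # stored under its path relative to the CWD
                    added.append(path)
    return added
```

Usage: `zip_dirs('bdh_results.zip', ['checkpoints', 'analysis_results', 'frontend/public/playback'])`.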

In [None]:
# Download (in Colab)
from google.colab import files
files.download('bdh_results.zip')

## Done! üéâ

Your trained models and analysis results are ready. Next steps:

1. Extract `bdh_results.zip` 
2. Copy the contents of `frontend/public/playback/` into the same folder of your local frontend
3. Run the frontend: `cd frontend && npm install && npm run dev`
4. Explore your trained BDH models!