# 🧠 PIL Language Model Training

**Hybrid Transformer with Pseudoinverse Learning (PIL)**

This notebook trains a language model whose feed-forward (FFN) layers are fitted with PIL, a gradient-free method, instead of backpropagation.
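The core idea behind PIL can be illustrated with a small NumPy sketch. It assumes a two-layer FFN whose input projection is random and frozen, while the output projection is solved in a single step with a ridge-regularized pseudoinverse, W_out = (HᵀH + λI)⁻¹HᵀT. The names `pil_fit_ffn` and `pil_forward` and the ridge term are illustrative assumptions, not the code in `app.core.pil_lm`.

```python
import numpy as np

def pil_fit_ffn(X, T, hidden_dim=64, ridge=1e-3, seed=0):
    """Fit a two-layer FFN without gradients (hypothetical helper).

    The input projection W_in is random and frozen; only the output
    projection W_out is solved in closed form.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.standard_normal((X.shape[1], hidden_dim)) / np.sqrt(X.shape[1])
    H = np.maximum(X @ W_in, 0.0)  # ReLU hidden activations, shape (n, hidden_dim)
    # Ridge-regularized pseudoinverse: W_out = (H^T H + λI)^{-1} H^T T
    A = H.T @ H + ridge * np.eye(hidden_dim)
    W_out = np.linalg.solve(A, H.T @ T)
    return W_in, W_out

def pil_forward(X, W_in, W_out):
    """Run the fitted FFN: ReLU(X @ W_in) @ W_out."""
    return np.maximum(X @ W_in, 0.0) @ W_out

# Usage: fit random inputs to a linear target in one shot, no gradient steps
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 16))
T = X @ rng.standard_normal((16, 4))
W_in, W_out = pil_fit_ffn(X, T)
pred = pil_forward(X, W_in, W_out)
```

As λ approaches 0 (and HᵀH is invertible), the solution approaches the plain pseudoinverse solution `np.linalg.pinv(H) @ T`; the ridge term keeps the solve stable when hidden activations are nearly collinear.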

---
**Quick Start:**
1. Runtime → Change runtime type → GPU (T4)
2. Run all cells

In [None]:
# Step 1: Clone the repository
!git clone https://github.com/sanjuz-cas/PIL.git
%cd PIL

In [None]:
# Step 2: Install dependencies
!pip install -q torch datasets transformers tqdm

In [None]:
# Step 3: Verify installation
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

In [None]:
# Step 4: Run training on WikiText-2
!python examples/train_pil_lm.py \
    --dataset wikitext \
    --embed_dim 256 \
    --num_layers 4 \
    --max_train_samples 5000 \
    --max_eval_samples 1000 \
    --num_epochs 1 \
    --device cuda

## 🔬 Interactive Testing

Run the cells below to test the model interactively.

In [None]:
# Load the trained model and test generation
import os
import sys
sys.path.insert(0, '.')

import torch
from transformers import GPT2Tokenizer
from app.core.pil_lm import PILLMConfig, PILLanguageModel

# Load the GPT-2 tokenizer (GPT-2 has no pad token, so reuse EOS)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token

# Create the model; these values must match the ones used for training,
# otherwise load_state_dict will fail
config = PILLMConfig(
    vocab_size=50257,
    embed_dim=256,
    num_heads=4,
    num_layers=4,
    max_seq_len=128
)
model = PILLanguageModel(config)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load saved weights if present; map_location lets a GPU-saved checkpoint load on CPU
if os.path.exists('pil_lm_model.pt'):
    model.load_state_dict(torch.load('pil_lm_model.pt', map_location=device))
    print("Loaded trained model!")
else:
    print("No saved model found. Using untrained model.")

model = model.to(device)
model.eval()  # switch to inference mode (e.g. disables dropout)
print(f"Model on: {device}")

In [None]:
# Generate text
def generate_text(prompt, max_tokens=50, temperature=0.8):
    tokens = tokenizer.encode(prompt, return_tensors='pt').to(device)
    
    with torch.no_grad():
        output = model.generate(
            tokens,
            max_new_tokens=max_tokens,
            temperature=temperature,
            top_k=50,
            top_p=0.9
        )
    
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Test prompts
prompts = [
    "The future of artificial intelligence",
    "Once upon a time",
    "In the year 2050"
]

for prompt in prompts:
    print(f"\nüìù Prompt: {prompt}")
    print(f"ü§ñ Generated: {generate_text(prompt)}")
    print("-" * 50)
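`generate_text` forwards `temperature`, `top_k`, and `top_p` to `model.generate`. Assuming the repository follows the usual temperature, top-k, then nucleus (top-p) recipe (not verified here), a single decoding step looks roughly like the NumPy sketch below; `sample_next_token` is a hypothetical helper, not part of `app.core.pil_lm`.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.9, rng=None):
    """Sample one token id from a 1-D logits vector (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    # Temperature < 1 sharpens the distribution, > 1 flattens it
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Top-k: keep only the k highest-scoring token ids, sorted by descending logit
    k = min(top_k, logits.size)
    order = np.argsort(logits)[::-1][:k]
    probs = np.exp(logits[order] - logits[order].max())  # numerically stable softmax
    probs /= probs.sum()
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p
    cutoff = np.searchsorted(np.cumsum(probs), top_p) + 1
    order, probs = order[:cutoff], probs[:cutoff]
    probs /= probs.sum()
    return int(rng.choice(order, p=probs))
```

Lowering `top_p` or `top_k` makes output more conservative; `top_k=1` reduces to greedy decoding.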

## 📊 Training with Custom Parameters

Adjust these parameters based on your needs:

In [None]:
# Custom training with a larger model (GPU recommended)
!python examples/train_pil_lm.py \
    --dataset wikitext \
    --embed_dim 384 \
    --num_layers 6 \
    --num_heads 6 \
    --max_train_samples 10000 \
    --max_eval_samples 2000 \
    --num_epochs 3 \
    --device cuda
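When scaling up, standard multi-head attention requires `embed_dim` to split evenly across `num_heads` (384 / 6 = 64 per head here), and the `PILLMConfig` in the interactive cell must be updated to the same values before loading a checkpoint from this run. The sketch below assumes the model uses that standard head split; `check_head_split` and `build_train_args` are hypothetical helpers, and the only flags they produce are the ones already used in the cells above.

```python
def check_head_split(embed_dim: int, num_heads: int) -> int:
    """Return the per-head dimension, or raise if embed_dim doesn't divide evenly."""
    if embed_dim % num_heads != 0:
        raise ValueError(f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}")
    return embed_dim // num_heads

def build_train_args(params: dict) -> list[str]:
    """Turn a parameter dict into a command line for examples/train_pil_lm.py."""
    args = ["python", "examples/train_pil_lm.py"]
    for key, value in params.items():
        args += [f"--{key}", str(value)]
    return args

# Keep one dict as the single source of truth for training and for PILLMConfig
params = {"embed_dim": 384, "num_layers": 6, "num_heads": 6}
head_dim = check_head_split(params["embed_dim"], params["num_heads"])
```

The resulting list can be passed to `subprocess.run(..., check=True)`, and the same `params` values reused when constructing `PILLMConfig`, so the two never drift apart.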