# TransJect Demo: Knowledge Transfer Framework

This notebook walks through the main features of TransJect, a knowledge transfer framework that trains a student network with guidance from a teacher network.

## Features Demonstrated:
1. **SequenceClassification** - Classification tasks (CB dataset from SuperGLUE)
2. **AutoModel** - Language modeling tasks (Alpaca dataset)
3. **Meta-Learning** - Using multiple meta dataloaders
4. **WandB Integration** - Logging to Weights & Biases
5. **Model Saving** - Saving trained models
6. **Layer Slicing** - Using student_layers parameter

## Setup Instructions for Google Colab:
```python
# Clone the repository
!git clone https://github.com/yourusername/transject.git
%cd transject

# Install the package
!pip install -e .
```

## 📦 Installation & Setup

In [None]:
# For Google Colab: Upload the transject folder or install from pip
# Option 1: If running locally, just import
# Option 2: If on Colab, install dependencies

!pip install torch transformers datasets numpy tqdm scipy wandb -q

# Import sys to add local path if needed
import sys
import os

# Add parent directory to path if running from examples folder
if os.path.exists('../transject'):
    sys.path.insert(0, '..')

print("✅ Dependencies installed!")

In [None]:
# Import TransJect modules
from transject import SequenceClassification, AutoModel, TransJectConfig
from transject.data_utils import (
    create_superglue_dataloaders,
    create_alpaca_dataloaders,
    create_meta_dataloaders
)

import torch
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
torch.manual_seed(42)

print("✅ TransJect imported successfully!")
print(f"🔧 PyTorch version: {torch.__version__}")
print(f"🖥️  Device: {'CUDA' if torch.cuda.is_available() else 'CPU'}")

## 🎯 Part 1: Sequence Classification with CB Dataset

We'll use the CommitmentBank (CB) dataset from SuperGLUE for a 3-class classification task.

### 1.1 Basic Classification Example

In [None]:
# Create a SequenceClassification model
print("🚀 Creating SequenceClassification model...")

model_classification = SequenceClassification(
    student_model="distilbert-base-uncased",
    teacher_model="bert-base-uncased",
    num_labels=3,  # CB has 3 labels
    student_layers=-1,  # -1 means use full model (no slicing)
    temperature=2.0,
    alpha=0.5,  # Balance between distillation and task loss
    learning_rate=2e-5,
    warmup_steps=50,
    log_interval=5,
    eval_interval=20
)

print("✅ Model created successfully!")
print(f"📊 Student model: {model_classification.student_model_name}")
print(f"👨‍🏫 Teacher model: {model_classification.teacher_model_name}")
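
The `temperature` and `alpha` arguments above are the two usual knobs of a distillation objective. This notebook does not show how TransJect computes its loss, so the following is only a framework-free sketch of the classic soft-target formulation. It assumes (as the "more weight on distillation" comment in the custom configuration cell suggests) that `alpha` weights the distillation term and `1 - alpha` weights the task loss. The function names are illustrative and not part of the TransJect API.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: KL divergence between the softened distributions
    kd = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    kd *= temperature ** 2  # keeps gradient scale comparable across temperatures
    # Hard-label term: ordinary cross-entropy at temperature 1
    ce = -math.log(softmax(student_logits)[label])
    return alpha * kd + (1 - alpha) * ce


if __name__ == "__main__":
    student = [0.2, 1.1, -0.3]  # made-up logits for a 3-class task
    teacher = [0.1, 2.0, -1.0]
    print(distillation_loss(student, teacher, label=1))
```

A higher `temperature` flattens both distributions, so the student also learns from the teacher's ranking of the wrong classes rather than only from its top prediction.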

In [None]:
# Create dataloaders for CB task
print("📚 Loading CB dataset...")

train_loader_cb, val_loader_cb, num_labels = create_superglue_dataloaders(
    task_name="cb",
    tokenizer=model_classification.tokenizer,
    batch_size=8,
    max_length=128,
    num_train_samples=200,  # Use subset for quick demo
    num_val_samples=50
)

print("✅ Dataloaders created!")
print(f"📊 Training batches: {len(train_loader_cb)}")
print(f"📊 Validation batches: {len(val_loader_cb)}")
print(f"🏷️  Number of labels: {num_labels}")

In [None]:
# Train the model (without W&B logging for simplicity)
print("🏋️ Training classification model...\n")

model_classification.fit(
    train_dataloader=train_loader_cb,
    val_dataloader=val_loader_cb,
    epochs=2,  # Short training for demo
    report_to=None,  # Set to "wandb" if you want W&B logging
    output_dir="./output/cb_basic"
)

print("\n✅ Training completed!")
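
With the demo settings it is worth checking how many optimizer steps the run actually takes. The sketch below does that arithmetic and models a linear warmup. Whether TransJect uses a linear schedule, and whether it takes one optimizer step per batch, are assumptions; the helper names are hypothetical.

```python
import math


def total_steps(num_samples, batch_size, epochs, accumulation_steps=1):
    """Optimizer steps for a run, assuming the last partial batch is kept."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return math.ceil(batches_per_epoch / accumulation_steps) * epochs


def warmup_lr(step, peak_lr, warmup_steps):
    """Linear warmup from 0 to peak_lr, then constant (any decay is omitted)."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps


# Values taken from the CB cells above: 200 samples, batch size 8, 2 epochs
steps = total_steps(num_samples=200, batch_size=8, epochs=2)
print(f"optimizer steps: {steps}")
for step in (0, 25, 50):
    print(step, warmup_lr(step, peak_lr=2e-5, warmup_steps=50))
```

Under these assumptions the demo run has 50 optimizer steps, which equals `warmup_steps=50`, so the learning rate would reach its peak only as training ends. For a real run, more epochs or a smaller `warmup_steps` may be worth trying.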

In [None]:
# Save the trained student model
print("💾 Saving trained model...")

output_path = "./output/cb_basic/final_student_model"
model_classification.student_model.save_pretrained(output_path)
model_classification.tokenizer.save_pretrained(output_path)

print(f"✅ Model saved to: {output_path}")

### 1.2 Classification with Meta-Learning

In [None]:
# Create a new model with meta-learning enabled
print("🚀 Creating model with meta-learning...")

model_classification_meta = SequenceClassification(
    student_model="distilbert-base-uncased",
    teacher_model="bert-base-uncased",
    num_labels=3,
    student_layers=-1,
    use_meta_learning=True,
    meta_learning_rate=1e-4
)

print("✅ Model with meta-learning created!")

In [None]:
# Create meta-learning dataloaders
print("📚 Creating meta-learning dataloaders...")

meta_loaders = create_meta_dataloaders(
    tokenizer=model_classification_meta.tokenizer,
    tasks=["rte", "wic"],  # Additional SuperGLUE tasks
    batch_size=8,
    max_length=128,
    num_samples_per_task=50  # Small subset for demo
)

print(f"✅ Created {len(meta_loaders)} meta-dataloaders: {list(meta_loaders.keys())}")
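
`create_meta_dataloaders` returns a dictionary keyed by task name. How `fit` consumes that dictionary is not shown in this notebook. One common pattern, sketched here with plain Python lists standing in for dataloaders, is to cycle through the tasks and pair every training batch with one meta batch. `round_robin_meta_batches` is a hypothetical helper, not a TransJect function.

```python
from itertools import cycle


def round_robin_meta_batches(meta_loaders):
    """Yield (task_name, batch) forever, visiting tasks in turn.

    A loader is restarted when it runs out, so small tasks repeat.
    Assumes every loader yields at least one batch.
    """
    iterators = {name: iter(loader) for name, loader in meta_loaders.items()}
    for name in cycle(list(meta_loaders)):
        try:
            batch = next(iterators[name])
        except StopIteration:
            # This task is exhausted: start it over from the beginning
            iterators[name] = iter(meta_loaders[name])
            batch = next(iterators[name])
        yield name, batch


# Plain lists stand in for real dataloaders
demo_meta_loaders = {"rte": ["rte-0", "rte-1"], "wic": ["wic-0"]}
demo_train_batches = ["cb-0", "cb-1", "cb-2", "cb-3"]

# zip stops when the (finite) training batches run out
for train_batch, (task, meta_batch) in zip(
    demo_train_batches, round_robin_meta_batches(demo_meta_loaders)
):
    print(train_batch, task, meta_batch)
```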

In [None]:
# Train with meta-learning
print("🏋️ Training with meta-learning...\n")

model_classification_meta.fit(
    train_dataloader=train_loader_cb,
    meta_dataloader=meta_loaders,  # Dictionary of meta-loaders
    val_dataloader=val_loader_cb,
    epochs=2,
    report_to=None,
    output_dir="./output/cb_meta"
)

print("\n✅ Meta-learning training completed!")

## 🤖 Part 2: Language Modeling with Alpaca Dataset

Now we'll demonstrate language modeling using the Alpaca instruction-following dataset.

### 2.1 Basic Language Modeling

In [None]:
# Create an AutoModel for language modeling
print("🚀 Creating AutoModel for language modeling...")

model_lm = AutoModel(
    student_model="gpt2",
    teacher_model="gpt2-medium",  # Using medium as teacher
    student_layers=-1,  # Use full GPT-2 model
    temperature=2.0,
    alpha=0.5,
    learning_rate=5e-5,
    warmup_steps=50,
    log_interval=5,
    eval_interval=20
)

print("✅ Language model created successfully!")
print(f"📊 Student model: {model_lm.student_model_name}")
print(f"👨‍🏫 Teacher model: {model_lm.teacher_model_name}")

In [None]:
# Create Alpaca dataloaders
print("📚 Loading Alpaca dataset...")

try:
    train_loader_alpaca, val_loader_alpaca = create_alpaca_dataloaders(
        tokenizer=model_lm.tokenizer,
        batch_size=4,  # Smaller batch size for language modeling
        max_length=256,  # Shorter sequences for demo
        num_train_samples=100,  # Small subset for quick demo
        num_val_samples=20,
        dataset_name="tatsu-lab/alpaca"  # Official Alpaca dataset
    )
    
    print("✅ Alpaca dataloaders created!")
    print(f"📊 Training batches: {len(train_loader_alpaca)}")
    print(f"📊 Validation batches: {len(val_loader_alpaca)}")
except Exception as e:
    print(f"⚠️  Could not load Alpaca dataset: {e}")
    print("💡 You can use a custom dataset or skip this section")

In [None]:
# Train the language model
print("🏋️ Training language model...\n")

model_lm.fit(
    train_dataloader=train_loader_alpaca,
    val_dataloader=val_loader_alpaca,
    epochs=1,  # Just 1 epoch for demo
    report_to=None,  # Set to "wandb" for W&B logging
    output_dir="./output/alpaca_basic"
)

print("\n✅ Language model training completed!")

In [None]:
# Test text generation with the trained model
print("🎨 Testing text generation...\n")

prompt = "### Instruction:\nWrite a short poem about AI\n\n### Response:\n"
inputs = model_lm.tokenizer(prompt, return_tensors="pt")

if torch.cuda.is_available():
    inputs = {k: v.cuda() for k, v in inputs.items()}

# Generate text
with torch.no_grad():
    outputs = model_lm.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.7,
        do_sample=True,
        top_p=0.9
    )

generated_text = model_lm.tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Generated text:")
print("="*50)
print(generated_text)
print("="*50)
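
The prompt above follows an `### Instruction:` / `### Response:` layout. A small helper keeps that layout consistent between training and generation. The sketch assumes the three field names used by the `tatsu-lab/alpaca` dataset (`instruction`, `input`, `output`). The `### Input:` section and the helper name `build_prompt` are illustrative choices and may not match the template that `create_alpaca_dataloaders` applies internally.

```python
def build_prompt(instruction, input_text="", response=""):
    """Format one example in the Instruction / (Input) / Response layout."""
    parts = [f"### Instruction:\n{instruction}\n\n"]
    if input_text:
        # Only emit the Input section when the example has extra context
        parts.append(f"### Input:\n{input_text}\n\n")
    # Leave the response empty at generation time so the model completes it
    parts.append(f"### Response:\n{response}")
    return "".join(parts)


example = {
    "instruction": "Translate to French",
    "input": "Good morning",
    "output": "Bonjour",
}
print(build_prompt(example["instruction"], example["input"], example["output"]))
```

Calling `build_prompt("Write a short poem about AI")` reproduces the prompt string used in the generation cell.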

In [None]:
# Save the trained language model
print("💾 Saving trained language model...")

output_path_lm = "./output/alpaca_basic/final_student_model"
model_lm.student_model.save_pretrained(output_path_lm)
model_lm.tokenizer.save_pretrained(output_path_lm)

print(f"✅ Model saved to: {output_path_lm}")

### 2.2 Language Modeling with Layer Slicing

This section demonstrates the `student_layers` parameter, which keeps only a subset of the student model's layers.
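
As a rough mental model, slicing keeps the first `student_layers` transformer blocks and drops the rest, with `-1` meaning no slicing. The sketch below uses a plain list in place of the model's block list. How TransJect actually selects layers is an assumption here, based on the "first 12 layers" comment in the next cell, and `slice_layers` is a hypothetical helper.

```python
def slice_layers(layers, student_layers=-1):
    """Return the blocks to keep; -1 keeps every block."""
    if student_layers == -1:
        return list(layers)
    if not 1 <= student_layers <= len(layers):
        raise ValueError(
            f"student_layers must be -1 or in 1..{len(layers)}, got {student_layers}"
        )
    return list(layers[:student_layers])


blocks = [f"block_{i}" for i in range(24)]  # gpt2-medium has 24 blocks
kept = slice_layers(blocks, student_layers=12)
print(len(kept), kept[0], kept[-1])
```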

In [None]:
# Create model with layer slicing
print("🚀 Creating model with layer slicing...")

model_lm_sliced = AutoModel(
    student_model="gpt2-medium",  # Start with GPT2-medium
    teacher_model="gpt2-large",   # Teacher is GPT2-large
    student_layers=12,  # Use only first 12 layers (GPT2-medium has 24)
    temperature=2.0,
    alpha=0.5
)

print("✅ Model with layer slicing created!")
print(f"📊 Using first 12 layers of {model_lm_sliced.student_model_name}")

## 📊 Part 3: Advanced Features

### 3.1 Using Custom Configuration

In [None]:
# Create custom configuration
from transject import TransJectConfig

custom_config = TransJectConfig(
    student_layers=-1,
    temperature=3.0,  # Higher temperature
    alpha=0.7,  # More weight on distillation
    learning_rate=1e-4,
    warmup_steps=200,
    max_grad_norm=1.0,
    accumulation_steps=2,  # Gradient accumulation
    use_meta_learning=True,
    meta_learning_rate=5e-5,
    log_interval=10,
    eval_interval=50,
    save_interval=200,
    fp16=True,  # Mixed precision training (if CUDA available)
    seed=42
)

# Save configuration
custom_config.to_json("./output/my_config.json")

print("✅ Custom configuration created and saved!")
print(f"📊 Config: {custom_config.to_dict()}")
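
Saving the configuration as JSON makes a run easier to reproduce, since the same file can be reloaded later. The notebook only shows `to_json` and `to_dict`, so the sketch below uses a stand-in dataclass (`DemoConfig`, hypothetical) with a few of the fields above to show the round trip. The `from_json` method is an assumption and may not exist on `TransJectConfig`.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class DemoConfig:
    """Stand-in for a training config; not the real TransJectConfig."""
    student_layers: int = -1
    temperature: float = 2.0
    alpha: float = 0.5
    accumulation_steps: int = 1
    seed: int = 42

    def to_dict(self):
        return asdict(self)

    def to_json(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.to_dict(), f, indent=2)

    @classmethod
    def from_json(cls, path):
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


# Write the config to a temporary directory and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_config.json")
    DemoConfig(temperature=3.0, alpha=0.7, accumulation_steps=2).to_json(path)
    restored = DemoConfig.from_json(path)

print(restored)
```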

In [None]:
# Example with Llama models (requires HF token and model access)
# First, login to HuggingFace

from huggingface_hub import login
import os

# Get your HuggingFace token
# Option 1: Set as environment variable
# HF_TOKEN = os.getenv("HF_TOKEN")

# Option 2: Use userdata in Colab
try:
    from google.colab import userdata
    HF_TOKEN = userdata.get('HF_TOKEN')
except Exception:
    # Option 3: Paste your token here (not recommended for public repos)
    HF_TOKEN = "YOUR_HF_TOKEN_HERE"  # Replace with your actual token
    print("⚠️  Warning: Replace YOUR_HF_TOKEN_HERE with your actual token")
    print("💡 Better: Use Colab Secrets (left sidebar > 🔑 key icon) to store HF_TOKEN")

# Login to HuggingFace
if HF_TOKEN and HF_TOKEN != "YOUR_HF_TOKEN_HERE":
    login(token=HF_TOKEN)
    print("✅ Logged in to HuggingFace!")
else:
    print("❌ Please set your HuggingFace token")
    print("📝 Get your token from: https://huggingface.co/settings/tokens")

# Now create Llama model with token
print("\n🚀 Creating Llama-3-8B model...")

model_llama = AutoModel(
    student_model="meta-llama/Meta-Llama-3-8B",
    teacher_model="meta-llama/Meta-Llama-3-70B-Instruct",
    student_layers=-1,  # Use full Llama-3-8B
    temperature=2.0,
    alpha=0.5,
    token=HF_TOKEN  # Pass token for gated model access
)

print("\n✅ Llama model created successfully!")
print(f"📊 Student: {model_llama.student_model_name}")
print(f"👨‍🏫 Teacher: {model_llama.teacher_model_name}")

# Test generation
prompt = "### Instruction:\nWrite a haiku about AI\n\n### Response:\n"
inputs = model_llama.tokenizer(prompt, return_tensors="pt")

import torch
if torch.cuda.is_available():
    inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    outputs = model_llama.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.7,
        do_sample=True
    )

generated = model_llama.tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\n📝 Generated text:")
print("="*60)
print(generated)
print("="*60)
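
The token lookup at the top of this cell tries several sources in turn. A compact way to express the same fallback order is a small function that treats the placeholder string as "no token". The sketch takes the environment mapping and an optional Colab lookup callable as parameters so it can run anywhere. `resolve_hf_token` and its parameters are hypothetical names.

```python
import os

PLACEHOLDER = "YOUR_HF_TOKEN_HERE"


def resolve_hf_token(env=None, colab_lookup=None):
    """Return a usable token, or None if no source provides one.

    Order: Colab secret (if a lookup callable is given), then the
    HF_TOKEN environment variable.
    """
    env = os.environ if env is None else env
    candidates = []
    if colab_lookup is not None:
        try:
            candidates.append(colab_lookup("HF_TOKEN"))
        except Exception:
            # Secret missing or access denied: fall through to the environment
            pass
    candidates.append(env.get("HF_TOKEN"))
    for token in candidates:
        # Skip empty values and the unreplaced placeholder
        if token and token != PLACEHOLDER:
            return token
    return None


token = resolve_hf_token()
print("token found" if token else "no token configured")
```

In Colab, `colab_lookup` could be `google.colab.userdata.get`, the same call the cell above uses. Returning `None` instead of the placeholder makes it straightforward to skip the gated-model section when no token is available.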

### 3.2 Training with W&B Logging

To use Weights & Biases logging, uncomment and run the following cells:

In [None]:
# # Initialize W&B (uncomment to use)
# import wandb
# 
# wandb.login()  # You'll need to paste your API key
# 
# wandb.init(
#     project="transject-demo",
#     name="cb-classification",
#     config={
#         "student_model": "distilbert-base-uncased",
#         "teacher_model": "bert-base-uncased",
#         "task": "cb",
#         "epochs": 3
#     }
# )

In [None]:
# # Train with W&B logging (uncomment to use)
# model_classification.fit(
#     train_dataloader=train_loader_cb,
#     val_dataloader=val_loader_cb,
#     epochs=3,
#     report_to="wandb",  # Enable W&B logging
#     output_dir="./output/cb_wandb"
# )
# 
# wandb.finish()

### 3.3 Working with Llama Models

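TransJect also works with recent gated LLMs such as Llama 3. The commented cell below is a minimal variant without explicit token handling; it requires a Hugging Face login and access to the models.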

In [None]:
# # Example with Llama models (requires HF token and model access)
# # Uncomment to use
# 
# model_llama = AutoModel(
#     student_model="meta-llama/Meta-Llama-3-8B",
#     teacher_model="meta-llama/Meta-Llama-3-70B",  # If you have access
#     student_layers=-1,  # Use full Llama-3-8B
#     temperature=2.0,
#     alpha=0.5
# )
# 
# print("✅ Llama model created!")

## 🎓 Part 4: Loading Saved Models

In [None]:
# Load a saved student model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

print("📂 Loading saved model...")

# Load the model and tokenizer
loaded_model = AutoModelForSequenceClassification.from_pretrained(
    "./output/cb_basic/final_student_model"
)
loaded_tokenizer = AutoTokenizer.from_pretrained(
    "./output/cb_basic/final_student_model"
)

print("✅ Model loaded successfully!")

# Test inference
test_text = ("The economy is improving.", "The financial situation is getting better.")
inputs = loaded_tokenizer(*test_text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = loaded_model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

print("\n📊 Test inference:")
print(f"Premise: {test_text[0]}")
print(f"Hypothesis: {test_text[1]}")
print(f"Prediction: {predictions.item()} (0=entailment, 1=contradiction, 2=neutral)")
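
A raw class index is easier to read once it is mapped to a label name and a probability. The sketch below does this on a plain list of logits, using the label order printed above (0=entailment, 1=contradiction, 2=neutral). The logits are made-up values, and `describe_prediction` is a hypothetical helper.

```python
import math

ID2LABEL = {0: "entailment", 1: "contradiction", 2: "neutral"}


def describe_prediction(logits):
    """Return (label_name, probability) for the highest-scoring class."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, prob = describe_prediction([2.1, -0.4, 0.3])  # made-up logits
print(f"{label} ({prob:.2f})")
```

With the loaded model, the same function could be fed `outputs.logits[0].tolist()`.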

## 📝 Summary

In this demo, we covered:

### ✅ Classification Tasks
- Basic sequence classification with CB dataset
- Meta-learning with multiple tasks
- Model saving and loading

### ✅ Language Modeling Tasks
- Training on Alpaca instruction dataset
- Text generation with trained models
- Layer slicing for efficient knowledge transfer

### ✅ Advanced Features
- Custom configuration
- W&B integration (optional)
- Support for modern LLMs (Llama-3, etc.)

### 🎯 Key API Pattern

```python
# 1. Create model
model = SequenceClassification(  # or AutoModel
    student_model="model-name",
    teacher_model="teacher-name",
    student_layers=-1,  # -1 for full model
    **config_params
)

# 2. Create dataloaders
train_loader, val_loader = create_dataloaders(...)
meta_loaders = create_meta_dataloaders(...)  # Optional

# 3. Train
model.fit(
    train_dataloader=train_loader,
    meta_dataloader=meta_loaders,  # Optional
    val_dataloader=val_loader,
    epochs=3,
    report_to="wandb"  # Optional
)

# 4. Save
model.student_model.save_pretrained("path")
```

## 🚀 Next Steps

1. Try different model combinations
2. Experiment with hyperparameters
3. Use your own datasets
4. Enable W&B logging for better tracking
5. Try layer slicing for efficient training

## 📚 Resources

- [GitHub Repository](https://github.com/transject/transject)
- [Documentation](https://transject.readthedocs.io)
- Paper (coming soon)

Happy knowledge transferring! 🎓