# 🔬 DiffLlama vs Llama: Google Colab Experiment

This notebook runs a comparative experiment on Google Colab to evaluate the noise robustness of DiffLlama and Llama on mathematical reasoning tasks.

## 📋 Experiment Overview
- **Objective**: Compare DiffLlama-375M and Llama-375M performance on noisy math problems
- **Dataset**: GSM8K math reasoning dataset and its noisy variants
- **Evaluation**: Zero-shot performance + attention mechanism analysis
- **Environment**: Google Colab (GPU recommended)

---

## 🚀 Step 1: Environment Setup

First, check the runtime environment and configure necessary settings.

In [None]:
# Check GPU availability
import torch
print(f"🖥️  CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🔧 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️  No GPU detected. Experiment will be slow on CPU.")

🖥️  CUDA available: False
⚠️  No GPU detected. Experiment will be slow on CPU.


In [None]:
# Clone from Git repository if project files are not in current directory
# Replace with your actual repository URL
import os
if not os.path.exists('colab/experiment.py'):
    print("📥 Cloning repository...")
    !git clone https://github.com/github-bowen/DiffLlama-Math-Robustness.git
    print("📥 Copying files...")
    !cp -r DiffLlama-Math-Robustness/* .
    print("📥 Removing repository...")
    !rm -rf DiffLlama-Math-Robustness
    print("📥 Done")
else:
    print("✅ Project files found")

✅ Project files found


## 📁 Step 2: Upload Project Files

If you didn't clone using Git, manually upload the following files to Colab:

**Required Files**:
- `colab/experiment.py` (main Colab script, the path invoked by every command in this notebook)
- `pre_download_models.py` (model download script)
- All Python files in the `src/` directory
- `requirements.txt`

Use Colab's file upload feature or copy files from Google Drive.
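After a manual upload it can help to confirm that nothing is missing before running any command. The following is a minimal sketch, not part of the project: the helper name `find_missing` is hypothetical, the paths are taken from the list above (using the `colab/experiment.py` path that the commands in this notebook invoke), and it assumes they are relative to the current working directory.

```python
from pathlib import Path

# Paths taken from the required-files list; adjust them if your layout differs.
REQUIRED_PATHS = [
    "colab/experiment.py",     # script invoked by the commands in this notebook
    "pre_download_models.py",  # model download script
    "requirements.txt",
    "src",                     # directory holding the project's Python modules
]

def find_missing(root=".", required=REQUIRED_PATHS):
    """Return the required paths that do not exist under root (hypothetical helper)."""
    base = Path(root)
    return [p for p in required if not (base / p).exists()]

missing = find_missing()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found")
```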

## 📖 Step 3: View Usage Instructions

Run the command below to view detailed usage instructions and options.

In [None]:
# Display usage instructions
!python colab/experiment.py --instructions


🎯 GOOGLE COLAB USAGE INSTRUCTIONS

1. 📱 Basic Setup (Run once):
   !python colab/experiment.py --setup

2. 🚀 Quick Test (Recommended first run):
   !python colab/experiment.py --mode quick

3. 📊 Medium Experiment:
   !python colab/experiment.py --mode medium

4. 🔬 Full Experiment:
   !python colab/experiment.py --mode full --max-samples 500

🔧 Options:
   --mode: quick/medium/full
  : Save models and results to Google Drive
   --max-samples: Limit number of evaluation samples
   --skip-attention: Skip attention analysis to save time
   --help: Show all options

💡 Tips:
   - Use to persist models across sessions
   - Start with quick mode to verify everything works
   - Monitor GPU memory usage in Colab

📁 Results will be saved to:
   - Local: /content/results/
   - Drive: /content/drive/MyDrive/DiffLlama_Experiment/results/



## 🔧 Step 4: Initial Setup

Run initial setup to install dependencies and configure the environment.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Run initial setup (includes Google Drive mounting)
!python colab/experiment.py --setup

🔬 DIFFLAMA VS LLAMA - GOOGLE COLAB EXPERIMENT
🕐 Start time: 2025-06-01 23:13:08
📦 Installing dependencies...
✅ Dependencies installed
🔧 Setting up Colab environment...
🔗 Mounting Google Drive...
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ Google Drive mounted successfully
📁 Using Google Drive storage: /content/drive/MyDrive/DiffLlama_Experiment
  ✓ cache -> /content/drive/MyDrive/DiffLlama_Experiment/models
  ✓ data -> /content/drive/MyDrive/DiffLlama_Experiment/data
  ✓ results -> /content/drive/MyDrive/DiffLlama_Experiment/results
✅ Setup completed


In [None]:
!ls -al

total 100
drwxr-xr-x 1 root root  4096 Jun  1 23:14 .
drwxr-xr-x 1 root root  4096 Jun  1 23:06 ..
lrwxrwxrwx 1 root root    50 Jun  1 23:14 cache -> /content/drive/MyDrive/DiffLlama_Experiment/models
drwxr-xr-x 2 root root  4096 Jun  1 23:14 colab
drwxr-xr-x 4 root root  4096 May 29 14:01 .config
lrwxrwxrwx 1 root root    48 Jun  1 23:14 data -> /content/drive/MyDrive/DiffLlama_Experiment/data
-rw-r--r-- 1 root root 16921 Jun  1 23:07 DiffLlama_Colab_Experiment.ipynb
drwxr-xr-x 2 root root  4096 Jun  1 23:07 docs
drwx------ 6 root root  4096 Jun  1 23:10 drive
-rw-r--r-- 1 root root  2663 Jun  1 23:07 interactive_inference.py
-rw-r--r-- 1 root root  1074 Jun  1 23:07 LICENSE
-rw-r--r-- 1 root root 16092 Jun  1 23:07 main.py
drwxr-xr-x 2 root root  4096 Jun  1 23:11 models_finetuned
-rw-r--r-- 1 root root  8948 Jun  1 23:07 README.md
-rw-r--r-- 1 root root   219 Jun  1 23:13 requirements.txt
lrwxrwxrwx 1 root root    51 Jun  1 23:14 results -> /content/drive/MyDrive/DiffLlama_Experimen

## 🚀 Step 5: Run Experiments

Choose an appropriate experiment mode based on your needs:

### 🏃 Quick Test (Recommended for First Run)
Validates the experiment workflow on a small number of samples. Takes about 30-60 minutes.

In [None]:
# Quick test mode
!python colab/experiment.py --mode quick

### 📊 Medium-Scale Experiment
Use a moderate number of samples, balancing time and result quality.

In [None]:
# Medium-scale experiment (make sure quick test runs successfully first)
!python colab/experiment.py --mode medium

### 🔬 Full Experiment
Runs the full-scale experiment, which may take several hours. Note that the command below limits evaluation to 500 samples via `--max-samples`.

In [None]:
# Full experiment (run only when you have enough time)
!python colab/experiment.py --mode full --max-samples 500

### 🛠 Custom Experiment
Adjust experiment parameters as needed.

In [None]:
# Custom experiment example
# Only run evaluation, skip attention analysis to save time
!python colab/experiment.py --mode medium --skip-attention --max-samples 100
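When trying several parameter combinations, assembling the command in Python helps avoid typos in the script path or flag names. This is only a sketch: `build_command` is a hypothetical helper, and it uses just the options listed in the usage instructions (`--mode`, `--max-samples`, `--skip-attention`).

```python
import shlex

def build_command(mode="quick", max_samples=None, skip_attention=False):
    """Assemble the experiment command line from the documented options."""
    if mode not in ("quick", "medium", "full"):
        raise ValueError(f"unknown mode: {mode!r}")
    args = ["python", "colab/experiment.py", "--mode", mode]
    if max_samples is not None:
        args += ["--max-samples", str(int(max_samples))]
    if skip_attention:
        args.append("--skip-attention")
    # shlex.join quotes each argument safely for the shell
    return shlex.join(args)

cmd = build_command("medium", max_samples=100, skip_attention=True)
print(cmd)
```

In a Colab cell, the resulting string can then be run with `!{cmd}`.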

## 📊 Step 6: View Experiment Results

After completing the experiment, review the generated result files.

In [None]:
# List generated result files
!ls -la results/

In [None]:
# View the latest experiment summary
import json
import glob

# Find the latest summary file
summary_files = glob.glob('results/colab_summary_*.json')
if summary_files:
    latest_summary = max(summary_files)
    print(f"📋 Latest experiment summary: {latest_summary}")

    with open(latest_summary, 'r') as f:
        summary = json.load(f)

    print("\n📊 Experiment Summary:")
    for key, value in summary.items():
        print(f"  {key}: {value}")
else:
    print("No experiment summary found. Please run an experiment first.")
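Calling `max()` on file names returns the lexicographically largest name, which is the newest file only if the names embed a sortable timestamp. If that is uncertain, selecting by modification time is an alternative. A minimal sketch, assuming modification times reflect when each run finished; `latest_file` is a hypothetical helper name:

```python
import glob
import os

def latest_file(pattern):
    """Return the most recently modified file matching pattern, or None."""
    matches = glob.glob(pattern)
    return max(matches, key=os.path.getmtime) if matches else None

print(latest_file('results/colab_summary_*.json'))
```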

In [None]:
# Display main results
import glob
import pandas as pd

# Find the latest results file
result_files = glob.glob('results/colab_results_*.csv')
if result_files:
    latest_results = max(result_files)
    print(f"📈 Latest results: {latest_results}")

    df = pd.read_csv(latest_results)
    # Rows are models, columns are datasets, cells are accuracies
    pivot_df = df.pivot(index='model', columns='dataset', values='accuracy')
    print("\n📊 Performance Comparison:")
    print(pivot_df)

    # Calculate performance differences
    if 'llama' in pivot_df.index and 'diffllama' in pivot_df.index:
        print("\n🔍 Performance Difference (DiffLlama - Llama):")
        diff = pivot_df.loc['diffllama'] - pivot_df.loc['llama']
        print(diff)
else:
    print("No results found. Please run an experiment first.")
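Beyond the model-to-model difference, robustness can also be read from how much accuracy each model loses when moving from the clean dataset to a noisy one. The sketch below works on plain rows with the same `model`, `dataset`, and `accuracy` fields as the CSV. The dataset labels `clean` and `noisy`, the helper name `accuracy_drop`, and the numbers are placeholders for illustration, not experiment results.

```python
def accuracy_drop(rows, clean_label="clean"):
    """Map (model, dataset) -> accuracy lost relative to that model's clean score."""
    acc = {(r["model"], r["dataset"]): float(r["accuracy"]) for r in rows}
    drops = {}
    for (model, dataset), value in acc.items():
        clean = acc.get((model, clean_label))
        if dataset == clean_label or clean is None:
            continue  # nothing to compare against
        drops[(model, dataset)] = clean - value
    return drops

# Placeholder values only, chosen to make the arithmetic easy to follow
example_rows = [
    {"model": "llama", "dataset": "clean", "accuracy": 0.5},
    {"model": "llama", "dataset": "noisy", "accuracy": 0.25},
    {"model": "diffllama", "dataset": "clean", "accuracy": 0.5},
    {"model": "diffllama", "dataset": "noisy", "accuracy": 0.375},
]
print(accuracy_drop(example_rows))
```

With the real results, the rows can be obtained from the DataFrame via `df.to_dict('records')`, passing the actual clean dataset label as `clean_label`.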

## 📈 Step 7: Results Visualization

If your experiment included attention analysis, you can view the generated attention heatmaps.

In [None]:
# Display attention heatmaps
import matplotlib.pyplot as plt
from IPython.display import Image, display
import os

attention_dir = 'results/attention_maps'
if os.path.exists(attention_dir):
    print("🧠 Attention Visualization Files:")

    # List all attention map files
    for root, dirs, files in os.walk(attention_dir):
        for file in files:
            if file.endswith('.png'):
                file_path = os.path.join(root, file)
                print(f"  📊 {file_path}")

                # Display images (optional, uncomment to show)
                # display(Image(file_path))
else:
    print("No attention maps found. Run experiment with attention analysis enabled.")

In [None]:
# Display attention analysis results
import glob
import json

attention_files = glob.glob('results/colab_attention_*.json')
if attention_files:
    latest_attention = max(attention_files)
    print(f"🧠 Latest attention analysis: {latest_attention}")

    with open(latest_attention, 'r') as f:
        attention_data = json.load(f)

    print("\n📊 Attention Allocation Analysis:")
    for model, data in attention_data.items():
        print(f"\n{model.upper()} Model:")
        for condition, stats in data.items():
            print(f"  {condition.capitalize()}:")
            print(f"    KMI (Key Math Info): {stats['kmi_mean']:.3f} ± {stats['kmi_std']:.3f}")
            print(f"    NI (Noise Info): {stats['ni_mean']:.3f} ± {stats['ni_std']:.3f}")
            print(f"    OC (Other Context): {stats['oc_mean']:.3f} ± {stats['oc_std']:.3f}")
else:
    print("No attention analysis found. Run experiment with attention analysis enabled.")
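One way to condense these statistics into a single number per condition is the ratio of attention on key math information to attention on noise. The sketch below assumes the JSON structure read above (`model -> condition -> stats` with `kmi_mean` and `ni_mean`). The helper name `kmi_to_ni_ratio` and the example values are hypothetical, and conditions with zero or missing `ni_mean` (for instance clean inputs without noise tokens) are skipped.

```python
def kmi_to_ni_ratio(attention_data):
    """Return {model: {condition: kmi_mean / ni_mean}}, skipping zero or missing NI."""
    ratios = {}
    for model, conditions in attention_data.items():
        ratios[model] = {
            condition: stats["kmi_mean"] / stats["ni_mean"]
            for condition, stats in conditions.items()
            if stats.get("ni_mean")
        }
    return ratios

# Placeholder values only, not experiment results
example_attention = {
    "llama": {
        "clean": {"kmi_mean": 0.5, "ni_mean": 0.0},
        "noisy": {"kmi_mean": 0.5, "ni_mean": 0.25},
    }
}
print(kmi_to_ni_ratio(example_attention))
```

A higher ratio under the same noisy condition would suggest that a model allocates relatively more attention to the relevant math content than to the injected noise.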

## 💾 Step 8: Download Results

Download experiment results locally or ensure they are saved in Google Drive.

In [None]:
# Compress result files for download
import os
import zipfile
from datetime import datetime

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
zip_filename = f'experiment_results_{timestamp}.zip'

# ZIP_DEFLATED enables compression; the default (ZIP_STORED) only archives
with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
    # Add all files from results directory
    for root, dirs, files in os.walk('results'):
        for file in files:
            file_path = os.path.join(root, file)
            zipf.write(file_path)

print(f"📦 Results packaged in: {zip_filename}")
print("You can download this file from Colab's Files panel.")

# Reminder if Google Drive was used
if os.path.exists('/content/drive/MyDrive/DiffLlama_Experiment'):
    print("\n💾 Results are also saved in Google Drive:")
    print("  /content/drive/MyDrive/DiffLlama_Experiment/")
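As an alternative to the Files panel, Colab can trigger a browser download programmatically through `google.colab.files.download`. The wrapper below is a small sketch. `download_in_colab` is a hypothetical helper that returns `False` when the `google.colab` module is unavailable, so the cell does not fail outside Colab.

```python
def download_in_colab(path):
    """Trigger a browser download in Colab; return False outside Colab."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False
    files.download(path)
    return True
```

Calling `download_in_colab(zip_filename)` after the archive has been created starts the download of the packaged results.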

## 🛠 Troubleshooting

If you encounter issues, try the following solutions:

In [None]:
# Clear GPU memory cache
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("✅ GPU cache cleared")

# Check available memory
import psutil
memory = psutil.virtual_memory()
print(f"💾 RAM: {memory.available / 1e9:.1f}GB available / {memory.total / 1e9:.1f}GB total")

In [None]:
# If memory is still insufficient, restart the runtime by killing the kernel process.
# This clears all variables and loaded models, so use with caution.
# import os
# os.kill(os.getpid(), 9)

## 🎯 Experiment Conclusions

Based on the experiment results, you can analyze the following key questions:

1. **Noise Robustness**: Does DiffLlama perform better on noisy data?
2. **Attention Mechanism**: Is differential attention more effective at focusing on key information?
3. **Performance Degradation**: How does each model's performance change across different noise types?

---

**Thank you for using this experiment framework!** 🎉

If you run into issues, please check whether:
- GPU memory is sufficient
- All required files are uploaded
- The network connection is stable

**Tip**: It's recommended to run the quick test mode first to validate the environment before running the full experiment.