<a href="https://colab.research.google.com/github/peremartra/llama-glu-expansion-pruning/blob/main/notebooks/04_Paper_Graphics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GLU Pruning Research - Llama-3.2-1B Benchmark Analysis
## 04 - Visualization and Analysis of Benchmark Results

### Exploring the dichotomy between knowledge degradation and reasoning improvement
by [Pere Martra](https://github.com/peremartra)

[![Paper](https://img.shields.io/badge/OSF-Paper-blue?logo=osf&logoColor=white)](https://doi.org/10.31219/osf.io/qgxea)
[![GitHub](https://img.shields.io/badge/⭐_Star-OptiPFair-orange?logo=github&logoColor=white)](https://github.com/peremartra/optipfair)
[![PyPI](https://img.shields.io/pypi/v/optipfair?logo=python&logoColor=white&label=v)](https://pypi.org/project/optipfair/)

**Repository:** [github.com/peremartra/llama-glu-expansion-pruning](https://github.com/peremartra/llama-glu-expansion-pruning)

---

**Colab Environment:** CPU (no GPU required)

**Estimated Runtime:** ~5 minutes

## Objective
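This notebook loads the complete evaluation results for the Llama-3.2-1B model from `llama_1b_complete_results_latest.json`. The Llama-3.2-3B results file, `llama_3b_complete_results_latest.json`, is downloaded and loaded alongside it.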

The main goal is to visualize the impact of pruning, with **`expansion_rate`** as the independent variable (X-axis). We explore the "capability trade-off" hypothesis by identifying which benchmarks degrade ("Fragile Capabilities") and which remain robust or even improve ("Robust Capabilities") as the expansion rate is reduced.
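
To make the X-axis concrete, here is a minimal sketch of how an expansion rate could be computed. It assumes the rate is the ratio of the GLU MLP's intermediate size to the model's hidden size, expressed as a percentage; that definition is an assumption, not taken from the results file. The function names are hypothetical and not part of this repository. The sizes used (hidden size 2048, intermediate size 8192) are the values from the published Llama-3.2-1B configuration.

```python
def expansion_rate(hidden_size: int, intermediate_size: int) -> float:
    """Return the MLP expansion rate as a percentage of the hidden size."""
    if hidden_size <= 0:
        raise ValueError("hidden_size must be positive")
    return 100.0 * intermediate_size / hidden_size


def pruned_intermediate_size(intermediate_size: int, pruning_pct: float) -> int:
    """Number of intermediate neurons left after removing pruning_pct percent."""
    if not 0 <= pruning_pct < 100:
        raise ValueError("pruning_pct must be in [0, 100)")
    return int(round(intermediate_size * (1 - pruning_pct / 100)))


# Llama-3.2-1B configuration values (hidden size, MLP intermediate size)
HIDDEN, INTERMEDIATE = 2048, 8192

baseline_rate = expansion_rate(HIDDEN, INTERMEDIATE)
half_pruned_rate = expansion_rate(HIDDEN, pruned_intermediate_size(INTERMEDIATE, 50))
print(baseline_rate, half_pruned_rate)  # 400.0 200.0
```

Under this assumed definition, removing half of the intermediate neurons moves the model from a 400% to a 200% expansion rate, so lower X-axis values correspond to more aggressive pruning.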

In [None]:
# === 1. Setup & Imports ===

# Install necessary libraries
!pip install pandas matplotlib seaborn

# Import libraries
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Configure plots for better readability
sns.set_theme(style="whitegrid", palette="muted")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 14



In [None]:
# === 2. Load Data ===

# Download the results files from the GitHub repository.
# If a download fails, upload the JSON files to the Colab environment manually.
!wget -q https://raw.githubusercontent.com/peremartra/llama-glu-expansion-pruning/main/results/llama_1b_complete_results_latest.json
!wget -q https://raw.githubusercontent.com/peremartra/llama-glu-expansion-pruning/main/results/llama_3b_complete_results_latest.json

# Verify download
import os
if os.path.exists('llama_1b_complete_results_latest.json'):
    print("✅ llama_1b_complete_results_latest.json downloaded successfully")
else:
    print("❌ Failed to download llama_1b_complete_results_latest.json")

# Load the JSON data
data = data1b = data3b = None  # Stay None if loading fails
try:
    file_path_1b = 'llama_1b_complete_results_latest.json'
    with open(file_path_1b, 'r') as f:
        data1b = json.load(f)
    file_path_3b = 'llama_3b_complete_results_latest.json'
    with open(file_path_3b, 'r') as f:
        data3b = json.load(f)
    # 'data' refers to the 1B results, the primary dataset for this notebook
    data = data1b
    print("File loaded successfully into 'data' variable.")
except Exception as e:
    print(f"ERROR: Could not read or parse JSON file. {e}")

if data:
    models_data = data['models_evaluated']
    # print(json.dumps(models_data['baseline'], indent=2)) # Uncomment to inspect baseline data

✅ llama_1b_complete_results_latest.json downloaded successfully
File loaded successfully into 'data' variable.


## Section 1: Key Hypothesis Graphs (The "Trade-off")

This section focuses on our core hypothesis. By plotting normalized scores, we can directly visualize the trade-off: some capabilities (Fragile) collapse, while others (Robust) are maintained or even improve.
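
As a rough illustration of what "normalized" means here, the sketch below divides each benchmark score by its baseline (unpruned) value, so the baseline sits at 1.0 and every curve reads as a fraction of original performance. This is one common normalization and an assumption about the approach, not necessarily the exact method used for the paper's figures. The helper `normalize_to_baseline`, the column names, and all numbers are placeholders for illustration, not values from the results files.

```python
import pandas as pd

# Placeholder scores for illustration only; NOT results from the paper.
toy = pd.DataFrame({
    "expansion_rate": [400, 300, 200],  # highest rate = unpruned baseline (assumed)
    "benchmark_a": [0.50, 0.45, 0.30],
    "benchmark_b": [0.40, 0.42, 0.44],
})


def normalize_to_baseline(df, x_col="expansion_rate"):
    """Divide every score column by its value at the highest expansion rate."""
    baseline = df.loc[df[x_col].idxmax()]           # row of the unpruned model
    score_cols = [c for c in df.columns if c != x_col]
    out = df.copy()
    out[score_cols] = df[score_cols] / baseline[score_cols]
    return out


normalized = normalize_to_baseline(toy)
print(normalized)
```

With this scaling, a curve that stays at or above 1.0 as the expansion rate drops would count as robust in the terminology above, while one that falls well below 1.0 would count as fragile.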