# Lab 6 – Pruning an LLM with Unsloth (SST-2)

> **⚠️ IMPORTANT**: This lab requires **Google Colab with GPU enabled**
> - Go to Runtime → Change runtime type → GPU (T4 or better)
> - Unsloth requires CUDA and will not work on Mac/Windows locally
> - See `COLAB_SETUP.md` for detailed setup instructions

Pruning removes redundant neurons and weights from a neural network to reduce its size and inference time. In this lab, you'll experiment with both structured and unstructured pruning on a sentiment-classification task using the SST-2 dataset.

## Why Prune? The Trade-offs

**Benefits of Pruning:**
- 🚀 **Faster Inference**: Fewer parameters can mean less computation, but only when whole rows, channels, or heads are removed (structured pruning) or the runtime has sparse kernels that skip zeros
- 💾 **Memory Savings**: A smaller model uses less RAM/VRAM, provided the pruned weights are physically removed or stored in a sparse format
- 📱 **Deployment**: Easier to deploy on edge devices
- ⚡ **Energy Efficiency**: Less computation = lower power consumption

**Trade-offs:**
- 📉 **Accuracy Loss**: Removing parameters can hurt performance
- 🔧 **Tuning Required**: Finding the right sparsity level is crucial
- ⚖️ **Balance**: More pruning = more speed, but potentially more accuracy loss

## Objectives

- Fine-tune a model for sentiment analysis on the SST-2 dataset
- **Evaluate baseline performance** before pruning (accuracy, speed, memory)
- Apply pruning techniques to remove unnecessary parameters
- **Compare performance** after pruning (accuracy vs. speed trade-offs)
- Measure sparsity, model size reduction, and changes in inference speed and accuracy
- **Analyze the trade-offs**: How much accuracy do we lose for how much speed gain?

Use Unsloth to fine-tune the model and PyTorch's pruning utilities (e.g., `torch.nn.utils.prune`) to perform the pruning. Adjust hyperparameters to explore different sparsity levels.
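Whether pruning actually saves memory or time depends on how the pruned weights end up stored. The minimal sketch below uses only `torch.nn.utils.prune` on two hypothetical 512×512 linear layers (the size is arbitrary). Unstructured pruning zeroes individual weights but leaves a dense tensor of the same shape, so nothing shrinks. Structured pruning zeroes whole output rows, which can then be sliced off to build a genuinely smaller layer. This standalone sketch imports only `torch`; in the Colab notebook itself, keep `unsloth` as the first import.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)


def zero_fraction(t: torch.Tensor) -> float:
    """Fraction of entries in `t` that are exactly zero."""
    return (t == 0).sum().item() / t.numel()


# Unstructured: zero the 50% of individual weights with the smallest |w|.
unstructured = nn.Linear(512, 512)
prune.l1_unstructured(unstructured, name="weight", amount=0.5)
prune.remove(unstructured, "weight")  # fold the mask into an ordinary weight tensor
print("unstructured:", tuple(unstructured.weight.shape),
      f"zeros={zero_fraction(unstructured.weight):.0%}")

# Structured: zero the 50% of output rows (dim=0) with the smallest L2 norm.
structured = nn.Linear(512, 512)
prune.ln_structured(structured, name="weight", amount=0.5, n=2, dim=0)
prune.remove(structured, "weight")

# Whole rows are zero, so they can be dropped to build a smaller dense layer.
keep = structured.weight.abs().sum(dim=1) != 0  # True for rows that survived
smaller = nn.Linear(512, int(keep.sum()))
with torch.no_grad():
    smaller.weight.copy_(structured.weight[keep])
    smaller.bias.copy_(structured.bias[keep])
print("structured -> sliced:", tuple(smaller.weight.shape))
```

The unstructured layer keeps its full (512, 512) shape, so a dense matmul does the same amount of work. Slicing rows changes the layer's output width, so in a real network the next layer's input columns would have to be sliced to match; that bookkeeping is what makes structured pruning harder to apply but easier to turn into real speed-ups.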

In [None]:
# Install Unsloth using the official auto-install script
# This automatically detects your environment and installs the correct version
!wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -

# Alternative manual installation if auto-install fails:
# !pip install --upgrade pip
# !pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# !pip install "unsloth_zoo @ git+https://github.com/unslothai/unsloth-zoo.git"

print("✅ Unsloth installation complete! Now restart the runtime before proceeding.")
print("⚠️ IMPORTANT: Use GPU runtime, not TPU! Unsloth requires CUDA GPU.")

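> **⚠️ CRITICAL IMPORT ORDER**:
> - Always import `unsloth` FIRST, before any other ML libraries
> - Unsloth patches libraries such as `transformers` and `peft` when it is imported; importing it later can skip those patches and cause errors (e.g., during weight initialization)
> - Example: `from unsloth import FastLanguageModel`, then `import torch`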


### Step 1: Load the SST-2 dataset

**Documentation:**
- GLUE benchmark: https://huggingface.co/datasets/glue
- SST-2 dataset: https://huggingface.co/datasets/glue/viewer/sst2


In [None]:
# TODO: Import datasets and tokenizer libraries

# TODO: Load subsets of the SST-2 train and validation splits
# Hint: Use load_dataset('glue', 'sst2', split='train[:5%]')
# Note: SST-2 test labels are hidden (all -1), so evaluate on the validation split

# TODO: Print a sample from the dataset

# TODO: Initialize the tokenizer from the same base model you will fine-tune in Step 2

# TODO: Define max_length for tokenization

# TODO: Create a tokenization function that:
#   - Tokenizes the sentence field
#   - Uses padding='max_length' and truncation=True

# TODO: Apply tokenization to the train and validation datasets

# TODO: Print confirmation that the tokenized dataset is ready
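The tokenization TODO can be sketched as a small factory that returns a batch function for `datasets.Dataset.map(..., batched=True)`. This is a minimal sketch, not the lab's reference solution: `make_tokenize_fn` is a hypothetical helper, `max_length=128` is an assumed default (SST-2 sentences are short), and the pad-token fallback assumes a decoder-only tokenizer that may ship without one. The function itself needs no downloads, so it can be checked with any callable that behaves like a Hugging Face tokenizer.

```python
from typing import Any, Callable, Dict, List


def make_tokenize_fn(tokenizer: Any, max_length: int = 128) -> Callable[[Dict[str, List]], Any]:
    """Return a batch function suitable for `Dataset.map(..., batched=True)`."""
    # Some decoder-only LLM tokenizers have no pad token; reuse EOS so padding works.
    if getattr(tokenizer, "pad_token", None) is None:
        tokenizer.pad_token = tokenizer.eos_token

    def tokenize(batch: Dict[str, List]) -> Any:
        # SST-2 stores the text in "sentence" (and the 0/1 label in "label").
        return tokenizer(
            batch["sentence"],
            padding="max_length",  # pad every example to exactly max_length tokens
            truncation=True,       # cut longer sentences down to max_length
            max_length=max_length,
        )

    return tokenize
```

In the notebook you would build it from the tokenizer of your Step 2 base model and apply it with `train_ds.map(tokenize_fn, batched=True)` and `val_ds.map(tokenize_fn, batched=True)`; the `label` column is kept alongside the new `input_ids` and `attention_mask` columns.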

### Step 2: Fine-tune a sentiment classifier on SST-2

**Documentation:**
- Transformers training: https://huggingface.co/docs/transformers/training
- Sequence classification: https://huggingface.co/docs/transformers/tasks/sequence_classification


In [None]:
# CRITICAL: Import unsloth FIRST to avoid weights/biases initialization errors
# TODO: Import torch and FastLanguageModel from unsloth

# TODO: Load a base model for classification
# Example: "unsloth/Qwen2.5-7B-Instruct-bnb-4bit"
# Note: torch.nn.utils.prune needs floating-point weights, while 4-bit (bnb) layers store
#   packed weights; if you plan to prune in Step 3, load or merge the model in 16-bit
#   (a 7B model in 16-bit will not fit on a T4, so consider a smaller base model)

# TODO: Attach a classification head to the model
# Hint: You may need to use AutoModelForSequenceClassification

# TODO: Create data loaders for training and validation

# TODO: Implement training loop:
#   - Define optimizer and loss function
#   - For each epoch:
#     - For each batch:
#       - Zero the gradients
#       - Forward pass
#       - Calculate loss
#       - Backward pass
#       - Update weights

# TODO: Print confirmation that fine-tuning is complete
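The training-loop TODO follows the standard PyTorch pattern. To keep it runnable without downloading an LLM, this sketch uses a hypothetical stand-in: random 32-dimensional "sentence" features with a learnable 0/1 label and a two-layer MLP in place of "base model + classification head". With a Hugging Face classification model you would instead pass the tokenized `input_ids`/`attention_mask` batches and feed the output's `.logits` to the loss; the learning rate here suits the toy model only (LLM fine-tuning typically uses a much smaller one).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 256 feature vectors; label = 1 if the first feature is positive.
features = torch.randn(256, 32)
labels = (features[:, 0] > 0).long()
train_loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

# Stand-in for "base model + classification head": outputs 2 logits (negative/positive).
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


def train(num_epochs: int = 5) -> list:
    """Run the epoch/batch loop and return the mean training loss per epoch."""
    epoch_losses = []
    model.train()
    for epoch in range(num_epochs):
        running = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()        # clear gradients from the previous step
            logits = model(x)            # forward pass
            loss = loss_fn(logits, y)    # cross-entropy over the 2 class logits
            loss.backward()              # backward pass
            optimizer.step()             # update weights
            running += loss.item() * x.size(0)
        epoch_losses.append(running / len(train_loader.dataset))
        print(f"epoch {epoch + 1}: loss={epoch_losses[-1]:.4f}")
    return epoch_losses


losses = train()
print("✅ Fine-tuning complete")
```

Calling `optimizer.zero_grad()` at the top of each step matters because PyTorch accumulates gradients across `backward()` calls by default.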

### Step 3: Apply pruning to the fine-tuned model

**Documentation:**
- PyTorch pruning tutorial: https://pytorch.org/tutorials/intermediate/pruning_tutorial.html
- torch.nn.utils.prune: https://pytorch.org/docs/stable/nn.html#utilities
- **Note**: Unsloth doesn't have specific pruning examples, but you can:
  - Fine-tune with Unsloth first: [Qwen 2.5 example](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb)
  - Then apply PyTorch pruning to the trained model
  - Pruning is typically done as a post-training optimization


In [None]:
# TODO: Import pruning utilities from torch

# TODO: Apply unstructured pruning to linear layers
# Example approach:
#   - Iterate through model.named_modules()
#   - For each Linear layer, apply prune.l1_unstructured()
#   - Choose pruning amount (e.g., 0.2 for 20%)

# TODO: Alternatively, try structured pruning
# Hint: Use prune.ln_structured() for structured pruning

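# TODO: Call prune.remove(module, 'weight') on each pruned layer to make the pruning permanent
#   (until then, model.parameters() still holds the dense weight_orig tensors, so a zero
#   count over the parameters in Step 4 would miss the pruning)

# TODO: Print confirmation that pruning is complete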
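A minimal sketch of the Step 3 loop: walk `model.named_modules()`, prune every `nn.Linear` weight with either `prune.l1_unstructured` or `prune.ln_structured`, then call `prune.remove` so the zeros land in the real `weight` parameter. It assumes the linear layers hold ordinary floating-point weights (for example a 16-bit model with any LoRA adapters already merged); 4-bit bitsandbytes layers store packed weights and should not be pruned this way. `prune_linear_layers` and its `skip` names (`score`, `classifier`, `lm_head`) are hypothetical choices for leaving the output head dense; adjust them to your model's module names.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune


def prune_linear_layers(model: nn.Module, amount: float = 0.2, structured: bool = False,
                        skip: tuple = ("score", "classifier", "lm_head")) -> int:
    """Prune every nn.Linear weight in place, make it permanent, return the layer count."""
    pruned = 0
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        if name.split(".")[-1] in skip:
            continue  # hypothetical choice: keep the output head dense
        if structured:
            # Zero whole output rows (dim=0) with the smallest L2 norm.
            prune.ln_structured(module, name="weight", amount=amount, n=2, dim=0)
        else:
            # Zero individual weights with the smallest absolute value.
            prune.l1_unstructured(module, name="weight", amount=amount)
        # Replace weight_orig * weight_mask with a plain weight tensor containing the zeros.
        prune.remove(module, "weight")
        pruned += 1
    return pruned
```

For example, `prune_linear_layers(model, amount=0.5, structured=True)` runs the structured variant at 50%. Because pruning modifies the model in place, work on a copy (e.g., `copy.deepcopy(model)`) if you want to compare several sparsity levels against the same fine-tuned weights.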


### Step 4: Evaluate pruned model and measure sparsity

**Documentation:**
- Model evaluation: https://huggingface.co/docs/transformers/tasks/sequence_classification#evaluate


In [None]:
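# TODO: Evaluate the pruned model on the validation set
#   (record the same metrics for the unpruned baseline before Step 3, since pruning
#   modifies the model in place)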

# TODO: Compute sparsity
# Hint: Create a function that:
#   - Counts total parameters
#   - Counts zero parameters
#   - Calculates sparsity = zero_params / total_params

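# TODO: Measure inference latency
#   (e.g., average time per forward pass on a fixed batch after a few warm-up runs;
#   on GPU, call torch.cuda.synchronize() before reading the timer)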

# TODO: Print evaluation results including:
#   - Accuracy
#   - Sparsity percentage
#   - Inference latency
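The three Step 4 measurements can be sketched as small helpers. These are illustrative, not a prescribed implementation: `compute_sparsity`, `evaluate_accuracy`, and `measure_latency` are hypothetical names, the accuracy helper assumes batches of plain `(inputs, labels)` tensors and a model that returns logits directly (for a Hugging Face classifier you would call `model(input_ids=..., attention_mask=...).logits` instead), and the warm-up and iteration counts are arbitrary defaults.

```python
import time

import torch
import torch.nn as nn


def compute_sparsity(model: nn.Module) -> float:
    """Fraction of all parameter entries that are exactly zero (run after prune.remove)."""
    total, zeros = 0, 0
    for p in model.parameters():
        total += p.numel()
        zeros += (p == 0).sum().item()
    return zeros / total


@torch.no_grad()
def evaluate_accuracy(model: nn.Module, loader) -> float:
    """Accuracy of argmax predictions over an iterable of (inputs, labels) batches."""
    model.eval()
    device = next(model.parameters()).device
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=-1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total


@torch.no_grad()
def measure_latency(model: nn.Module, example: torch.Tensor,
                    warmup: int = 3, iters: int = 20) -> float:
    """Average wall-clock seconds per forward pass on one fixed input batch."""
    model.eval()
    for _ in range(warmup):        # warm-up runs: kernel selection, allocator caches
        model(example)
    if example.is_cuda:
        torch.cuda.synchronize()   # CUDA launches are asynchronous; drain the queue first
    start = time.perf_counter()
    for _ in range(iters):
        model(example)
    if example.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

Counting over every parameter, including embeddings and layers you chose not to prune, gives the model-wide sparsity; counting only the pruned `nn.Linear` weights would report a higher, per-target figure, so state which one you report.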


## Reflection

- What sparsity levels did you achieve with different pruning configurations (e.g., 20%, 50%)?
- How did pruning affect accuracy and inference latency? Did structured pruning behave differently from unstructured pruning?
- Discuss how pruning, combined with quantization or distillation, could make LLMs more viable for deployment on resource-constrained devices.
