# SWS without Hyperpriors - LeNet-300-100

Run soft weight-sharing (SWS) on LeNet-300-100 **without hyperpriors**, with `init_sigma=0.25`.

## Configuration
- **Model**: LeNet-300-100 (pretrained)
- **Hyperpriors**: Disabled (`--no-hyperpriors`)
- **Init sigma**: 0.25
- **Epochs**: 100 retraining epochs, starting from the pretrained checkpoint (pretraining skipped)
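
For orientation, here is a minimal sketch of what the mixture prior contributes in soft weight-sharing: the negative log-probability of the weights under a Gaussian mixture. It assumes (not confirmed by `run_sws.py`) a component fixed at zero plus evenly spaced non-zero means, all starting at standard deviation `init_sigma`. `init_mixture`, `mixture_log_prob` and `complexity_cost` are hypothetical names. With hyperpriors disabled, we assume no extra log-prior terms over the means, standard deviations or mixing proportions are added to this cost.

```python
import math


def log_normal(w, mu, sigma):
    """Log-density of N(w | mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (w - mu) ** 2 / (2 * sigma**2)


def mixture_log_prob(w, means, sigmas, pis):
    """log sum_j pi_j N(w | mu_j, sigma_j^2), via log-sum-exp for numerical stability."""
    terms = [math.log(p) + log_normal(w, m, s) for m, s, p in zip(means, sigmas, pis)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def init_mixture(num_components, init_sigma, weight_range, pi_zero):
    """Hypothetical init: component 0 sits at zero, the rest are spread evenly over
    [-weight_range, weight_range], and every component starts with std init_sigma."""
    if num_components < 3:
        raise ValueError("need a zero component plus at least two others")
    others = num_components - 1
    step = 2 * weight_range / (others - 1)
    means = [0.0] + [-weight_range + i * step for i in range(others)]
    sigmas = [init_sigma] * num_components
    pis = [pi_zero] + [(1 - pi_zero) / others] * others
    return means, sigmas, pis


def complexity_cost(weights, means, sigmas, pis):
    """Negative log prior of all weights; SWS adds this (scaled) to the task loss."""
    return -sum(mixture_log_prob(w, means, sigmas, pis) for w in weights)


means, sigmas, pis = init_mixture(num_components=5, init_sigma=0.25, weight_range=0.6, pi_zero=0.99)
print(means)
print(complexity_cost([0.0, 0.01, -0.3, 0.45], means, sigmas, pis))
```

A larger `init_sigma` makes each component wider at the start, so weights are pulled toward the means more gently in early epochs.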

## Outputs
- Trained models: `no_hp/lenet300_no_hp/`
- All diagnostic plots: `no_hp/lenet300_no_hp/figures/`
- Training GIF showing weight evolution: `no_hp/lenet300_no_hp/figures/retraining.gif`

---
## 1. Run SWS Retraining (No Hyperpriors)

In [None]:
!python run_sws.py \
    --preset lenet_300_100 \
    --load-pretrained checkpoints/mnist_lenet_300_100_pre.pt \
    --pretrain-epochs 0 \
    --retrain-epochs 100 \
    --init-sigma 0.25 \
    --no-hyperpriors \
    --batch-size 128 \
    --num-workers 2 \
    --eval-every 1 \
    --log-mixture-every 1 \
    --make-gif \
    --gif-fps 2 \
    --quant-skip-last \
    --quant-assign map \
    --run-name lenet300_no_hp \
    --save-dir no_hp \
    --seed 42
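
The flag `--quant-assign map` presumably assigns each weight to its maximum a posteriori (MAP) mixture component before snapping it to that component's mean, and `--quant-skip-last` presumably leaves the final layer unquantized. Both readings are assumptions, not taken from `run_sws.py`. A minimal sketch of MAP assignment, with hypothetical helpers `responsibilities` and `quantize_map`:

```python
import math


def responsibilities(w, means, sigmas, pis):
    """Posterior probability of each mixture component given weight w.
    The shared -0.5*log(2*pi) term cancels in the normalization, so it is dropped."""
    logs = [math.log(p) - math.log(s) - (w - m) ** 2 / (2 * s * s)
            for m, s, p in zip(means, sigmas, pis)]
    top = max(logs)
    unnorm = [math.exp(lp - top) for lp in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def quantize_map(weights, means, sigmas, pis, zero_index=0):
    """Replace each weight with the mean of its most probable (MAP) component.
    Weights claimed by the zero component become exactly 0.0, i.e. pruned."""
    quantized = []
    for w in weights:
        r = responsibilities(w, means, sigmas, pis)
        j = max(range(len(r)), key=r.__getitem__)
        quantized.append(0.0 if j == zero_index else means[j])
    return quantized


means = [0.0, -0.5, 0.5]
sigmas = [0.05, 0.1, 0.1]
pis = [0.9, 0.05, 0.05]
print(quantize_map([0.01, 0.48, -0.52, 0.3], means, sigmas, pis))  # → [0.0, 0.5, -0.5, 0.5]
```

Because every weight ends up equal to one of a few shared means (or zero), each non-zero weight only needs a short codebook index, which is where the compression comes from.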

---
## 2. Generate All Diagnostic Plots

In [None]:
# Set the run directory for plotting scripts
RUN_DIR = "no_hp/lenet300_no_hp"
print(f"Run directory: {RUN_DIR}")
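
Before plotting, it can help to confirm that the files read by the cells below actually exist, rather than hitting a traceback halfway through. A minimal sketch; `missing_outputs` is a hypothetical helper, and the list covers only files this notebook opens directly (the plotting scripts may need others). In the notebook, `missing_outputs(RUN_DIR)` works the same way.

```python
from pathlib import Path

# Files that later cells read, relative to the run directory (paths taken from those cells)
EXPECTED_OUTPUTS = ("report.json", "summary_paper_metrics.json", "figures/retraining.gif")


def missing_outputs(run_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected output files that do not exist under run_dir."""
    root = Path(run_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


missing = missing_outputs("no_hp/lenet300_no_hp")
print("All expected outputs present" if not missing else f"Missing: {missing}")
```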

### Training Curves (Accuracy, Loss, Compression)

In [None]:
!python scripts/plot_curves.py --run-dir {RUN_DIR}

### Mixture Evolution Over Epochs

In [None]:
!python scripts/plot_mixture_dynamics.py --run-dir {RUN_DIR}

### Weight Movement (Pretrained → Retrained)

In [None]:
!python scripts/plot_weights_scatter.py --run-dir {RUN_DIR} --sample 20000
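
LeNet-300-100 has roughly 266K weights, so plotting every one is slow and heavily overplotted; `--sample 20000` presumably draws a random subset. The sketch below shows one way to subsample while keeping each pretrained/retrained pair aligned: sample indices once, then read both arrays at those indices. `sample_weight_pairs` is hypothetical and is not taken from `plot_weights_scatter.py`.

```python
import random


def sample_weight_pairs(pretrained, retrained, k, seed=0):
    """Return up to k (pretrained, retrained) pairs drawn without replacement.
    Indices are sampled once so each pair refers to the same weight."""
    if len(pretrained) != len(retrained):
        raise ValueError("weight vectors must have the same length")
    k = min(k, len(pretrained))
    rng = random.Random(seed)  # local RNG: reproducible, leaves global state alone
    idx = rng.sample(range(len(pretrained)), k)
    return [(pretrained[i], retrained[i]) for i in idx]


pre = [0.10, -0.32, 0.05, 0.71, -0.02]
post = [0.00, -0.30, 0.00, 0.70, 0.00]
print(sample_weight_pairs(pre, post, k=3, seed=42))
```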

### Final Mixture Components

In [None]:
!python scripts/plot_mixture.py --run-dir {RUN_DIR} --checkpoint prequant

---
## 3. Display Results

### Training Evolution GIF

In [None]:
import os

from IPython.display import Image, display

RETRAIN_EPOCHS = 100  # keep in sync with --retrain-epochs above

gif_path = f"{RUN_DIR}/figures/retraining.gif"
if os.path.exists(gif_path):
    display(Image(filename=gif_path))
    print(f"\n💡 Animation shows weight evolution over {RETRAIN_EPOCHS} epochs")
else:
    print(f"⚠️  GIF not found at {gif_path}")

### Compression Report

In [None]:
import json
import math

# Load compression report
with open(f"{RUN_DIR}/report.json") as f:
    report = json.load(f)

# Load summary
with open(f"{RUN_DIR}/summary_paper_metrics.json") as f:
    summary = json.load(f)

# Total parameter count, derived from the layer shapes listed in the report
total_params = sum(math.prod(layer['shape']) for layer in report['layers'])

print("="*60)
print("COMPRESSION RESULTS (No Hyperpriors)")
print("="*60)
print(f"Original bits:        {report['orig_bits']:,}")
print(f"Compressed bits:      {report['compressed_bits']:,}")
print(f"Compression Ratio:    {report['CR']:.2f}x")
print(f"Non-zero weights:     {report['nnz']:,} / {total_params:,}")
print(f"Sparsity:             {100*(1 - report['nnz']/total_params):.2f}%")
print()
print(f"Pretrain accuracy:    {summary['acc_pretrain']:.4f} ({summary['acc_pretrain']*100:.2f}%)")
print(f"Retrain accuracy:     {summary['acc_retrain']:.4f} ({summary['acc_retrain']*100:.2f}%)")
print(f"Quantized accuracy:   {summary['acc_quantized']:.4f} ({summary['acc_quantized']*100:.2f}%)")
print(f"Total accuracy drop:  {summary['Delta[%]']:.2f}%")
print("="*60)
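
As a sanity check on the sparsity denominator, LeNet-300-100 is a 784-300-100-10 fully connected network, so its parameter count is easy to compute by hand. The sketch below does the arithmetic; the 32-bit float baseline is an assumption and may differ from how `report['orig_bits']` is counted (for example, whether biases are included).

```python
# Fully connected layers of LeNet-300-100 as (fan_in, fan_out)
LAYERS = [(784, 300), (300, 100), (100, 10)]

n_weights = sum(fan_in * fan_out for fan_in, fan_out in LAYERS)  # 235,200 + 30,000 + 1,000
n_biases = sum(fan_out for _, fan_out in LAYERS)                 # 300 + 100 + 10
n_params = n_weights + n_biases


def compression_ratio(orig_bits, compressed_bits):
    """CR = uncompressed size / compressed size."""
    return orig_bits / compressed_bits


def sparsity_pct(nnz, total):
    """Percentage of parameters that are exactly zero."""
    return 100.0 * (1.0 - nnz / total)


print(f"weights={n_weights:,}  biases={n_biases:,}  total={n_params:,}")  # weights=266,200  biases=410  total=266,610
print(f"float32 baseline: {32 * n_params:,} bits")  # float32 baseline: 8,531,520 bits
```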

### Layer-wise Compression Breakdown

In [None]:
import numpy as np

# Per-layer bit fields whose sum is the layer's compressed size
BIT_KEYS = ('bits_IR', 'bits_IC', 'bits_A', 'bits_codebook')

print("\nLayer-wise Compression:")
print(f"{'Layer':<15} {'Shape':<20} {'Original (bits)':<18} {'Compressed (bits)':<20} {'CR':<8} {'Sparsity':<10}")
print("-" * 100)

for layer_info in report['layers']:
    # Compute the compressed size once; missing fields count as 0 bits
    compressed = sum(layer_info.get(key, 0) for key in BIT_KEYS)

    if layer_info.get('passthrough', False):
        # Passthrough layers are not quantized (presumably the one skipped by --quant-skip-last)
        cr_str = "N/A"
        sparsity = 0.0
    else:
        cr = layer_info['orig_bits'] / max(compressed, 1)
        cr_str = f"{cr:.2f}x"
        total_weights = np.prod(layer_info['shape'])
        sparsity = 100 * (1 - layer_info['nnz'] / total_weights)

    shape_str = 'x'.join(map(str, layer_info['shape']))
    orig_str = f"{layer_info['orig_bits']:,}"
    comp_str = f"{compressed:,}"

    print(f"{layer_info['layer']:<15} {shape_str:<20} {orig_str:<18} {comp_str:<20} {cr_str:<8} {sparsity:.1f}%")
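
The per-layer keys `bits_IR`, `bits_IC`, `bits_A` and `bits_codebook` suggest a CSR-style layout: row pointers (IR), column indices (IC), one codebook index per non-zero (A), plus the codebook of shared values. The exact scheme used by `run_sws.py` is not shown here and may use relative indexing, padding or entropy coding, so treat the sketch below as one plausible accounting with fixed-width fields. `index_bits` and `csr_bits` are hypothetical helpers.

```python
import math


def index_bits(n):
    """Fixed-width bits needed to address n distinct values (at least 1)."""
    return max(1, math.ceil(math.log2(n)))


def csr_bits(dense, codebook_size, float_bits=32):
    """Bit cost of a quantized matrix stored in CSR form with fixed-width fields."""
    rows, cols = len(dense), len(dense[0])
    nnz = sum(1 for row in dense for v in row if v != 0)
    return {
        "bits_IR": (rows + 1) * index_bits(nnz + 1),  # row pointers take values 0..nnz
        "bits_IC": nnz * index_bits(cols),             # column of each non-zero
        "bits_A": nnz * index_bits(codebook_size),     # codebook index of each non-zero
        "bits_codebook": codebook_size * float_bits,   # shared values at full precision
        "nnz": nnz,
    }


W = [[0.0, 1.5, 0.0, 0.0],
     [0.0, 0.0, 0.0, -1.5],
     [0.0, 0.0, 0.0, 0.0]]
bits = csr_bits(W, codebook_size=2)
compressed = bits["bits_IR"] + bits["bits_IC"] + bits["bits_A"] + bits["bits_codebook"]
print(bits, compressed, 32 * 3 * 4 / compressed)
```

On this toy 3x4 matrix the 64-bit codebook dominates, so the ratio is modest; on a 784x300 layer the codebook cost is negligible and the index arrays dominate the compressed size.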

---
## Summary

Successfully ran SWS compression **without hyperpriors** on LeNet-300-100.

### Key Observations:
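- Without hyperpriors, the mixture parameters (means, variances, mixing proportions) are not regularized by any prior; they are fit only to the network weights during retraining
- `init_sigma=0.25` sets the initial standard deviation of the mixture components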
- All diagnostics saved to `no_hp/lenet300_no_hp/figures/`

### Next Steps:
- Compare with hyperprior-enabled results
- Run Bayesian Optimization to find optimal hyperpriors (see `BO.ipynb`)
- Use the best hyperparameters found by BO in `compression.ipynb`