In [62]:
!cd /kaggle/working/energy_aware_quantization && git pull origin main

remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 5 (delta 3), reused 5 (delta 3), pack-reused 0 (from 0)
Unpacking objects: 100% (5/5), 3.68 KiB | 1.84 MiB/s, done.
From https://github.com/krishkc5/energy_aware_quantization
 * branch            main       -> FETCH_HEAD
   7e29686..f2e7dcf  main       -> origin/main
Updating 7e29686..f2e7dcf
Fast-forward
 MANUAL_POLLING_FIX.md | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++
 src/power_logger.py   | 128 +++++++++++------------------
 2 files changed, 267 insertions(+), 80 deletions(-)
 create mode 100644 MANUAL_POLLING_FIX.md


# Energy-Aware Quantization Experiments
## Krishna's Complete Measurement Harness on Kaggle

This notebook runs FP32, FP16, and INT8 experiments with comprehensive energy measurements.

**Setup:**
1. Enable GPU: Settings → Accelerator → GPU P100/T4
2. Enable Internet: Settings → Internet → On (to clone GitHub repo)

## Step 1: Clone or Update Repository

In [63]:
import os
from pathlib import Path

repo_path = Path("/kaggle/working/energy_aware_quantization")

if repo_path.exists():
    print("Repository exists, pulling latest changes...")
    !cd /kaggle/working/energy_aware_quantization && git pull origin main
else:
    print("Cloning repository...")
    !cd /kaggle/working && git clone https://github.com/krishkc5/energy_aware_quantization.git

print("\n✓ Repository ready!")

# Change to repo directory
os.chdir("/kaggle/working/energy_aware_quantization")
print(f"Working directory: {os.getcwd()}")

Repository exists, pulling latest changes...
From https://github.com/krishkc5/energy_aware_quantization
 * branch            main       -> FETCH_HEAD
Already up to date.

✓ Repository ready!
Working directory: /kaggle/working/energy_aware_quantization


## Step 2: Install Dependencies

In [64]:
# Install requirements (most should already be in Kaggle)
!pip install -q torch transformers numpy pandas tqdm

print("Dependencies installed")

Dependencies installed


## Step 3: Verify Installation and GPU

In [65]:
import sys
import torch

# Add the repo root to the path so that `src` and `models` are importable
sys.path.insert(0, '/kaggle/working/energy_aware_quantization')

print("="*70)
print("SYSTEM CHECK")
print("="*70)

print(f"\nPython: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print("\n✓ GPU is ready!")
else:
    print("\n No GPU! Please enable GPU in Settings.")

print("="*70)

SYSTEM CHECK

Python: 3.11.13 (main, Jun  4 2025, 08:57:29) [GCC 11.4.0]
PyTorch: 2.6.0+cu124
CUDA available: True
CUDA version: 12.4
GPU: Tesla P100-PCIE-16GB
GPU memory: 17.06 GB

✓ GPU is ready!
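
Since every experiment below depends on GPU power readings, it may also be worth confirming that the card reports power draw at all. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH`; the helper names `parse_power_draw` and `query_power_draw` are hypothetical, and how `PowerLogger` actually samples power is not shown in this notebook.

```python
import shutil
import subprocess


def parse_power_draw(text):
    """Parse nvidia-smi power output into a list of watts (None if unreported)."""
    readings = []
    for line in text.strip().splitlines():
        value = line.strip()
        try:
            readings.append(float(value))
        except ValueError:
            # e.g. "[N/A]" or "[Not Supported]" on GPUs without a power sensor
            readings.append(None)
    return readings


def query_power_draw():
    """Return per-GPU power draw in watts, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return parse_power_draw(result.stdout)


print(f"Power draw (W): {query_power_draw()}")
```

A `None` entry (or a `None` result overall) would suggest that energy numbers from this GPU should not be trusted without further checks.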


## Step 4: Check Datasets

The pre-tokenized datasets should be in the repo.

In [66]:
from pathlib import Path

datasets_dir = Path("/kaggle/working/energy_aware_quantization/datasets")

print("Checking datasets...\n")

variants = ["tokenized_data", "tokenized_data_small", "tokenized_data_large", "tokenized_data_standard"]

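missing = []
for variant in variants:
    variant_dir = datasets_dir / variant
    if variant_dir.exists():
        files = list(variant_dir.glob("*.pt")) + list(variant_dir.glob("*.json"))
        print(f"✓ {variant}: {len(files)} files")
    else:
        missing.append(variant)
        print(f"❌ {variant}: NOT FOUND")

# Report success only when every variant directory was found
print("\n✓ Datasets verified" if not missing else f"\n❌ Missing datasets: {missing}")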

Checking datasets...

✓ tokenized_data: 4 files
✓ tokenized_data_small: 4 files
✓ tokenized_data_large: 4 files
✓ tokenized_data_standard: 4 files

✓ Datasets verified


## Step 5: Test Import of Measurement Modules

In [67]:
# Test imports
print("Testing module imports...\n")

imports_ok = True

try:
    from src import load_pre_tokenized, warmup, PowerLogger, run_inference, compute_energy
    print("✓ src modules imported")
except ImportError as e:
    imports_ok = False
    print(f"❌ Failed to import src: {e}")

try:
    from models import load_model, get_model_info
    print("✓ models module imported")
except ImportError as e:
    imports_ok = False
    print(f"❌ Failed to import models: {e}")

# Report success only if both imports worked
print("\n✓ All imports successful!" if imports_ok else "\n❌ Some imports failed; see messages above.")

Testing module imports...

✓ src modules imported
✓ models module imported

✓ All imports successful!


## Step 6: Quick Test - Load Dataset and Model

In [68]:
from src import load_pre_tokenized
from models import load_model

print("Testing dataset and model loading...\n")

# Load the default dataset (50 samples) for a quick test
input_ids, mask, labels, metadata = load_pre_tokenized(
    "datasets/tokenized_data",
    device="cuda"
)

print(f"\nDataset loaded: {input_ids.shape[0]} samples")

# Load FP32 model
model = load_model(
    "distilbert-base-uncased-finetuned-sst-2-english",
    precision="fp32",
    device="cuda",
    verbose=True
)

print("\n✓ Quick test passed!")

Testing dataset and model loading...

 Loaded 50 samples from datasets/tokenized_data
  - Input shape: torch.Size([50, 128])
  - Mask shape: torch.Size([50, 128])
  - Labels shape: torch.Size([50])
  - Device: cuda:0
  - Max sequence length: 128

Dataset loaded: 50 samples

Loading model: distilbert-base-uncased-finetuned-sst-2-english
Precision: fp32
Device: cuda
✓ Model loaded successfully
  - Parameters: 66,955,010
  - Model size: 255.42 MB
  - Parameter dtype: torch.float32
  - Parameter device: cuda:0

✓ Quick test passed!


## Option A: Run Experiments via Python Script

**Kaggle notebooks can run `.py` files directly with `!python`.**

### Experiment 1: FP32 Baseline

In [69]:
# Run FP32 experiment
!python src/measure_energy.py \
    --precision fp32 \
    --dataset datasets/tokenized_data \
    --num_iters 1000 \
    --trial 1

ENERGY-AWARE QUANTIZATION EXPERIMENT
Precision: fp32
Model: distilbert-base-uncased-finetuned-sst-2-english
Dataset: datasets/tokenized_data
Trial: 1

Output will be saved to: results/fp32/trial_1_20251201_191929.csv

STEP 1: Checking GPU
 GPU ready for measurements
  - Device: Tesla P100-PCIE-16GB
  - Compute capability: 6.0
  - Total memory: 17.06 GB
  - Multi-processors: 56
  - Memory allocated: 0.00 GB
  - Memory reserved: 0.00 GB

STEP 2: Loading Dataset
 Loaded 50 samples from datasets/tokenized_data
  - Input shape: torch.Size([50, 128])
  - Mask shape: torch.Size([50, 128])
  - Labels shape: torch.Size([50])
  - Device: cuda:0
  - Max sequence length: 128
 Dataset validation passed
  - Batch size: 50
  - Sequence length: 128
  - Unique labels: [0, 1]

STEP 3: Loading Model

Loading model: distilbert-base-uncased-finetuned-sst-2-english
Precision: fp32
Device: cuda
2025-12-01 19:19:31.163048: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT

### Experiment 2: FP16

In [70]:
# Run FP16 experiment
!python src/measure_energy.py \
    --precision fp16 \
    --dataset datasets/tokenized_data \
    --num_iters 1000 \
    --trial 1

ENERGY-AWARE QUANTIZATION EXPERIMENT
Precision: fp16
Model: distilbert-base-uncased-finetuned-sst-2-english
Dataset: datasets/tokenized_data
Trial: 1

Output will be saved to: results/fp16/trial_1_20251201_192319.csv

STEP 1: Checking GPU
 GPU ready for measurements
  - Device: Tesla P100-PCIE-16GB
  - Compute capability: 6.0
  - Total memory: 17.06 GB
  - Multi-processors: 56
  - Memory allocated: 0.00 GB
  - Memory reserved: 0.00 GB

STEP 2: Loading Dataset
 Loaded 50 samples from datasets/tokenized_data
  - Input shape: torch.Size([50, 128])
  - Mask shape: torch.Size([50, 128])
  - Labels shape: torch.Size([50])
  - Device: cuda:0
  - Max sequence length: 128
 Dataset validation passed
  - Batch size: 50
  - Sequence length: 128
  - Unique labels: [0, 1]

STEP 3: Loading Model

Loading model: distilbert-base-uncased-finetuned-sst-2-english
Precision: fp16
Device: cuda
2025-12-01 19:23:20.729644: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT

### Experiment 3: INT8

In [71]:
# Run INT8 experiment
!python src/measure_energy.py \
    --precision int8 \
    --dataset datasets/tokenized_data \
    --num_iters 1000 \
    --trial 1

ENERGY-AWARE QUANTIZATION EXPERIMENT
Precision: int8
Model: distilbert-base-uncased-finetuned-sst-2-english
Dataset: datasets/tokenized_data
Trial: 1

Output will be saved to: results/int8/trial_1_20251201_192633.csv

STEP 1: Checking GPU
 GPU ready for measurements
  - Device: Tesla P100-PCIE-16GB
  - Compute capability: 6.0
  - Total memory: 17.06 GB
  - Multi-processors: 56
  - Memory allocated: 0.00 GB
  - Memory reserved: 0.00 GB

STEP 2: Loading Dataset
 Loaded 50 samples from datasets/tokenized_data
  - Input shape: torch.Size([50, 128])
  - Mask shape: torch.Size([50, 128])
  - Labels shape: torch.Size([50])
  - Device: cuda:0
  - Max sequence length: 128
 Dataset validation passed
  - Batch size: 50
  - Sequence length: 128
  - Unique labels: [0, 1]

STEP 3: Loading Model

Loading model: distilbert-base-uncased-finetuned-sst-2-english
Precision: int8
Device: cuda
2025-12-01 19:26:35.200064: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT
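
The three cells above differ only in `--precision`, and repeating them for several trials by hand gets tedious. As a rough sketch, the same commands can be generated in a loop. It uses only the flags already shown above (`--precision`, `--dataset`, `--num_iters`, `--trial`); the helper names `build_commands` and `run_all` are hypothetical and not part of the repository.

```python
import subprocess


def build_commands(precisions=("fp32", "fp16", "int8"), num_trials=3,
                   dataset="datasets/tokenized_data", num_iters=1000):
    """Build one measure_energy.py command per (precision, trial) pair."""
    commands = []
    for precision in precisions:
        for trial in range(1, num_trials + 1):
            commands.append([
                "python", "src/measure_energy.py",
                "--precision", precision,
                "--dataset", dataset,
                "--num_iters", str(num_iters),
                "--trial", str(trial),
            ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_all(build_commands(), dry_run=True)  # set dry_run=False to actually run
```

Running the trials sequentially, as here, avoids two measurements competing for the same GPU, which would distort the power readings.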

## Option B: Run Experiments Programmatically (Pure Notebook)

If you prefer to run everything inside the notebook without calling external scripts:

In [60]:
import time
from datetime import datetime
from pathlib import Path
import pandas as pd
import json

from src import (
    load_pre_tokenized,
    warmup,
    check_gpu_ready,
    PowerLogger,
    run_steady_state_benchmark,
    compute_energy_with_timing,
    get_memory_stats
)
from models import load_model, get_model_info

def run_experiment(precision="fp32", dataset_path="datasets/tokenized_data", num_iters=1000):
    """
    Run a complete experiment for a given precision.
    """
    print("="*70)
    print(f"RUNNING {precision.upper()} EXPERIMENT")
    print("="*70)
    
    # Check GPU
    check_gpu_ready(verbose=True)
    
    # Load dataset
    print("\nLoading dataset...")
    input_ids, mask, labels, metadata = load_pre_tokenized(dataset_path, device="cuda")
    
    # Load model
    print(f"\nLoading {precision} model...")
    model = load_model(
        "distilbert-base-uncased-finetuned-sst-2-english",
        precision=precision,
        device="cuda",
        verbose=True
    )
    model_info = get_model_info(model)
    
    # Warmup
    print("\nWarming up GPU...")
    warmup(model, input_ids, mask, num_steps=100, verbose=True)
    
    # Start power logger
    print("\nStarting power logger...")
    power_logger = PowerLogger(sample_interval_ms=100, gpu_id=0, verbose=False)
    power_logger.start()
    time.sleep(0.5)  # Let it stabilize
    
    # Run benchmark
    print("\nRunning benchmark...")
    results = run_steady_state_benchmark(
        model, input_ids, mask, labels,
        num_iters=num_iters,
        compute_accuracy=True,
        verbose=True
    )
    
    # Stop power logger
    time.sleep(0.5)
    power_logger.stop()
    power_samples = power_logger.read()
    
    print(f"\nCollected {len(power_samples)} power samples")
    
    # Compute energy
    energy_results = compute_energy_with_timing(power_samples, results)
    results.update(energy_results)
    results.update(model_info)
    results.update(get_memory_stats())
    
    # Add metadata
    results["precision"] = precision
    results["timestamp"] = datetime.now().isoformat()
    
    # Save results
    output_dir = Path(f"results/{precision}")
    output_dir.mkdir(parents=True, exist_ok=True)
    
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    csv_path = output_dir / f"trial_1_{timestamp}.csv"
    json_path = output_dir / f"trial_1_{timestamp}.json"
    
    df = pd.DataFrame([results])
    df.to_csv(csv_path, index=False)
    
    with open(json_path, "w") as f:
        # default=str guards against values json cannot serialize natively (e.g. dtypes)
        json.dump(results, f, indent=2, default=str)
    
    print(f"\n✓ Results saved to {csv_path}")
    
    # Print summary
    print("\n" + "="*70)
    print("RESULTS SUMMARY")
    print("="*70)
    print(f"Latency:  {results['mean_latency']*1000:.2f} ms")
    print(f"Power:    {results['mean_power_w']:.2f} W")
    print(f"Energy:   {results['energy_per_inference_mj']:.2f} mJ/inference")
    print(f"Accuracy: {results['accuracy']*100:.2f}%")
    print("="*70)
    
    return results

ImportError: cannot import name 'compute_energy_with_timing' from 'src' (/kaggle/working/energy_aware_quantization/src/__init__.py)
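
The traceback indicates that `src/__init__.py` does not export `compute_energy_with_timing`, even though Step 5 imported `compute_energy` from the same package without trouble. Because the failing import sits in the same cell, `run_experiment` is never defined, which explains the `NameError` in the next cell. One way to see which requested names are missing before editing the import is sketched below; `missing_exports` is a hypothetical helper, demonstrated on the standard library so it runs anywhere.

```python
import importlib


def missing_exports(module_name, names):
    """Return the names from `names` that `module_name` does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Demonstrated on the standard library; in the notebook, pass "src" and the
# names from the failing import statement instead.
print(missing_exports("math", ["sqrt", "compute_energy_with_timing"]))
# → ['compute_energy_with_timing']
```

In the notebook this would be called as `missing_exports("src", [...])` with the seven names from the import statement, assuming the repo root is already on `sys.path` as set up in Step 3.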

### Run FP32 Experiment

In [61]:
fp32_results = run_experiment(precision="fp32", num_iters=1000)

NameError: name 'run_experiment' is not defined

### Run FP16 Experiment

In [None]:
fp16_results = run_experiment(precision="fp16", num_iters=1000)

### Run INT8 Experiment

In [None]:
int8_results = run_experiment(precision="int8", num_iters=1000)
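
The energy-per-inference figure that `run_experiment` reports is derived from power samples collected while the benchmark runs. As a rough illustration of that calculation, the sketch below integrates power over time with the trapezoidal rule. It assumes samples are `(timestamp_seconds, power_watts)` pairs; that format and the function names are assumptions for illustration, not the repository's implementation or the actual return type of `PowerLogger.read()`.

```python
def energy_joules(samples):
    """Trapezoidal integral of power over time.

    `samples` is a list of (timestamp_seconds, power_watts) pairs sorted by
    time. This is an assumed format, not necessarily what PowerLogger returns.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def energy_per_inference_mj(samples, num_inferences):
    """Average energy per inference in millijoules."""
    return energy_joules(samples) / num_inferences * 1000.0


# Synthetic example: a steady 100 W sampled every 100 ms for 10 s
samples = [(i * 0.1, 100.0) for i in range(101)]
print(f"Total energy: {energy_joules(samples):.1f} J")
print(f"Per inference: {energy_per_inference_mj(samples, 1000):.1f} mJ")
```

Note that `run_experiment` sleeps for 0.5 s before and after the benchmark while the logger is running, so a real calculation would presumably restrict the samples to the benchmark window before integrating.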

## Step 7: Analyze and Compare Results

In [None]:
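import pandas as pd
import numpy as np
from pathlib import Path

# Load the most recent result for each precision
results_data = []

for precision in ["fp32", "fp16", "int8"]:
    results_dir = Path(f"results/{precision}")
    csv_files = list(results_dir.glob("*.csv"))
    
    if csv_files:
        # glob() order is arbitrary, so pick the latest file by modification time
        latest_csv = max(csv_files, key=lambda p: p.stat().st_mtime)
        df = pd.read_csv(latest_csv)
        results_data.append(df)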

if results_data:
    all_results = pd.concat(results_data, ignore_index=True)
    
    # Create comparison table
    print("="*80)
    print("COMPARISON TABLE")
    print("="*80)
    
    comparison = all_results[[
        "precision",
        "mean_latency",
        "throughput",
        "mean_power_w",
        "energy_per_inference_mj",
        "accuracy",
        "model_size_mb"
    ]].copy()
    
    # Format columns
    comparison["mean_latency"] = comparison["mean_latency"] * 1000  # to ms
    comparison["accuracy"] = comparison["accuracy"] * 100  # to %
    
    comparison.columns = [
        "Precision",
        "Latency (ms)",
        "Throughput (samp/s)",
        "Power (W)",
        "Energy (mJ)",
        "Accuracy (%)",
        "Model Size (MB)"
    ]
    
    print(comparison.to_string(index=False))
    print("="*80)
    
    # Calculate improvements vs FP32 (skipped when no FP32 baseline is available)
    fp32_rows = comparison[comparison["Precision"] == "fp32"]
    
    if fp32_rows.empty:
        print("\nNo FP32 baseline found; skipping improvement calculations.")
    else:
        fp32_row = fp32_rows.iloc[0]
        
        print("\nIMPROVEMENTS vs FP32:")
        print("-"*80)
        
        for _, row in comparison.iterrows():
            if row["Precision"] != "fp32":
                speedup = fp32_row["Latency (ms)"] / row["Latency (ms)"]
                energy_reduction = (fp32_row["Energy (mJ)"] - row["Energy (mJ)"]) / fp32_row["Energy (mJ)"] * 100
                accuracy_delta = row["Accuracy (%)"] - fp32_row["Accuracy (%)"]
                
                print(f"{row['Precision'].upper()}:")
                print(f"  Speedup:          {speedup:.2f}x")
                print(f"  Energy reduction: {energy_reduction:.1f}%")
                print(f"  Accuracy delta:   {accuracy_delta:+.2f}%")
                print()
        
        print("="*80)
else:
    print("No results found. Run experiments first!")
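
The comparison above reads only the most recent CSV per precision. Once several trials exist, it may be more informative to report a mean and spread per precision. A minimal sketch using only the standard library follows; the column names `precision` and `energy_per_inference_mj` are taken from the cells above, while `summarize_trials` and the sample rows are hypothetical and purely illustrative.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_trials(rows, metric):
    """Group result rows by precision and return {precision: (mean, std, n)}."""
    by_precision = defaultdict(list)
    for row in rows:
        by_precision[row["precision"]].append(float(row[metric]))
    summary = {}
    for precision, values in by_precision.items():
        # stdev needs at least two values; report 0.0 for a single trial
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[precision] = (mean(values), spread, len(values))
    return summary


# Synthetic rows for illustration only; these are not measured results
rows = [
    {"precision": "fp32", "energy_per_inference_mj": 10.0},
    {"precision": "fp32", "energy_per_inference_mj": 12.0},
    {"precision": "fp16", "energy_per_inference_mj": 6.0},
]
for precision, (avg, std, n) in summarize_trials(rows, "energy_per_inference_mj").items():
    print(f"{precision}: {avg:.2f} +/- {std:.2f} mJ (n={n})")
```

In the notebook, `rows` could be built by reading every CSV under `results/<precision>/` rather than only the latest one.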

## Step 8: Visualize Results (Optional)

In [None]:
import matplotlib.pyplot as plt

if len(results_data) >= 2:
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
    
    precisions = all_results["precision"].tolist()
    
    # Latency
    axes[0, 0].bar(precisions, all_results["mean_latency"] * 1000)
    axes[0, 0].set_title("Latency")
    axes[0, 0].set_ylabel("ms")
    
    # Energy
    axes[0, 1].bar(precisions, all_results["energy_per_inference_mj"])
    axes[0, 1].set_title("Energy per Inference")
    axes[0, 1].set_ylabel("mJ")
    
    # Power
    axes[1, 0].bar(precisions, all_results["mean_power_w"])
    axes[1, 0].set_title("Mean Power")
    axes[1, 0].set_ylabel("Watts")
    
    # Accuracy
    axes[1, 1].bar(precisions, all_results["accuracy"] * 100)
    axes[1, 1].set_title("Accuracy")
    axes[1, 1].set_ylabel("%")
    axes[1, 1].set_ylim([80, 100])
    
    plt.tight_layout()
    plt.savefig("results/comparison_plots.png", dpi=150, bbox_inches="tight")
    plt.show()
    
    print("✓ Plots saved to results/comparison_plots.png")
else:
    print("Need at least 2 precision modes to plot. Run more experiments!")

## Step 9: Export Results

Download results for your report.

In [None]:
# Create a zip of all results
!zip -r results.zip results/

print("✓ Results zipped")
print("\nYou can download 'results.zip' from the output panel on the right.")

## Summary

This notebook provides two ways to run experiments:

1. **Option A**: Call the Python script with `!python src/measure_energy.py ...`
   - Easiest approach
   - Uses the production script
   - Good for multiple trials

2. **Option B**: Run everything programmatically in the notebook
   - More control
   - Better for debugging
   - Immediate access to results

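**Both are intended to work on Kaggle**, although in the run recorded above Option B stopped at an `ImportError` for `compute_energy_with_timing`. Choose whichever you prefer.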

### Next Steps:

1. Run all three precision modes (FP32, FP16, INT8)
2. Run multiple trials (3-5) per precision so that run-to-run variance can be estimated
3. Analyze results and create plots
4. Export results for your report

Good luck! 🚀