# Notebook to Run Learned KV Compression Scripts on Colab

## Download Repo and Install Requirements

In [20]:
%%capture

# Download Repo
%cd /content
!rm -rf learned-kv-compression
# Credential redacted: provide a GitHub token through the GITHUB_TOKEN environment variable instead of hardcoding it
!git clone -b colab https://henro25:$GITHUB_TOKEN@github.com/henro25/learned-kv-compression
%cd /content/learned-kv-compression/
%ls

# Install Requirements
%pip install -r colab_requirements.txt
%pip uninstall gcsfs -y
%pip install --upgrade fsspec==2025.3.2
%pip install gcsfs==2024.12.0
%pip install --upgrade datasets

## Training the Autoencoder

This trains an autoencoder for distilgpt2 that compresses each KV vector to a 16-dimensional latent representation. Training uses 1,000 texts from WikiText-103 and runs for 10 epochs with a batch size of 32.

In [21]:
!python -m src.dictionary_learning.train \
    --name distilgpt2 \
    --latent_dim 16 \
    --num_epochs 10 \
    --batch_size 32 \
    --output_dir models/distilgpt2_16 \
    --num_train_texts 1000

2025-04-08 05:49:17.395306: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744091357.417357   24026 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744091357.424720   24026 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-08 05:49:17.447735: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
{'batch_size': 32,
 'buffer_mult': 2,
 'config': 'src/configs/default_config.json',
 'device': 'cuda',
 'eval_interva
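
To make the effect of `--latent_dim 16` concrete, the sketch below estimates KV cache size per token before and after compression. The distilgpt2 constants (6 layers, 12 heads, hidden size 768) come from the public model config. Two things are assumptions, not confirmed by this notebook: that the autoencoder acts on each per-head 64-dimensional key or value vector, and that both raw and latent vectors are stored as float32.

```python
# distilgpt2 architecture constants (from the public model config).
N_LAYER, N_HEAD, N_EMBD = 6, 12, 768
HEAD_DIM = N_EMBD // N_HEAD  # 64 values per head


def kv_bytes_per_token(vec_dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to cache keys and values for one token.

    Assumption: one vector of size `vec_dim` per head, per layer,
    once for K and once for V, stored as float32 by default.
    """
    return 2 * N_LAYER * N_HEAD * vec_dim * bytes_per_value


raw = kv_bytes_per_token(HEAD_DIM)   # uncompressed cache entry
compressed = kv_bytes_per_token(16)  # latent_dim = 16, as in the command above

print(f"raw: {raw} B/token, compressed: {compressed} B/token, "
      f"ratio: {raw / compressed:.1f}x")
# prints "raw: 36864 B/token, compressed: 9216 B/token, ratio: 4.0x"
```

Under the same assumptions, a latent dimension of 8 would give an 8x reduction. The ratio ignores the cost of storing the autoencoder weights and of running the decoder when the cache is read.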

## Benchmarking

Run a quick test of KV cache compression with minimal parameters.

In [22]:
!./quick_test.sh

==== Quick Test: KV Cache Compression ====
Model: distilgpt2
Latent dimension: 16
Number of epochs: 1
Number of training texts: 10
Cache size: 1 MB
Batch size: 512
Number of runs: 3
Step 1: Training autoencoder...
2025-04-08 05:51:56.068042: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744091516.089612   24733 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744091516.096595   24733 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-08 05:51:56.119553: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions

Run the full experiment script.

In [30]:
!./run_experiment.sh

==== KV Cache Compression Experiment ====
Model: distilgpt2
Latent dimensions: 8
Cache sizes (MB): 1 10 100
Number of epochs: 5
Number of training texts: 1000
Batch size: 1024
Number of runs for timing: 5
Output directory: experiment_results_distilgpt2
./run_experiment.sh: line 40: venv/bin/activate: No such file or directory
Starting experiment at Tue Apr  8 06:21:55 AM UTC 2025

Training autoencoder with latent_dim=8
python -m src.dictionary_learning.train --config experiment_results_distilgpt2/distilgpt2_latent8/train_config.json
2025-04-08 06:21:59.684277: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744093319.721678   32605 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744093319.734650   32605 cuda_blas.cc:1418] Unable to register 
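
The experiment header reports a "number of runs for timing". The internals of `run_experiment.sh` are not shown here, so the following is only a minimal sketch of how repeated timing runs are commonly averaged. The names `time_runs` and `dummy_workload` are hypothetical and do not come from the repository.

```python
import statistics
import time
from typing import Callable


def time_runs(fn: Callable[[], object], num_runs: int = 5, warmup: int = 1) -> dict:
    """Call `fn` `num_runs` times and summarize wall-clock latency in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        # stdev needs at least two samples
        "stdev_s": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "runs": len(samples),
    }


# Hypothetical CPU workload standing in for a cache read plus decode step.
def dummy_workload() -> int:
    return sum(i * i for i in range(10_000))


stats = time_runs(dummy_workload, num_runs=5)
print(stats)
```

When the timed function launches GPU work, the timer should be read only after the device has finished (for example after `torch.cuda.synchronize()`), because CUDA kernels run asynchronously with respect to the host.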

Evaluate with perplexity and LongBench.

In [24]:
!./src/evaluation/run_evaluation.sh

Starting evaluation...
2025-04-08 06:00:03.677330: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1744092003.697675   26909 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744092003.703449   26909 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-08 06:00:03.728452: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Successfully loaded autoencoder from models/distilgpt2_16/autoencoder_final.pth
Loaded 100 eva
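
As a reminder of what the perplexity metric measures, the sketch below computes it from per-token log-probabilities as the exponential of the mean negative log-likelihood. This is the standard definition, not necessarily how `run_evaluation.sh` implements it, and the input values are made up for illustration.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood), using natural logarithms."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


# Illustrative values only: a model that assigns probability 0.25 to every
# token has a perplexity of about 4, i.e. it is as uncertain as a uniform
# choice among four options.
uniform_quarter = [math.log(0.25)] * 4
print(perplexity(uniform_quarter))
```

Comparing this number between the uncompressed cache and the autoencoder-compressed cache on the same texts indicates how much prediction quality the compression costs. Lower is better.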