This repository contains the code for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition (ICLR'25).
We present a novel approach to compressing large transformers, coined OATS, that compresses the model weights by approximating each weight matrix as the sum of a sparse matrix and a low-rank matrix. Prior to the decomposition, the weights are first scaled by the second moment of their input embeddings, so as to ensure the preservation of outlier features recently observed in large transformer models. Without retraining, OATS achieves state-of-the-art performance when compressing large language models, such as Llama-3 and Phi-3, and vision transformers, such as Google's ViT and DINOv2, by up to 60%, all while speeding up the model's inference on a CPU by up to 1.37x compared to prior pruning methods.
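To make the decomposition concrete, below is a minimal PyTorch sketch of an OATS-style compression of a single weight matrix. It is illustrative only and is not the repository's implementation: the function name, the hyperparameters (`rank`, `sparsity`, `n_iters`), and the alternating truncated-SVD / magnitude-thresholding scheme are assumptions made for exposition.

```python
import torch

def oats_decompose(W, X, rank=64, sparsity=0.6, n_iters=30):
    """Illustrative sketch: approximate W (out_features x in_features) as the
    sum of a sparse matrix and a low-rank matrix, after scaling by the second
    moment of the calibration inputs X (n_samples x in_features)."""
    # Scale each column of W by the second moment of its input feature so that
    # outlier features carry more weight during the decomposition.
    d = torch.sqrt((X ** 2).mean(dim=0)) + 1e-8   # per-input-feature scale
    W_scaled = W * d                              # broadcasts across columns

    S = torch.zeros_like(W_scaled)
    for _ in range(n_iters):
        # Low-rank step: truncated SVD of the residual.
        U, sig, Vh = torch.linalg.svd(W_scaled - S, full_matrices=False)
        L = U[:, :rank] @ torch.diag(sig[:rank]) @ Vh[:rank, :]

        # Sparse step: keep only the largest-magnitude entries of the residual.
        R = W_scaled - L
        k = max(int((1 - sparsity) * R.numel()), 1)        # entries to keep
        thresh = R.abs().flatten().kthvalue(R.numel() - k + 1).values
        S = R * (R.abs() >= thresh)

    # Undo the scaling so that S/d + L/d approximates the original W.
    return S / d, L / d
```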
The dependencies used to run the experiments in our paper are:
accelerate==0.29.3
datasets==2.19.0
lm_eval==0.4.2
ml_collections==0.1.1
torch==2.3.0
transformers==4.44.1
Hyperparameter and experiment specifications are passed via a list of dictionaries in OATS_configs.py, with each dictionary representing a specific experiment. The compress variable for OATS should be set to False only if the sparse and low-rank terms do not need to be accessed individually (e.g. when only evaluating model performance); in that case, the sparse and low-rank matrices are summed and saved as a single dense matrix. The file to run is main.py.
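As a rough illustration, an entry in the configuration list might look like the sketch below. Only the `compress` key is described above; every other key and value here is a hypothetical placeholder rather than the actual schema used by OATS_configs.py.

```python
# Hypothetical example of one experiment dictionary. Only `compress` is
# documented in this README; the remaining keys are illustrative placeholders.
example_experiment = {
    "model": "meta-llama/Meta-Llama-3-8B",  # placeholder: model to compress
    "compression_rate": 0.5,                # placeholder: target compression level
    "compress": True,                       # keep sparse and low-rank terms separate
}
```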
Our code builds on and takes inspiration from the codebases found at the following GitHub repositories:
SliceGPT: https://github.com/microsoft/TransformerCompression
SparseGPT: https://github.com/IST-DASLab/sparsegpt
Wanda: https://github.com/locuslab/wanda
@inproceedings{
  zhang2025oats,
  title={{OATS}: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition},
  author={Stephen Zhang and Vardan Papyan},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=DLDuVbxORA}
}