junhuihe-hjh/CHESS

CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

📃 Junhui He, Shangyu Wu, Weidong Wen, Chun Jason Xue, Qingan Li: CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification. EMNLP 2024 Main: 18658-18668

Requirements

  • Miniconda (recommended)
  • NVIDIA GPU with at least 24GB VRAM for threshold computation
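
Before computing thresholds, it can help to confirm that a GPU meets the 24GB requirement. The following is a minimal sketch, not part of this repository: it assumes `nvidia-smi` is on the PATH, and the helper names (`parse_total_mib`, `has_enough_vram`, `query_total_mib`) are hypothetical. It also assumes "24GB" means decimal gigabytes, since a card sold as 24 GB may report slightly less than 24 GiB.

```python
import shutil
import subprocess

MIN_VRAM_GB = 24  # figure taken from the requirement above


def parse_total_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per line; skip blank or non-numeric lines."""
    values = []
    for line in smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def has_enough_vram(total_mib: list[int], min_gb: int = MIN_VRAM_GB) -> bool:
    """True if any single GPU has at least min_gb decimal gigabytes."""
    min_bytes = min_gb * 10**9  # assumption: "GB" is read as 10^9 bytes
    return any(mib * 1024 * 1024 >= min_bytes for mib in total_mib)


def query_total_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; return [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_total_mib(result.stdout)
```

Calling `has_enough_vram(query_total_mib())` returns `False` both when no GPU is large enough and when `nvidia-smi` cannot be run, so a `False` result only means the requirement could not be confirmed.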

Install Dependencies

  1. Clone the repository:

    git clone https://github.com/junhuihe-hjh/CHESS.git --recursive
    cd CHESS
  2. Install dependencies:

    # Create miniconda environment (recommended)
    conda create --name CHESS "python<3.13"
    conda activate CHESS
    
    # Install
    pip install -r requirements.txt
    pip install -e ./lm-evaluation-harness
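
After installation, a quick sanity check of the environment may save a failed run later. This is a sketch under stated assumptions rather than a script shipped with the repository: `lm_eval` is the import name of lm-evaluation-harness, while `torch` is only a guess at what `requirements.txt` installs, and both helper functions are hypothetical.

```python
import importlib.util
import sys

# Assumed module names: "torch" is a guess at what requirements.txt provides;
# "lm_eval" is the import name of lm-evaluation-harness.
EXPECTED_MODULES = ["torch", "lm_eval"]


def python_version_ok(version=sys.version_info) -> bool:
    """Mirror the "python<3.13" constraint used when creating the environment."""
    return tuple(version[:2]) < (3, 13)


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Inside the activated CHESS environment, `python_version_ok()` should return `True` and `missing_modules(EXPECTED_MODULES)` should return an empty list if the assumed packages are present.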

Run Performance Benchmarks

  1. Download the first training shard of the C4 dataset and the Llama-3.1-8B-Instruct model:

    huggingface-cli download allenai/c4 --local-dir huggingface-datasets/c4 --include en/c4-train.00000-of-01024.json.gz --repo-type dataset
    huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir huggingface-models/Llama-3.1-8B-Instruct --exclude "original/*"
  2. Compute and save thresholds:

    cd ./CHESS/notebooks
    python thresholds.py
    cd ../..

    Thresholds will be written to ./CHESS/thresholds/0_5.pt

  3. Run benchmarks on downstream tasks:

    cd ./benchmark
    ./run.sh
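
The benchmark step depends on the files produced by the two steps before it, so a small preflight check can be useful. The sketch below is illustrative only: the paths are copied from the commands above and are assumed to be relative to the repository root, and `missing_paths` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# Paths as they appear in the steps above, assumed relative to the repository root.
REQUIRED_PATHS = [
    "huggingface-datasets/c4/en/c4-train.00000-of-01024.json.gz",
    "huggingface-models/Llama-3.1-8B-Instruct",
    "CHESS/thresholds/0_5.pt",
    "benchmark/run.sh",
]


def missing_paths(repo_root, required=REQUIRED_PATHS):
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]
```

Running `missing_paths(".")` from the repository root lists whatever is still absent; an empty list suggests the dataset shard, the model, and the thresholds file are in place before `./run.sh` is started.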
