
UniComp: A Unified Evaluation of LLM Compression via Pruning, Quantization & Distillation


I. Evaluate Performance

Setup

conda create -n performance python=3.10
conda activate performance
pip install -e . -r ./setup/requirements_performance.txt

conda create -n light python=3.10
conda activate light
pip install -e . -r ./setup/requirements_light.txt

Run a single eval

sbatch run_performance.sh llama_8b_wanda_50 knowledge

Reproduce all paper results

bash sweep.sh

Local pruned/distilled models must be generated first — see compress/README.md
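The sweep can be pictured as a cross-product of models and tasks, each dispatched through the same `run_performance.sh` entry point used for a single eval. A minimal sketch, assuming the grid below: only `llama_8b_wanda_50` and `knowledge` are confirmed by this README; the other model and task names are placeholders.

```python
import itertools

# Hypothetical grid: only llama_8b_wanda_50 / knowledge appear in this
# README; the remaining names are illustrative placeholders.
MODELS = ["llama_8b_wanda_50", "llama_8b_gptq_4bit", "llama_8b_distill"]
TASKS = ["knowledge", "reasoning"]

def build_commands(models, tasks):
    """Return one sbatch command per (model, task) pair."""
    return [
        ["sbatch", "run_performance.sh", model, task]
        for model, task in itertools.product(models, tasks)
    ]

if __name__ == "__main__":
    for cmd in build_commands(MODELS, TASKS):
        print(" ".join(cmd))  # replace print with subprocess.run(cmd) on the cluster
```

On a SLURM cluster each printed line would be submitted as its own job, so the full grid runs in parallel subject to the queue.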


II. Evaluate Reliability

Setup — two environments required

conda create -n vllm python=3.10
conda activate vllm
pip install -r ./setup/requirements_vllm.txt

conda create -n trustllm python=3.10
conda activate trustllm
pip install -r ./setup/requirements_trust.txt

Step 1 — Generate responses

cd TrustLLM
  1. Edit generate_all.py — set MODEL_PATH to your model path
  2. Register the model in config.py — add "/path/to/model": "model_name" to model_map and append "model_name" to the openai_model array
  3. Serve the model on GPU:
conda activate vllm
vllm serve "/path/to/model" \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --api-key localtoken \
  --served-model-name model_name
  4. In config.py set openai_key="localtoken" and openai_api_base="http://localhost:8000/v1"
  5. SSH to the same compute node and run generation:
conda activate trustllm
python generate_all.py

Responses are saved to UniComp/TrustLLM/generation_results/{model_name}/
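The config.py registration in the steps above can be sketched as follows. The structure (a `model_map` dict plus an `openai_model` list) is taken from the description; the helper name `register_model` and the pre-existing entry are illustrative, and `/path/to/model` / `model_name` are the same placeholders used above.

```python
# Sketch of the config.py registration described above (structure assumed
# from this README, not copied from TrustLLM source).
# model_map maps an on-disk model path to a short name; openai_model lists
# every name routed through the OpenAI-compatible client.
model_map = {
    "huggingface/llama-3-8b": "llama_8b",  # illustrative existing entry
}
openai_model = ["llama_8b"]

def register_model(path, name):
    """Add a locally served model so TrustLLM can address it by name."""
    model_map[path] = name
    if name not in openai_model:  # keep the list free of duplicates
        openai_model.append(name)

register_model("/path/to/model", "model_name")
```

The served-model name passed to `--served-model-name` must match the value registered here, since the vLLM server only answers requests for names it was launched with.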

Step 2 — Evaluate responses

Responses are evaluated by GPT-4 Turbo — an OpenAI API key is required.

  1. Swap to OpenAI in config.py: set openai_key="YOUR_KEY" and openai_api_base="https://api.openai.com/v1"
  2. Run evaluation:
conda activate trustllm
python evaluate.py --model_name "path/to/model"

Or via SLURM (uncomment the trustllm lines in run.sh):

sbatch run.sh

Results are printed to stdout.
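Steps 1 and 2 point the same OpenAI-compatible client at two different backends: the local vLLM server with the dummy key "localtoken" for generation, then api.openai.com with a real key for GPT-4 Turbo evaluation. A small sketch of the two configurations, with a helper name that is illustrative rather than TrustLLM API:

```python
# Sketch of the two endpoint configurations used in Steps 1 and 2.
# The function name and return shape are assumptions for illustration.
def endpoint_config(stage):
    """Return the openai_api_base / openai_key pair for each stage."""
    if stage == "generate":  # Step 1: local vLLM server on the compute node
        return {
            "openai_api_base": "http://localhost:8000/v1",
            "openai_key": "localtoken",
        }
    if stage == "evaluate":  # Step 2: GPT-4 Turbo as the judge
        return {
            "openai_api_base": "https://api.openai.com/v1",
            "openai_key": "YOUR_KEY",  # real OpenAI API key required here
        }
    raise ValueError(f"unknown stage: {stage}")
```

Forgetting the swap is an easy mistake: evaluation against the local base URL will address a model the vLLM server never loaded, and generation against api.openai.com will reject the "localtoken" key.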

III. Evaluate Efficiency

sbatch run_performance.sh
