GitHub - kssteven418/SqueezeLLM-gradients

Gradient Computation for SqueezeLLM

SqueezeLLM uses the Fisher information matrix as a sensitivity metric. This repository, built on top of Hugging Face's Transformers library, computes the Fisher sensitivity score (the squared gradient) for each weight. This score can then be used in the quantization pipeline of our official SqueezeLLM library.
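For reference, the "gradient square" can be read as the diagonal of the empirical Fisher information matrix. The notation below is an assumption introduced for illustration: $D$ is the calibration set, $W$ the model weights, $\mathcal{L}$ the training loss, and $g_d$ the gradient for example $d$. Whether this repository averages or simply sums over the calibration examples is not stated here, so treat the $1/|D|$ factor as a convention rather than a description of the code.

```latex
g_d = \nabla_W \mathcal{L}(d; W), \qquad
\mathcal{F} \approx \frac{1}{|D|} \sum_{d \in D} g_d \, g_d^{\top}, \qquad
\operatorname{diag}(\mathcal{F})_i \approx \frac{1}{|D|} \sum_{d \in D} g_{d,i}^{2}
```

Under this reading, a weight with a large squared gradient is one whose perturbation changes the loss the most, which is why the score serves as a sensitivity metric.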

Prerequisite

You will need your own Hugging Face-compatible LLaMA checkpoint saved at [MODEL_PATH].

Run the following command for setup:

conda create -n sqllm-grad python=3.9 -y
conda activate sqllm-grad
pip install -e .
pip install -r requirements.txt

Command

Run the following command:

CUDA_VISIBLE_DEVICES=0 python run.py --output_dir [OUTPUT_PATH] --model_name [MODEL_PATH]   # single GPU
CUDA_VISIBLE_DEVICES=0,1 python run.py --output_dir [OUTPUT_PATH] --model_name [MODEL_PATH]   # multi GPU

This command performs the following steps:

1. Loads the model from [MODEL_PATH]. Currently, we support LLaMA and Mistral models.
2. Computes the gradient square using a subset of the C4 training dataset as the calibration set. You can also define and use your own calibration dataset.
3. Writes the gradient square to [OUTPUT_PATH]. The output has the same format as the loaded Hugging Face model checkpoint; the only difference is that each weight value is replaced by its gradient square.
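The accumulation in the steps above can be sketched without any framework. This is a toy illustration only, not the code in run.py: the linear model, the squared-error loss, and the names `per_example_gradient` and `accumulate_squared_gradients` are all assumptions made for the example. The sketch sums squared gradients over the calibration examples; whether the repository sums or averages is not stated.

```python
from typing import List, Sequence, Tuple

# Hypothetical calibration example: (input features, target value).
Example = Tuple[Sequence[float], float]


def per_example_gradient(weights: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of the toy loss 0.5 * (w . x - y)^2 with respect to each weight."""
    error = sum(w * xi for w, xi in zip(weights, x)) - y
    return [error * xi for xi in x]


def accumulate_squared_gradients(
    weights: Sequence[float], calibration_set: Sequence[Example]
) -> List[float]:
    """Sum the element-wise squared gradient over all calibration examples."""
    grad_sq = [0.0] * len(weights)
    for x, y in calibration_set:
        grad = per_example_gradient(weights, x, y)
        for i, g in enumerate(grad):
            grad_sq[i] += g * g
    return grad_sq


# Toy stand-ins for the model weights and the calibration set.
weights = [1.0, -2.0]
calibration_set = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
scores = accumulate_squared_gradients(weights, calibration_set)
```

The result has exactly one score per weight, which mirrors how the output checkpoint keeps the layout of the model checkpoint while replacing each weight value with its gradient square.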

If the model is too large to fit on a single GPU, our framework can distribute it across multiple GPUs. This happens automatically when multiple devices are listed in CUDA_VISIBLE_DEVICES. Specifically, the model is partitioned into chunks of consecutive layers, and each chunk is assigned to its own GPU.
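To make the partitioning concrete, here is a minimal sketch of splitting consecutive layers across devices. The near-even split and the helper names `partition_layers` and `layer_device_map` are assumptions for illustration; how the repository actually balances the chunks is not described here.

```python
from typing import Dict, List


def partition_layers(num_layers: int, num_devices: int) -> List[range]:
    """Split layer indices 0..num_layers-1 into consecutive, near-equal chunks."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    base, extra = divmod(num_layers, num_devices)
    chunks: List[range] = []
    start = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional layer each.
        size = base + (1 if device < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def layer_device_map(num_layers: int, num_devices: int) -> Dict[int, str]:
    """Map every layer index to the device that holds its chunk."""
    return {
        layer: f"cuda:{device}"
        for device, chunk in enumerate(partition_layers(num_layers, num_devices))
        for layer in chunk
    }
```

For example, a 32-layer model on two visible GPUs would place layers 0 to 15 on the first device and layers 16 to 31 on the second under this scheme.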

You can also use the --num_examples argument to change the number of calibration examples. This defaults to 100.

