
Match GPTQ state dict #2188

Open · wants to merge 3 commits into main from match-quant-state-dict-to-gptq
Conversation

rahul-tuli (Member) commented Mar 19, 2024

Conversion script:

from sparseml.transformers.utils.vllm_export_helpers import export_vllm_checkpoint
from sparseml.transformers import SparseAutoModelForCausalLM, SparseAutoTokenizer

# Load the compressed checkpoint produced by the GPTQ-emulation compression stage
path = "/home/rahul/projects/sparseml/local/local_output/sparsegpt-autogptq-emulation-checkpoint/stage_compression"
sparse_gpt_model = SparseAutoModelForCausalLM.from_pretrained(path)
tokenizer = SparseAutoTokenizer.from_pretrained(path)

# Translate the state dict to the GPTQ/exllama layout and save a vLLM-loadable checkpoint
export_vllm_checkpoint(
    model=sparse_gpt_model,
    tokenizer=tokenizer,
)
Output:

2024-03-21 01:58:33 sparseml.pytorch.model_load.helpers INFO     Reloaded model state after SparseML recipe structure modifications from /home/rahul/projects/sparseml/local/local_output/sparsegpt-autogptq-emulation-checkpoint/stage_compression
2024-03-21 01:58:33 __main__     INFO     Adding exllama quantization info to config
2024-03-21 01:58:33 __main__     INFO     Translating state dict to exllama format.
2024-03-21 01:58:33 sparseml.transformers.utils.transformations INFO     Applying transformation: TRANSFORM_NAMES
2024-03-21 02:00:46 sparseml.transformers.utils.transformations INFO     Transformation: TRANSFORM_NAMES complete
2024-03-21 02:00:46 sparseml.transformers.utils.transformations INFO     Applying transformation: ADD_TENSORS
2024-03-21 02:00:46 sparseml.transformers.utils.transformations INFO     Transformation: ADD_TENSORS complete
2024-03-21 02:00:46 sparseml.transformers.utils.transformations INFO     Applying transformation: TRANSFORM_TENSORS
2024-03-21 02:00:46 sparseml.transformers.utils.transformations INFO     Transformation: TRANSFORM_TENSORS complete
2024-03-21 02:00:46 sparseml.transformers.utils.transformations INFO     Applying transformation: REMOVE_UNWANTED_TENSORS
2024-03-21 02:00:46 sparseml.transformers.utils.transformations INFO     Transformation: REMOVE_UNWANTED_TENSORS complete
2024-03-21 02:00:50 __main__     INFO     Model and config saved to /nm/drive0/rahul/projects/sparseml/exllama_model
2024-03-21 02:00:50 __main__     INFO     tokenizer saved to /nm/drive0/rahul/projects/sparseml/exllama_model
$ tree ./exllama_model 
./exllama_model
├── config.json
├── generation_config.json
├── model.safetensors
├── special_tokens_map.json
├── tokenizer_config.json
└── tokenizer.json

0 directories, 6 files

config.json

{
  "_name_or_path": "/home/rahul/projects/sparseml/local/local_output/sparsegpt-autogptq-emulation-checkpoint/stage_compression",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "quantization_config": {
    "bits": 4,
    "desc_act": false,
    "group_size": -1,
    "is_marlin_format": false,
    "quant_method": "gptq",
    "sym": true
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "1.7.0.20240321",
  "use_cache": true,
  "vocab_size": 32000
}

Usage script (requires vLLM):

import argparse
from vllm import LLM, SamplingParams


parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str)

args = parser.parse_args()


prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=1, max_tokens=100)

# Create an LLM.
llm = LLM(args.model)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"\nGenerated text: {prompt}{generated_text}\n")


rahul-tuli changed the title from "Add Translation Structure" to "Match GPTQ state dict" on Mar 19, 2024
rahul-tuli force-pushed the match-quant-state-dict-to-gptq branch from d442b66 to 29f83bb on March 21, 2024 at 02:18
rahul-tuli marked this pull request as ready for review on March 21, 2024 at 02:22
rahul-tuli force-pushed the match-quant-state-dict-to-gptq branch from 8227bfe to 1b1567d on March 26, 2024 at 14:21
    return wrapper


def is_quantization_target(key: str) -> bool:

Contributor:

We should either move this file to gptq_helpers.py or make sure these functions are named with "gptq" specifically, since these assumptions are specific to how this algorithm is applied, not to all quantization. A possible rename is sketched below.
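
For example, a hypothetical GPTQ-specific name (an illustration, not part of this PR's diff) could simply delegate to the existing helper:

def is_gptq_quantization_target(key: str) -> bool:
    """Return True if this state dict key should be converted to the GPTQ layout.

    Hypothetical GPTQ-specific name; the body is the same check the PR's
    is_quantization_target already performs.
    """
    return is_quantization_target(key)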

def _log_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        _LOGGER.info("Applying transformation: %s", func.__name__.upper())

Contributor:

Let's move this to debug; users won't necessarily know the internal transformation names. A sketch of the change is below.
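
A minimal sketch of the suggested change, reusing the decorator shape and message format shown in this PR, only logged at debug level (the surrounding imports and logger setup are assumptions):

import functools
import logging

_LOGGER = logging.getLogger(__name__)


def _log_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Debug level: internal transformation names are not meaningful to most users
        _LOGGER.debug("Applying transformation: %s", func.__name__.upper())
        return_value = func(*args, **kwargs)
        _LOGGER.debug("Transformation: %s complete", func.__name__.upper())
        return return_value

    return wrapper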

intweight = []
infeatures = weight.shape[1]
for idx in range(infeatures):
    intweight.append(

Contributor:

After we fix the accuracy issue, let's see what we can do to speed this up, or at least time it. With grouping, vectorizing might be tricky, but we could at least pre-allocate the final tensor; see the sketch below.
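
A rough sketch of the pre-allocation idea (the shapes, dtype, and per-output-channel scale layout are assumptions for illustration, not this PR's exact code):

import torch


def pack_int_weight(weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Quantize each input-feature column into a pre-allocated int32 tensor
    instead of appending to a Python list and concatenating at the end.

    Assumes symmetric quantization with one scale per output channel.
    """
    outfeatures, infeatures = weight.shape
    intweight = torch.empty((infeatures, outfeatures), dtype=torch.int32)
    for idx in range(infeatures):
        intweight[idx] = torch.round(weight[:, idx] / scale).to(torch.int32)
    return intweight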

Contributor:

Could maybe try moving the model to GPU before running the transformations (i.e. model.to("cuda:0")); for example, see below.
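
A small sketch of that suggestion applied to the conversion script above (the device selection logic is an assumption):

import torch

# Move the model to GPU, if one is available, before the state dict transformations run
if torch.cuda.is_available():
    sparse_gpt_model.to("cuda:0")

export_vllm_checkpoint(
    model=sparse_gpt_model,
    tokenizer=tokenizer,
)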

- Reshape the zero points tensor to [1, x] of type int32 and fill with zeros
  (it is assumed that quantization was symmetric)

:param state_dict: The state_dict to be transformed

Contributor:

Specify in the docstring that the keys should already have been updated (renamed) before this transformation is applied.
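
For reference, a minimal sketch of the zero point handling the docstring excerpt above describes (the helper name and column count are assumptions; only the [1, x] int32 zero-filled shape comes from the docstring):

import torch


def zeroed_qzeros(num_columns: int) -> torch.Tensor:
    """Return an all-zero int32 tensor of shape [1, num_columns].

    With symmetric quantization every zero point is zero, so the GPTQ-style
    zero points tensor can simply be reshaped and filled with zeros.
    """
    return torch.zeros((1, num_columns), dtype=torch.int32)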

bfineran previously approved these changes Mar 28, 2024
Commits:
remove src. from imports
Update names
Some Cleanup
Add docstring to QuantizationConfig