System Info
- `transformers` version: 4.47.1
- Platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.27.0
- Safetensors version: 0.4.5
- Accelerate version: 1.2.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes
- GPU type: NVIDIA GeForce RTX 3060 Laptop GPU
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Error:
```
Traceback (most recent call last):
  File "/mnt/d/Python_Projects/Jupyter/other/call-center-prompter/debug/quant/check-quantizations.py", line 31, in <module>
    quantize_gptq(model_id=model_id, quant_config=gptq_config, prefix_dir=prefix_dir)
  File "/mnt/d/Python_Projects/Jupyter/other/call-center-prompter/debug/quant/gptq_quantize.py", line 32, in quantize_gptq
    model.save_pretrained(prefix_dir + quant_path)
  File "/mnt/d/Python_Projects/Jupyter/other/call-center-prompter/debug/quant/venv-wsl2/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3034, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/mnt/d/Python_Projects/Jupyter/other/call-center-prompter/debug/quant/venv-wsl2/lib/python3.12/site-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 5, kind: Uncategorized, message: "Input/output error" })
```
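Error code 5 on Linux is `EIO`, an OS-level input/output error raised by the filesystem rather than by safetensors itself; note that the write here targets a Windows drive mounted into WSL2 (`/mnt/d`). A minimal sketch to check whether a plain safetensors write to the same mount fails on its own, with the target path and tensor size as assumptions:

```python
# Minimal isolation test: write a tensor with safetensors directly, bypassing
# transformers. The /mnt/d target path and tensor size are assumptions.
import os

import torch
from safetensors.torch import save_file

out_dir = "/mnt/d/tmp"  # assumed path on the same WSL2-mounted drive
os.makedirs(out_dir, exist_ok=True)
tensors = {"weight": torch.randn(4096, 4096)}  # ~64 MB of float32 data
save_file(tensors, os.path.join(out_dir, "test.safetensors"), metadata={"format": "pt"})
print("safetensors write succeeded")
```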
Code:

```python
import os
import logging

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
from huggingface_hub import login, snapshot_download

logger = logging.getLogger(__name__)

logger.info("Logging in HF")
login(token="<mytoken>")  # placeholder token


def quantize_gptq(model_id: str, quant_config: dict, prefix_dir: str = './') -> str:
    # Normalize prefix_dir to end with a slash
    prefix_dir += '/' if prefix_dir[-1] != '/' else ''
    # Use a local snapshot if present, otherwise fall back to the hub id
    model_path = prefix_dir + model_id.split('/')[1] if os.path.exists(prefix_dir + model_id.split('/')[1]) else model_id
    quant_path = model_id.split('/')[1] + f"-GPTQ-{quant_config['bits']}bit"
    if os.path.exists(prefix_dir + quant_path):
        logger.info("Skipping GPTQ quantization because it already exists")
    else:
        tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
        config = GPTQConfig(**quant_config, dataset="c4", tokenizer=tokenizer)  # exllama_config={"version": 2}
        model = AutoModelForCausalLM.from_pretrained(
            model_path,
            device_map="auto",
            trust_remote_code=False,
            quantization_config=config,
            revision="main",
        )
        logger.info("Save GPTQ quantized model")
        os.makedirs(prefix_dir + quant_path, exist_ok=True)
        model.save_pretrained(prefix_dir + quant_path)  # fails here with the IoError above
        tokenizer.save_pretrained(prefix_dir + quant_path)
        logger.info("Push to hub GPTQ quantized model")
        model.push_to_hub(quant_path)
        tokenizer.push_to_hub(quant_path)
    return prefix_dir + quant_path
```
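For reference, the `check-quantizations.py` driver from the traceback presumably calls the function along these lines; the exact `model_id` and quantization settings below are assumptions for illustration, not the reporter's values:

```python
# Hypothetical driver mirroring check-quantizations.py from the traceback.
# model_id, bits, and group_size are placeholders.
model_id = "meta-llama/Llama-3.2-1B"
gptq_config = {"bits": 4, "group_size": 128}
prefix_dir = "./quant-models"
quantize_gptq(model_id=model_id, quant_config=gptq_config, prefix_dir=prefix_dir)
```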
Expected behavior
The quantized model is saved to disk without errors.
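As a diagnostic aside: `save_pretrained` accepts a `safe_serialization` flag, so falling back to PyTorch's native `.bin` serialization can show whether only the safetensors write path hits the I/O error. A minimal sketch, reusing the names from the script above:

```python
# Diagnostic sketch: bypass safetensors to see if plain torch serialization
# also fails on the same /mnt/d path. Not a fix, just a way to narrow it down.
model.save_pretrained(prefix_dir + quant_path, safe_serialization=False)
```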