How to quantize a fine-tuned LLM into GGUF format #7299

Closed
dibyendubiswas1998 opened this issue May 15, 2024 · 3 comments

@dibyendubiswas1998

Hi, I fine-tuned the mistral-7b model for my question-answering task (after quantizing it to 4-bit, using LoRA/QLoRA).
Now I want to convert the fine-tuned model into GGUF format for CPU inference.

yentur commented May 15, 2024

You can use convert.py.
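
As a rough sketch of what that looks like in practice (assuming a full Hugging Face checkpoint directory rather than a bare LoRA adapter, and that the `--outfile`/`--outtype` flags match your llama.cpp checkout), driven from Python:

```python
# Sketch: run llama.cpp's convert.py on a Hugging Face checkpoint directory.
# The directory name and the --outfile/--outtype flags are assumptions;
# verify them with `python convert.py --help` in your checkout.
import subprocess

subprocess.run(
    [
        "python", "convert.py", "models/my-finetuned-mistral-7b",
        "--outfile", "models/my-finetuned-mistral-7b/model-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)
```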

ngxson (Collaborator) commented May 15, 2024

You can first merge the QLoRA adapter into the base model (that will produce a new set of .safetensors files)
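
For the merge step, a minimal sketch using transformers + peft; the base model ID, adapter path, and output directory below are placeholders, not taken from this thread:

```python
# Sketch: merge a QLoRA/LoRA adapter back into its base model and save a
# full set of .safetensors files that the llama.cpp converters can read.
# Model IDs and paths are placeholders, not taken from this issue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"       # base the adapter was trained on
adapter_dir = "path/to/qlora-adapter"       # output of the fine-tuning run
out_dir = "models/my-finetuned-mistral-7b-merged"

# Load the base in fp16 (not 4-bit) so the merged weights can be saved in full.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

merged.save_pretrained(out_dir, safe_serialization=True)   # writes .safetensors
AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```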

Then either use convert.py or convert-hf-to-gguf.py to convert the safetensors model into gguf

P/s: convert-lora-to-ggml.py was removed a while ago, so currently the only way to run a QLoRA fine-tune is to merge & convert
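
Putting the last two steps together on the merged checkpoint, a sketch of converting to an f16 GGUF and then quantizing to Q4_K_M for CPU inference; the script flags and the `quantize` binary name (renamed `llama-quantize` in later releases) are assumptions to check against your llama.cpp build:

```python
# Sketch: convert the merged Hugging Face checkpoint to an f16 GGUF, then
# quantize it to Q4_K_M. Run from the llama.cpp directory after building.
# Paths, flags, and the ./quantize binary name are assumptions; verify with
# `python convert-hf-to-gguf.py --help` and your build output.
import subprocess

merged_dir = "models/my-finetuned-mistral-7b-merged"
f16_gguf = f"{merged_dir}/model-f16.gguf"
q4_gguf = f"{merged_dir}/model-Q4_K_M.gguf"

subprocess.run(
    ["python", "convert-hf-to-gguf.py", merged_dir,
     "--outfile", f16_gguf, "--outtype", "f16"],
    check=True,
)
subprocess.run(["./quantize", f16_gguf, q4_gguf, "Q4_K_M"], check=True)
```

The resulting Q4_K_M file should then load on CPU with llama.cpp's example binaries (e.g. `./main -m model-Q4_K_M.gguf -p "..."` in builds of that era).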

github-actions bot added the stale label Jun 15, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
