How to quantize a fine-tuned LLM into GGUF format #7299

Closed
dibyendubiswas1998 opened this issue May 15, 2024 · 3 comments

@dibyendubiswas1998

Hi, I fine-tuned the mistral-7b model for my question-answering task (after quantizing it to 4-bit, using LoRA/QLoRA).
Now I want to convert the fine-tuned model into GGUF format for CPU inference.

yentur commented May 15, 2024

You can use convert.py.
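
As a rough sketch of what that looks like in practice (assuming a full Hugging Face checkpoint directory rather than a bare LoRA adapter, and that the `--outfile`/`--outtype` flags match your llama.cpp checkout), driven from Python:

```python
# Sketch: run llama.cpp's convert.py on a Hugging Face checkpoint directory.
# The directory name and the --outfile/--outtype flags are assumptions;
# verify them with `python convert.py --help` in your checkout.
import subprocess

subprocess.run(
    [
        "python", "convert.py", "models/my-finetuned-mistral-7b",
        "--outfile", "models/my-finetuned-mistral-7b/model-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)
```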

ngxson (Collaborator) commented May 15, 2024

You can first merge the QLoRA adapter into the base model (that will produce a new set of .safetensors files)
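
For the merge step, a minimal sketch using transformers + peft; the base model ID, adapter path, and output directory below are placeholders, not taken from this thread:

```python
# Sketch: merge a QLoRA/LoRA adapter back into its base model and save a
# full set of .safetensors files that the llama.cpp converters can read.
# Model IDs and paths are placeholders, not taken from this issue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"       # base the adapter was trained on
adapter_dir = "path/to/qlora-adapter"       # output of the fine-tuning run
out_dir = "models/my-finetuned-mistral-7b-merged"

# Load the base in fp16 (not 4-bit) so the merged weights can be saved in full.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

merged.save_pretrained(out_dir, safe_serialization=True)   # writes .safetensors
AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```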

Then either use convert.py or convert-hf-to-gguf.py to convert the safetensors model into gguf

P/s: convert-lora-to-ggml.py was removed a while ago, so currently the only way to run a QLoRA fine-tune is to merge & convert
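
Putting the last two steps together on the merged checkpoint, a sketch of converting to an f16 GGUF and then quantizing to Q4_K_M for CPU inference; the script flags and the `quantize` binary name (renamed `llama-quantize` in later releases) are assumptions to check against your llama.cpp build:

```python
# Sketch: convert the merged Hugging Face checkpoint to an f16 GGUF, then
# quantize it to Q4_K_M. Run from the llama.cpp directory after building.
# Paths, flags, and the ./quantize binary name are assumptions; verify with
# `python convert-hf-to-gguf.py --help` and your build output.
import subprocess

merged_dir = "models/my-finetuned-mistral-7b-merged"
f16_gguf = f"{merged_dir}/model-f16.gguf"
q4_gguf = f"{merged_dir}/model-Q4_K_M.gguf"

subprocess.run(
    ["python", "convert-hf-to-gguf.py", merged_dir,
     "--outfile", f16_gguf, "--outtype", "f16"],
    check=True,
)
subprocess.run(["./quantize", f16_gguf, q4_gguf, "Q4_K_M"], check=True)
```

The resulting Q4_K_M file should then load on CPU with llama.cpp's example binaries (e.g. `./main -m model-Q4_K_M.gguf -p "..."` in builds of that era).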

github-actions bot added the stale label Jun 15, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
