Tutorial: How to convert HuggingFace model to GGUF format #2948
Replies: 22 comments 41 replies
-
You might want to add a small note that requantizing to other formats from |
-
I have a model trained using QLoRA, and the lowest I can convert it to in GGUF is 8-bit quantization. What about q4_K_S quantization? Why is it not available?
-
Can anyone help me debug this?
-
Is there a way to do this directly on Colab?
-
This way I can only get a single file, such as a .gguf. Is it possible to convert the model into a reproducible format like TheBloke's on HuggingFace?
-
Hi @samos123, I'm only used to working with .gguf files for LLMs. I have no idea what to do with this kind of model, so I did a search and found your post. Am I right to assume that all models structured this way are HF models? Is there anywhere I can read more about this? It seems all the YouTube videos go straight to the quantized .gguf version. Are HF models considered the raw models that can be further tuned into something else? I have lots of assumptions, but they are hard to verify.
-
Please tell me the difference between the roles of the following files.
My predictions are as follows.
Why aren't Also, only |
-
Improved the download.py script:
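A minimal sketch of what such a script could look like, assuming it wraps `snapshot_download` from `huggingface_hub`. The helper names `local_dir_for` and `main` are made up for illustration, and this is not necessarily the commenter's exact code.

```python
import argparse


def local_dir_for(model_id: str) -> str:
    """Turn 'org/name' into 'org-name' so it can be used as one directory name."""
    return model_id.replace("/", "-")


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Download a HuggingFace model")
    parser.add_argument("model_id", help="repo id on HuggingFace, e.g. org/name")
    args = parser.parse_args(argv)

    # Imported here so the helper above stays usable without the library installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(args.model_id)
    # Downloads every file of the repo's main revision into the target directory.
    snapshot_download(repo_id=args.model_id, local_dir=target, revision="main")
    return target


# Hypothetical call; a real script would end with main() to read sys.argv:
# main(["some-org/some-model"])  # files land in ./some-org-some-model
```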
This way you can just pass the model name on huggingface in the command line. It will remove the slash and replace it with a dash when creating the directory. Example:
-
I'm having a |
-
Hi, I ran into an odd error and was really struggling to find any relevant information online. Hoping someone here can help. I know almost nothing about the technical side of things; I'm just an average AI text gen user.
I'm trying to convert models to GGUF and checked out the instructions both here and in this guide on Reddit. I managed to get convert.py working and can do FP16 and Q8 conversions without issue, but I repeatedly ran into the same mysterious error when trying to use quantize.exe to convert pretty much anything. I've tried with both this model, Mixtral Erotic, and this model, CatPPT. The error message is always the same:
The processing always gets stuck on "line: 1 char:19". I'm not sure why, and I can't really see which character it is specifically. By the way, I'm running in PowerShell; I just right-clicked on quantize.exe in Explorer and chose the option to navigate to that location automatically. I'm not sure if that makes a difference.
I'm wondering if the error is because I don't have llama.cpp installed correctly. Running quantize.exe through CMD gives an error about cudart64_12.dll missing, but downloading the cudart files and putting them into the same folder doesn't stop the error. If I'm only using convert.py and quantize.exe, do I still need to follow the CMake instructions on the llama.cpp main page to "build Llama" from the source code? I've already run the requirements.txt through Python, which is why convert.py is working for me, I think. It's just that for some reason quantize.exe doesn't work.
Edit (Update):
-
As the errors state, you are mixing multiple models. Please download the files properly from HF microsoft/phi-2.
Note: you can directly download GGUF-quantized Microsoft Phi-2 models from HF with hf.sh. Example for a Q4_K_M:
./scripts/hf.sh --repo TheBloke/phi-2-GGUF --file phi-2.Q4_K_M.gguf
-
This might be useful. If anyone wants to help improve it, contributions are always welcome. https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script
-
While converting a bigcode/starcoder2-7b into q8_0 quantization using convert.py, I got the following error.
Can anybody please help?
-
Can I make a GGUF of a model that contains custom code?
-
Hi @samos123, maintainer of Given the popularity of this post I think it'd be good to update it to showcase the
Downloading a HuggingFace model
To download the full model to a local folder:
Or only a file:
Pushing the GGUF model to HuggingFace
To upload a folder to the Hub:
Finally, to set an HF token on a machine - it's best to set
Hope this will help you (and future readers) using the Hub in a more convenient way! 🤗 (thanks @julien-c for the friendly ping)
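For readers who prefer to stay in Python, the single-file download and folder upload described in this comment can be approximated with the `huggingface_hub` library itself. This is only a sketch under assumptions: the function names are made up for illustration, the repo ids and paths are placeholders, and it is not the set of commands the comment originally showed.

```python
def download_single_file(repo_id: str, filename: str, local_dir: str) -> str:
    """Fetch one file from a Hub repo and return its local path."""
    # Imported lazily so this module loads even without the library installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)


def upload_model_folder(folder_path: str, repo_id: str):
    """Upload every file under folder_path to an existing model repo."""
    from huggingface_hub import HfApi

    return HfApi().upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="model",
    )


# Hypothetical calls; both need network access, and the upload needs a write token:
# download_single_file("some-org/some-model-GGUF", "model.Q4_K_M.gguf", "models")
# upload_model_folder("models", "your-username/your-model-gguf")
```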
-
When specifying
It looks like only |
-
This
Helped me a ton. It downloaded my LoRA combined with the base model correctly. I was able to make my GGUF easily.
-
Hello everyone, I hope someone can help me with this error I am getting. I tried running the command below:
However, I get this error:
Does anyone have any ideas? Is this a problem with the model I am trying to convert?
-
Source: https://www.substratus.ai/blog/converting-hf-model-gguf-model/
I published this on our blog but thought others here might benefit as well, so I'm sharing the raw blog post here on GitHub too. Hope it's helpful to folks here, and feedback is welcome.
Downloading a HuggingFace model
There are various ways to download models, but in my experience the huggingface_hub library has been the most reliable. The git clone method occasionally results in OOM errors for large models.
Install the huggingface_hub library:
Create a Python script named download.py with the following content:
Run the Python script:
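As a rough sketch, `download.py` could contain something like the following. The `vicuna-hf` directory name comes from this post; the repo id in the commented call is a placeholder, and the function wrapper is only there for illustration.

```python
def download_model(repo_id: str, local_dir: str = "vicuna-hf", revision: str = "main") -> str:
    """Download all files of a HuggingFace repo into local_dir."""
    # Imported lazily so the file can be loaded without the library installed.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file of the given revision.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir, revision=revision)


# Placeholder repo id; replace it with the model you want to convert:
# download_model("some-org/some-vicuna-model")
```

With the call uncommented, running `python download.py` should leave the model files in `vicuna-hf`.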
You should now have the model downloaded to a directory called vicuna-hf. Verify by running:
Converting the model
Now it's time to convert the downloaded HuggingFace model to a GGUF model.
Llama.cpp comes with a converter script to do this.
Get the script by cloning the llama.cpp repo:
Install the required python libraries:
Verify the script is there and understand the various options:
Convert the HF model to GGUF model:
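The conversion is a single call to the converter script. The sketch below builds that call from Python, assuming the script lives at `llama.cpp/convert.py` in the cloned repo and accepts `--outfile` next to the `--outtype` option discussed in this post. The output file name is a placeholder, and newer llama.cpp checkouts may name the script differently.

```python
import subprocess
import sys


def build_convert_command(model_dir, outfile, outtype="q8_0",
                          script="llama.cpp/convert.py"):
    """Return the argument list for running the converter script."""
    return [sys.executable, script, model_dir,
            "--outfile", outfile, "--outtype", outtype]


def convert(model_dir, outfile, outtype="q8_0"):
    """Run the converter and raise CalledProcessError if it fails."""
    subprocess.run(build_convert_command(model_dir, outfile, outtype), check=True)


# Placeholder output name; requires the llama.cpp repo cloned next to this file:
# convert("vicuna-hf", "vicuna-q8_0.gguf")
```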
In this case we're also quantizing the model to 8 bit by setting --outtype q8_0. Quantizing helps improve inference speed, but it can negatively impact quality. You can use --outtype f16 (16 bit) or --outtype f32 (32 bit) to preserve original quality.
Verify the GGUF model was created:
Pushing the GGUF model to HuggingFace
You can optionally push back the GGUF model to HuggingFace.
Create a Python script with the filename upload.py that has the following content:
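A minimal sketch of what `upload.py` could contain, assuming the `HfApi` client from `huggingface_hub`. The repo id and file name in the commented call are placeholders, and the optional `api` parameter is an illustration-only hook for passing in a preconfigured client.

```python
import os


def upload_gguf(gguf_path: str, repo_id: str, api=None) -> None:
    """Create the target repo if needed and upload a single GGUF file to it."""
    if api is None:
        # Imported lazily so the file can be loaded without the library installed.
        from huggingface_hub import HfApi
        api = HfApi()

    # exist_ok=True makes this a no-op when the repo already exists.
    api.create_repo(repo_id, exist_ok=True, repo_type="model")
    api.upload_file(
        path_or_fileobj=gguf_path,
        path_in_repo=os.path.basename(gguf_path),
        repo_id=repo_id,
    )


# Placeholder names; needs a token with write permission:
# upload_gguf("vicuna-q8_0.gguf", "your-username/your-model-gguf")
```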
Get a HuggingFace Token that has write permission from here:
https://huggingface.co/settings/tokens
Set your HuggingFace token:
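To my knowledge, `huggingface_hub` picks the token up from the `HF_TOKEN` environment variable, with `HUGGING_FACE_HUB_TOKEN` as the older name. The small check below is a sketch (the helper name `find_hf_token` is made up) for confirming that one of them is set before uploading.

```python
import os


def find_hf_token(env=None):
    """Return the first HuggingFace token found in the environment, or None."""
    env = os.environ if env is None else env
    # Check the current variable name first, then the older one.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name)
        if value:
            return value
    return None


if find_hf_token() is None:
    print("No HuggingFace token found; set HF_TOKEN before uploading.")
```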
Run the upload.py script: