I tried converting a 12B model to Q4_K: https://huggingface.co/google/translategemma-12b-it
I have a 9800X3D and 64GB of RAM.
The converter spent about 50 minutes computing something with the CPU fully loaded. Then a "Dumping" message appeared, showing 0.6% progress and an estimated completion time of 511000 seconds (roughly 142 hours).
I then tried converting a 4B model: https://huggingface.co/google/translategemma-4b-it
After an hour, the output looks like this:
Loading vocab file C:\Models\translategemma-4b-it\tokenizer.model
vocab_size 262144
loading C:\Models\translategemma-4b-it\model-00001-of-00002.safetensors ...
Dumping ... |███████-----------------------------------------------------| 11.7% (73/630) 48.24s/it rem: 26822.97sss
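To put those estimates in perspective, here is a small sketch that converts the reported numbers into hours and cross-checks the 4B estimate from the progress line. The function names are my own for illustration; the inputs are the figures reported above. It works out to roughly 142 hours (about 5.9 days) for the 12B model and about 7.5 hours remaining for the 4B model.

```python
def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


def remaining_seconds(done: int, total: int, sec_per_item: float) -> float:
    """Estimate remaining time from a progress line like '(73/630) 48.24s/it'."""
    return (total - done) * sec_per_item


if __name__ == "__main__":
    # 12B model: the converter reported 511000 seconds to completion.
    eta_12b_h = seconds_to_hours(511000)
    print(f"12B estimate: {eta_12b_h:.1f} h ({eta_12b_h / 24:.1f} days)")

    # 4B model: the converter reported 'rem: 26822.97' seconds.
    print(f"4B reported remaining: {seconds_to_hours(26822.97):.2f} h")

    # Cross-check using 73 of 630 items done at 48.24 s per item.
    check = remaining_seconds(done=73, total=630, sec_per_item=48.24)
    print(f"4B recomputed remaining: {check:.2f} s ({seconds_to_hours(check):.2f} h)")
```

The recomputed value is close to what the converter prints, so the estimate itself seems consistent; it is the per-item time that is the problem.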
As a comparison, I converted the 12B model to GGUF F16 with the converter from llama.cpp and then used llama-quantize to quantize it to Q4_K_M. That took less than 2 minutes.
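A rough sketch of that llama.cpp route, for anyone who wants to reproduce the comparison. It assumes the `convert_hf_to_gguf.py` script from a llama.cpp checkout and a `llama-quantize` binary on the PATH. The directory layout, output file names, and the helper functions are placeholders for illustration, not the exact invocation used.

```python
import subprocess
import time
from pathlib import Path


def build_commands(model_dir, work_dir, quant="Q4_K_M", llama_cpp_dir="llama.cpp"):
    """Return the two command lines: HF -> GGUF F16, then F16 -> quantized GGUF."""
    model_dir = Path(model_dir)
    work_dir = Path(work_dir)
    # Output file names are placeholders chosen for this sketch.
    f16_path = work_dir / f"{model_dir.name}-f16.gguf"
    out_path = work_dir / f"{model_dir.name}-{quant}.gguf"

    convert = [
        "python", str(Path(llama_cpp_dir) / "convert_hf_to_gguf.py"),
        str(model_dir),
        "--outfile", str(f16_path),
        "--outtype", "f16",
    ]
    # llama-quantize takes: input file, output file, quantization type.
    quantize = ["llama-quantize", str(f16_path), str(out_path), quant]
    return [convert, quantize]


def run_timed(commands):
    """Run each command in order and return the elapsed seconds per step."""
    timings = []
    for cmd in commands:
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if a step fails
        timings.append(time.perf_counter() - start)
    return timings


# Example (paths are hypothetical):
# cmds = build_commands("C:/Models/translategemma-12b-it", "C:/Models/out")
# print(run_timed(cmds))
```

Timing each step separately makes it easier to compare against the converter's per-item figures.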
I think something's wrong.