Converting TranslateGemma to Q4_K takes an incredibly long time. #129

I tried converting the 12B model to Q4_K: https://huggingface.co/google/translategemma-12b-it
My machine has a 9800X3D and 64 GB of RAM.
The converter spent about 50 minutes computing something with the CPU fully loaded. Then a "Dumping" message appeared, showing 0.6% progress and an estimated time to completion of 511,000 seconds (almost 6 days).
I also tried converting the 4B model: https://huggingface.co/google/translategemma-4b-it
After an hour, the output looks like this:
```
Loading vocab file C:\Models\translategemma-4b-it\tokenizer.model
vocab_size  262144
loading C:\Models\translategemma-4b-it\model-00001-of-00002.safetensors ...
Dumping ... |███████-----------------------------------------------------| 11.7% (73/630) 48.24s/it rem: 26822.97sss
```
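As a rough sanity check on the 4B log, the remaining-time figure is just the number of tensors left multiplied by the current per-tensor rate. The short Python sketch below parses the progress line; it assumes a constant rate and is only an illustration, not the converter's own code. `estimate_remaining` is a hypothetical helper name.

```python
import re

# Progress line shortened from the 4B log above (bar characters omitted).
LINE = "Dumping ... | 11.7% (73/630) 48.24s/it rem: 26822.97s"

def estimate_remaining(line: str) -> tuple[int, float]:
    """Return (tensors_left, seconds_left), assuming the per-tensor rate stays constant."""
    m = re.search(r"\((\d+)/(\d+)\)\s+([\d.]+)s/it", line)
    if m is None:
        raise ValueError("no progress counter found in line")
    done, total, sec_per_it = int(m[1]), int(m[2]), float(m[3])
    left = total - done
    return left, left * sec_per_it

left, secs = estimate_remaining(LINE)
print(f"{left} tensors left, ~{secs / 3600:.1f} h")
```

This prints `557 tensors left, ~7.5 h`, in line with the `rem:` value in the log. By the same measure, the 12B estimate of 511,000 seconds is roughly 142 hours.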
For comparison, I converted the 12B model to GGUF F16 with llama.cpp's converter and then quantized it to Q4_K_M with llama-quantize; that took less than 2 minutes.
**I think something's wrong.**
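For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the two-step llama.cpp route mentioned above. It assumes a local llama.cpp checkout containing `convert_hf_to_gguf.py` and a built `llama-quantize` on `PATH`; the helper names, folder paths, and output file names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical locations; adjust to your own llama.cpp checkout and model folder.
LLAMA_CPP = Path("llama.cpp")
MODEL_DIR = Path(r"C:\Models\translategemma-12b-it")

def build_commands(model_dir: Path, llama_cpp: Path) -> list[list[str]]:
    """Return two commands: HF safetensors -> GGUF F16, then F16 -> Q4_K_M."""
    f16 = model_dir.with_name(model_dir.name + "-f16.gguf")
    q4 = model_dir.with_name(model_dir.name + "-Q4_K_M.gguf")
    convert = [sys.executable, str(llama_cpp / "convert_hf_to_gguf.py"), str(model_dir),
               "--outtype", "f16", "--outfile", str(f16)]
    # llama-quantize takes: input GGUF, output GGUF, quantization type.
    quantize = ["llama-quantize", str(f16), str(q4), "Q4_K_M"]
    return [convert, quantize]

def run_all(cmds: list[list[str]]) -> None:
    """Run each step in order, stopping at the first one that fails."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands(MODEL_DIR, LLAMA_CPP))` runs both steps; keeping the F16 GGUF as an intermediate file means the quantization step can be repeated with other types without reconverting.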