Given the proliferation of different model architectures, it would be interesting to have as much granularity as possible when testing quant strategies, so that as many people as possible can test them on as many supported models as possible, with the tests taking as little time as possible.
That could mean:
GGUF as a directory, in which each tensor of each layer would be quantized as a separate file.
Partial requant, in which, starting from an existing full quant, only the specified tensors of the specified layers would be requantized.
A GUI to easily decide which tensors to requant, either per unit (e.g. ffn.down for layer x) or per range (e.g. ffn.down for layer range x-y), and for any chosen number of ranges.
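The per-unit/per-range selection above could be sketched as follows. Note that the spec syntax (`ffn_down:3` or `ffn_down:3-7`) and the llama.cpp-style tensor names (`blk.<layer>.ffn_down.weight`) are illustrative assumptions, not an existing interface:

```python
# Hypothetical sketch of expanding unit/range requant specs into concrete
# tensor names, as such a GUI or CLI might do internally.
# Spec format (assumed): "<tensor>:<layer>" or "<tensor>:<start>-<end>".

def expand_spec(spec: str) -> list[str]:
    """Expand e.g. 'ffn_down:3-5' into ['blk.3.ffn_down.weight', ...]."""
    tensor, _, layers = spec.partition(":")
    if "-" in layers:
        start, end = (int(x) for x in layers.split("-"))
    else:
        start = end = int(layers)
    # Assumed llama.cpp-style naming: blk.<layer>.<tensor>.weight
    return [f"blk.{i}.{tensor}.weight" for i in range(start, end + 1)]

def tensors_to_requant(specs: list[str]) -> set[str]:
    """Union of all tensors named by the given unit/range specs."""
    out: set[str] = set()
    for spec in specs:
        out.update(expand_spec(spec))
    return out
```

Everything outside that set would be left untouched on disk, which is what makes the partial requant cheap.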
This would make it possible to quickly check the impact of a quantization change while saving compute and time, by not requantizing the same tensors identically over and over again.
Ultimately, when a satisfactory size/quality trade-off is found, a "compacting" feature, turning the directory into a single .gguf file, would be useful as well. And why not a "decompacting" feature, to start the tests from an already quantized model?
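The compacting/decompacting pair could look roughly like this. The on-disk layout here (a trivial length-prefixed index) is invented purely for illustration; a real implementation would emit a proper single-file GGUF via llama.cpp's own writer:

```python
# Hypothetical sketch: pack a one-file-per-tensor quant directory into a
# single container file ("compacting"), and split it back out again
# ("decompacting"). The container format is an assumption, not GGUF.
import os
import struct

def compact(src_dir: str, out_path: str) -> None:
    """Concatenate every tensor file in src_dir into out_path."""
    names = sorted(os.listdir(src_dir))
    with open(out_path, "wb") as out:
        out.write(struct.pack("<I", len(names)))          # tensor count
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as f:
                data = f.read()
            enc = name.encode()
            out.write(struct.pack("<I", len(enc)) + enc)  # name
            out.write(struct.pack("<Q", len(data)) + data)  # payload

def decompact(in_path: str, dst_dir: str) -> None:
    """Split a compacted file back into per-tensor files in dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    with open(in_path, "rb") as f:
        (count,) = struct.unpack("<I", f.read(4))
        for _ in range(count):
            (nlen,) = struct.unpack("<I", f.read(4))
            name = f.read(nlen).decode()
            (dlen,) = struct.unpack("<Q", f.read(8))
            with open(os.path.join(dst_dir, name), "wb") as out:
                out.write(f.read(dlen))
```

Decompacting an already quantized single-file model into such a directory is what would let experiments start from an existing quant instead of from the full-precision weights.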
Morning coffee thoughts.