v0.3.1 changelog:
127b08e commit:
Are oQe quantizations already uploaded to HuggingFace considered obsolete?
Thanks for raising this, and thanks for all the oQ test data you've been sharing. It's been really useful input.
Short answer: the oQe quants already on HuggingFace aren't broken. They still load and run fine. But I'm not going to re-upload or refresh them while the enhanced path is paused, so consider them frozen rather than actively maintained.
On why it's paused: in the few weeks before I disabled it, I tried several variants of the enhanced approach: GPTQ-style, AWQ-style, the original formulations, plus a few MoE-optimized variations of each. Across a handful of models and short benchmarks, the results were not consistent. Some models clearly improved with the enhanced path. Others actually regressed compared to the plain oQ baseline. I haven't found a single method that reliably improves quality across model architectures, so I'd rather keep it off in the UI than ship something that silently makes some models worse. Once I find an approach that holds up across architectures, I'll bring enhanced quantization back.
On benchmarking more broadly: from my own testing, MMLU alone isn't enough to judge quantization quality. A single benchmark can flatter or punish a method depending on the model, so proving it properly means running diverse benchmarks across many models, and that takes a lot of time.
As a solo developer I have to prioritize where my time goes, and right now bug fixes, new features, and new model support take precedence. I'd love to keep digging into oQe, but I just can't put as much time into it as I'd like to right now. Thanks for understanding.