Release V0.13.0 release · intel/auto-round

Highlights

Support MTP quantization by @xin3he in #1758
Support one cpu/xpu backend by @Zhenzhong1 in #1723, @luoyu-intel in #1806, @Copilot in #1819
Support model_free WOQ quantization by @xin3he in #1699, @xin3he in #1809
Expanded support for new model architectures across LLMs, VLMs, diffusion, audio, and MoE models, including Gemma4 (#1655), MiMo-V2-Flash (#1718), BAGEL-7B-MoT (#1633), WAN2.2 (#1678), Ovis-Image-7B (#1616), Qwen-TTS and MiMo-Audio (#1810), and Qwen3.6-35B-A3B (#1705).
Add compressed-tensors format export support for W4A16 and W8A16 by @thuang6 in #1669
New architecture for auto_round by @n1ck-guo in #1542, #1761, #1781, #1796, #1807, #1765, #1808, #1817, #1832
Support AWQ-based smoothing algorithm by @WeiweiZhang1 in #1749
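W4A16 and W8A16 in the list above are weight-only quantization schemes. Weights are stored as 4- or 8-bit integers with floating-point scales, and activations stay in 16-bit floating point. The sketch below is a rough, hedged illustration of the simplest such scheme: symmetric round-to-nearest (RTN) quantization of a single weight group. The function names, the symmetric per-group scaling, and the sample values are assumptions made for illustration. This is not AutoRound's implementation, which tunes the rounding instead of relying on plain RTN.

```python
from typing import List, Tuple


def quantize_group_rtn(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest (RTN) quantization of one weight group.

    Illustrative only: the group's largest magnitude is mapped to the top
    signed integer code, and every weight is rounded to the nearest code.
    """
    qmax = 2 ** (bits - 1) - 1   # e.g. 7 for signed 4-bit codes
    qmin = -(2 ** (bits - 1))    # e.g. -8 for signed 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() resolves exact ties to the even integer.
    codes = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    group = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, -0.02, 0.28]
    codes, scale = quantize_group_rtn(group, bits=4)
    restored = dequantize_group(codes, scale)
    print("int4 codes:", codes)
    print("scale:", scale)
    print("max abs error:", max(abs(a - b) for a, b in zip(group, restored)))
```

The largest magnitude in the group maps exactly to the top integer code, so no value is clipped. Each reconstructed weight therefore lies within half a scale step of the original.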

Bug fixes and Improvements

Reduce XPU memory usage in CI by @xin3he in #1812
Reduce flux CUDA CI tuning memory from 30 GB to 5 GB by @xin3he in #1694
Support MXINT4 scheme by @mengniwang95 in #1666
Remove IPEX related code, doc, and test by @xin3he in #1787
Enhance dataset preprocessing memory management and fix hash failure by @xin3he in #1621
Enhance quantization configuration support for mixed precision and schemes in utils and tests by @xin3he in #1643
Force auto-scheme low_gpu to True in CLI by @wenhuach21 in #1653
Enhance performance test by @XuehaoSun in #1610
Fix auto-scheme accuracy drop without low_gpu and add a CI test by @WeiweiZhang1 in #1658
Add gptqmodel 6.0 compatibility changes and a warning for skipped MTP quantization by @xin3he in #1663
Import gptqmodel in conftest to avoid error in transformers by @xin3he in #1668
Fix missing extra_config export for unsupported ignore_layers like mlp.gate by @lvliang-intel in #1660
Update unit test requirements for compressed-tensors and transformers… by @XuehaoSun in #1670
Fix omni model test CI issue by @lvliang-intel in #1667
Refactor module access to use PyTorch get_submodule / set_submodule by @scopophobic in #1590
Support longcat_next by @xin3he in #1637
Stabilize Qwen3-Omni MoE weight fidelity test for matching NaNs by @lvliang-intel in #1681
Fix Hadamard transform weight dtype, using float32 as the default and transforming weights in place, by @lkk12014402 in #1665
Fix fp8_block bug by @mengniwang95 in #1693
Support diffusion model AIDC-AI/Ovis-Image-7B quantization by @lvliang-intel in #1616
Gate large FP4 packing test by GPU memory by @yiliu30 in #1696
Add Claude skills for AutoRound by @lvliang-intel in #1686
Add missing run_mllm entry point alias by @JGSphaela in #1695
Rename scheme INT8_W8A8 to INT8 by @thuang6 in #1687
Update mtp quant for special cases by @xin3he in #1691
Update gaudi-docker to v1.24.0 & fix CUDA UT by @XuehaoSun in #1708
Add support for gemma4 model by @n1ck-guo in #1655
Ignore mtp.fc for qwen3_5 due to vllm failure by @xin3he in #1710
Introduce INT4 support at the algorithm level by @wenhuach21 in #1641
Refine int4 doc by @wenhuach21 in #1720
Revert "ignore mtp.fc for qwen3_5 due to vllm failure (#1710)" by @xin3he in #1730
Skip quantizing mtp.fc since vLLM doesn't support it by @xin3he in #1731
Create model_support_request.yml by @xin3he in #1738
Remove threaded packing from exporters by @yiliu30 in #1719
Reduce XPU memory usage with patch_xpu_sdpa_drop_causal_mask by @xin3he in #1716
Add MLX format export support and AutoScheme support for VLMs by @wenhuach21 in #1732
Add warnings for lm_head activation scale fallback by @n1ck-guo in #1728
Add support for MiMo-V2-Flash by @n1ck-guo in #1718
Fix vllm CUDA CI by @XuehaoSun in #1750
Fix hpu error by @n1ck-guo in #1766
Split gate_up_proj for MTP and fix accuracy gap in RTN quantization by @xin3he in #1758
Support gptqmodel 7.0.0 and fix bug in CI by @xin3he in #1772
Optimize CUDA CI and Code Scan workflows by @XuehaoSun in #1770
Fix accuracy regression and check it in CUDA CI by @xin3he in #1785
Fix amp by @wenhuach21 in #1768, @wenhuach21 in #1767
Fix incompatible weight names by @mengniwang95 in #1759
Fix CT export metadata for KV cache and attention by @yiliu30 in #1752, @yiliu30 in #1861
Enhance AutoRound Lib test workflow by @chensuyue in #1805, @chensuyue in #1801
Fix mixed-precision accuracy regression when AutoScheme runs with CPU offloading and Hadamard rotation enabled by @lvliang-intel in #1753
Add shared agent config layout by @yiliu30 in #1700
Support ByteDance-Seed/BAGEL-7B-MoT quantization in w4a16 format by @lvliang-intel in #1633
Adjust gguf tuning algorithm by @wenhuach21 in #1649, @wenhuach21 in #1824
Quantize, save, and evaluate Wan-AI/WAN2.2 models in w4a16 format by @lvliang-intel in #1678
Fix layer name mismatch of VLM (qwen3.5-2B) in HF loading by @xin3he in #1823
Update dependencies in CI and installation scripts for cuda compatibility by @XuehaoSun in #1825
Reduce RAM usage of quantizing VLM models and fix some issues of quantizing gemma4 by @lvliang-intel in #1791
Add mimo-audio, Qwen-TTS model backbone quantization by @WeiweiZhang1 in #1810
Reduce VRAM usage of quantizing VLM models by @lvliang-intel in #1777
Support quarot/spinquant rotation before quantization by @lkk12014402 in #1797
Support Exporting Block-Wise FP8 AR Format by @Zhenzhong1 in #1798
Fix Gemma4 KeyError sliding_attention issue by @lvliang-intel in #1839
Fix gpt-j-6b RTN RuntimeError by @lvliang-intel in #1848
Sage fast sfm by @luoyu-intel in #1843
Security: add timeout safeguards to HTTP requests by @tomaioo in #1683
Dynamically map checkpoint naming based on model objective by @xin3he in #1840
Refine/fix gptq format by @wenhuach21 in #1853
Fix save_quantized log conflict by @WeiweiZhang1 in #1845
Fix Qwen Omni quantized model issue with long-form audio generation by @lvliang-intel in #1698
Fix bug in qwen and gguf export by @n1ck-guo in #1846
Refactor quarot/spinquant rotation to simplify code by @lkk12014402 in #1849
Fix gemma4 crash issue during quantizing by @lvliang-intel in #1860
Fix SDPA bug by @luoyu-intel in #1862
Fix special-model predefined ignore layer filtering by @lvliang-intel in #1863
Fix packing format in quantization config and update variable assignment in tests by @xin3he in #1857
Fix unsupported dtype by @wenhuach21 in #1868
Support WOQ model input, such as kimi2.5 by @xin3he in #1642
Enable low_cpu_mem_usage for mxfp/nvfp by @Kaihui-intel in #1648
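Several entries above (#1665, #1753, #1797, #1849) involve Hadamard-based rotation before quantization, in the style of QuaRot. The sketch below shows the underlying idea under simplifying assumptions. An orthonormal Hadamard matrix H satisfies H Hᵀ = I, so rotating the weights by H and the activations by Hᵀ leaves the layer output unchanged while spreading a weight outlier across channels. The helper names and sample matrices are hypothetical, the code is plain Python rather than AutoRound's implementation, and only the fixed Hadamard case is shown (SpinQuant additionally learns its rotations).

```python
import math
from typing import List

Matrix = List[List[float]]


def hadamard(n: int) -> Matrix:
    """Orthonormal n x n Hadamard matrix via Sylvester's construction.

    n must be a power of two; entries are +/- 1/sqrt(n) so that H @ H.T == I.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h: Matrix = [[1.0]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


if __name__ == "__main__":
    H = hadamard(4)
    W = [[0.0, 0.0, 0.0, 8.0],   # a weight row with one large outlier
         [1.0, 2.0, 3.0, 4.0]]
    x = [[0.5], [-1.0], [2.0], [0.25]]

    y = matmul(W, x)                                  # original layer output
    W_rot = matmul(W, H)                              # rotated weights
    y_rot = matmul(W_rot, matmul(transpose(H), x))    # rotated activations

    print("original output:", y)
    print("rotated output: ", y_rot)                  # equal up to float rounding
    print("outlier row before/after:", W[0], W_rot[0])
```

In the outlier row, the peak magnitude drops from 8 to 4 after rotation. This flatter range is what makes the rotated weights easier to quantize, while the product with the counter-rotated activations is unchanged.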

New Contributors

@JGSphaela made their first contribution in #1695
@tomaioo made their first contribution in #1683

Full Changelog: v0.12.3...v0.13.0
