V0.9.3 Release · intel/auto-round

Highlights

Added ark backend by @Zhenzhong1 in #1075
Reduce VRAM usage for optimized RTN mode by @wenhuach21 in #1043
Support alg_ext on Windows by @chensuyue in #1082
Adjust 2/3-bit hyperparameters for auto-round-best by @wenhuach21 in #1081
Fix bug where layers outside blocks do not use best_params by @n1ck-guo in #1128
Fix NVFP4 and FP8 packing RAM issue; refine RAM release of the original layer for all export formats by @yiliu30 in #1129

What's Changed

fix accuracy regression by @wenhuach21 in #1041
add environment.md and remove AutoRoundMLLM usage in readme by @xin3he in #1042
add memory monitor and import auto-scheme on demand by @wenhuach21 in #1049
Update get_block_names func by @mengniwang95 in #1047
set add_bos_token=True for llama model by @n1ck-guo in #1046
Announce llmc integration by @yiliu30 in #1055
[High Risk]reduce vram usage for optimized RTN mode by @wenhuach21 in #1043
Add static FP8 attention support by @yiliu30 in #1045
Enhance tokenizer saving by @mengniwang95 in #1057
Revert "Add static FP8 attention support" by @yiliu30 in #1060
refine readme by @wenhuach21 in #1063
Fix transformers==4.57.1 in CI by @XuehaoSun in #1066
Add static FP8 attention support by @yiliu30 in #1061
Remove tbb by @yiliu30 in #1069
support for gguf mixed q2_k_s by @n1ck-guo in #1059
Add compatibility test for ARM by @XuehaoSun in #1073
Optimize CPU CI pipelines by @XuehaoSun in #1071
Export KV Scheme in LLMC config by @yiliu30 in #1068
Add LLMC integration test by @yiliu30 in #1053
Support transformers loading quantized moe model by @mengniwang95 in #1067
update alg_ext and add ut by @n1ck-guo in #1064
simplify what's new and add publication_list by @xin3he in #1070
Add MXFP8 MOE/Linear and MXFP4 Linear by @yiliu30 in #1034
improve accuracy for 2bit with auto-round-best by @wenhuach21 in #1078
Support mxfp nvfp lmhead quant by @WeiweiZhang1 in #1051
refine sampler by @wenhuach21 in #1077
fix bf16 option in AutoScheme by @wenhuach21 in #1079
adjust 3 bits hyperparameters at auto-round-best by @wenhuach21 in #1081
Support alg_ext on windows by @chensuyue in #1082
[vLLM Ext]Fix MXFP4 Quant by @yiliu30 in #1088
remove numpy restriction for gptq kernel by @wenhuach21 in #1084
Fix MXFP/NVFP + FP8 Attn/KV by @yiliu30 in #1086
Remove accelerate version limitation by @chensuyue in #1090
bump version to v0.9.3 by @chensuyue in #1091
add system checker in backend by @wenhuach21 in #1097
Refactor input normalization by replaying inputs for consistent preprocessing by @yiliu30 in #1094
fix gguf acc and oom bug when iters > 0 by @n1ck-guo in #1098
Add Python 3.14 to compatibility test pipeline by @XuehaoSun in #1096
Fix typo in README.md by @xin3he in #1102
refine lmhead ut by @WeiweiZhang1 in #1106
Move packed res to cpu by @yiliu30 in #1104
remove non-essential requirements by @n1ck-guo in #1103
fix auto-scheme/alg-ext multiple devices issue by @wenhuach21 in #1107
fix release version cuda ut fail by @n1ck-guo in #1110
fix gguf packing device by @n1ck-guo in #1105
fix quant fp8 model with iters=0 and scheme=nvfp4 by @n1ck-guo in #1114
Move quantized block to cpu by @yiliu30 in #1115
fix gguf extension bug by @wenhuach21 in #1116
fix bug of triton pow wrong data_type when enable torch compile by @n1ck-guo in #1120
Use modelscope cache in CPU UT by @XuehaoSun in #1124
fix bug of outside block layers do not use best_params by @n1ck-guo in #1128
Fix nvfp4&fp8 packing ram issue, refine all exporting format ram release for origin layer by @yiliu30 in #1129
fix regression by @xin3he in #1135
Upgrade llmc to main and add cuda UT by @yiliu30 in #1111
Enable load MXFP4/MXFP8 + FP8 KV by @yiliu30 in #1095
Remove duplicate packages by @XuehaoSun in #1139
fix bug of auto scheme with user layer config by @n1ck-guo in #1133
Add AutoRound binary build and publish workflow by @chensuyue in #1132
update readme by @wenhuach21 in #1141
update document for eval by @xin3he in #1140
update windows binary for alg_ext by @chensuyue in #1142
Fix device mismatch of nvfp fuse scale by @WeiweiZhang1 in #1143
add low_cpu_mem_usage in cli by @n1ck-guo in #1146
fix cuda ut fail by @n1ck-guo in #1144
add llmc for cuda ut by @yiliu30 in #1145
Added ark backend by @Zhenzhong1 in #1075

New Contributors

@Zhenzhong1 made their first contribution in #1075

Full Changelog: v0.9.2...v0.9.3
