From 91e61a4d22c595416550526b33772a8f12a6d7b0 Mon Sep 17 00:00:00 2001
From: Wenhua Cheng
Date: Mon, 27 Oct 2025 20:57:46 +0800
Subject: [PATCH] update readme

---
 README.md               | 2 ++
 docs/auto_scheme_acc.md | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index bdcd922ac..7303e77b4 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,8 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). For usage instructions,
 
 ## 🆕 What's New
 
+[2025/10] We enhanced the RTN mode (`--iters 0`) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for accuracy results. If you don't have sufficient resources, you can use this mode for 4-bit quantization.
+
 [2025/10] We proposed a fast algorithm to generate **mixed bits/datatypes** schemes in minutes. Please refer to the documentation for accuracy [results](./docs/auto_scheme_acc.md) and [this guide](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) for usage instructions.
 
diff --git a/docs/auto_scheme_acc.md b/docs/auto_scheme_acc.md
index b058838f6..cdf481d69 100644
--- a/docs/auto_scheme_acc.md
+++ b/docs/auto_scheme_acc.md
@@ -6,8 +6,7 @@ to stabilize accuracy during evaluation. All other settings follow the default c
 We ignore the scale and zp bits in the tables below. The accuracy may change a little as we modified a little of
 the implementation. We will rerun all the experiments.
 
-For mxfp experiment, we use fake model while for weight only model we use real model. **No tuning is applied unless explicit stated.
-**
+For the MXFP experiments we use the fake model, while for the weight-only experiments we use the real model. **No tuning is applied unless explicitly stated.**
 
 *Average accuracy across `lambada_openai`, `hellaswag`, `piqa`, `winogrande`, and `mmlu`.*
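
A minimal sketch of how the RTN mode referenced in the README entry above might be invoked from the Python API. It is not part of the patch; the model name, `group_size`, output directory, and export `format` are assumed illustrative values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumed example model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# iters=0 selects the RTN mode (no tuning), trading a little accuracy for a
# much lower quantization cost than the default tuning mode.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize_and_save(output_dir="./qmodel", format="auto_round")
```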