From cd83ea1014f91ce76717a2bb466a9065ca5f2f76 Mon Sep 17 00:00:00 2001 From: "He, Xin3" Date: Tue, 25 Nov 2025 04:30:34 -0500 Subject: [PATCH 1/3] simplify what's new and add publication_list Signed-off-by: He, Xin3 --- README.md | 36 ++++++++++++++---------------------- docs/publication_list.md | 14 ++++++++++++++ 2 files changed, 28 insertions(+), 22 deletions(-) create mode 100644 docs/publication_list.md diff --git a/README.md b/README.md index b12b62731..fd9b0dc6e 100644 --- a/README.md +++ b/README.md @@ -30,30 +30,25 @@ See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage in ## πŸ†• What's New -[2025/11] AutoRound has now landed in **LLM-Compressor**! You can apply AutoRound algorithm using `AutoRoundModifier`. Check out the [example](https://github.com/vllm-project/llm-compressor/tree/main/examples/autoround/README.md) to get started! -[2025/11] AutoRound now offers preliminary support for an enhanced GGUF quantization algorithm via `--enable_alg_ext`. For detailed accuracy benchmarks, please refer to the [documentation](./docs/gguf_alg_ext_acc.md). +* [2025/11] AutoRound has landed in [**LLM-Compressor**](https://github.com/vllm-project/llm-compressor/tree/main/examples/autoround/README.md)! -[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using the SGLang versions newer than v0.5.4. +* [2025/11] An enhanced GGUF quantization algorithm improves [accuracy](./docs/gguf_alg_ext_acc.md). -[2025/10] We enhanced the RTN mode (--iters 0) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for some accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization. 
+* [2025/10] AutoRound has been integrated into **SGLang**: [Blog](https://lmsys.org/blog/2025-11-13-AutoRound/), [X post](https://x.com/lmsysorg/status/1991977019220148650?s=20), [Linkedin](https://www.linkedin.com/feed/update/urn:li:activity:7397742859354857472). -[2025/10] We proposed a fast algorithm to generate **mixed bits/datatypes** schemes in minutes. Please -refer to the documentation for accuracy [results](./docs/auto_scheme_acc.md) and [this guide](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) for usage instructions. +* [2025/10] A fast algorithm to generate [**mixed bits/datatypes**](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) schemes in minutes with good [results](./docs/auto_scheme_acc.md) -[2025/09] AutoRound now includes experimental support for the **mxfp4 and nvfp4 dtypes**. For accuracy results, see the [documentation](./docs/mxnv_acc.md) -. We currently recommend exporting to the LLM-Compressor format. +* [2025/09] **MXFP4 and NVFP4 dtypes** is supported with [accuracy data](./docs/mxnv_acc.md). -[2025/08] AutoRound now provides experimental support for **an improved INT2 algorithm** via `--enable_alg_ext`. See this [documentation](./docs/alg_202508.md) - for some accuracy results. +* [2025/08] **An improved INT2 algorithm** via `--enable_alg_ext` with good [accuracy](./docs/alg_202508.md) -[2025/07] AutoRound now offers experimental support for **GGUF** format, and recommends using optimized RTN mode (--iters 0) for +* [2025/07] **GGUF** format is supported and optimized RTN mode is suggested (--iters 0) for all bits other than 3 bits. -[2025/05] AutoRound has been integrated into **Transformers** and **vLLM**. +* [2025/05] AutoRound has been integrated into **Transformers** and **vLLM**: [Blog](). -[2025/03] The INT2-mixed **DeepSeek-R1** model (~200GB) retains 97.9% accuracy. 
Check - out [OPEA/DeepSeek-R1-int2-mixed-sym-inc](https://huggingface.co/OPEA/DeepSeek-R1-int2-mixed-sym-inc). +* [2025/03] The INT2-mixed [**DeepSeek-R1**](https://huggingface.co/OPEA/DeepSeek-R1-int2-mixed-sym-inc) model (~200GB) retains 97.9% accuracy. ## ✨ Key Features @@ -319,7 +314,6 @@ for prompt, output in zip(prompts, outputs): ### Transformers (CPU/Intel GPU/Gaudi/CUDA) - AutoRound supports 10+ backends and automatically selects the best available backend based on the installed libraries and prompts the user to install additional libraries when a better backend is found. @@ -327,6 +321,7 @@ install additional libraries when a better backend is found. this may cause unexpected exceptions. The support for Gaudi device is limited. + ```python from transformers import AutoModelForCausalLM, AutoTokenizer @@ -337,15 +332,12 @@ text = "There is a girl who likes adventure," inputs = tokenizer(text, return_tensors="pt").to(model.device) print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0])) ``` + ## Acknowledgement Special thanks to open-source low precision libraries such as AutoGPTQ, AutoAWQ, GPTQModel, Triton, Marlin, and ExLLaMAV2 for providing low-precision CUDA kernels, which are leveraged in AutoRound. +> **Note**: +> For all publications/blogs, please view [Publication List](./docs/publication_list.md). + ## 🌟 Support Us If you find AutoRound helpful, please ⭐ star the repo and share it with your community! 
- - - - - - - diff --git a/docs/publication_list.md b/docs/publication_list.md new file mode 100644 index 000000000..92d1f8860 --- /dev/null +++ b/docs/publication_list.md @@ -0,0 +1,14 @@ +Full Publications/Events +========== + +## 2025 (3) + +* Blog in LMSYS: [AutoRound Meets SGLang: Enabling Quantized Model Inference with AutoRound](https://lmsys.org/blog/2025-11-13-AutoRound/) (Nov 2025) + +* Blog in Medium: [Accelerating vLLM and SGLang Deployment using AutoRound](https://medium.com/@NeuralCompressor/accelerating-vllm-and-sglang-deployment-using-autoround-45fdc0b2683e) (Oct 2025) + +* Blog in HuggingFace: [What is AutoRound?](https://huggingface.co/blog/autoround) (April 2025) + +## 2024 (1) + +arXiv: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLM](https://arxiv.org/pdf/2309.05516) (Oct 2024) From a2f0899be958953452f832476831fc943611507f Mon Sep 17 00:00:00 2001 From: "He, Xin3" Date: Tue, 25 Nov 2025 21:27:07 -0500 Subject: [PATCH 2/3] update per review comments and optimize expression Signed-off-by: He, Xin3 --- README.md | 23 ++++++++++++----------- docs/publication_list.md | 2 +- docs/step_by_step.md | 4 +++- 3 files changed, 16 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index fd9b0dc6e..a6fc79c49 100644 --- a/README.md +++ b/README.md @@ -31,24 +31,25 @@ See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage in ## πŸ†• What's New -* [2025/11] AutoRound has landed in [**LLM-Compressor**](https://github.com/vllm-project/llm-compressor/tree/main/examples/autoround/README.md)! +* [2025/11] AutoRound has landed in **LLM-Compressor**: [*Usage*](https://github.com/vllm-project/llm-compressor/tree/main/examples/autoround/README.md). -* [2025/11] An enhanced GGUF quantization algorithm improves [accuracy](./docs/gguf_alg_ext_acc.md). +* [2025/11] An **enhanced GGUF** quantization algorithm is available via `--enable_alg_ext`: [*Accuracy*](./docs/gguf_alg_ext_acc.md). 
-* [2025/10] AutoRound has been integrated into **SGLang**: [Blog](https://lmsys.org/blog/2025-11-13-AutoRound/), [X post](https://x.com/lmsysorg/status/1991977019220148650?s=20), [Linkedin](https://www.linkedin.com/feed/update/urn:li:activity:7397742859354857472). 
+* [2025/10] AutoRound has been integrated into **SGLang**: [*Usage*](), [*LMSYS Blog*](https://lmsys.org/blog/2025-11-13-AutoRound/), [*X post*](https://x.com/lmsysorg/status/1991977019220148650?s=20), [*Linkedin*](https://www.linkedin.com/feed/update/urn:li:activity:7397742859354857472).

-* [2025/10] A fast algorithm to generate [**mixed bits/datatypes**](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) schemes in minutes with good [results](./docs/auto_scheme_acc.md)
+* [2025/10] A **mixed-precision** algorithm is available to generate schemes in minutes: [*Usage*](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme), [*Accuracy*](./docs/auto_scheme_acc.md).

-* [2025/09] **MXFP4 and NVFP4 dtypes** is supported with [accuracy data](./docs/mxnv_acc.md).
+* [2025/09] **MXFP4** and **NVFP4** dtypes are available: [*Accuracy*](./docs/mxnv_acc.md).

-* [2025/08] **An improved INT2 algorithm** via `--enable_alg_ext` with good [accuracy](./docs/alg_202508.md)
+* [2025/08] An **improved INT2** algorithm is available via `--enable_alg_ext`: [*Accuracy*](./docs/alg_202508.md).

-* [2025/07] **GGUF** format is supported and optimized RTN mode is suggested (--iters 0) for
-  all bits other than 3 bits.
+* [2025/07] **GGUF** format is supported: [*Usage*](./docs/step_by_step.md#gguf-format).

-* [2025/05] AutoRound has been integrated into **Transformers** and **vLLM**: [Blog]().
+* [2025/05] AutoRound has been integrated into **vLLM**: [*Usage*](https://docs.vllm.ai/en/latest/features/quantization/auto_round/), [*Blog*](https://medium.com/@NeuralCompressor/accelerating-vllm-and-sglang-deployment-using-autoround-45fdc0b2683e). 
-* [2025/03] The INT2-mixed [**DeepSeek-R1**](https://huggingface.co/OPEA/DeepSeek-R1-int2-mixed-sym-inc) model (~200GB) retains 97.9% accuracy.
+* [2025/05] AutoRound has been integrated into **Transformers**: [*Blog*](https://huggingface.co/blog/autoround).
+
+* [2025/03] The INT2-mixed **DeepSeek-R1** model (~200GB) retains 97.9% accuracy: [*Model*](https://huggingface.co/OPEA/DeepSeek-R1-int2-mixed-sym-inc).


## ✨ Key Features

@@ -337,7 +338,7 @@ print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))

Special thanks to open-source low precision libraries such as AutoGPTQ, AutoAWQ, GPTQModel, Triton, Marlin, and ExLLaMAV2 for providing low-precision CUDA kernels, which are leveraged in AutoRound.

> **Note**:
-> For all publications/blogs, please view [Publication List](./docs/publication_list.md).
+> For all publications/events, please view [Publication List](./docs/publication_list.md).

## 🌟 Support Us

If you find AutoRound helpful, please ⭐ star the repo and share it with your community!
diff --git a/docs/publication_list.md b/docs/publication_list.md
index 92d1f8860..ae07de135 100644
--- a/docs/publication_list.md
+++ b/docs/publication_list.md
@@ -11,4 +11,4 @@ Full Publications/Events

## 2024 (1)

-arXiv: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLM](https://arxiv.org/pdf/2309.05516) (Oct 2024)
+* EMNLP: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLM](https://aclanthology.org/2024.findings-emnlp.662/) (Oct 2024)
diff --git a/docs/step_by_step.md b/docs/step_by_step.md
index c33ca3701..2a842dcc2 100644
--- a/docs/step_by_step.md
+++ b/docs/step_by_step.md
@@ -408,7 +408,9 @@ ar.quantize_and_save(output_dir, format="auto_round")

### GGUF format

Experimental feature. This format is well-suited for CPU devices and is widely adopted by the community.
-This format is well-suited for CPU devices and is widely adopted by the community. 
+
+The optimized RTN mode (`--iters 0`) is recommended for all bit widths other than 3-bit.
+

```python
from auto_round import AutoRound

From 26a5d686d32b46fe653a0055fbe8f2f81ad6b2e9 Mon Sep 17 00:00:00 2001
From: "He, Xin3"
Date: Wed, 26 Nov 2025 04:04:34 -0500
Subject: [PATCH 3/3] update publications

Signed-off-by: He, Xin3
---
 docs/publication_list.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/publication_list.md b/docs/publication_list.md
index ae07de135..3b6f88848 100644
--- a/docs/publication_list.md
+++ b/docs/publication_list.md
@@ -12,3 +12,9 @@ Full Publications/Events

## 2024 (1)

* EMNLP: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLM](https://aclanthology.org/2024.findings-emnlp.662/) (Oct 2024)
+
+## 2023 (2)
+
+* arXiv: [TEQ: Trainable Equivalent Transformation for Quantization of LLMs](https://arxiv.org/abs/2310.10944) (Oct 2023)
+
+* Blog in Medium: [Effective Post-Training Quantization for Large Language Models](https://medium.com/intel-analytics-software/effective-post-training-quantization-for-large-language-models-with-enhanced-smoothquant-approach-93e9d104fb98) (Apr 2023)
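Reviewer note: the series repeatedly points readers to the optimized RTN mode (`--iters 0`, see docs/opt_rtn.md). For background, with zero tuning iterations quantization reduces to a one-shot round-to-nearest. The sketch below illustrates plain symmetric per-tensor RTN in pure Python; it is not AutoRound's implementation, and the function name and parameters are invented for this example.

```python
def rtn_quantize(weights, num_bits=4):
    """One-shot symmetric round-to-nearest (RTN) quantization.

    Illustrative baseline only: AutoRound's optimized RTN mode
    (--iters 0) layers further optimizations on top of this idea.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 7 for signed int4
    # Per-tensor scale from the largest magnitude; guard the all-zero case.
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round each weight to the nearest representable integer, then clamp.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    dequant = [v * scale for v in q]  # values the model computes with
    return q, scale, dequant


q, scale, deq = rtn_quantize([0.1, -0.5, 0.25, 0.9])
print(q)  # -> [1, -4, 2, 7]
```

The key point for the docs change above: because RTN needs no calibration or gradient steps, it is far cheaper than the default tuning mode, which is why the patch recommends it for most GGUF bit widths.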