From 1d28589f6984ed365433d11f5b6ebe3d6161a13f Mon Sep 17 00:00:00 2001
From: "Zhang, Weiwei1"
Date: Tue, 28 Oct 2025 15:06:47 +0800
Subject: [PATCH 1/3] update readme for sglang support

Signed-off-by: Zhang, Weiwei1
---
 README.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/README.md b/README.md
index 7303e77b4..a0c353057 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,8 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). For usage instructions,
 
 ## 🆕 What's New
 
+[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using the latest SGLang master branch.
+
 [2025/10] We enhanced the RTN mode (--iters 0) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for some accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization.
 
 [2025/10] We proposed a fast algorithm to generate **mixed bits/datatypes** schemes in minutes. Please
@@ -287,6 +289,26 @@ for output in outputs:
     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```
 
+
+### SGLang (Intel GPU/CUDA)
+Please note that support for MoE models and visual language models is currently limited.
+
+```python
+import sglang as sgl
+
+llm = sgl.Engine(model_path="Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound")
+prompts = [
+    "Hello, my name is",
+]
+sampling_params = {"temperature": 0.6, "top_p": 0.95}
+
+outputs = llm.generate(prompts, sampling_params)
+for prompt, output in zip(prompts, outputs):
+    print("===============================")
+    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
+```
+
+
 ### Transformers (CPU/Intel GPU/Gaudi/CUDA)
@@ -318,3 +340,4 @@ If you find AutoRound helpful, please ⭐ star the repo and share it with your c
+

From 7ed2dc148e9b67c4df72a7ca6374a15670d6aad2 Mon Sep 17 00:00:00 2001
From: "Zhang, Weiwei1"
Date: Tue, 28 Oct 2025 15:43:35 +0800
Subject: [PATCH 2/3] refine doc

Signed-off-by: Zhang, Weiwei1
---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a0c353057..cee99cb6d 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). For usage instructions,
 
 ## 🆕 What's New
 
-[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using the latest SGLang master branch.
+[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly with SGLang at commit caa4819bfcdc1b0e081d2b93500ea3d4d2cb8e00 or later.
 
 [2025/10] We enhanced the RTN mode (--iters 0) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for some accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization.
 
 [2025/10] We proposed a fast algorithm to generate **mixed bits/datatypes** schemes in minutes. Please
@@ -270,7 +270,6 @@ ar.quantize_and_save(output_dir)
 
 ## Model Inference
 
 ### vLLM (CPU/Intel GPU/CUDA)
-Please note that support for the MoE models and visual language models is currently limited.
 
 ```python
 from vllm import LLM, SamplingParams

From c2063023d173d3ff7c4f980f9b3c7a22fd45081e Mon Sep 17 00:00:00 2001
From: Wenhua Cheng
Date: Tue, 28 Oct 2025 15:51:47 +0800
Subject: [PATCH 3/3] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index cee99cb6d..a321f3934 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). For usage instructions,
 
 ## 🆕 What's New
 
-[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly with SGLang at commit caa4819bfcdc1b0e081d2b93500ea3d4d2cb8e00 or later.
+[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly with SGLang versions later than v0.5.4.
 
 [2025/10] We enhanced the RTN mode (--iters 0) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for some accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization.
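
A server-side counterpart to the offline `sgl.Engine` example the patches add to the README: a minimal sketch, assuming an SGLang build recent enough to include the AutoRound integration and SGLang's OpenAI-compatible endpoint. The port number and placeholder API key are illustrative assumptions, and the sampling values simply mirror the offline example.

```python
# Minimal sketch (assumes an SGLang build that can load AutoRound checkpoints).
# Start the server first, e.g.:
#   python -m sglang.launch_server \
#       --model-path Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound --port 30000
# SGLang exposes an OpenAI-compatible API, so the standard openai client can query it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # local server; key is unused

response = client.chat.completions.create(
    model="Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound",
    messages=[{"role": "user", "content": "Hello, my name is"}],
    temperature=0.6,  # same sampling settings as the offline example
    top_p=0.95,
)
print(response.choices[0].message.content)
```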