Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 23 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). For usage instructions,


## 🆕 What's New
[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using the latest SGLang later than v0.5.4.

[2025/10] We enhanced the RTN mode (--iters 0) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for some accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization.

[2025/10] We proposed a fast algorithm to generate **mixed bits/datatypes** schemes in minutes. Please
Expand Down Expand Up @@ -268,7 +270,6 @@ ar.quantize_and_save(output_dir)
## Model Inference

### vLLM (CPU/Intel GPU/CUDA)
Please note that support for the MoE models and visual language models is currently limited.
```python
from vllm import LLM, SamplingParams

Expand All @@ -287,6 +288,26 @@ for output in outputs:
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```


### SGLang (Intel GPU/CUDA)
Please note that support for the MoE models and visual language models is currently limited.

```python
import sglang as sgl

llm = sgl.Engine(model_path="Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound")
prompts = [
"Hello, my name is",
]
sampling_params = {"temperature": 0.6, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
print("===============================")
print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```


### Transformers (CPU/Intel GPU/Gaudi/CUDA)


Expand Down Expand Up @@ -318,3 +339,4 @@ If you find AutoRound helpful, please ⭐ star the repo and share it with your c