[VitisAI] AMD NPU LLM Quantization - Add Windows + CUDA support for Quark Quantizer by poganesh · Pull Request #2269 · microsoft/Olive

poganesh · 2025-11-20T03:19:44Z

Describe your changes

This PR extends Quark quantization support to Windows + CUDA workflows and improves the stability of Quark quantization.
Key changes:

Added Windows + CUDA platform support to the Quark quantization pass
Added bfloat16 dtype support to the Quark quantization pass for improved AWQ quantization stability
Tested on Windows 11 Pro with NVIDIA RTX 3090 + CUDA 13.0
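
For readers who want a concrete picture of the two changes, here is a minimal sketch of a device check that accepts CUDA on both Linux and Windows, paired with a dtype choice. It is not the code from this PR: the names `select_device`, `select_dtype_name`, and `SUPPORTED_CUDA_PLATFORMS` are hypothetical, and the fallback to float32 on CPU is an assumption of the sketch rather than something the PR states. One general reason bfloat16 can help stability is that it keeps float32's 8-bit exponent range, so large intermediate values are less likely to overflow than in float16.

```python
import platform
from typing import Optional

# Assumption: operating systems on which a CUDA device is accepted.
# The PR states only that Windows was added alongside Linux.
SUPPORTED_CUDA_PLATFORMS = {"Linux", "Windows"}


def select_device(cuda_available: bool, system: Optional[str] = None) -> str:
    """Return "cuda" when CUDA is available on a supported OS, else "cpu"."""
    system = system or platform.system()
    if cuda_available and system in SUPPORTED_CUDA_PLATFORMS:
        return "cuda"
    return "cpu"


def select_dtype_name(device: str) -> str:
    """Pick a dtype name for quantization (hypothetical policy).

    bfloat16 on CUDA; float32 on CPU is an assumption of this sketch.
    """
    return "bfloat16" if device == "cuda" else "float32"
```

With PyTorch installed, the availability flag would typically come from `torch.cuda.is_available()`, for example `select_device(torch.cuda.is_available())`.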

Add unit tests for this change.
Make sure all tests can pass.
Update documents if necessary.
Lint and apply fixes to your code by running lintrunner -a
Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

poganesh and others added 4 commits November 6, 2025 15:11

update device check - check cuda in linux and win

45bc6d8

Update quantize_quark.py

daaf31f

update data-type to bfloat16

3f1f2b0

Merge branch 'microsoft:main' into win_cuda

c50314d

thiagocrepaldi approved these changes Nov 20, 2025

jambayk approved these changes Nov 20, 2025

jambayk enabled auto-merge (squash) November 20, 2025 18:07

jambayk merged commit 8b44cf4 into microsoft:main Nov 21, 2025
15 checks passed