Add Qwen 3.6 MoE model and switch CI to Qwen3.6-35B-A3B-HQQ-INT4 #18955
mergennachin merged 1 commit into main
Conversation
Qwen 3.6 MoE shares architecture and runner with Qwen 3.5 MoE. Add a stub README pointing to the existing qwen3_5_moe example. Update CI scripts and cuda.yml to use the Qwen 3.6 prequantized checkpoint. Improve qwen3_5_moe README: add quick-start section for prequantized weights, list available prequantized checkpoints, and clean up terminology.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18955

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.

❌ 5 New Failures, 1 Cancelled Job, 3 Unrelated Failures as of commit 655fa02 with merge base 75ba558:

NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
Adds Qwen 3.6 MoE documentation and switches CUDA CI/export scripts to use the Qwen3.6-35B-A3B-HQQ-INT4 prequantized checkpoint, leveraging the existing Qwen 3.5 MoE runner/export pipeline.
Changes:
- Add a stub `qwen3_6_moe` README that points to the `qwen3_5_moe` example and links the Qwen 3.6 prequantized INT4 checkpoint.
- Update the CUDA workflow and CI scripts to use `SocialLocalMobile/Qwen3.6-35B-A3B-HQQ-INT4`.
- Improve the `qwen3_5_moe` README with a prequantized quick-start and clearer "prequantized checkpoint" terminology.
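The CI-script change in the list above is a model-ID mapping update. As a rough illustration only (the function and variable names below are hypothetical, not the actual contents of `.ci/scripts/export_model_artifact.sh`), the mapping from a supported model name to its Hugging Face checkpoint might look like:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the HF model-ID mapping the CI scripts update.
# resolve_checkpoint is an illustrative name; the real script may differ.
set -euo pipefail

resolve_checkpoint() {
  case "$1" in
    qwen3_6_moe)
      # New prequantized checkpoint used by CUDA CI after this PR
      echo "SocialLocalMobile/Qwen3.6-35B-A3B-HQQ-INT4"
      ;;
    *)
      echo "unsupported model: $1" >&2
      return 1
      ;;
  esac
}

resolve_checkpoint qwen3_6_moe
```

The point of keeping the mapping in one place is that switching CI from the Qwen 3.5 to the Qwen 3.6 checkpoint is a one-line change per script.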
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/models/qwen3_6_moe/README.md | Adds a minimal pointer README for Qwen 3.6 MoE and links prequantized weights. |
| examples/models/qwen3_5_moe/README.md | Adds quick-start + clarifies prequantized export terminology and checkpoints list. |
| .github/workflows/cuda.yml | Switches CUDA CI model matrix and runner selection to Qwen 3.6 INT4 checkpoint. |
| .ci/scripts/test_model_e2e.sh | Updates supported HF model ID mapping to Qwen 3.6 checkpoint. |
| .ci/scripts/export_model_artifact.sh | Updates supported HF model ID mapping to Qwen 3.6 checkpoint for export. |
Diff context (`examples/models/qwen3_5_moe/README.md`):

- [SocialLocalMobile/Qwen3.6-35B-A3B-HQQ-INT4](https://huggingface.co/SocialLocalMobile/Qwen3.6-35B-A3B-HQQ-INT4)

```bash
python export.py --prequantized <path-to-bundle>
```
In the quick-start snippet, the placeholder still says <path-to-bundle>, but the README now consistently describes --prequantized as a prequantized checkpoint directory. Update the placeholder wording to match (e.g., <path-to-prequantized-checkpoint>), to avoid confusion about what path is expected.
Suggested change:
- python export.py --prequantized <path-to-bundle>
+ python export.py --prequantized <path-to-prequantized-checkpoint>
@@ -194,11 +194,11 @@ jobs:
# Qwen3.5 MoE uses a prequantized checkpoint, only tile-packed
This comment still refers to “Qwen3.5 MoE”, but the matrix entry below is now Qwen3.6-35B-A3B-HQQ-INT4. Update the comment to reflect Qwen 3.6 (or both 3.5/3.6) so the exclusion rationale stays accurate.
Suggested change:
- # Qwen3.5 MoE uses a prequantized checkpoint, only tile-packed
+ # Qwen3.6 MoE uses a prequantized checkpoint, only tile-packed
@@ -324,11 +324,11 @@ jobs:
# Qwen3.5 MoE uses a prequantized checkpoint, only tile-packed
This exclusion-block comment still says “Qwen3.5 MoE uses a prequantized checkpoint”, but the excluded model is now Qwen3.6-35B-A3B-HQQ-INT4. Update the comment to match the new model name (or make it version-agnostic) to keep the workflow self-explanatory.
Suggested change:
- # Qwen3.5 MoE uses a prequantized checkpoint, only tile-packed
+ # Qwen3.6-35B-A3B-HQQ-INT4 uses a prequantized checkpoint, only tile-packed