MAF-19231: feat(preset): add new InferenceServiceTemplates for Qwen#49
Merged
…nd vllm-meta-llama models
- Introduced InferenceServiceTemplates for Qwen/Qwen3-1.7B and vllm-meta-llama-3.2-1B-Instruct across AMD MI250 and MI300x configurations.
- Configured environment variables and resource requests/limits for optimal performance.
- Added support for different roles (consumer, producer) in the extra arguments for each template.
- Ensured consistent naming conventions and labels across all new templates.
Pull request overview
This PR adds new InferenceServiceTemplate preset files for Qwen and meta-llama models, supporting both AMD MI300X and MI250 accelerators with different configurations (base, prefill, and decode variants).
Changes:
- Added 6 new InferenceServiceTemplate files for Qwen/Qwen3-1.7B model (base, prefill, and decode variants for both MI300X and MI250)
- Added 6 new InferenceServiceTemplate files for meta-llama/Llama-3.2-1B-Instruct model (base, prefill, and decode variants for both MI300X and MI250)
Reviewed changes
Copilot reviewed 6 out of 12 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi300x-tp2.helm.yaml | Adds Qwen3-1.7B prefill configuration for AMD MI300X with tensor parallelism 2 |
| quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi250-tp2.helm.yaml | Adds Qwen3-1.7B prefill configuration for AMD MI250 with tensor parallelism 2 |
| quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi300x-tp2.helm.yaml | Adds Qwen3-1.7B decode configuration for AMD MI300X with tensor parallelism 2 |
| quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi250-tp2.helm.yaml | Adds Qwen3-1.7B decode configuration for AMD MI250 with tensor parallelism 2 |
| quickstart-vllm-qwen-qwen3-1.7b-amd-mi300x-tp2.helm.yaml | Adds Qwen3-1.7B base configuration for AMD MI300X with tensor parallelism 2 |
| quickstart-vllm-qwen-qwen3-1.7b-amd-mi250-tp2.helm.yaml | Adds Qwen3-1.7B base configuration for AMD MI250 with tensor parallelism 2 |
| quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2.helm.yaml | Adds Llama-3.2-1B-Instruct prefill configuration for AMD MI300X with tensor parallelism 2 |
| quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi250-tp2.helm.yaml | Adds Llama-3.2-1B-Instruct prefill configuration for AMD MI250 with tensor parallelism 2 |
| quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi300x-tp2.helm.yaml | Adds Llama-3.2-1B-Instruct decode configuration for AMD MI300X with tensor parallelism 2 |
| quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi250-tp2.helm.yaml | Adds Llama-3.2-1B-Instruct decode configuration for AMD MI250 with tensor parallelism 2 |
| quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi300x-tp2.helm.yaml | Adds Llama-3.2-1B-Instruct base configuration for AMD MI300X with tensor parallelism 2 |
| quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2.helm.yaml | Adds Llama-3.2-1B-Instruct base configuration for AMD MI250 with tensor parallelism 2 |
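The file names in the table follow a regular pattern: `quickstart-vllm-<model>-[<variant>-]<accelerator>-tp<N>.helm.yaml`. As a minimal sketch, assuming that pattern holds for every preset, the expected names can be enumerated and compared against a directory listing. The helper names `preset_filename` and `expected_presets` are hypothetical and not part of the repository; the model, variant, and accelerator values are read off the table.

```python
from itertools import product

# Values read off the file table; the helpers below are illustrative only.
MODELS = ["qwen-qwen3-1.7b", "meta-llama-llama-3.2-1b-instruct"]
VARIANTS = ["", "prefill", "decode"]  # "" stands for the base template
ACCELERATORS = ["amd-mi300x", "amd-mi250"]


def preset_filename(model: str, variant: str, accelerator: str, tp: int = 2) -> str:
    """Build a preset file name following the pattern seen in the table."""
    parts = ["quickstart", "vllm", model]
    if variant:  # base templates carry no variant segment
        parts.append(variant)
    parts += [accelerator, f"tp{tp}"]
    return "-".join(parts) + ".helm.yaml"


def expected_presets() -> list[str]:
    """Enumerate every model/variant/accelerator combination."""
    return [
        preset_filename(model, variant, accelerator)
        for model, variant, accelerator in product(MODELS, VARIANTS, ACCELERATORS)
    ]


if __name__ == "__main__":
    for name in expected_presets():
        print(name)
```

Comparing this list with the actual preset directory would flag a name that drops a segment such as the `quickstart` prefix.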
…s for consistency across AMD configurations
hhk7734 approved these changes on Feb 3, 2026
Added the missing `quickstart` prefix to the meta-llama preset file names.