MAF-19231: feat(preset): add new InferenceServiceTemplates for Qwen #49

Merged
hhk7734 merged 2 commits into main from MAF-19231_add_qwen_preset on Feb 3, 2026

MAF-19231: feat(preset): add new InferenceServiceTemplates for Qwen#49
hhk7734 merged 2 commits intomainfrom
MAF-19231_add_qwen_preset

Conversation


@ghost commented on Feb 3, 2026

Added the missing quickstart prefix to the meta-llama preset file names.

…nd vllm-meta-llama models

- Introduced InferenceServiceTemplates for Qwen/Qwen3-1.7B and meta-llama/Llama-3.2-1B-Instruct across AMD MI250 and MI300X configurations.
- Configured environment variables and resource requests/limits for optimal performance.
- Added support for different roles (consumer, producer) in the extra arguments for each template.
- Ensured consistent naming conventions and labels across all new templates.
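The PR view does not show the contents of the new preset files, only their names. Based on the bullet points above (environment variables, resource requests/limits, consumer/producer roles in the extra arguments, consistent labels), a preset might look roughly like the sketch below. Every field name, API group, and value here is an illustrative assumption, not taken from the actual files:

```yaml
# Hypothetical sketch of one preset; all field names and values are
# assumptions inferred from the PR description, not the real file.
apiVersion: serving.example.io/v1alpha1   # assumed API group for the CRD
kind: InferenceServiceTemplate
metadata:
  name: quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi300x-tp2
  labels:                                 # "consistent labels across all new templates"
    model: qwen-qwen3-1.7b
    accelerator: amd-mi300x
spec:
  env:                                    # "configured environment variables"
    - name: VLLM_TENSOR_PARALLEL_SIZE     # illustrative; matches the tp2 suffix
      value: "2"
  resources:                              # "resource requests/limits"
    requests:
      amd.com/gpu: "2"
    limits:
      amd.com/gpu: "2"
  extraArgs:
    - --role=consumer                     # "different roles (consumer, producer)"
```

The prefill/decode split plus a consumer/producer role flag suggests a disaggregated-serving setup, where prefill and decode run in separate pods; the base variant would serve both phases in one pod.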
@ghost ghost requested a review from hhk7734 February 3, 2026 02:43
@ghost ghost self-assigned this Feb 3, 2026
@ghost ghost self-requested a review as a code owner February 3, 2026 02:43
@ghost ghost requested review from bongwoobak, Copilot and jinwoopark-moreh February 3, 2026 02:43
Contributor

Copilot AI left a comment


Pull request overview

This PR adds new InferenceServiceTemplate preset files for Qwen and meta-llama models, supporting both AMD MI300X and MI250 accelerators with different configurations (base, prefill, and decode variants).

Changes:

  • Added 6 new InferenceServiceTemplate files for Qwen/Qwen3-1.7B model (base, prefill, and decode variants for both MI300X and MI250)
  • Added 6 new InferenceServiceTemplate files for meta-llama/Llama-3.2-1B-Instruct model (base, prefill, and decode variants for both MI300X and MI250)

Reviewed changes

Copilot reviewed 6 out of 12 changed files in this pull request and generated 12 comments.

Summary of changes per file:

quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi300x-tp2.helm.yaml: Adds Qwen3-1.7B prefill configuration for AMD MI300X with tensor parallelism 2
quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi250-tp2.helm.yaml: Adds Qwen3-1.7B prefill configuration for AMD MI250 with tensor parallelism 2
quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi300x-tp2.helm.yaml: Adds Qwen3-1.7B decode configuration for AMD MI300X with tensor parallelism 2
quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi250-tp2.helm.yaml: Adds Qwen3-1.7B decode configuration for AMD MI250 with tensor parallelism 2
quickstart-vllm-qwen-qwen3-1.7b-amd-mi300x-tp2.helm.yaml: Adds Qwen3-1.7B base configuration for AMD MI300X with tensor parallelism 2
quickstart-vllm-qwen-qwen3-1.7b-amd-mi250-tp2.helm.yaml: Adds Qwen3-1.7B base configuration for AMD MI250 with tensor parallelism 2
quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2.helm.yaml: Adds Llama-3.2-1B-Instruct prefill configuration for AMD MI300X with tensor parallelism 2
quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi250-tp2.helm.yaml: Adds Llama-3.2-1B-Instruct prefill configuration for AMD MI250 with tensor parallelism 2
quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi300x-tp2.helm.yaml: Adds Llama-3.2-1B-Instruct decode configuration for AMD MI300X with tensor parallelism 2
quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi250-tp2.helm.yaml: Adds Llama-3.2-1B-Instruct decode configuration for AMD MI250 with tensor parallelism 2
quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi300x-tp2.helm.yaml: Adds Llama-3.2-1B-Instruct base configuration for AMD MI300X with tensor parallelism 2
quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2.helm.yaml: Adds Llama-3.2-1B-Instruct base configuration for AMD MI250 with tensor parallelism 2

@hhk7734 hhk7734 merged commit 40cd5df into main Feb 3, 2026
3 checks passed
@hhk7734 hhk7734 deleted the MAF-19231_add_qwen_preset branch February 3, 2026 03:44

3 participants