MAF-19231: feat(preset): add new InferenceServiceTemplates #45
Merged
Conversation
…eta-llama-3.2-1b-instruct across multiple AMD configurations
- Introduced templates for vllm-meta-llama-3.2-1b-instruct with support for AMD MI250 and MI300x GPUs.
- Configured environment variables and resource requests/limits for optimal performance.
- Added support for different roles (consumer, producer) in the extra arguments for each template.
…rt-' prefix for consistency across vllm-meta-llama-3.2-1b-instruct templates for AMD MI250 and MI300x configurations.
Contributor
Pull request overview
This PR adds new InferenceServiceTemplate configurations for the Llama-3.2-1B-Instruct model to support disaggregated prefill/decode architectures and removes unnecessary vLLM configuration options to rely on defaults. According to the description, removing the --max-model-len option allows the model to use its default value of 131072, and removing --max-num-batched-tokens uses the default of max(max_model_len, 2048).
Changes:
- Added 5 new InferenceServiceTemplate files for different configurations (prefill/decode/combined variants for mi300x and mi250 GPUs)
- Removed explicit vLLM configuration options (--quantization, --max-model-len, --max-num-batched-tokens, --no-enable-prefix-caching) from the existing mi250 template
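The prefill/decode split described above hinges on the kv role each template passes to vLLM. A minimal sketch of what one of the new prefill templates might look like, assuming a generic InferenceServiceTemplate layout (the apiVersion, resource keys, and extraArgs field name are assumptions, not copied from the PR; the --kv-transfer-config flag with kv_role values kv_producer/kv_consumer/kv_both is the standard vLLM disaggregated-prefill mechanism):

```yaml
# Hypothetical shape of vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2.helm.yaml
# kv_producer marks this instance as the prefill side of the disaggregated
# pair; the decode template would use kv_consumer, the combined one kv_both.
apiVersion: serving.example/v1          # assumed API group
kind: InferenceServiceTemplate
metadata:
  name: vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2
spec:
  model: meta-llama/Llama-3.2-1B-Instruct
  resources:
    limits:
      amd.com/gpu: "2"                  # tp2 → two MI300x GPUs
  extraArgs:
    - --tensor-parallel-size=2
    - --kv-transfer-config={"kv_connector":"...","kv_role":"kv_producer"}
```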
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2.helm.yaml | New prefill template for mi300x with kv_producer role |
| vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi250-tp2.helm.yaml | New prefill template for mi250 with kv_producer role |
| vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi300x-tp2.helm.yaml | New decode template for mi300x with kv_consumer role |
| vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi250-tp2.helm.yaml | New decode template for mi250 with kv_consumer role |
| vllm-meta-llama-llama-3.2-1b-instruct-amd-mi300x-tp2.helm.yaml | New combined template for mi300x with kv_both role |
| vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2.yaml | Removed explicit vLLM configuration options to use defaults |
…meta-llama-llama-3.2-1b-instruct presets across AMD MI250 and MI300x configurations.
hhk7734
requested changes
Feb 2, 2026
Member
hhk7734
left a comment
--max-model-len 16384
--max-num-batched-tokens 8192
Let's go with these settings.
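Applied to the templates, the reviewer's suggestion pins both options explicitly instead of relying on vLLM defaults (the extraArgs field name is an assumption about the template layout):

```yaml
extraArgs:
  - --max-model-len=16384           # cap context well below the model's 131072 default
  - --max-num-batched-tokens=8192   # default would otherwise be max(max_model_len, 2048)
```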
…s to include new arguments for maximum model length and batched tokens across AMD MI250 and MI300x configurations.
…ama-3.2-1b-instruct Helm templates for AMD MI250 and MI300x configurations.
Author
hhk7734
previously approved these changes
Feb 3, 2026
…lama-llama-3.2-1b-instruct
hhk7734
approved these changes
Feb 3, 2026
Removing the --max-model-len option appears to leave the default at 131072 for the meta-llama/Llama-3.2-1B-Instruct model, and removing the --max-num-batched-tokens option appears to leave the default at max(max_model_len, 2048).