
MAF-19231: feat(preset): add new InferenceServiceTemplates#45

Merged
hhk7734 merged 7 commits into main from MAF-19231_add_preset
Feb 3, 2026

Conversation


@ghost ghost commented Feb 2, 2026

Removing the --max-model-len option appears to make the meta-llama/Llama-3.2-1B-Instruct model default to 131072, and removing the --max-num-batched-tokens option appears to default to max(max_model_len, 2048).
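The fallback behavior described above can be sketched as follows. This is a minimal sketch of the logic as stated in the comment; the helper name is invented, and the 131072 figure for Llama-3.2-1B-Instruct comes from the comment itself, not from vLLM's source.

```python
# Sketch of the default-resolution behavior described in this comment.
# The constant below is an assumption taken from the comment, not from vLLM.
LLAMA_3_2_1B_DEFAULT_MAX_MODEL_LEN = 131072  # model's own context limit

def resolve_vllm_defaults(max_model_len=None, max_num_batched_tokens=None):
    """Resolve effective values when the CLI flags are omitted."""
    if max_model_len is None:
        # Without --max-model-len, fall back to the model config's limit.
        max_model_len = LLAMA_3_2_1B_DEFAULT_MAX_MODEL_LEN
    if max_num_batched_tokens is None:
        # Without --max-num-batched-tokens, the default is max(max_model_len, 2048).
        max_num_batched_tokens = max(max_model_len, 2048)
    return max_model_len, max_num_batched_tokens

print(resolve_vllm_defaults())            # (131072, 131072)
print(resolve_vllm_defaults(1024, None))  # (1024, 2048)
```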

…eta-llama-3.2-1b-instruct across multiple AMD configurations

- Introduced templates for vllm-meta-llama-3.2-1b-instruct with support for AMD MI250 and MI300x GPUs.
- Configured environment variables and resource requests/limits for optimal performance.
- Added support for different roles (consumer, producer) in the extra arguments for each template.
@ghost ghost self-assigned this Feb 2, 2026
@ghost ghost marked this pull request as ready for review February 2, 2026 09:45
@ghost ghost self-requested a review as a code owner February 2, 2026 09:45
@ghost ghost requested review from a user, Copilot, hhk7734 and jinwoopark-moreh February 2, 2026 09:45
…rt-' prefix for consistency across vllm-meta-llama-3.2-1b-instruct templates for AMD MI250 and MI300x configurations.
Contributor

Copilot AI left a comment


Pull request overview

This PR adds new InferenceServiceTemplate configurations for the Llama-3.2-1B-Instruct model to support disaggregated prefill/decode architectures and removes unnecessary vLLM configuration options to rely on defaults. According to the description, removing the --max-model-len option allows the model to use its default value of 131072, and removing --max-num-batched-tokens uses the default of max(max_model_len, 2048).

Changes:

  • Added 5 new InferenceServiceTemplate files for different configurations (prefill/decode/combined variants for mi300x and mi250 GPUs)
  • Removed explicit vLLM configuration options (--quantization, --max-model-len, --max-num-batched-tokens, --no-enable-prefix-caching) from the existing mi250 template

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Summary per file:

  • vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2.helm.yaml — New prefill template for mi300x with kv_producer role
  • vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi250-tp2.helm.yaml — New prefill template for mi250 with kv_producer role
  • vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi300x-tp2.helm.yaml — New decode template for mi300x with kv_consumer role
  • vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi250-tp2.helm.yaml — New decode template for mi250 with kv_consumer role
  • vllm-meta-llama-llama-3.2-1b-instruct-amd-mi300x-tp2.helm.yaml — New combined template for mi300x with kv_both role
  • vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2.yaml — Removed explicit vLLM configuration options to use defaults

…meta-llama-llama-3.2-1b-instruct presets across AMD MI250 and MI300x configurations.
Member

@hhk7734 hhk7734 left a comment


--max-model-len 16384
--max-num-batched-tokens 8192

Let's go with these.
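A hypothetical sketch of how the suggested values might appear in a template's extra arguments. The key name and list shape are assumptions (the actual InferenceServiceTemplate schema is not shown in this PR); only the flag values come from the review comment above.

```yaml
# Hypothetical fragment; only the flag values are from the review comment.
extraArgs:
  - --max-model-len=16384
  - --max-num-batched-tokens=8192
```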

…s to include new arguments for maximum model length and batched tokens across AMD MI250 and MI300x configurations.
…ama-3.2-1b-instruct Helm templates for AMD MI250 and MI300x configurations.
Copilot AI review requested due to automatic review settings February 3, 2026 00:55
Author

ghost commented Feb 3, 2026

--max-model-len 16384
--max-num-batched-tokens 8192

Let's go with these.

f89dba5

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

hhk7734
hhk7734 previously approved these changes Feb 3, 2026
@ghost ghost requested a review from hhk7734 February 3, 2026 01:33
@hhk7734 hhk7734 merged commit 1d97e58 into main Feb 3, 2026
3 checks passed
@hhk7734 hhk7734 deleted the MAF-19231_add_preset branch February 3, 2026 01:33

3 participants