MAF-19399: docs(website): add multi-model serving operations guide #88

Merged: hhk7734 merged 1 commit into main from MAF-19399 on Mar 12, 2026

hhk7734 (Member) commented Mar 12, 2026

Summary

  • Add website/docs/operations/multi-model.mdx documenting how to serve multiple models through a single gateway endpoint using BBR and Heimdall schedulers
  • Guide covers: Gateway, Body-Based Router (BBR), per-model Heimdall instances with gateway.bbr.models, and InferenceService resources using vllm-hf-hub-offline templates
  • Reorder sidebar positions in the operations section

Test plan

  • Verified multi-model routing on p-cluster (hyeonki namespace) with Llama 3.2 1B and Qwen 3 1.7B on MI250
  • Confirmed BBR body-based routing works (model extracted from request body)
  • Confirmed direct X-Gateway-Model-Name header routing works without BBR
  • Review that the doc formatting renders correctly in Docusaurus
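
The two routing paths in the test plan can be illustrated with a minimal sketch. This is not BBR's actual implementation: it assumes an OpenAI-style JSON request body with a top-level `model` field, and the function name `add_model_header` and the model names `model-a` and `model-b` are hypothetical. Only the `X-Gateway-Model-Name` header name comes from the test plan.

```python
import json

# Header name taken from the test plan; everything else here is illustrative.
MODEL_HEADER = "X-Gateway-Model-Name"


def add_model_header(body: bytes, headers: dict) -> dict:
    """Return a copy of `headers` with the model name copied from the JSON body.

    Assumption: the body is a JSON object with a top-level "model" string.
    If the header is already set (direct header routing), it is left as-is.
    Note: this sketch compares header names case-sensitively for simplicity.
    """
    routed = dict(headers)
    if MODEL_HEADER in routed:
        return routed
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes: leave headers unchanged.
        return routed
    model = payload.get("model") if isinstance(payload, dict) else None
    if isinstance(model, str) and model:
        routed[MODEL_HEADER] = model
    return routed


# Body-based path: the model name is lifted from the request body.
print(add_model_header(b'{"model": "model-a", "messages": []}', {}))
# Header path: an explicit header is kept without inspecting the body.
print(add_model_header(b'{"model": "model-a"}', {MODEL_HEADER: "model-b"}))
```

In this sketch a gateway would then match on the header alone, which is why a client that sets the header itself does not need the body-inspection step.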

🤖 Generated with Claude Code

Add documentation for serving multiple models through a single gateway
endpoint using BBR and Heimdall schedulers. The guide covers deploying
Gateway, Body-Based Router, per-model Heimdall instances, and
InferenceService resources with vllm-hf-hub-offline templates.

Also reorder sidebar positions in the operations section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a new operations guide for multi-model serving through a single gateway endpoint, using Body-Based Router (BBR) and Heimdall schedulers in the MoAI Inference Framework. It also reorders sidebar positions in the operations section.

Changes:

  • Add multi-model.mdx documenting full setup of multi-model routing with BBR, Heimdall, and InferenceService resources across GPU types
  • Reorder sidebar positions: latest-release moved to -999, container-image-caching-with-harbor moved to 30, new guide placed at 20

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| website/docs/operations/multi-model.mdx | New guide covering architecture, gateway, BBR, Heimdall, InferenceService deployment, request examples, and cleanup for multi-model serving |
| website/docs/operations/latest-release.mdx | Sidebar position changed from 1 to -999 to pin it at the top |
| website/docs/operations/container-image-caching-with-harbor/index.mdx | Sidebar position changed from 4 to 30 to accommodate new ordering |
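
The effect of the new sidebar positions can be checked with a small sketch. It assumes the operations sidebar is autogenerated and sorted by ascending `sidebar_position`, which is how Docusaurus orders such items. The dictionary keys are shorthand labels for the three files above, and other documents in the section are omitted.

```python
# Sidebar positions after this PR, as listed in the review summary.
positions = {
    "container-image-caching-with-harbor": 30,
    "multi-model": 20,
    "latest-release": -999,
}

# Ascending sort by position models the resulting sidebar order.
ordered = sorted(positions, key=positions.get)
print(ordered)
```

The large negative value keeps the release page first regardless of what positions later guides take, and the gaps between 20 and 30 leave room to insert new pages without renumbering.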

hhk7734 (Member, Author) commented Mar 12, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

hhk7734 merged commit 91b8cf0 into main on Mar 12, 2026 (8 checks passed)
hhk7734 deleted the MAF-19399 branch on March 12, 2026 12:45