v1.3.0
π Shepherd Model Gateway v1.3.0 Released
We're excited to announce Shepherd Model Gateway v1.3.0 β a major release bringing native Messages API support and expanding our agentic workload capabilities.
π― Messages API: First-Class Implementation
Native Messages API implementation with core protocol support: credits to @CatherineSue
- True first-class support β Direct protocol implementation, not a translation layer
- Extended thinking β Native ThinkingConfig with per-model reasoning activation and streaming thinking_delta events
- Full streaming + non-streaming β Complete Anthropic SSE event protocol
- Tool use β Custom tool definitions, tool_choice, structured tool output
- Works across all backends β SGLang, vLLM, TensorRT-LLM via gRPC
- Drop-in Anthropic SDK compatibility β Same API shape, your infrastructure
Why first-class matters: Wiring Messages API through chat completion (the common approach) silently drops thinking blocks β both in conversation history and model output β because the chat completion protocol has no concept of reasoning content. SMG's native implementation preserves thinking blocks end-to-end: ThinkingConfig activates model-specific reasoning, the streaming state machine emits proper thinking_delta events, and interleaved reasoning + text + tool use content blocks are assembled in correct order. No translation layer, no silent data loss.
π Expanding Agentic Workload Support
SMG now supports three major agentic APIs:
- Chat Completions API (OpenAI) - Standard conversational interface
- Responses API (OpenAI) - Still the only gateway supporting this for open-source models and third party vendor
- Messages API (Anthropic) - NEW - First-class native implementation with reasoning support
Plus routing to all major 3rd party providers: OpenAI, Anthropic, Gemini and more.
Impact: SMG sits behind any agent framework (Claude Code, Codex, OpenClaw, OpenCode) and routes to any model. Run agentic workflows designed for Claude on Llama 4, Qwen 3, DeepSeek, Kimi-K2.5βyour infrastructure, full protocol fidelity including reasoning.
π Unified /v1/models Across All Providers
Consistent model discovery experience across Anthropic, OpenAI, and Gemini:
- Unified /v1/models response format across all routers
- Consistent schema regardless of backend provider
- Single API surface for model enumeration
β‘ High Availability Mesh Improvements
Sync cache-aware policy state across mesh HA nodes:
- Cache policy state replicated across all mesh nodes
- Automatic failover with consistent routing decisions
- Zero-downtime deployments with state continuity
π οΈ smg-grpc-servicer Enhancements
- Native SGLang backend support
- Multi-backend extras for flexible deployment
- Improved vLLM GetModelInfo response with served_model_name
π Bug Fixes
- Mesh: Plumbed --router-selector through CLI and Python bindings
- Realtime API: Fixed worker health tracking in WebSocket session
- gRPC servicer: Return served_model_name in vLLM GetModelInfo response
- Dependencies: Pinned gRPC packages to 1.78.0 in SGLang install script
ποΈ Infrastructure
- TensorRT-LLM default base image bumped to 1.3.0rc7
- Added DeepWiki badge to README
- Docker CI improvements
Full Changelog: v1.2.0...v1.3.0
Upgrade now: pip install smg --upgrade
π Shepherd your LLM infrastructure with confidence.
Built for speed. Engineered for scale. Production-proven.
What's Changed
- fix(ci): add packages:write permission to engine docker release workflows by @slin1237 in #697
- refactor(gateway): unify /v1/models response across all routers by @slin1237 in #692
- refactor(gateway): route realtime API through RouterTrait by @CatherineSue in #690
- chore(deps): bump docker/setup-qemu-action from 3 to 4 by @dependabot[bot] in #704
- chore(deps): update tokio-tungstenite requirement from 0.26 to 0.28 by @dependabot[bot] in #706
- chore(deps): bump docker/login-action from 3 to 4 by @dependabot[bot] in #703
- chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #702
- chore(deps): bump docker/setup-buildx-action from 3 to 4 by @dependabot[bot] in #701
- chore(deps): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #700
- Update max_concurrent_jobs from upstream by @ekzhang in #711
- chore: add gongwei to code owner of docker and installation by @slin1237 in #715
- chore: add gongwei to code owner of python binding by @slin1237 in #716
- chore(ci): bump trtllm default base image to 1.3.0rc7 by @slin1237 in #717
- docs: add DeepWiki badge to README by @slin1237 in #718
- fix(deps): pin gRPC packages to 1.78.0 in sglang install script by @YouNeedCryDear in #719
- feat(gateway): API-key-aware /v1/models with upstream fan-out by @slin1237 in #698
- chore: fix lint by @slin1237 in #720
- refactor(gateway): extract shared worker selection module by @slin1237 in #721
- fix(mesh): plumb --router-selector through CLI and Python bindings by @slin1237 in #724
- fix(grpc_servicer): return served_model_name in vLLM GetModelInfo response by @CatherineSue in #727
- refactor(gateway): split OpenAI router.rs into chat and health modules by @slin1237 in #726
- fix(realtime-api): worker health tracking in websocket session by @pallasathena92 in #725
- refactor(gateway): extract MCP module from OpenAI responses by @slin1237 in #730
- feat(realtime-api): WebRTC Router trait interface + HTTP route regist⦠by @pallasathena92 in #731
- feat(realtime-api): WebRTC config plumbing through AppContext and CLI by @pallasathena92 in #729
- refactor(gateway): extract history loading and storage queries from router.rs by @slin1237 in #732
- feat: sync cache-aware policy state across mesh HA nodes by @llfl in #655
- fix: update CODEOWNERS paths after crate relocation by @slin1237 in #734
- refactor(gateway): extract route_responses orchestration into responses/route.rs by @slin1237 in #735
- refactor(gateway): simplify openai router internals by @slin1237 in #737
- feat(gateway): add Messages API type scaffolding to gRPC router by @slin1237 in #739
- feat(gateway): add message_utils and MessagePreparationStage for Messages API by @slin1237 in #741
- feat(gateway): add MessageRequestBuildingStage for Messages API by @slin1237 in #744
- feat(grpc_servicer): add sglang support with multi-backend extras by @slin1237 in #745
- fix(ci): prevent upload-servicer from being skipped by @slin1237 in #746
- docs(template): add slack link by @lightseek-bot in #749
- feat(gateway): add MessageResponseProcessingStage for Messages API (non-streaming) by @slin1237 in #747
- feat(gateway): wire Messages API pipeline into gRPC routers by @slin1237 in #753
- feat(gateway): add Messages API streaming support to gRPC router by @slin1237 in #758
- chore: bump versions for v1.3.0 release by @slin1237 in #760
Full Changelog: v1.2.0...v1.3.0