Release v1.3.0 · lightseekorg/smg

🚀 Shepherd Model Gateway v1.3.0 Released

We're excited to announce Shepherd Model Gateway v1.3.0 – a major release bringing native Messages API support and expanding our agentic workload capabilities.

🎯 Messages API: First-Class Implementation

Native Messages API implementation with core protocol support: credits to @CatherineSue

True first-class support — Direct protocol implementation, not a translation layer
Extended thinking — Native ThinkingConfig with per-model reasoning activation and streaming thinking_delta events
Full streaming + non-streaming — Complete Anthropic SSE event protocol
Tool use — Custom tool definitions, tool_choice, structured tool output
Works across all backends — SGLang, vLLM, TensorRT-LLM via gRPC
Drop-in Anthropic SDK compatibility — Same API shape, your infrastructure

Why first-class matters: Wiring Messages API through chat completion (the common approach) silently drops thinking blocks — both in conversation history and model output — because the chat completion protocol has no concept of reasoning content. SMG's native implementation preserves thinking blocks end-to-end: ThinkingConfig activates model-specific reasoning, the streaming state machine emits proper thinking_delta events, and interleaved reasoning + text + tool use content blocks are assembled in correct order. No translation layer, no silent data loss.

🔗 Expanding Agentic Workload Support

SMG now supports three major agentic APIs:

Chat Completions API (OpenAI) - Standard conversational interface
Responses API (OpenAI) - Still the only gateway supporting this for open-source models and third party vendor
Messages API (Anthropic) - NEW - First-class native implementation with reasoning support

Plus routing to all major 3rd party providers: OpenAI, Anthropic, Gemini and more.

Impact: SMG sits behind any agent framework (Claude Code, Codex, OpenClaw, OpenCode) and routes to any model. Run agentic workflows designed for Claude on Llama 4, Qwen 3, DeepSeek, Kimi-K2.5—your infrastructure, full protocol fidelity including reasoning.

🌐 Unified /v1/models Across All Providers

Consistent model discovery experience across Anthropic, OpenAI, and Gemini:

Unified /v1/models response format across all routers
Consistent schema regardless of backend provider
Single API surface for model enumeration

⚡ High Availability Mesh Improvements

Sync cache-aware policy state across mesh HA nodes:

Cache policy state replicated across all mesh nodes
Automatic failover with consistent routing decisions
Zero-downtime deployments with state continuity

🛠️ smg-grpc-servicer Enhancements

Native SGLang backend support
Multi-backend extras for flexible deployment
Improved vLLM GetModelInfo response with served_model_name

🐛 Bug Fixes

Mesh: Plumbed --router-selector through CLI and Python bindings
Realtime API: Fixed worker health tracking in WebSocket session
gRPC servicer: Return served_model_name in vLLM GetModelInfo response
Dependencies: Pinned gRPC packages to 1.78.0 in SGLang install script

🏗️ Infrastructure

TensorRT-LLM default base image bumped to 1.3.0rc7
Added DeepWiki badge to README
Docker CI improvements

Full Changelog: v1.2.0...v1.3.0

Upgrade now: pip install smg --upgrade

🐑 Shepherd your LLM infrastructure with confidence.

Built for speed. Engineered for scale. Production-proven.

What's Changed

fix(ci): add packages:write permission to engine docker release workflows by @slin1237 in #697
refactor(gateway): unify /v1/models response across all routers by @slin1237 in #692
refactor(gateway): route realtime API through RouterTrait by @CatherineSue in #690
chore(deps): bump docker/setup-qemu-action from 3 to 4 by @dependabot[bot] in #704
chore(deps): update tokio-tungstenite requirement from 0.26 to 0.28 by @dependabot[bot] in #706
chore(deps): bump docker/login-action from 3 to 4 by @dependabot[bot] in #703
chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #702
chore(deps): bump docker/setup-buildx-action from 3 to 4 by @dependabot[bot] in #701
chore(deps): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #700
Update max_concurrent_jobs from upstream by @ekzhang in #711
chore: add gongwei to code owner of docker and installation by @slin1237 in #715
chore: add gongwei to code owner of python binding by @slin1237 in #716
chore(ci): bump trtllm default base image to 1.3.0rc7 by @slin1237 in #717
docs: add DeepWiki badge to README by @slin1237 in #718
fix(deps): pin gRPC packages to 1.78.0 in sglang install script by @YouNeedCryDear in #719
feat(gateway): API-key-aware /v1/models with upstream fan-out by @slin1237 in #698
chore: fix lint by @slin1237 in #720
refactor(gateway): extract shared worker selection module by @slin1237 in #721
fix(mesh): plumb --router-selector through CLI and Python bindings by @slin1237 in #724
fix(grpc_servicer): return served_model_name in vLLM GetModelInfo response by @CatherineSue in #727
refactor(gateway): split OpenAI router.rs into chat and health modules by @slin1237 in #726
fix(realtime-api): worker health tracking in websocket session by @pallasathena92 in #725
refactor(gateway): extract MCP module from OpenAI responses by @slin1237 in #730
feat(realtime-api): WebRTC Router trait interface + HTTP route regist… by @pallasathena92 in #731
feat(realtime-api): WebRTC config plumbing through AppContext and CLI by @pallasathena92 in #729
refactor(gateway): extract history loading and storage queries from router.rs by @slin1237 in #732
feat: sync cache-aware policy state across mesh HA nodes by @llfl in #655
fix: update CODEOWNERS paths after crate relocation by @slin1237 in #734
refactor(gateway): extract route_responses orchestration into responses/route.rs by @slin1237 in #735
refactor(gateway): simplify openai router internals by @slin1237 in #737
feat(gateway): add Messages API type scaffolding to gRPC router by @slin1237 in #739
feat(gateway): add message_utils and MessagePreparationStage for Messages API by @slin1237 in #741
feat(gateway): add MessageRequestBuildingStage for Messages API by @slin1237 in #744
feat(grpc_servicer): add sglang support with multi-backend extras by @slin1237 in #745
fix(ci): prevent upload-servicer from being skipped by @slin1237 in #746
docs(template): add slack link by @lightseek-bot in #749
feat(gateway): add MessageResponseProcessingStage for Messages API (non-streaming) by @slin1237 in #747
feat(gateway): wire Messages API pipeline into gRPC routers by @slin1237 in #753
feat(gateway): add Messages API streaming support to gRPC router by @slin1237 in #758
chore: bump versions for v1.3.0 release by @slin1237 in #760

Full Changelog: v1.2.0...v1.3.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.3.0

Choose a tag to compare

Sorry, something went wrong.