Skip to content

v1.3.0

Choose a tag to compare

@slin1237 slin1237 released this 15 Mar 00:26
· 621 commits to main since this release
92effcc

πŸš€ Shepherd Model Gateway v1.3.0 Released

We're excited to announce Shepherd Model Gateway v1.3.0 – a major release bringing native Messages API support and expanding our agentic workload capabilities.

🎯 Messages API: First-Class Implementation

Native Messages API implementation with core protocol support: credits to @CatherineSue

  • True first-class support β€” Direct protocol implementation, not a translation layer
  • Extended thinking β€” Native ThinkingConfig with per-model reasoning activation and streaming thinking_delta events
  • Full streaming + non-streaming β€” Complete Anthropic SSE event protocol
  • Tool use β€” Custom tool definitions, tool_choice, structured tool output
  • Works across all backends β€” SGLang, vLLM, TensorRT-LLM via gRPC
  • Drop-in Anthropic SDK compatibility β€” Same API shape, your infrastructure

Why first-class matters: Wiring Messages API through chat completion (the common approach) silently drops thinking blocks β€” both in conversation history and model output β€” because the chat completion protocol has no concept of reasoning content. SMG's native implementation preserves thinking blocks end-to-end: ThinkingConfig activates model-specific reasoning, the streaming state machine emits proper thinking_delta events, and interleaved reasoning + text + tool use content blocks are assembled in correct order. No translation layer, no silent data loss.

πŸ”— Expanding Agentic Workload Support

SMG now supports three major agentic APIs:

  • Chat Completions API (OpenAI) - Standard conversational interface
  • Responses API (OpenAI) - Still the only gateway supporting this for open-source models and third party vendor
  • Messages API (Anthropic) - NEW - First-class native implementation with reasoning support

Plus routing to all major 3rd party providers: OpenAI, Anthropic, Gemini and more.

Impact: SMG sits behind any agent framework (Claude Code, Codex, OpenClaw, OpenCode) and routes to any model. Run agentic workflows designed for Claude on Llama 4, Qwen 3, DeepSeek, Kimi-K2.5β€”your infrastructure, full protocol fidelity including reasoning.

🌐 Unified /v1/models Across All Providers

Consistent model discovery experience across Anthropic, OpenAI, and Gemini:

  • Unified /v1/models response format across all routers
  • Consistent schema regardless of backend provider
  • Single API surface for model enumeration

⚑ High Availability Mesh Improvements

Sync cache-aware policy state across mesh HA nodes:

  • Cache policy state replicated across all mesh nodes
  • Automatic failover with consistent routing decisions
  • Zero-downtime deployments with state continuity

πŸ› οΈ smg-grpc-servicer Enhancements

  • Native SGLang backend support
  • Multi-backend extras for flexible deployment
  • Improved vLLM GetModelInfo response with served_model_name

πŸ› Bug Fixes

  • Mesh: Plumbed --router-selector through CLI and Python bindings
  • Realtime API: Fixed worker health tracking in WebSocket session
  • gRPC servicer: Return served_model_name in vLLM GetModelInfo response
  • Dependencies: Pinned gRPC packages to 1.78.0 in SGLang install script

πŸ—οΈ Infrastructure

  • TensorRT-LLM default base image bumped to 1.3.0rc7
  • Added DeepWiki badge to README
  • Docker CI improvements

Full Changelog: v1.2.0...v1.3.0

Upgrade now: pip install smg --upgrade

πŸ‘ Shepherd your LLM infrastructure with confidence.

Built for speed. Engineered for scale. Production-proven.

What's Changed

  • fix(ci): add packages:write permission to engine docker release workflows by @slin1237 in #697
  • refactor(gateway): unify /v1/models response across all routers by @slin1237 in #692
  • refactor(gateway): route realtime API through RouterTrait by @CatherineSue in #690
  • chore(deps): bump docker/setup-qemu-action from 3 to 4 by @dependabot[bot] in #704
  • chore(deps): update tokio-tungstenite requirement from 0.26 to 0.28 by @dependabot[bot] in #706
  • chore(deps): bump docker/login-action from 3 to 4 by @dependabot[bot] in #703
  • chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #702
  • chore(deps): bump docker/setup-buildx-action from 3 to 4 by @dependabot[bot] in #701
  • chore(deps): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #700
  • Update max_concurrent_jobs from upstream by @ekzhang in #711
  • chore: add gongwei to code owner of docker and installation by @slin1237 in #715
  • chore: add gongwei to code owner of python binding by @slin1237 in #716
  • chore(ci): bump trtllm default base image to 1.3.0rc7 by @slin1237 in #717
  • docs: add DeepWiki badge to README by @slin1237 in #718
  • fix(deps): pin gRPC packages to 1.78.0 in sglang install script by @YouNeedCryDear in #719
  • feat(gateway): API-key-aware /v1/models with upstream fan-out by @slin1237 in #698
  • chore: fix lint by @slin1237 in #720
  • refactor(gateway): extract shared worker selection module by @slin1237 in #721
  • fix(mesh): plumb --router-selector through CLI and Python bindings by @slin1237 in #724
  • fix(grpc_servicer): return served_model_name in vLLM GetModelInfo response by @CatherineSue in #727
  • refactor(gateway): split OpenAI router.rs into chat and health modules by @slin1237 in #726
  • fix(realtime-api): worker health tracking in websocket session by @pallasathena92 in #725
  • refactor(gateway): extract MCP module from OpenAI responses by @slin1237 in #730
  • feat(realtime-api): WebRTC Router trait interface + HTTP route regist… by @pallasathena92 in #731
  • feat(realtime-api): WebRTC config plumbing through AppContext and CLI by @pallasathena92 in #729
  • refactor(gateway): extract history loading and storage queries from router.rs by @slin1237 in #732
  • feat: sync cache-aware policy state across mesh HA nodes by @llfl in #655
  • fix: update CODEOWNERS paths after crate relocation by @slin1237 in #734
  • refactor(gateway): extract route_responses orchestration into responses/route.rs by @slin1237 in #735
  • refactor(gateway): simplify openai router internals by @slin1237 in #737
  • feat(gateway): add Messages API type scaffolding to gRPC router by @slin1237 in #739
  • feat(gateway): add message_utils and MessagePreparationStage for Messages API by @slin1237 in #741
  • feat(gateway): add MessageRequestBuildingStage for Messages API by @slin1237 in #744
  • feat(grpc_servicer): add sglang support with multi-backend extras by @slin1237 in #745
  • fix(ci): prevent upload-servicer from being skipped by @slin1237 in #746
  • docs(template): add slack link by @lightseek-bot in #749
  • feat(gateway): add MessageResponseProcessingStage for Messages API (non-streaming) by @slin1237 in #747
  • feat(gateway): wire Messages API pipeline into gRPC routers by @slin1237 in #753
  • feat(gateway): add Messages API streaming support to gRPC router by @slin1237 in #758
  • chore: bump versions for v1.3.0 release by @slin1237 in #760

Full Changelog: v1.2.0...v1.3.0