v1.1.0
Shepherd Model Gateway v1.1.0 Released!
We're excited to announce Shepherd Model Gateway v1.1.0, a major feature release bringing universal multimodal support, Messages API MCP integration, and critical production hardening across the entire stack!
Universal Multimodal Support
Industry-leading multimodal processing across all major inference engines:
- SGLang gRPC - Full multimodal pipeline with vision processing
- vLLM gRPC - Fetch + preprocess pipeline with multimodal support
- TensorRT-LLM gRPC - Complete multimodal integration
- Llama 4 Vision - First-class support with model spec and processor registration
Impact: Deploy vision-language models across SGLang, vLLM, and TensorRT-LLM with unified processing. Image handling now includes data URI detection, 4D pixel_values output, and i64 aspect_ratios.
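To make "data URI detection" concrete, here is a minimal Python sketch of telling an inline data: URI apart from a remote URL and decoding its payload. It is illustrative only: the gateway's own tracker is not shown in these notes, and the function names is_data_url and decode_data_url are hypothetical.

```python
import base64
from urllib.parse import unquote_to_bytes


def is_data_url(ref: str) -> bool:
    """True when the reference uses the data: scheme (case-insensitive)."""
    return ref[:5].lower() == "data:"


def decode_data_url(ref: str):
    """Split a data: URI into (media_type, raw_bytes).

    Hypothetical helper for illustration; not the gateway's API.
    """
    if not is_data_url(ref):
        raise ValueError("not a data: URI")
    header, sep, payload = ref[5:].partition(",")
    if not sep:
        raise ValueError("data: URI is missing the ',' separator")
    parts = header.split(";")
    media_type = parts[0] or "text/plain"
    is_base64 = any(p.lower() == "base64" for p in parts[1:])
    # Base64 payloads are decoded; anything else is percent-decoded.
    data = base64.b64decode(payload) if is_base64 else unquote_to_bytes(payload)
    return media_type, data
```

A caller could branch on is_data_url first and only fall back to a network fetch for http(s) references.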
Messages API Gets MCP
Complete MCP tool integration for the Anthropic Messages API:
- Streaming and non-streaming MCP tool use
- Unified tool allowlist enforcement across OpenAI and gRPC routers
- Server binding architecture with session lifecycle management
- Built-in server filtering from tool listings
- Unique server_label requirements for tool collision prevention
E2E tested with comprehensive MCP tool use coverage.
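The allowlist and unique-label rules above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the gateway's implementation: server_label and allowed_tools are field names taken from these notes, while McpToolConfig, validate_unique_labels, and filter_tools are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class McpToolConfig:
    """Hypothetical per-server MCP tool configuration."""
    server_label: str
    allowed_tools: Optional[List[str]] = None  # None means "no restriction"


def validate_unique_labels(configs: List[McpToolConfig]) -> None:
    """Reject requests where two MCP servers share a server_label."""
    seen = set()
    for cfg in configs:
        if cfg.server_label in seen:
            raise ValueError(f"duplicate server_label: {cfg.server_label!r}")
        seen.add(cfg.server_label)


def filter_tools(cfg: McpToolConfig, advertised: List[str]) -> List[str]:
    """Keep only the advertised tools that the allowlist permits."""
    if cfg.allowed_tools is None:
        return list(advertised)
    allowed = set(cfg.allowed_tools)
    return [name for name in advertised if name in allowed]
```

Keeping the filter independent of any one router's request types is what would let the same check run in more than one router.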
Major New Features
vLLM HTTP Backend Support
Auto-detection and support for vLLM HTTP endpoints via DetectBackendStep. Seamlessly switch between gRPC and HTTP workers.
smg serve Engine Args Pass-through
Pass arbitrary engine-specific arguments directly through smg serve to your inference engines. Maximum flexibility for custom configurations.
Tiktoken Hub Model Support
Unified chat template API with tiktoken hub integration. Improved OpenAI o-series model detection and error handling.
NanoV3 Reasoning Parser
Native support for Nemotron Nano V3 reasoning output parsing.
Startup Banner
Beautiful braille art shepherd motif on startup, because production systems deserve aesthetics.
Performance Optimizations
WASM Runtime Enhancements:
- Optimized component cache lookup
- Reduced per-request cloning overhead
- SHA-256 cache keys for efficient middleware
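As a rough illustration of content-addressed caching with SHA-256 keys, the Python sketch below compiles a component once per distinct byte content. What the gateway actually hashes is not stated in these notes, so ComponentCache, cache_key, and the compile_fn callback are all hypothetical.

```python
import hashlib
from typing import Any, Callable, Dict


def cache_key(component_bytes: bytes) -> str:
    """Derive a fixed-length key from the component's bytes."""
    return hashlib.sha256(component_bytes).hexdigest()


class ComponentCache:
    """Hypothetical cache that compiles each distinct component once."""

    def __init__(self) -> None:
        self._compiled: Dict[str, Any] = {}
        self.compile_count = 0

    def get_or_compile(self, component_bytes: bytes,
                       compile_fn: Callable[[bytes], Any]) -> Any:
        key = cache_key(component_bytes)
        if key not in self._compiled:
            # Cache miss: pay the compile cost once and remember the result.
            self._compiled[key] = compile_fn(component_bytes)
            self.compile_count += 1
        return self._compiled[key]
```

The lookup compares short fixed-length keys, so a cache hit needs no copy or byte-by-byte comparison of the component itself.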
Critical Production Hardening
Responses API Fixes:
- Fixed data loss and panic risks
- Sanitized upstream error bodies
- Improved structural integrity
Middleware Reliability:
- Fixed extension loss in request pipeline
- Eliminated auth timing leak
- Corrected streaming body buffering
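For readers unfamiliar with auth timing leaks: comparing a presented token to the expected one with an ordinary equality check can return sooner on an early mismatch, which lets an attacker probe the secret. The notes do not say how the gateway fixed its leak; the Python sketch below shows one common remedy, a constant-time comparison, with token_matches as a hypothetical helper name.

```python
import hashlib
import hmac


def token_matches(presented: str, expected: str) -> bool:
    """Compare two tokens without leaking where they first differ.

    Hashing both sides first yields equal-length inputs, so the
    comparison time does not depend on the token's length either.
    """
    a = hashlib.sha256(presented.encode("utf-8")).digest()
    b = hashlib.sha256(expected.encode("utf-8")).digest()
    return hmac.compare_digest(a, b)
```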
Tokenizer Robustness:
- Cache correctness fixes
- Streaming reliability improvements
- Chat template error handling
Data Connector Hardening:
- Eliminated deadlock, block_on, and triple pool bugs
- Storage backend protection against data corruption
- Race condition fixes
Concurrency Safety:
- LoadMonitor::stop() now releases the tokio mutex before awaiting the task
- Improved SSE event processing and buffer management
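The mutex fix follows a general pattern: take what you need out of the shared state while holding the lock, release the lock, and only then await. The sketch below is a Python asyncio analogue of that pattern, not the gateway's Rust code; the Monitor class and its fields are hypothetical.

```python
import asyncio


class Monitor:
    """Hypothetical monitor whose background task also uses the lock."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._task = None
        self.cleaned_up = False

    async def _run(self) -> None:
        try:
            while True:
                async with self._lock:  # periodic work under the lock
                    pass
                await asyncio.sleep(0.01)
        except asyncio.CancelledError:
            # Cleanup needs the lock too. If stop() still held it while
            # awaiting this task, both sides would wait forever.
            async with self._lock:
                self.cleaned_up = True
            raise

    async def start(self) -> None:
        async with self._lock:
            self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        # Take the task handle under the lock, then release the lock.
        async with self._lock:
            task, self._task = self._task, None
        if task is not None:
            task.cancel()
            try:
                await task  # awaited with the lock already released
            except asyncio.CancelledError:
                pass
```

Moving the await inside the async with block in stop() would reintroduce the hang, because the task's cleanup could never acquire the lock.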
Multimodal Correctness:
- Proper data URI detection
- 4D pixel_values output
- i64 aspect_ratios for large images
Architectural Improvements
Worker Infrastructure:
- Consolidated DPAwareWorker into BasicWorker
- Moved DP fields to WorkerSpec
- Unified worker metadata discovery
- Cleaner registration workflow
Code Quality:
- Enforced strict clippy linting workspace-wide
- Added clippy::absolute_paths and single_component_path_imports lints
- Improved error handling across all modules
Developer Experience
CI/DevOps:
- DCO check with probot app
- Mergify automation for PR management
- Branch naming enforcement
- Docker image release workflow
- Auto-trigger benchmark workflows on code changes
Tooling:
- Workspace version checker script
- PyPI proto version validation
- Remote dev workflow for proto testing
Python Support:
- Lowered minimum Python version from 3.12 to 3.9
Bug Fixes
- Fixed worker health config, bootstrap parsing, and model card cloning issues
- Improved serve CLI arg filtering and config error handling
- Better pre-commit hook configuration
- Corrected labeler workflow for fork PRs
Interactions API
Added comprehensive validations for the Interactions API protocol.
Full Changelog: v1.0.1...v1.1.0
Upgrade now: pip install smg --upgrade
Shepherd your LLM infrastructure with confidence.
Built for speed. Engineered for scale. Production-proven.
What's Changed
- fix: render README images on PyPI/crates.io and bump version to 1.0.1 by @slin1237 in #420
- chore(ci): Change nightly benchmark schedule to midnight PST by @key4ng in #422
- ci: add DCO check, Mergify automation, and branch naming enforcement by @CatherineSue in #424
- ci: temporarily disable auto-close for branch naming violations by @CatherineSue in #426
- ci: add needs-rebase label management to Mergify by @CatherineSue in #427
- fix(ci): use correct Mergify syntax for negated regex condition by @CatherineSue in #429
- ci: improve label management with router-specific and feature labels by @CatherineSue in #428
- ci: add Docker image release workflow by @slin1237 in #431
- feat(message api): MCP tool use with streaming and non-streaming support by @key4ng in #352
- refactor(core): consolidate DPAwareWorker into BasicWorker by @slin1237 in #434
- fix: pre-existing issues in worker health config, bootstrap parsing, and model card cloning by @slin1237 in #415
- refactor(core): move DP fields to WorkerSpec and remove default_model_type by @slin1237 in #436
- chore: fix main log by @slin1237 in #437
- test(e2e): add MCP tool use tests for Anthropic Messages API by @key4ng in #433
- feat(core): add DetectBackendStep for vLLM HTTP support by @slin1237 in #438
- feat(tokenizer): add tiktoken hub model support and unify chat template API by @slin1237 in #439
- perf(wasm): optimize WASM component cache lookup and reduce per-request cloning by @ppraneth in #440
- refactor(core): unify worker metadata discovery and clean up registration by @slin1237 in #447
- feat(version): add startup banner with braille art shepherd motif by @slin1237 in #448
- fix(python): lower minimum Python version from 3.12 to 3.9 by @slin1237 in #449
- ci(mergify): enable auto-close for non-conforming branch names by @CatherineSue in #454
- ci(mergify): allow multi-segment branch names for dependabot by @CatherineSue in #456
- ci(dco): switch DCO check from GitHub Actions to probot DCO app by @CatherineSue in #462
- fix(openai): fix data loss, panic risk, and structural issues in Responses API by @slin1237 in #468
- fix(mcp): filter builtin servers from mcp_list_tools output by @key4ng in #450
- feat(interactions): Add validations for interactions api by @XinyueZhang369 in #399
- feat(mcp): enforce allowed_tools filtering across openai and grpc routers by @zhaowenzi in #467
- fix(openai): sanitize upstream error bodies in Responses API by @slin1237 in #473
- fix(middleware): fix extension loss, auth timing leak, and streaming body buffering by @slin1237 in #472
- fix(concurrency): release tokio mutex before awaiting task in LoadMonitor::stop() by @slin1237 in #475
- fix(tokenizer): correctness and robustness fixes for cache and streaming by @slin1237 in #474
- fix(ci): use pull_request_target for labeler to support fork PRs by @CatherineSue in #477
- fix(data-connector): fix deadlock, block_on, triple pool, and DDL type bugs by @slin1237 in #471
- feat(reasoning-parser): add NanoV3 reasoning parser by @slin1237 in #480
- refactor(anthropic): simplify worker lifecycle in Anthropic router by @key4ng in #476
- feat(realtime api): realtime api session and transcription_session protocols by @pallasathena92 in #364
- fix(protocols): require unique server_label for MCP tools by @zhaowenzi in #479
- feat: smg serve pass through engine args to engine by @gongwei-130 in #460
- fix(serve): harden CLI arg filtering and config error handling by @slin1237 in #483
- feat(scripts): replace release notes generator with workspace version checker by @slin1237 in #484
- fix(ci): match probot DCO app check name in Mergify rule by @CatherineSue in #486
- refactor(mcp): introduce McpServerBinding, unify ensure functions, fix throwaway session by @key4ng in #482
- fix: enforce strict clippy linting across entire workspace by @slin1237 in #489
- docs(grpc-proto): add remote dev workflow for testing proto changes by @CatherineSue in #492
- fix(multimodal): detect data: URIs as DataUrl in tracker by @CatherineSue in #493
- feat(multimodal): add Llama 4 model spec and vision processor registration by @CatherineSue in #494
- feat(grpc): integrate multimodal processing into gRPC sglang chat pipeline by @CatherineSue in #495
- refactor(anthropic): improve SSE event processing and buffer management by @key4ng in #478
- refactor(mcp): decouple tool allowlist from OpenAI protocol types by @key4ng in #488
- fix(pre-commit): exclude Rust files from shebang check by @CatherineSue in #498
- lint: add clippy::absolute_paths and single_component_path_imports lints by @CatherineSue in #499
- fix(ci): auto-trigger benchmark workflows on code changes by @slin1237 in #500
- fix(multimodal): output 4D pixel_values and use i64 aspect_ratios by @CatherineSue in #496
- fix: address PR #489 review follow-ups by @slin1237 in #501
- feat(grpc): add vLLM multimodal support and split pipeline into fetch + preprocess by @CatherineSue in #497
- fix(ci): enable ephemeral runners to prevent dead runner accumulation by @slin1237 in #503
- feat(gprc): add TensorRT-LLM multimodal support by @CatherineSue in #504
- ci(docker): use H200 runner for builds and add nightly GHCR workflow by @slin1237 in #507
- fix(tokenizer): chat template error handling + OpenAI o-series detection by @slin1237 in #506
- fix(data-connector): harden storage backends against data corruption and races by @slin1237 in #505
- feat(scripts): add PyPI proto version check to release version script by @slin1237 in #510
- chore(release): bump workspace versions for v1.1.0 by @slin1237 in #512
New Contributors
- @gongwei-130 made their first contribution in #460