Release v1.1.0 · lightseekorg/smg

🚀 Shepherd Model Gateway v1.1.0 Released!

We're excited to announce Shepherd Model Gateway v1.1.0 – a major feature release bringing universal multimodal support, Messages API MCP integration, and critical production hardening across the entire stack!

🎨 Universal Multimodal Support 🔥

Multimodal processing across all three supported gRPC inference engines, plus Llama 4 vision:

SGLang gRPC - Full multimodal pipeline with vision processing
vLLM gRPC - Fetch + preprocess pipeline with multimodal support
TensorRT-LLM gRPC - Complete multimodal integration
Llama 4 Vision - First-class support with model spec and processor registration

Impact: Deploy vision-language models across SGLang, vLLM, and TensorRT-LLM with unified processing. Image handling now includes data URI detection, 4D pixel_values output, and i64 aspect_ratios.
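To illustrate what data URI detection means for image inputs, here is a minimal Python sketch. It is not the gateway's implementation (which is Rust); the function names `classify_image_source` and `decode_data_url` and the returned labels are hypothetical, chosen only for this example.

```python
import base64
from typing import Tuple


def classify_image_source(ref: str) -> str:
    """Classify an image reference by its scheme (hypothetical labels)."""
    lowered = ref.strip().lower()
    if lowered.startswith("data:"):
        return "data_url"   # payload is embedded inline, nothing to fetch
    if lowered.startswith(("http://", "https://")):
        return "http_url"   # payload must be fetched over the network
    return "unknown"


def decode_data_url(ref: str) -> Tuple[str, bytes]:
    """Split a base64 data URL into (media_type, raw_bytes)."""
    header, _, payload = ref.strip().partition(",")
    parts = header[len("data:"):].split(";")
    media_type = parts[0] or "text/plain"
    if "base64" not in [p.lower() for p in parts[1:]]:
        raise ValueError("this sketch only handles base64 data URLs")
    return media_type, base64.b64decode(payload)
```

Classifying the reference first lets a pipeline skip the network fetch step for inline images and decode them directly.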

🔌 Messages API Gets MCP

Complete MCP tool integration for Anthropic Messages API:

Streaming and non-streaming MCP tool use
Unified tool allowlist enforcement across OpenAI and gRPC routers
Server binding architecture with session lifecycle management
Built-in server filtering from tool listings
Unique server_label requirements for tool collision prevention

E2E tested with comprehensive MCP tool use coverage.
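The two validation rules in the list above (unique `server_label` values and tool allowlist enforcement) can be sketched in a few lines of Python. This is an illustration under assumptions, not the gateway's code: the dict shape of a tool entry and the helper names `check_unique_server_labels` and `filter_allowed` are hypothetical; only the `server_label` and `allowed_tools` field names come from the changelog.

```python
from collections import Counter
from typing import Iterable, List, Optional


def check_unique_server_labels(mcp_tools: List[dict]) -> None:
    """Reject a request whose MCP tool entries reuse a server_label."""
    counts = Counter(tool["server_label"] for tool in mcp_tools)
    duplicates = sorted(label for label, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate server_label values: {duplicates}")


def filter_allowed(tool_names: Iterable[str],
                   allowed_tools: Optional[Iterable[str]]) -> List[str]:
    """Keep only allowlisted tools; None is treated as 'no restriction' (assumption)."""
    if allowed_tools is None:
        return list(tool_names)
    allowed = set(allowed_tools)
    return [name for name in tool_names if name in allowed]
```

Requiring unique labels means a tool name can always be attributed to exactly one server, which is what prevents collisions when two servers expose a tool with the same name.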

✨ Major New Features

🌐 vLLM HTTP Backend Support
Auto-detection and support for vLLM HTTP endpoints via DetectBackendStep – seamlessly switch between gRPC and HTTP workers.

🎯 smg serve Engine Args Pass-through
Pass arbitrary engine-specific arguments directly through smg serve to your inference engines. Maximum flexibility for custom configurations.

🧠 Tiktoken Hub Model Support
Unified chat template API with tiktoken hub integration. Improved OpenAI o-series model detection and error handling.

🔍 NanoV3 Reasoning Parser
Native support for Nemotron Nano V3 reasoning output parsing.

🎨 Startup Banner
Beautiful braille art shepherd motif on startup – because production systems deserve aesthetics.

⚡ Performance Optimizations

WASM Runtime Enhancements:

Optimized component cache lookup
Reduced per-request cloning overhead
SHA-256 cache keys for efficient middleware
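As a rough illustration of content-addressed caching with SHA-256 keys, the Python sketch below compiles a module once per distinct byte content. The class name `ComponentCache` and the `compile_fn` callback are hypothetical; the actual WASM runtime is written in Rust and may structure its cache differently.

```python
import hashlib
from typing import Any, Callable, Dict


class ComponentCache:
    """Cache compiled artifacts keyed by the SHA-256 digest of their source bytes."""

    def __init__(self, compile_fn: Callable[[bytes], Any]) -> None:
        self._compile = compile_fn            # expensive step we want to avoid repeating
        self._entries: Dict[str, Any] = {}
        self.compile_calls = 0                # counter for demonstration only

    def get(self, module_bytes: bytes) -> Any:
        key = hashlib.sha256(module_bytes).hexdigest()
        if key not in self._entries:
            self.compile_calls += 1
            self._entries[key] = self._compile(module_bytes)
        return self._entries[key]
```

Keying on a digest rather than on the full byte string keeps lookups cheap and keys small, while identical module contents still map to the same entry.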

🛡️ Critical Production Hardening

Responses API Fixes:

Fixed data loss and panic risks
Sanitized upstream error bodies
Fixed structural issues

Middleware Reliability:

Fixed extension loss in request pipeline
Eliminated auth timing leak
Corrected streaming body buffering
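The release notes do not say how the auth timing leak was closed. A common remedy for this class of bug is constant-time comparison of secrets, sketched here in Python with the standard library's `hmac.compare_digest`; the function name `token_matches` is hypothetical.

```python
import hmac


def token_matches(presented: str, expected: str) -> bool:
    """Compare two secrets without short-circuiting on the first mismatch.

    A plain `presented == expected` can return sooner when an early
    character differs, which may leak information through response timing.
    """
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected.encode("utf-8"))
```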

Tokenizer Robustness:

Cache correctness fixes
Streaming reliability improvements
Chat template error handling

Data Connector Hardening:

Eliminated deadlock, block_on, triple pool, and DDL type bugs
Storage backend protection against data corruption
Race condition fixes

Concurrency Safety:

Released the tokio mutex before awaiting the task in LoadMonitor::stop()
Improved SSE event processing and buffer management
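The mutex fix follows a general rule: do not await a task while holding a lock that the task needs in order to finish. The sketch below shows an analogous pattern in Python's `asyncio`, not the Rust/tokio code itself; the `Monitor` class and its fields are hypothetical.

```python
import asyncio


class Monitor:
    """Background ticker that is stopped cooperatively via a flag."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._task = None
        self._stopping = False
        self.ticks = 0

    async def _run(self) -> None:
        while True:
            async with self._lock:        # the task needs the lock to see the flag
                if self._stopping:
                    return
                self.ticks += 1
            await asyncio.sleep(0.001)

    async def start(self) -> None:
        async with self._lock:
            if self._task is None:
                self._stopping = False
                self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        async with self._lock:
            self._stopping = True
            task, self._task = self._task, None
        # The lock is released here. Awaiting the task inside the
        # `async with` block would deadlock: the task could never
        # re-acquire the lock to observe `_stopping`.
        if task is not None:
            await task
```

Taking the task handle out of shared state under the lock and awaiting it afterwards keeps the critical section short and removes the circular wait.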

Multimodal Correctness:

Proper data URI detection
4D pixel_values output
i64 aspect_ratios for large images

🏗️ Architectural Improvements

Worker Infrastructure:

Consolidated DPAwareWorker into BasicWorker
Moved DP fields to WorkerSpec
Unified worker metadata discovery
Cleaner registration workflow

Code Quality:

Enforced strict clippy linting workspace-wide
Added clippy::absolute_paths and single_component_path_imports lints
Improved error handling across all modules

🔧 Developer Experience

CI/DevOps:

DCO check with probot app
Mergify automation for PR management
Branch naming enforcement
Docker image release workflow
Auto-trigger benchmark workflows on code changes

Tooling:

Workspace version checker script
PyPI proto version validation
Remote dev workflow for proto testing

Python Support:

Lowered minimum Python version from 3.12 to 3.9

🐛 Bug Fixes

Fixed worker health config, bootstrap parsing, and model card cloning issues
Improved serve CLI arg filtering and config error handling
Better pre-commit hook configuration
Corrected labeler workflow for fork PRs

📚 Interactions API

Added comprehensive validations for the Interactions API protocol.

🔗 Full Changelog: v1.0.1...v1.1.0

Upgrade now: pip install smg --upgrade

🐑 Shepherd your LLM infrastructure with confidence.

⚡ Built for speed. Engineered for scale. Production-proven.

What's Changed

fix: render README images on PyPI/crates.io and bump version to 1.0.1 by @slin1237 in #420
chore(ci): Change nightly benchmark schedule to midnight PST by @key4ng in #422
ci: add DCO check, Mergify automation, and branch naming enforcement by @CatherineSue in #424
ci: temporarily disable auto-close for branch naming violations by @CatherineSue in #426
ci: add needs-rebase label management to Mergify by @CatherineSue in #427
fix(ci): use correct Mergify syntax for negated regex condition by @CatherineSue in #429
ci: improve label management with router-specific and feature labels by @CatherineSue in #428
ci: add Docker image release workflow by @slin1237 in #431
feat(message api): MCP tool use with streaming and non-streaming support by @key4ng in #352
refactor(core): consolidate DPAwareWorker into BasicWorker by @slin1237 in #434
fix: pre-existing issues in worker health config, bootstrap parsing, and model card cloning by @slin1237 in #415
refactor(core): move DP fields to WorkerSpec and remove default_model_type by @slin1237 in #436
chore: fix main log by @slin1237 in #437
test(e2e): add MCP tool use tests for Anthropic Messages API by @key4ng in #433
feat(core): add DetectBackendStep for vLLM HTTP support by @slin1237 in #438
feat(tokenizer): add tiktoken hub model support and unify chat template API by @slin1237 in #439
perf(wasm): optimize WASM component cache lookup and reduce per-request cloning by @ppraneth in #440
refactor(core): unify worker metadata discovery and clean up registration by @slin1237 in #447
feat(version): add startup banner with braille art shepherd motif by @slin1237 in #448
fix(python): lower minimum Python version from 3.12 to 3.9 by @slin1237 in #449
ci(mergify): enable auto-close for non-conforming branch names by @CatherineSue in #454
ci(mergify): allow multi-segment branch names for dependabot by @CatherineSue in #456
ci(dco): switch DCO check from GitHub Actions to probot DCO app by @CatherineSue in #462
fix(openai): fix data loss, panic risk, and structural issues in Responses API by @slin1237 in #468
fix(mcp): filter builtin servers from mcp_list_tools output by @key4ng in #450
feat(interactions): Add validations for interactions api by @XinyueZhang369 in #399
feat(mcp): enforce allowed_tools filtering across openai and grpc routers by @zhaowenzi in #467
fix(openai): sanitize upstream error bodies in Responses API by @slin1237 in #473
fix(middleware): fix extension loss, auth timing leak, and streaming body buffering by @slin1237 in #472
fix(concurrency): release tokio mutex before awaiting task in LoadMonitor::stop() by @slin1237 in #475
fix(tokenizer): correctness and robustness fixes for cache and streaming by @slin1237 in #474
fix(ci): use pull_request_target for labeler to support fork PRs by @CatherineSue in #477
fix(data-connector): fix deadlock, block_on, triple pool, and DDL type bugs by @slin1237 in #471
feat(reasoning-parser): add NanoV3 reasoning parser by @slin1237 in #480
refactor(anthropic): simplify worker lifecycle in Anthropic router by @key4ng in #476
feat(realtime api): realtime api session and transcription_session protocols by @pallasathena92 in #364
fix(protocols): require unique server_label for MCP tools by @zhaowenzi in #479
feat: smg serve pass through engine args to engine by @gongwei-130 in #460
fix(serve): harden CLI arg filtering and config error handling by @slin1237 in #483
feat(scripts): replace release notes generator with workspace version checker by @slin1237 in #484
fix(ci): match probot DCO app check name in Mergify rule by @CatherineSue in #486
refactor(mcp): introduce McpServerBinding, unify ensure functions, fix throwaway session by @key4ng in #482
fix: enforce strict clippy linting across entire workspace by @slin1237 in #489
docs(grpc-proto): add remote dev workflow for testing proto changes by @CatherineSue in #492
fix(multimodal): detect data: URIs as DataUrl in tracker by @CatherineSue in #493
feat(multimodal): add Llama 4 model spec and vision processor registration by @CatherineSue in #494
feat(grpc): integrate multimodal processing into gRPC sglang chat pipeline by @CatherineSue in #495
refactor(anthropic): improve SSE event processing and buffer management by @key4ng in #478
refactor(mcp): decouple tool allowlist from OpenAI protocol types by @key4ng in #488
fix(pre-commit): exclude Rust files from shebang check by @CatherineSue in #498
lint: add clippy::absolute_paths and single_component_path_imports lints by @CatherineSue in #499
fix(ci): auto-trigger benchmark workflows on code changes by @slin1237 in #500
fix(multimodal): output 4D pixel_values and use i64 aspect_ratios by @CatherineSue in #496
fix: address PR #489 review follow-ups by @slin1237 in #501
feat(grpc): add vLLM multimodal support and split pipeline into fetch + preprocess by @CatherineSue in #497
fix(ci): enable ephemeral runners to prevent dead runner accumulation by @slin1237 in #503
feat(grpc): add TensorRT-LLM multimodal support by @CatherineSue in #504
ci(docker): use H200 runner for builds and add nightly GHCR workflow by @slin1237 in #507
fix(tokenizer): chat template error handling + OpenAI o-series detection by @slin1237 in #506
fix(data-connector): harden storage backends against data corruption and races by @slin1237 in #505
feat(scripts): add PyPI proto version check to release version script by @slin1237 in #510
chore(release): bump workspace versions for v1.1.0 by @slin1237 in #512

New Contributors

@gongwei-130 made their first contribution in #460

