v1.1.0

@slin1237 released this 23 Feb 06:40
· 814 commits to main since this release
b6f9bb5

🚀 Shepherd Model Gateway v1.1.0 Released!

We're excited to announce Shepherd Model Gateway v1.1.0 – a major feature release bringing universal multimodal support, Messages API MCP integration, and critical production hardening across the entire stack!

🎨 Universal Multimodal Support 🔥

Industry-leading multimodal processing across all major inference engines:

  • SGLang gRPC - Full multimodal pipeline with vision processing
  • vLLM gRPC - Fetch + preprocess pipeline with multimodal support
  • TensorRT-LLM gRPC - Complete multimodal integration
  • Llama 4 Vision - First-class support with model spec and processor registration

Impact: Deploy vision-language models across SGLang, vLLM, and TensorRT-LLM with unified processing. Image handling now includes data URI detection, 4D pixel_values output, and i64 aspect_ratios.
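
To illustrate what data URI detection involves, here is a minimal Python sketch. It is not the gateway's implementation; the function names `classify_image_source` and `decode_base64_data_uri` and the returned labels are hypothetical, chosen only for this example.

```python
import base64
from urllib.parse import urlparse


def classify_image_source(url):
    """Label an image reference as "data_url", "http_url", or "unknown".

    Hypothetical helper: the labels are illustrative, not gateway types.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme == "data":
        return "data_url"
    if scheme in ("http", "https"):
        return "http_url"
    return "unknown"


def decode_base64_data_uri(url):
    """Split a base64 data: URI into (media_type, raw_bytes)."""
    header, sep, payload = url.partition(",")
    if not sep or not header.lower().startswith("data:"):
        raise ValueError("not a data: URI")
    meta = header[len("data:"):]
    if not meta.lower().endswith(";base64"):
        raise ValueError("only base64-encoded data: URIs are handled here")
    # An empty media type defaults to text/plain per the data URL scheme.
    media_type = meta[: -len(";base64")] or "text/plain"
    return media_type, base64.b64decode(payload)
```

Classifying the source before fetching lets inline images skip the network path entirely, which is the distinction a data URI check needs to get right.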

🔌 Messages API Gets MCP

Complete MCP tool integration for Anthropic Messages API:

  • Streaming and non-streaming MCP tool use
  • Unified tool allowlist enforcement across OpenAI and gRPC routers
  • Server binding architecture with session lifecycle management
  • Built-in server filtering from tool listings
  • Unique server_label requirements for tool collision prevention
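
The allowlist and unique-label rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not the gateway's code: the dictionary shape and the helper names `validate_unique_server_labels` and `filter_allowed_tools` are hypothetical; only `server_label` and the idea of an allowed-tools list come from the release notes.

```python
def validate_unique_server_labels(tools):
    """Raise ValueError if two MCP tool entries share a server_label."""
    seen = set()
    for tool in tools:
        label = tool["server_label"]
        if label in seen:
            raise ValueError(f"duplicate server_label: {label!r}")
        seen.add(label)


def filter_allowed_tools(available, allowed):
    """Keep only tool names on the allowlist.

    Assumption for this sketch: None means "no restriction".
    """
    if allowed is None:
        return list(available)
    allowed_set = set(allowed)
    return [name for name in available if name in allowed_set]
```

Rejecting duplicate labels up front means a tool name can always be traced back to exactly one server, which is what makes a per-server allowlist unambiguous.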

E2E tested with comprehensive MCP tool use coverage.

✨ Major New Features

🌐 vLLM HTTP Backend Support
Auto-detection and support for vLLM HTTP endpoints via DetectBackendStep – seamlessly switch between gRPC and HTTP workers.

🎯 smg serve Engine Args Pass-through
Pass arbitrary engine-specific arguments directly through smg serve to your inference engines. Maximum flexibility for custom configurations.

🧠 Tiktoken Hub Model Support
Unified chat template API with tiktoken hub integration. Improved OpenAI o-series model detection and error handling.

🔍 NanoV3 Reasoning Parser
Native support for Nemotron Nano V3 reasoning output parsing.

🎨 Startup Banner
Beautiful braille art shepherd motif on startup – because production systems deserve aesthetics.

⚡ Performance Optimizations

WASM Runtime Enhancements:

  • Optimized component cache lookup
  • Reduced per-request cloning overhead
  • SHA-256 cache keys for efficient middleware
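
As a rough illustration of content-addressed caching with SHA-256 keys, consider the Python sketch below. The `ComponentCache` class and its `_compile` stand-in are hypothetical and do not reflect the actual WASM runtime; the sketch only shows why hashing the component bytes gives a stable lookup key.

```python
import hashlib


class ComponentCache:
    """Toy cache keyed by the SHA-256 digest of a component's bytes."""

    def __init__(self):
        self._entries = {}
        self.compilations = 0  # counts cache misses, for demonstration

    def get_or_compile(self, wasm_bytes):
        # Identical bytes always hash to the same key, so a repeated
        # component is compiled once and then served from the cache.
        key = hashlib.sha256(wasm_bytes).hexdigest()
        if key not in self._entries:
            self.compilations += 1
            self._entries[key] = self._compile(wasm_bytes)
        return self._entries[key]

    @staticmethod
    def _compile(wasm_bytes):
        # Stand-in for an expensive compile step (assumption).
        return {"size": len(wasm_bytes)}
```

Returning the cached object itself, rather than a copy, is one way to avoid per-request cloning, provided callers treat it as read-only.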

🛡️ Critical Production Hardening

Responses API Fixes:

  • Fixed data loss and panic risks
  • Sanitized upstream error bodies
  • Improved structural integrity

Middleware Reliability:

  • Fixed extension loss in request pipeline
  • Eliminated auth timing leak
  • Corrected streaming body buffering
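
The release notes do not say how the auth timing leak was closed. A common cause of such leaks is comparing secrets with an early-exit equality check, and a common remedy is a constant-time comparison. The Python sketch below shows that general technique only; `token_matches` is a hypothetical name, and nothing here is taken from the gateway's middleware.

```python
import hmac


def token_matches(presented, expected):
    """Compare two secrets without short-circuiting on the first mismatch.

    hmac.compare_digest is designed so its running time does not reveal
    where the inputs first differ.
    """
    return hmac.compare_digest(
        presented.encode("utf-8"), expected.encode("utf-8")
    )
```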

Tokenizer Robustness:

  • Cache correctness fixes
  • Streaming reliability improvements
  • Chat template error handling

Data Connector Hardening:

  • Fixed deadlock, block_on, triple pool, and DDL type bugs
  • Storage backend protection against data corruption
  • Race condition fixes

Concurrency Safety:

  • LoadMonitor::stop() now releases the tokio mutex before awaiting the task
  • Improved SSE event processing and buffer management
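
The pattern behind the LoadMonitor fix (take what you need under the lock, release it, then await) can be sketched with Python's asyncio as an analogue of tokio. This is an assumption-laden illustration, not a port of the Rust code: the class body, `_run`, and `cleaned_up` are hypothetical.

```python
import asyncio


class LoadMonitor:
    def __init__(self):
        self._lock = asyncio.Lock()
        self._task = None
        self.cleaned_up = False

    async def start(self):
        async with self._lock:
            if self._task is None:
                self._task = asyncio.create_task(self._run())

    async def _run(self):
        try:
            while True:
                async with self._lock:
                    pass  # sample load under the lock (stand-in)
                await asyncio.sleep(0.01)
        except asyncio.CancelledError:
            # Cleanup needs the lock; it would hang forever if stop()
            # were still holding the lock while awaiting this task.
            async with self._lock:
                self.cleaned_up = True
            raise

    async def stop(self):
        # Take the task handle out while holding the lock...
        async with self._lock:
            task, self._task = self._task, None
        # ...and only await it after the lock has been released.
        if task is not None:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
```

Awaiting the task inside the `async with` block in `stop()` would deadlock here, because the task's cleanup path waits for the same lock.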

Multimodal Correctness:

  • Proper data URI detection
  • 4D pixel_values output
  • i64 aspect_ratios for large images

🏗️ Architectural Improvements

Worker Infrastructure:

  • Consolidated DPAwareWorker into BasicWorker
  • Moved DP fields to WorkerSpec
  • Unified worker metadata discovery
  • Cleaner registration workflow

Code Quality:

  • Enforced strict clippy linting workspace-wide
  • Added clippy::absolute_paths and single_component_path_imports lints
  • Improved error handling across all modules

🔧 Developer Experience

CI/DevOps:

  • DCO check with probot app
  • Mergify automation for PR management
  • Branch naming enforcement
  • Docker image release workflow
  • Auto-trigger benchmark workflows on code changes

Tooling:

  • Workspace version checker script
  • PyPI proto version validation
  • Remote dev workflow for proto testing

Python Support:

  • Lowered minimum Python version from 3.12 to 3.9

🐛 Bug Fixes

  • Fixed worker health config, bootstrap parsing, and model card cloning issues
  • Improved serve CLI arg filtering and config error handling
  • Better pre-commit hook configuration
  • Corrected labeler workflow for fork PRs

📚 Interactions API

Added comprehensive validations for the Interactions API protocol.

🔗 Full Changelog: v1.0.1...v1.1.0

Upgrade now: pip install smg --upgrade

🐑 Shepherd your LLM infrastructure with confidence.

⚡ Built for speed. Engineered for scale. Production-proven.

What's Changed

  • fix: render README images on PyPI/crates.io and bump version to 1.0.1 by @slin1237 in #420
  • chore(ci): Change nightly benchmark schedule to midnight PST by @key4ng in #422
  • ci: add DCO check, Mergify automation, and branch naming enforcement by @CatherineSue in #424
  • ci: temporarily disable auto-close for branch naming violations by @CatherineSue in #426
  • ci: add needs-rebase label management to Mergify by @CatherineSue in #427
  • fix(ci): use correct Mergify syntax for negated regex condition by @CatherineSue in #429
  • ci: improve label management with router-specific and feature labels by @CatherineSue in #428
  • ci: add Docker image release workflow by @slin1237 in #431
  • feat(message api): MCP tool use with streaming and non-streaming support by @key4ng in #352
  • refactor(core): consolidate DPAwareWorker into BasicWorker by @slin1237 in #434
  • fix: pre-existing issues in worker health config, bootstrap parsing, and model card cloning by @slin1237 in #415
  • refactor(core): move DP fields to WorkerSpec and remove default_model_type by @slin1237 in #436
  • chore: fix main log by @slin1237 in #437
  • test(e2e): add MCP tool use tests for Anthropic Messages API by @key4ng in #433
  • feat(core): add DetectBackendStep for vLLM HTTP support by @slin1237 in #438
  • feat(tokenizer): add tiktoken hub model support and unify chat template API by @slin1237 in #439
  • perf(wasm): optimize WASM component cache lookup and reduce per-request cloning by @ppraneth in #440
  • refactor(core): unify worker metadata discovery and clean up registration by @slin1237 in #447
  • feat(version): add startup banner with braille art shepherd motif by @slin1237 in #448
  • fix(python): lower minimum Python version from 3.12 to 3.9 by @slin1237 in #449
  • ci(mergify): enable auto-close for non-conforming branch names by @CatherineSue in #454
  • ci(mergify): allow multi-segment branch names for dependabot by @CatherineSue in #456
  • ci(dco): switch DCO check from GitHub Actions to probot DCO app by @CatherineSue in #462
  • fix(openai): fix data loss, panic risk, and structural issues in Responses API by @slin1237 in #468
  • fix(mcp): filter builtin servers from mcp_list_tools output by @key4ng in #450
  • feat(interactions): Add validations for interactions api by @XinyueZhang369 in #399
  • feat(mcp): enforce allowed_tools filtering across openai and grpc routers by @zhaowenzi in #467
  • fix(openai): sanitize upstream error bodies in Responses API by @slin1237 in #473
  • fix(middleware): fix extension loss, auth timing leak, and streaming body buffering by @slin1237 in #472
  • fix(concurrency): release tokio mutex before awaiting task in LoadMonitor::stop() by @slin1237 in #475
  • fix(tokenizer): correctness and robustness fixes for cache and streaming by @slin1237 in #474
  • fix(ci): use pull_request_target for labeler to support fork PRs by @CatherineSue in #477
  • fix(data-connector): fix deadlock, block_on, triple pool, and DDL type bugs by @slin1237 in #471
  • feat(reasoning-parser): add NanoV3 reasoning parser by @slin1237 in #480
  • refactor(anthropic): simplify worker lifecycle in Anthropic router by @key4ng in #476
  • feat(realtime api): realtime api session and transcription_session protocols by @pallasathena92 in #364
  • fix(protocols): require unique server_label for MCP tools by @zhaowenzi in #479
  • feat: smg serve pass through engine args to engine by @gongwei-130 in #460
  • fix(serve): harden CLI arg filtering and config error handling by @slin1237 in #483
  • feat(scripts): replace release notes generator with workspace version checker by @slin1237 in #484
  • fix(ci): match probot DCO app check name in Mergify rule by @CatherineSue in #486
  • refactor(mcp): introduce McpServerBinding, unify ensure functions, fix throwaway session by @key4ng in #482
  • fix: enforce strict clippy linting across entire workspace by @slin1237 in #489
  • docs(grpc-proto): add remote dev workflow for testing proto changes by @CatherineSue in #492
  • fix(multimodal): detect data: URIs as DataUrl in tracker by @CatherineSue in #493
  • feat(multimodal): add Llama 4 model spec and vision processor registration by @CatherineSue in #494
  • feat(grpc): integrate multimodal processing into gRPC sglang chat pipeline by @CatherineSue in #495
  • refactor(anthropic): improve SSE event processing and buffer management by @key4ng in #478
  • refactor(mcp): decouple tool allowlist from OpenAI protocol types by @key4ng in #488
  • fix(pre-commit): exclude Rust files from shebang check by @CatherineSue in #498
  • lint: add clippy::absolute_paths and single_component_path_imports lints by @CatherineSue in #499
  • fix(ci): auto-trigger benchmark workflows on code changes by @slin1237 in #500
  • fix(multimodal): output 4D pixel_values and use i64 aspect_ratios by @CatherineSue in #496
  • fix: address PR #489 review follow-ups by @slin1237 in #501
  • feat(grpc): add vLLM multimodal support and split pipeline into fetch + preprocess by @CatherineSue in #497
  • fix(ci): enable ephemeral runners to prevent dead runner accumulation by @slin1237 in #503
  • feat(gprc): add TensorRT-LLM multimodal support by @CatherineSue in #504
  • ci(docker): use H200 runner for builds and add nightly GHCR workflow by @slin1237 in #507
  • fix(tokenizer): chat template error handling + OpenAI o-series detection by @slin1237 in #506
  • fix(data-connector): harden storage backends against data corruption and races by @slin1237 in #505
  • feat(scripts): add PyPI proto version check to release version script by @slin1237 in #510
  • chore(release): bump workspace versions for v1.1.0 by @slin1237 in #512
