Skip to content

v1.0.1

Choose a tag to compare

@slin1237 slin1237 released this 13 Feb 16:35
· 876 commits to main since this release

πŸŽ‰ Introducing Shepherd Model Gateway v1.0.1!

We're thrilled to announce Shepherd Model Gateway v1.0.1 – formerly SGLang Model Gateway. This major release marks a new chapter with a complete architectural overhaul, new enterprise features, and production-grade improvements!

πŸ‘ Welcome to Shepherd

SGLang Model Gateway is now Shepherd Model Gateway (SMG).

Truly Engine-Agnostic Architecture: Shepherd is your universal gateway supporting all major inference engines – SGLang, vLLM, and TensorRT-LLM – plus complete 3rd party model provider integration including OpenAI, Anthropic, and Gemini. One gateway to route them all.

Universal API Support: Native implementation of Chat Completions, Responses API, Messages API, Interactions API, and Realtime API. Whether you're running open-source models on your infrastructure or routing to cloud providers, Shepherd handles it seamlessly.

Same powerful technology, new identity focused on guiding and managing your entire LLM infrastructure at scale – regardless of where your models run.

✨ Major New Features

⚑ TensorRT-LLM Backend Support - Native gRPC integration for NVIDIA TensorRT-LLM

πŸ”„ vLLM Prefill-Decode-Disaggregation Support
Mooncake and NIXL-based KV transfer for disaggregated inference:

  • Auto-discovery for seamless integration
  • Massive scalability improvements for large deployments
  • Efficient KV cache sharing across workers

🎯 smg serve - Unified Worker Management
New serve subcommand with complete worker lifecycle orchestration:

  • Multi-worker data parallelism with GPU assignment
  • ServeOrchestrator for automated worker management
  • Two-pass argument parsing for flexible configuration
  • One command to rule them all

πŸ€– Anthropic Messages API Support
Full implementation of Anthropic's Messages API with streaming and non-streaming support. Deploy Claude models alongside your existing inference fleet.

πŸ”Œ Industry-First: Universal Built-in Tools via MCP πŸ”₯

Turn any MCP server into built-in tools for all models – an industry-first capability that brings OpenAI-style built-in tools (FileSearch, WebSearch, CodeInterpreter) to every LLM, not just proprietary models.

Complete MCP Orchestration Stack:

  • McpOrchestrator with YAML policy configuration
  • Built-in tool routing infrastructure with qualified names – seamlessly integrate any MCP server as a native capability
  • ResponseFormat transformation pipeline - expose MCP servers as built-in tools (FileSearch, WebSearch, CodeInterpreter, and custom tools)
  • Auth-aware connection pooling for scalable multi-tenant deployments
  • Batch tool execution API for efficient processing
  • Approval system for controlled tool execution
  • Automatic reconnection manager for reliability
  • Graceful shutdown support
  • HTTP header forwarding to MCP servers

Impact: Deploy Llama, Qwen, DeepSeek, or any open-source model with the same built-in tool capabilities as GPT-4. Your infrastructure, your models, OpenAI-grade tooling.

πŸ“‘ Realtime API Foundation
Event types and protocol support for real-time streaming applications.

πŸ—οΈ Architectural Revolution

Workspace Modularization
Complete extraction into standalone, publishable crates:

  • smg-auth - JWT/OIDC authentication
  • smg-mesh - High availability mesh networking
  • smg-mcp - Model Context Protocol orchestration
  • smg-wasm - WebAssembly middleware
  • smg-grpc-client - gRPC client infrastructure
  • smg-grpc-proto - Protocol definitions (published to PyPI!)
  • smg-kv-index - Cache-aware routing engine
  • llm-tokenizer - Tokenization logic
  • llm-multimodal - Multimodal processing
  • openai-protocol - OpenAI API specifications
  • wfaas - Workflow-as-a-Service engine
  • And more...

Result: Faster builds, independent evolution, better maintainability, and easy integration into your own projects.

⚑ Performance Optimizations

Zero-Copy & Algorithm Improvements:

  • Zero-copy multimodal payload handling
  • Aho-Corasick algorithm for stop sequence and special token search
  • WASM Linker reuse across executions
  • Optimized consistent hashing with zero allocations

πŸ› οΈ Production Enhancements

High Availability:

  • Mesh service refactoring and cleanup
  • State synchronization improvements
  • Oracle external auth support for enterprise backends

Observability:

  • Nightly benchmark workflow for comprehensive model performance tracking
  • gRPC vs HTTP comparison benchmarks
  • GetLoads RPC for load metrics

Developer Experience:

  • Comprehensive documentation restructure (concept-centric)
  • Issue templates and PR templates
  • Pre-commit hooks with Ruff + mypy Python linting
  • Automated crate publishing workflows
  • Dependabot integration

Testing Infrastructure:

  • Kubernetes-based CI runners
  • Service containers for Oracle and Brave
  • vLLM and TensorRT-LLM gRPC E2E tests
  • Thread-safe test fixtures with proper resource management

πŸ› Critical Bug Fixes

  • Fixed synthetic "empty" tenant pollution in radix tree
  • Prevented resource leaks causing GPU starvation
  • Fixed STDIO MCP server triggering
  • Aligned multi-server MCP output handling across routers
  • Fixed completion token counting for vLLM harmony streaming
  • Corrected proto definitions (logprobs token_ids uint32)

πŸ“š Documentation

  • Complete restructure from configuration-centric to concept-centric
  • Architecture diagrams and gradient mesh homepage
  • Comprehensive README with features overview
  • Admin API reference
  • Getting started guides

πŸ”§ Tool Parser Support

New model support:

  • Cohere Command models (tool parser + reasoning parser)
  • Qwen Coder (XML format for Qwen3 Coder and MicroThinker)

πŸ”— Repository: https://github.com/lightseekorg/smg

Install now: pip install smg --upgrade

πŸ‘ Shepherd your LLM infrastructure with confidence.

⚑ Built for speed. Engineered for scale. Production-proven.

What's Changed

  • fix: render README images on PyPI/crates.io and bump version to 1.0.1 by Simo Lin
  • fix(ci): fix H200 nightly benchmark model path, worker logs and CUDA errors (#411) by @key4ng in #411
  • fix(ci): use single Python interpreter for Windows/macOS PyPI builds (#418) by @slin1237 in #418
  • chore(mesh): bump smg-mesh version to 1.1.0 (#419) by @slin1237 in #419
  • chore: unify workspace dependency management and bump crate versions (#344) by @slin1237 in #344
  • refactor: remove remaining pub use re-export aliases from lib.rs (#416) by @slin1237 in #416
  • refactor: remove pub use re-export aliases from lib.rs (#413) by @slin1237 in #413
  • refactor(protocols,gateway): redesign worker type hierarchy and consolidate protocol layer (#412) by @slin1237 in #412
  • fix(grpc-proto): bump grpcio minimum to >=1.78.0 (#409) by @CatherineSue in #409
  • chore(ci): increase chat-completions-trtllm timeout to 60 minutes (#408) by @CatherineSue in #408
  • fix(trtllm): tokenize and inject user stop sequences for TRT-LLM requests (#346) by @ppraneth in #346
  • fix(e2e): migrate genai-bench to Docker and fix router pipe hang (#403) by @key4ng in #403
  • chore(deps): update kube requirement from 1.1.0 to 3.0.1 (#397) by @app/dependabot in #397
  • chore(deps): update opentelemetry-proto requirement from 0.27 to 0.31 (#398) by @app/dependabot in #398
  • chore(deps): update ndarray requirement from 0.16 to 0.17 (#394) by @app/dependabot in #394
  • feat: support oracle external auth for oracle backend (#404) by @zhaowenzi in #404
  • fix(grpc-proto): reorder authors in pyproject.toml (#400) by @CatherineSue in #400
  • chore[ci]: upgrade oracle image (#393) by @key4ng in #393
  • chore(e2e): overhaul nightly benchmark summary and trim model list (#392) by @slin1237 in #392
  • feat: Implement ReconnectionManager for automatic MCP server recovery (#265) by @ppraneth in #265
  • perf(multimodal): optimize payload handling with zero-copy (#391) by @ppraneth in #391
  • refactor(mcp): standardize output injection ordering across routers (#388) by @slin1237 in #388
  • ci(grpc): add proto package publishing and codegen checks (#386) by @CatherineSue in #386
  • feat(grpc): add smg-grpc-proto Python package for proto definitions (#385) by @CatherineSue in #385
  • chore(e2e): include model size in gpt-oss nightly benchmark slug (#384) by @CatherineSue in #384
  • refactor(mcp): remove requested_servers and introduce ResponsesCallContext (#382) by @CatherineSue in #382
  • refactor(mcp): use imports instead of fully-qualified paths in McpToolSession (#383) by @CatherineSue in #383
  • e2e: rewrite nightly summary with gRPC vs HTTP comparison (#381) by @slin1237 in #381
  • feat(realtime api): realtime api event types (#349) by @pallasathena92 in #349
  • refactor(mcp): add session-level tool builder convenience methods (#380) by @slin1237 in #380
  • fix+refactor: harden OpenAI router and unify error handling (#379) by @slin1237 in #379
  • docs: add Slack invite links (#377) by @slin1237 in #377
  • docs: add release badge and Discord links (#375) by @slin1237 in #375
  • chore: fix nightly benchmark, prevent execution when pushing to main (#374) by @slin1237 in #374
  • e2e: canonical HF model IDs and nightly benchmark runner split (#373) by @slin1237 in #373
  • mesh: refactor and cleanup mesh code (#281) by @llfl in #281
  • mcp: unify session-mapped tool execution and payload builders (#370) by @slin1237 in #370
  • docs: fix broken getting started link in logging guide (#372) by @slin1237 in #372
  • docs: consolidate onboarding into getting-started and align API docs to code (#371) by @slin1237 in #371
  • refactor(mcp): simplify ensure_request_mcp_client and remove McpLoopConfig (#368) by @CatherineSue in #368
  • chore[ci]: use service containers for Oracle and Brave in E2E tests (#369) by @key4ng in #369
  • fix(ci): Use H100 runner for Specific Tests (#357) by @XinyueZhang369 in #357
  • refactor(mcp): introduce McpToolSession to bundle MCP execution state (#360) by @CatherineSue in #360
  • docs: update logos and fix mobile rendering (#365) by @slin1237 in #365
  • feat: add Anthropic Messages API e2e tests (#358) by @key4ng in #358
  • feat(interactions): Create protocol for interactions api (#336) by @XinyueZhang369 in #336
  • chore(ci): optimize Docker setup for ephemeral K8s pod runners (#348) by @key4ng in #348
  • refactor(routers): reduce IGW router registration boilerplate (#362) by @slin1237 in #362
  • fix(docs): restore gradient text rendering in light mode (#359) by @slin1237 in #359
  • fix(grpc): align MCP multi-server outputs for responses (#277) by @zhaowenzi in #277
  • fix(proto): change logprobs token_ids from int32 to uint32 (#354) by @CatherineSue in #354
  • fix(docs): add light mode support for custom CSS theme (#355) by @slin1237 in #355
  • fix: fail responses tests on missing API keys instead of silently skipping (#343) by @key4ng in #343
  • chore(deps): update criterion requirement from 0.5 to 0.8 (#203) by @app/dependabot in #203
  • docs: fix Rust lint/fmt commands to match CI and pre-commit config (#353) by @CatherineSue in #353
  • chore(deps): update wasm-encoder requirement from 0.242 to 0.244 (#303) by @app/dependabot in #303
  • chore(ci): apply uv for sglang installation (#347) by @key4ng in #347
  • docs: restructure Getting Started section and fix SVG paths (#350) by @slin1237 in #350
  • chore: simplify SGLang CI install with explicit dependencies (#333) by @key4ng in #333
  • fix(mcp): serialize env-mutating proxy config tests (#345) by @CatherineSue in #345
  • refactor(protocols): use serde_with::skip_serializing_none to reduce boilerplate (#342) by @CatherineSue in #342
  • fix(ci): correct working-directory paths in PyPI release workflow (#341) by @CatherineSue in #341
  • chore: add ruff + mypy Python linting with pre-commit and CI (#340) by @CatherineSue in #340
  • chore: enable unused_qualifications lint workspace-wide (#339) by @CatherineSue in #339
  • refactor(grpc): unify shared utilities across harmony and regular routers (#337) by @CatherineSue in #337
  • refactor(grpc): extract duplicated MCP streaming helpers (#334) by @CatherineSue in #334
  • feat(serve): add unified --connection-mode and fix gRPC health checks (#335) by @slin1237 in #335
  • feat(messages api): support streaming (#280) by @key4ng in #280
  • fix(ci): use latest tag for sglang dependency (#332) by @key4ng in #332
  • chore(ci): reduce CI test timeout to surface failures sooner (#328) by @key4ng in #328
  • refactor(ci): split nightly benchmark into per-model jobs (#327) by @slin1237 in #327
  • fix(e2e): release leaked worker refs causing GPU starvation (#326) by @CatherineSue in #326
  • fix(ci): remove ineffective caching from vLLM and SGLang actions (#323) by @CatherineSue in #323
  • chore(deps): bump actions/setup-go from 5 to 6 (#298) by @app/dependabot in #298
  • chore(deps): bump actions/cache from 4 to 5 (#299) by @app/dependabot in #299
  • fix(ci): reduce docker stop grace period for CI cleanup containers (#322) by @CatherineSue in #322
  • fix(ci): add reusable composite actions for backend setup (#319) by @CatherineSue in #319
  • feat(reasoning-parser): add CohereCmdParser for Command models (#317) by @slin1237 in #317
  • fix(e2e): thread-aware caching for setup_backend fixture (#321) by @CatherineSue in #321
  • feat(vllm-pd): add Mooncake KV transfer support with auto-discovery (#312) by @slin1237 in #312
  • test: Update the H200 benchmark schedule to run weekly. (#318) by @key4ng in #318
  • fix(logging): add --log-json to python cli (#316) by @zhaowenzi in #316
  • feat(tool-parser): add Cohere Command model tool call parser (#315) by @slin1237 in #315
  • fix(e2e): prevent flaky tests from port races and resource leaks (#314) by @slin1237 in #314
  • feat(ci): add nightly benchmark workflow for comprehensive model performance tracking (#231) by @key4ng in #231
  • fix(logging): fix --log-json and Python binding support (#310) by @zhaowenzi in #310
  • fix(e2e): vLLM PD worker tracking and test infrastructure improvements (#311) by @slin1237 in #311
  • fix(grpc): correct completion_tokens counting for vLLM harmony streaming (#282) by @key4ng in #282
  • refactor: rename SGLang Model Gateway to Shepherd Model Gateway (#297) by @slin1237 in #297
  • feat(e2e): add TensorRT-LLM gRPC backend support for CI (#274) by @slin1237 in #274
  • docs: add vLLM PD disaggregation support (#294) by @slin1237 in #294
  • feat(grpc): add vLLM PD disaggregation support via NIXL (#293) by @slin1237 in #293
  • fix(ci): Move to K8s Cpu Runners (#279) by @XinyueZhang369 in #279
  • fix(mcp): stdio MCP servers cannot get triggered (#273) by @xuwenyihust in #273
  • feat(serve): add GPU assignment for multi-worker DP (#272) by @slin1237 in #272
  • feat(serve): add ServeOrchestrator for worker lifecycle management (#271) by @slin1237 in #271
  • feat(cli): add smg serve subcommand with two-pass arg parsing (#270) by @slin1237 in #270
  • fix(ci): migrate benchmark-radix-tree to k8s gpu runner (#269) by @slin1237 in #269
  • fix: critical correctness and reliability bugs across 12 crates (#254) by @slin1237 in #254
  • fix(kv-index): prevent synthetic "empty" tenant from polluting tree (#268) by @slin1237 in #268
  • fix(ci): speed up chat-completions CI jobs (#267) by @CatherineSue in #267
  • fix(ci): use standard install scripts for go-bindings-benchmark (#264) by @slin1237 in #264
  • docs(README): resolve broken doc links (#256) by @xuwenyihust in #256
  • fix(ci): add apt-get update before SGLang dependency install (#255) by @slin1237 in #255
  • chore(bindings): rename SGLang to Shepherd in Python CLI (#247) by @xuwenyihust in #247
  • feat(messages api): support basic non streaming (#230) by @key4ng in #230
  • test(go-bindings): expand E2E test coverage and add performance benchmarks (#206) by @slin1237 in #206
  • fix(mcp): align OpenAI router streaming multi-server handling (#209) by @zhaowenzi in #209
  • run test workflow to cluster (#192) by @XinyueZhang369 in #192
  • feat: Add Messages API foundation (#227) by @key4ng in #227
  • feat: implement graceful shutdown for MCP (#228) by @ppraneth in #228
  • refactor(proto): Use uint32 for n, max_tokens, min_tokens in SGLang/TRT-LLM protos (#226) by @CatherineSue in #226
  • ci: add vLLM gRPC e2e tests (#158) by @key4ng in #158
  • refactor(grpc): Centralize request building dispatch in GrpcClient (#219) by @CatherineSue in #219
  • refactor(grpc): Standardize token counts to uint32 across all protos (#218) by @CatherineSue in #218
  • [model-gateway] Reuse WASM Linker across executions (#211) by @ppraneth in #211
  • feat(harmony): Enable vLLM and TensorRT-LLM gRPC backend support (#217) by @CatherineSue in #217
  • feat(grpc): Add TensorRT-LLM logprobs support in proto_wrapper (#216) by @CatherineSue in #216
  • feat(grpc): Add vLLM logprobs and n>1 sampling support (#215) by @CatherineSue in #215
  • chore(deps): update tiktoken-rs requirement from 0.7.0 to 0.9.1 (#201) by @app/dependabot in #201
  • docs: Add missing e2e-test.sh script (#214) by @CatherineSue in #214
  • feat: Rename TensorRT-LLM gRPC client from TrtLlmEngine to TrtllmService (#213) by @CatherineSue in #213
  • fix clippy warnings by Kun(llfl)
  • refactor: mesh service clean up by Kun(llfl)
  • refactor: mesh lib exports clean up by Kun(llfl)
  • chore(deps): bump actions/setup-go from 5 to 6 (#197) by @app/dependabot in #197
  • chore: add zoey to workflow and e2e owner, add yanbo to golang owner (#207) by @slin1237 in #207
  • feat(grpc/harmony): Add logprobs support for Harmony models (#205) by @CatherineSue in #205
  • refactor(go-bindings): eliminate code duplication and consolidate shared modules (#204) by @slin1237 in #204
  • chore(deps): bump actions/labeler from 5 to 6 (#198) by @app/dependabot in #198
  • fix(mcp): align streamable transport headers with SSE to forward custom headers consistently (#196) by @zhaowenzi in #196
  • feat(bindings): add safety docs and include bindings in workspace (#195) by @slin1237 in #195
  • ci(go-bindings): add dedicated CI job for Go bindings testing (#189) by @slin1237 in #189
  • feat(grpc): Add TensorRT-LLM Backend Support (#194) by @CatherineSue in #194
  • Fix chat_template format detection by Chang Su
  • fix(mcp): use headers from request payload instead of HTTP headers (#191) by @slin1237 in #191
  • perf(tokenizer): optimize stop sequence search using Aho-Corasick algorithm (#190) by @slin1237 in #190
  • chore(repo): add GitHub issue templates (#188) by @slin1237 in #188
  • fix(streaming): complete SSE event emission for all built-in tool types (#187) by @slin1237 in #187
  • docs(reasoning-parser): add comprehensive README (#186) by @slin1237 in #186
  • docs(wfaas): add comprehensive README for workflow engine crate (#185) by @slin1237 in #185
  • fix(streaming): correct SSE event emission for built-in tools and remove dead code (#184) by @slin1237 in #184
  • feat(mcp): complete router integration for built-in tool routing (#182) by @slin1237 in #182
  • [bugfix] fix ci dep script path (#183) by @slin1237 in #183
  • docs: gradient mesh homepage design with animations (#180) by @slin1237 in #180
  • feat(mcp): implement built-in tool routing infrastructure (#181) by @slin1237 in #181
  • fix(mcp): OpenAI router non-streaming MCP outputs for multiple servers. (#157) by @zhaowenzi in #157
  • feat(mcp): implement ResponseFormat transformation in tool execution pipeline (#178) by @slin1237 in #178
  • fix(docs): correct alignment issues in MCP architecture SVG (#175) by @slin1237 in #175
  • docs: restructure documentation from configuration-centric to concept-centric architecture (#174) by @slin1237 in #174
  • feat(mcp): forward HTTP request headers to MCP servers (#155) by @slin1237 in #155
  • feat(mcp): implement auth-aware connection pooling and code quality improvements (#154) by @slin1237 in #154
  • feat(mcp): add batch tool execution API to McpOrchestrator (#153) by @slin1237 in #153
  • ci: remove docker-build-test from PR workflow (#152) by @slin1237 in #152
  • refactor(mcp): migrate from McpManager to McpOrchestrator (#151) by @slin1237 in #151
  • feat(mcp): implement McpOrchestrator with YAML policy configuration (#149) by @slin1237 in #149
  • feat(mcp): implement ResponseFormat and ResponseTransformer (#146) by @slin1237 in #146
  • refactor(mcp): production hardening - remove dead code and optimize allocations (#142) by @slin1237 in #142
  • feat(mcp): reorganize crate structure and implement approval system (#140) by @slin1237 in #140
  • chore(deps): update wasmtime-wasi requirement from 38.0 to 41.0 (#134) by @app/dependabot in #134
  • Update placeholder in PR template (#141) by @CatherineSue in #141
  • doc: Add PR template (#139) by @CatherineSue in #139
  • chore(deps): bump actions/upload-artifact from 4 to 6 (#132) by @app/dependabot in #132
  • chore(deps): update metrics-exporter-prometheus requirement from 0.17.0 to 0.18.1 (#137) by @app/dependabot in #137
  • chore(deps): bump actions/checkout from 4 to 6 (#129) by @app/dependabot in #129
  • chore(deps): bump actions/upload-pages-artifact from 3 to 4 (#128) by @app/dependabot in #128
  • chore(deps): bump actions/setup-python from 5 to 6 (#131) by @app/dependabot in #131
  • chore(deps): bump actions/download-artifact from 4 to 7 (#130) by @app/dependabot in #130
  • feat(mcp): implement qualified tool names for collision handling (#110) by @slin1237 in #110
  • fix(mcp): pass raw auth token to rmcp transport (#117) by @zhaowenzi in #117
  • ci: add Dependabot and centralize CI gate logic (#127) by @slin1237 in #127
  • fix: use crates.io version for openai-harmony dependency (#126) by @slin1237 in #126
  • fix: handle already-published crates gracefully and set smg to v0.4.0 (#125) by @slin1237 in #125
  • ci: consolidate crate publishing into single tiered workflow (#124) by @slin1237 in #124
  • fix: rename auth to smg-auth and add protoc to publish workflow (#123) by @slin1237 in #123
  • fix: add version numbers to path dependencies for crates.io publishing (#122) by @slin1237 in #122
  • fix: use correct rust-toolchain action name (#121) by @slin1237 in #121
  • chore: prepare v1.0.0 release (#120) by @slin1237 in #120
  • ci: add automated crate publishing workflows (#118) by @slin1237 in #118
  • [misc] update code owner (#119) by @slin1237 in #119
  • fix: remove broken cli.md references in docs (#116) by @slin1237 in #116
  • ci: re-enable GitHub Pages deployment workflow (#115) by @slin1237 in #115
  • refactor: rename tokenizer and workflow packages (#113) by @slin1237 in #113
  • feat: add new logo assets and update branding (#111) by @slin1237 in #111
  • Update labeler to match the latest file tree (#112) by @CatherineSue in #112
  • refactor: extract main binary to model_gateway/ workspace crate (#109) by @slin1237 in #109
  • docs: Add admin API reference documentation (#108) by @slin1237 in #108
  • docs: Improve README with features overview and organization (#107) by @slin1237 in #107
  • docs: Comprehensive configuration documentation and architecture diagrams (#106) by @slin1237 in #106
  • docs(readme): Restructure for clarity and conciseness (#83) by @slin1237 in #83
  • [docs] disable github doc page deployment since repo is private (#71) by @slin1237 in #71
  • refactor: Rename sgl-model-gateway to smg across entire codebase (#70) by @slin1237 in #70
  • refactor: Rename Python bindings from sglang_router to smg (#69) by @slin1237 in #69
  • refactor: Extract grpc_client into standalone smg-grpc-client workspace crate (#68) by @slin1237 in #68
  • Add architecture diagram SVG (#67) by @slin1237 in #67
  • chore: Update CODEOWNERS for extracted workspace crates (#66) by @slin1237 in #66
  • refactor: Rename mcp package from 'mcp' to 'smg-mcp' (#65) by @slin1237 in #65
  • refactor: Extract mesh module into standalone smg-mesh workspace crate (#64) by @slin1237 in #64
  • Add site/ to gitignore by Simo Lin
  • Add MkDocs Material documentation by Simo Lin
  • refactor: Extract wasm module into standalone smg-wasm workspace crate (#62) by @slin1237 in #62
  • fix: clippy multi modal code import error (#61) by @slin1237 in #61
  • fix: remove dead code (#60) by @slin1237 in #60
  • chore: Standardize package metadata across all workspace crates (#59) by @slin1237 in #59
  • refactor: Migrate cache_aware policy from tree.rs to kv_index crate (#58) by @slin1237 in #58
  • Fix golang bindings build and multimodal crate (#57) by @slin1237 in #57
  • Extract multimodal module into standalone llm-multimodal crate (#56) by @slin1237 in #56
  • Extract data_connector to workspace crate and standardize folder naming (#55) by @slin1237 in #55
  • rename: kv-index to kv_index to follow Rust naming conventions (#54) by @slin1237 in #54
  • refactor: Extract kv-index crate and modularize CI benchmark workflows (#53) by @slin1237 in #53
  • refactor: Extract MCP module into standalone workspace crate (#52) by @slin1237 in #52
  • Consolidate workspace dependencies across all crates (#50) by @slin1237 in #50
  • Reduce benchmark CI overhead (#51) by @slin1237 in #51
  • refactor: Extract auth into standalone smg-auth workspace crate (#49) by @slin1237 in #49
  • Use request-scoped MCP clients for Responses API tools (#43) by @zhaowenzi in #43
  • refactor: Extract tokenizer into standalone llm-tokenizer workspace crate (#48) by @slin1237 in #48
  • refactor: Extract workflow into standalone wfaas workspace crate (#47) by @slin1237 in #47
  • refactor: Extract tool-parser into standalone workspace crate (#46) by @slin1237 in #46
  • refactor: Extract reasoning-parser into standalone workspace crate (#44) by @slin1237 in #44
  • Extract protocols as openai-protocol workspace crate (#45) by @slin1237 in #45
  • Add ci benchmark workflow (#37) by @key4ng in #37
  • add release-pypi.yml (#41) by @key4ng in #41
  • Fix test_state_synchronization by Chang Su
  • Fix empty tenant issue in tree.rs by Chang Su
  • Setup auto labeler (#40) by @CatherineSue in #40
  • refactor: Consolidate "unknown" model id usage (#39) by @CatherineSue in #39
  • refactor: unify registration workflow and duplication check (#38) by @CatherineSue in #38
  • Add ci workflow[wip] (#36) by @key4ng in #36
  • Add .gitignore (#34) by @key4ng in #34
  • Add Code owner (#35) by @key4ng in #35
  • Add pre-commit hooks and fix clippy warnings (#27) by @key4ng in #27
  • [smg] release 0.3.2 (#17168) by Simo Lin in https://github.com/lightseekorg/smg/pull/17168

New Contributors

Full Changelog: 6caca5b...v1.0.1