
Fix router#69

Merged
tastelikefeet merged 9 commits into main from
fix/routing-0215
Feb 16, 2026

Conversation

@tastelikefeet
Collaborator

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detailed information belonging to this PR.

Experiment results

Paste your experiment results here (if needed).

@gemini-code-assist
Contributor

Summary of Changes

Hello @tastelikefeet, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the routing and request handling mechanisms within the server, specifically for multiplexed model serving. It standardizes the request identification header from X-Ray-Serve-Request-Id to serve_multiplexed_model_id across the client, server, and documentation. Additionally, it integrates new multiplexing logic for managing adapters and processors, and updates the list of explicitly supported models in the Tinker server. These changes aim to improve the robustness and clarity of the routing infrastructure.

Highlights

  • Header Renaming and Standardization: The request identification header X-Ray-Serve-Request-Id has been consistently renamed to serve_multiplexed_model_id across documentation, server-side validation, logging, and client-side header injection to standardize its usage.
  • Multiplexed Adapter/Processor Integration: New get_multiplexed_adapter methods, decorated with @serve.multiplexed, were introduced in tinker/model.py, twinkle/model.py, and twinkle/processor.py. These methods are now called during adapter and processor creation, integrating with a multiplexed serving framework.
  • Model Support List Update: The list of explicitly supported models in src/twinkle/server/tinker/server.py has been streamlined, removing several Qwen/Qwen2.5 models and retaining only Qwen/Qwen3-30B-A3B-Instruct-2507.
  • Asynchronous Function Updates: Several functions, including add_adapter_to_model in twinkle/model.py and create in twinkle/processor.py, have been updated to be asynchronous (async def) to accommodate the new multiplexing logic.
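To make the multiplexing pattern above concrete, here is a minimal, hedged sketch of what an `@serve.multiplexed`-style per-replica adapter cache does conceptually: keep a bounded number of adapters loaded and evict the least recently used one when a new adapter id arrives. This is a plain-Python analog, not the PR's actual code; the class and function names (`MultiplexedAdapterCache`, `load_adapter`) are illustrative, and Ray Serve's real decorator handles this bookkeeping internally.

```python
import asyncio
from collections import OrderedDict

class MultiplexedAdapterCache:
    """Plain-Python analog of Ray Serve's @serve.multiplexed behavior:
    keep at most `max_models` adapters loaded per replica, evicting the
    least recently used entry when the limit is exceeded."""

    def __init__(self, load_fn, max_models: int = 3):
        self._load_fn = load_fn          # async loader: adapter_id -> adapter
        self._cache = OrderedDict()      # adapter_id -> loaded adapter (LRU order)
        self._max_models = max_models

    async def get(self, adapter_id: str):
        if adapter_id in self._cache:
            self._cache.move_to_end(adapter_id)   # mark as recently used
            return self._cache[adapter_id]
        adapter = await self._load_fn(adapter_id)  # load on first request
        self._cache[adapter_id] = adapter
        if len(self._cache) > self._max_models:
            self._cache.popitem(last=False)        # evict least recently used
        return adapter

async def demo():
    async def load_adapter(adapter_id):
        # Stand-in for loading LoRA weights onto the replica.
        return f"adapter:{adapter_id}"

    cache = MultiplexedAdapterCache(load_adapter, max_models=2)
    await cache.get("lora-a")
    await cache.get("lora-b")
    await cache.get("lora-c")   # exceeds max_models; "lora-a" is evicted
    return list(cache._cache)

print(asyncio.run(demo()))      # -> ['lora-b', 'lora-c']
```

In the real implementation, the decorated `get_multiplexed_adapter` method plays the role of `load_fn`, and Ray Serve routes each request to a replica that already has the requested adapter loaded whenever possible.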


Changelog
  • docs/source_en/Usage Guide/Server and Client/Tinker-Compatible-Client.md
    • Updated the name of the request header from X-Ray-Serve-Request-Id to serve_multiplexed_model_id in the English documentation.
  • docs/source_zh/使用指引/服务端和客户端/Tinker兼容客户端.md
    • Updated the name of the request header from X-Ray-Serve-Request-Id to serve_multiplexed_model_id in the Chinese documentation.
  • src/twinkle/server/tinker/model.py
    • Added a new get_multiplexed_adapter async method decorated with @serve.multiplexed.
    • Called get_multiplexed_adapter within the _create_adapter logic.
  • src/twinkle/server/tinker/server.py
    • Removed several Qwen/Qwen2.5 models from the default supported_models list.
    • Updated logging to use serve_multiplexed_model_id instead of x-ray-serve-request-id.
  • src/twinkle/server/twinkle/model.py
    • Added a new get_multiplexed_adapter async method decorated with @serve.multiplexed.
    • Changed add_adapter_to_model to an async function.
    • Called get_multiplexed_adapter within add_adapter_to_model.
  • src/twinkle/server/twinkle/processor.py
    • Added a new get_multiplexed_adapter async method decorated with @serve.multiplexed.
    • Changed create to an async function.
    • Called get_multiplexed_adapter within the create method.
  • src/twinkle/server/utils/validation.py
    • Updated the expected header name from X-Ray-Serve-Request-Id to serve_multiplexed_model_id for request validation.
    • Modified error messages to reflect the new header name.
  • src/twinkle_client/init.py
    • Updated the default header name for init_tinker_compat_client from X-Ray-Serve-Request-Id to serve_multiplexed_model_id.
  • src/twinkle_client/http/http_utils.py
    • Updated the request ID header in HTTP utility functions from X-Ray-Serve-Request-Id to serve_multiplexed_model_id.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request primarily focuses on integrating Ray Serve's model multiplexing feature by replacing the X-Ray-Serve-Request-Id header with serve_multiplexed_model_id across the server and client code. This includes updating documentation, client-side header creation, and server-side validation and logging. Additionally, it introduces get_multiplexed_adapter methods in several server components, decorated with @serve.multiplexed, and calls them at appropriate points (e.g., adapter creation) to register models with Ray Serve's multiplexer. The changes appear to be consistent and correctly implement the intended feature. I have one suggestion regarding the modification of the default supported models list.

@tastelikefeet tastelikefeet merged commit 5cba3a1 into main Feb 16, 2026
3 of 4 checks passed
@tastelikefeet tastelikefeet deleted the fix/routing-0215 branch February 16, 2026 04:21
