
[Serve][LLM] Replace SGLang format_messages_to_prompt with _build_chat_messages#61372

Merged
kouroshHakha merged 1 commit into ray-project:master from eicherseiji:sglang-lint
Feb 27, 2026

Conversation

@eicherseiji
Contributor

Fix

```text
[2026-02-26T22:03:30Z] use logger.warning(......................................................Passed
[2026-02-26T22:03:30Z] + for HOOK in "${HOOKS[@]}"
[2026-02-26T22:03:30Z] + pre-commit run ruff --all-files --show-diff-on-failure
[2026-02-26T22:03:30Z] [INFO] Installing environment for https://github.com/astral-sh/ruff-pre-commit.
[2026-02-26T22:03:30Z] [INFO] Once installed this environment will be reused.
[2026-02-26T22:03:30Z] [INFO] This may take a few minutes...
[2026-02-26T22:03:41Z] ruff.....................................................................Failed
[2026-02-26T22:03:41Z] - hook id: ruff
[2026-02-26T22:03:41Z] - exit code: 1
[2026-02-26T22:03:41Z]
[2026-02-26T22:03:41Z] python/ray/llm/examples/sglang/modules/sglang_engine.py:330:22: F821 Undefined name `format_messages_to_prompt`
[2026-02-26T22:03:41Z]     |
[2026-02-26T22:03:41Z] 328 |         else:
[2026-02-26T22:03:41Z] 329 |             # Chat embedding request - convert messages to prompt
[2026-02-26T22:03:41Z] 330 |             prompt = format_messages_to_prompt(request.messages)
[2026-02-26T22:03:41Z]     |                      ^^^^^^^^^^^^^^^^^^^^^^^^^ F821
[2026-02-26T22:03:41Z] 331 |
[2026-02-26T22:03:41Z] 332 |         # async_encode handles both single strings and lists of strings
[2026-02-26T22:03:41Z]     |
[2026-02-26T22:03:41Z]
[2026-02-26T22:03:41Z] Found 1 error.
[2026-02-26T22:03:41Z] All checks passed!
[2026-02-26T22:03:41Z]
[2026-02-26T22:03:42Z] ruff.....................................................................Passed
[2026-02-26T22:03:42Z] 🚨 Error: The command exited with status 1
```

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji requested a review from a team as a code owner February 26, 2026 23:12
@eicherseiji eicherseiji added the go (add ONLY when ready to merge, run all tests) label Feb 26, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes a crash in the SGLang example caused by an undefined function call. The fix correctly replaces the call with existing helper methods from the same class. My review includes one suggestion to adjust the prompt formatting for embedding requests to ensure the resulting embedding is not unintentionally skewed by a generation-specific token.

Comment on lines +330 to +332:

```python
            prompt = self._render_fallback_prompt(
                self._build_chat_messages(request.messages)
            )
```


Medium severity

The _render_fallback_prompt method appends assistant: to the prompt, which is suitable for generation tasks to prompt the model for a response. However, for embedding requests, the goal is typically to get a representation of the existing conversation history. Including this generation prompt might skew the resulting embedding vector.

To ensure the embedding accurately reflects only the provided conversation, consider creating the prompt from the chat messages directly, without appending the assistant: token.

Suggested change:

```diff
-            prompt = self._render_fallback_prompt(
-                self._build_chat_messages(request.messages)
-            )
+            chat_messages = self._build_chat_messages(request.messages)
+            prompt_lines = []
+            for message in chat_messages:
+                role = str(message.get("role", "user"))
+                content = message.get("content", "") or ""
+                prompt_lines.append(f"{role}: {content}")
+            prompt = "\n".join(prompt_lines)
```


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

```diff
-            prompt = format_messages_to_prompt(request.messages)
+            prompt = self._render_fallback_prompt(
+                self._build_chat_messages(request.messages)
+            )
```


Fallback prompt appends "assistant:" unsuitable for embeddings

Medium Severity

_render_fallback_prompt unconditionally appends "assistant:" at the end of the rendered prompt (line 153), which is designed for text generation to cue the model to produce the next assistant turn. For embedding requests, this trailing "assistant:" contaminates the text being encoded, producing embeddings that don't faithfully represent the actual conversation content. The embedding code path now always uses this generation-oriented renderer rather than a plain message concatenation.
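To illustrate the concern, here is a minimal sketch of the two rendering behaviors being contrasted. These stand-in functions only mirror the behavior described in this review; they are not the actual Ray Serve or SGLang implementations:

```python
def render_generation_prompt(messages: list[dict]) -> str:
    """Generation-oriented rendering (mirrors the described
    _render_fallback_prompt behavior): appends a trailing
    "assistant:" cue so the model produces the next turn."""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # the generation cue the bots flag
    return "\n".join(lines)


def render_embedding_prompt(messages: list[dict]) -> str:
    """Embedding-oriented rendering: encodes only the conversation
    itself, with no trailing generation cue."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


messages = [{"role": "user", "content": "What is Ray Serve?"}]
print(render_generation_prompt(messages))
# user: What is Ray Serve?
# assistant:
print(render_embedding_prompt(messages))
# user: What is Ray Serve?
```

The trailing `assistant:` line is exactly the text that would be encoded into the embedding on the generation-oriented path, which is why both reviews suggest the plain concatenation for embedding requests.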

Additional Locations (1)


@kouroshHakha kouroshHakha changed the title [LLM] Replace SGLang format_messages_to_prompt with _build_chat_messages [Serve][LLM] Replace SGLang format_messages_to_prompt with _build_chat_messages Feb 27, 2026
@kouroshHakha kouroshHakha merged commit 3006839 into ray-project:master Feb 27, 2026
6 checks passed

Labels

go: add ONLY when ready to merge, run all tests

3 participants