[Serve][LLM] Replace SGLang format_messages_to_prompt with _build_chat_messages#61372
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Code Review
This pull request fixes a crash in the SGLang example caused by an undefined function call. The fix correctly replaces the call with existing helper methods from the same class. My review includes one suggestion to adjust the prompt formatting for embedding requests to ensure the resulting embedding is not unintentionally skewed by a generation-specific token.
```python
prompt = self._render_fallback_prompt(
    self._build_chat_messages(request.messages)
)
```
The `_render_fallback_prompt` method appends `assistant:` to the prompt, which is suitable for generation tasks because it cues the model to produce a response. For embedding requests, however, the goal is to get a representation of the existing conversation history, and including this generation cue may skew the resulting embedding vector.
To ensure the embedding reflects only the provided conversation, consider building the prompt from the chat messages directly, without appending the `assistant:` token.
```diff
-prompt = self._render_fallback_prompt(
-    self._build_chat_messages(request.messages)
-)
+chat_messages = self._build_chat_messages(request.messages)
+prompt_lines = []
+for message in chat_messages:
+    role = str(message.get("role", "user"))
+    content = message.get("content", "") or ""
+    prompt_lines.append(f"{role}: {content}")
+prompt = "\n".join(prompt_lines)
```
```diff
-prompt = format_messages_to_prompt(request.messages)
+prompt = self._render_fallback_prompt(
+    self._build_chat_messages(request.messages)
+)
```
Fallback prompt appends `assistant:`, which is unsuitable for embeddings
Medium Severity
`_render_fallback_prompt` unconditionally appends `assistant:` at the end of the rendered prompt (line 153). This is designed for text generation, where it cues the model to produce the next assistant turn. For embedding requests, the trailing `assistant:` contaminates the text being encoded, producing embeddings that don't faithfully represent the actual conversation content. The embedding code path now always uses this generation-oriented renderer rather than a plain message concatenation.
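To illustrate the distinction both reviewers raise, here is a minimal sketch of the two rendering paths. The function names and message shape are assumptions based on the helpers mentioned in the review, not the actual implementation:

```python
# Hypothetical sketch: generation-style vs. embedding-safe rendering.
# Names are illustrative; the real helpers live on the SGLang engine class.

def render_fallback_prompt(messages):
    """Generation-style: trails with 'assistant:' to cue the next turn."""
    lines = [
        f"{m.get('role', 'user')}: {m.get('content', '') or ''}" for m in messages
    ]
    lines.append("assistant:")  # generation cue; skews an embedding of the text
    return "\n".join(lines)

def render_embedding_prompt(messages):
    """Embedding-safe: encode only the conversation content, no trailing cue."""
    return "\n".join(
        f"{m.get('role', 'user')}: {m.get('content', '') or ''}" for m in messages
    )

msgs = [{"role": "user", "content": "hello"}]
print(render_fallback_prompt(msgs))   # "user: hello\nassistant:"
print(render_embedding_prompt(msgs))  # "user: hello"
```

The only difference is the trailing `assistant:` line, which is exactly what the suggested change strips from the embedding path.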