LCORE-177: Implemented streaming_query endpoint by jrobertboos · Pull Request #126 · lightspeed-core/lightspeed-stack

jrobertboos · 2025-06-24T16:01:39Z

Description

Added the streaming_query endpoint along with unit tests. The streaming_query endpoint is closely based on the query endpoint. The unit tests for streaming_query were generated using vibe coding.

Type of change

Related Tickets & Documents

Related Issue LCORE-177
Closes LCORE-177

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

* Added streaming_query endpoint so that it conforms with road-core streaming_query endpoint.

* Added unit testing for streaming_query endpoint, it is very closely based off of test_query.
* test_streaming_query was generated using vibe coding.

manstis · 2025-06-24T16:05:57Z

@TamiTakamiya Looks like @jrobertboos has beaten you to it... but we need to be sure this covers our requirement. Could you please take a look.

tisnik

Please re-use the existing functions from query.py (or we can move these functions to a new module if that makes more sense). The approach you have chosen seems to be correct. Thank you!

tisnik · 2025-06-24T16:10:54Z

Also, to nitpick: the coverage report found lines that are not covered:

src/app/endpoints/streaming_query.py     121      2    98%   131-141

(error case handling)

* Still not passing `black` test. That's because streaming_query is based off of query so they have similar code.

TamiTakamiya · 2025-06-24T19:24:02Z

@tisnik @jrobertboos @manstis Though I am aware that the same code is found in the non-streaming /query endpoint, I see that an agent instance and a session are created in retrieve_response(), which is called on every invocation of the /streaming_query endpoint. I was thinking that an agent session would replace the transactions that were implemented in road-core/service. Even if we want to keep conversation IDs separately, I think we had better use the same agent/session for one specific conversation ID. What do you think?

manstis · 2025-06-24T19:41:40Z

Totally agree @TamiTakamiya

I've already raised a GitHub issue about not creating the client for every incoming request and, IMO, streaming should definitely use the same llama-stack session.

I've not looked at lightspeed-stack session/conversation management. Hopefully they're not reinventing something supported by llama-stack.
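The reuse suggested above could be sketched as a small cache keyed by conversation ID. This is only an illustration under stated assumptions: `SessionCache`, `get_or_create`, and the factory callable are hypothetical names and are not part of lightspeed-stack or llama-stack.

```python
from typing import Any, Callable, Dict, Tuple


class SessionCache:
    """Hypothetical cache mapping a conversation ID to an (agent, session_id) pair."""

    def __init__(self, factory: Callable[[], Tuple[Any, str]]) -> None:
        # The factory builds a new agent and session; assumed to be costly.
        self._factory = factory
        self._entries: Dict[str, Tuple[Any, str]] = {}

    def get_or_create(self, conversation_id: str) -> Tuple[Any, str]:
        # Create the agent/session only the first time a conversation is seen.
        if conversation_id not in self._entries:
            self._entries[conversation_id] = self._factory()
        return self._entries[conversation_id]


# Stand-in factory that records how often an agent/session is created.
created = []


def fake_factory() -> Tuple[object, str]:
    created.append(1)
    return object(), f"session-{len(created)}"


cache = SessionCache(fake_factory)
first = cache.get_or_create("conv-1")
again = cache.get_or_create("conv-1")   # reuses the pair created above
other = cache.get_or_create("conv-2")   # new conversation, new pair
```

A real implementation would also need eviction and concurrency handling, which this sketch leaves out.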

TamiTakamiya · 2025-06-24T19:47:17Z

+    if query_request.attachments:
+        validate_attachments_metadata(query_request.attachments)
+
+    agent = Agent(


Shouldn't this be AsyncAgent instead of Agent?

TamiTakamiya · 2025-06-24T19:57:37Z

+        input_shields=available_shields if available_shields else [],
+        tools=[],
+    )
+    session_id = agent.create_session("chat_session")


If Agent is replaced with AsyncAgent, methods like create_session, create_turn, etc. are called asynchronously, and we probably need to add await to each call.

@jrobertboos I attempted to replace Agent with AsyncAgent on the top of this PR. Would you take a look at this commit?

@jrobertboos I have added a small change to support AsyncLlamaStackAsLibraryClient.
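A minimal sketch of the await pattern discussed above. `StubAsyncAgent` is a hypothetical stand-in written for this example; it is not the llama-stack `AsyncAgent` class, and its method signatures are assumptions chosen only to mirror the names used in this thread.

```python
import asyncio
from typing import AsyncIterator, List


class StubAsyncAgent:
    """Hypothetical stand-in for an async agent (NOT the llama-stack class)."""

    async def create_session(self, name: str) -> str:
        await asyncio.sleep(0)  # simulate an awaitable network call
        return f"{name}-id"

    async def create_turn(self, session_id: str, message: str) -> AsyncIterator[str]:
        # Returns an async iterator of chunks once awaited.
        async def chunks() -> AsyncIterator[str]:
            for word in message.split():
                yield word

        return chunks()


async def run() -> List[str]:
    agent = StubAsyncAgent()
    # Without `await`, these calls would return coroutine objects, not results.
    session_id = await agent.create_session("chat_session")
    stream = await agent.create_turn(session_id, "hello streaming world")
    return [session_id] + [chunk async for chunk in stream]


result = asyncio.run(run())
```

The point of the sketch is that every call on the async variant has to be awaited, and the streamed turn is consumed with `async for` rather than a plain `for`.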

TamiTakamiya · 2025-06-24T20:21:12Z

+    conversation_id = retrieve_conversation_id(query_request)
+    response = retrieve_response(client, model_id, query_request)
+
+    def response_generator(turn_response: Any) -> Iterator[str]:


The RAG agent sample in the llama-stack documentation uses AgentEventLogger, which is actually EventLogger, which in turn uses TurnStreamEventPrinter. TurnStreamEventPrinter can process different event types, including errors. I think we had better reuse the code found in TurnStreamEventPrinter to implement this response_generator method so that we can print the various outputs from a llama-stack agent.

I chose to make the response_generator in the way that it was implemented in the llama-stack playground UI component. I haven't used the TurnStreamEventPrinter but I will experiment with it tomorrow :).

Please look at the video recording attached to ansible/ansible-ai-connect-service#1655
Since it may take longer to process a user query that involves tool calls, we want to show some intermediate output from the LLM's "thinking" process before showing the final output.

manstis · 2025-06-24T20:36:16Z

@TamiTakamiya It is clear to me that you have in-depth knowledge about streaming support in llama-stack.

Perhaps this PR, that I believe was put together rather hastily, should be closed and your PR updated instead?

I worry that this PR may also lack support for "Referenced documents".

umago

Comments inline

umago · 2025-06-25T08:59:43Z

+@router.post("/streaming_query")
+async def streaming_query_endpoint_handler(
+    _request: Request,
+    query_request: QueryRequest,


One parameter that was left off when implementing /query was "media_type" [0]. I think we should create a StreamQueryRequest that inherits from QueryRequest and add that parameter.

By default, /streaming_query returns "text/plain" but can also output JSON when media_type is set to "application/json".

[0] https://github.com/road-core/service/blob/b1ee2f4d787c53fe304b82bcc7a11bfad035e025/ols/app/models/models.py#L73

There's a TODO for this already

lightspeed-stack/src/models/requests.py

Line 58 in a26c46b

# TODO(lucasagomes): add media_type when needed, current implementation

I know haha I wrote it
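The suggested request model could look roughly like the sketch below. It uses standard-library dataclasses instead of the project's actual models so that it stays self-contained; the fields shown on `QueryRequest` are simplified assumptions, and the default of "text/plain" is assumed from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional

MEDIA_TYPE_TEXT = "text/plain"
MEDIA_TYPE_JSON = "application/json"


@dataclass
class QueryRequest:
    """Simplified stand-in for the existing request model (assumption)."""

    query: str
    conversation_id: Optional[str] = None


@dataclass
class StreamQueryRequest(QueryRequest):
    """Adds media_type on top of the inherited query fields."""

    media_type: str = MEDIA_TYPE_TEXT

    def __post_init__(self) -> None:
        # Reject anything other than the two media types mentioned above.
        if self.media_type not in (MEDIA_TYPE_TEXT, MEDIA_TYPE_JSON):
            raise ValueError(f"unsupported media_type: {self.media_type}")


default_request = StreamQueryRequest(query="hi")
json_request = StreamQueryRequest(query="hi", media_type=MEDIA_TYPE_JSON)
```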

umago · 2025-06-25T09:01:51Z

+)
+
+logger = logging.getLogger("app.endpoints.handlers")
+router = APIRouter(tags=["query"])


s/query/streaming_query

What do you mean by this ^?

https://phoenixnap.com/kb/sed-replace

Replace query with streaming_query.

Yes, replace the tags=["query"] with tags=["streaming_query"]

The tags should match the endpoint

umago · 2025-06-25T09:02:11Z

+        {
+            "event": "end",
+            "data": {
+                "referenced_documents": None,  # TODO(jboos): implement referenced documents


by default should be an empty list
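A small sketch of an end event that falls back to an empty list. `stream_end_event` is a hypothetical helper name, and the payload carries only the two keys visible in the diff above; any other fields in the real event are left out.

```python
import json
from typing import List, Optional


def stream_end_event(referenced_documents: Optional[List[dict]] = None) -> str:
    """Format the final event; missing documents become [] rather than None."""
    payload = {
        "event": "end",
        "data": {"referenced_documents": referenced_documents or []},
    }
    return f"data: {json.dumps(payload)}\n\n"


end_event = stream_end_event()
end_payload = json.loads(end_event[len("data: "):])
```

With an empty list, clients can iterate over `referenced_documents` without a null check.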

* Switched to EventLogger in order to capture more output.

TamiTakamiya · 2025-06-25T20:47:06Z

@manstis

> Perhaps this PR, that I believe was put together rather hastily, should be closed and your PR updated instead?
>
> I worry that this PR may also lack support for "Referenced documents".

My PR is outdated and requires some rework. Since @jrobertboos is actively working on this PR, I will wait until it is merged and add referenced documents support on top of it.

For reusing LlamaStackClient, I think we can have a separate PR, but I hope we can include the AsyncAgent support, which also needs to use AsyncLlamaStackClient, in this PR.

TamiTakamiya · 2025-06-27T16:13:25Z

@jrobertboos As the Ansible Lightspeed chatbot will use streaming mode only, this streaming API support is critical. Could you give us an ETA for this?

jrobertboos · 2025-06-27T19:50:15Z

@TamiTakamiya Sorry, I've been really under the weather these last 2 days, so I haven't been able to get a lot done. I'm planning on finishing this up over the weekend and will hopefully have all the requested changes implemented by Monday or Tuesday. Sorry for the delay.

* Removed use of EventLogger due to new use of AsyncAgent

* Added tool execution events to output stream.
* Separated response generator out even more.

* I need to find a way to automate this!!!

jrobertboos · 2025-06-30T16:46:32Z

I'm pretty sure this PR is complete from a content standpoint. I'm not sure what I should do about the 2 checks that are failing.

For the linter, should I abstract the code that is marked as duplicate? (I don't know if that would make sense, but I can if that is the standard.)
For Pyright, is there any way I can ignore the error, since it is a llama-stack issue?

P.S. Sorry for the string of commits 😞

TamiTakamiya · 2025-06-30T17:43:39Z

+                complete_response += json.loads(event.replace("data: ", ""))["data"][
+                    "token"
+                ]
+                yield event


I think we need to increment chunk_id by inserting chunk_id += 1 or similar code here.

Yes, I missed that, thanks! I'll wait to change that until you are done reviewing :).
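The chunk counter fix could be sketched as follows. `format_token_event`, `parse_event`, and `stream_tokens` are hypothetical helpers, and the `id` field name in the token payload is an assumption; only the `data: ` prefix and the `["data"]["token"]` lookup come from the diff above.

```python
import json
from typing import Iterable, Iterator, List


def format_token_event(chunk_id: int, token: str) -> str:
    """Build one stream event; the 'id' field name is an assumption."""
    payload = {"event": "token", "data": {"id": chunk_id, "token": token}}
    return f"data: {json.dumps(payload)}\n\n"


def parse_event(event: str) -> dict:
    # Strip only the leading prefix so tokens containing "data: " stay intact.
    return json.loads(event[len("data: "):])


def stream_tokens(tokens: Iterable[str], collected: List[str]) -> Iterator[str]:
    chunk_id = 0
    for token in tokens:
        event = format_token_event(chunk_id, token)
        collected.append(parse_event(event)["data"]["token"])
        yield event
        chunk_id += 1  # the increment that was missing


collected: List[str] = []
events = list(stream_tokens(["Hel", "lo"], collected))
ids = [parse_event(e)["data"]["id"] for e in events]
```

Slicing off the prefix instead of calling `replace` is a deliberate choice here: `replace` would also alter any token text that happens to contain the prefix.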

TamiTakamiya

I want the issues found by Pyright and the Python linter to be fixed, and I also added one minor comment on the stream chunk id. Other than these points, the changes look good. Thanks!

tisnik

nice work

jrobertboos added 2 commits June 24, 2025 11:19

Inital streaming_query commit

8282533

* Added streaming_query endpoint so that it conforms with road-core streaming_query endpoint.

Added Unit Tests for streaming_query

7ad6cde

* Added unit testing for streaming_query endpoint, it is very closely based off of test_query.
* test_streaming_query was generated using vibe coding.

manstis requested a review from TamiTakamiya June 24, 2025 16:05

tisnik requested changes Jun 24, 2025


jrobertboos added 2 commits June 24, 2025 14:53

Fixed checks

732f794

* Still not passing `black` test. That's because streaming_query is based off of query so they have similar code.

Fixed mypy type errors

0de8874

TamiTakamiya reviewed Jun 24, 2025


umago reviewed Jun 25, 2025


Switched to using EventLogger

1058290

* Switched to EventLogger in order to capture more output.

jrobertboos added 4 commits June 29, 2025 13:45

Added AsyncAgent

c24f491

* Removed use of EventLogger due to new use of AsyncAgent

Added Tool Execution to Stream

551c176

* Added tool execution events to output stream.
* Separated response generator out even more.

Fixed PyDocs

09302a0

Formatted Code

5ab40dc

* I need to find a way to automate this!!!

jrobertboos requested review from TamiTakamiya and tisnik June 30, 2025 16:52

Changed APIRouter tag

f25b6ec

TamiTakamiya reviewed Jun 30, 2025


Fixed Pyright

ef71993

tisnik approved these changes Jul 1, 2025


tisnik merged commit e01a1d6 into lightspeed-core:main Jul 1, 2025
15 of 16 checks passed

TamiTakamiya mentioned this pull request Jul 3, 2025

[WIP] Streaming Chat with Referenced Documents PoC #40

Closed

14 tasks
