[WIP] Streaming Chat with Referenced Documents PoC by TamiTakamiya · Pull Request #40 · lightspeed-core/lightspeed-stack

TamiTakamiya · 2025-05-20T01:30:25Z

Description

Streaming Chat with Referenced Documents PoC. Requires llama-stack 0.2.7.

Screenshare.-.2025-05-19.9_24_25.PM.mp4

Type of change

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

manstis · 2025-05-20T08:12:39Z

@TamiTakamiya From what I understand this effectively adds post processing of the response turns to add formatting and most importantly extract the "Referenced Documents" from tool_call="query_from_memory"? Since, as we discussed yesterday, there will be no runtime "lightspeed-stack" component and "lightspeed-stack" will in essence be an "image builder"/configuration tool the code you propose in this PR should probably be elsewhere.

I am wondering whether it's possible to move all of this to a custom agent overriding the run method (or another, but this seems a plausible candidate?) to handle the post processing?

We'd then configure the llama-stack to use our custom external provider:

providers: 
  agent: 
    provider_type: inline::our-super-agent
...

At best this code would have to live in ansible-ai-connect-service but we should favour trying to write our own agent.

WDYT?

manstis · 2025-05-20T08:18:04Z

retrieve_response would be in ansible-ai-connect-service as it's using LlamaStackClient to invoke llama-stack etc.

response_processing_wrapper would live in our custom Agent but instead of post-processing LlamaStackClient objects it would post-process the llama-stack server equivalents.

TamiTakamiya · 2025-05-20T12:13:51Z

@manstis Yes, it's a PoC that is intended to show the possibility to implement the road-core/service /streaming-chat endpoint using the llama-stack builtin::rag/knowledge_search. Initially I thought opening a PR for ansible-ai-connect-service, but after I saw the /query endpint in lightspeed-stack, I decided to creating a PR here. This can be anywhere in between llama-stack and the chatbot client.

It is true that the logic is a bit redundant because the post-processing code parses the streaming output using a regex, stores the results, and then the results are sent at the end of streaming. If we write our own agent or tool, implementations could be simplified.

The _yield_printable_events method of the RAGTurnStreamEventPrinterclass was copied from theTurnStreamEventPrinter` class and contains lots of duplicated lines. I want to reduce the amount of duplicated lines.

manstis · 2025-05-22T22:22:17Z

@TamiTakamiya Was you wanting to keep this open given your PR for ansible-ai-connect-service? Perhaps this is now better refactoring to a README or something?

umago

Thanks Tami for this change! I would love to see a streaming endpoint here. But, I do think that this change needs some refactor and make both /query and /streaming_query compatible. I see a lot of potential in sharing code between these two endpoints I think we should do it otherwise we risk making them incompatible in the future.

umago · 2025-05-28T08:33:18Z



+class LLMRequest(BaseModel):
+    query: str


Let's keep models in the models/ directory.

Also, there's now a QueryRequest model that I think we should use instead of introducing a new one:

lightspeed-stack/src/models/requests.py

Line 55 in 0405417

class QueryRequest(BaseModel):

umago · 2025-05-28T08:35:22Z

+            request,
+            response,
+        )
+    )


Can we use the QueryResponse for this ? I don't see why introducing a new StreamingResponse model.

umago · 2025-05-28T08:35:49Z

+
+    # select the first LLM
+    llm = next(m for m in models if m.model_type == "llm")
+    model_id = llm.identifier


Let's keep compatibility with the /query endpoint, which supports passing a model/provider in the request

umago · 2025-05-28T08:36:16Z

+When a tool is required to answer the user's query, respond only with <|tool_call|> 
+followed by a JSON list of tools used. If a tool does not exist in the provided 
+list of tools, notify the user that you do not have the ability to fulfill the request.
+""",


Ditto, keeping compatibility with the /query endpoint which supports passing a system_prompt in the request

umago · 2025-05-28T08:37:14Z

+        session_id=session_id,
+    )
+    return response
+    # return str(response.output_message.content)


A lot of this code is duplicated with /query. I think we can refactor these two endpoints and have a common base of code. Otherwise we will endpoint creating a lot of inconsistencies between the two.

TamiTakamiya · 2025-07-03T19:17:15Z

@umago @manstis /streaming_query endpoint was implemented with #126. I will close this and open a new PR only for enabling referenced documents support.

Streaming Chat with Referenced Documents PoC

193525f

TamiTakamiya added 2 commits May 20, 2025 08:17

Fix black/ruff formatting errors

b667c3a

Fix black formatting errors

41f8d31

umago suggested changes May 28, 2025

View reviewed changes

This was referenced Jun 24, 2025

AAP-48293: llama-stack: Move to lsc/lightspeed-stack container image ansible/ansible-chatbot-stack#46

Merged

LCORE-177: Implemented streaming_query endpoint #126

Merged

TamiTakamiya closed this Jul 3, 2025

tisnik deleted the TamiTakamiya/AAP-45771/referenced-docs-poc branch July 31, 2025 12:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Streaming Chat with Referenced Documents PoC#40

[WIP] Streaming Chat with Referenced Documents PoC#40
TamiTakamiya wants to merge 3 commits into
mainfrom
TamiTakamiya/AAP-45771/referenced-docs-poc

TamiTakamiya commented May 20, 2025

Uh oh!

manstis commented May 20, 2025

Uh oh!

manstis commented May 20, 2025

Uh oh!

TamiTakamiya commented May 20, 2025

Uh oh!

manstis commented May 22, 2025

Uh oh!

umago left a comment

Uh oh!

umago May 28, 2025

Uh oh!

umago May 28, 2025

Uh oh!

umago May 28, 2025

Uh oh!

umago May 28, 2025

Uh oh!

umago May 28, 2025

Uh oh!

TamiTakamiya commented Jul 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

TamiTakamiya commented May 20, 2025

Description

Type of change

Related Tickets & Documents

Checklist before requesting a review

Testing

Uh oh!

manstis commented May 20, 2025

Uh oh!

manstis commented May 20, 2025

Uh oh!

TamiTakamiya commented May 20, 2025

Uh oh!

manstis commented May 22, 2025

Uh oh!

umago left a comment

Choose a reason for hiding this comment

Uh oh!

umago May 28, 2025

Choose a reason for hiding this comment

Uh oh!

umago May 28, 2025

Choose a reason for hiding this comment

Uh oh!

umago May 28, 2025

Choose a reason for hiding this comment

Uh oh!

umago May 28, 2025

Choose a reason for hiding this comment

Uh oh!

umago May 28, 2025

Choose a reason for hiding this comment

Uh oh!

TamiTakamiya commented Jul 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants