Skip to content

LCORE-177: Implemented streaming_query endpoint#126

Merged
tisnik merged 11 commits into
lightspeed-core:mainfrom
jrobertboos:main
Jul 1, 2025
Merged

LCORE-177: Implemented streaming_query endpoint#126
tisnik merged 11 commits into
lightspeed-core:mainfrom
jrobertboos:main

Conversation

@jrobertboos
Copy link
Copy Markdown
Contributor

Description

Added streaming_query endpoint as well as unit tests. The streaming_query endpoint is closely based off of the query endpoint. The unit tests for streaming_query were generated using vibe coding.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue LCORE-177
  • Closes LCORE-177

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

* Added streaming_query endpoint so that it conforms with road-core
  streaming_query endpoint.
* Added unit testing for streaming_query endpoint, it is very closely
  based off of test_query.
* test_streaming_query was generated using vibe coding.
@manstis manstis requested a review from TamiTakamiya June 24, 2025 16:05
@manstis
Copy link
Copy Markdown
Contributor

manstis commented Jun 24, 2025

@TamiTakamiya Looks like @jrobertboos has beaten you to it... but we need to be sure this covers our requirement. Could you please take a look.

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please re-use existing functions from query.py (or we can move these function to new module if it make more sense). The approach you choosen seems to be correct. Thank you!

@tisnik
Copy link
Copy Markdown
Contributor

tisnik commented Jun 24, 2025

Also to nitpick something - core coverage report found not covered lines:

src/app/endpoints/streaming_query.py     121      2    98%   131-141

(error case handling)

* Still not passing `black` test. Thats because streaming_query is based
  off of query so they have similar code.
@TamiTakamiya
Copy link
Copy Markdown
Contributor

@tisnik @jrobertboos @manstis Though I am aware that same codes are found in the non-streaming /query endpoint, I see an agent instance and a session are created in retrieve_response(), which is called at every invocation of /streaming_query endpoint. I was thinking that an agent session would replace the transactions, which were implemented in road-core/service. Even if we want to have conversation IDs separately, I think we'd better use the same agent/session for the one specific conversation ID. What do you think?

@manstis
Copy link
Copy Markdown
Contributor

manstis commented Jun 24, 2025

Totally agree @TamiTakamiya

I've already raised a GitHub Issue about not creating the client for everything incoming request and, IMO, streaming should definitely use the same llama-stack session.

I've not looked at lightspeed-stack session/conversation management. Hopefully they're not reinventing something supported by llama-stack.

Comment thread src/app/endpoints/streaming_query.py Outdated
if query_request.attachments:
validate_attachments_metadata(query_request.attachments)

agent = Agent(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be AsyncAgent instead of Agent?

Comment thread src/app/endpoints/streaming_query.py Outdated
input_shields=available_shields if available_shields else [],
tools=[],
)
session_id = agent.create_session("chat_session")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Agent is replaced with AsyncAgent, class methods like create_session, create_turn, etc. are called asynchronously and probably we need to be add await to each call.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrobertboos I attempted to replace Agent with AsyncAgent on the top of this PR. Would you take a look at this commit?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrobertboos I have added a small change to support AsyncLlamaStackAsLibraryClient.

Comment thread src/app/endpoints/streaming_query.py Outdated
conversation_id = retrieve_conversation_id(query_request)
response = retrieve_response(client, model_id, query_request)

def response_generator(turn_response: Any) -> Iterator[str]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RAG agent sample in the llama-stack documentation uses AgentEventLogger, which is acturally EventLogger, which uses TurnStreamEventPrinter. TurnStreamEventPrinter can process different event types including errors. I think we'd better reuse the code found in TurnStreamEventPrinter to implement this response_generator method so that we can print various ouputs from a llama-stack agent.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to make the response_generator in the way that it was implemented in the llama-stack playground UI component. I haven't used the TurnStreamEventPrinter but I will experiment with it tomorrow :).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at the video recording attached to ansible/ansible-ai-connect-service#1655
Since it may take a longer time to process a user query that involes tool callings, we want to show some intermediate outputs from LLM's "thinking" process" before showing the final output.

@manstis
Copy link
Copy Markdown
Contributor

manstis commented Jun 24, 2025

@TamiTakamiya It is clear to me that you have in-depth knowledge about streaming support in llama-stack.

Perhaps this PR, that I believe was put together rather hastily, should be closed and your PR updated instead?

I worry that this PR may also lack support for "Referenced documents".

Copy link
Copy Markdown
Contributor

@umago umago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments inline

@router.post("/streaming_query")
async def streaming_query_endpoint_handler(
_request: Request,
query_request: QueryRequest,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One parameter that was left off when implementing /query was "media_type" [0]. I think we should create a StreamQueryRequest that inherits from QueryRequest and add that parameter.

By default the /streaming_query returns "plain/text" but can also output JSON when media_type is set to "application/json".

[0] https://github.com/road-core/service/blob/b1ee2f4d787c53fe304b82bcc7a11bfad035e025/ols/app/models/models.py#L73

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a TODO for this already

# TODO(lucasagomes): add media_type when needed, current implementation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know haha I wrote it

Comment thread src/app/endpoints/streaming_query.py Outdated
)

logger = logging.getLogger("app.endpoints.handlers")
router = APIRouter(tags=["query"])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/query/streaming_query

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this ^?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://phoenixnap.com/kb/sed-replace

Replace query with streaming_query.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, replace the tags=["query"] with tags=["streaming_query"]

The tags should match the endpoint

Comment thread src/app/endpoints/streaming_query.py Outdated
{
"event": "end",
"data": {
"referenced_documents": None, # TODO(jboos): implement referenced documents
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default should be an empty list

* Switched to EventLogger in order to capture more output.
@TamiTakamiya
Copy link
Copy Markdown
Contributor

@manstis

Perhaps this PR, that I believe was put together rather hastily, should be closed and your PR updated instead?

I worry that this PR may also lack support for "Referenced documents".

My PR is outdated and requires some rework. Since @jrobertboos is actively working on this PR, I will wait until this is merged and add referenced documents support on the top of it.

For reusing LlamaStackClient, I think we can have a separate PR, but I wish we can include the AsyncAgent support, which also need to use AsyncLllamaStackClient, in this PR.

@TamiTakamiya
Copy link
Copy Markdown
Contributor

@jrobertboos As Ansible Lightspeed chatbot will use streaming mode only, this streaming API support is critical. Could you give us the ETA for this?

@jrobertboos
Copy link
Copy Markdown
Contributor Author

@TamiTakamiya Sorry, Ive been really under the weather these last 2 days so I haven't been able to get a lot done, Im planning on finishing this up over the weekend and will hopefully have all the requested changes implemented by Monday or Tuesday. Sorry for the delay.

* Removed use of EventLogger due to new use of AsyncAgent
* Added tool execution events to output stream.
* Separated response generator out event more.
* I need to find a way to automate this!!!
@jrobertboos
Copy link
Copy Markdown
Contributor Author

jrobertboos commented Jun 30, 2025

Im pretty sure this PR is complete from a content standpoint. I'm not sure what I should do about the 2 checks that are failing.

  • For the linter should I abstract the code that is marked as duplicate? (IDK if that would make sense but I can if that is the standard)
  • For the Pyright is there anyway I can ignore the error just because that is a llama-stack issue?

P.S Sorry for the string of commits 😞

complete_response += json.loads(event.replace("data: ", ""))["data"][
"token"
]
yield event
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to increment chunk_id by inserting chunk_id += 1 or similar code here.

Copy link
Copy Markdown
Contributor Author

@jrobertboos jrobertboos Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I missed that, thanks! Ill wait to change that until you are done reviewing :).

Copy link
Copy Markdown
Contributor

@TamiTakamiya TamiTakamiya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want issues found by Pyright and Python linter to be fixed and I also add one minor comment on stream chunk id. Other than these points, changes looked good. Thanks!

Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work

@tisnik tisnik merged commit e01a1d6 into lightspeed-core:main Jul 1, 2025
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants