
Conversation

@rawagner rawagner commented Jul 1, 2025

Description

Before this change, a new session was created for every query, which made the assistant hardly usable.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

@manstis manstis left a comment

I had a quick look out of curiosity really.

I think how a conversation_id is created needs some thought as it's really the llama-stack session_id now.

@rawagner rawagner force-pushed the fix_session branch 3 times, most recently from 25b9c40 to 215f914 on July 2, 2025 06:09

eranco74 commented Jul 2, 2025

@onmete how is this done in OLS?

tisnik commented Jul 2, 2025

@onmete how is this done in OLS?

@eranco74:

there is no Llama Stack involved in OLS.
So just conversation_id is needed:

  • when not provided in query endpoint, it is created
  • when provided, it is used as is

Nothing super special there
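
Roughly (a sketch, not OLS's actual code; the query_request name is illustrative):

import uuid

# If the client supplied a conversation_id, keep it; otherwise mint one.
conversation_id = query_request.conversation_id or str(uuid.uuid4())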

rawagner commented Jul 2, 2025

Thanks for the quick review.

I've looked into this a bit further and realized that a new Agent is created on every request, which results in empty sessions. I've added code to cache the Agents so that we can reuse the instances in subsequent requests.
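
A minimal sketch of the caching idea (assuming a module-level dict and a hypothetical get_or_create_agent helper; the actual PR may structure this differently):

from llama_stack_client.lib.agents.agent import Agent

_agents: dict[str, Agent] = {}  # conversation_id -> cached Agent

def get_or_create_agent(conversation_id: str, create_agent) -> Agent:
    # Reuse the cached Agent so follow-up requests continue the same
    # llama-stack session instead of opening a new, empty one each time.
    if conversation_id not in _agents:
        _agents[conversation_id] = create_agent()
    return _agents[conversation_id]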

I don't think this is the solution you'd want to go with, and I'm not familiar with the codebase, but at least it helps to demonstrate the problem.

manstis commented Jul 2, 2025

@rawagner I think this is a step in the right direction.

Personally, if a conversation_id was not present in the QueryRequest, I'd first create an Agent and session, returning the session_id as the conversation_id. They are one and the same thing, just with different terms: conversation_id, IIRC, is a road-core term carried into lightspeed-core, and session_id is a llama-stack term.

You are correct that we need to re-use the Agent for a conversation.

Looking at the llama-stack source code for an Agent, the persistent store (for the conversation) is created when the Agent is instantiated. If we create a new Agent for each QueryRequest, a new agent_id is assigned and a new persistent store is created, meaning we effectively lose any conversational history.
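
A hedged sketch of that flow, using the llama-stack-client Agent API (the start_conversation helper is illustrative, not the PR's code):

from llama_stack_client.lib.agents.agent import Agent

def start_conversation(client, model_id: str, system_prompt: str):
    # Instantiating the Agent creates its persistent store; create_session()
    # returns the llama-stack session_id, which can be handed back to the
    # caller as the conversation_id - they name the same thing.
    agent = Agent(client, model=model_id, instructions=system_prompt)
    session_id = agent.create_session("chat_session")
    return agent, session_id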

Similar changes will be needed for https://github.com/lightspeed-core/lightspeed-stack/blob/main/src/app/endpoints/streaming_query.py

I suspect this PR also addresses #122

Finally, and conscious of my participation on lightspeed-core, I'm just a contributor like yourself. I am not part of the lightspeed-core team and therefore review/approval/rejection of this PR is really for the likes of @tisnik to decide.

I do however believe that this PR is important, addressing a serious flaw in the existing code.

@tisnik tisnik left a comment

I'm perfectly ok with an update like this, but we'd need some integration and/or end-to-end tests for it, to be able to simulate multiple requests from different clients etc. Gimme some time please.

@rawagner rawagner force-pushed the fix_session branch 2 times, most recently from 177c221 to 8100bd1 on July 2, 2025 12:29

@manstis manstis left a comment

LGTM 👍

Thank you, @rawagner. I think this is a great improvement on your original PR.

This will however need applying to streaming_query.py too.

@tisnik tisnik left a comment

Looks correct overall, just have some nitpicks. Pls update, then we'll merge. TYVM

tisnik commented Jul 3, 2025

@rawagner it looks nice now! You'd need to rebase/resolve conflicts - the same code was changed in the meantime. But it should be easy. Thanks a lot in advance!

manstis commented Jul 3, 2025

it looks nice now! You'd need to rebase/resolve conflicts - the same code was changed in the meantime. But it should be easy. Thanks a lot in advance!

@rawagner @tisnik

And we need the same changes made to the streaming_query.py endpoint handler 👍

rawagner commented Jul 3, 2025

Looking into it. Thanks for the reviews!

rawagner commented Jul 3, 2025

For now, I've just quick-fixed the streaming_query. I'm looking for a proper solution.

@rawagner rawagner force-pushed the fix_session branch 2 times, most recently from 7cf0493 to 9d60493 on July 3, 2025 09:16

@tisnik tisnik left a comment

ok

@umago umago left a comment

Code-wise it looks good, thanks very much for this addition.

I'm however a bit skeptical about the use of the expiringdict library; it seems like an abandoned project. Fortunately, there's another well-maintained project with a similar syntax that we can use. Left more details inline.

"uvicorn>=0.34.3",
"llama-stack>=0.2.13",
"rich>=14.0.0",
"expiringdict>=1.2.2",

umago commented:

Not sure if we should be using expiringdict; the project looks abandoned [0], the last release was in 2022, and the code repository is marked as not active [1].

May I suggest using cachetools for this? Very similar syntax and a well-maintained project [2]:

from cachetools import TTLCache
import time

cache = TTLCache(maxsize=100, ttl=5)  # max 100 items, TTL 5 seconds

cache['foo'] = 'bar'
print(cache['foo'])   # Prints bar

time.sleep(6)
print(cache.get('foo'))  # -> None (expired)

[0] https://pypi.org/project/expiringdict/#history
[1] https://app.travis-ci.com/github/mailgun/expiringdict/
[2] https://pypi.org/project/cachetools/#history

rawagner (author) replied:

Great suggestion. Switched to cachetools.
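
For illustration, the cached Agents might then be held in a TTLCache roughly like this (a sketch; the cache name, size, and TTL are made-up values, not the PR's actual settings):

from cachetools import TTLCache

# Evict conversations idle for an hour; both numbers are illustrative.
_agent_cache: TTLCache = TTLCache(maxsize=1000, ttl=3600)

def cache_agent(conversation_id: str, agent) -> None:
    _agent_cache[conversation_id] = agent

def get_cached_agent(conversation_id: str):
    return _agent_cache.get(conversation_id)  # None if absent or expired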

import logging
import os
from pathlib import Path
from typing import Any

Nit: add an empty line separating the built-in imports from the third-party libraries.

@manstis manstis left a comment

The changes, as they are, are generally fine 👍

However, it does introduce an edge case that could be problematic. I propose a solution.

The edge case would only manifest itself if different requests had different system_prompt values.

Whether:

  • It's considered an unlikely scenario to worry about
  • It should be fixed in this PR
  • It should be fixed in a new PR

I leave to you.

agent = Agent(
    client,
    model=model_id,
    instructions=system_prompt,

manstis commented:

This represents an interesting scenario I missed earlier.

system_prompt is part of the QueryRequest class that is used per request.

The Agent is now cached with whatever system_prompt was supplied on the first request.

Meaning QueryRequest.system_prompt effectively becomes redundant.

I think we should probably remove instructions=system_prompt from here and ...

@eranco74 eranco74 Jul 6, 2025

TBH, it seems strange that the system prompt is passed in the request.

I think that we should keep this and, in case the query_request.system_prompt isn't empty, we should add it to the messages (as you suggested).

Unsure whether that is the right thing to do here; perhaps it's better to defer this and solve it in another PR (along with #123).


vector_db_ids = [vector_db.identifier for vector_db in client.vector_dbs.list()]
response = agent.create_turn(
    messages=[UserMessage(role="user", content=query_request.query)],

manstis commented:

... and change this to be:

messages=[
    UserMessage(role="user", content=query_request.query),
    SystemMessage(role="system", content=query_request.system_prompt),
]

This is what llama-stack is doing already:

https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/agents/meta_reference/agent_instance.py#L223

https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/agents/meta_reference/agent_instance.py#L168-L175
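
Putting the suggestion together, a turn might then look roughly like this (a sketch assuming llama-stack-client's UserMessage/SystemMessage types and create_turn's session_id parameter; the SystemMessage is appended only when a prompt was supplied):

from llama_stack_client.types import SystemMessage, UserMessage

messages = [UserMessage(role="user", content=query_request.query)]
if query_request.system_prompt:
    # Pass the per-request prompt on the turn itself, so the cached
    # Agent's baked-in instructions no longer matter.
    messages.append(
        SystemMessage(role="system", content=query_request.system_prompt)
    )
response = agent.create_turn(
    messages=messages,
    session_id=conversation_id,  # reuse the cached conversation's session
)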

agent = AsyncAgent(
    client,  # type: ignore[arg-type]
    model=model_id,
    instructions=system_prompt,

manstis commented:

Likewise; remove this line.

    vector_db.identifier for vector_db in await client.vector_dbs.list()
]
response = await agent.create_turn(
    messages=[UserMessage(role="user", content=query_request.query)],

manstis commented:

Likewise, add a SystemMessage here for the system_prompt.

omertuc added a commit to omertuc/assisted-chat that referenced this pull request Jul 4, 2025
There's two PRs that haven't been merged yet, but are useful:

lightspeed-core/lightspeed-stack#163

(tl;dr - conversation_id retention fix)

openshift-assisted/assisted-installer-ui#3016

(tl;dr - draft UI for the chatbot)

This commit updates the submodules to checkout those PRs

Also updates the assisted-chat-pod.yaml to correctly set up the UI
pod so it's accessible at http://localhost:8080 and can communicate with
the lightspeed-stack container. This still doesn't fully work because
the proxy in the container cannot be configured to use the token / URL
yet.

@tisnik tisnik left a comment

Please:

  1. rebase
  2. remove pdm.lock completely; we switched to uv in the meantime (sorry, too many changes occurred in the last week)

rawagner commented Jul 7, 2025

I've rebased and switched to cachetools as suggested.
For now, I've not made any changes to system_prompt, as the discussion on the proper solution is ongoing and I'd rather see those changes in a separate PR.

@tisnik tisnik closed this pull request by merging all changes into lightspeed-core:main in 5737dbf on Jul 7, 2025.