Do not create new session if conversation_id is provided #163
Conversation
manstis
left a comment
I had a quick look out of curiosity really.
I think how a conversation_id is created needs some thought as it's really the llama-stack session_id now.
Force-pushed 25b9c40 to 215f914
@onmete how is this done in OLS?
Thanks for the quick review. I've looked into this a bit further and realized that a new Agent is created on every request - which results in empty sessions. I've added code to cache the Agents so that we can reuse the instances in following requests. I don't think that is the solution you'd want to go with, and I'm not familiar with the code-base - though, at least, it helps to demonstrate the problem?
@rawagner I think this is a step in the right direction. You are correct that we need to re-use the Agent instances rather than create a new one per request. Similar changes will be needed for https://github.com/lightspeed-core/lightspeed-stack/blob/main/src/app/endpoints/streaming_query.py. I suspect this PR also addresses #122. I do however believe that this PR is important, addressing a serious flaw in the existing code.
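For illustration, a minimal sketch of the idea (the cache and helper names here are hypothetical, not the PR's actual code): keep one Agent per conversation so the underlying llama-stack session is reused instead of recreated on every request.

from llama_stack_client.lib.agents.agent import Agent

# Hypothetical module-level cache: one Agent per conversation_id.
_agent_cache: dict[str, Agent] = {}

def get_or_create_agent(client, model_id, system_prompt, conversation_id=None):
    # Reuse the cached Agent (and its llama-stack session) when possible.
    if conversation_id and conversation_id in _agent_cache:
        return _agent_cache[conversation_id], conversation_id
    agent = Agent(client, model=model_id, instructions=system_prompt)
    # The new session's id doubles as the conversation_id returned to clients.
    conversation_id = agent.create_session("chat_session")
    _agent_cache[conversation_id] = agent
    return agent, conversation_id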
tisnik
left a comment
I'm perfectly ok with an update like this, but we'd need some integration and/or end-to-end tests for it, to be able to simulate multiple requests from different clients etc. Gimme some time please.
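For illustration, the kind of end-to-end check being asked for might look roughly like this (the endpoint path, payload, response fields, and the app.main import are all assumptions about this code base, not verified):

from fastapi.testclient import TestClient

from app.main import app  # hypothetical import path for the FastAPI app

def test_conversation_id_is_retained():
    client = TestClient(app)
    first = client.post("/v1/query", json={"query": "hello"}).json()
    follow_up = client.post(
        "/v1/query",
        json={"query": "and then?", "conversation_id": first["conversation_id"]},
    ).json()
    # The second request must not spawn a new session.
    assert follow_up["conversation_id"] == first["conversation_id"]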
Force-pushed 177c221 to 8100bd1
manstis
left a comment
LGTM 👍
Thank you @rawagner, I think this is a great improvement on your original PR.
This will however need applying to streaming_query.py too.
tisnik
left a comment
Looks correct overall, just have some nitpicks. Pls update, then we'll merge. TYVM
@rawagner it looks nice now! You'd need to rebase/resolve a conflict - the same code was changed in the meantime. But it should be easy. Thanks a lot in advance!
Looking into it. Thanks for the reviews!
For now, I've just quick-fixed the streaming_query. I am looking for a proper solution.
Force-pushed 7cf0493 to 9d60493
tisnik
left a comment
ok
umago
left a comment
Code-wise it looks good, thanks very much for this addition.
I'm however a bit skeptical about the use of the expiringdict library; it seems like an abandoned project. Fortunately, there's another well-maintained project with a similar syntax that we can use. Left more details inline.
| "uvicorn>=0.34.3", | ||
| "llama-stack>=0.2.13", | ||
| "rich>=14.0.0", | ||
| "expiringdict>=1.2.2", |
Not sure if we should be using expiringdict; the project looks abandoned [0] - the last release was in 2022 and the code repository is marked as not active [1].
May I suggest using cachetools for this? Very similar syntax and a well-maintained project [2]:
from cachetools import TTLCache
import time
cache = TTLCache(maxsize=100, ttl=5) # max 100 items, TTL 5 seconds
cache['foo'] = 'bar'
print(cache['foo']) # Prints bar
time.sleep(6)
print(cache.get('foo')) # -> None (expired)
[0] https://pypi.org/project/expiringdict/#history
[1] https://app.travis-ci.com/github/mailgun/expiringdict/
[2] https://pypi.org/project/cachetools/#history
Great suggestion. Switched to cachetools.
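Applied to the agent cache, the switch might look like this (the cache name, maxsize, and TTL are illustrative, not the PR's actual values):

from cachetools import TTLCache

# Evict agents for conversations idle longer than an hour; maxsize bounds
# memory use if many clients connect at once.
_agent_cache: TTLCache = TTLCache(maxsize=1000, ttl=3600)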
import logging
import os
from pathlib import Path
from typing import Any
Nit: add an empty line separating the built-in libraries from the 3rd-party libraries.
manstis
left a comment
The changes as they are are generally fine 👍
However it does introduce an edge-case that could be problematic. I propose a solution.
The edge-case would only manifest itself if different requests had different system_prompt values.
Whether:
- It's considered an unlikely scenario to worry about
- It should be fixed in this PR
- It should be fixed in a new PR
I leave to you.
agent = Agent(
    client,
    model=model_id,
    instructions=system_prompt,
This represents an interesting scenario I missed earlier.
system_prompt is part of the QueryRequest class that is used per request.
The Agent is now cached with whatever system_prompt was supplied on the first request, meaning QueryRequest.system_prompt effectively becomes redundant on subsequent requests.
I think we should probably remove instructions=system_prompt from here and ...
TBH, it seems strange that the system prompt is passed in the request.
- Also relates to [RFE] Move system_prompt to configuration #123
I think that we should keep this and, in case query_request.system_prompt isn't empty, add it to the messages (as you suggested) - see the sketch below.
Unsure whether that is the right thing to do here; perhaps it's better to defer this and solve it in another PR (along with #123).
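A sketch of that suggestion (assuming UserMessage and SystemMessage come from llama_stack_client.types, and that agent, query_request, and conversation_id are the names already used by the surrounding code):

from llama_stack_client.types import SystemMessage, UserMessage

messages = [UserMessage(role="user", content=query_request.query)]
if query_request.system_prompt:
    # Only attach a SystemMessage when the request actually carries a prompt.
    messages.append(
        SystemMessage(role="system", content=query_request.system_prompt)
    )
response = agent.create_turn(
    messages=messages,
    session_id=conversation_id,  # session id as used by the surrounding code
)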
vector_db_ids = [vector_db.identifier for vector_db in client.vector_dbs.list()]
response = agent.create_turn(
    messages=[UserMessage(role="user", content=query_request.query)],
... and change this to be:
messages=[
    UserMessage(role="user", content=query_request.query),
    SystemMessage(role="system", content=query_request.system_prompt),
]
This is what llama-stack is doing already.
agent = AsyncAgent(
    client,  # type: ignore[arg-type]
    model=model_id,
    instructions=system_prompt,
Likewise; remove this line.
vector_db_ids = [
    vector_db.identifier for vector_db in await client.vector_dbs.list()
]
response = await agent.create_turn(
    messages=[UserMessage(role="user", content=query_request.query)],
Likewise, add a SystemMessage here for the system_prompt.
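i.e., roughly (same caveats as above - a sketch, not the PR's final code; stream=True is an assumption for the streaming endpoint):

response = await agent.create_turn(
    messages=[
        UserMessage(role="user", content=query_request.query),
        SystemMessage(role="system", content=query_request.system_prompt),
    ],
    session_id=conversation_id,  # as in the surrounding code
    stream=True,  # assumed for the streaming endpoint
)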
There are two PRs that haven't been merged yet, but are useful: lightspeed-core/lightspeed-stack#163 (tl;dr - conversation_id retention fix) and openshift-assisted/assisted-installer-ui#3016 (tl;dr - draft UI for the chatbot). This commit updates the submodules to check out those PRs. It also updates assisted-chat-pod.yaml to correctly set up the UI pod so it's accessible at http://localhost:8080 and can communicate with the lightspeed-stack container. This still doesn't fully work because the proxy in the container cannot be configured to use the token / URL yet.
tisnik
left a comment
Please:
- rebase
- remove pdm.lock completely, we switched to uv in the meantime (sorry, too many changes occurred in the last week)
I've rebased & switched to cachetools as suggested.
5737dbf
Description
Before this change, a new session is created for every query, which makes the assistant hardly usable.
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing