Support streaming in dev-ui chat #590

Merged
1 commit merged into quarkiverse:main on Jun 11, 2024
Conversation

iocanel
Collaborator

iocanel commented May 15, 2024

This pull request introduces a new option: streaming chat. The option is enabled when the model supports streaming and RAG is disabled (streaming chat can't work with RAG at the moment). When the user enables streaming chat, responses from the model are streamed to the Dev UI chat screen.
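
A minimal sketch of that gating, assuming langchain4j's StreamingChatLanguageModel interface (the class, method, and parameter names are illustrative, not the PR's actual code):

```java
import dev.langchain4j.model.chat.StreamingChatLanguageModel;

class StreamingOptionSketch {

    // offer the streaming toggle only when the bound model can stream
    // and RAG is disabled (streaming + RAG is unsupported for now)
    static boolean streamingAvailable(Object model, boolean ragEnabled) {
        return model instanceof StreamingChatLanguageModel && !ragEnabled;
    }
}
```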

iocanel requested a review from a team as a code owner on May 15, 2024 at 14:18
@jmartisk
Collaborator

I'm at a conference, so I'll have time to test this later, but I'll just throw in: in case of a timeout or any other error from the LLM, are we sure we don't end up with the history in the Dev UI desynchronized from the one stored in the actual chat memory? That's why the original synchronous variant of the chat method always returns the whole history - so the Dev UI can replace the whole history with it and prevent desynchronization. I think this might be a little tricky to implement robustly when you just stream a single message.

@iocanel
Collaborator Author

iocanel commented May 15, 2024

> I'm at a conference, so I'll have time to test this later, but I'll just throw in: in case of a timeout or any other error from the LLM, are we sure we don't end up with the history in the Dev UI desynchronized from the one stored in the actual chat memory? That's why the original synchronous variant of the chat method always returns the whole history - so the Dev UI can replace the whole history with it and prevent desynchronization. I think this might be a little tricky to implement robustly when you just stream a single message.

The memory gets modified when onComplete is called, with the whole message. In case of an error, the memory is restored from the backup, similar to what regular chat does. Do you think we need to do more?
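
A minimal sketch of that flow, assuming a langchain4j-style TokenStream and ChatMemory (illustrative, not the PR's actual code):

```java
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.service.TokenStream;

import java.util.List;

class StreamingMemorySketch {

    static void stream(TokenStream tokenStream, ChatMemory memory) {
        List<ChatMessage> backup = List.copyOf(memory.messages()); // snapshot before streaming
        StringBuilder text = new StringBuilder();

        tokenStream
                .onNext(text::append)  // each token is forwarded to the Dev UI as it arrives
                .onComplete(response -> {
                    // only here, once the whole message is assembled, is the memory modified
                    memory.add(AiMessage.from(text.toString()));
                })
                .onError(error -> {
                    // on failure, restore the snapshot, mirroring regular (synchronous) chat
                    memory.clear();
                    backup.forEach(memory::add);
                })
                .start();
    }
}
```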

@geoand
Collaborator

geoand commented May 16, 2024

I plan on doing a 0.14.0 release today, so we can leave this for 0.14.1 once @jmartisk gives his final approval.

geoand requested a review from jmartisk on May 17, 2024 at 15:48
@jmartisk
Collaborator

jmartisk commented May 20, 2024

> The memory gets modified when onComplete is called, with the whole message. In case of an error, the memory is restored from the backup, similar to what regular chat does. Do you think we need to do more?

I'm trying to cause an error by setting a short timeout (I used Ollama and set quarkus.langchain4j.ollama.timeout=1s). Then, after I ask the LLM for something, the Quarkus log shows an error:

2024-05-20 12:12:24,707 ERROR [io.qua.dev.run.jso.JsonRpcCodec] (vert.x-eventloop-thread-2) Error in JsonRPC Call: io.vertx.core.impl.NoStackTraceTimeoutException: The timeout period of 1000ms has been exceeded while executing POST /api/chat for server null

but the UI doesn't receive or show anything, and the progress bar keeps going forever. Ideally the JSON-RPC service should send an error and the page should discard the previous user message.
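
For reference, the reproduction described above amounts to a single application.properties entry (using the Ollama provider, as in this comment):

```properties
# force the Ollama client to time out quickly so streaming calls fail
quarkus.langchain4j.ollama.timeout=1s
```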

@iocanel
Collaborator Author

iocanel commented May 20, 2024

I updated the PR and added error handling in all the ways I could imagine (sketched below):

  • using onError
  • handling the error as part of onNext
  • wrapping the whole thing in try / catch
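
A rough sketch of those three approaches against a langchain4j-style TokenStream (the UI helpers are hypothetical, and this is not the PR's exact code):

```java
import dev.langchain4j.service.TokenStream;

class ErrorHandlingSketch {

    static void handle(TokenStream tokenStream) {
        try {
            tokenStream
                    .onNext(token -> forwardToUi(token))     // 2. errors could also be surfaced here
                    .onError(error -> sendErrorToUi(error))  // 1. react to the failure signal itself
                    .start();
        } catch (RuntimeException e) {
            // 3. a try / catch around the pipeline only catches synchronous failures,
            //    not errors emitted asynchronously while streaming
            sendErrorToUi(e);
        }
    }

    // hypothetical stand-ins for the Dev UI plumbing
    static void forwardToUi(String token) { /* push the token over JSON-RPC */ }
    static void sendErrorToUi(Throwable error) { /* push the error over JSON-RPC */ }
}
```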

Even though the error is reaching the UI, as far as I can tell from the console, I somehow can't capture it in my JavaScript code.
@phillip-kruger any idea what I am doing wrong here?

@iocanel
Collaborator Author

iocanel commented May 29, 2024

@jmartisk: I checked what you do in qwc-chat.js and it's not applicable to this case. Here the JSON-RPC call returns a Multi, so errors aren't simply caught with catch; they should be caught using onError, but the errors are not propagated. @phillip-kruger mentioned that it's most probably an issue in the JSON-RPC implementation.

@phillip-kruger
Member

I have not looked at this yet, but in the meantime, can you see if anything prints in:

  1. The browser console?
  2. The Dev UI log (the JSON message tab)?

@jmartisk
Collaborator

jmartisk commented Jun 5, 2024

I'm now realizing this will also cause problems when tools are used, not just RAG, no? Because the streaming response from the server will not contain tool-related messages. :(
To fix that, we would have to check the chat memory for changes (for the tool messages to appear in it) while the streaming is still going on, and somehow hack them into the stream (to send the augmented message with RAG, we'd actually have to do something similar).
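
A very rough sketch of that idea, purely illustrative: it assumes Mutiny streams and a hypothetical drainNewToolMessages() helper that diffs the chat memory, and it ignores the fact that a real version would have to stop polling once the token stream completes:

```java
import io.smallrye.mutiny.Multi;
import java.time.Duration;

class ToolMessageMergeSketch {

    static Multi<String> withToolMessages(Multi<String> tokens) {
        // periodically check the chat memory for tool messages that appeared mid-stream
        Multi<String> toolMessages = Multi.createFrom()
                .ticks().every(Duration.ofMillis(200))
                .map(tick -> drainNewToolMessages()) // hypothetical helper
                .filter(s -> !s.isEmpty());

        // interleave the tool messages with the model's tokens in a single stream
        return Multi.createBy().merging().streams(tokens, toolMessages);
    }

    static String drainNewToolMessages() {
        return ""; // would diff the chat memory against the last seen state
    }
}
```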

@geoand
Collaborator

geoand commented Jun 5, 2024

Yeah, that sounds right

@iocanel
Collaborator Author

iocanel commented Jun 11, 2024

@jmartisk: the issue you had with errors being swallowed has been fixed on the Quarkus side of the house by @phillip-kruger and should be available in 3.11.1. Nothing additional is needed here.

Now, regarding your comment about tools: the suggested solution is not clear to me. Is it something that is required for this PR, or something we could add later on?

@jmartisk
Collaborator

It would be nice to have RAG and tools working properly with streaming, but if you don't want to do it, I can have a look. I have some heavier refactoring planned anyway, to be able to support images.

jmartisk merged commit 93d6f95 into quarkiverse:main on Jun 11, 2024
12 checks passed
Development

Successfully merging this pull request may close these issues.

Streaming chat in DevUI