
[Feature Request]: Streaming ONLY the final message? #1143

Closed
tyler-suard-parker opened this issue Jan 4, 2024 · 8 comments · Fixed by #1551
Labels
enhancement New feature or request

Comments

@tyler-suard-parker
Contributor

Is your feature request related to a problem? Please describe.

Hello. I am using AutoGen as a retrieval-augmented generation agent. It works fantastically, and it performs multiple searches for different topics when necessary. However, building and sending the final answer takes too long for my users. I was hoping there is a way to stream just that one final answer, as it takes the majority of the time (about 20 seconds out of 30 seconds total). I looked at all the open issues and pull requests, and I am still not sure of the status of streaming with AutoGen.

Describe the solution you'd like

In the user_proxy agent class, add a parameter such as stream_final_message=True.
This would let the agents converse back and forth and pull whatever information is needed, while the final message is streamed so users don't have to wait for that entire message to be generated, because it tends to be long.
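A rough sketch of the proposed usage (stream_final_message is hypothetical and does not exist in autogen today; the rest follows the standard two-agent setup):

```python
# Hypothetical usage of the proposed flag. stream_final_message is NOT a
# real autogen parameter; everything else is the standard two-agent setup.
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent("assistant", llm_config={"model": "gpt-4"})
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    stream_final_message=True,  # proposed: stream only the final reply token by token
)
user_proxy.initiate_chat(assistant, message="Answer the user's question.")
```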

Additional context

No response

@tyler-suard-parker tyler-suard-parker added the enhancement New feature or request label Jan 4, 2024
@rickyloynd-microsoft
Contributor

@thinkall Do you think streaming would help here?

@victordibia
Collaborator

I am not sure streaming would help here.
Interaction between agents in AutoGen is currently sequential, i.e., each agent generates its full response, which is then sent to the next agent (written into its message history). This means all previous messages must be generated (with the associated latency incurred) before the final response is generated.
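A minimal sketch of that sequential flow (standard pyautogen two-agent setup; the model name is just an example):

```python
# initiate_chat blocks while each agent generates its reply in full; a reply
# is only written to the shared history (and visible anywhere) once complete,
# so the final answer cannot begin until every earlier turn has finished.
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent("assistant", llm_config={"model": "gpt-4"})
user_proxy = UserProxyAgent(
    "user_proxy", human_input_mode="NEVER", code_execution_config=False
)
user_proxy.initiate_chat(assistant, message="Research the topic and summarize.")
```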
In terms of UX, what could help might be showing users the intermediate messages as they are generated on the way to a final answer.
Happy to hear more thoughts here.

@tyler-suard-parker
Contributor Author

@victordibia Thank you for your input. I understand that the interactions between agents are sequential. Our agent interaction is something like this:

  1. Agent receives question (0 seconds)
  2. Agent generates a query (1 second)
  3. Search is performed using query and results are returned (1 second)
  4. Answer to user question is generated using the query results (30 seconds)

I am hoping to stream just step 4 to my frontend, because users are not willing to wait those 30 seconds for an answer, and it would be great if they could at least see the first few words immediately, as would be the case with streaming.
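One way to approximate that today, sketched under the assumption that step 4 can be issued as a direct call with the openai>=1.0 Python client (outside the agent loop) once steps 1-3 have produced the search results:

```python
# Hypothetical sketch: after the agents finish steps 1-3, make the final
# completion call directly with stream=True and forward tokens to the
# frontend as they arrive, instead of waiting ~30s for the full answer.
from openai import OpenAI

client = OpenAI()

def stream_final_answer(question: str, search_results: str):
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using the search results."},
            {"role": "user", "content": f"{question}\n\nResults:\n{search_results}"},
        ],
        stream=True,  # yields incremental chunks instead of one blocking response
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content  # push each token to the frontend
```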

@victordibia
Collaborator

victordibia commented Jan 5, 2024

Ah ... got it. You want to stream responses (in your case, just the last message).
I recall there was a PR for streaming.
@ragyabraham has extensive experience in that area (he's built a tool that implements this functionality).
@ragyabraham, any pointers you can share would be appreciated!

@tyler-suard-parker
Contributor Author

Thank you @victordibia! @ragyabraham, I am sure this is a common use case. I want to be able to stream just the last message to my front end as it is being created. Do you have any suggestions on how I could do that?

@ragyabraham
Collaborator

Hey @tyler-suard-parker, sure. We utilise sockets to stream messages to the frontend: we instantiate a socket client, pass it as a callable in the agent config, and then use it to emit each message to the frontend. If you want more detail, check out our fork of autogen.
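A rough sketch of that pattern using python-socketio (the event name, URL, and the exact hook into the agent config are assumptions about the fork, not upstream autogen API):

```python
# Socket client that pushes each agent message to the frontend as it is
# produced. How the callable is wired into the agent config is specific to
# the fork and only sketched here.
import socketio

sio = socketio.Client()
sio.connect("http://localhost:5000")  # assumed address of the frontend's socket server

def emit_to_frontend(message: str) -> None:
    """Emit one agent message (or token chunk) to the frontend."""
    sio.emit("agent_message", {"content": message})

# The fork reportedly accepts a callable like emit_to_frontend in the agent
# config and invokes it whenever a message is generated.
```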

@tyler-suard-parker
Contributor Author

tyler-suard-parker commented Jan 6, 2024

@ragyabraham Thank you so much for your help! I was not able to get your branch to run, so I opened an issue. For my use case, I am using a frontend, an Azure Functions app for the backend, and OpenAI. My main concern is the OpenAI generation time: some answers take up to 2 minutes to generate and users are complaining, so I want every word to hit my frontend as it is generated by OpenAI. Would I be able to do that using your fork?

@lordlinus
Collaborator

+1, looking for the same.
How can I stream the final message as it is generated? (Ideally, we could also stream the intermediate messages.)
