
Issue: how stream results with long context #5532

Closed · qlql489 opened this issue Jun 1, 2023 · 4 comments

Comments

qlql489 commented Jun 1, 2023

Issue you'd like to raise.

I followed the chapter “Chat Over Documents with Chat History” to build a bot that chats with a PDF, and I want to stream the response.
But when I use the stuff chain like this:

    doc_chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff", prompt=QA_PROMPT)
    chain = ConversationalRetrievalChain(
        retriever=vector_db.as_retriever(),
        question_generator=question_generator,
        combine_docs_chain=doc_chain
    )

it return "This model's maximum context length is 4097 tokens, however you requested 5741 tokens (5485 in your prompt; 256 for the completion). Please reduce your prompt; or completion length"

When I use the map_reduce chain:

    doc_chain = load_qa_chain(OpenAI(temperature=0, streaming=True, callbacks=[StreamingStdOutCallbackHandler()]), chain_type="map_reduce", combine_prompt=getQaMap_reducePromot())

it return "Cannot stream results with multiple prompts."

How can I resolve this when the context is too long?

Suggestion:

No response


fitolobo commented Jun 15, 2023

Same problem here. I would like to stream the chain output, something like:
llm = OpenAI(streaming=True, temperature=0.0, model_name="gpt-3.5-turbo", callbacks=some_callback)
chain = MultiRetrievalQAChain.from_retrievers(llm, multi_retriever_info, verbose=False)

to yield the tokens of the last answer. I can stream and print the intermediate JSON steps, but I couldn't get a generator by following the various instructions about callbacks in the documentation.
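
One way to get such a generator is to push tokens onto an `asyncio.Queue` from the callback and drain the queue while the chain runs as a background task. This is only a minimal sketch, not an official API: it assumes nothing beyond the standard `AsyncCallbackHandler` hooks that also appear later in this thread, and `QueueCallbackHandler` / `stream_answer` are hypothetical names:

```python
import asyncio
from typing import Any, AsyncIterator

from langchain.callbacks.base import AsyncCallbackHandler


class QueueCallbackHandler(AsyncCallbackHandler):
    """Hypothetical handler that puts streamed tokens on an asyncio.Queue."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.done = asyncio.Event()

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        await self.queue.put(token)

    async def on_llm_end(self, response: Any, **kwargs: Any) -> None:
        # Fires once per LLM call; for chains that call the LLM several times
        # you may prefer to signal completion from the outer chain instead.
        self.done.set()


async def stream_answer(chain, inputs: dict, handler: QueueCallbackHandler) -> AsyncIterator[str]:
    """Run the chain in the background and yield tokens as they arrive."""
    task = asyncio.create_task(chain.acall(inputs))
    while not (handler.done.is_set() and handler.queue.empty()):
        try:
            yield await asyncio.wait_for(handler.queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            continue
    await task  # surface any chain errors and the final result
```

The handler would have to be passed to the streaming LLM (e.g. `callbacks=[handler]` on the model), after which the output can be consumed with `async for token in stream_answer(chain, inputs, handler)`.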

agola11 pushed a commit that referenced this issue Jun 18, 2023
…async (#6181)

This adds the ability to attach an AsyncCallbackManager (handler) to the reducer chain, which can then stream the tokens via the
`async def on_llm_new_token` callback method.



Fixes #5532


 @hwchase17  @agola11 
The following code snippet explains how this change would be used to
enable `reduce_llm` with streaming support in a `map_reduce` chain.

I have tested this change and it works for the streaming use case of
reducer responses. I am happy to share more information if this
solution makes sense.

```python
# AsyncHandler
# ------------
from typing import Any

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.callbacks.manager import AsyncCallbackManager
from langchain.chains import ConversationalRetrievalChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI


class StreamingLLMCallbackHandler(AsyncCallbackHandler):
    """Callback handler for streaming LLM responses."""

    def __init__(self, websocket):
        self.websocket = websocket

    # This callback method is executed asynchronously for every new token
    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        resp = ChatResponse(sender="bot", message=token, type="stream")
        await self.websocket.send_json(resp.dict())


# Chain
# -----
# `websocket`, `ChatResponse`, `manager`, `question_generator`, `vectorstore`,
# `question` and `chat_history` are assumed to be defined elsewhere in the app.
stream_handler = StreamingLLMCallbackHandler(websocket)
stream_manager = AsyncCallbackManager([stream_handler])

streaming_llm = ChatOpenAI(
    streaming=True,
    callback_manager=stream_manager,
    verbose=False,
    temperature=0,
)
main_llm = OpenAI(
    temperature=0,
    verbose=False,
)

doc_chain = load_qa_chain(
    llm=main_llm,
    reduce_llm=streaming_llm,
    chain_type="map_reduce",
    callback_manager=manager,
)
qa_chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
    callback_manager=manager,
)

# Here `acall` triggers `acombine_docs` on `map_reduce`, which should then call
# `_aprocess_result`, which in turn calls `self.combine_document_chain.arun`,
# hence the async callback will be awaited.
result = await qa_chain.acall(
    {"question": question, "chat_history": chat_history}
)
```
kacperlukawski pushed a commit to kacperlukawski/langchain that referenced this issue Jun 29, 2023
…async (langchain-ai#6181)

@clemlesne

Seems related to #1349.

@straeter

Was this fixed in 2b3b4e0? If so, can someone please create an example of how to stream a load_qa_chain response?
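
For what it's worth, a minimal sketch along the lines of the #6181 snippet above might look like this. It assumes the installed version includes that change, that `docs` is the list of retrieved `Document`s, and that streaming the reduce step to stdout is enough:

```python
import asyncio

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI


async def answer(docs, question: str):
    # Only the reduce (combine) step streams; the map step uses a plain LLM,
    # which avoids the "Cannot stream results with multiple prompts." error.
    streaming_llm = ChatOpenAI(
        streaming=True,
        callbacks=[StreamingStdOutCallbackHandler()],  # prints tokens as they arrive
        temperature=0,
    )
    doc_chain = load_qa_chain(
        llm=OpenAI(temperature=0),
        reduce_llm=streaming_llm,
        chain_type="map_reduce",
    )
    # The async entry point is what lets the async streaming callbacks be awaited.
    return await doc_chain.acall({"input_documents": docs, "question": question})

# asyncio.run(answer(docs, "What is this document about?"))
```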


dosubot bot commented Jan 31, 2024

Hi, @qlql489,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue pertains to encountering errors when attempting to stream results with a long context in a chatbot built using the "Chat Over Documents with Chat History" chapter. Another user, "fitolobo," has also experienced the same problem and is seeking a solution to stream the chain output. User "clemlesne" has pointed out a related issue, while "straeter" has inquired about a potential fix and requested an example on how to stream a load_qa_chain response.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

@dosubot added the stale label Jan 31, 2024
@dosubot closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 7, 2024
@dosubot removed the stale label Feb 7, 2024