
Issue: How to stop extra LLM call in ConversationalRetrievalChain for question rephrasing #9649

Closed
sid8491 opened this issue Aug 23, 2023 · 16 comments
Labels
Ɑ: models (Related to LLMs or chat model modules), 🤖:question (A specific question about the codebase, product, project, or how to use a feature)

Comments

@sid8491

sid8491 commented Aug 23, 2023

Issue you'd like to raise.

I am using ConversationalRetrievalChain for a RAG question-answering bot.
There is one LLM call that I have not configured, and it is reducing the quality of the responses and increasing the response time.

The prompt in the LLM call is:

Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
It is taking around 1 second to complete this call, and it is reducing the quality of the response as well.

How do I stop this call?

Suggestion:

No response

@dosubot dosubot bot added the Ɑ: models and 🤖:question labels Aug 23, 2023
@dosubot

dosubot bot commented Aug 23, 2023

🤖

Hello,

Thank you for bringing up this issue. The extra LLM call you're referring to is controlled by the rephrase_question attribute in the BaseConversationalRetrievalChain class. This attribute determines whether or not to pass the newly generated question to the combine_docs_chain.

If you want to disable this extra LLM call, you can set rephrase_question to False. Here's how you can do it:

class BaseConversationalRetrievalChain(Chain):
    ...
    rephrase_question: bool = False
    ...

By setting rephrase_question to False, the newly generated question will only be used for retrieval, and the original question will be passed along to the combine_docs_chain. This should prevent the extra LLM call from happening, reducing the response time and potentially improving the quality of responses.

You can find more details about this in the BaseConversationalRetrievalChain class in the libs/langchain/langchain/chains/conversational_retrieval/base.py file.
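
For reference, the flag does not need to be set by editing the class; it can also be passed when constructing the chain. Below is a minimal sketch, assuming an llm and retriever already exist:

from langchain.chains import ConversationalRetrievalChain

# Sketch: rephrase_question is forwarded to the chain via **kwargs.
# As later comments in this thread point out, the condense-question LLM
# call still runs when chat history is present; the flag only controls
# which question reaches the combine_docs_chain.
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    rephrase_question=False,
)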

Please let me know if this resolves your issue or if you have any other questions.

Best,
Dosu



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@sid8491
Author

sid8491 commented Aug 23, 2023

not useful.

@devstein

@sid8491 If I understand your question correctly, the bot's answer is correct. The rephrase_question=False flag skips the step that runs this prompt:

Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

@sid8491
Author

sid8491 commented Aug 23, 2023

@sid8491 If I understand your question correctly, the bot's answer is correct. The rephrase_question=False flag skips the step that runs this prompt:

Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

@devstein I have tried it like this, but this is not changing anything:

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=custom_retriever,
    chain_type="stuff",
    combine_docs_chain_kwargs={"prompt": QA_PROMPT},  # QA_PROMPT is a custom PromptTemplate
    return_source_documents=True,
    verbose=False,
    rephrase_question=False,
)

@sid8491
Author

sid8491 commented Aug 23, 2023

@devstein Upon careful examination, I discovered that when I set rephrase_question to False, the chain still makes an additional call to the Language Model (LLM), but it does not use the rephrased question. How can I stop this LLM call? Two issues arise from this extra LLM call:

  1. It delays response generation, resulting in increased response time.
  2. I am using streaming responses in Streamlit, and the output of this LLM call is also printed within the on_llm_new_token() function (a possible mitigation is sketched below).
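
A possible mitigation for point 2, sketched below rather than an official fix: ConversationalRetrievalChain.from_llm accepts a separate condense_question_llm, so the rephrasing step can run on a model that carries no streaming callback, keeping on_llm_new_token() quiet during that step. Note that this does not remove the extra call or its latency; it only keeps its tokens out of the stream. The names my_stream_handler (and ChatOpenAI as the model) are assumptions standing in for whatever is actually in use; custom_retriever is taken from the snippet above.

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

# Only the answering model streams, so on_llm_new_token() never fires
# for the condense-question step. my_stream_handler is an assumed
# custom streaming callback handler.
streaming_llm = ChatOpenAI(streaming=True, callbacks=[my_stream_handler])
condense_llm = ChatOpenAI(streaming=False)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=streaming_llm,
    retriever=custom_retriever,
    condense_question_llm=condense_llm,  # rephrasing runs on the non-streaming model
    rephrase_question=False,
    return_source_documents=True,
)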

@JoAmps

JoAmps commented Nov 3, 2023

@sid8491 Those are the exact issues I have been searching for solutions to: it delays the response, and the rephrased question gets shown to the user. I want it to be able to rephrase the question, but without increasing the response time and without showing the rephrased text to the user.

@harshadk-sourcefuse

harshadk-sourcefuse commented Nov 8, 2023

@hwchase17, @baskaryan, I am also facing the same issue. I am using ConversationalRetrievalChain with ConversationBufferWindowMemory, and I am experiencing the following issues:

  • The streaming response contains the rephrased question.
  • When the same question is asked a second time, it answers with "answer not found in context".
  • It rephrases the question based on the previous question even when I have not used phrases like 'based on the previous question' or 'as per the previous conversation.' In other words, even when I have not indicated that the previous question should be used to answer the current one, the model still rephrases the current question against it, leading to an "answer not found in context" response (one possible mitigation is sketched after this list).
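
One partial mitigation for the over-eager rephrasing described above, sketched here rather than offered as an official fix: pass a custom condense_question_prompt that asks the model to return the question unchanged when it already stands on its own. The exact wording below is an assumption to be tuned per model; the input variables must stay chat_history and question, matching the default prompt, and llm and retriever are assumed to exist.

from langchain.prompts import PromptTemplate
from langchain.chains import ConversationalRetrievalChain

# Assumed wording; adjust for your model. Keeping the default variable
# names (chat_history, question) lets this slot into from_llm directly.
CUSTOM_CONDENSE_PROMPT = PromptTemplate.from_template(
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question, in its original language. "
    "If the follow up question already stands on its own, return it unchanged.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,              # assumed to exist
    retriever=retriever,  # assumed to exist
    condense_question_prompt=CUSTOM_CONDENSE_PROMPT,
)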

@0ENZO

0ENZO commented Nov 15, 2023

@JoAmps I'm facing the same issue; have you found a solution yet?

@khanhnd0408

khanhnd0408 commented Nov 20, 2023

rephrase_question=False still generates the rephrased question (which I think still incurs costs), but the final answer no longer depends on the previous answer (which is what I want).
But I also want to stream the answer back to the client using a callback with a custom class wrapped around StreamingStdOutCallbackHandler, and on_llm_new_token also includes the output from the rephrase step. Is there any way to completely disable the rephrase step?
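
One way to keep the streamed output clean without modifying the chain itself is to filter inside the callback handler. This is a heuristic sketch, not a definitive fix: it assumes the default condense-question prompt wording and a synchronous chain where the rephrase call finishes before the answer call starts.

from typing import Any, Dict, List

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler


class AnswerOnlyStreamingHandler(StreamingStdOutCallbackHandler):
    """Streams tokens only for the answer LLM, not the rephrasing LLM."""

    def __init__(self) -> None:
        super().__init__()
        self._skip = False

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        # Heuristic: the rephrasing call uses the default condense-question
        # prompt, so skip streaming whenever that wording appears.
        self._skip = any("standalone question" in p for p in prompts)

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        if not self._skip:
            super().on_llm_new_token(token, **kwargs)

Passing an instance of this handler as the streaming callback keeps the rephrased question out of the streamed output, at the cost of relying on the prompt text as a marker.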

@khanhnd0408

@harshadk-sourcefuse, I'm facing the same issue with "In the streaming response, it contains the rephrased question." Did you find a solution for it?

@harshadk-sourcefuse

@harshadk-sourcefuse, I'm facing the same issue with "In the streaming response, it contains the rephrased question." Did you find a solution for it?

It's not a complete solution, but there is a workaround. If you don't require memory, you can set k=0 in ConversationBufferWindowMemory. Since there will be no chat history, it will not rephrase the questions.

However, I am still awaiting a proper solution.
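
For reference, that workaround is roughly the following sketch (llm and retriever are assumed to exist). With k=0 the window stays empty, so the chat_history string is blank and the condense-question call is skipped entirely, at the cost of losing conversational memory.

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory

# k=0 keeps the buffer empty, so there is never any chat history for the
# question generator to condense against.
memory = ConversationBufferWindowMemory(
    k=0,
    memory_key="chat_history",
    return_messages=True,
)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,              # assumed to exist
    retriever=retriever,  # assumed to exist
    memory=memory,
)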

@summer1704

summer1704 commented Dec 14, 2023

rephrase_question=False still generates the rephrased question (which I think still incurs costs), but the final answer no longer depends on the previous answer (which is what I want). But I also want to stream the answer back to the client using a callback with a custom class wrapped around StreamingStdOutCallbackHandler, and on_llm_new_token also includes the output from the rephrase step. Is there any way to completely disable the rephrase step?

@khanhnd0408 Hey, did you find any solution? I am facing the same issue while streaming back the response using the custom callback handler; the response includes the rephrased question as well.

@Gusakovskyi

Gusakovskyi commented Jan 16, 2024

Hi, any update on this? I'm facing the same issue, and I can't understand why I should receive the rephrased query in the response from the model. I am using streamed output to pass the answer on token by token, but with the current implementation I have to trim the rephrased query out somehow. That makes absolutely no sense to me: it costs additional time and looks useless.

@harshadk-sourcefuse

Hi, is there any update on this issue? I'm facing the same issue on v0.1.0.


dosubot bot commented Jan 18, 2024

🤖

Based on your description, it seems like you want to prevent the additional Language Model (LLM) call in the ConversationalRetrievalChain when the rephrase_question parameter is set to False. This is because it's causing increased response time and unwanted output in Streamlit's on_llm_new_token() function.

In the current implementation of LangChain, the LLM call is made regardless of the rephrase_question parameter's value. The rephrase_question parameter only controls whether the new question generated by the LLM is passed to the combine_docs_chain or not.

To prevent the LLM call when rephrase_question is set to False, you can modify the _call and _acall methods in the BaseConversationalRetrievalChain class. Here's how you can do it:

def _call(
    self,
    inputs: Dict[str, Any],
    run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
    _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
    question = inputs["question"]
    get_chat_history = self.get_chat_history or _get_chat_history
    chat_history_str = get_chat_history(inputs["chat_history"])
    # Changed condition: upstream checks only chat_history_str, so the
    # question generator runs even when rephrase_question is False.
    if self.rephrase_question and chat_history_str:
        callbacks = _run_manager.get_child()
        new_question = self.question_generator.run(
            question=question, chat_history=chat_history_str, callbacks=callbacks
        )
    else:
        new_question = question
    # Rest of the method remains the same...

async def _acall(
    self,
    inputs: Dict[str, Any],
    run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
    _run_manager = run_manager or AsyncCallbackManagerForChainRun.get_noop_manager()
    question = inputs["question"]
    get_chat_history = self.get_chat_history or _get_chat_history
    chat_history_str = get_chat_history(inputs["chat_history"])
    # Same change as in _call above: skip the question generator unless
    # rephrase_question is True and there is chat history.
    if self.rephrase_question and chat_history_str:
        callbacks = _run_manager.get_child()
        new_question = await self.question_generator.arun(
            question=question, chat_history=chat_history_str, callbacks=callbacks
        )
    else:
        new_question = question
    # Rest of the method remains the same...

In these modified methods, the LLM call is made only if rephrase_question is True and there is a chat history. If rephrase_question is False, the original question is used, and the LLM call is skipped.

You can find the original implementation of these methods in the BaseConversationalRetrievalChain class.

Please note that this is a workaround and might not be the best solution. It would be better if the LangChain developers provided an official way to control the LLM call based on the rephrase_question parameter. You might want to raise this as a feature request in the LangChain repository.

I hope this helps! If you have any other questions, feel free to ask.



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added the stale label (issue has not had recent activity or appears to be solved; stale issues will be automatically closed) Apr 18, 2024
@dosubot dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 25, 2024
@dosubot dosubot bot removed the stale label Apr 25, 2024
@arturgoms

This issue is still happening; it should not be closed.
