Issue: How to stop extra LLM call in ConversationalRetrievalChain for question rephrasing #9649
Comments
🤖 Hello, Thank you for bringing up this issue. The extra LLM call you're referring to is controlled by the rephrase_question attribute of BaseConversationalRetrievalChain. If you want to disable this extra LLM call, you can set rephrase_question to False:

class BaseConversationalRetrievalChain(Chain):
    ...
    rephrase_question: bool = False
    ...

By setting rephrase_question to False, the chain passes the original question, rather than the rephrased one, to the answer-generation step. You can find more details about this in the BaseConversationalRetrievalChain source. Please let me know if this resolves your issue or if you have any other questions.
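For reference, a minimal sketch of passing that flag at construction time (this assumes an existing llm and retriever; note that the thread below reports this alone does not remove the extra call):

from langchain.chains import ConversationalRetrievalChain

# from_llm forwards extra keyword arguments to the chain, so
# rephrase_question can be set when the chain is built.
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,                  # assumes an existing LLM instance
    retriever=retriever,      # assumes an existing retriever
    rephrase_question=False,  # pass the original question to the answer step
)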
Not useful.
@sid8491 If I understand your question correctly, the bot's answer is correct. The rephrase_question flag controls whether the rephrased question is used when generating the final answer.
@devstein I have tried setting it like this, but it is not changing anything.
@devstein Upon careful examination, I discovered that when I set rephrase_question to False, the chain still makes an additional call to the Language Model (LLM); it just doesn't use the rephrased question. How can I stop this LLM call? There are two issues arising from this extra call: it increases response time, and the rephrased question shows up in the streamed output.
@sid8491 Those are the exact issues I have been trying to solve: it delays the response, and the rephrased question gets shown to the user. I want the chain to be able to rephrase the question, but without increasing response time and without showing the rephrased question to the user.
@hwchase17, @baskaryan, I am also facing the same issue. I am using ConversationalRetrievalChain.
@JoAmps I'm facing the same issue; have you found a solution yet?
rephrase_question=False still generates the rephrased question (so I think it still incurs costs), but the final answer no longer depends on the previous answer (which is what I want).
@harshadk-sourcefuse, I'm facing the same issue: the streaming response contains the rephrased question. Did you find a solution?
You can't consider it a complete solution, but there is a workaround. If you don't require memory, you can pass an empty chat_history; when the chat history is empty, the question-generator call is skipped entirely. However, I am still awaiting a proper solution.
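A minimal sketch of that workaround (assumes an existing ConversationalRetrievalChain in chain and the user's question in query; with an empty history, the chain never enters the condense step):

# Workaround sketch: an empty chat history means there is nothing to
# condense, so the question-generator LLM call is skipped entirely.
result = chain({"question": query, "chat_history": []})
print(result["answer"])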
@khanhnd0408 Hey, did you find any solution? I am facing the same issue while streaming back the response using a custom callback handler; the response includes the rephrased question as well.
Hi, any update on this? I am facing the same issue, and I can't understand why I should receive the rephrased query in the response from the model. I stream the output to pass the answer on token by token, but with the current implementation I have to trim the rephrased query out somehow. That makes no sense to me: it costs additional time and looks useless.
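One common workaround for the streaming symptom, sketched under assumptions (my_handler and the model choice are placeholders, and this is a composition pattern rather than an official fix): build the chain from its parts with two LLM instances, so the streaming callback is attached only to the answer-generation LLM and never receives the condense step's tokens.

from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI

# Streaming LLM used only for the final answer; my_handler is a placeholder
# for your custom streaming callback handler.
answer_llm = ChatOpenAI(streaming=True, callbacks=[my_handler], temperature=0)
# Non-streaming LLM for the condense (rephrase) step, so its tokens never
# reach the streaming handler.
condense_llm = ChatOpenAI(temperature=0)

question_generator = LLMChain(llm=condense_llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_chain(answer_llm, chain_type="stuff")

chain = ConversationalRetrievalChain(
    retriever=retriever,  # assumes an existing retriever
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
)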
Hi, is there any update on this issue? I am facing the same issue on v0.1.0.
🤖 Based on your description, it seems like you want to prevent the additional Language Model (LLM) call in the ConversationalRetrievalChain when rephrase_question is set to False. In the current implementation of LangChain, the LLM call is made regardless of the value of rephrase_question; the flag only controls whether the rephrased question is passed on to the answer-generation step. To prevent the LLM call when rephrase_question is False, you can modify the _call and _acall methods of BaseConversationalRetrievalChain so that the question generator runs only when rephrase_question is True and there is chat history:

from typing import Any, Dict, Optional

from langchain.callbacks.manager import (
    AsyncCallbackManagerForChainRun,
    CallbackManagerForChainRun,
)
from langchain.chains.conversational_retrieval.base import _get_chat_history

def _call(
    self,
    inputs: Dict[str, Any],
    run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
    _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
    question = inputs["question"]
    get_chat_history = self.get_chat_history or _get_chat_history
    chat_history_str = get_chat_history(inputs["chat_history"])
    # Only call the question-generator LLM when rephrasing is enabled
    # and there is chat history to condense.
    if self.rephrase_question and chat_history_str:
        callbacks = _run_manager.get_child()
        new_question = self.question_generator.run(
            question=question, chat_history=chat_history_str, callbacks=callbacks
        )
    else:
        new_question = question
    # Rest of the method remains the same...
async def _acall(
    self,
    inputs: Dict[str, Any],
    run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
    _run_manager = run_manager or AsyncCallbackManagerForChainRun.get_noop_manager()
    question = inputs["question"]
    get_chat_history = self.get_chat_history or _get_chat_history
    chat_history_str = get_chat_history(inputs["chat_history"])
    # Same guard as in _call, applied on the async path.
    if self.rephrase_question and chat_history_str:
        callbacks = _run_manager.get_child()
        new_question = await self.question_generator.arun(
            question=question, chat_history=chat_history_str, callbacks=callbacks
        )
    else:
        new_question = question
    # Rest of the method remains the same...

In these modified methods, the LLM call is made only if rephrase_question is True and there is chat history to condense. You can find the original implementation of these methods in the BaseConversationalRetrievalChain class. Please note that this is a workaround and might not be the best solution; it would be better if the LangChain developers provided an official way to skip the LLM call based on the rephrase_question attribute. I hope this helps! If you have any other questions, feel free to ask.
This issue is still happening and should not be closed.
Issue you'd like to raise.
I am using ConversationalRetrievalChain for a RAG question-answer bot. There is one LLM call that I have not configured, and it is reducing the quality of responses and increasing response time. The prompt in that LLM call is the default CONDENSE_QUESTION_PROMPT:

Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:
How do I stop this call?