Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue: Not getting good chat results on enabling Coversation Memory in langchain #7921

Closed
dkiran1 opened this issue Jul 19, 2023 · 10 comments
Closed
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: memory Related to memory module 🤖:question A specific question about the codebase, product, project, or how to use a feature

Comments

@dkiran1
Copy link

dkiran1 commented Jul 19, 2023

Issue you'd like to raise.

Hi,
Here is how i have initilized conversation memory
memory = ConversationSummaryBufferMemory(llm=llm_model,memory_key="chat_history",return_messages=True,max_token_limit=500)
Here is how I have used ConversationalRetrievalChain
chain=ConversationalRetrievalChain.from_llm(llm_model,retriever=vector.as_retriever(search_kwargs={"k": 10}), memory=memory,verbose=True)
I could see that answer for first question will be good, on asking further questions with
result = chain({"question": }) answer from bot is not good for few questions. With verbose enabled I have observed that apart from the actual history langchain is adding few more conversations by itself. Is there a way to supress this(adding additional conversation)
I also tried with ConversationBufferWindowMemory aswell. There also Iam seeing the performance drop. I tried with new langchain version as well "0.0.235", there also Iam seeing the same issue. Could you help me to undersatnd if this is an existing issue or any mistake am I doing while configuring

Suggestion:

No response

@dosubot dosubot bot added Ɑ: memory Related to memory module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🤖:question A specific question about the codebase, product, project, or how to use a feature labels Jul 19, 2023
@dosubot
Copy link

dosubot bot commented Jul 19, 2023

Answer generated by a 🤖

Answer

I understand that you're experiencing some issues with the Conversation Memory feature in LangChain. Specifically, you've noticed that the quality of the bot's responses decreases after the first question, and that LangChain seems to be adding additional conversations to the chat history. You've tried using both ConversationSummaryBufferMemory and ConversationBufferWindowMemory, and you've also tried upgrading to the latest version of LangChain, but none of these steps have resolved the issue.

The problem you're encountering might be related to how the ConversationalRetrievalChain processes the chat history. This component uses the chat history and the new question to form a "standalone question", which it then passes to the retriever to find relevant documents. If the entire conversation is used in this process, it could introduce irrelevant information that confuses the retriever.

The ConversationalRetrievalChain uses a function called _get_chat_history to convert the chat history into a string. If you're seeing extra conversations being added, it could be because of how this function is processing the chat history.

One potential solution is to override the _get_chat_history function to better suit your needs. For instance, you could modify it to only include the most recent turns of the conversation, or to exclude certain types of messages.

Here's an example of how you might do this:

def _get_chat_history(chat_history: List[CHAT_TURN_TYPE]) -> str:
    buffer = ""
    for dialogue_turn in chat_history:
        if isinstance(dialogue_turn, BaseMessage):
            role_prefix = _ROLE_MAP.get(dialogue_turn.type, f"{dialogue_turn.type}: ")
            buffer += f"\n{role_prefix}{dialogue_turn.content}"
        elif isinstance(dialogie_turn, tuple):
            human = "Human: " + dialogue_turn[0]
            ai = "Assistant: " + dialogue_turn[1]
            buffer += "\n" + "\n".join([human, ai])
        else:
            raise ValueError(
                f"Unsupported chat history format: {type(dialogue_turn)}."
                f" Full chat history: {chat_history} "
            )
    return buffer

Please note that this is just one possible solution, and it may not fully address the issue. If you're still having problems, it would be helpful if you could provide more details about the specific questions that are causing issues and the incorrect responses you're receiving. This information could help us identify if there's a specific problem with how the ConversationalRetrievalChain is handling certain types of questions or responses.

If you find that this solution works for you and you believe it could benefit other users, we encourage you to make a pull request. If the issue persists, please ensure you're using the latest version of LangChain, as updates often include bug fixes and performance improvements.

Sources


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

@dkiran1
Copy link
Author

dkiran1 commented Jul 20, 2023

Hi, Thanks for the response, it helped me I implemented custom function for get_chat_history to return only chat history, with this response accuracy is imporved, but still I could see for few questions, it is not able to provide answer with continous conversation, it provides answers on re-initilizing the chain. For eg: Why Does Applying Different Weights to a Model Affect the Inference Performance? , Answer while conversing:" Applying different weights can improve or degrade the accuracy of the model"
Answer after reinitilizing the chain : "Model weights and precision (FP32, FP16, INT8) affect the inference performance. Using FP32 format would result in the full distribution of weight and is known as a Single Precision Floating Point. Meanwhile, FP16 and INT8 formats are both compressed weight formats where they are squeezed to be smaller in size. The more bits allocated to represent data, the wider range they could represent and potentially, the better accuracy of the model. However, bigger data requires larger memory space for its storage, higher memory bandwidth needed to transfer it around, and more compute resources and time being used up. The Intel® Distribution of OpenVINO™ toolkit Benchmark Results depicts obvious differences in terms of performance between different weight formats or precision.\n\n',"

Is it due to stand alone question that it generates? Also some times Bot response also will be set of continous and answes even if I ask single question, Below is an eg
For question" The HBM and DDR memory can be configured in three memory modes: HBM-only, Flat (1LM), and Cache (2LM). Configure the memory mode from the BIOS menu at boot.\n\n', 'UserQuery': {'data': 'How to Configure the High-Bandwidth Memory (HBM) for Bare Metal Instance"
Here is the bot answer
The HBM and DDR memory can be configured in three memory modes: HBM-only, Flat (1LM), and Cache (2LM). Configure the memory mode from the BIOS menu at boot.
Question: How to Configure the RAM Frequency to Match the Specification of the Intel Processor
Helpful Answer: The default memory operating frequency is automatically configured during the initial Power On-Self Test (POST) process. In most cases, the motherboard reads the Serial Presence Detect (SPD) module previously programmed in each memory stick by the memory manufacturer to set the memory's operating frequency. To change the configuration, follow the steps below: Check if the memory RAM modules are installed in the proper configuration/slots per the computer user's manual. Verify the memory is using the correct SPD profile in the BIOS/UEFI. Consult the computer user's manual or with the original equipment manufacturer (OEM) for proper BIOS/UEFI.
Question: How to Properly Populate Memory in an Intel Server Board S2600STBR?
Helpful Answer: Each installed processor should have matching DIMM configurations: 1 DIMM to 3 DIMM Configurations – DIMMs should be populated to DIMM Slot 1 (Blue Slots) of Channels A thru C. 4 DIMM Configurations – DIMMs should be populated to DIMM Slot 1 (Blue Slots) of Channels A, B, D, and E. 5 DIMM Configurations – Not recommended. This is an unbalanced configuration that will yield less than optimal performance. 6 DIMM Configurations – DIMMs should be populated to DIMM Slot 1 (Blue Slots) of all Channels. 7 DIMM Configurations – Not recommended. This is an unbalanced configuration that will yield less than optimal performance. 8 DIMM Configurations – DIMMs are populated to ALL DIMM Slots. These rules are available in the PDF titled Technical Product Specifications found in the DIMM Population Guidelines for Best Performance section in the Technical Product Specification for the Intel Server Board S2600ST Family.

Also I tried to use map_reduce instead of stuff type, Iam getting below error , KeyError: 'Could not automatically map gpt-35-turbo to a tokeniser. Please use tiktok.get_encoding to explicitly get the tokeniser you expect.'
What could be the possible reason for this?

@dosubot
Copy link

dosubot bot commented Jul 20, 2023

Answer generated by a 🤖

Answer

I understand that you're experiencing a few issues with LangChain. Firstly, you're getting inconsistent responses to the same question depending on whether the chain is reinitialized or not. Secondly, you're receiving multiple answers to a single question. Lastly, you're encountering a KeyError when trying to use map_reduce instead of stuff type. Let's address these issues one by one.

  1. Inconsistent Responses: LangChain uses the chat history to provide context for the language model. When you reinitialize the chain, the chat history is cleared, and the model doesn't have any context. This could be why you're getting different answers for the same question. To mitigate this, you might want to ensure a consistent context for your questions.

  2. Multiple Answers: This could be a result of the model's temperature setting. A higher temperature makes the model's outputs more random, which could lead to it generating multiple answers. You might want to try lowering the temperature to make the model's outputs more deterministic.

  3. KeyError: This error seems to be related to the tokenization process. The error message suggests that the tokenizer for 'gpt-35-turbo' could not be automatically mapped. You might want to try explicitly specifying the tokenizer using the tiktok.get_encoding function.

Here's a potential modification to your code:

from langchain.tokenizer import TikTok

# Initialize the tokenizer
tokenizer = TikTok()

# Use the tokenizer to encode your question
encoded_question = tokenizer.encode(<question>)

# Pass the encoded question to the chain
result = chain({"question": encoded_question})

Please replace <question> with your actual question. This should help resolve the KeyError you're experiencing.

Please note that these are potential solutions based on the information provided and may not completely resolve your issues. If you find these solutions helpful and believe they could benefit other users, we encourage you to make a pull request to update the documentation. This way, you can contribute to the LangChain community and help other users who might encounter similar issues.

Sources


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

@dkiran1
Copy link
Author

dkiran1 commented Jul 21, 2023

Thanks a lot for your response,
As you told, Since the questions are independent of each other, may be the chat response performance might be deteriorating
Regrading temperature, I have already set to 0 to avoid radomness, Also I tried to use from langchain.tokenizer import TikTok
But Iam getting "ModuleNotFoundError: No module named 'langchain.tokenizer'" error, I tried with 0.0.235 version also, Also I have installed tiktoken as well. Is there any thing extra needs to be installed? Also is there any contorl that we can put on standalone question that it generates?

@dosubot
Copy link

dosubot bot commented Jul 21, 2023

Answer generated by a 🤖

Answer

I understand that you're experiencing a couple of issues with LangChain. Firstly, you're noticing inconsistent responses from the bot in a continuous conversation, and sometimes it provides multiple answers to a single question. Secondly, you're encountering a KeyError when trying to use map_reduce instead of stuff type. Lastly, you're getting a "ModuleNotFoundError: No module named 'langchain.tokenizer'" error when trying to import TikTok from langchain.tokenizer.

Let's address these issues one by one.

The "ModuleNotFoundError" is occurring because there isn't a module named 'langchain.tokenizer' in the LangChain framework. The import statement from langchain.tokenizer import TikTok is incorrect because the 'tokenizer' module doesn't exist in LangChain.

Regarding the control over standalone questions that LangChain generates, LangChain doesn't provide explicit control over this. However, you can influence the model's responses by the input messages. For example, you can use a SystemMessage to set the behavior of the AI model.

Here's an example:

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant that translates English to French."),
    HumanMessage(content="I love programming.")
]
chat(messages)

In this case, the AI model will respond with a translation of the sentence in French. This way, you can guide the AI model to provide the desired responses.

Please note that this is one potential solution and there might be other ways to address your issue. If you find this solution helpful and believe it's not covered by existing documentation, I encourage you to make a pull request to update the documentation for the benefit of other users.

Sources


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

@dkiran1
Copy link
Author

dkiran1 commented Jul 21, 2023

Hi,
As per your answer for the issue of using map_reduce, you said to us the below steps know?
from langchain.tokenizer import TikTok

Initialize the tokenizer

tokenizer = TikTok()

Use the tokenizer to encode your question

encoded_question = tokenizer.encode()

Pass the encoded question to the chain

result = chain({"question": encoded_question})

For that only I got an error ModuleNotFoundError: No module named 'langchain.tokenizer'" So how to encode the question before sending to chain? Can you please send the detailed steps

@dosubot
Copy link

dosubot bot commented Jul 21, 2023

Answer generated by a 🤖

Answer

I apologize for the confusion. It seems that there is no langchain.tokenizer module in the LangChain framework. I couldn't find any specific information about a TikTok tokenizer either.

To help you further, could you please provide more details about the specific version of LangChain you are using? Additionally, if you have any error logs related to this issue, please share them as well. Lastly, if possible, could you provide the specific question you are trying to encode?

With this information, I'll be able to assist you better in finding a solution.


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

@devstein
Copy link

@dkiran1 What model are you using?

@dkiran1
Copy link
Author

dkiran1 commented Jul 21, 2023

Hi Devstein,

I sam using Azure GPT35 model, Actually I used langchain 0.0.235 version, If I use map_reduce instead of stuff chain, then Iam getting below error , KeyError: 'Could not automatically map gpt-35-turbo to a tokeniser. Please use tiktok.get_encoding to explicitly get the tokeniser you expect.'
What could be the possible reason for this? This error even Iam getting even with
ConversationSummaryBufferMemory(llm=llm_model,memory_key="chat_history",return_messages=True,max_token_limit=500,)

@dosubot
Copy link

dosubot bot commented Oct 20, 2023

Hi, @dkiran1! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding of the issue, you reported a problem with enabling Conversation Memory in langchain. You mentioned that the bot is adding additional conversations by itself, leading to poor chat results. You implemented a custom function for get_chat_history to improve response accuracy, but there are still issues with certain questions not receiving continuous conversation answers. Additionally, you encountered a KeyError when trying to use map_reduce instead of stuff type.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution and we appreciate your understanding!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 20, 2023
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 27, 2023
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: memory Related to memory module 🤖:question A specific question about the codebase, product, project, or how to use a feature
Projects
None yet
Development

No branches or pull requests

2 participants