
Is there any way to combine chatbot and question answering over docs? #2185

Closed
derekhsu opened this issue Mar 30, 2023 · 10 comments

Comments

@derekhsu

Hi, I read the docs and examples and then tried to make a chatbot and a question answering bot over docs. I wonder, is there any way to combine these two functions together?

From my point of view, it's basically a chatbot that uses the memory module to carry on a conversation with users. If the user asks a question, the chatbot retrieves docs based on embeddings and gets the answer.

Then I change the prompt of the conversation and add the answer to it, asking the chatbot to respond based on the memory and the answer. Will it work? Or is there a more convenient way or chain to combine these two types of bots?

@punitvara

punitvara commented Mar 30, 2023

I am working to make these two work together. I will update here if it works.

@sergerdn
Contributor

sergerdn commented Mar 30, 2023

@derekhsu

Here's a basic plan for you:

  1. The user submits a question to the frontend client application.
  2. The question is sent to the backend server over websockets.
  3. The backend server normalizes the user's question and uses OpenAI's GPT model to generate a condensed version of the question, using an LLMChain instance with the CONDENSE_PROMPT prompt.
  4. The server creates a Pinecone index to store embeddings of the text documents and retrieves the documents most similar to the user's condensed question, along with the condensed question itself and the chat history (if available).
  5. If the retrieved documents are not satisfactory, the server sends the condensed question and chat history to an LLMChain instance with the QA_PROMPT prompt to generate a better version of the user's question.
  6. The server merges or concatenates the responses generated by the GPT model using techniques like summarization, fusion, or generation, and sends the response back to the frontend client application over websockets for display to the user. (A rough code sketch of steps 3–5 follows this list.)
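A rough sketch of steps 3–5 in LangChain might look like the following. This is my illustration of the pattern, not sergerdn's code; the prompt texts, the answer_question helper, and the index name are assumptions made for the example.

from langchain.chains import LLMChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Pinecone

# Illustrative prompts; a real CONDENSE_PROMPT / QA_PROMPT would be tuned for the app.
CONDENSE_PROMPT = PromptTemplate.from_template(
    "Given the following conversation and a follow up question, rephrase the follow up "
    "question to be a standalone question.\n\nChat history:\n{chat_history}\n"
    "Follow up question: {question}\nStandalone question:"
)
QA_PROMPT = PromptTemplate.from_template(
    "Answer the question using only the context below.\n\nContext:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)

llm = OpenAI(temperature=0)
# Assumes pinecone.init(...) has been called and the index is already populated.
vectorstore = Pinecone.from_existing_index("my-docs-index", OpenAIEmbeddings())

condense_chain = LLMChain(llm=llm, prompt=CONDENSE_PROMPT)
qa_chain = LLMChain(llm=llm, prompt=QA_PROMPT)

def answer_question(question: str, chat_history: str) -> str:
    # Step 3: condense the follow-up question into a standalone question.
    standalone = condense_chain.run(question=question, chat_history=chat_history)
    # Step 4: retrieve the documents most similar to the condensed question.
    docs = vectorstore.similarity_search(standalone, k=4)
    context = "\n\n".join(d.page_content for d in docs)
    # Step 5: answer from the retrieved context.
    return qa_chain.run(question=standalone, context=context)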

I am working to make these two work together. I will update here if it works.

@punitvara, I have tested it a little bit with a proof of concept, and it works perfectly.

@punitvara

No need to use memory @sergerdn? We just need to use CONDENSE_PROMPT only? I am trying the above task with the QA_with_source chain.

@sergerdn
Contributor

sergerdn commented Mar 30, 2023

No need to use memory @sergerdn? We just need to use CONDENSE_PROMPT only? I am trying the above task with the QA_with_source chain.

@punitvara

I didn't use any memory storage because I only created a proof of concept. The user's browser sends the chat history to the backend directly, so when the user reloads the web page, all of their chat history is cleared.

Maybe using CONDENSE_PROMPT alone is not the best solution; I didn't consider that the chat history might be important at this stage. I didn't worry about many details since it was only a proof of concept.

I used Pinecone exclusively for vector storage, but I don't recommend it as a long-term storage solution for this use case. Rather, I used it solely for demonstration purposes.

There are several ways to accomplish the same outcome. However, I may not be able to provide extensive consultation, as my experience with OpenAI is limited.

Please note that the libraries I'm using have several bugs, so be prepared to encounter them.

@zakkl13

zakkl13 commented Mar 30, 2023

I have been working on this as well. I have something which technically works but isn't very good right now. The agent gets kind of wacky; it often retrieves good data but doesn't share it, or summarizes it too much. In theory this approach should work, and you should be able to add arbitrary tools (search, etc.).

I don't have a fully working code sample to share yet but essentially:

  1. A RetrievalQA chain on top of a vector DB is used as a Tool, with a description like "useful for when you need to answer questions about your_topic." Note: no memory is added to this chain.
  2. A ZeroShotAgent with chat_history in its prompt and memory added to it. This is created using AgentExecutor.from_agent_and_tools().

Code (doesn't compile as-is since you need to fill in your own vectordb, but it gives the idea):

from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.chains import LLMChain, RetrievalQA
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0, model_name='gpt-3.5-turbo')
vectordb = <create your vectordb>
memory = ConversationBufferMemory(memory_key='chat_history')

# QA chain over the vector store; note it has no memory of its own.
chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectordb.as_retriever())

tools = [
    Tool(
        name="Topic QA System",
        func=chain.run,
        description="useful for when you need to answer questions about <Topic>. This could include questions about <specific things related to Topic>, and more. Input should be a fully formed question."
    )
]

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!

{chat_history}
Question: {input}
{agent_scratchpad}"""

# The agent prompt includes chat_history, so the conversation memory is visible to the agent.
prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input", "chat_history", "agent_scratchpad"]
)

llm_chain = LLMChain(llm=llm, prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_exe = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory)
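Usage is then just repeated calls to the executor; the memory carries chat_history between turns (the questions below are placeholders, not part of the original snippet):

agent_exe.run(input="What does <Topic> say about X?")
agent_exe.run(input="And how does that compare to Y?")  # follow-up resolved via chat_history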

So far I have found gpt-3.5-turbo to be more effective than the default.

Cheers, and looking forward to others contributing! This seems like a major use case.

@sergerdn
Contributor

Cheers, and looking forward to others contributing! This seems like a major use case.
@zakkl13

Take a look at the world of JavaScript. There are a few solutions available, but I believe they may not be ready for production yet, as they are still proofs of concept.

@derekhsu
Author

https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html

Actually, I guess this is the answer that mixes chat history and the knowledge base. When did this come up?
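For reference, a minimal sketch of that pattern with ConversationalRetrievalChain (the store setup and questions here are illustrative assumptions, not derekhsu's setup):

from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma

# Example store; any populated vector store works here.
vectordb = Chroma(persist_directory="db", embedding_function=OpenAIEmbeddings())

qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectordb.as_retriever())

chat_history = []
result = qa({"question": "What does the doc say about X?", "chat_history": chat_history})
chat_history.append(("What does the doc say about X?", result["answer"]))

# Follow-up questions are condensed together with the chat history before retrieval.
result = qa({"question": "Can you expand on that?", "chat_history": chat_history})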

@tbdnoticer

tbdnoticer commented May 7, 2023

Perhaps not the most valuable input to this conversation, but:

For a non-agent, non-server approach, I think the simple solution is:

  • have the chatbot build up a chat history memory for whatever topic the user is interested in
  • when the time comes to use the vector DB to answer a precise question, extract the chat history memory and feed it into the QA retrieval as a question plus chat-history context (a rough sketch follows this list)
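A rough sketch of that split, assuming a vectordb vector store already exists (the names and prompt wording are illustrative, not tbdnoticer's actual code):

from langchain.chains import ConversationChain, RetrievalQA
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="history")

# Chit-chat path: the memory is tied only to the conversation chain.
chat = ConversationChain(llm=llm, memory=memory)

# Docs path: a plain RetrievalQA chain with no memory of its own.
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectordb.as_retriever())

def answer_docs_question(question: str) -> str:
    # Only when the vector DB is needed: flatten the chat memory into the query.
    history = memory.load_memory_variables({})["history"]
    query = f"Conversation so far:\n{history}\n\nQuestion: {question}"
    return qa_chain.run(query)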

That's how I'm dealing with this problem, at least. I think the chat_vector_db approach that derekhsu posted is probably the closest thing to my simple use case. The problem with it, for me, is that the chat history is ALWAYS tied to the retrieval mechanism, whereas in my case the chat history should ALWAYS be tied to the chat portion of the application, and only tied to the retrieval portion when the vector DB is needed.

Anyway, again, maybe not super relevant, but I can't find anywhere else to discuss this.

@dosubot

dosubot bot commented Sep 21, 2023

Hi, @derekhsu! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you were asking for advice on combining a chatbot and a question answering bot that retrieves information from documents based on embeddings. It looks like there have been some discussions and progress made by users "punitvara" and "sergerdn" in making these two bots work together. User "zakkl13" also shared their approach using a RetrievalQA chain and a ZeroShotAgent. Additionally, user "eddiesaltaccount" suggested a simple solution of using the chat history memory for the topic of interest when utilizing the vector DB for answering precise questions.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and contributions to the LangChain project!

dosubot added the "stale" label on Sep 21, 2023
dosubot closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 28, 2023
dosubot removed the "stale" label on Sep 28, 2023
@derekhsu
Author

Thanks for your effort. I think this should be enough for my request.
