Combine langchain retrievers #14082

Closed
RishiMalhotra920 opened this issue Nov 30, 2023 · 9 comments
Labels
🤖:improvement Medium size change to existing code to handle new use-cases 🤖:question A specific question about the codebase, product, project, or how to use a feature Ɑ: vector store Related to vector store module

Comments

@RishiMalhotra920

Issue you'd like to raise.

I am using ConversationalRetrievalChain. I have created two vector stores and I want the query from ConversationalRetrievalChain to be compared against both vector stores and results from both vector stores to be used to create the final answer.

So I have decided to create two retrievers

retriever1 = vectorstore1.as_retriever()
retriever2 = vectorstore2.as_retriever()

How can I now override the Retriever class so that when the query is run against my custom_retriever, it is compared against documents from both retrievers, and documents from both retrievers are used to create the prompt?

Note: I don't want to merge the vector stores because that messes up the similarity search.

Suggestion:

No response


dosubot bot commented Nov 30, 2023

🤖

Hello @RishiMalhotra920,

Great to see you again! I hope you're having a good day. Let's look into your issue with LangChain.

Based on the information you've provided and the similar issues I found in the LangChain repository, you can create a custom retriever that inherits from the BaseRetriever class and overrides the _get_relevant_documents method. This method should use your existing retrievers to get the documents and then return them. Here is an example:

from langchain.schema.retriever import BaseRetriever, Document
from typing import List
from langchain.callbacks.manager import CallbackManagerForRetrieverRun

class CustomRetriever(BaseRetriever):
    def __init__(self, retrievers):
        self.retrievers = retrievers

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Use your existing retrievers to get the documents
        documents = []
        for i, retriever in enumerate(self.retrievers):
            documents.extend(retriever.get_relevant_documents(query, callbacks=run_manager.get_child(f"retriever_{i+1}")))
        
        return documents

You can then use this custom retriever in your ConversationalRetrievalChain like this:

custom_retriever = CustomRetriever([retriever1, retriever2])
chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=custom_retriever)

This will ensure that when the query is compared against your custom_retriever, it is compared against documents from both retrievers, and documents from both retrievers are used to create the prompt.

Please note that overriding the get_relevant_documents and aget_relevant_documents methods directly in a BaseRetriever subclass is deprecated. Subclasses should implement _get_relevant_documents (and, optionally, _aget_relevant_documents) instead. These methods take an additional run_manager argument.

You can find more information about creating a custom retriever in this issue.

I hope this helps! If you have any other questions, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:improvement Medium size change to existing code to handle new use-cases 🤖:question A specific question about the codebase, product, project, or how to use a feature labels Nov 30, 2023
@RishiMalhotra920
Author

@dosu-bot

I tried running your code and got this error:

"CustomRetriever" object has no field "retrievers".

Can you modify your code?


dosubot bot commented Nov 30, 2023

🤖

Based on the error message you're seeing, the problem is that BaseRetriever is a pydantic model. Pydantic only allows assignment to attributes that are declared as fields on the class, so self.retrievers = retrievers inside a custom __init__ fails with 'object has no field "retrievers"'. Calling super().__init__() on its own does not help, because retrievers would still not be a declared field.

Instead of overriding __init__, declare retrievers as a class-level field and let pydantic generate the constructor. Here's how you can modify your CustomRetriever class:

class CustomRetriever(BaseRetriever):
    # Declared as a pydantic field so it can be set through the constructor
    retrievers: List[BaseRetriever]

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Query each wrapped retriever in turn and concatenate the results
        documents = []
        for i, retriever in enumerate(self.retrievers):
            documents.extend(
                retriever.get_relevant_documents(
                    query, callbacks=run_manager.get_child(f"retriever_{i+1}")
                )
            )

        return documents

Because pydantic constructors take keyword arguments, create the retriever with CustomRetriever(retrievers=[retriever1, retriever2]). This should fix the error you're seeing. If you're still having issues, please provide more details about how you're using the CustomRetriever class and I'll be happy to help further.

Regarding the similar issues in the LangChain repository, they seem to be related to the MultiRetrievalQAChain and the use of FAISS retrievers, which might not be directly applicable to your case. However, they do provide some insights into how retrievers are used in LangChain. For example, the Issue: Abstract Class Implementation problem in Retrievers suggests that you can directly use the FAISS retrievers when initializing the MultiRetrievalQAChain, without needing to wrap them in a custom class.

I hope this helps! Let me know if you have any other questions.


@GMartin-dev
Contributor

GMartin-dev commented Dec 2, 2023

@RishiMalhotra920 you can use MergerRetriever, which already does this job for you.
https://python.langchain.com/docs/integrations/retrievers/merger_retriever
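
As far as I understand it, MergerRetriever queries every retriever in its list and then interleaves the results round-robin, rather than appending one list after the other. The sketch below illustrates that merging idea in plain Python only; it is not the library's code, and interleave and the sample hit lists are hypothetical names made up for the illustration.

```python
from itertools import zip_longest
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Sentinel used to pad shorter result lists; never returned to the caller.
_MISSING = object()


def interleave(result_lists: Sequence[Sequence[T]]) -> List[T]:
    """Merge several ranked result lists round-robin.

    Takes the first hit of every list, then the second hit of every list,
    and so on, skipping lists that have run out of results.
    """
    merged: List[T] = []
    for group in zip_longest(*result_lists, fillvalue=_MISSING):
        merged.extend(item for item in group if item is not _MISSING)
    return merged


# Hypothetical hits from two separate vector stores (strings stand in for documents).
store_a_hits = ["a1", "a2", "a3"]
store_b_hits = ["b1"]

print(interleave([store_a_hits, store_b_hits]))  # ['a1', 'b1', 'a2', 'a3']
```

With this kind of merge, the top hit of each store ends up near the front of the context, whereas plain concatenation pushes every hit from the second store behind all hits from the first.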

@ssk2706

ssk2706 commented Dec 5, 2023

@GMartin-dev when I merge two retrievers, the answers I get from ConversationalRetrievalChain are based only on the first retriever in the list.

For example, if I do this:
merger = MergerRetriever(retrievers=[vectordb_ret, db1_ret])

then the LLM only gives good answers when the query is about vectordb_ret. When I ask a question related to db1_ret, it doesn't give back the correct answer. Please help.

@GMartin-dev
Contributor

A number of things could be going on:

  • The second retriever is not returning relevant documents.
  • The model may not be good enough to work out the answer, or the prompt may be too complex.
  • The context could be too long, so you are hitting a "lost in the middle" issue.

Do you have a link to a LangSmith run so we can debug it and take a look at the prompt?
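
For the "lost in the middle" possibility, one common mitigation is to reorder the retrieved documents so that the most relevant ones sit at the start and end of the context and the least relevant ones sit in the middle. The following is a minimal sketch of that idea in plain Python. The function name reorder_for_long_context is hypothetical, and it assumes the input is already sorted from most to least relevant, which is not automatically true for results merged from two retrievers. LangChain ships a LongContextReorder document transformer that, as far as I know, serves the same purpose.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_long_context(docs_by_relevance: Sequence[T]) -> List[T]:
    """Put the most relevant items at both ends and the least relevant in the middle.

    Assumes the input is sorted from most to least relevant.
    """
    front: List[T] = []
    back: List[T] = []
    for rank, doc in enumerate(docs_by_relevance):
        # Even ranks fill the front half, odd ranks fill the back half.
        if rank % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so relevance rises again towards the end.
    return front + back[::-1]


# Strings stand in for documents; d0 is the most relevant, d4 the least.
print(reorder_for_long_context(["d0", "d1", "d2", "d3", "d4"]))
# ['d0', 'd2', 'd4', 'd3', 'd1']
```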

@ssk2706

ssk2706 commented Dec 6, 2023

@GMartin-dev No, I don't have access to LangSmith.
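
Without LangSmith, the first possibility raised above (the second retriever not returning relevant documents) can still be checked by querying each retriever on its own and looking at what comes back. Below is a rough sketch of such a check. It only assumes that each retriever exposes get_relevant_documents(query) and that each returned document has a page_content attribute, as LangChain retrievers and documents do. inspect_retrievers, FakeDoc and FakeRetriever are hypothetical names used so the example runs without any vector store.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


def inspect_retrievers(
    retrievers: Dict[str, Any], query: str, preview_chars: int = 80
) -> Dict[str, List[str]]:
    """Run the same query against each retriever and print a preview of its hits."""
    previews: Dict[str, List[str]] = {}
    for name, retriever in retrievers.items():
        docs = retriever.get_relevant_documents(query)
        previews[name] = [doc.page_content[:preview_chars] for doc in docs]
        print(f"{name}: {len(docs)} document(s)")
        for text in previews[name]:
            print(f"  - {text}")
    return previews


# Stand-ins for real documents and retrievers (assumptions for this demo only).
@dataclass
class FakeDoc:
    page_content: str


class FakeRetriever:
    def __init__(self, texts: List[str]) -> None:
        self._texts = texts

    def get_relevant_documents(self, query: str) -> List[FakeDoc]:
        return [FakeDoc(text) for text in self._texts]


inspect_retrievers(
    {
        "vectordb_ret": FakeRetriever(["Refund policy: 30 days."]),
        "db1_ret": FakeRetriever([]),
    },
    query="What is the refund policy?",
)
```

If db1_ret returns nothing, or only unrelated text, for a question it should be able to answer, the problem is in that retriever or its index rather than in the merge step or the LLM.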

@ssk2706

ssk2706 commented Dec 6, 2023

@GMartin-dev I am using a ChromaDB vector store. Is that what is causing the problem?

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 11, 2024

dosubot bot commented Mar 11, 2024

Hi, @RishiMalhotra920

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you are looking to combine two vector stores in ConversationalRetrievalChain to compare the query against both stores and use results from both to create the final answer. There have been discussions and suggestions around creating a custom retriever and using a merger retriever, with some troubleshooting suggestions provided.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

@dosubot dosubot bot closed this as not planned Mar 18, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 18, 2024
3 participants