Combine langchain retrievers #14082

RishiMalhotra920 · 2023-11-30T19:39:09Z

Issue you'd like to raise.

I am using ConversationalRetrievalChain. I have created two vector stores, and I want each query from ConversationalRetrievalChain to be compared against both of them, with the results from both stores used to create the final answer.

So I have decided to create two retrievers:

retriever1 = vectorstore1.as_retriever()
retriever2 = vectorstore2.as_retriever()

How can I now subclass the retriever base class so that a query against my custom_retriever is compared against the documents of both retrievers, and the documents from both retrievers are used to build the prompt?

Note: I don't want to merge the vector stores because that messes up the similarity search.
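One way to make the concern about merging concrete: with one combined index and a single top-k, the best-scoring chunks can all come from one store, so the other store never reaches the prompt. Querying each store separately and keeping each store's own top-k avoids that. The following is a stdlib-only toy sketch, assuming made-up 2-D embeddings; the dict "stores" and the helpers `cosine`, `top_k`, and `top_k_per_store` are hypothetical and are not LangChain APIs.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy "vector store": document text -> embedding vector.
Store = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(store: Store, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """The k documents in one store most similar to the query, best first."""
    scored = [(doc, cosine(vec, query)) for doc, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def top_k_per_store(stores: List[Store], query: Sequence[float], k: int) -> List[str]:
    """Search each store on its own and concatenate every store's top-k."""
    results: List[str] = []
    for store in stores:
        results.extend(doc for doc, _ in top_k(store, query, k))
    return results
```

With store1 = {"a1": [1.0, 0.0], "a2": [0.9, 0.1]}, store2 = {"b1": [0.0, 1.0], "b2": [0.5, 0.5]} and the query [1.0, 0.2], a global top-2 over the combined dict returns only a2 and a1, while top_k_per_store with k=1 returns a2 and b2, so both stores are represented.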

Suggestion:

No response

dosubot · 2023-11-30T19:39:19Z

🤖

Hello @RishiMalhotra920,

Great to see you again! I hope you're having a good day. Let's look into your issue with LangChain.

Based on the information you've provided and the similar issues I found in the LangChain repository, you can create a custom retriever that inherits from the BaseRetriever class and overrides the _get_relevant_documents method. This method should use your existing retrievers to get the documents and then return them. Here is an example:

from langchain.schema.retriever import BaseRetriever, Document
from typing import List
from langchain.callbacks.manager import CallbackManagerForRetrieverRun

class CustomRetriever(BaseRetriever):
    def __init__(self, retrievers):
        self.retrievers = retrievers

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Use your existing retrievers to get the documents
        documents = []
        for i, retriever in enumerate(self.retrievers):
            documents.extend(retriever.get_relevant_documents(query, callbacks=run_manager.get_child(f"retriever_{i+1}")))
        
        return documents

You can then use this custom retriever in your ConversationalRetrievalChain like this:

from langchain.chains import ConversationalRetrievalChain

custom_retriever = CustomRetriever([retriever1, retriever2])
chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=custom_retriever)

This will ensure that when the query is compared against your custom_retriever, it is compared against documents from both retrievers, and documents from both retrievers are used to create the prompt.

Please note that subclasses of BaseRetriever should now implement the _get_relevant_documents and _aget_relevant_documents methods; overriding the public get_relevant_documents and aget_relevant_documents methods in a subclass is deprecated. The underscored methods also take an additional run_manager argument.
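The reason to implement the underscored method is a template-method split: the public method does the shared bookkeeping (in LangChain, callbacks and tracing) and then calls the hook that subclasses fill in. Here is a toy, stdlib-only sketch of that split; `ToyBaseRetriever`, `ListRetriever`, and `CombinedRetriever` are hypothetical stand-ins, not LangChain's classes, and a simple call counter stands in for the callback handling.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyBaseRetriever(ABC):
    """Hypothetical stand-in for a retriever base class (not LangChain's code)."""

    calls: int = 0  # class default; incremented per instance, standing in for callbacks

    def get_relevant_documents(self, query: str) -> List[str]:
        # Public entry point: shared bookkeeping happens here, then the hook runs.
        self.calls += 1
        return self._get_relevant_documents(query)

    @abstractmethod
    def _get_relevant_documents(self, query: str) -> List[str]:
        """Subclasses implement only the retrieval logic."""


class ListRetriever(ToyBaseRetriever):
    """Returns stored documents that contain the query (case-insensitive)."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs

    def _get_relevant_documents(self, query: str) -> List[str]:
        return [doc for doc in self.docs if query.lower() in doc.lower()]


class CombinedRetriever(ToyBaseRetriever):
    """Queries every wrapped retriever and concatenates their results."""

    def __init__(self, retrievers: List[ToyBaseRetriever]) -> None:
        self.retrievers = retrievers

    def _get_relevant_documents(self, query: str) -> List[str]:
        documents: List[str] = []
        for retriever in self.retrievers:
            # Go through the public method so each child's bookkeeping still runs.
            documents.extend(retriever.get_relevant_documents(query))
        return documents
```

Calling the children's public method from inside the hook mirrors what the CustomRetriever above does when it passes run_manager.get_child(...) as callbacks: each child run is still recorded.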

You can find more information about creating a custom retriever in this issue.

I hope this helps! If you have any other questions, feel free to ask.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

RishiMalhotra920 · 2023-11-30T20:10:31Z

@dosu-bot

I tried running your code and get this issue:

"CustomRetriever" object has no field "retrievers".

Can you modify your code?

dosubot · 2023-11-30T20:18:18Z

🤖

Based on the error message you're seeing, the problem is not the missing super().__init__() call by itself. BaseRetriever is a pydantic model, and pydantic only allows assigning attributes that are declared as fields on the class. Because retrievers is assigned in __init__ without being declared, the assignment is rejected with the "object has no field" error, whether or not the parent __init__ has been called.

The fix is to declare retrievers as a class-level field and drop the custom __init__. Pydantic then generates a constructor that takes retrievers as a keyword argument:

from typing import List

from langchain.callbacks.manager import CallbackManagerForRetrieverRun
from langchain.schema import BaseRetriever, Document

class CustomRetriever(BaseRetriever):
    # Declared pydantic field; assigning an undeclared attribute raises
    # '"CustomRetriever" object has no field "retrievers"'
    retrievers: List[BaseRetriever]

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        # Query each wrapped retriever and concatenate the results
        documents = []
        for i, retriever in enumerate(self.retrievers):
            documents.extend(
                retriever.get_relevant_documents(
                    query, callbacks=run_manager.get_child(f"retriever_{i+1}")
                )
            )
        return documents

custom_retriever = CustomRetriever(retrievers=[retriever1, retriever2])

This should fix the error you're seeing. If you're still having issues, please provide more details about how you're using the CustomRetriever class and I'll be happy to help further.
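The error text itself comes from pydantic: LangChain's retrievers were v1-style pydantic models at the time, and assigning an attribute that is not a declared field raises ValueError, whether or not super().__init__() was called. A toy, stdlib-only imitation of that rule shows why declaring the field is what matters; `StrictModel`, `UndeclaredRetriever`, and `DeclaredRetriever` are hypothetical and are neither pydantic nor LangChain code.

```python
from typing import Any, ClassVar, FrozenSet, List


class StrictModel:
    """Toy imitation of pydantic v1's rule that only declared fields may be set.

    Not pydantic itself: just enough to reproduce the error message.
    """

    declared_fields: ClassVar[FrozenSet[str]] = frozenset()

    def __init__(self, **data: Any) -> None:
        for name, value in data.items():
            setattr(self, name, value)

    def __setattr__(self, name: str, value: Any) -> None:
        if name not in self.declared_fields:
            raise ValueError(f'"{type(self).__name__}" object has no field "{name}"')
        object.__setattr__(self, name, value)


class UndeclaredRetriever(StrictModel):
    def __init__(self, retrievers: List[Any]) -> None:
        super().__init__()  # calling the parent __init__ does not help...
        self.retrievers = retrievers  # ...because "retrievers" is not a declared field


class DeclaredRetriever(StrictModel):
    # Declaring the field is the fix (in LangChain: `retrievers: List[BaseRetriever]`).
    declared_fields = frozenset({"retrievers"})
```

In the real class, the equivalent of DeclaredRetriever is the class-level annotation retrievers: List[BaseRetriever], constructed as CustomRetriever(retrievers=[retriever1, retriever2]).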

Regarding the similar issues in the LangChain repository: they concern MultiRetrievalQAChain and FAISS retrievers, so they might not apply directly to your case, but they do show how retrievers are used in LangChain. For example, the issue "Abstract Class Implementation problem in Retrievers" suggests that FAISS retrievers can be passed directly to MultiRetrievalQAChain without wrapping them in a custom class. Note, however, that MultiRetrievalQAChain routes each question to a single retriever rather than combining the results of several.
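The difference between routing (pick one retriever per question) and merging (query all of them) can be shown as a stdlib-only toy sketch; `route`, `merge`, and the `pick` function, which stands in for the LLM-based router, are hypothetical and are not LangChain APIs.

```python
from typing import Callable, Dict, List

# Hypothetical retriever: any function from a query to a list of documents.
Retriever = Callable[[str], List[str]]


def route(query: str, retrievers: Dict[str, Retriever], pick: Callable[[str], str]) -> List[str]:
    """Routing: `pick` chooses one retriever name, and only that retriever is queried."""
    return retrievers[pick(query)](query)


def merge(query: str, retrievers: Dict[str, Retriever]) -> List[str]:
    """Merging: every retriever is queried and all results are kept, in dict order."""
    documents: List[str] = []
    for retriever in retrievers.values():
        documents.extend(retriever(query))
    return documents
```

With routing, a question whose answer needs both stores only ever sees one of them, which is why a merging retriever fits the original request better.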

I hope this helps! Let me know if you have any other questions.


GMartin-dev · 2023-12-02T05:28:35Z

@RishiMalhotra920 you can use MergerRetriever, which already does this job for you:
https://python.langchain.com/docs/integrations/retrievers/merger_retriever
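My understanding from the MergerRetriever source is that it queries every retriever and then interleaves the results round-robin (the first result of each retriever, then the second of each, and so on) instead of simply concatenating them. A stdlib-only sketch of that ordering; `interleave` is a hypothetical helper, not the LangChain function.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave(result_lists: Sequence[Sequence[T]]) -> List[T]:
    """Round-robin merge: 1st result of each list, then the 2nd of each, and so on."""
    merged: List[T] = []
    longest = max((len(results) for results in result_lists), default=0)
    for i in range(longest):
        for results in result_lists:
            if i < len(results):  # shorter lists simply run out
                merged.append(results[i])
    return merged
```

For example, interleave([["a1", "a2", "a3"], ["b1"]]) gives ["a1", "b1", "a2", "a3"]. In LangChain itself the equivalent is MergerRetriever(retrievers=[retriever1, retriever2]), imported from langchain.retrievers.merger_retriever.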

ssk2706 · 2023-12-05T12:34:03Z

@GMartin-dev when I merge two retrievers, ConversationalRetrievalChain only gives me answers based on the first retriever in the list.

For example, if I do this:

merger = MergerRetriever(retrievers=[vectordb_ret, db1_ret])

then the LLM only gives correct answers when the query is about vectordb_ret. Once I ask a question related to db1_ret, it doesn't give back the correct answer. Please help.

GMartin-dev · 2023-12-05T18:39:21Z


A lot of things could be happening:

The second retriever is not retrieving the relevant documents correctly.
The model may not be capable enough to figure out the answer, or the prompt is too complex.
The context could be too long, and you are hitting a "lost in the middle" issue.
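One quick way to test the first and third points is to check which sources actually survive into the prompt once a length limit is applied: if documents are concatenated and the list is cut from the end, the second retriever's documents are the first to go. A stdlib-only sketch, assuming each retrieved document is tagged with a source label; `fit_to_budget` and `count_by_source` are hypothetical helpers, and a character budget stands in for a token limit.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical retrieved document: (source label, text).
Doc = Tuple[str, str]


def fit_to_budget(docs: List[Doc], max_chars: int) -> List[Doc]:
    """Keep documents in order, stopping at the first one that would exceed the budget."""
    kept: List[Doc] = []
    used = 0
    for source, text in docs:
        if used + len(text) > max_chars:
            break
        kept.append((source, text))
        used += len(text)
    return kept


def count_by_source(docs: List[Doc]) -> Dict[str, int]:
    """How many documents from each source actually reach the prompt."""
    return dict(Counter(source for source, _ in docs))
```

With two 40-character documents from each of db1 and db2 and a 100-character budget, the concatenated order keeps only db1 documents, while an interleaved order keeps one from each. In a real chain, the same count can be run on the documents the retriever returns, or on the chain's source documents when return_source_documents=True is set; ConversationalRetrievalChain's optional max_tokens_limit, when set, also trims documents from the end of the list.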

Do you have a link to a LangSmith run so I can debug it and take a look at the prompt?

ssk2706 · 2023-12-06T09:04:19Z

@GMartin-dev No, I don't have access to LangSmith.

ssk2706 · 2023-12-06T09:30:56Z

@GMartin-dev I am using a ChromaDB vector store. Is that what's causing the problem?

dosubot · 2024-03-11T16:03:54Z

Hi, @RishiMalhotra920

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you are looking to combine two vector stores in ConversationalRetrievalChain to compare the query against both stores and use results from both to create the final answer. There have been discussions and suggestions around creating a custom retriever and using a merger retriever, with some troubleshooting suggestions provided.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

dosubot added the labels "Ɑ: vector store", "🤖:improvement", and "🤖:question" on Nov 30, 2023.

dosubot added the "stale" label on Mar 11, 2024.

dosubot closed this as not planned on Mar 18, 2024.

dosubot removed the "stale" label on Mar 18, 2024.
