
Issue: trying to call generate_queries() on a MultiQueryRetriever but where do I get a run_manager from? #9231

Closed
maspotts opened this issue Aug 14, 2023 · 5 comments
Labels
Ɑ: models Related to LLMs or chat model modules 🤖:question A specific question about the codebase, product, project, or how to use a feature

Comments

@maspotts
Contributor

maspotts commented Aug 14, 2023

Issue you'd like to raise.

I'm trying to use MultiQueryRetriever to generate variations on a question, and it seems to work. However, I can't use it inside a chain created with load_qa_with_sources_chain(), because that chain expects a list of input_documents rather than a retriever. I also don't want to switch to RetrievalQAWithSourcesChain, because I want to keep implementing my own similarity search (e.g. I support switches to select between index.similarity_search() and index.max_marginal_relevance_search() on my index, and to specify the number of matches, k).

So I figured I could just call generate_queries() on my MultiQueryRetriever instance and then manually run my load_qa_with_sources_chain() chain for each variation. However, that method requires a run_manager, and I can't figure out how to create one. I already have the load_qa_with_sources_chain() chain: can I get a run_manager from that? Or, more generally, what's the best way to use MultiQueryRetriever while maintaining one's own code for fetching matching text snippets?

Suggestion:

No response

@dosubot dosubot bot added Ɑ: models Related to LLMs or chat model modules 🤖:question A specific question about the codebase, product, project, or how to use a feature labels Aug 14, 2023
@dosubot

dosubot bot commented Aug 14, 2023

🤖

Hello,

To create a run_manager for the generate_queries() method in a MultiQueryRetriever instance, you need to create an instance of CallbackManagerForRetrieverRun. This class is used to manage callbacks during the execution of the retriever run. Here is an example of how you can do this:

from langchain.callbacks.manager import CallbackManagerForRetrieverRun
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.llms.base import BaseLLM
from langchain.retrievers.base import BaseRetriever

# Placeholders: BaseLLM and BaseRetriever are abstract base classes and
# cannot be instantiated directly, so use concrete implementations here
llm = ...  # e.g. a ChatOpenAI instance
retriever = ...  # e.g. vectordb.as_retriever()

# Create an instance of MultiQueryRetriever
multi_query_retriever = MultiQueryRetriever.from_llm(retriever, llm)

# Create an instance of CallbackManagerForRetrieverRun
run_manager = CallbackManagerForRetrieverRun()

# Use the run_manager in generate_queries method
queries = multi_query_retriever.generate_queries("your question", run_manager)

For fetching matching text snippets and using load_qa_with_sources_chain(), you would need to implement your own code. The provided context does not include any information about load_qa_with_sources_chain(), so I can't provide a specific example of how to use it. However, you would typically use it in a similar way to generate_queries(), passing in the necessary arguments and using the returned results in your application.
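The manual loop you describe could be sketched roughly as follows. This is a hypothetical helper, not LangChain API: `index` and `chain` are stand-ins for your own vector index and the chain returned by load_qa_with_sources_chain(), and the chain-call signature follows the usual {"input_documents": ..., "question": ...} convention:

```python
# Hypothetical helper: run the QA chain once per generated query variation,
# using your own similarity search to fetch documents for each one.
def run_chain_per_query(queries, index, chain, k=4):
    results = []
    for q in queries:
        # swap in index.max_marginal_relevance_search(q, k=k) if preferred
        docs = index.similarity_search(q, k=k)
        results.append(chain({"input_documents": docs, "question": q}))
    return results
```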

If you want to use a custom prompt and output parser, you can create an instance of LLMChain with your custom prompt and output parser, and then pass this LLMChain instance to the MultiQueryRetriever. Here is an example:

from typing import List
from langchain import LLMChain
from pydantic import BaseModel, Field
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from langchain.retrievers.multi_query import MultiQueryRetriever

# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")

class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().split("\n")
        return LineList(lines=lines)

output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five 
    different versions of the given user question to retrieve relevant documents from a vector 
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search. 
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
llm = ChatOpenAI(temperature=0)

# Chain
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)

# Other inputs
question = "What are the approaches to Task Decomposition?"

# Run ("lines" is the key (attribute name) of the parsed output;
# vectordb is assumed to be an existing vector store instance)
retriever = MultiQueryRetriever(
    retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key="lines"
)
# Results
unique_docs = retriever.get_relevant_documents(
    query="What does the course say about regression?"
)
len(unique_docs)

I hope this helps! If you have any other questions, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@maspotts
Contributor Author

I see that CallbackManagerForRetrieverRun() actually requires 3 arguments: run_id, handlers and inheritable_handlers, so I am still not able to run generate_queries() on my MultiQueryRetriever instance. Does anyone know if I'm barking up the wrong tree here?

@maspotts
Contributor Author

maspotts commented Aug 15, 2023

I'm wondering whether I need to define a subclass of MultiQueryRetriever, in which I override _get_relevant_documents() to incorporate my own similarity search code?


dosubot bot commented Nov 14, 2023

Hi, @maspotts! I'm Dosu, and I'm here to help the LangChain team manage our backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are trying to use the MultiQueryRetriever in LangChain but are unsure how to obtain a run_manager for the generate_queries() method. I provided an example of how to create a run_manager using CallbackManagerForRetrieverRun. You then asked for clarification on the required arguments for CallbackManagerForRetrieverRun. Additionally, you mentioned considering overriding _get_relevant_documents() in a subclass of MultiQueryRetriever to incorporate your own similarity search code.

Could you please let us know if this issue is still relevant to the latest version of the LangChain repository? If it is, please comment on this issue to let us know. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding, and please don't hesitate to reach out if you have any further questions or concerns!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Nov 14, 2023
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 21, 2023
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Nov 21, 2023
@forviaVision

Hello, did this issue ever get resolved? I'm still running into the same problem with the three required arguments (run_id, handlers and inheritable_handlers). Thank you :-)
