MultiQueryRetriever is failing #17352

Closed
4 tasks done
jswift24 opened this issue Feb 9, 2024 · 13 comments · Fixed by #17434
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: openai Primarily related to OpenAI integrations Ɑ: retriever Related to retriever module

Comments

@jswift24

jswift24 commented Feb 9, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.

Example Code

import os, dotenv, openai
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.retrievers import MultiQueryRetriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

#API Key
dotenv.load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

#Load and split docs
documents = WebBaseLoader("https://en.wikipedia.org/wiki/New_York_City").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 50)
documents = text_splitter.split_documents(documents)
vector_store = FAISS.from_documents(documents, OpenAIEmbeddings())
retriever = vector_store.as_retriever()

#MultiQueryRetriever
primary_qa_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
advanced_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=primary_qa_llm)
print(advanced_retriever.get_relevant_documents("Where is nyc?"))

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\pydantic\v1\main.py", line 522, in parse_obj
    obj = dict(obj)
          ^^^^^^^^^
TypeError: 'int' object is not iterable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\output_parsers\pydantic.py", line 25, in parse_result
    return self.pydantic_object.parse_obj(json_object)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\pydantic\v1\main.py", line 525, in parse_obj
    raise ValidationError([ErrorWrapper(exc, loc=ROOT_KEY)], cls) from e
pydantic.v1.error_wrappers.ValidationError: 1 validation error for LineList
__root__
  LineList expected dict not int (type=type_error)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\Documents-Alon\MapleRAG\ragas-tutorial\ragas-debug.py", line 28, in <module>
    print(advanced_retriever.get_relevant_documents("Who are you?"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain_core\retrievers.py", line 224, in get_relevant_documents
    raise e
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain_core\retrievers.py", line 217, in get_relevant_documents
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\retrievers\multi_query.py", line 172, in _get_relevant_documents
    queries = self.generate_queries(query, run_manager)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\retrievers\multi_query.py", line 189, in generate_queries
    response = self.llm_chain(
               ^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain_core\_api\deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\base.py", line 363, in __call__
    return self.invoke(
           ^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\base.py", line 162, in invoke
    raise e
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\base.py", line 156, in invoke
    self._call(inputs, run_manager=run_manager)
    return self.create_outputs(response)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\llm.py", line 258, in create_outputs
             ^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\llm.py", line 261, in <listcomp>
    self.output_key: self.output_parser.parse_result(generation),
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\output_parsers\pydantic.py", line 29, in parse_result
    raise OutputParserException(msg, llm_output=json_object)
langchain_core.exceptions.OutputParserException: Failed to parse LineList from completion 1. Got: 1 validation error for LineList
__root__
  LineList expected dict not int (type=type_error)

Description

I'm trying to use MultiQueryRetriever and getting the error above.
The base retriever works on its own, and so does primary_qa_llm.
I'm using Windows.

System Info

> pip list | findstr /i "langchain"
langchain                 0.1.6
langchain-community       0.0.19
langchain-core            0.1.22
langchain-openai          0.0.5
langchainhub              0.1.14

Platform: Windows
Python version: 3.11.7


dosubot bot commented Feb 9, 2024

🤖

Hey there @jswift24! Good to see you diving into another LangChain adventure. 🚀

Thank you for providing detailed information about the issue you're facing. The error message you're seeing is related to the Pydantic library, which is used in LangChain for data validation and settings management. The error message suggests that the LineList object is expecting a dictionary, but it's receiving an integer instead.

This error might be occurring in the generate_queries method of the MultiQueryRetriever class. This method uses the language model to generate additional queries based on the original user query. The generated queries are then parsed using the parse_result method of the PydanticOutputParser class. If the output of the language model is not in the expected format, parse_result raises a ValidationError, which is re-raised as an OutputParserException.
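The flow described here (ask the LLM for alternative questions, split them into a list, retrieve for each, merge the results) can be sketched without LangChain. This is a minimal, stdlib-only sketch under stated assumptions: `fake_llm`, `fake_retrieve`, and `FAKE_INDEX` are hypothetical stand-ins for the chat model and the FAISS retriever, and the "unique union in first-seen order" merge is this sketch's reading of how MultiQueryRetriever combines results, not a copy of its code.

```python
from typing import Callable, Dict, List


def generate_queries(question: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the (stand-in) LLM for alternative questions and split the reply on newlines."""
    reply = llm(question)
    return [line.strip() for line in reply.strip().split("\n") if line.strip()]


def multi_query_retrieve(
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve documents for every generated query and return their unique union."""
    seen: Dict[str, None] = {}  # dict keys keep first-seen order and drop duplicates
    for query in generate_queries(question, llm):
        for doc in retrieve(query):
            seen.setdefault(doc, None)
    return list(seen)


# Hypothetical stand-ins for the LLM and the base retriever.
def fake_llm(question: str) -> str:
    return "Where is New York City located?\nWhat state is NYC in?\n"


FAKE_INDEX = {
    "Where is New York City located?": ["doc-geography", "doc-overview"],
    "What state is NYC in?": ["doc-overview", "doc-state"],
}


def fake_retrieve(query: str) -> List[str]:
    return FAKE_INDEX.get(query, [])


print(multi_query_retrieve("Where is nyc?", fake_llm, fake_retrieve))
# ['doc-geography', 'doc-overview', 'doc-state']
```

The parsing step in the middle is the part that breaks in this issue: if the reply is not turned into a list of lines, there is nothing to retrieve with.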

Here's the relevant part of the code from the MultiQueryRetriever class:

def generate_queries(self, query: str, run_manager: CallbackManagerForRetrieverRun) -> List[str]:
    response = self.llm_chain(
        {self.input_key: query},
        run_manager=run_manager,
    )
    return [line.text for line in response[self.output_key]]

And here's the parse_result method from the PydanticOutputParser class:

def parse_result(self, json_object: Any) -> Any:
    try:
        return self.pydantic_object.parse_obj(json_object)
    except ValidationError as e:
        msg = f"Failed to parse {self.pydantic_object.__name__} from completion {json_object}. Got: {e}"
        raise OutputParserException(msg, llm_output=json_object)

To resolve this issue, you might need to check the output of the language model to ensure it's in the expected format. If the language model is not returning the expected output, you might need to adjust its parameters or use a different model.

If you're still facing the issue after checking the language model's output, it would be helpful if you could provide more information about the language model you're using and its output. This will help us understand the issue better and provide a more accurate solution.

For more information about the MultiQueryRetriever class, the output parser classes, and their methods, refer to the LangChain repository.


@dosubot dosubot bot added Ɑ: retriever Related to retriever module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: openai Primarily related to OpenAI integrations labels Feb 9, 2024
@jswift24
Author

This is not useful. The LM is fine. The problem is with LangChain code.

@GMartin-dev
Contributor

I'm having this issue too. This seems like a regression introduced by #16914.

It seems that the LineList output parser used in the MultiQueryRetriever chain, which extends PydanticOutputParser, now depends on JsonOutputParser, and JsonOutputParser fails to parse the generation coming back from the LLM.
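The thread doesn't show exactly how the JSON step handled the completion, but the "completion 1" and "expected dict not int" in the tracebacks suggest the LLM's (presumably numbered) list collapsed to the integer 1. A minimal sketch of how that can happen, using Python's `json.JSONDecoder.raw_decode` as a stand-in for a lenient JSON parser (an assumption, not LangChain's actual code); `completion` is a hypothetical LLM reply and `validate_line_list` only mimics the pydantic check.

```python
import json

# Hypothetical completion in the format the default prompt asks for:
# alternative questions separated by newlines (here also numbered).
completion = (
    "1. Where is New York City located?\n"
    "2. In which US state is NYC?\n"
    "3. What is the geographic location of New York City?"
)


def lenient_json_parse(text: str):
    """Stand-in for a lenient JSON step: decode the first JSON value at the
    start of the text and silently ignore everything after it."""
    value, _end = json.JSONDecoder().raw_decode(text.strip())
    return value


def validate_line_list(obj):
    """Mimics a model that expects a dict such as {"lines": [...]}."""
    if not isinstance(obj, dict):
        raise TypeError(f"LineList expected dict not {type(obj).__name__}")
    return obj["lines"]


parsed = lenient_json_parse(completion)
print(parsed)  # 1: "1." followed by a space is read as the JSON number 1

try:
    validate_line_list(parsed)
except TypeError as exc:
    print(exc)  # LineList expected dict not int
```

Plain newline splitting, which is what the LineList parser is meant to do, never goes through JSON and so cannot hit this failure.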

@plaban1981

I am facing the same issue too:

from langchain.prompts import PromptTemplate
from langchain.retrievers.multi_query import MultiQueryRetriever

prompt_template = """You are an AI language model assistant.

Your task is to generate 3 different versions of the given user question to retrieve relevant documents from a vector database.

By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of distance-based similarity search.

Provide these alternative questions separated by newlines.

Original question: {question}"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["question"]
)

# vectorstore and openai_llm are created earlier in the notebook (not shown)
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=openai_llm, prompt=PROMPT
)

TypeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py in parse_obj(cls, obj)
521 try:
--> 522 obj = dict(obj)
523 except (TypeError, ValueError) as e:

TypeError: 'int' object is not iterable

The above exception was the direct cause of the following exception:

ValidationError Traceback (most recent call last)
14 frames
ValidationError: 1 validation error for LineList
__root__
LineList expected dict not int (type=type_error)

During handling of the above exception, another exception occurred:

OutputParserException Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/langchain/output_parsers/pydantic.py in parse_result(self, result, partial)
27 name = self.pydantic_object.__name__
28 msg = f"Failed to parse {name} from completion {json_object}. Got: {e}"
---> 29 raise OutputParserException(msg, llm_output=json_object)
30
31 def get_format_instructions(self) -> str:

OutputParserException: Failed to parse LineList from completion 1. Got: 1 validation error for LineList
__root__
LineList expected dict not int (type=type_error)

@baskaryan
Collaborator

Just to check that #16914 is the root cause: can someone confirm that downgrading to langchain-core==0.1.21 fixes the issue?

@baskaryan
Collaborator

I still get the error with langchain==0.1.5 and langchain-community 0.1.21, so I'm not sure it's #16914.

@GMartin-dev
Contributor

Not sure if there is another missing piece, but langchain==0.1.5 was working for me.

@hafiz031

hafiz031 commented Feb 18, 2024

langchain==0.1.5 also worked for me.
Environment: Python 3.12.1; pip 23.3.1; Anaconda.

@ThomaTagashira

I am having the same issue as well:

TypeError Traceback (most recent call last)
File c:\Users\TMOCo\Project_AI.venv\Lib\site-packages\pydantic\v1\main.py:522, in BaseModel.parse_obj(cls, obj)
521 try:
--> 522 obj = dict(obj)
523 except (TypeError, ValueError) as e:

TypeError: 'int' object is not iterable

The above exception was the direct cause of the following exception:

ValidationError Traceback (most recent call last)
File ~\Project_AI\langchain\libs\langchain\langchain\output_parsers\pydantic.py:25, in PydanticOutputParser.parse_result(self, result, partial)
24 try:
---> 25 return self.pydantic_object.parse_obj(json_object)
26 except ValidationError as e:

File c:\Users\TMOCo\Project_AI.venv\Lib\site-packages\pydantic\v1\main.py:525, in BaseModel.parse_obj(cls, obj)
524 exc = TypeError(f'{cls.__name__} expected dict not {obj.__class__.__name__}')
--> 525 raise ValidationError([ErrorWrapper(exc, loc=ROOT_KEY)], cls) from e
526 return cls(**obj)

ValidationError: 1 validation error for LineList
__root__
LineList expected dict not int (type=type_error)
...
---> 29 raise OutputParserException(msg, llm_output=json_object)

OutputParserException: Failed to parse LineList from completion 1. Got: 1 validation error for LineList
__root__
LineList expected dict not int (type=type_error)

I tried downgrading langchain to 0.1.5 and cloning the repo linked above, with no success.

@israelViner

Downgrading the version of langchain to 0.1.5 solved the problem for me as well.

@baskaryan
Collaborator

This should be fixed in langchain>=0.1.7.
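To check whether an environment already has a langchain release at or above the version named in this comment, a small stdlib-only check could look like the sketch below. The simplified version comparison is an assumption of this sketch: it keeps only the leading digits of each dotted part, so pre-release tags such as `rc1` are ignored; `packaging.version.Version` handles PEP 440 properly.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = "0.1.7"  # version named in the maintainer's comment


def release_tuple(v: str) -> Tuple[int, ...]:
    """Turn '0.1.6' into (0, 1, 6), keeping the leading digits of each dotted part.
    Simplified: '0.1.7rc1' becomes (0, 1, 7), i.e. pre-release tags are ignored."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed: str, fixed_in: str = FIXED_IN) -> bool:
    """True if the installed release is at least the release that contains the fix."""
    return release_tuple(installed) >= release_tuple(fixed_in)


def installed_version(dist: str = "langchain") -> Optional[str]:
    """Installed version of a distribution, or None if it isn't installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("langchain is not installed")
    else:
        print(f"langchain {v}: at or above {FIXED_IN} = {has_fix(v)}")
```

Comparing integer tuples rather than strings matters here: as strings, "0.1.10" would sort before "0.1.7".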

haydeniw pushed a commit to haydeniw/langchain that referenced this issue Feb 27, 2024
@elfailali

elfailali commented Apr 27, 2024

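I think that the problem is with the LangChain code.
If your main goal in using this class is just to change the default prompt, you can pass your own prompt template (it needs a {question} variable, like the default one):

from langchain.prompts import PromptTemplate
from langchain.retrievers.multi_query import MultiQueryRetriever
QUERY_PROMPT = PromptTemplate.from_template("your customized prompt here {question}")  # keep a {question} variable
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vector_db.as_retriever(), llm=generation_model, prompt=QUERY_PROMPT)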

@nathnx

nathnx commented May 13, 2024


This suggestion helps. I had to use MultiQueryRetriever.from_llm() and pass my own prompt via the prompt parameter. However, that route doesn't let me supply my own output parser, as recommended in this article, where the chain is created as shown below:

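from typing import List

from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field


# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")


class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        # One query per line of the LLM's reply
        lines = text.strip().split("\n")
        return LineList(lines=lines)


output_parser = LineListOutputParser()
# llm and QUERY_PROMPT are defined elsewhere (not shown)
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)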
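The core of the custom parser above is plain newline splitting. A dependency-free sketch of the same idea, for trying the parsing logic without an LLM: the dataclass stands in for the pydantic `LineList` model, and both the blank-line filtering and the optional removal of leading "1." / "2)" markers are additions of this sketch, not part of the article's parser (which would keep blank lines as empty queries).

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class LineList:
    # Mirrors the "lines" attribute of the pydantic model above.
    lines: List[str]


def parse_lines(text: str, strip_numbering: bool = False) -> LineList:
    """Split an LLM reply into one query per line, dropping blank lines.

    strip_numbering optionally removes leading '1.' or '2)' list markers.
    """
    lines = []
    for raw in text.strip().split("\n"):
        line = raw.strip()
        if strip_numbering:
            line = re.sub(r"^\d+[.)]\s*", "", line)
        if line:
            lines.append(line)
    return LineList(lines=lines)


reply = "1. Where is New York City located?\n\n2. In which US state is NYC?\n"
print(parse_lines(reply).lines)
# ['1. Where is New York City located?', '2. In which US state is NYC?']
print(parse_lines(reply, strip_numbering=True).lines)
# ['Where is New York City located?', 'In which US state is NYC?']
```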

Here's my prompt.

from langchain.prompts import PromptTemplate

# n_question (the number of alternative questions to generate) is defined elsewhere
MULTI_QUERY_PROMPT_mqp = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate {n_question}
    different versions of the given user question to retrieve relevant documents from a vector
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search.
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
    partial_variables={"n_question": n_question},
)
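Here `partial_variables` pre-fills `{n_question}` when the template is built, so only `{question}` has to be supplied when the retriever formats the prompt. A rough stdlib analogue using `functools.partial` on `str.format`; this is an illustration of the idea only, not how PromptTemplate implements it, and the template is a shortened copy with `n_question=3` as an example value.

```python
from functools import partial

# Shortened copy of the prompt above; {n_question} is bound early, {question} at call time.
TEMPLATE = (
    "You are an AI language model assistant. Your task is to generate {n_question} "
    "different versions of the given user question to retrieve relevant documents "
    "from a vector database. Provide these alternative questions separated by newlines.\n"
    "Original question: {question}"
)


def with_partials(template: str, **bound):
    """Pre-fill some placeholders now; the remaining ones are passed when formatting."""
    return partial(template.format, **bound)


prompt = with_partials(TEMPLATE, n_question=3)  # 3 is an example value
text = prompt(question="Where is nyc?")
print(text)
```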

9 participants