MultiQueryRetriever is failing #17352

jswift24 · 2024-02-09T23:57:05Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.

Example Code

import os, dotenv, openai
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.retrievers import MultiQueryRetriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

#API Key
dotenv.load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

#Load and split docs
documents = WebBaseLoader("https://en.wikipedia.org/wiki/New_York_City").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 50)
documents = text_splitter.split_documents(documents)
vector_store = FAISS.from_documents(documents, OpenAIEmbeddings())
retriever = vector_store.as_retriever()

#MultiQueryRetriever
primary_qa_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
advanced_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=primary_qa_llm)
print(advanced_retriever.get_relevant_documents("Where is nyc?"))

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\pydantic\v1\main.py", line 522, in parse_obj
    obj = dict(obj)
          ^^^^^^^^^
TypeError: 'int' object is not iterable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\output_parsers\pydantic.py", line 25, in parse_result
    return self.pydantic_object.parse_obj(json_object)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\pydantic\v1\main.py", line 525, in parse_obj
    raise ValidationError([ErrorWrapper(exc, loc=ROOT_KEY)], cls) from e
pydantic.v1.error_wrappers.ValidationError: 1 validation error for LineList
__root__
  LineList expected dict not int (type=type_error)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\Documents-Alon\MapleRAG\ragas-tutorial\ragas-debug.py", line 28, in <module>
    print(advanced_retriever.get_relevant_documents("Who are you?"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain_core\retrievers.py", line 224, in get_relevant_documents
    raise e
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain_core\retrievers.py", line 217, in get_relevant_documents
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\retrievers\multi_query.py", line 172, in _get_relevant_documents
    queries = self.generate_queries(query, run_manager)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\retrievers\multi_query.py", line 189, in generate_queries
    response = self.llm_chain(
               ^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain_core\_api\deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\base.py", line 363, in __call__
    return self.invoke(
           ^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\base.py", line 162, in invoke
    raise e
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\base.py", line 156, in invoke
    self._call(inputs, run_manager=run_manager)
    return self.create_outputs(response)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\llm.py", line 258, in create_outputs
             ^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\chains\llm.py", line 261, in <listcomp>
    self.output_key: self.output_parser.parse_result(generation),
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\.conda\envs\qdrant\Lib\site-packages\langchain\output_parsers\pydantic.py", line 29, in parse_result
    raise OutputParserException(msg, llm_output=json_object)
langchain_core.exceptions.OutputParserException: Failed to parse LineList from completion 1. Got: 1 validation error for LineList
__root__
  LineList expected dict not int (type=type_error)

Description

Trying to use MultiQueryRetriever and getting an error.
The base retriever works. The primary_qa_llm works too.
Using windows.

System Info

> pip list | findstr /i "langchain"
langchain                 0.1.6
langchain-community       0.0.19
langchain-core            0.1.22
langchain-openai          0.0.5
langchainhub              0.1.14

Platform: Windows
Python version: 3.11.7


dosubot · 2024-02-09T23:57:24Z

🤖

Hey there @jswift24! Good to see you diving into another LangChain adventure. 🚀

Thank you for providing detailed information about the issue you're facing. The error message you're seeing is related to the Pydantic library, which is used in LangChain for data validation and settings management. The error message suggests that the LineList object is expecting a dictionary, but it's receiving an integer instead.

This error is raised from the generate_queries method of the MultiQueryRetriever class. This method uses the language model to generate additional queries based on the original user query. The generated queries are then parsed using the parse_result method of the PydanticOutputParser class. If the output of the language model is not in the expected format, parse_result catches the resulting ValidationError and raises an OutputParserException.

Here's the relevant part of the code from the MultiQueryRetriever class:

def generate_queries(self, query: str, run_manager: CallbackManagerForRetrieverRun) -> List[str]:
    response = self.llm_chain(
        {self.input_key: query},
        run_manager=run_manager,
    )
    return [line.text for line in response[self.output_key]]

And here's the parse_result method from the PydanticOutputParser class:

def parse_result(self, json_object: Any) -> Any:
    try:
        return self.pydantic_object.parse_obj(json_object)
    except ValidationError as e:
        msg = f"Failed to parse {self.pydantic_object.__name__} from completion {json_object}. Got: {e}"
        raise OutputParserException(msg, llm_output=json_object)

To resolve this issue, you might need to check the output of the language model to ensure it's in the expected format. If the language model is not returning the expected output, you might need to adjust its parameters or use a different model.

If you're still facing the issue after checking the language model's output, it would be helpful if you could provide more information about the language model you're using and its output. This will help us understand the issue better and provide a more accurate solution.

For more information about the MultiQueryRetriever and PydanticOutputParser classes and their methods, you can refer to the LangChain repository.

jswift24 · 2024-02-10T00:30:52Z

This is not useful. The LM is fine. The problem is with LangChain code.

GMartin-dev · 2024-02-10T02:38:49Z

I'm having this issue too. This seems like a regression introduced by #16914

It seems that the list parser used in the MultiQueryRetriever chain, which extends PydanticOutputParser, now depends on JsonOutputParser, which fails to parse the generation coming back from the LLM.
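One plausible reading of the traceback, consistent with the diagnosis above: if the LLM answers with a numbered list, a JSON-style parse of the text can stop after the leading `1` and return an integer, which the pydantic model then rejects. The sketch below reproduces that mechanism using only Python's standard `json` module. It is not LangChain's actual parser, and the sample LLM output is made up for illustration.

```python
import json

# Hypothetical LLM output: alternative questions as a numbered list.
llm_output = "1. Where is New York City located?\n2. In which state is NYC?"

# raw_decode parses the first JSON value it finds and ignores the rest.
# Here the first value is the bare number 1 from "1. Where ...".
value, end = json.JSONDecoder().raw_decode(llm_output)
print(repr(value), end)

# pydantic v1's parse_obj calls dict(obj) first (see the traceback above),
# which fails for an int in the same way as reported in this issue.
try:
    dict(value)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This would explain why the exception message reads "Failed to parse LineList from completion 1": the object handed to the model is the integer, not the full text.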

plaban1981 · 2024-02-11T14:58:08Z

I am facing the same issue as well.

from langchain.prompts import PromptTemplate
prompt_template = """You are an AI language model assistant.

Your task is to generate 3 different versions of the given user question to retrieve relevant documents from a vector database.

By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations  of distance-based similarity search.

Provide these alternative questions separated by newlines.

Original question: {question}"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["question"]
)

retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=openai_llm, prompt=PROMPT
)

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py in parse_obj(cls, obj)
    521         try:
--> 522             obj = dict(obj)
    523         except (TypeError, ValueError) as e:

TypeError: 'int' object is not iterable

The above exception was the direct cause of the following exception:

ValidationError                           Traceback (most recent call last)
14 frames
ValidationError: 1 validation error for LineList
__root__
  LineList expected dict not int (type=type_error)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/langchain/output_parsers/pydantic.py in parse_result(self, result, partial)
     27             name = self.pydantic_object.__name__
     28             msg = f"Failed to parse {name} from completion {json_object}. Got: {e}"
---> 29             raise OutputParserException(msg, llm_output=json_object)
     30
     31     def get_format_instructions(self) -> str:

OutputParserException: Failed to parse LineList from completion 1. Got: 1 validation error for LineList
__root__
  LineList expected dict not int (type=type_error)

baskaryan · 2024-02-12T22:05:54Z

Just to confirm that #16914 is the root cause, can someone check whether downgrading to langchain-core==0.1.21 fixes the issue?

baskaryan · 2024-02-12T22:17:05Z

I still get the error with langchain==0.1.5 and langchain-community 0.1.21, so I'm not sure it's #16914.

GMartin-dev · 2024-02-13T06:00:59Z

Not sure if there is another missing piece but langchain==0.1.5 was working for me.

hafiz031 · 2024-02-18T08:28:26Z

langchain==0.1.5 also worked for me.
Environment: Python 3.12.1; pip 23.3.1; Anaconda.

ThomaTagashira · 2024-02-20T03:11:01Z

I am having the same issue as well:

TypeError                                 Traceback (most recent call last)
File c:\Users\TMOCo\Project_AI.venv\Lib\site-packages\pydantic\v1\main.py:522, in BaseModel.parse_obj(cls, obj)
    521 try:
--> 522     obj = dict(obj)
    523 except (TypeError, ValueError) as e:

TypeError: 'int' object is not iterable

The above exception was the direct cause of the following exception:

ValidationError                           Traceback (most recent call last)
File ~\Project_AI\langchain\libs\langchain\langchain\output_parsers\pydantic.py:25, in PydanticOutputParser.parse_result(self, result, partial)
     24 try:
---> 25     return self.pydantic_object.parse_obj(json_object)
     26 except ValidationError as e:

File c:\Users\TMOCo\Project_AI.venv\Lib\site-packages\pydantic\v1\main.py:525, in BaseModel.parse_obj(cls, obj)
    524     exc = TypeError(f'{cls.__name__} expected dict not {obj.__class__.__name__}')
--> 525     raise ValidationError([ErrorWrapper(exc, loc=ROOT_KEY)], cls) from e
    526 return cls(**obj)

ValidationError: 1 validation error for LineList
__root__
  LineList expected dict not int (type=type_error)
...
---> 29 raise OutputParserException(msg, llm_output=json_object)

OutputParserException: Failed to parse LineList from completion 1. Got: 1 validation error for LineList
__root__
  LineList expected dict not int (type=type_error)

I tried downgrading langchain to 0.1.5 and cloned the repo linked above with no success.

israelViner · 2024-02-24T17:31:48Z

Downgrading the version of langchain to 0.1.5 solved the problem for me as well.

baskaryan · 2024-02-26T02:40:15Z

this should be fixed in langchain>=0.1.7
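To check whether an environment already has the release named in the comment above, something like the following could be used. It is a minimal sketch that relies only on the standard library. The `parse_version` and `has_fix` helpers are hypothetical names, the comparison encodes nothing more than the "langchain>=0.1.7" statement, and pre-release suffixes such as `rc1` are ignored rather than handled properly.

```python
import re
from importlib import metadata

FIXED_IN = (0, 1, 7)  # taken from the maintainer's comment above


def parse_version(version: str) -> tuple:
    """Turn '0.1.6' into (0, 1, 6), ignoring any non-numeric suffix."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def has_fix(version: str) -> bool:
    """True if the version is at or above the release said to contain the fix."""
    return parse_version(version) >= FIXED_IN


try:
    installed = metadata.version("langchain")
    print(installed, "has fix:", has_fix(installed))
except metadata.PackageNotFoundError:
    print("langchain is not installed in this environment")
```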

elfailali · 2024-04-27T17:34:19Z

I think the problem is in the LangChain code.
If your main goal in using this class is to customize the default prompt, you can pass your own prompt to from_llm:

# Placeholder: from_llm expects a prompt template object that takes a "question" variable
QUERY_PROMPT = "your customized prompt here"
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vector_db.as_retriever(),
    llm=generation_model,
    prompt=QUERY_PROMPT,
)

nathnx · 2024-05-13T10:04:16Z

I think that the problem is with the LangChain code. If your main objective for using this class is to edit the default prompt:

QUERY_PROMPT = "your customized prompt here"
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vector_db.as_retriever(),
    llm=generation_model,
    prompt=QUERY_PROMPT,
)

This suggestion helps. I had to use MultiQueryRetriever.from_llm() and pass my own prompt via the prompt parameter. However, this approach involves no explicit output parser, unlike the one recommended in this article, where the chain is created as shown below:

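from typing import List

from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
# Assumed import path; the article may import BaseModel and Field differently
from pydantic import BaseModel, Field


# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")


class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        # One query per line of the LLM output
        lines = text.strip().split("\n")
        return LineList(lines=lines)


output_parser = LineListOutputParser()

# llm and QUERY_PROMPT are defined elsewhere
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)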
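The parse method above boils down to splitting the raw LLM text on newlines. Here is a minimal standalone sketch of that step with no LangChain or pydantic dependency. The `split_queries` helper and the sample text are hypothetical, and unlike the snippet above it also drops blank lines, which is an added assumption about what is desirable.

```python
from typing import List


def split_queries(text: str) -> List[str]:
    """Split raw LLM output into one query per non-empty line."""
    return [line.strip() for line in text.strip().split("\n") if line.strip()]


# Hypothetical LLM output with a stray blank line between questions.
sample = (
    "Where is NYC located?\n"
    "\n"
    "In which state is New York City?\n"
    "What is the location of NYC?\n"
)
queries = split_queries(sample)
print(queries)
```

Because this works on plain text rather than parsed JSON, a numbered or free-form list from the LLM still yields a list of strings.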

Here's my prompt.

# n_question (the number of query variants to generate) is defined elsewhere
MULTI_QUERY_PROMPT_mqp = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate {n_question}
    different versions of the given user question to retrieve relevant documents from a vector
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search.
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
    partial_variables={"n_question": n_question},
)
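The prompt above has two placeholders, but only question is supplied at query time; n_question is fixed up front through partial_variables. A rough analogy in plain Python follows, using `functools.partial` and `str.format` rather than LangChain's PromptTemplate. The template text is shortened, and `n_question = 3` is an assumed value.

```python
from functools import partial

TEMPLATE = (
    "Generate {n_question} different versions of the given user question.\n"
    "Provide these alternative questions separated by newlines.\n"
    "Original question: {question}"
)

n_question = 3  # assumed value, not taken from the thread

# Pre-fill n_question once; the question is supplied later, per query.
render_prompt = partial(TEMPLATE.format, n_question=n_question)

prompt_text = render_prompt(question="Where is NYC?")
print(prompt_text)
```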

dosubot bot added the labels "Ɑ: retriever" (Related to retriever module), "🤖:bug" (Related to a bug, vulnerability, unexpected error with an existing feature), and "🔌: openai" (Primarily related to OpenAI integrations) on Feb 9, 2024

baskaryan mentioned this issue on Feb 12, 2024, in the merged pull request "langchain[patch], templates[patch]: fix multi query retriever, web re…" #17434

baskaryan closed this as completed in #17434 on Feb 13, 2024

baskaryan added a commit (3925071) that referenced this issue on Feb 13, 2024: "langchain[patch], templates[patch]: fix multi query retriever, web re… (#17434) …search retriever Fixes #17352"
