
SelfQueryRetriever gives error for some queries #9368

Open
5 of 14 tasks
Zledme opened this issue Aug 17, 2023 · 24 comments
Labels
🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature) · Ɑ: models (Related to LLMs or chat model modules) · Ɑ: vector store (Related to vector store module)

Comments

@Zledme

Zledme commented Aug 17, 2023

System Info

Langchain Version: 0.0.245
model: vicuna-13b-v1.5

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

db = Chroma(
    persist_directory=persist_directory,
    embedding_function=embeddings,
    client_settings=CHROMA_SETTINGS,
)
metadata_field_info = [
    AttributeInfo(
        name="lesson",
        description="Lesson Number of Book",
        type="integer",
    )
]

document_content_description = "English Books"
retriever = SelfQueryRetriever.from_llm(
    llm, db, document_content_description, metadata_field_info, verbose=True
)
# llm_chain.predict(context=context, question=question)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)

res = qa(query)

This is one of the documents (screenshot omitted).

 File "/home/roger/Documents/GitHub/RGRgithub/roger/testllm/main.py", line 122, in qanda
    res = qa(query)
          ^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/base.py", line 258, in __call__
    raise e
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/base.py", line 252, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/retrieval_qa/base.py", line 130, in _call
    docs = self._get_docs(question, run_manager=_run_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/retrieval_qa/base.py", line 210, in _get_docs
    return self.retriever.get_relevant_documents(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/schema/retriever.py", line 193, in get_relevant_documents
    raise e
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/schema/retriever.py", line 186, in get_relevant_documents
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/retrievers/self_query/base.py", line 100, in _get_relevant_documents
    self.llm_chain.predict_and_parse(
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/llm.py", line 282, in predict_and_parse
    return self.prompt.output_parser.parse(result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/query_constructor/base.py", line 52, in parse
    raise OutputParserException(
langchain.schema.output_parser.OutputParserException: Parsing text
```json
{
    "query": "phoneme",
    "filter": "eq(lesson, 1)"
}
```

 raised following error:
Unexpected token Token('COMMA', ',') at line 1, column 10.
Expected one of: 
	* LPAR
Previous tokens: [Token('CNAME', 'lesson')]

Expected behavior

  • The error occurs while using SelfQueryRetriever with RetrievalQA.
  • It shows up only for some queries.
  • I found the relevant code in langchain/chains/llm.py, line 282, in predict_and_parse:
result = self.predict(callbacks=callbacks, **kwargs)
if self.prompt.output_parser is not None:
    return self.prompt.output_parser.parse(result)
else:
    return result
  • The error occurs when result is (Q: "what is a phoneme"):
    '```json\n{\n "query": "phoneme",\n "filter": "eq(lesson, 1)"\n}\n```'
  • It doesn't occur when result is (Q: "What is a phoneme"):
    '```json\n{\n "query": "phoneme",\n "filter": "eq(\\"lesson\\", 1)"\n}\n```'
  • I can't change the version right now.
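The failing case is the unquoted attribute name (eq(lesson, 1) instead of eq("lesson", 1)). As a stopgap, one can pre-process the generated filter string to quote bare attribute names before it reaches the parser. A minimal sketch — `quote_bare_attributes` is a hypothetical helper, and the comparator list is an assumption, not LangChain's canonical grammar:

```python
import re

# Comparators assumed to be accepted by the structured-query grammar.
COMPARATORS = r"eq|ne|gt|gte|lt|lte|contain|like|in|nin"

def quote_bare_attributes(filter_str: str) -> str:
    """Turn eq(lesson, 1) into eq("lesson", 1) so the grammar accepts it."""
    pattern = rf'\b({COMPARATORS})\(\s*([A-Za-z_]\w*)\s*,'
    return re.sub(pattern, r'\1("\2",', filter_str)

print(quote_bare_attributes('eq(lesson, 1)'))  # eq("lesson", 1)
```

Already-quoted attributes are left untouched, so the rewrite is safe to apply unconditionally before parsing.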

@dosubot dosubot bot added Ɑ: models Related to LLMs or chat model modules Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Aug 17, 2023
@dosubot

dosubot bot commented Aug 17, 2023

🤖

Hello,

Thank you for providing detailed information about the issue you're experiencing. It seems like the problem is related to the parsing of the result returned by the predict_and_parse method in the llm.py file.

The error message indicates that the parser is expecting a left parenthesis '(' but instead it encounters a comma ','. This is happening because the parser is trying to parse the string 'eq(lesson, 1)' as a function call with 'lesson' as the function name and '1' as the argument, which is not valid.

The difference between the two results you provided is that in the second one, the string 'lesson' is escaped with double quotes, making it a valid string argument for the 'eq' function. This is why the second result does not raise an error.

As for why this is happening, it's hard to say without more information. It could be a bug in the predict_and_parse method or in the OutputParser class, or it could be an issue with the specific language model you're using.

One thing you could try is to update your LangChain version to the latest one (v0.0.266) and see if the issue persists. There might have been fixes or improvements made in the newer versions that could resolve your issue.

If updating LangChain does not solve the problem, it would be helpful if you could provide more information about the specific queries that are causing the error, as well as the exact version of the language model you're using. This would help in identifying the root cause of the issue.

I hope this helps. Please let me know if you have any other questions or if there's anything else I can do to assist you.


@WaleedAlfaris

WaleedAlfaris commented Aug 29, 2023

Having the same issue with langchain v0.0.274 using model Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0.bin. Funny thing is if I rerun the exact same query after it fails with this error, it somehow works.
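Since a rerun of the same query often succeeds, one pragmatic stopgap is a small retry wrapper around the retriever call. This is only a sketch that papers over the non-deterministic LLM output; it does not fix the underlying parsing problem:

```python
def get_docs_with_retry(retriever, query: str, attempts: int = 3):
    """Retry the retriever a few times; the LLM output is non-deterministic,
    so a later attempt may produce a filter the parser accepts."""
    last_error = None
    for _ in range(attempts):
        try:
            return retriever.get_relevant_documents(query)
        except Exception as exc:  # OutputParserException in practice
            last_error = exc
    raise last_error
```

Setting the LLM temperature to 0 reduces but does not eliminate the variance, so bounding the retries keeps failures visible instead of looping forever.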

@ericfeunekes

ericfeunekes commented Nov 1, 2023

I'm getting the same thing on langchain==0.0.327

OutputParserException: Parsing text
{
    "query": "legislative amendment",
    "filter": "eq(\"subsection\", \"12\") and eq(\"paragraph\", \"2.02\")"
}
raised following error:
Unexpected token Token('CNAME', 'and') at line 1, column 24.
Expected one of: 
	* $END
Previous tokens: [Token('RPAR', ')')]

@Danouchka

Hi @Zledme, what was your prompt template?

@nikitacorp

@ericfeunekes @WaleedAlfaris Have you found solution to your problem?

@TheNor01

TheNor01 commented Jan 1, 2024

Using v0.0.352
Same error; I am using SelfQueryRetriever with ChromaTranslator.

langchain_core.exceptions.OutputParserException: Parsing text

{
    "query": "chicken",
    "filter": "eq(\"category\", \"food\") and contain(content, \"chicken\")"
}

raised following error:
Unexpected token Token('CNAME', 'and') at line 1, column 24.
Expected one of:
* $END
Previous tokens: [Token('RPAR', ')')]
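The grammar expects prefix boolean operators (and(...), or(...)), while these models emit infix `A and B`. A naive pre-parse rewrite for a single top-level infix operator between flat clauses could look like the sketch below (`infix_to_prefix` is a hypothetical helper; it does not handle nested boolean expressions or operator words inside quoted values):

```python
def infix_to_prefix(filter_str: str) -> str:
    """Rewrite 'eq(...) and eq(...)' as 'and(eq(...), eq(...))'.
    Only handles one top-level infix operator between flat clauses."""
    for op in ("and", "or"):
        sep = f") {op} "
        if sep in filter_str:
            parts = filter_str.split(sep)
            # restore the ')' consumed by the split on all but the last part
            clauses = [p + ")" for p in parts[:-1]] + [parts[-1]]
            return f"{op}({', '.join(clauses)})"
    return filter_str

print(infix_to_prefix('eq("category", "food") and contain("content", "chicken")'))
```

For anything beyond flat clauses a real expression parser would be needed, but this covers the failure shape reported in this thread.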

@stolto-pirla

Hi, on LangChain 0.1.0 I'm having nasty problems too with the SelfQueryRetriever. Essentially the generated query is not correct.

Examples below:
Case 1:
{
"query": "depression",
"filter": "and(gt("Year", 2021), lt("Year", 2023))",
"limit": "NO_LIMIT"
}
Fails as "limit" should be integer, it is explicitely stated in the FewShotPromptTemplate: "Make sure the limit is always an int value. It is an optional parameter so leave it blank if it does not make sense.". However the generated query is wrong, see exception
raised following error:
1 validation error for StructuredQuery
limit
value is not a valid integer (type=type_error.integer)

Case 2:
{
"query": "depression",
"filter": "eq("Year", 2023), eq("Month", 12)",
"limit": 3
}
Fails as the "filter" syntax is not correct it should have an 'and' statement, see exception
raised following error:
Unexpected token Token('COMMA', ',') at line 1, column 17.
Expected one of:
* $END

The issue is that the model is unable to produce a correct query. Even worse, results are not repeatable; sometimes the returned query is correct. Using ChatGPT 4.0 as the LLM.
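Both failure modes above can be caught with a post-processing pass over the model's JSON before it is handed to the structured-query parser. A sketch, assuming the raw JSON text is available (`sanitize_limit` is a hypothetical helper, not a LangChain API):

```python
import json

def sanitize_limit(raw_json: str) -> dict:
    """Coerce a non-integer "limit" (e.g. "NO_LIMIT" or "3") so that
    StructuredQuery validation does not reject it."""
    parsed = json.loads(raw_json)
    limit = parsed.get("limit")
    if not isinstance(limit, int):
        try:
            parsed["limit"] = int(limit)
        except (TypeError, ValueError):
            parsed["limit"] = None  # treat anything non-numeric as "no limit"
    return parsed
```

Case 2 (the missing 'and' wrapper) would additionally need a filter rewrite before parsing; only the limit coercion is shown here.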

@Giselasnjota

Giselasnjota commented Feb 6, 2024

I have the same issue; has anyone found a solution? I'm using ChatGPT 3.5. I tried changing the prompt, but it didn't work. I'm handling the common failure cases with specific string replacements, but that is not an ideal solution.

@vafokroy

vafokroy commented Feb 8, 2024

Same here, has anyone found a solution yet?

@stolto-pirla

No change and the issue is that the model often fails to generate the correct query. So one would need to redo the prompt, although it is not certain it will ever be 100% reliable.

@p-gonzo

p-gonzo commented Feb 17, 2024

Jumping in: same issue here:

Metadata field info:

metadata_field_info = [
    AttributeInfo(
        name="law_name",
        description="The name of the law or piece of legislation",
        type="string",
    ),
    AttributeInfo(
        name="alt_law_name",
        description="The name of the law or piece of legislation",
        type="string",
    ),
]

document_content_description = "The contents of a law or piece of legislation"

Retriever

retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)

Input

retriever.get_relevant_documents("What does the law SB2102 ential")

Output

{
    "query": "SB2102",
    "filter": "eq(\"law_name\", \"SB2102\") or eq(\"alt_law_name\", \"SB2102\")"
}
---------------------------------------------------------------------------
UnexpectedCharacters                      Traceback (most recent call last)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/lexer.py:665, in ContextualLexer.lex(self, lexer_state, parser_state)
    664         lexer = self.lexers[parser_state.position]
--> 665         yield lexer.next_token(lexer_state, parser_state)
    666 except EOFError:

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/lexer.py:598, in BasicLexer.next_token(self, lex_state, parser_state)
    597         allowed = {"<END-OF-FILE>"}
--> 598     raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
    599                                allowed=allowed, token_history=lex_state.last_token and [lex_state.last_token],
    600                                state=parser_state, terminals_by_name=self.terminals_by_name)
    602 value, type_ = res

UnexpectedCharacters: No terminal matches 'o' in the current parser context, at line 1 col 26

eq("law_name", "SB2102") or eq("alt_law_name", "SB2102")
                         ^
Expected one of: 
	* RPAR
	* RSQB
	* COMMA

Previous tokens: Token('RPAR', ')')


During handling of the above exception, another exception occurred:

UnexpectedToken                           Traceback (most recent call last)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain/chains/query_constructor/base.py:56, in StructuredQueryOutputParser.parse(self, text)
     55 else:
---> 56     parsed["filter"] = self.ast_parse(parsed["filter"])
     57 if not parsed.get("limit"):

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/lark.py:658, in Lark.parse(self, text, start, on_error)
    641 """Parse the given text, according to the options provided.
    642 
    643 Parameters:
   (...)
    656 
    657 """
--> 658 return self.parser.parse(text, start=start, on_error=on_error)

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parser_frontends.py:104, in ParsingFrontend.parse(self, text, start, on_error)
    103 stream = self._make_lexer_thread(text)
--> 104 return self.parser.parse(stream, chosen_start, **kw)

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parsers/lalr_parser.py:42, in LALR_Parser.parse(self, lexer, start, on_error)
     41 try:
---> 42     return self.parser.parse(lexer, start)
     43 except UnexpectedInput as e:

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parsers/lalr_parser.py:88, in _Parser.parse(self, lexer, start, value_stack, state_stack, start_interactive)
     87     return InteractiveParser(self, parser_state, parser_state.lexer)
---> 88 return self.parse_from_state(parser_state)

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parsers/lalr_parser.py:111, in _Parser.parse_from_state(self, state, last_token)
    110         pass
--> 111     raise e
    112 except Exception as e:

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parsers/lalr_parser.py:100, in _Parser.parse_from_state(self, state, last_token)
     99 token = last_token
--> 100 for token in state.lexer.lex(state):
    101     assert token is not None

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/lexer.py:674, in ContextualLexer.lex(self, lexer_state, parser_state)
    673     token = self.root_lexer.next_token(lexer_state, parser_state)
--> 674     raise UnexpectedToken(token, e.allowed, state=parser_state, token_history=[last_token], terminals_by_name=self.root_lexer.terminals_by_name)
    675 except UnexpectedCharacters:

UnexpectedToken: Unexpected token Token('CNAME', 'or') at line 1, column 26.
Expected one of: 
	* $END
Previous tokens: [Token('RPAR', ')')]


During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[8], line 1
----> 1 retriever.get_relevant_documents("What does the law SB2102 ential")

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/retrievers.py:224, in BaseRetriever.get_relevant_documents(self, query, callbacks, tags, metadata, run_name, **kwargs)
    222 except Exception as e:
    223     run_manager.on_retriever_error(e)
--> 224     raise e
    225 else:
    226     run_manager.on_retriever_end(
    227         result,
    228     )

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/retrievers.py:217, in BaseRetriever.get_relevant_documents(self, query, callbacks, tags, metadata, run_name, **kwargs)
    215 _kwargs = kwargs if self._expects_other_args else {}
    216 if self._new_arg_supported:
--> 217     result = self._get_relevant_documents(
    218         query, run_manager=run_manager, **_kwargs
    219     )
    220 else:
    221     result = self._get_relevant_documents(query, **_kwargs)

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain/retrievers/self_query/base.py:168, in SelfQueryRetriever._get_relevant_documents(self, query, run_manager)
    157 def _get_relevant_documents(
    158     self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    159 ) -> List[Document]:
    160     """Get documents relevant for a query.
    161 
    162     Args:
   (...)
    166         List of relevant documents
    167     """
--> 168     structured_query = self.query_constructor.invoke(
    169         {"query": query}, config={"callbacks": run_manager.get_child()}
    170     )
    171     if self.verbose:
    172         logger.info(f"Generated Query: {structured_query}")

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/runnables/base.py:2053, in RunnableSequence.invoke(self, input, config)
   2051 try:
   2052     for i, step in enumerate(self.steps):
-> 2053         input = step.invoke(
   2054             input,
   2055             # mark each step as a child run
   2056             patch_config(
   2057                 config, callbacks=run_manager.get_child(f"seq:step:{i+1}")
   2058             ),
   2059         )
   2060 # finish the root run
   2061 except BaseException as e:

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/output_parsers/base.py:167, in BaseOutputParser.invoke(self, input, config)
    163 def invoke(
    164     self, input: Union[str, BaseMessage], config: Optional[RunnableConfig] = None
    165 ) -> T:
    166     if isinstance(input, BaseMessage):
--> 167         return self._call_with_config(
    168             lambda inner_input: self.parse_result(
    169                 [ChatGeneration(message=inner_input)]
    170             ),
    171             input,
    172             config,
    173             run_type="parser",
    174         )
    175     else:
    176         return self._call_with_config(
    177             lambda inner_input: self.parse_result([Generation(text=inner_input)]),
    178             input,
    179             config,
    180             run_type="parser",
    181         )

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/runnables/base.py:1246, in Runnable._call_with_config(self, func, input, config, run_type, **kwargs)
   1242     context = copy_context()
   1243     context.run(var_child_runnable_config.set, child_config)
   1244     output = cast(
   1245         Output,
-> 1246         context.run(
   1247             call_func_with_variable_args,
   1248             func,  # type: ignore[arg-type]
   1249             input,  # type: ignore[arg-type]
   1250             config,
   1251             run_manager,
   1252             **kwargs,
   1253         ),
   1254     )
   1255 except BaseException as e:
   1256     run_manager.on_chain_error(e)

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/runnables/config.py:326, in call_func_with_variable_args(func, input, config, run_manager, **kwargs)
    324 if run_manager is not None and accepts_run_manager(func):
    325     kwargs["run_manager"] = run_manager
--> 326 return func(input, **kwargs)

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/output_parsers/base.py:168, in BaseOutputParser.invoke.<locals>.<lambda>(inner_input)
    163 def invoke(
    164     self, input: Union[str, BaseMessage], config: Optional[RunnableConfig] = None
    165 ) -> T:
    166     if isinstance(input, BaseMessage):
    167         return self._call_with_config(
--> 168             lambda inner_input: self.parse_result(
    169                 [ChatGeneration(message=inner_input)]
    170             ),
    171             input,
    172             config,
    173             run_type="parser",
    174         )
    175     else:
    176         return self._call_with_config(
    177             lambda inner_input: self.parse_result([Generation(text=inner_input)]),
    178             input,
    179             config,
    180             run_type="parser",
    181         )

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/output_parsers/base.py:219, in BaseOutputParser.parse_result(self, result, partial)
    206 def parse_result(self, result: List[Generation], *, partial: bool = False) -> T:
    207     """Parse a list of candidate model Generations into a specific format.
    208 
    209     The return value is parsed from only the first Generation in the result, which
   (...)
    217         Structured output.
    218     """
--> 219     return self.parse(result[0].text)

File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain/chains/query_constructor/base.py:63, in StructuredQueryOutputParser.parse(self, text)
     59     return StructuredQuery(
     60         **{k: v for k, v in parsed.items() if k in allowed_keys}
     61     )
     62 except Exception as e:
---> 63     raise OutputParserException(
     64         f"Parsing text\n{text}\n raised following error:\n{e}"
     65     )

OutputParserException: Parsing text
{
    "query": "SB2102",
    "filter": "eq(\"law_name\", \"SB2102\") or eq(\"alt_law_name\", \"SB2102\")"
}
 raised following error:
Unexpected token Token('CNAME', 'or') at line 1, column 26.
Expected one of: 
	* $END
Previous tokens: [Token('RPAR', ')')]

Any ideas? Thanks!

@wei-m-teh

Is there any update on this issue? I ran into the same issue with langchain version 0.1.7.

@marouahamdi

I have got the same issue. Any idea or help, please?

@Giselasnjota

Giselasnjota commented Feb 22, 2024 via email

@VladMstv

VladMstv commented Mar 1, 2024

Same issue with "and" instead of comma produced by llm
langchain 0.1.9
langchain-community 0.0.24
langchain-core 0.1.27

@stolto-pirla

Hi, sure. The Pinecone documentation says to treat date metadata as a unix epoch, so for me it works this way:

1. My metadata is a float, so I define my AttributeInfo as:

    AttributeInfo(
        name="publication_date_as_float",
        description="Publication date is the date the content was sent to the client.",
        type="float",
    )

2. I convert the datetime and upload to Pinecone:

    metadata["publication_date_as_float"] = time.mktime(
        date_temp.replace(hour=0, minute=0, second=0, microsecond=0, tzinfo=None).timetuple()
    )

3. I'm using self-query with Pinecone, so I changed the function visit_comparison in langchain/retrievers/self_query/pinecone.py (my reference for this was myscale.py in the same folder, which does the same thing):

    def visit_comparison(self, comparison: Comparison) -> Dict:
        if comparison.comparator in (Comparator.IN, Comparator.NIN) and not isinstance(
            comparison.value, list
        ):
            comparison.value = [comparison.value]
        # convert datetime to unix epoch; in my case I want the date without time
        if isinstance(comparison.value, dict) and comparison.value.get("type") == "date":
            data_formato = "%Y-%m-%d"
            date = datetime.strptime(comparison.value["date"], data_formato)
            comparison.value = int(time.mktime(date.timetuple()))
        return {
            comparison.attribute: {
                self._format_func(comparison.comparator): comparison.value
            }
        }

Let me know if it works for you.


This is very interesting. However, it doesn't address the issue, which is well before the query goes to Pinecone, or any other database. The LLM is unable to generate a correct query, either because the logical condition syntax is wrong, or because the "limit" attribute is not used correctly. I wonder if anyone tried with a different LLM, ChatGPT 3.5 or 4 shows the issue.

@fightingmonk

I am seeing similar behavior when using gpt-3.5-turbo-0125: compound boolean filters are not correctly generated by the LLM. For example, the LLM is returning "filter": "eq(\"type\", \"ticket\") and eq(\"ticket_type\", \"Epic\")" where it should be returning "filter": "and( eq(\"type\", \"engineering_ticket\"), eq(\"ticket_type\", \"Epic\") )".

This is in direct contradiction to the prompt, which explains how to structure boolean filters. 😢

For others finding this issue, my use cases work when I use gpt-4-0613.

@Ahmer967

I am currently using the gpt-4 model, and it is performing exceptionally well. To achieve better results, I ensure that the keywords used in the prompts exactly match those in the data.

client = ChatOpenAI(model="gpt-4", api_key=openai_api_, temperature=0)
retriever_votes = SelfQueryRetriever.from_llm(
    client,
    db_votes,
    votes_content_description,
    metadata_field_votes,
    enable_limit=True,
)

Initially, the following prompt did not retrieve the expected results from the database:

Input

retriever_votes.get_relevant_documents("Give me details of votes where year is 2018 and bill's result is failed")

Therefore, I adjusted the prompt to use the exact keywords as they appear in the data, including the same capitalization used in my Document. This modification improved document retrieval.

retriever_votes.get_relevant_documents("Give me details of votes where year is 2018 and bill's result is Failed")

Output Sample

[Document(page_content='page_content', metadata={ 'day': 13, 'result': 'Failed','year': 2018})]
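Consistent casing can also be enforced at indexing time instead of in the prompt. A sketch of normalizing string metadata values (an assumption about your pipeline, not a LangChain feature; queries and indexed metadata must both go through the same normalization):

```python
def normalize_metadata(metadata: dict) -> dict:
    """Lower-case string metadata values so that filter values generated
    from a user's query match regardless of capitalization."""
    return {
        key: value.lower() if isinstance(value, str) else value
        for key, value in metadata.items()
    }

print(normalize_metadata({"day": 13, "result": "Failed", "year": 2018}))
```

With this, "failed" in the user's query matches the stored value without requiring exact capitalization in the prompt.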

@kanzyai-emirarditi

GPT-4 works well, but all the other models I have tried (command-r-plus, Sonnet, 3.5) give the exception:

{
    "query": "Apple stock performance",
    "filter": "eq(ticker, "AAPL") and eq(year, 2023)"
}

 raised following error:
Unexpected token Token('COMMA', ',') at line 1, column 10.
Expected one of: 
	* LPAR
Previous tokens: [Token('CNAME', 'ticker')]

Is there an update on this issue?

@ravikumarmittal

This issue is fixed for me with some simple settings using the ChatGPT 3.5 Turbo model:

  1. Change the langchain_pg_collection table's cmetadata column data type from json to jsonb (cmetadata jsonb).
  2. Change the cmetadata column data type for the langchain_pg_embedding table to jsonb as well (cmetadata jsonb).
  3. In the code below there is one important trick: even though I am using a PGVector database, translation works with ChromaTranslator or PineconeTranslator, but not with PGVectorTranslator. In your case, change this line and test:

structured_query_translator=ChromaTranslator(),

# structured_query_translator=PGVectorTranslator(),
# structured_query_translator=PineconeTranslator()

Use the code below; this is working:

```python
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
# from langchain.vectorstores.pgvector import PGVector
from langchain_community.vectorstores import PGVector
from langchain.retrievers.self_query.pgvector import PGVectorTranslator
from langchain.retrievers.self_query.chroma import ChromaTranslator
from langchain.retrievers.self_query.pinecone import PineconeTranslator
import os

os.environ["OPENAI_API_KEY"] = "open ai key"
connection = "db connection string"

collection_name = "my_docs"
embeddings = OpenAIEmbeddings()

docs = [
    Document(
        page_content="A bunch of scientists bring back c and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "thriller",
            "rating": 9.9,
        },
    ),
]

vectorstore = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=collection_name,
    connection_string=connection,
    use_jsonb=True,
)

from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI, OpenAI

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="This is related year of the movie",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]
document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
    structured_query_translator=ChromaTranslator(),
    # structured_query_translator=PGVectorTranslator(),
    # structured_query_translator=PineconeTranslator()
)
```
This example only specifies a filter

```python
result = retriever.invoke("give me all movie related to dream having rating 8.6 or rating>9 ")
print(result)
print('#####################')
```
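For that query the retriever should produce a filter equivalent to `or(eq("rating", 8.6), gt("rating", 9))`. Applying the same condition to the sample metadata by hand (plain Python, no LangChain involved) shows which documents should come back:

```python
# Evaluate or(eq("rating", 8.6), gt("rating", 9)) against the sample metadata above.
docs_meta = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},  # no rating at all
    {"year": 1979, "director": "Andrei Tarkovsky", "genre": "thriller", "rating": 9.9},
]

# Documents with no rating should simply not match either branch of the OR.
matches = [m for m in docs_meta if m.get("rating") == 8.6 or (m.get("rating") or 0) > 9]
print(matches)  # the 2006 Satoshi Kon entry and the 1979 Tarkovsky entry
```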

@orah1998

Was anyone able to solve this issue? I have the same problem myself: `Unexpected token Token('COMMA', ',') at line 1, column 8.` when using mistral-small. I'd be really glad if someone could help me solve this.

@santosh-gkg

I was also facing the same issue. Though I have not pinpointed the exact cause, the error depends on the model you are using: switching to a bigger (better) model solves it. I was using 'Gemma-7b-it', which gave the error, but using 'Llama-70b-8192' solved it.

@bczaplicki-gd

bczaplicki-gd commented Jul 20, 2024

In my case I had this problem because the filter I was getting looked like `eq(chapter, '001')`, but `chapter` needed to be in quotes — which is also the common problem mentioned here. I solved it by adding "Note that 'chapter' is also a string and it should be in quotes in the filter." to the description in `AttributeInfo`.

You can also solve it by writing your own much simpler chain instead of `SelfQueryRetriever` that just works for your case, and giving exact examples of the JSON you need generated.
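As a sketch of that "much simpler chain" idea, the parsing half can be as small as pulling the first JSON object out of the model's reply and falling back to no filter on failure. This is a hypothetical helper under the assumption that you prompt the model for a JSON filter directly; the prompt and model call are omitted:

```python
import json
import re

def parse_filter(model_output: str) -> dict:
    """Pull the first JSON object out of a model reply; return an empty
    filter when nothing parses. Hypothetical sketch, not LangChain API."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if not match:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}

print(parse_filter('Filter: {"chapter": {"$eq": "001"}}'))  # {'chapter': {'$eq': '001'}}
print(parse_filter("no filter needed"))                      # {}
```

The returned dict can then be passed to the vector store's own filter argument, skipping the lark-based filter grammar entirely.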

@parsa-abbasi
Contributor

> In my case I had this problem because the filter I was getting looked like eq(chapter, '001') but the chapter needed to be in quotes. Which is also common problem you guys mention here. I solved it by adding " Note that 'chapter' is also a string and it should be in quotes in the filter." to description in AttributeInfo.
>
> You can also solve it by writing your own much simpler chain instead of SelfQueryRetriever to just work for your case and give exact examples of jsons that you need generated.

This was really helpful! In my case, I was using the Cohere API and ran into a similar issue where the filter I was getting looked like `Released_Year, 2016`, but `Released_Year` needed to be in quotes. Adding a note to the `AttributeInfo` description that the field name must be in quotes fixed the problem for me.

metadata_field_info = [
    AttributeInfo(
        name = "Released_Year",
        description = "The year the movie was released. You must put Released_Year in quotes like \"Released_Year\"",
        type = "integer",
    ),
    AttributeInfo(
        name = "IMDB_Rating",
        description = "A 1-10 rating for the movie. You must put IMDB_Rating in quotes like \"IMDB_Rating\"",
        type = "float"
    ),
]
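If several fields need the same hint, a small helper can append it to every description instead of repeating it by hand — a sketch using plain dicts standing in for `AttributeInfo` keyword arguments (the helper name and dict shape are my own, not LangChain API):

```python
# Append the quoting hint to each field description before building AttributeInfo.
def with_quoting_hint(fields: list[dict]) -> list[dict]:
    hint = ' You must put {name} in quotes like "{name}" in the filter.'
    return [
        {**f, "description": f["description"] + hint.format(name=f["name"])}
        for f in fields
    ]

fields = [
    {"name": "Released_Year", "description": "The year the movie was released.", "type": "integer"},
    {"name": "IMDB_Rating", "description": "A 1-10 rating for the movie.", "type": "float"},
]
for f in with_quoting_hint(fields):
    print(f["description"])
```

Each resulting dict can be splatted into `AttributeInfo(**f)`.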
