SelfQueryRetriever gives error for some queries #9368
🤖 Hello,

Thank you for providing detailed information about the issue you're experiencing.

It seems the problem is related to parsing the result returned by the … The error message indicates that the parser expects a left parenthesis `(` but instead encounters a comma `,`. This happens because the parser is trying to parse the string …

The difference between the two results you provided is that in the second one, `lesson` is wrapped in escaped double quotes, which makes it a valid string argument for the `eq` function. This is why the second result does not raise an error.

As for why this is happening, it's hard to say without more information. It could be a bug in the …

One thing you could try is updating LangChain to the latest version (v0.0.266) to see if the issue persists; fixes or improvements in newer versions might resolve it. If updating LangChain does not solve the problem, it would help if you could provide more information about the specific queries that cause the error, as well as the exact version of the language model you're using. That would help identify the root cause.

I hope this helps. Please let me know if you have any other questions or if there's anything else I can do to assist you.
Having the same issue with LangChain v0.0.274 using the model Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0.bin. The funny thing is that if I rerun the exact same query after it fails with this error, it somehow works.
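Since rerunning the same query sometimes succeeds (the generated filter is not deterministic), one stopgap is to retry the call a few times on a parse error. A minimal sketch of a generic retry helper in plain Python; it does not depend on LangChain, and it only papers over the problem rather than fixing the generated filter:

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn() up to `attempts` times, retrying only on the given exception types."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[BaseException] = None
    for _ in range(attempts):
        try:
            return fn()
        except retry_on as err:  # e.g. a parse error on the generated filter
            last_error = err
    # Every attempt failed: re-raise the most recent error.
    assert last_error is not None
    raise last_error
```

Hypothetical usage, assuming a `retriever` and `query` already exist: `call_with_retries(lambda: retriever.invoke(query), retry_on=(OutputParserException,))`, where `OutputParserException` is imported from `langchain_core.exceptions` (the exception shown in the traceback later in this thread). Restricting `retry_on` keeps unrelated errors, such as network or auth failures, from being retried silently.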
I'm getting the same thing on langchain==0.0.327
Hi @Zledme, what was your prompt template?
@ericfeunekes @WaleedAlfaris Have you found a solution to your problem?
Using v0.0.352:
```
langchain_core.exceptions.OutputParserException: Parsing text {
"query": "chicken",
"filter": "eq(\"category\", \"food\") and contain(content, \"chicken\")"
} raised following error:
```
Hi, on LangChain 0.1.0 I'm having nasty problems with the SelfQueryRetriever too. Essentially, the generated query is not correct. Examples below: … Case 2: … The issue is that the model is unable to produce a correct query. Even worse, results are not repeatable: sometimes the returned query is correct. I'm using ChatGPT 4.0 as the LLM.
I have the same issue; has anyone found a solution? I'm using ChatGPT 3.5. I tried changing the prompt, but it didn't work. I'm handling the common problems with specific string replacements, but that is not an ideal solution.
Same here, has anyone found a solution yet? |
No change: the issue is that the model often fails to generate the correct query. So one would need to redo the prompt, although it is not certain it will ever be 100% reliable.
Jumping in: same issue here.

Metadata field info:
```python
metadata_field_info = [
    AttributeInfo(
        name="law_name",
        description="The name of the law or piece of legislation",
        type="string",
    ),
    AttributeInfo(
        name="alt_law_name",
        description="The name of the law or piece of legislation",
        type="string",
    ),
]
document_content_description = "The contents of a law or piece of legislation"
```
Retriever:
```python
retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)
```
Input:
```python
retriever.get_relevant_documents("What does the law SB2102 ential")
```
Output:
```
{
    "query": "SB2102",
    "filter": "eq(\"law_name\", \"SB2102\") or eq(\"alt_law_name\", \"SB2102\")"
}
```
Any ideas? Thanks!
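The filter above joins two comparisons with an infix `or`, while the examples in LangChain's query-constructor prompt write logical operators in prefix form, e.g. `or(eq("a", 1), eq("b", 2))`, with quoted attribute names. A minimal sketch of two hypothetical helpers that render filters in that prefix form, for instance to write exact few-shot examples into your own prompt; they handle only string and numeric values:

```python
import json


def comparison(comparator: str, attribute: str, value) -> str:
    """Render e.g. eq("law_name", "SB2102"); the attribute is always quoted."""
    # json.dumps quotes strings with double quotes and leaves ints/floats bare.
    return f"{comparator}({json.dumps(attribute)}, {json.dumps(value)})"


def operation(operator: str, *args: str) -> str:
    """Render a logical operator in prefix form, e.g. or(a, b), never 'a or b'."""
    return f"{operator}({', '.join(args)})"
```

For this report, `operation("or", comparison("eq", "law_name", "SB2102"), comparison("eq", "alt_law_name", "SB2102"))` produces `or(eq("law_name", "SB2102"), eq("alt_law_name", "SB2102"))`.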
Is there any update on this issue? I ran into the same issue with langchain version 0.1.7. |
I have the same issue. Any idea or help, please?
Hi,
Sure. The Pinecone documentation says to store date metadata as Unix epoch timestamps, so this is how it works for me:
1. My metadata is a float, so I define my AttributeInfo as:
```python
AttributeInfo(
    name="publication_date_as_float",
    description="Publication date is the date the content was sent to the client.",
    type="float",
)
```
2. I convert the datetime and upload it to Pinecone:
```python
metadata["publication_date_as_float"] = time.mktime(
    date_temp.replace(hour=0, minute=0, second=0, microsecond=0, tzinfo=None).timetuple()
)
```
3. I'm using self-query with Pinecone, so I needed to change the function `visit_comparison` in `langchain/retrievers/self_query/pinecone.py`.
Note: my reference for this was `myscale.py` in the same folder of the LangChain package, which does the same thing.
```python
# Extra imports needed at the top of pinecone.py; Dict, Comparator and
# Comparison are already imported there.
import time
from datetime import datetime


def visit_comparison(self, comparison: Comparison) -> Dict:
    if comparison.comparator in (Comparator.IN, Comparator.NIN) and not isinstance(
        comparison.value, list
    ):
        comparison.value = [comparison.value]
    # Convert dates to Unix epoch; in my case I want the date without the time.
    # Date values arrive as a dict like {"date": "YYYY-MM-DD", "type": "date"}.
    if isinstance(comparison.value, dict) and comparison.value.get("type") == "date":
        date_format = "%Y-%m-%d"
        date = datetime.strptime(comparison.value["date"], date_format)
        comparison.value = int(time.mktime(date.timetuple()))
    return {
        comparison.attribute: {
            self._format_func(comparison.comparator): comparison.value
        }
    }
```
Let me know if it works for you.
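The approach above only works if the value stored at upload time (step 2) and the value produced from the filter (step 3) use the same conversion. A minimal sketch that routes both sides through one pair of helpers; the function names are hypothetical. Note that `time.mktime` interprets the time as local time, so ingestion and querying should run with the same timezone setting:

```python
import time
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # same format as the visit_comparison override above


def day_start_epoch(dt: datetime) -> float:
    """Unix timestamp of local midnight on dt's date (mirrors step 2 above)."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0, tzinfo=None)
    return time.mktime(midnight.timetuple())


def date_str_to_epoch(date_str: str) -> float:
    """Same conversion for a 'YYYY-MM-DD' string coming from the generated filter."""
    return day_start_epoch(datetime.strptime(date_str, DATE_FORMAT))
```

Using `day_start_epoch` when writing metadata and `date_str_to_epoch` inside `visit_comparison` keeps the two sides from drifting apart. The override's `int(...)` cast is harmless, because midnight timestamps are whole seconds.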
On Thu, Feb 22, 2024 at 2:07 PM marouahamdi wrote:
> I have got the same issue. any idea or help plz?
Same issue here: the LLM produces an infix "and" between comparisons instead of the comma-separated `and(...)` form.
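For filters like `eq("category", "food") and contain(content, "chicken")` or the `... or ...` example earlier in the thread, one workaround in the spirit of the "specific replaces" mentioned above is to rewrite a top-level infix `and`/`or` into prefix form before the filter is parsed. A minimal sketch in plain Python (hypothetical helper names; it is not part of LangChain). It ignores keywords inside quotes or parentheses and deliberately leaves mixed `and`/`or` chains untouched, because their precedence would be ambiguous:

```python
from typing import List, Optional


def _split_top_level(text: str, keyword: str) -> List[str]:
    """Split text on ' keyword ' occurring outside parentheses and quoted strings."""
    token = f" {keyword} "
    parts: List[str] = []
    depth = 0
    quote: Optional[str] = None
    start = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == "\\":
                i += 2  # skip the escaped character inside a string
                continue
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and text.startswith(token, i):
            parts.append(text[start:i].strip())
            i += len(token)
            start = i
            continue
        i += 1
    parts.append(text[start:].strip())
    return parts


def infix_to_prefix(filter_str: str) -> str:
    """Rewrite top-level 'A and B' / 'A or B' into and(A, B) / or(A, B)."""
    s = filter_str.strip()
    and_parts = _split_top_level(s, "and")
    or_parts = _split_top_level(s, "or")
    if len(and_parts) > 1 and len(or_parts) == 1:
        return f"and({', '.join(and_parts)})"
    if len(or_parts) > 1 and len(and_parts) == 1:
        return f"or({', '.join(or_parts)})"
    return s  # already prefix form, a single comparison, or an ambiguous mix
```

Filters that are already in prefix form pass through unchanged, because `and(` and `or(` are never surrounded by spaces.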
This is very interesting. However, it doesn't address the issue, which occurs well before the query goes to Pinecone or any other database. The LLM is unable to generate a correct query, either because the logical-condition syntax is wrong or because the "limit" attribute is not used correctly. I wonder if anyone has tried a different LLM; ChatGPT 3.5 and 4 both show the issue.
I am seeing similar behavior when using … This is in direct contradiction to the prompt, which explains how to structure boolean filters. 😢 For others finding this issue, my use cases work when I use …
I am currently using the
Initially, the following prompt did not retrieve the expected results from the database: Input
Therefore, I adjusted the prompt to use the exact keywords as they appear in the data, including the same capitalization used in my
Output Sample
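The post above found that results only came back when the keywords matched the exact spelling and capitalization stored in the data. Equality filters are typically exact matches in the underlying store, though this is an assumption and behavior varies by vector store. A minimal sketch that maps a user's term to the stored spelling before it goes into a query or filter, assuming you can collect the distinct stored values of an attribute yourself (`stored_values` and both helper names are hypothetical):

```python
from typing import Dict, Iterable


def build_canonical_map(stored_values: Iterable[str]) -> Dict[str, str]:
    """Map a case-folded spelling to the exact spelling stored in the metadata."""
    # If two stored values differ only by case, the last one wins.
    return {value.casefold(): value for value in stored_values}


def canonicalize(value: str, canonical: Dict[str, str]) -> str:
    """Return the stored spelling if known, otherwise the value unchanged."""
    return canonical.get(value.strip().casefold(), value)
```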
GPT-4 works well, but all the other models I have tried (command-r-plus, sonnet, 3.5) give the exception.
This issue is fixed with some simple settings using the gpt-3.5-turbo model.
One important trick in the code below: even though I am using a PGVector database, the translation works with the Chroma or Pinecone translator but not with the PGVector translator. In your case, please change this line and test: `structured_query_translator=ChromaTranslator(),`
```python
# Use the code below; this is working.
import os

from langchain_core.documents import Document

os.environ["OPENAI_API_KEY"] = "open ai key"
collection_name = "my_docs"
docs = [...]  # document list elided in the original comment
from langchain.chains.query_constructor.base import AttributeInfo
metadata_field_info = [...]  # attribute list elided in the original comment
# This example only specifies a filter
result = retriever.invoke("give me all movie related to dream having rating 8.6 or rating>9 ")
```
Was anyone able to solve this issue? I have the same problem myself: `Unexpected token Token('COMMA', ',') at line 1, column 8.`
I was also facing the same issue. Although I have not pinned down the exact cause, the error depends on the model you are using: a bigger (or better) model solves it. I was using 'Gemma-7b-it', which gave the error, but switching to 'Llama-70b-8192' solved it.
In my case, I had this problem because the filter I was getting looked like eq(chapter, '001'), but chapter needed to be in quotes. This is also a common problem mentioned in this thread. I solved it by adding " Note that 'chapter' is also a string and it should be in quotes in the filter." to the description in AttributeInfo. You can also solve it by writing your own, much simpler chain instead of SelfQueryRetriever that works just for your case and gives exact examples of the JSON you need generated.
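If changing the prompt is not enough, a string-level repair of the specific failure in this comment (and in the original report, `eq(lesson, 1)`) is to quote a bare attribute name in the first argument of a comparator. A minimal regex sketch, assuming the comparator names below match LangChain's `Comparator` values in your version. It is a heuristic: it could also rewrite text that happens to look like `eq(word,` inside a quoted value:

```python
import re

# Comparator names assumed to match LangChain's filter language; check your version.
COMPARATORS = ("eq", "ne", "gt", "gte", "lt", "lte", "contain", "like", "in", "nin")

# A comparator call whose first argument is a bare identifier, e.g. eq(chapter, ...
_BARE_ATTR = re.compile(
    r"\b(" + "|".join(COMPARATORS) + r")\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*,"
)


def quote_bare_attributes(filter_str: str) -> str:
    """Turn eq(chapter, '001') into eq("chapter", '001'); quoted names are untouched."""
    return _BARE_ATTR.sub(lambda m: f'{m.group(1)}("{m.group(2)}",', filter_str)
```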
This was really helpful! In my case, I was using the Cohere API and ran into a similar issue where the filter I was getting looked like … I fixed it the same way, by adding a quoting instruction to each description:
```python
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(
        name="Released_Year",
        description="The year the movie was released. You must put Released_Year in quotes like \"Released_Year\"",
        type="integer",
    ),
    AttributeInfo(
        name="IMDB_Rating",
        description="A 1-10 rating for the movie. You must put IMDB_Rating in quotes like \"IMDB_Rating\"",
        type="float",
    ),
]
```
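With many attributes, the quoting instruction above can be generated rather than typed by hand. A minimal sketch with a hypothetical helper that appends the same wording used in these descriptions:

```python
def with_quote_hint(name: str, description: str) -> str:
    """Append an explicit instruction to quote the attribute name in filters."""
    return f'{description} You must put {name} in quotes like "{name}".'
```

A possible usage, assuming the same `AttributeInfo` import: `AttributeInfo(name=n, description=with_quote_hint(n, d), type=t)` for each `(n, d, t)` in your own attribute list.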
System Info
Langchain Version: 0.0.245
model: vicuna-13b-v1.5
Who can help?
No response
Information
Related Components
Reproduction
this is one of the documents
Expected behavior
When the result is the following, the error occurs (Q: what is a phoneme):
'```json\n{\n "query": "phoneme",\n "filter": "eq(lesson, 1)"\n}\n```'
It doesn't occur when the result is the following (Q: What is a phoneme):
'```json\n{\n "query": "phoneme",\n "filter": "eq(\\"lesson\\", 1)"\n}\n```'
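Both outputs above are valid JSON once the markdown fence is stripped. The only difference is inside the `filter` string, so the failure happens in the filter grammar, not in JSON parsing. A minimal sketch with a hypothetical helper that extracts the JSON from a fenced LLM response, so you can inspect the filter (for example, log it or apply a repair) before it reaches the filter parser:

```python
import json
import re

# Matches ```json ... ``` (or a bare ``` ... ```) and captures the body.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def parse_llm_json(text: str) -> dict:
    """Parse JSON that may be wrapped in a markdown code fence."""
    match = _FENCE.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload)
```

Applied to the two results above, the first yields the filter `eq(lesson, 1)` and the second yields `eq("lesson", 1)`.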
I can't change the version right now.