Failure to parse query with special characters via Elasticsearch search endpoints #5089

Closed
esatterwhite opened this issue Jun 6, 2024 · 6 comments · Fixed by #5093
Labels: bug (Something isn't working)

@esatterwhite
Collaborator

esatterwhite commented Jun 6, 2024

Describe the bug
In a JSON string, the backslash escape character must itself be escaped to be transmitted correctly, but doing so causes a parsing error.

Steps to reproduce the behavior:

Query Payload
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": [
              "program"
            ],
            "query": "vector\\:\\:sources\\:\\:kafka*",
            "lenient": true
          }
        }
      ]
    }
  }
}
Response Payload
{
  "status": 400,
  "error": {
    "caused_by": null,
    "reason": "failed to parse query: `vector\\:\\:sources\\:\\:kafka*`",
    "stack_trace": null,
    "type": null
  }
}

Expected behavior
Each \: in the query should be parsed as an escaped colon, as described in the docs for escaping special characters, so the query is interpreted as vector\:\:sources\:\:kafka*
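
The two layers of escaping can be checked outside of Quickwit. Below is a minimal Python sketch using only the standard `json` module. The `unescape_query` helper is hypothetical: it only illustrates what the query-string layer is expected to do with `\:` and is not the tantivy or Quickwit parser.

```python
import json

# The "query" value exactly as it appears in the request body (JSON-escaped).
raw_body = r'{"query": "vector\\:\\:sources\\:\\:kafka*"}'

# Layer 1: JSON decoding turns each `\\` into a single backslash.
query = json.loads(raw_body)["query"]


def unescape_query(q: str) -> str:
    """Hypothetical helper: drop each backslash and keep the character it escapes."""
    out = []
    chars = iter(q)
    for ch in chars:
        if ch == "\\":
            # Keep the escaped character literally (empty if the string ends here).
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)


# Layer 2: the query-string layer should treat `\:` as a literal colon.
literal = unescape_query(query)

print(query)    # what the query parser receives after JSON decoding
print(literal)  # the literal text the query is meant to match, plus the trailing wildcard
```

After JSON decoding, the parser only ever sees single backslashes, so the doubled backslashes in the payload are an artifact of the transport format rather than part of the query itself.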

@esatterwhite esatterwhite added the bug Something isn't working label Jun 6, 2024
@trinity-1686a
Contributor

fixed in quickwit-oss/tantivy#2416, we still have to update the tantivy used in quickwit

@trinity-1686a trinity-1686a self-assigned this Jun 6, 2024
@esatterwhite
Collaborator Author

> fixed in quickwit-oss/tantivy#2416, we still have to update the tantivy used in quickwit

Thank you!

@trinity-1686a
Contributor

hum, actually there might be a 2nd issue, but I'm not entirely sure.
The tantivy patch fixes the error you get, as well as #5041 returning everything, but then in both cases we return nothing because we treat the entire thing as a wildcard query, which doesn't get tokenized.
I've checked what ES does, and in both cases it doesn't find any matches, so it seems it doesn't tokenize wildcard queries either.

Is that the result you expect?

@esatterwhite
Collaborator Author

No, Elasticsearch definitely returns only the results matching the query.

In Elasticsearch, we have program mapped as a keyword field with a lowercase normalizer. In Quickwit, it is a text field with a raw tokenizer and a lowercase normalizer.

@trinity-1686a
Contributor

oh okay. if that field uses the raw tokenizer, then not tokenizing is indeed what should happen

@esatterwhite
Collaborator Author

The documents have a field like this

  "program": "vector::sources::kafka"
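
To make the expected result concrete, here is a rough Python sketch of why this document should match once the escaped colons are handled. It assumes the raw tokenizer keeps the whole value as a single term and that the normalizer only lowercases it. `fnmatchcase` from the standard library stands in for wildcard matching and is not how Quickwit or Elasticsearch implement it; `raw_lowercase` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def raw_lowercase(value: str) -> str:
    """Hypothetical stand-in for a raw tokenizer plus lowercase normalizer:
    the whole field value becomes one lowercased term."""
    return value.lower()


# The single term indexed for the document shown above.
indexed_term = raw_lowercase("vector::sources::kafka")

# The query after unescaping `\:` to `:`; the trailing `*` is the wildcard.
pattern = "vector::sources::kafka*"

# Because the term is not split on `:`, the untokenized wildcard pattern
# is compared against the full value.
matches = fnmatchcase(indexed_term, pattern)
print(matches)
```

Under these assumptions the wildcard is compared against the full, unsplit value, which is why leaving the query untokenized is the right behavior for this field.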
