Exact match doesn't seem to work #921
Thanks @mrusme for the bug report. Can you share your index schema? I just want to make sure that you can do phrase queries. If the schema does not specify that positions are stored, then Quickwit should return an error.
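For context on why positions matter: a phrase query such as `"Barack Obama"` only matches when the terms sit next to each other, and an index can check that only if it stored each term's position. The toy sketch that follows illustrates the idea with an in-memory positional index. `PositionalIndex`, `PhraseQueryError` and the whitespace tokenizer are hypothetical names for illustration, not Quickwit's or tantivy's actual implementation.

```python
from collections import defaultdict


class PhraseQueryError(Exception):
    """Raised when a phrase query targets a field indexed without positions."""


def tokenize(text):
    # Hypothetical stand-in for a "default" tokenizer: lowercase, split on whitespace.
    return text.lower().split()


class PositionalIndex:
    def __init__(self, record_positions=True):
        self.record_positions = record_positions
        # term -> {doc_id: [positions]}; position lists stay empty if not recorded.
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for pos, term in enumerate(tokenize(text)):
            positions = self.postings[term].setdefault(doc_id, [])
            if self.record_positions:
                positions.append(pos)

    def phrase_query(self, phrase):
        if not self.record_positions:
            # Adjacency cannot be checked without positions: fail loudly
            # rather than silently returning zero hits.
            raise PhraseQueryError("phrase query on a field without positions")
        terms = tokenize(phrase)
        if not terms:
            return []
        # Candidate docs contain every term of the phrase.
        candidates = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            candidates &= set(self.postings.get(term, {}))
        hits = []
        for doc_id in sorted(candidates):
            # Match if term i sits at position start + i for some start.
            for start in self.postings[terms[0]][doc_id]:
                if all(start + i in self.postings[t][doc_id]
                       for i, t in enumerate(terms)):
                    hits.append(doc_id)
                    break
        return hits


index = PositionalIndex(record_positions=True)
index.add(1, "Barack Obama")
index.add(2, "Obama met Barack")
print(index.phrase_query("Barack Obama"))  # [1]
```

Document 2 contains both terms but not adjacent and in order, so only document 1 matches. With `record_positions=False` the sketch raises instead of quietly returning no hits, which is the behaviour described above for a schema without positions.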
@fmassot thanks for the quick reply. This is the schema I'm using:
{
  "version": 0,
  "index_id": "wikipedia",
  "index_uri": "file:///home/mrus/projects/@mrusme/ulpia/wikipedia",
  "search_settings": {
    "default_search_fields": ["title", "section_texts"]
  },
  "doc_mapping": {
    "store_source": true,
    "field_mappings": [
      {
        "name": "title",
        "type": "text"
      },
      {
        "name": "section_titles",
        "type": "array<text>"
      },
      {
        "name": "section_texts",
        "type": "array<text>"
      },
      {
        "name": "interlinks",
        "type": "array<text>",
        "indexed": false,
        "stored": false
      }
    ]
  }
}
Since I couldn't find any documentation for the latest master, I had to put the config together myself. Could I have made a mistake there?
Sorry that the documentation is out of date; we are working on it right now and it will be in line with the code in 2 weeks. With your current schema, Quickwit should return an error, remove the […] Here is the schema you need to make phrase queries:
Unfortunately it doesn't return an error (or at least it doesn't look like one):
quickwit index search --index-id wikipedia --metastore-uri file://$(pwd)/wikipedia --query 'title:"Barack Obama"'
{
  "numHits": 0,
  "hits": [],
  "elapsedTimeMicros": 4614
}
Also, even with […]
OK, it's a bug then; I need to check that.
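A rough idea of the check that was expected to fire here: before running a phrase query, look up every targeted field in the doc mapping and reject the query if that field does not record positions. This is a hypothetical Python sketch over the index config loaded as a dict. The `record: position` value comes from the config shared later in this thread; treating a missing `record` key as "no positions" is an assumption drawn from the discussion, not verified against Quickwit's code. The regex only understands the `field:"phrase"` and bare `"phrase"` forms.

```python
import re

# Matches field:"some phrase" or a bare "some phrase".
PHRASE_RE = re.compile(r'(?:(\w+):)?"([^"]+)"')


def fields_missing_positions(index_config, query):
    """Return the fields a phrase query targets that do not record positions."""
    mappings = {m["name"]: m for m in index_config["doc_mapping"]["field_mappings"]}
    defaults = index_config.get("search_settings", {}).get("default_search_fields", [])
    missing = []
    for field, _phrase in PHRASE_RE.findall(query):
        # A bare phrase is searched on the default search fields.
        targets = [field] if field else defaults
        for name in targets:
            # Assumption: no "record" key means positions are not stored.
            # Unknown fields are reported as well.
            record = mappings.get(name, {}).get("record")
            if record != "position" and name not in missing:
                missing.append(name)
    return missing


# Abbreviated version of the schema shared earlier in this thread.
config = {
    "doc_mapping": {"field_mappings": [
        {"name": "title", "type": "text"},
        {"name": "section_texts", "type": "array<text>"},
    ]},
    "search_settings": {"default_search_fields": ["title", "section_texts"]},
}
missing = fields_missing_positions(config, 'title:"Barack Obama"')
if missing:
    print(f"error: phrase query on fields without positions: {missing}")
```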
Also, one more thing: I applied your change to my config and ran the following command:
quickwit index ingest --index-id wikipedia --metastore-uri file://$(pwd)/wikipedia --data-dir-path ./wikipedia-data --input-path enwiki-latest-pages-articles.json --overwrite
It seems the --overwrite flag doesn't do much here, though. I assume I need to […]
Yes, exactly: you need to reindex the dataset, and the simplest way to do that is to recreate the index with […]
Sorry, it's […]
I've just tested a query like this on the tutorial Wikipedia dataset:
Here is my index config:
version: 0
index_id: wikipedia
index_uri: file:///Users/fmassot/Documents/quickwit/indexes/wikipedia
doc_mapping:
  field_mappings:
    - name: title
      type: text
      tokenizer: default
      record: position
    - name: body
      type: text
      tokenizer: default
      record: position
search_settings:
  default_search_fields: [title, body]
I have 10 hits.
@mrusme I will try your query on the complete dataset. |
Yeah, just finished re-indexing and now my results for the example I gave initially look like this:
It's better now, and I assume that for exact matches I could always pick the very first match. However, it would be nice to have a truly exact match, which only shows a hit for an entry literally named "Apollo 11", or no hit at all if there is none. :)
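Until something like that exists, one client-side workaround is to post-filter the hits returned by the search command. The sketch assumes, hypothetically, that each entry in `hits` is the stored document as a JSON object with a `title` string (the schema above sets `store_source: true`, but the exact hit shape is not shown in this thread). `exact_title_hits` is a made-up helper name and the sample response is invented.

```python
import json


def exact_title_hits(response_json, title):
    """Keep only the hits whose title equals `title`, ignoring case and surrounding spaces."""
    response = json.loads(response_json)
    wanted = title.strip().casefold()
    # Assumption: each hit is the stored document as a JSON object with a "title" string.
    return [
        hit for hit in response.get("hits", [])
        if isinstance(hit.get("title"), str)
        and hit["title"].strip().casefold() == wanted
    ]


# Invented response in the same top-level shape as the CLI output above.
sample = json.dumps({
    "numHits": 3,
    "hits": [
        {"title": "Apollo 11 in popular culture"},
        {"title": "Apollo 11"},
        {"title": "Apollo 11 goodwill messages"},
    ],
    "elapsedTimeMicros": 1234,
})
print(exact_title_hits(sample, "Apollo 11"))
```

Filtering client-side only works on the hits actually returned, so if the exact title is not among them it will report no match even when the article exists.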
I opened a dedicated issue for the search command bug: #922. It will be fixed soon; thanks again for the report.
I see. But looking at other search engines and at how tantivy handles queries, we have several options to do what you want:
I would go for the third option :)
We should add BM25 scoring for this use case. That would help. |
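To illustrate why BM25 would help, here is a minimal Okapi BM25 scorer in Python. It is a textbook version with the common defaults `k1 = 1.2`, `b = 0.75` and the Lucene-style IDF `ln(1 + (N - df + 0.5) / (df + 0.5))`; it is not tantivy's or Quickwit's code, and the titles are made up for illustration. Because of length normalization (the `b` term), a short title containing all the query terms outscores longer titles containing the same terms, so an article titled exactly "Apollo 11" would tend to rank first.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each tokenized doc against the query tokens with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency, normalized by doc length relative to the average.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


titles = ["Apollo 11", "Apollo 11 in popular culture", "Apollo 11 goodwill messages", "Apollo 12"]
docs = [t.lower().split() for t in titles]
scores = bm25_scores(docs, ["apollo", "11"])
for title, score in sorted(zip(titles, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {title}")
```

Ranking alone does not give the "no hit at all" behaviour asked for above, but it would put the exact title at the top instead of leaving it at an arbitrary position.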
@mrusme the search command bug has been fixed in #923. Let's close this issue and maybe open a dedicated one for BM25 scoring? @fulmicoton?
Describe the bug
The exact search doesn't seem to be working.
Steps to reproduce (if applicable)
Steps to reproduce the behavior:
Okay, so it seems we've found what we're looking for as the second result.
However, since the article is literally named Apollo 11, we should be able to
perform what (according to Quickwit's documentation) seems to be an exact
search:
Expected behavior
The "Apollo 11" article should show up as a result.
System configuration:
60f897c0f49b4a920948b2bb98ca081f5557ed22
built from source on Linux, rustc 1.56.1
Additional context