Weird highlight results with non-ascii #2144
Comments
With ASCII-only text the results seem more correct, but I would still like to get a single result in both cases. (Make sure to enable audio.) Screen.Recording.2022-02-05.at.20.15.48.mov
Hey @OrkhanAlikhanov, thank you very much for the issue and the detailed explanation! We plan to improve the tokenizer we use, and we will probably expose a Unicode segmentation system too. @ManyTheFish plans to work on this for future releases. We will also expose parameters to disable the removal of words from the end of the query, which happens when there are not enough documents to fill the 20 results.
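To illustrate the word-removal behavior mentioned above, here is a minimal toy sketch in Rust. It is not Meilisearch's or milli's implementation: the `search` and `matches` functions, the `drop_last_words` flag, and the second and third sample documents are all hypothetical, and matching is a plain lowercase substring check. It only shows how dropping trailing query words to fill the result limit can return extra, looser matches.

```rust
/// Toy matcher (assumption): a document matches when it contains every
/// query word as a lowercase substring. Real engines use a tokenizer.
fn matches(doc: &str, words: &[&str]) -> bool {
    let doc = doc.to_lowercase();
    words.iter().all(|w| doc.contains(&w.to_lowercase()))
}

/// Hypothetical search: try the full query first, then drop words from the
/// end of the query until `limit` results are collected.
/// With `drop_last_words == false`, only the full query is tried.
fn search<'a>(
    docs: &[&'a str],
    query: &str,
    limit: usize,
    drop_last_words: bool,
) -> Vec<&'a str> {
    let words: Vec<&str> = query.split_whitespace().collect();
    let min_len = if drop_last_words { 1 } else { words.len() };
    let mut results: Vec<&'a str> = Vec::new();
    let mut len = words.len();

    while len > 0 && len >= min_len && results.len() < limit {
        for doc in docs {
            if results.len() == limit {
                break;
            }
            // Keep earlier (stricter) matches first and avoid duplicates.
            if matches(doc, &words[..len]) && !results.contains(doc) {
                results.push(*doc);
            }
        }
        len -= 1; // Relax the query by removing its last word.
    }
    results
}
```

In this toy model, searching three documents for "Təsviri İncəsənət" with `drop_last_words` set to `false` returns only the document containing both words, while setting it to `true` also returns any document that contains just "Təsviri". A parameter to disable the relaxation step, as described in the comment above, would correspond to the strict mode here.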
Hey! Thank you for the detailed answer. How can one contribute to this? Is it easy to update the tokenizer function? Any directions if I want to experiment with a workaround for this issue?
Hey! You should look into our tokenizer repository; it is the crate we use in Meilisearch, more specifically in the milli crate. However, could you first create an issue and wait until next week for an answer? We plan to rewrite the tokenizer, and @ManyTheFish is the one who will do it, so I would prefer that he plan the rewrite with you. In the meantime, you can always propose a patch on the tokenizer and patch Meilisearch with your fork.
I put this issue in the v0.28.0 milestone. The new tokenizer might fix this, but nothing is certain; we need to test once the first RC is done.
Hello @curquiza, after some tests I'm confident that the new tokenizer will fix this issue. 😄
So this issue will be fixed when #2375 is fixed 🚀
540: Integrate charabia r=Kerollmops a=ManyTheFish related to meilisearch/meilisearch#2375 related to meilisearch/meilisearch#2144 related to meilisearch/meilisearch#2417 Co-authored-by: ManyTheFish <many@meilisearch.com>
2468: Update milli 0.29 r=ManyTheFish a=ManyTheFish - [x] Update milli to 0.29 - [x] Integrate charabia - [x] Set disabled_words to default when Index::exact_words returns None - [x] Fix ranking rules integration test fixes #2375 fixes #2144 fixes #2417 Co-authored-by: ManyTheFish <many@meilisearch.com>
2468: Update milli 0.29 r=ManyTheFish a=ManyTheFish - [x] Update milli to 0.29 - [x] Integrate charabia - [x] Set disabled_words to default when Index::exact_words returns None - [x] Fix ranking rules integration test fixes #2375 fixes #2144 fixes #2417 fixes #2407 Co-authored-by: ManyTheFish <many@meilisearch.com>
Describe the bug
Meilisearch is giving weird results compared to Algolia and Typesense.
To be honest, I don't know whether the software is built to behave this way or the language I use is unsupported. I wanted to report this in any case. If there is a settings toggle to make it strict, I'd like to know.
To Reproduce
Steps to reproduce the behavior:
"Müəllimlərin İşə Qəbulu - Təsviri İncəsənət"
or "Təsviri İncəsənət"
Expected behavior
Should show the same correct results as Algolia and Typesense.
Screenshots
Check out the video where I explain the results. Don't forget to enable audio.
meilisearch_issue.mp4
If you want, here is the YouTube link: https://www.youtube.com/watch?v=kumI6XbjnUA
Meilisearch version: v0.22.0, v0.25.2