Introduce the new version of the tokenizer: charabia #2375
Comments
Done with @ManyTheFish
Update for the @meilisearch/docs-team
We define supported languages as the languages that have a dedicated tokenizer (segmenter + normalizer) in Charabia.
This does not mean Meilisearch does not work for non-listed languages. It means that by default (currently for every language other than Japanese and Chinese) the Latin tokenizer is used: for some languages and situations it works as expected; for others it can fail. A contributor just opened a PR to add Hebrew support. I will open a dedicated issue if we introduce it in v0.28.0 so that you don't miss it!
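To illustrate the fallback behaviour described above, here is a minimal Rust sketch. It is not Charabia's actual API: `Script`, `detect_script`, `segment_latin`, `segment_cj`, `normalize`, and `tokenize` are hypothetical names, and the per-character Chinese/Japanese segmenter is only a placeholder for a real dictionary-based one.

```rust
// Hypothetical sketch: none of these names come from Charabia's real API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Script {
    Cj,    // text containing Chinese or Japanese characters
    Other, // everything else falls back to the Latin pipeline
}

/// Very rough script detection based on two Unicode ranges (an assumption
/// made for this sketch, not how a real detector works).
fn detect_script(text: &str) -> Script {
    let is_cj = |c: char| {
        matches!(
            c,
            '\u{3040}'..='\u{30FF}'     // Hiragana and Katakana
                | '\u{4E00}'..='\u{9FFF}' // CJK Unified Ideographs
        )
    };
    if text.chars().any(is_cj) {
        Script::Cj
    } else {
        Script::Other
    }
}

/// Latin fallback segmenter: split on anything that is not alphanumeric.
fn segment_latin(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|s| !s.is_empty())
        .collect()
}

/// Placeholder for a dedicated segmenter: one token per non-whitespace
/// character. A real segmenter would rely on a dictionary.
fn segment_cj(text: &str) -> Vec<&str> {
    text.char_indices()
        .filter(|(_, c)| !c.is_whitespace())
        .map(|(i, c)| &text[i..i + c.len_utf8()])
        .collect()
}

/// Normalizer shared by both pipelines in this sketch: lowercase only.
fn normalize(token: &str) -> String {
    token.to_lowercase()
}

/// Tokenizer = segmenter + normalizer, with the Latin pipeline as default.
fn tokenize(text: &str) -> Vec<String> {
    let segments = match detect_script(text) {
        Script::Cj => segment_cj(text),
        Script::Other => segment_latin(text),
    };
    segments.into_iter().map(normalize).collect()
}
```

With this sketch, `tokenize("Hello, World!")` goes through the Latin fallback and yields `["hello", "world"]`. Hebrew text such as `"שלום עולם"` takes the same fallback path and is only split on whitespace and punctuation, which is why results for non-listed languages can vary.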
Issue opened about the Hebrew support: #2417
540: Integrate charabia r=Kerollmops a=ManyTheFish related to meilisearch/meilisearch#2375 related to meilisearch/meilisearch#2144 related to meilisearch/meilisearch#2417 Co-authored-by: ManyTheFish <many@meilisearch.com>
2468: Update milli 0.29 r=ManyTheFish a=ManyTheFish - [x] Update milli to 0.29 - [x] Integrate charabia - [x] Set disabled_words to default when Index::exact_words returns None - [x] Fix ranking rules integration test fixes #2375 fixes #2144 fixes #2417 Co-authored-by: ManyTheFish <many@meilisearch.com>
2468: Update milli 0.29 r=curquiza a=ManyTheFish - [x] Update milli to 0.29 - [x] Integrate charabia - [x] Set disabled_words to default when Index::exact_words returns None - [x] Fix ranking rules integration test fixes #2375 fixes #2144 fixes #2417 Co-authored-by: ManyTheFish <many@meilisearch.com>
2468: Update milli 0.29 r=ManyTheFish a=ManyTheFish - [x] Update milli to 0.29 - [x] Integrate charabia - [x] Set disabled_words to default when Index::exact_words returns None - [x] Fix ranking rules integration test fixes #2375 fixes #2144 fixes #2417 fixes #2407 Co-authored-by: ManyTheFish <many@meilisearch.com>
2468: Update milli 0.29 r=curquiza a=ManyTheFish - [x] Update milli to 0.29 - [x] Integrate charabia - [x] Set disabled_words to default when Index::exact_words returns None - [x] Fix ranking rules integration test fixes #2375 fixes #2144 fixes #2417 fixes #2407 Co-authored-by: ManyTheFish <many@meilisearch.com>
2468: Update milli 0.29 r=irevoire a=ManyTheFish - [x] Update milli to 0.29 - [x] Integrate charabia - [x] Set disabled_words to default when Index::exact_words returns None - [x] Fix ranking rules integration test fixes #2375 fixes #2144 fixes #2417 fixes #2407 Co-authored-by: ManyTheFish <many@meilisearch.com>
Why?
We want to make it easier to contribute to our tokenizer so that we can support more and more languages. Our community speaks many languages that we do not, and native speakers are best placed to choose which tokenizer and normalizer to use for their own language. So @ManyTheFish worked on a new version of the tokenizer. More details are in this issue: meilisearch/charabia#72
What is fixed?
Changes