[Feature request] Index text without diacritics #53
Comments
Omnisearch would have to index an additional computed field to hold the converted text without diacritics. Definitely doable, but I'd put that behind a setting, since that would roughly double the indexing duration and increase memory consumption. I'm not sure how it will work with the in-file search, but I can take a look at it. Anyway, this part needs a rework.
Because the same word with and without diacritics could have 2 different meanings?
I don't mind it being behind a setting. It's too bad you may have to convert the text; it would be nice if the search library you're using offered diacritics removal alongside stemming and possibly lemmatization.
Indeed, there are many such cases. I would have to see it in practice to know whether it would be usable if the results were ranked the same, though I suspect it would be only a minor annoyance.
Well, I suspect it's specifically because diacritics can change the meaning of a word. I have the same issue in French, but to a lesser degree: the fuzzy matching usually works around it ("creme brulee" will match "crème brulée", but not "crème brûlée"). I think that a better solution would be a toggle to simply ignore diacritics: normalize all search queries, and normalize notes before indexing.
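For illustration, here is a minimal sketch of what such a toggle could look like. The names `removeDiacritics`, `normalizeForSearch`, and `ignoreDiacritics` are hypothetical and not taken from Omnisearch; the only real API used is the standard `String.prototype.normalize`. Letters that have no Unicode decomposition (for example "ł" or "ø") would pass through unchanged, so this is a starting point rather than a complete solution.

```typescript
// Hypothetical helper, not part of Omnisearch: strips combining diacritics.
function removeDiacritics(text: string): string {
  // NFD splits "é" into "e" + U+0301 (combining acute accent);
  // the regex then drops everything in the combining diacritical marks block.
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// The same function would be applied to notes before indexing and to every
// search query, so both sides are compared in the same normalized form.
// `ignoreDiacritics` stands in for the proposed setting (an assumption).
function normalizeForSearch(text: string, ignoreDiacritics: boolean): string {
  const lowered = text.toLowerCase();
  return ignoreDiacritics ? removeDiacritics(lowered) : lowered;
}

// With the toggle on, the query and the note text normalize to the same string.
const query = normalizeForSearch("creme brulee", true);
const note = normalizeForSearch("Crème brûlée", true);
console.log(query === note);
```

With the toggle off, the text is only lowercased, so the current behavior is preserved.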
When I search for "zlutoucky", I would like to be able to find "žluťoučký", and vice versa. These results should have a lower relevancy/priority. Would this be possible?
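To make the request concrete, here is a rough sketch of the ranking I have in mind. Everything in it is an assumption for illustration only: the `fold` and `search` functions, the `Scored` type, and the two score constants are made up, and a real implementation would go through the search library's own scoring instead of exact word comparison.

```typescript
interface Scored {
  text: string;
  score: number;
}

// Hypothetical helper: lowercases and strips combining diacritics.
function fold(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Assumed weights: an exact match outranks a diacritic-insensitive match.
const EXACT_SCORE = 1.0;
const FOLDED_SCORE = 0.5;

function search(query: string, words: string[]): Scored[] {
  const exactQuery = query.toLowerCase();
  const foldedQuery = fold(query);
  const results: Scored[] = [];
  for (const word of words) {
    if (word.toLowerCase() === exactQuery) {
      // Same spelling, diacritics included: full score.
      results.push({ text: word, score: EXACT_SCORE });
    } else if (fold(word) === foldedQuery) {
      // Matches only once diacritics are ignored: lower score.
      results.push({ text: word, score: FOLDED_SCORE });
    }
  }
  // Highest score first.
  return results.sort((a, b) => b.score - a.score);
}

const words = ["žluťoučký", "zlutoucky", "kůň"];
console.log(search("zlutoucky", words).map((r) => r.text));
console.log(search("žluťoučký", words).map((r) => r.text));
```

Searching in either direction finds both spellings, with the spelling that matches the query exactly listed first.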