No lower case mapping for "Ø" in charset non_cjk #1361

Closed
jnystad opened this issue Aug 16, 2023 · 2 comments
Comments

@jnystad

jnystad commented Aug 16, 2023

Describe the bug
Ø (U+00D8) is included in the non_cjk charset but has no lower-case mapping. For consistency, it should be mapped to "o" (U+006F), the same way "ø" (U+00F8) is.

As a Scandinavian, I would also suggest you consider defaulting to handling ø, ö, ä and å as distinct from their seemingly similar siblings. I understand this may be an exception you don't really want, as I'm sure more languages have similar issues that may be in conflict with each other. However, æ is already treated as a distinct character, so it seems a little inconsistent.

Treating Ø as distinct from ø, however, seems like a clear bug to me.

To Reproduce
Steps to reproduce the behavior:

  1. Index some text containing words with Ø
  2. Try to search with ø or o

Expected behavior
Ø and ø should be treated the same.
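
The mismatch described above can be illustrated with a small folding table. This is only a sketch: `BUGGY_TABLE`, `FIXED_TABLE`, `fold` and `matches` are hypothetical names made up for illustration, and the tables model the reported behaviour rather than the project's actual charset definition.

```python
# Hypothetical folding tables: code point -> character it is indexed as.
# BUGGY_TABLE models the report: U+00F8 (ø) folds to "o",
# while U+00D8 (Ø) is present but left unmapped.
BUGGY_TABLE = {
    0x00F8: "o",       # ø -> o
    0x00D8: "\u00d8",  # Ø -> Ø (kept as-is, the reported inconsistency)
}

# The expected behaviour: Ø folds to "o", exactly like ø.
FIXED_TABLE = {**BUGGY_TABLE, 0x00D8: "o"}


def fold(text, table):
    """Fold each character through the table; plain lower-casing otherwise."""
    return "".join(table.get(ord(ch), ch.lower()) for ch in text)


def matches(indexed, query, table):
    """Return True if the folded query equals one folded word of the text."""
    return fold(query, table) in fold(indexed, table).split()


if __name__ == "__main__":
    for name, table in (("buggy", BUGGY_TABLE), ("fixed", FIXED_TABLE)):
        print(name, matches("Øl er godt", "øl", table))
```

With the buggy table the word "Øl" is indexed unchanged, so neither "øl" nor "ol" finds it; with the fixed table both queries fold to "ol" and match.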

@Nick-S-2018
Collaborator

We fixed it in 8ed0f51

Regarding 'æ' being handled differently from the other symbols mentioned above: it is a ligature, while the others are letters with diacritics, so they have unambiguous 'base' symbols they can naturally be mapped to.
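
One way to see part of this distinction is Unicode canonical decomposition. The sketch below uses Python's standard `unicodedata` module and is not taken from the project's code; `base_letter` is a hypothetical helper. Note that 'ø' (like 'Ø') has no canonical decomposition in Unicode, so a mapping to "o" has to be listed explicitly in a table rather than derived from normalization.

```python
import unicodedata


def base_letter(ch):
    """Return the first character of the canonical decomposition (NFD).

    Letters that decompose into base + combining mark yield the base;
    letters without a decomposition are returned unchanged.
    """
    return unicodedata.normalize("NFD", ch)[0]


if __name__ == "__main__":
    for ch in "öäåøæ":
        base = base_letter(ch)
        note = "decomposes" if base != ch else "no canonical decomposition"
        print(f"{ch} -> {base} ({note})")
```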

@jnystad
Author

jnystad commented Aug 21, 2023

Thanks!
