Unable to search Japanese (and possibly other non-English) text #16

Open
aonsager opened this issue Jun 29, 2022 · 2 comments

Comments

@aonsager

When I type a search query in Japanese, I get zero results. The Japanese text does show up in the results when I find the same item through an English query, so I suspect it is being filtered out somewhere when either the query or the results are parsed.

I understand that this may be very low-priority, so please handle as you see fit. Thanks!

en_query
ja_query

@iansinnott
Member

Ah yes, thanks for pointing this out, @aonsager. This is indeed the case, and it's a problem. It's not just Japanese: CJK scripts in general do not work currently.

This has to do with how the FTS system's tokenizer works [1]. The tokenizer can be configured, and fixing this is in the backlog.

[1] https://www.sqlite.org/fts5.html#tokenizers
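For anyone who wants to reproduce the tokenizer behavior outside the app, here is a minimal sketch using Python's built-in `sqlite3` module. It is not this project's code: the `pages` table, the `search` helper, and the sample titles are all made up for illustration. It assumes the bundled SQLite was compiled with FTS5, and the `trigram` tokenizer additionally needs SQLite 3.34 or newer. The default `unicode61` tokenizer only splits on whitespace and punctuation, so an unspaced Japanese sentence is indexed as one long token and a word inside it never matches.

```python
import sqlite3

# Hypothetical sample data: one English title, one Japanese title.
DOCS = ["Visited Tokyo Tower", "東京タワーに行きました"]


def search(tokenizer, docs, query):
    """Index docs in an in-memory FTS5 table and return the titles matching query.

    tokenizer is an FTS5 tokenizer name such as 'unicode61' or 'trigram'.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            f"CREATE VIRTUAL TABLE pages USING fts5(title, tokenize = '{tokenizer}')"
        )
        conn.executemany("INSERT INTO pages(title) VALUES (?)", [(d,) for d in docs])
        # Quote the query as a single FTS5 phrase so it is not parsed as syntax.
        phrase = '"' + query.replace('"', '""') + '"'
        rows = conn.execute(
            "SELECT title FROM pages WHERE pages MATCH ? ORDER BY rowid", (phrase,)
        )
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    # English words are separated by spaces, so the default tokenizer finds them.
    print(search("unicode61", DOCS, "tower"))
    # The Japanese sentence is one token, so a word inside it is not found.
    print(search("unicode61", DOCS, "タワー"))
    # Trigram indexing matches substrings of three or more characters.
    if sqlite3.sqlite_version_info >= (3, 34, 0):
        print(search("trigram", DOCS, "タワー"))
```

The trigram tokenizer is only one possible configuration. It trades index size for substring matching and does not match queries shorter than three characters, which matters for short CJK words.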

@iansinnott
Member

I have been rewriting the backend for this to use an alternate search system. In my limited testing, CJK works roughly as expected. In the case of Chinese there is no special handling for word separation, so individual characters are treated as terms. That could be improved, but it is definitely better than the status quo.

It's not exactly the same tool but here's the link: https://github.com/iansinnott/browser-gopher

