Hebrew support #216

Closed
Rashty opened this issue May 1, 2016 · 4 comments

Comments

Rashty commented May 1, 2016

Hi,
Is there a way to search text in Hebrew?

@alex-money

@Rashty what error are you getting?

Owner

olivernn commented Jul 9, 2016

The lunr-languages project supports many languages. I don't think Hebrew is one of them, but it should at least give you some idea of what is involved in adding support for another language.

The language-specific parts are tokenisation/trimming, stemming and the stop-word filter. The default implementations of these are English-specific and will probably cause problems with a language such as Hebrew.
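To illustrate why an English-oriented trimmer can be a problem, here is a minimal sketch. The function `asciiTrimmer` is hypothetical and only assumes that the trimmer strips leading and trailing `\W` characters; in JavaScript, `\W` without the Unicode flag treats every Hebrew letter as a non-word character.

```javascript
// Hypothetical stand-in for an English-style trimmer (an assumption,
// not copied from the lunr source): strip non-word characters from
// both ends of a token.
function asciiTrimmer(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '')
}

// English tokens keep their letters and lose the punctuation.
console.log(asciiTrimmer('"hello",')) // → hello

// Hebrew letters are all matched by \W, so the whole token is removed.
console.log(asciiTrimmer('שלום') === '') // → true
```

If the trimmer in your version of lunr behaves like this, Hebrew tokens would be emptied before they ever reach the index.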

The simplest approach would be to remove the existing English-specific pipeline functions before you add any documents to the index:

idx.pipeline.reset()
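In context, that call might look like the sketch below. It assumes the lunr 0.x-style API that was current when this thread was written (`lunr(config)`, `idx.pipeline`, `idx.add`); the helper name `buildIndexWithoutPipeline` and the `id`/`body` fields are made up for illustration, and `lunr` is passed in as an argument so the snippet does not assume how the library is loaded.

```javascript
// Sketch only: `buildIndexWithoutPipeline`, `id` and `body` are
// illustrative names, not part of lunr.
function buildIndexWithoutPipeline(lunr, docs) {
  var idx = lunr(function () {
    this.ref('id')
    this.field('body')
  })

  // Remove the English-specific pipeline functions
  // before any document is added to the index.
  idx.pipeline.reset()

  docs.forEach(function (doc) {
    idx.add(doc)
  })

  return idx
}

// Possible usage, assuming lunr is already loaded:
// var idx = buildIndexWithoutPipeline(lunr, [{ id: 1, body: 'שלום עולם' }])
// idx.search('שלום')
```

With an empty pipeline, tokens are indexed as they come out of the tokeniser, so there is no stemming and no stop-word removal. Newer lunr releases configure the index differently, so check the API of the version you are using.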

The more involved approach would be to implement these text-processing functions specifically for Hebrew. Take a look at the lunr-languages project for implementations for other languages. If you do go this route, please update this issue with your progress so others can also benefit from being able to use lunr with Hebrew.
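As a rough starting point, a Hebrew trimmer and stop-word filter could be sketched as follows. Everything here is an assumption for illustration: the function names, the tiny stop-word list, and the premise that pipeline functions receive plain strings and drop a token by returning `undefined`, as in lunr 0.x. Stemming is left out entirely.

```javascript
// Hebrew combining marks (niqqud and cantillation) in the Hebrew block.
var HEBREW_MARKS = /[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/g

// Anything at the edges that is not a Hebrew letter (U+05D0 to U+05EA)
// or an ASCII word character.
var NON_WORD_EDGES = /^[^\u05D0-\u05EA\w]+|[^\u05D0-\u05EA\w]+$/g

// Illustrative list only; a real list would be much longer.
var HEBREW_STOP_WORDS = {
  'של': true, 'את': true, 'על': true, 'עם': true,
  'הוא': true, 'היא': true, 'זה': true
}

// Strip vowel marks, then trim punctuation from both ends.
function hebrewTrimmer(token) {
  return token.replace(HEBREW_MARKS, '').replace(NON_WORD_EDGES, '')
}

// Return the token unless it is empty or a stop word;
// falling through returns undefined, which drops the token.
function hebrewStopWordFilter(token) {
  if (token && !Object.prototype.hasOwnProperty.call(HEBREW_STOP_WORDS, token)) {
    return token
  }
}

// Hypothetical wiring for a lunr 0.x-style index.
function useHebrewPipeline(lunr, idx) {
  lunr.Pipeline.registerFunction(hebrewTrimmer, 'hebrewTrimmer')
  lunr.Pipeline.registerFunction(hebrewStopWordFilter, 'hebrewStopWordFilter')
  idx.pipeline.reset()
  idx.pipeline.add(hebrewTrimmer, hebrewStopWordFilter)
}
```

Call `useHebrewPipeline(lunr, idx)` before adding documents. Hebrew prefixes such as ה or ו attached to a word are not handled here; that would be the job of a stemmer.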

@biodranik

Has anyone implemented Hebrew support for lunr? Please share your implementation.

@olivernn
Owner

Found this just now; it might at least be a starting point for anyone willing to do the work required to implement this.

https://github.com/iddoberger/awesome-hebrew-nlp
