Support for german stemmer #30
Comments
I agree! One of my highest priorities for natural before fall 2012 is non-English stemmers. I personally was going to look into doing French, as I can likely handle that completely, but was hoping to get native speakers to help me at least verify my work with other languages. Would you be able to either handle the implementation or at least help me verify its accuracy? The algorithm you've attached, have you played with it much? Are you aware of any licensing restrictions on it? |
Hi, But great to hear that this is on your top priorities list. :) |
Feel free to take a stab at it! |
oops! i did not mean to close this. |
+1 for Dutch stemming. Hopefully I can help out in some sort of way in the future. |
You can use the JS Snowball port to do so: https://github.com/fortnightlabs/snowball-js It does change the capital letter U to lowercase though: http://code.google.com/p/urim/issues/detail?id=3 |
Added a Porter stemmer for Dutch. I should say that the Porter algorithm makes mistakes in Dutch and that my implementation fails on 305 of the 45669 cases in the Snowball file. That is a failure rate of less than 1%. Also, the Snowball file contains wrong examples; for instance Hugo |
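For anyone who wants to reproduce this kind of check, here is a minimal sketch of comparing a stemmer against reference (word, stem) pairs. Everything in it is an assumption for illustration: `evaluateStemmer` and `toyStem` are hypothetical names, the toy stemmer is not the Dutch Porter algorithm, and the sample pairs are made up rather than taken from the Snowball file.

```javascript
// Hypothetical helper: run a stemmer over reference pairs and collect mismatches.
function evaluateStemmer(stem, pairs) {
  const failures = [];
  for (const [word, expected] of pairs) {
    const actual = stem(word);
    if (actual !== expected) {
      failures.push({ word, expected, actual });
    }
  }
  return {
    total: pairs.length,
    failed: failures.length,
    failureRate: pairs.length === 0 ? 0 : failures.length / pairs.length,
    failures,
  };
}

// Toy stand-in stemmer (NOT the Dutch Porter algorithm): strips a trailing "en".
function toyStem(word) {
  return word.endsWith("en") ? word.slice(0, -2) : word;
}

// Made-up reference pairs, for illustration only (not Snowball data).
const samplePairs = [
  ["boeken", "boek"],
  ["fietsen", "fiets"],
  ["huis", "huis"],
  ["huizen", "huis"], // the toy stemmer yields "huiz", so this pair fails
];

const report = evaluateStemmer(toyStem, samplePairs);
console.log(`${report.failed} of ${report.total} failed`);
console.log(report.failures);

// The figures quoted above: 305 failures out of 45669 words.
console.log(((305 / 45669) * 100).toFixed(2) + "%"); // prints "0.67%"
```

In a real run, the pairs would be read from the reference vocabulary and expected-output files, and `toyStem` would be replaced by the stemmer under test. Keeping the failing entries around makes it easier to tell implementation bugs apart from questionable reference data.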
News? |
I checked the license:
As I see it, BSD-licensed code can be integrated into an MIT-licensed code base as long as the added code keeps its original (BSD) license. |
I am also considering jsSnowball, transpiled from the Java sources. It is licensed under BSD 3.0, which can be combined with the MIT license as well. Source can be found here: |
See #663 for progress |
#663 is merged. |
It would be great to have a stemmer for the German language.
Maybe this is a good starting point? https://gist.github.com/2199965