Support for german stemmer #30

Closed
thomasfr opened this issue Mar 25, 2012 · 12 comments

Comments

@thomasfr

It would be great to have a stemmer for the German language.
Maybe this is a good starting point? https://gist.github.com/2199965

@chrisumbel
Member

i agree! one of my highest priorities for natural before fall 2012 is non-English stemmers. i personally was going to look into doing French as I can likely handle that completely, but was hoping to get native speakers to help me at least verify my work with other languages.

would you be able to either handle the implementation or at least help me verify its accuracy?

the algorithm you've attached, have you played with it much? are you aware if there are any licensing restrictions with it?

@thomasfr
Author

Hi,
I could try to do a simple base implementation on top of the Gist I provided, but I have to check the license first. Otherwise I can help you in testing yours.

But great to hear that this is on your top priorities list. :)

@chrisumbel
Member

Feel free to take a stab at it!

@chrisumbel
Member

oops! i did not mean to close this.

@alfredwesterveld

+1 for Dutch stemming. Hopefully I can help out in some sort of way in the future.

@joscha
Contributor

joscha commented Sep 5, 2012

You can use the JS Snowball port to do so:

https://github.com/fortnightlabs/snowball-js

It does change the capital letter U to lowercase though: http://code.google.com/p/urim/issues/detail?id=3

@Hugo-ter-Doest Hugo-ter-Doest added this to In progress in Natural backlog Mar 17, 2018
@Hugo-ter-Doest Hugo-ter-Doest moved this from In progress to To do in Natural backlog Mar 17, 2018
@Hugo-ter-Doest
Collaborator

Hugo-ter-Doest commented Apr 7, 2018

Added a Porter stemmer for Dutch. I should say that the Porter algorithm makes mistakes in Dutch and that my implementation fails in 305 of the 45669 cases in the Snowball file, a failure rate of about 0.67%. The Snowball file also contains wrong examples; for instance, afvalstortplaats is stemmed as afvalstortplat, which is wrong; it should be afvalstortplaats.

Hugo
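
For readers who want to reproduce this kind of check, here is a minimal sketch of comparing a stemmer against a list of (word, expected stem) pairs and reporting the failure rate. Everything named here is an assumption made for illustration: `toyStem` is a crude stand-in and not natural's Dutch stemmer, `evaluateStemmer` is a hypothetical helper, and the sample pairs are made up rather than taken from the Snowball file.

```javascript
// Hypothetical stand-in stemmer for illustration only; NOT natural's Dutch stemmer.
// It crudely strips a trailing "en" or "s".
function toyStem(word) {
  return word.replace(/(en|s)$/, '');
}

// Hypothetical helper: run `stem` over [word, expectedStem] pairs
// and collect every case where the output differs from the expectation.
function evaluateStemmer(stem, pairs) {
  const failures = [];
  for (const [word, expected] of pairs) {
    const actual = stem(word);
    if (actual !== expected) {
      failures.push({ word, expected, actual });
    }
  }
  return {
    total: pairs.length,
    failed: failures.length,
    failureRate: pairs.length === 0 ? 0 : failures.length / pairs.length,
    failures,
  };
}

// Made-up sample pairs; a real run would load the Snowball vocabulary
// and expected-output files instead.
const samplePairs = [
  ['boeken', 'boek'],
  ['fiets', 'fiets'],
  ['huizen', 'huiz'],
  ['tafels', 'tafel'],
];

const report = evaluateStemmer(toyStem, samplePairs);
console.log(`${report.failed} of ${report.total} cases failed`);
for (const f of report.failures) {
  console.log(`${f.word}: expected ${f.expected}, got ${f.actual}`);
}
```

Keeping the list of individual failures, not just the count, makes it easier to tell genuine stemmer mistakes apart from questionable entries in the reference file, such as the afvalstortplaats example above.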

@webia1

webia1 commented Nov 12, 2020

Any news?

@Hugo-ter-Doest
Collaborator

@thomasfr wrote: "Hi, I could try to do a simple base implementation based on top of the Gist i provided. but i have to check the license first."

I checked the license:

/*
 * Original author: Joder Illi
 * 
 * Copyright (c) 2010, FormBlitz AG
 * All rights reserved.
 * Implementation of the stemming algorithm from http://snowball.tartarus.org/algorithms/german/stemmer.html
 * Copyright of the algorithm is: Copyright (c) 2001, Dr Martin Porter and can be found at http://snowball.tartarus.org/license.php
 *
 * Redistribution and use in source and binary forms, with or without 
 * modification, is covered by the standard BSD license. 
 * 
 */

As I see it, BSD-licensed code can be integrated into an MIT-licensed code base as long as the added code keeps its original (BSD) license.

@Hugo-ter-Doest
Collaborator

I am also considering jsSnowball, which is transpiled from the Java sources. It is licensed under BSD 3.0, which can be combined with the MIT license as well.

Source can be found here:
https://github.com/mazko/jssnowball

@Hugo-ter-Doest
Collaborator

See #663 for progress

@Hugo-ter-Doest
Collaborator

#663 is merged.
