porter stemmer: string index out of range #1581
Comments
For future reference, I copy/paste your question here:
|
I have found that this issue is specific to NLTK version 3.2.2. Originally, I ran To clarify, I have discovered that the As a side note, I was using Python 2 in both cases. My root environment uses Python 2.7.11 and my Django project's environment uses Python 2.7.13 |
@ExplodingCabbage could you please investigate this issue? The only commit I can see on |
This is the code used in the example provided by @jkarimi91.

    from nltk.stem.porter import PorterStemmer
    s = PorterStemmer()
    print s.stem('oed')

Debugging the code above using

    >>> rule
    (u'at', u'ate', None)
    >>> word
    u'o'

At this point the If I'm not mistaken, in NLTK

    def _doublec(self, word):
        """doublec(word) is TRUE <=> word ends with a double consonant"""
        if len(word) < 2:
            return False
        if (word[-1] != word[-2]):
            return False
        return self._cons(word, len(word)-1)

As far as I can see, the Changing

    def _ends_double_consonant(self, word):
        """Implements condition *d from the paper
        Returns True if word ends with a double consonant
        """
        if len(word) < 2:
            return False
        return (
            word[-1] == word[-2] and
            self._is_consonant(word, len(word)-1)
        )
|
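For anyone reading this later, here is a minimal standalone sketch of why the length check matters. It is not NLTK's actual code: `is_consonant` is a simplified, hypothetical stand-in (it ignores the special handling of 'y'), and the two function names are made up for illustration. It only shows that indexing `word[-2]` on a one-character string such as 'o' raises `IndexError`, and that checking `len(word) < 2` first avoids it.

```python
VOWELS = frozenset("aeiou")


def is_consonant(word, i):
    # Hypothetical, simplified stand-in for the stemmer's consonant test:
    # anything that is not a, e, i, o, u counts as a consonant here.
    return word[i] not in VOWELS


def ends_double_consonant_unguarded(word):
    # No length check: word[-2] fails on strings shorter than 2 characters.
    return word[-1] == word[-2] and is_consonant(word, len(word) - 1)


def ends_double_consonant_guarded(word):
    # Length check first, mirroring the guard in the snippets quoted above.
    if len(word) < 2:
        return False
    return word[-1] == word[-2] and is_consonant(word, len(word) - 1)


if __name__ == "__main__":
    for w in ("o", "hopp", "hop"):
        try:
            print(w, "unguarded:", ends_double_consonant_unguarded(w))
        except IndexError as exc:
            print(w, "unguarded raised IndexError:", exc)
        print(w, "guarded:", ends_double_consonant_guarded(w))
```

The one-character input 'o' is the same intermediate value shown in the debugging session above, which is why it makes a convenient test case for the guard.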
Yikes. Yep, looks like I broke this in d8402e3 :( Will PR a test and a fix tonight. |
Thanks @jkarimi91, @fievelk, @ExplodingCabbage |
Hi, I encountered the exact same issue today. Could you please suggest how I can get a fix for this? Should I update any packages? |
Hi @santoshbs. You can either use the |
@ExplodingCabbage I think you are referring to the |
@fievelk you are quite right. Sorry, yes: you can either use the |
Thanks so much for the pointer. |
see the following stackoverflow post