porter stemmer: string index out of range #1581
For future reference, I copy/paste your question here:
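I have found that this issue is specific to nltk version 3.2.2. Originally, I ran test.py using ipython, not python as stated above. Somehow, I was able to access the ipython installation in my root environment //anaconda/bin/ipython even though I had not specified ipython in my currently activated virtual environment //anaconda/envs/xkcd/bin/. As a result, ipython must have been using the nltk installation defined in my root environment as well, which runs version 3.2.0.
To clarify, I have discovered that the PorterStemmer fails to stem the string 'oed' in nltk version 3.2.2 but not in nltk version 3.2.0. Why, I have no idea.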
As a side note, I was using Python 2 in both cases. My root environment uses Python 2.7.11 and my Django project's environment uses Python 2.7.13.
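For anyone hitting a similar mix-up between environments, here is a minimal sketch of how one might check which interpreter and which installation of a package a session is actually using. The helper name `describe_module` is made up for illustration and is not part of nltk; the sketch only assumes the package exposes the usual `__version__` and `__file__` attributes.

```python
import importlib
import sys


def describe_module(name):
    """Return where a module was imported from, or None if it is not installed.

    Hypothetical helper for illustration only; not part of nltk.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        # Path of the interpreter running this code (python vs. ipython env)
        "executable": sys.executable,
        # Most packages, nltk included, expose __version__; fall back if absent
        "version": getattr(module, "__version__", "unknown"),
        # Location on disk of the imported package
        "file": getattr(module, "__file__", "unknown"),
    }


if __name__ == "__main__":
    print(describe_module("nltk"))
```

Running this once under python and once under ipython should show whether both resolve to the same environment.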
Hey, sorry for this (problem). I mean, I never use GitHub; it happened accidentally. I don't know what I just triggered!…
On Jan 7, 2017 11:47 PM, "jkarimi91" wrote:
> I have found that this issue is specific to nltk version 3.2.2. Originally, I ran test.py using ipython, not python as stated above. Somehow, I was able to access the ipython installation in my root environment //anaconda/bin/ipython even though I had not specified ipython in my currently activated virtual environment //anaconda/envs/xkcd/bin/. As a result, ipython must have been using the nltk installation defined in my root environment as well, which runs version 3.2.0.
> To clarify, I have discovered that the PorterStemmer fails to stem the string 'oed' in nltk version 3.2.2 but not in nltk version 3.2.0. Why, I have no idea.
This is the code used in the example provided by @jkarimi91.
from nltk.stem.porter import PorterStemmer
s = PorterStemmer()
print s.stem('oed')
Debugging the code above using
>>> rule
(u'at', u'ate', None)
>>> word
u'o'
At this point the
If I'm not mistaken, in NLTK
def _doublec(self, word):
    """doublec(word) is TRUE <=> word ends with a double consonant"""
    if len(word) < 2:
        return False
    if (word[-1] != word[-2]):
        return False
    return self._cons(word, len(word)-1)
As far as I can see, the
def _ends_double_consonant(self, word):
    """Implements condition *d from the paper

    Returns True if word ends with a double consonant
    """
    if len(word) < 2:
        return False
    return (
        word[-1] == word[-2] and
        self._is_consonant(word, len(word)-1)
    )
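To illustrate why the `len(word) < 2` guard matters for a one-letter word such as u'o', here is a small standalone sketch. It is not NLTK's code: `is_consonant` is a simplified stand-in that treats only a, e, i, o, u as vowels (ignoring the special handling of 'y'), and the unguarded variant is hypothetical, written only to show how "string index out of range" can arise.

```python
def is_consonant(word, i):
    """Simplified stand-in (assumption): only a, e, i, o, u count as vowels."""
    return word[i] not in "aeiou"


def ends_double_consonant_unguarded(word):
    # Hypothetical variant with no length check: word[-2] does not exist
    # for a one-character string, so the comparison raises IndexError.
    return word[-1] == word[-2] and is_consonant(word, len(word) - 1)


def ends_double_consonant_guarded(word):
    # Same test, but short words are rejected before any indexing happens.
    if len(word) < 2:
        return False
    return word[-1] == word[-2] and is_consonant(word, len(word) - 1)


if __name__ == "__main__":
    print(ends_double_consonant_guarded("o"))     # False, no exception
    print(ends_double_consonant_guarded("hopp"))  # True
    try:
        ends_double_consonant_unguarded("o")
    except IndexError as exc:
        print("IndexError:", exc)
```

With the guard in place, a stem that has been reduced to a single character is simply reported as not ending in a double consonant instead of raising.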