porter stemmer: string index out of range #1581
Comments
For future reference, I copy/paste your question here:
|
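I have found that this issue is specific to nltk version 3.2.2. Originally, I ran test.py using ipython not python, as stated above. Somehow, I was able to access the ipython installation in my root environment //anaconda/bin/ipython even though I had not specified ipython in my currently activated virtual environment //anaconda/envs/xkcd/bin/. As a result, ipython must have been using the nltk installation defined in my root environment as well, which runs version 3.2.0.
To clarify, I have discovered that the PorterStemmer fails to stem the string 'oed' in nltk version 3.2.2 but not in nltk version 3.2.0. Why, I have no idea.
As a side note, I was using python 2 in both cases. My root environment uses python 2.7.11 and my django project's environment uses python 2.7.13 |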
Hey,
Sorry for this (problem). I mean, I never use GitHub; it happened
accidentally. I don't know what I just triggered!
On Jan 7, 2017 11:47 PM, "jkarimi91" wrote:
I have found that this issue is specific to nltk version 3.2.2.
Originally, I ran test.py using ipython not python, as stated above.
Somehow, I was able to access the ipython installation in my root
environment //anaconda/bin/ipython even though I had not specified
ipython in my currently activated virtual environment
//anaconda/envs/xkcd/bin/. As a result, ipython must have been using the
nltk installation defined in my root environment as well, which runs version
3.2.0.
To clarify, I have discovered that the PorterStemmer fails to stem the
string 'oed' in nltk version 3.2.2 but not in nltk version 3.2.0. Why I
have no idea.
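The environment mix-up described above (an ipython from one environment importing nltk from another) can be checked directly from inside the interpreter. A minimal sketch, assuming only the standard library; `describe_module` is a hypothetical helper name and is not part of NLTK:

```python
import importlib
import sys


def describe_module(name):
    """Report the running interpreter and where `name` was imported from."""
    module = importlib.import_module(name)
    return {
        # Path of the interpreter actually executing this code.
        "executable": sys.executable,
        # None if the module does not expose a __version__ attribute.
        "version": getattr(module, "__version__", None),
        # Location of the imported module on disk.
        "file": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    # "json" is used only so the sketch runs anywhere; replace it with
    # "nltk" in an environment where NLTK is installed.
    print(describe_module("json"))
```

Running this under both python and ipython should show whether the two resolve to the same interpreter and the same nltk installation.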
|
@ExplodingCabbage could you please investigate this issue? The only commit I can see on |
This is the code used in the example provided by @jkarimi91.

    from nltk.stem.porter import PorterStemmer
    s = PorterStemmer()
    print s.stem('oed')

Debugging the code above using

    >>> rule
    (u'at', u'ate', None)
    >>> word
    u'o'

At this point the

If I'm not mistaken, in NLTK

    def _doublec(self, word):
        """doublec(word) is TRUE <=> word ends with a double consonant"""
        if len(word) < 2:
            return False
        if (word[-1] != word[-2]):
            return False
        return self._cons(word, len(word)-1)

As far as I can see, the

Changing

    def _ends_double_consonant(self, word):
        """Implements condition *d from the paper

        Returns True if word ends with a double consonant
        """
        if len(word) < 2:
            return False
        return (
            word[-1] == word[-2] and
            self._is_consonant(word, len(word)-1)
        )
|
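The comment above appears to trace the "string index out of range" error to a double-consonant check that reads word[-2] without first checking the word's length. The standalone sketch below illustrates that failure mode and the proposed guard. It does not use NLTK: the function names are hypothetical, and `is_consonant` is a simplified stand-in that ignores the Porter algorithm's special handling of 'y'.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Simplified assumption: every letter that is not a, e, i, o, u
    counts as a consonant (the real Porter rule also treats 'y'
    according to its context)."""
    return word[i] not in VOWELS


def ends_double_consonant_unguarded(word):
    # No length check: word[-2] raises IndexError for a one-letter word.
    return word[-1] == word[-2] and is_consonant(word, len(word) - 1)


def ends_double_consonant(word):
    # Length guard first, mirroring the check in the older _doublec.
    if len(word) < 2:
        return False
    return word[-1] == word[-2] and is_consonant(word, len(word) - 1)


if __name__ == "__main__":
    print(ends_double_consonant("o"))     # guarded: no exception
    print(ends_double_consonant("hopp"))
    try:
        ends_double_consonant_unguarded("o")
    except IndexError as exc:
        print("unguarded version failed:", exc)
```

With the one-letter stem u'o' from the debugging session, the guarded version returns False, while the unguarded version raises IndexError.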
Yikes. Yep, looks like I broke this in d8402e3 :( Will PR a test and a fix tonight. |
Thanks @jkarimi91, @fievelk, @ExplodingCabbage |
Hi, I encountered the exact same issue today. Could you please suggest how I could get a fix to this? Should I update any packages? |
Hi @santoshbs. You can either use the |
@ExplodingCabbage I think you are referring to the |
@fievelk you are quite right. Sorry, yes: you can either use the |
Thanks so much for the pointer. |
see the following stackoverflow post