correct() returns empty object #99
I just installed textblob today and am having the same issue on both Python 3 and Python 2. Here it is on 2.7:
I have the same issue. The word-level spellcheck() is working, though. I'm on Python 2.7.
I looked into this issue today, and it seems there is a problem with the regex used in the function's tokenization step. The capture group in that regex is actually unnecessary and could be replaced. However, in testing I noticed that it's kind of pointless, because the correction isn't very good for common contractions ("can't" is replaced with "canst", "we'll" with "well"). If you want to play around with this manually, correct() basically does this step by step:
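A rough stdlib-only sketch of that tokenize, correct, join pipeline follows. It is not TextBlob's code: TOKEN_PATTERN is an illustrative stand-in for the tokenizer regex (which isn't reproduced in this thread), and correct_token() is a hypothetical toy corrector that looks words up in a tiny dictionary in place of TextBlob's word-level correction.

```python
import re

# Illustrative tokenizer pattern (an assumption, not TextBlob's exact regex):
# a word containing an apostrophe, a run of word characters,
# a single punctuation mark, or a single whitespace character.
# The (?:...) group is non-capturing, so findall() returns whole matches.
TOKEN_PATTERN = re.compile(r"\w*(?:'\w*)+|\w+|[^\w\s]|\s")

# Hypothetical toy corrections standing in for word-level spelling correction.
TOY_CORRECTIONS = {"havv": "have", "goood": "good", "speling": "spelling"}


def correct_token(token):
    """Return a corrected token; unknown tokens pass through unchanged."""
    return TOY_CORRECTIONS.get(token, token)


def correct_text(text):
    # Step 1: split the text into word, punctuation and whitespace tokens.
    tokens = TOKEN_PATTERN.findall(text)
    # Step 2: correct each token independently.
    corrected = [correct_token(t) for t in tokens]
    # Step 3: join the tokens back together; whitespace tokens keep the spacing.
    return "".join(corrected)


print(correct_text("I havv goood speling!"))  # I have good spelling!
```

Because whitespace is kept as its own token, the final join restores the original spacing without any extra bookkeeping.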
Does anyone have an idea why the capture group was included, or why it doesn't work in Python?
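One plausible explanation, assuming the tokenizer ends up passing the pattern to Python's re.findall() without rewriting its groups: when a pattern contains exactly one capturing group, re.findall() returns that group's text instead of the whole match, and an empty string for every match in which the group did not take part. The patterns below are illustrative ones of the same shape, not necessarily the exact regex in textblob; only the group type differs.

```python
import re

# Illustrative patterns; the only difference is (...) versus (?:...).
CAPTURING = r"\w*('\w*)+|\w+|[^\w\s]|\s"
NON_CAPTURING = r"\w*(?:'\w*)+|\w+|[^\w\s]|\s"

text = "I havv goood speling!"

# One capturing group: findall() returns the group's contents per match.
# The group only participates in apostrophe matches, so every token here is "".
captured = re.findall(CAPTURING, text)
print(captured)                 # ['', '', '', '', '', '', '', '']
print(repr("".join(captured)))  # ''

# Non-capturing group: findall() returns each full match,
# so joining the tokens reproduces the original text.
tokens = re.findall(NON_CAPTURING, text)
print(tokens)                   # ['I', ' ', 'havv', ' ', 'goood', ' ', 'speling', '!']
print("".join(tokens) == text)  # True
```

Joining the captured list gives an empty string, which would match the empty result correct() returns in this issue.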
I have the same issue on Python 3, with version 0.10.0.
This seems to be caused by a change in NLTK 3.1 (see the linked comment on #97). Downgrading to nltk==3.0.5 should fix the problem. I'll try to look into the compatibility issue when I get the chance.
Before :)
After :(
I was using NLTK 3.1 in both examples.
@iamaziz The bug is due to an incompatibility with NLTK 3.1. Downgrading textblob won't make a difference. The next version of textblob will support nltk>=3.1. I am working on this now.
This is now fixed.
That was quick! Thanks @sloria 👍
Works just fine after the update. Thanks a lot! :)
You just removed the attempt to correct contractions altogether?
I believe the tokenization of contractions was unnecessary and possibly incorrect. The spelling corrector should correct contractions.
It's complicated because correct() doesn't seem particularly accurate with contractions. I don't think the new tokenization will fix many contractions, because it splits them into separate tokens. If the spelling mistake is at the beginning ("cann't"), it should get fixed; if the apostrophe is in the wrong place ("ca'nt"), it will probably give a wildly inaccurate correction; and if the mistake is at the end ("can'tt"), it probably won't be corrected at all. Your commit took the old:
and replaced it with:
If you test it with
the old one returns the weird empty set:
The new one returns:
So when correct() runs, you get
But, because of the limitations of correct(), it results in
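To make the token-splitting point concrete, here is a stdlib-only comparison of two tokenizer patterns on the misspellings above. Both patterns are assumptions for illustration, not the exact regexes from before or after the commit: one keeps an apostrophe word together as a single token, the other has no contraction alternative, so the apostrophe becomes its own punctuation token.

```python
import re

# Assumed pattern that keeps contractions together (non-capturing group,
# so re.findall() returns whole matches).
WITH_CONTRACTIONS = r"\w*(?:'\w*)+|\w+|[^\w\s]|\s"
# Assumed pattern with no contraction alternative: words, punctuation, whitespace.
WITHOUT_CONTRACTIONS = r"\w+|[^\w\s]|\s"

for word in ["can't", "cann't", "ca'nt", "can'tt"]:
    kept = re.findall(WITH_CONTRACTIONS, word)
    split = re.findall(WITHOUT_CONTRACTIONS, word)
    # e.g. "ca'nt": ["ca'nt"] vs ['ca', "'", 'nt']
    print(repr(word), kept, split)
```

With the splitting pattern, a word-level corrector only ever sees fragments such as "cann", "nt" or "tt", so whether a contraction is repaired depends on how each fragment happens to be corrected in isolation.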
I tried using spell checking, but the correct() method returns an empty object. The following shows the method call in a terminal:
I couldn't find a fix for this. I'm running Python 2.7.6 on Linux.