correct() returns empty object #99
I tried using spell checking, but the correct() method returns an empty object. The following shows the method call in a terminal:
```python
>>> from textblob import TextBlob
>>> b = TextBlob("I havv goood speling!")
>>> b.correct()
TextBlob("")
>>> print(b.correct())

>>>
```
I couldn't find a fix for this. I'm running Python 2.7.6 on Linux.
I just installed TextBlob today and have the same issue on both Python 3 and Python 2. Here is 2.7:
I looked into this issue today; it seems there is a problem with the regex used in the function's
The capture group is actually unnecessary, and could be replaced by
However, in testing I noticed that it's kind of pointless, because the correction isn't very good for common contractions ("can't" is replaced with "canst", "we'll" with "well").
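On the capture group: a likely explanation for the empty output is how Python's `re.findall` treats groups. When a pattern contains exactly one capturing group, `re.findall` returns that group's text for each match instead of the whole match, and a group that did not participate comes back as an empty string. The sketch below assumes that NLTK 3.1's `regexp_tokenize` hands its pattern to `re.findall` in this way, and its pattern is an illustrative stand-in of the same shape, not a copy of TextBlob's regex.

```python
import re

text = "I havv goood speling!"

# Illustrative pattern (an assumption about the shape of the old regex):
# contraction | word | punctuation | whitespace, with a *capturing* group.
capturing = r"\w*('\w*)+|\w+|[^\w\s]|\s"
# The same alternatives, but with a non-capturing group (?:...).
non_capturing = r"\w*(?:'\w*)+|\w+|[^\w\s]|\s"

# One capturing group: re.findall returns the group's text per match.
# No contraction occurs here, so the group never matches and every token is "".
old_tokens = re.findall(capturing, text)
# No capturing groups: re.findall returns the whole matched text.
new_tokens = re.findall(non_capturing, text)

print(old_tokens)  # ['', '', '', '', '', '', '', '']
print(new_tokens)  # ['I', ' ', 'havv', ' ', 'goood', ' ', 'speling', '!']
```

Eight matches are found either way; only what gets returned for each match differs.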
If you want to play around with this manually, correct() basically does this step-by-step:
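A stdlib-only sketch of those steps, under stated assumptions: `re.findall` stands in for NLTK's `regexp_tokenize`, and the hypothetical `toy_correct` lookup table stands in for TextBlob's `Word(w).correct()`, which uses a frequency-based spelling model rather than a fixed dictionary. It shows why a single bad tokenizer pattern is enough to empty the whole result.

```python
import re
from typing import Callable

# Stand-ins (assumptions, not TextBlob's code): a token pattern with no
# capturing groups, and a toy fix-up table in place of Word(w).correct().
TOKEN_PATTERN = r"\w+|[^\w\s]|\s"
TOY_FIXES = {"havv": "have", "goood": "good", "speling": "spelling"}


def toy_correct(token: str) -> str:
    """Hypothetical per-token corrector; unknown tokens pass through unchanged."""
    return TOY_FIXES.get(token, token)


def correct_text(text: str,
                 pattern: str = TOKEN_PATTERN,
                 correct_word: Callable[[str], str] = toy_correct) -> str:
    # 1. Tokenize into words, punctuation and whitespace
    #    (assumes regexp_tokenize returns the same list as re.findall).
    tokens = re.findall(pattern, text)
    # 2. Correct every token on its own.
    corrected = (correct_word(token) for token in tokens)
    # 3. Join with no separator; whitespace tokens preserve the spacing.
    return "".join(corrected)


print(correct_text("I havv goood speling!"))  # I have good spelling!
# A pattern whose only capturing group never matches yields only "" tokens.
print(repr(correct_text("I havv goood speling!", r"\w*('\w*)+|\w+|[^\w\s]|\s")))  # ''
```

Because the corrected tokens are joined with `""`, a tokenizer that returns only empty strings produces an empty string, which is consistent with the `TextBlob("")` in the original report.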
Does anyone have an idea why the capture group was included, or why it doesn't work in Python?
I was using NLTK 3.1 in both examples.
It's complicated, because correct() doesn't seem particularly accurate with contractions. I don't think the new tokenization will fix many contractions, because it splits them into separate tokens. If the spelling mistake is at the beginning ("cann't"), it should get fixed; if the ' is in the wrong place ("ca'nt"), it will probably give a wildly inaccurate correction; and if it is at the end ("can'tt"), it probably won't get corrected.
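To see how the split affects those cases, here is what a word/punctuation/whitespace pattern does to each misspelling. The pattern `\w+|[^\w\s]|\s` is an assumed stand-in for the new tokenization, and `re.findall` again stands in for `regexp_tokenize`.

```python
import re

# Assumed shape of the new tokenizer pattern: word | punctuation | whitespace.
NEW_PATTERN = r"\w+|[^\w\s]|\s"

# Each fragment is later corrected on its own, so where the apostrophe sits
# decides which fragment carries the typo.
splits = {word: re.findall(NEW_PATTERN, word)
          for word in ["can't", "cann't", "ca'nt", "can'tt"]}

for word, pieces in splits.items():
    print(word, pieces)
```

Every word comes back as three pieces (for example `"ca'nt"` becomes `['ca', "'", 'nt']`), and each piece is then corrected without seeing the others.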
Your commit takes the old pattern:
and replaces it with:
If you test it with:
the old one returns the weird empty result:
The new one returns:
So when correct() runs, you get:
But because of the limitations of correct(), it results in:
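The old-versus-new comparison can be reproduced without TextBlob on a sentence that contains a contraction. Both patterns are illustrative stand-ins (assumptions, since the exact regexes from the commit are not quoted here), and `re.findall` stands in for `regexp_tokenize`.

```python
import re

# Illustrative stand-ins for the two tokenizer patterns (assumptions).
OLD_PATTERN = r"\w*('\w*)+|\w+|[^\w\s]|\s"  # contraction branch uses a capturing group
NEW_PATTERN = r"\w+|[^\w\s]|\s"             # no groups; contractions get split

text = "I can't spel!"

old_tokens = re.findall(OLD_PATTERN, text)
new_tokens = re.findall(NEW_PATTERN, text)

# Old: only the captured "'t" survives; every other match collapses to "".
print(old_tokens)  # ['', '', "'t", '', '', '']
# New: nothing is lost, but "can't" comes back as three separate tokens.
print(new_tokens)  # ['I', ' ', 'can', "'", 't', ' ', 'spel', '!']
```

With the old pattern the output is only completely empty when the text has no apostrophes; with the new one every character is kept, at the cost of correcting `can`, `'` and `t` independently.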