support typographic apostrophes #93 #94
Thanks @TimKam; I agree it would be more efficient to try to add this into the
What happened? Did it error out, or just fail to detect it properly? From a quick look at the code it feels like it should be possible for this to work, at least in the case of a Unicode input string.
You are correct: it works for Unicode input strings, and I changed the implementation accordingly. When comparing with a non-Unicode input string, the tokenizer emits the following warning:
I also realized that the default English dictionary doesn't seem to support contractions that use typographic apostrophes.
Am I correct in thinking that if this were merged, it would resolve: https://bitbucket.org/dhellmann/sphinxcontrib-spelling/issues/13/with-sphinx-161-contractions-result-in
It might help, but I think you could also run into trouble because of:
So just maintaining the apostrophe is not enough; you have to ensure that the resulting word is actually recognized as valid by the underlying dictionary. This seems to work OK for me in initial testing though:
@TimKam did you find some words that were not supported?
I'm going to go ahead and merge this and add a couple more tests, thanks @TimKam!
Thanks @rfk!
I've released v1.6.9 with this change.
@rfk this breaks a lot of my sphinx builds :/
I implemented a fix that might be suboptimal.
I tried adding the typographic apostrophe to the valid_chars of the English tokenizer's initialization function, but this didn't lead to the desired results. Afterwards, the spell checker didn't recognize any words with a typographic apostrophe.
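For anyone following along, here is a minimal sketch of one alternative to extending valid_chars: map the typographic apostrophe (U+2019) to the ASCII apostrophe before the word reaches the dictionary. This is not the code in this pull request and it does not use pyenchant at all. The names normalize_apostrophes and check_word, and the plain set standing in for the dictionary, are hypothetical and only there to illustrate the idea.

```python
# Illustrative sketch only: a plain set stands in for the real dictionary,
# and both function names below are hypothetical, not pyenchant API.

TYPOGRAPHIC_APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def normalize_apostrophes(word):
    """Replace typographic apostrophes with the ASCII apostrophe."""
    return word.replace(TYPOGRAPHIC_APOSTROPHE, "'")


def check_word(word, known_words):
    """Look the word up only after normalizing its apostrophes."""
    return normalize_apostrophes(word).lower() in known_words


# Assumed toy dictionary that only knows the ASCII-apostrophe spellings.
known_words = {"don't", "it's", "cannot"}

print(check_word("don\u2019t", known_words))  # typographic apostrophe
print(check_word("don't", known_words))       # ASCII apostrophe
print(check_word("dont", known_words))        # genuinely misspelled
```

The point of normalizing first is that the dictionary never has to know about U+2019, which sidesteps the problem of contractions with typographic apostrophes not being recognized. Whether this belongs in the tokenizer or in the checker is a separate design question.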