Bug report in SequenceTokenizerNew #657
Comments
Strictly speaking, the second double quote is in the wrong place. It should be right behind the period, like this:
This results in a syntax error as well, so I will try to make it more robust.
Partly solved with #658. It can now handle multiple sentences surrounded by quotes, but it cannot handle the position of the quote symbol in your example. I will leave the issue open and try to make it more robust later.
I totally agree :) That's why I specifically left a note that it is not my text but sci-fi classics that I used.
Won't fix this further, since the parser mechanism underlying this sentence tokenizer is not flexible enough for it.
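For anyone hitting the same input, one possible workaround is to normalize the quote position before calling the tokenizer. The sketch below is only an illustration: `normalizeClosingQuote` is a hypothetical helper, not part of the library, and it assumes the only problem is a closing double quote placed directly before the sentence-ending punctuation. Whether the tokenizer then accepts the normalized text depends on the version in use (the thread indicates #658 added support for multiple quoted sentences).

```javascript
// Hypothetical helper (not part of the library): moves a closing double quote
// that sits directly before sentence-ending punctuation to just after it,
// e.g. `Thank you". The` becomes `Thank you." The`.
function normalizeClosingQuote(text) {
  // ([^\s"])  last character of the quoted text
  // "         the misplaced closing quote
  // ([.!?])   the sentence terminator
  // (?=\s|$)  only when the terminator ends the sentence
  return text.replace(/([^\s"])"([.!?])(?=\s|$)/g, '$1$2"');
}

const input =
  '"All ticketed passengers should now be in the Blue Concourse sleep lounge. ' +
  'Make sure your validation papers are in order. Thank you". ' +
  'The upstairs lounge was not at all grungy.';

const normalized = normalizeClosingQuote(input);
console.log(normalized);

// Assumption: sentenceTokenizer is the instance from the report.
// sentenceTokenizer.tokenize(normalized);
```

This changes the source text, so it is only suitable when exact quote placement does not need to be preserved.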
SequenceTokenizerNew fails on the following call:
sentenceTokenizer.tokenize('"All ticketed passengers should now be in the Blue Concourse sleep lounge. Make sure your validation papers are in order. Thank you". The upstairs lounge was not at all grungy.')
(quote from "The Jaunt" by Stephen King) with the following message: