Commit
tokenizer.perl: split final dots unconditionally
Allow tokenization of non-breaking prefixes at the end of sentences: a sentence-final dot is now split off even when the token before it is a non-breaking prefix. This should be a fair compromise in many cases, giving a cleaner vocabulary.

EN-old: So am I.
EN-new: So am I .
DE-old: ... schwer wie ein iPhone 5.
DE-new: ... schwer wie ein iPhone 5 .
FR-old: Des gens admirent une œuvre d' art.
FR-new: Des gens admirent une œuvre d' art .
CS-old: Dvě děti, které běží bez bot.
CS-new: Dvě děti, které běží bez bot .
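The old/new pairs in the commit message can be reproduced with a small sketch. This is a simplified Python illustration of the rule described here, not the actual tokenizer.perl code: the function name `split_dots`, the `split_final` flag, and the tiny `NONBREAKING_PREFIXES` set are all hypothetical, and the real script applies further rules that are left out.

```python
import re

# Hypothetical, deliberately tiny prefix set for illustration only;
# real prefix lists are per-language and much longer.
NONBREAKING_PREFIXES = {"I", "Mr", "Dr", "art"}

# A token ending in exactly one dot after a non-dot character,
# so "I." matches but "..." does not.
_TRAILING_DOT = re.compile(r"(.*[^.])\.")


def split_dots(text, prefixes=NONBREAKING_PREFIXES, split_final=True):
    """Split trailing dots off whitespace-separated tokens.

    A dot stays attached when the token before it is a non-breaking
    prefix, unless the token is the last one on the line and
    split_final is True (the behaviour this commit describes).
    """
    words = text.split()
    out = []
    for i, word in enumerate(words):
        match = _TRAILING_DOT.fullmatch(word)
        if match is None:
            out.append(word)  # no single trailing dot: leave as-is
            continue
        stem = match.group(1)
        is_last = i == len(words) - 1
        if (is_last and split_final) or stem not in prefixes:
            out.extend([stem, "."])  # split the dot into its own token
        else:
            out.append(word)  # protected prefix mid-sentence
    return " ".join(out)


print(split_dots("So am I.", split_final=False))
print(split_dots("So am I."))
```

With `split_final=False` the sketch keeps `I.` together, as in the EN-old line; with the default it emits a separate final dot, as in EN-new. Prefixes in the middle of a line, such as `Mr.`, stay attached in both modes.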