Use Unicode line breaking algorithm to find words #313

Merged
merged 1 commit on May 2, 2021

Commits on May 2, 2021

  1. Use Unicode line breaking algorithm to find words

    This adds a new optional dependency on the unicode-linebreak crate,
    which implements the line breaking algorithm from [Unicode Standard
    Annex #14](https://www.unicode.org/reports/tr14/). We can use this to
    find words in non-ASCII text.
    
    The new dependency is enabled by default since these line breaks are
    more correct than what you get by splitting on ASCII space.
    
    This should help address #220 and #80, though I’m no expert on
    non-Western languages. More feedback from the community is needed
    here.
    mgeisler committed May 2, 2021
    ecbbde4
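To illustrate the problem the commit message describes, here is a minimal sketch of splitting on ASCII space. It uses only the standard library, and the helper name `split_on_ascii_space` is hypothetical, not taken from the crate. It suggests why text without ASCII spaces ends up as a single unbreakable "word".

```rust
/// Hypothetical helper: split `text` into words at ASCII spaces (U+0020)
/// only. This mirrors the "splitting on ASCII space" approach named in the
/// commit message; it is not the crate's actual implementation.
fn split_on_ascii_space(text: &str) -> Vec<&str> {
    text.split(' ')
        // Drop the empty pieces produced by consecutive spaces.
        .filter(|word| !word.is_empty())
        .collect()
}

/// Count how many pieces a wrapping algorithm would have to work with.
/// A count of 1 means the text cannot be broken anywhere.
fn word_count(text: &str) -> usize {
    split_on_ascii_space(text).len()
}
```

With this sketch, `word_count("Hello wide world")` is 3, while `word_count("你好世界")` is 1 because the Chinese text contains no U+0020 at all. An ideographic space (U+3000) is not matched either. The algorithm from Unicode Standard Annex #14 is designed to report break opportunities in such text, which is the gap the new dependency is meant to close.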