Because of how we split for `what = "fasterword"`, we are running into this problem:

```r
> tokens("one\ntwo\tthree", what = "fasterword")
tokens from 1 document.
text1 :
[1] "one\ntwo\tthree"
```
Those should be split into three tokens.
This behaviour seems to come from stringi:
```r
> stringi::stri_split_regex("one\ntwo\tthree", "\\p{Z}+")
[[1]]
[1] "one\ntwo\tthree"

> stringi::stri_split_regex("one\ntwo\tthree", "\\p{WHITE_SPACE}+")
[[1]]
[1] "one" "two" "three"
```
Since I expected the Unicode Z category to match `\t` and `\n`, I filed an issue for this at gagolews/stringi#327.
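For reference, in the Unicode Character Database `\n` (U+000A) and `\t` (U+0009) have general category Cc (control), not Z (separator: Zs, Zl, Zp), while the binary property `WHITE_SPACE` does include them. That would explain why `\\p{Z}+` leaves them unsplit. The sketch below checks this with base R's PCRE engine (`perl = TRUE`) rather than stringi/ICU, and shows one hypothetical workaround: splitting on the union `[\\p{Z}\\s]`. It is an illustration under those assumptions, not the fix that was actually merged into quanteda.

```r
# Minimal sketch using base R regex (PCRE via perl = TRUE); not quanteda's code.
# Which characters belong to the Unicode separator category \p{Z}?
chars <- c(space = " ", nbsp = "\u00a0", newline = "\n", tab = "\t")
in_Z <- grepl("\\p{Z}", chars, perl = TRUE)
names(in_Z) <- names(chars)
print(in_Z)  # "\n" and "\t" are control characters (Cc), so they are not in Z

# Hypothetical workaround: treat both \p{Z} and ASCII whitespace (\s) as
# token boundaries, so "\n" and "\t" also separate tokens.
x <- "one\ntwo\tthree"
toks <- strsplit(x, "[\\p{Z}\\s]+", perl = TRUE)[[1]]
print(toks)  # returns c("one", "two", "three")
```

Splitting on `\\p{WHITE_SPACE}+`, as in the stringi call above, is the other obvious choice; the union form here simply keeps the existing `\\p{Z}` behaviour and adds the control whitespace on top.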
ad359e6 Merge pull request #1424 from quanteda/Issue-1420
78ce83b Fix #1420