tokens("This: is, a @test!", what="character", remove_punct=FALSE)
# tokens from 1 document.
# Component 1 :
#  [1] "T" "h" "i" "s" ":" "i" "s" "," "a" "_" "a" "s" "_" "t" "e" "s" "t" "!"
tokens("This: is, a @test!", what="character", remove_punct=TRUE)
# tokens from 1 document.
# Component 1 :
#  [1] "T" "h" "i" "s" "i" "s" "a" "a" "s" "t" "e" "s" "t"
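In both calls the "@" in "@test" comes back as "_" "a" "s" "_" instead of "@"; with remove_punct=TRUE the underscores are dropped, leaving a stray "a" "s". It no doubt has to do with our handling of Twitter characters, even though that handling is not supposed to apply to character segmentation. The replacement after tokenizing fails because the regex that matches the placeholder does not work for character segmentation.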
I want to overhaul the whole token-segmentation code, but until we do, we ought to fix this with a patch.
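The following base-R sketch illustrates the suspected failure and one possible patch. It assumes that "@" is protected by substituting the placeholder "_as_" (the string visible in the output above) before segmenting and substituting it back afterwards; the function names are hypothetical and this is not quanteda's actual internal code. Once the text is split into single characters, no token can contain the four-character placeholder, so the restore step never matches.

```r
# Hypothetical sketch, base R only. Assumption: "@" is swapped for the
# placeholder "_as_" before segmenting and swapped back afterwards.
# Function names are illustrative, not quanteda internals.
placeholder <- "_as_"

# Drop single-whitespace tokens, since the output above has no separators
drop_space <- function(toks) toks[!grepl("^[[:space:]]$", toks)]

# Drop single-character punctuation tokens (roughly remove_punct = TRUE)
drop_punct <- function(toks) toks[!grepl("^[[:punct:]]$", toks)]

segment_chars_buggy <- function(x) {
  protected <- gsub("@", placeholder, x, fixed = TRUE)  # protect "@" before segmenting
  toks <- drop_space(strsplit(protected, "")[[1]])      # one token per character
  # Restore step: every token is now a single character, so none can
  # contain the four-character placeholder and nothing gets restored.
  gsub(placeholder, "@", toks, fixed = TRUE)
}

segment_chars_patched <- function(x) {
  # Possible patch: skip the "@" protection for character segmentation,
  # since "@" is already a single character and needs no protecting.
  drop_space(strsplit(x, "")[[1]])
}

txt <- "This: is, a @test!"
print(segment_chars_buggy(txt))              # placeholder characters leak through
print(drop_punct(segment_chars_buggy(txt)))  # stray "a" "s" survive punctuation removal
print(segment_chars_patched(txt))            # "@" kept as a single token
```

The buggy version reproduces both outputs reported above; the patched version simply bypasses the Twitter-character substitution when segmenting by character, which is one plausible stopgap until the segmentation code is overhauled.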