## Describe the bug
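tokens.tokens(x, remove_hyphens = TRUE) does not split hyphenated words. Called on an existing tokens object, it replaces the hyphen with a space but keeps the word as a single token ("Auto immune"), whereas tokens() on a character vector splits it into "Auto", "-", "immune".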
> txt <- "Auto-immune system."
> tokens(txt, remove_hyphens = TRUE)
tokens from 1 document.
text1 :
[1] "Auto" "-" "immune" "system" "."
> tokens(txt, remove_hyphens = FALSE)
tokens from 1 document.
text1 :
[1] "Auto-immune" "system" "."
> tokens(txt, remove_hyphens = FALSE) %>% tokens(remove_hyphens = TRUE)
tokens from 1 document.
text1 :
[1] "Auto immune" "system" "."
## Expected behavior
> tokens(txt, remove_hyphens = FALSE) %>% tokens(remove_hyphens = TRUE)
tokens from 1 document.
text1 :
[1] "Auto" "-" "immune" "system" "."
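For illustration only, the expected splitting can be sketched in base R on a plain character vector of tokens (for example, one document's tokens taken from as.list() on the tokens object). The helper name split_hyphens is hypothetical and not part of quanteda, and the sketch assumes no token contains whitespace.

```r
# Hypothetical helper (not a quanteda function): split each token on
# hyphens and keep every hyphen as its own token.
# Assumes tokens contain no whitespace.
split_hyphens <- function(toks) {
  # Surround every hyphen with spaces, then split on single spaces
  spaced <- gsub("-", " - ", toks, fixed = TRUE)
  parts <- unlist(strsplit(spaced, " ", fixed = TRUE))
  # Drop empty strings produced by leading or adjacent hyphens
  parts[nzchar(parts)]
}

toks <- c("Auto-immune", "system", ".")
print(split_hyphens(toks))
```

This only mirrors the token sequence shown under the expected behavior; it does not return a tokens object.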
## System information
> packageVersion("quanteda")
[1] ‘1.3.14’