## Describe the bug
quanteda::dfm() gets stuck on certain tokenized tweets. I'm getting:
"Error in qatd_cpp_tokens_replace(x, type, ids_pat, ids_repl) :
Not compatible with requested type: [type=NULL; target=double]."
## Reproducible code
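library(quanteda)
# A tweet that triggers the error
stucked_tweet <- "@POTUS Lol too funny yes we're all tired of Jim #Acosta too President Trump Gets Angry At Jim Acosta of CNN… https://t.co/gzUw4PF4s8"
# Tokenize the lowercased tweet, dropping numbers, punctuation and URLs
t <- tokens(tolower(stucked_tweet), remove_numbers = TRUE, remove_punct = TRUE, remove_url = TRUE)
# Remove tokens that start with "@"
tt <- tokens_remove(t, pattern = "^@\\b", valuetype = "regex")
# This call raises the error quoted above
t.dfm <- dfm(tt, remove_twitter = TRUE)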
## System information
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] quanteda_1.5.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 rstudioapi_0.10 magrittr_1.5 stopwords_1.0 tidyselect_0.2.5 munsell_0.5.0
[7] colorspace_1.4-1 lattice_0.20-38 R6_2.4.0 rlang_0.4.1 fastmatch_1.1-0 stringr_1.4.0
[13] dplyr_0.8.3 tools_3.6.1 grid_3.6.1 data.table_1.12.6 gtable_0.3.0 spacyr_1.2
[19] RcppParallel_4.4.4 lazyeval_0.2.2 assertthat_0.2.1 tibble_2.1.3 crayon_1.3.4 Matrix_1.2-17
[25] purrr_0.3.3 ggplot2_3.2.1 rsconnect_0.8.15 glue_1.3.1 stringi_1.4.3 compiler_3.6.1
[31] pillar_1.4.2 scales_1.0.0 lubridate_1.7.4 pkgconfig_2.0.3
Any ideas? Thanks in advance!