dfm Error in qatd_cpp_tokens_replace #1765

@almogsi

Describe the bug

quanteda::dfm() fails on certain tokenized tweets. I'm getting:

"Error in qatd_cpp_tokens_replace(x, type, ids_pat, ids_repl) :
Not compatible with requested type: [type=NULL; target=double]."

Reproducible code

library(quanteda)
# Tweet that triggers the error
stucked_tweet <- "@POTUS Lol too funny yes we're all tired of Jim #Acosta too President Trump Gets Angry At Jim Acosta of CNN… https://t.co/gzUw4PF4s8"
t <- tokens(tolower(stucked_tweet), remove_numbers = TRUE, remove_punct = TRUE, remove_url = TRUE)
tt <- tokens_remove(t, pattern = "^@\\b", valuetype = "regex")
# The error is raised by this call
t.dfm <- dfm(tt, remove_twitter = TRUE)

System information

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] quanteda_1.5.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2         rstudioapi_0.10    magrittr_1.5       stopwords_1.0      tidyselect_0.2.5   munsell_0.5.0     
 [7] colorspace_1.4-1   lattice_0.20-38    R6_2.4.0           rlang_0.4.1        fastmatch_1.1-0    stringr_1.4.0     
[13] dplyr_0.8.3        tools_3.6.1        grid_3.6.1         data.table_1.12.6  gtable_0.3.0       spacyr_1.2        
[19] RcppParallel_4.4.4 lazyeval_0.2.2     assertthat_0.2.1   tibble_2.1.3       crayon_1.3.4       Matrix_1.2-17     
[25] purrr_0.3.3        ggplot2_3.2.1      rsconnect_0.8.15   glue_1.3.1         stringi_1.4.3      compiler_3.6.1    
[31] pillar_1.4.2       scales_1.0.0       lubridate_1.7.4    pkgconfig_2.0.3  

Any ideas? Thanks in advance!
