I found this weird thing: dfm_lookup fails with the error shown below in certain cases when exclusive = FALSE. Here's a minimal working example. I don't know why the document "featured_story_content_h2" matters, but it seems to: if I build the corpus without it, the error goes away.
I'm using quanteda 0.99.
library(quanteda)

testCorpus <- corpus(c("featured_story_content_h2", "aaaaa", "bbbbb", "ccccc"))
# testCorpus <- corpus(c("aaaaa", "bbbbb", "ccccc"))  # without the first document, no error

# controlDict matches no features in the dfm; testDict's "foo" key matches "aaaaa"
controlDict <- dictionary(list(foo = c("xxxxx"), bar = c("yyyyy", "zzzzz")))
testDict <- dictionary(list(foo = c("aaaaa"), bar = c("yyyyy", "zzzzz")))
myDFM <- dfm(testCorpus, tolower = TRUE,
             remove_numbers = TRUE, remove_punct = TRUE, remove_separators = TRUE,
             remove_twitter = FALSE, stem = FALSE, ngrams = 1:2)
dfm_lookup(myDFM, dictionary = controlDict, exclusive = FALSE)  # succeeds
dfm_lookup(myDFM, dictionary = testDict, exclusive = TRUE)      # succeeds
dfm_lookup(myDFM, dictionary = testDict, exclusive = FALSE)     # fails with:
# Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) :
#   invalid character indexing
Any ideas?