dfm_weight with weights option throws error #1150

thomasd2 · 2017-12-18T11:22:44Z

Using quanteda 0.99.9027, the following throws an error about "too many replacement values":

testText <- c("brown brown yellow green", "yellow green blue")
(testDfm <- dfm(tokens(testText)))
testWeights <- rnorm(4)
names(testWeights) <- featnames(testDfm)
testWeights

dfm_weight(testDfm, weights=testWeights)
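
For reference, the result this call is presumably meant to produce (each feature column multiplied by the weight carrying the same name) can be sketched in base R. This is only an illustration under assumptions: `testCounts` is a hand-typed plain matrix standing in for the dfm, `aligned` and `weighted` are hypothetical names, and nothing here reflects how dfm_weight() is implemented internally.

```r
# Base-R stand-in for the dfm above: counts typed by hand, not a quanteda object.
testCounts <- matrix(c(2, 1, 1, 0,
                       0, 1, 1, 1),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("text1", "text2"),
                                     c("brown", "yellow", "green", "blue")))

set.seed(1)                      # assumption: seed added only for reproducibility
testWeights <- rnorm(4)
# Name the weights in a deliberately different order than the columns.
names(testWeights) <- rev(colnames(testCounts))

# Align weights to the column order by name, not by position.
aligned <- testWeights[colnames(testCounts)]

# Multiply every column by its own weight.
weighted <- sweep(testCounts, 2, aligned, `*`)
weighted
```

Indexing the weight vector by `colnames()` makes the result independent of the order in which the weights were supplied.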

kbenoit · 2017-12-18T12:47:11Z

Thanks @thomasd2, I just pushed a fix and this should make its way into master by this afternoon.

Fix #1150, bug for named weights in dfm_weight()

koheiw · 2018-03-24T17:43:45Z

In response to https://stackoverflow.com/questions/49467432/using-the-french-anew-dictionary-for-sentiment-analysis

require(quanteda)
w <- c('a' = 0.1, 'b' = 0.4,  'c' = 0.1)
txt <- c('a a b b', 'b b c d', 'c c d')
mt <- dfm(txt)

I think it should weight d by zero

mt %>% 
dfm_weight(weights = w)

# Document-feature matrix of: 3 documents, 4 features (41.7% sparse).
# 3 x 4 sparse Matrix of class "dfm"
#        features
# docs      a   b   c d
#   text1 0.2 0.8 0   0
#   text2 0   0.8 0.1 1
#   text3 0   0   0.2 1

This errors

mt %>% 
dfm_select(names(w)) %>%
dfm_weight(weights = w)

# Error in slot(value, what) : 
#   no slot of name "factors" for this object of class "dtCMatrix"

kbenoit · 2018-03-24T18:17:50Z

It's a good question as to what to do for a feature whose weight is missing. I agree it should be zeroed or removed, or we should provide an option for that.

This looks like a bug in dfm_weight(x, weights = w) - the first two calls below are fine but the third is definitely not!

> dfm_weight(mt[, 1:2], weights = w)
Document-feature matrix of: 3 documents, 2 features (50% sparse).
3 x 2 sparse Matrix of class "dfm"
       features
docs      a   b
  text1 0.2 0.8
  text2 0   0.8
  text3 0   0  
Warning message:
dfm_weight(): ignoring 1 unmatched weight feature 
> dfm_weight(mt[, 1:4], weights = w)
Document-feature matrix of: 3 documents, 4 features (41.7% sparse).
3 x 4 sparse Matrix of class "dfm"
       features
docs      a   b   c d
  text1 0.2 0.8 0   0
  text2 0   0.8 0.1 1
  text3 0   0   0.2 1
> dfm_weight(mt[, 1:3], weights = w)
 Error in slot(value, what) : 
  no slot of name "factors" for this object of class "dtCMatrix"

🤔

koheiw · 2018-03-24T18:27:58Z

I fixed the bug already. Let's remove missing features (unless you can think of a case where zeroed features are useful), so that people can apply a weighted lexicon in the same way as in dfm_lookup().

kbenoit · 2018-03-25T18:05:05Z

If the weights are

w <- c(a = 0.1, b = 0.4,  c = 0.1, d = 0)

then we would want the zero-weighted d to appear as zeroes in the dfm, but if d is omitted from the weights, then it's a candidate for exclusion. That mingles the weight and select operations, so it's worth thinking about the use cases carefully.
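
The distinction drawn here can be sketched in base R. This is a minimal illustration, not quanteda's behaviour: `counts` is a plain matrix standing in for the dfm `mt`, and `weight_by_name()` with its `drop_unmatched` argument is a hypothetical helper invented for this example.

```r
# Plain matrix with the same counts as the example dfm `mt` (typed by hand).
counts <- matrix(c(2, 2, 0, 0,
                   0, 2, 1, 1,
                   0, 0, 2, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(docs = c("text1", "text2", "text3"),
                                 features = c("a", "b", "c", "d")))

# Hypothetical helper: weight columns by name.
# drop_unmatched = TRUE  removes features that have no weight;
# drop_unmatched = FALSE keeps them and weights them by zero.
weight_by_name <- function(x, w, drop_unmatched = TRUE) {
  matched <- intersect(colnames(x), names(w))
  if (drop_unmatched) {
    x <- x[, matched, drop = FALSE]
    return(sweep(x, 2, w[matched], `*`))
  }
  full <- setNames(numeric(ncol(x)), colnames(x))  # zero for every feature
  full[matched] <- w[matched]
  sweep(x, 2, full, `*`)
}

w_omit <- c(a = 0.1, b = 0.4, c = 0.1)          # d omitted
w_zero <- c(a = 0.1, b = 0.4, c = 0.1, d = 0)   # d explicitly zero

res_omit <- weight_by_name(counts, w_omit)  # d is excluded from the result
res_zero <- weight_by_name(counts, w_zero)  # d is kept as a column of zeroes
```

With dropping as the default, an explicit zero weight is the way to keep a feature in the matrix, while omitting the weight selects the feature out, which is the mingling of weighting and selection noted above.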

kbenoit added a commit that referenced this issue Dec 18, 2017

Fix #1150, bug for named weights in dfm_weight()

1eb0ac6

kbenoit mentioned this issue Dec 18, 2017

Fix #1150, bug for named weights in dfm_weight() #1151

Merged

kbenoit added bug dfm labels Dec 18, 2017

kbenoit self-assigned this Dec 18, 2017

kbenoit closed this as completed in #1151 Dec 18, 2017

kbenoit added a commit that referenced this issue Dec 18, 2017

Merge pull request #1151 from kbenoit/fix-1150

c53a9c8

Fix #1150, bug for named weights in dfm_weight()

koheiw added a commit that referenced this issue Dec 18, 2017

Fix #1150

7a02eaf

koheiw reopened this Mar 24, 2018

koheiw added a commit that referenced this issue Mar 24, 2018

Fix error #1150

0f4127d

koheiw added the design label Mar 24, 2018

kbenoit added this to the CRAN release of v. 1.3 milestone May 22, 2018

kbenoit closed this as completed May 27, 2018
