dfm_weight with weights option throws error #1150
Thanks @thomasd2, I just pushed a fix and this should make its way into
Fix #1150, bug for named weights in dfm_weight()
In response to https://stackoverflow.com/questions/49467432/using-the-french-anew-dictionary-for-sentiment-analysis:
require(quanteda)
w <- c('a' = 0.1, 'b' = 0.4, 'c' = 0.1)
txt <- c('a a b b', 'b b c d', 'c c d')
mt <- dfm(txt)
I think it should weight like this:
mt %>%
  dfm_weight(weights = w)
# Document-feature matrix of: 3 documents, 4 features (41.7% sparse).
# 3 x 4 sparse Matrix of class "dfm"
# features
# docs a b c d
# text1 0.2 0.8 0 0
# text2 0 0.8 0.1 1
# text3 0 0 0.2 1
This errors:
mt %>%
  dfm_select(names(w)) %>%
  dfm_weight(weights = w)
# Error in slot(value, what) :
#   no slot of name "factors" for this object of class "dtCMatrix"
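To make the intended semantics concrete without depending on quanteda internals, here is a minimal base-R sketch. A dense matrix stands in for the dfm built from `txt`, and `weight_by_name()` is a hypothetical helper, not quanteda's implementation. It matches weights to features by name, leaves features with no weight unchanged (weight 1, as in the expected output above), and warns about weights that match no feature. Because matching is by name, any column subset works the same way, including the `dfm_select(names(w))` case that errors above.

```r
# Hypothetical helper (not quanteda's code): scale each column of a
# document-feature count matrix by the weight whose name matches the column.
weight_by_name <- function(m, weights) {
  unmatched <- setdiff(names(weights), colnames(m))
  if (length(unmatched) > 0) {
    warning("ignoring ", length(unmatched), " unmatched weight feature(s)")
  }
  # Features with no entry in `weights` keep a weight of 1 (counts unchanged).
  w <- rep(1, ncol(m))
  idx <- match(colnames(m), names(weights))
  w[!is.na(idx)] <- weights[idx[!is.na(idx)]]
  # Multiply column j by w[j]; sweep() applies the vector across columns.
  sweep(m, 2, w, `*`)
}

# Dense base-R stand-in for dfm(c('a a b b', 'b b c d', 'c c d')).
mt <- matrix(c(2, 2, 0, 0,
               0, 2, 1, 1,
               0, 0, 2, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(docs = c("text1", "text2", "text3"),
                             features = c("a", "b", "c", "d")))
w <- c(a = 0.1, b = 0.4, c = 0.1)

weight_by_name(mt, w)          # same values as the expected output above
weight_by_name(mt[, 1:3], w)   # subset whose features all have weights
```

With the full matrix the result rows are `0.2 0.8 0 0`, `0 0.8 0.1 1` and `0 0 0.2 1`; calling it on `mt[, 1:2]` returns the two weighted columns and emits the warning about the unused weight for `c`.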
It's a good question as to what to do for a feature whose weight is missing. I agree it should be zeroed or removed, or we should provide an option for that. This looks like a bug in
> dfm_weight(mt[, 1:2], weights = w)
Document-feature matrix of: 3 documents, 2 features (50% sparse).
3 x 2 sparse Matrix of class "dfm"
features
docs a b
text1 0.2 0.8
text2 0 0.8
text3 0 0
Warning message:
dfm_weight(): ignoring 1 unmatched weight feature
> dfm_weight(mt[, 1:4], weights = w)
Document-feature matrix of: 3 documents, 4 features (41.7% sparse).
3 x 4 sparse Matrix of class "dfm"
features
docs a b c d
text1 0.2 0.8 0 0
text2 0 0.8 0.1 1
text3 0 0 0.2 1
> dfm_weight(mt[, 1:3], weights = w)
Error in slot(value, what) :
  no slot of name "factors" for this object of class "dtCMatrix"
🤔
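The open question above is what to do with features that have no weight: zero them, remove them, or offer an option. A hypothetical base-R sketch of the "option" route follows; `weight_with_policy()` and its `missing` argument are illustrative names only, not part of quanteda, and a dense matrix again stands in for the dfm.

```r
# Hypothetical helper (not quanteda's API): weight columns by name and decide
# what happens to features that have no entry in `weights`.
#   missing = "keep": leave their counts unchanged (weight 1)
#   missing = "zero": multiply them by 0
#   missing = "drop": remove those columns entirely
weight_with_policy <- function(m, weights, missing = c("keep", "zero", "drop")) {
  missing <- match.arg(missing)
  idx <- match(colnames(m), names(weights))
  has_w <- !is.na(idx)
  if (missing == "drop") {
    # Keep only the features that have a weight.
    m <- m[, has_w, drop = FALSE]
    idx <- idx[has_w]
    has_w <- rep(TRUE, length(idx))
  }
  # Default weight for unmatched features depends on the policy.
  w <- rep(if (missing == "zero") 0 else 1, ncol(m))
  w[has_w] <- weights[idx[has_w]]
  sweep(m, 2, w, `*`)
}

mt <- matrix(c(2, 2, 0, 0,
               0, 2, 1, 1,
               0, 0, 2, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(docs = c("text1", "text2", "text3"),
                             features = c("a", "b", "c", "d")))
w <- c(a = 0.1, b = 0.4, c = 0.1)

weight_with_policy(mt, w, "keep")  # d stays at its raw counts
weight_with_policy(mt, w, "zero")  # d becomes an all-zero column
weight_with_policy(mt, w, "drop")  # d is removed
```

One consequence of this design: under `"drop"`, a feature given an explicit weight of 0 is still kept as an all-zero column, since only features absent from `weights` are removed.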
I fixed the bug already. Let's remove missing features (unless you can think of a case where zeroed features are useful), so that people can apply a weighted lexicon in the same way as in
If the weights are w <- c(a = 0.1, b = 0.4, c = 0.1, d = 0), then we would want the zero-weighted
Using quanteda 0.99.9027, the following throws an error about "too many replacement values":