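fcm() objects created with the default of tri = TRUE produce misleading feature frequencies when using topfeatures(), which calls the inherited method for dfm objects that simply sums the columns. Because only the upper triangle is stored, each column sum omits the co-occurrences recorded in that feature's own row, which downweights the features that come first in the feature order.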
library("quanteda")
## Package version: 3.1.0
## Unicode version: 13.0
## ICU version: 69.1
## Parallel computing: 12 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
txt <- c(
"a b c d",
"a a b c d",
"c d"
)
dfmat <- tokens(txt, remove_punct = TRUE) %>%
    dfm()
fcmat <- fcm(dfmat)
fcmat
## Feature co-occurrence matrix of: 4 by 4 features.
##         features
## features a b c d
##        a 1 3 3 3
##        b 0 0 2 2
##        c 0 0 0 3
##        d 0 0 0 0
topfeatures(fcmat)
## d c b a 
## 8 5 3 1
A solution would be to create a new method topfeatures.fcm() that first forces the matrix to be symmetric and then sums the columns.
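topfeatures.fcm <- function(x,
                            n = 10,
                            decreasing = TRUE,
                            scheme = c("count", "docfreq"),
                            ...) {
    scheme <- match.arg(scheme)
    # mirror the upper triangle so each column holds all of a feature's co-occurrences
    x <- as.dfm(Matrix::forceSymmetric(x))
    # pass the arguments through instead of silently falling back to the defaults
    topfeatures(x, n = n, decreasing = decreasing, scheme = scheme)
}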
topfeatures.fcm(fcmat)
##  a  c  d  b 
## 10  8  8  7
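To make the discrepancy easier to check by hand, here is a minimal base R sketch that does not use quanteda at all. It retypes the counts from the fcm() output above into a plain matrix; the names m and sym are only illustrative, and the mirroring step is an assumption about what forcing symmetry amounts to for these counts, not a description of the package internals.

```r
# Upper-triangular co-occurrence counts, retyped from the fcm() output above
m <- matrix(c(1, 3, 3, 3,
              0, 0, 2, 2,
              0, 0, 0, 3,
              0, 0, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))

# Column sums of the triangle only: a = 1, b = 3, c = 5, d = 8
colSums(m)

# Mirror the upper triangle into the lower one, counting the diagonal once
sym <- m + t(m) - diag(diag(m))

# Column sums of the symmetric matrix: a = 10, b = 7, c = 8, d = 8
colSums(sym)
```

The first set of sums matches what topfeatures(fcmat) reports, and the second matches the output of the proposed method.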
I'd favour disabling topfeatures() altogether for fcm objects, since they are not defined in the same way. The current man page for topfeatures() refers to dfm objects. It only works for fcm objects because of inheritance.