rank and frequency #1634
Comments
Hi @ManfredBernhard, do you mean that in this section of the output they should all be ranked 5, rather than 5, 6 and 7?
  feature frequency rank docfreq group
5      du        20    5       1   all
6  frosch        20    6       1   all
7      er        20    7       1   all
|
Good point. It would be better to replace https://github.com/quanteda/quanteda/blob/master/R/textstat_frequency.R#L92-L93 with a call to However it's easy to override this (although it can be a bit trickier if you have used the
library("quanteda", warn.conflicts = FALSE)
## Package version: 1.4.9000
## Parallel computing: 2 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
dfmat <- dfm(c("a a b c d d", "a b b c d"))
dfmat
## Document-feature matrix of: 2 documents, 4 features (0.0% sparse).
## 2 x 4 sparse Matrix of class "dfm"
##        features
## docs    a b c d
##   text1 2 1 1 2
##   text2 1 2 1 1
tstat <- textstat_frequency(dfmat)
tstat
##   feature frequency rank docfreq group
## 1       a         3    1       2   all
## 2       b         3    2       2   all
## 3       d         3    3       2   all
## 4       c         2    4       2   all
rank(tstat[["frequency"]], ties.method = "last") %>%
rev()
## [1] 1 2 3 4
rank(tstat[["frequency"]], ties.method = "average") %>%
rev()
## [1] 1 3 3 3
set.seed(1)
rank(tstat[["frequency"]], ties.method = "random") %>%
rev()
## [1] 1 4 3 2
So using any of the
|
Dear All,
|
Yes, in pull request #1636, we allow you to control this via For now, you can reassign the
library("quanteda")
## Package version: 1.4.1
## Parallel computing: 2 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
times <- c(10, 10, 10)
# the way textstat_frequency works in version <= 1.4.1
rank(times, ties.method = "first")
## [1] 1 2 3
# the way you want it to work
rank(times, ties.method = "min")
## [1] 1 1 1 |
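A minimal sketch of that kind of reassignment, assuming the result of textstat_frequency() can be treated as a plain data frame with frequency and rank columns. To keep it runnable without quanteda, the data frame below is built by hand from the first ten rows of the output reported in this issue; it is an illustration, not the code from the pull request.

```r
# Stand-in for a textstat_frequency() result (assumption: it behaves like a
# data frame). Values are the first ten rows of the output in this issue.
tstat <- data.frame(
  feature   = c("und", "der", "sie", "die", "du", "frosch", "er", "in", "als", "ich"),
  frequency = c(64, 37, 35, 30, 20, 20, 20, 19, 19, 17),
  stringsAsFactors = FALSE
)

# Negate the frequencies so the most frequent feature gets rank 1;
# ties.method = "min" gives every tied feature the lowest rank in its group.
tstat$rank <- rank(-tstat$frequency, ties.method = "min")

tstat
```

With "min", the three features with frequency 20 share rank 5 and the next distinct frequency is ranked 8, so the ranks after a tie skip ahead rather than continuing at 6.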
Dear Quanteda-Team,
thank you very much for your help with Quanteda.
Best,
Manfred B. Sellner
On 20 March 2019 at 19:13, Stefan Müller <notifications@github.com> wrote:
Closed #1634.
|
Dear Quanteda,
I realised that identical frequencies in a dfm receive different ranks as shown below.
txtfreq <- textstat_frequency(froschk_dfm)
txtfreq
   feature frequency rank docfreq group
1      und        64    1       1   all
2      der        37    2       1   all
3      sie        35    3       1   all
4      die        30    4       1   all
5       du        20    5       1   all
6   frosch        20    6       1   all
7       er        20    7       1   all
8       in        19    8       1   all
9      als        19    9       1   all
10     ich        17   10       1   all
11     war        15   11       1   all
12     ihr        15   12       1   all
13      es        15   13       1   all
14      da        15   14       1   all
15    aber        14   15       1   all
16      so        14   16       1   all
17     ein        13   17       1   all
18     dem        13   18       1   all
Can you please fix this bug?
Best,
ManfredBernhard