Describe the bug

When the `target` argument selects every document in the dfm, no documents are left for the reference group. `textstat_keyness()` could raise an error to warn the user in this case, or it could fall back to using the first document in the dfm as the target.
Reproducible code

To see what was happening under the hood, I ran the parts of `textstat_keyness()` needed to generate `groupings` and `temp`, and printed some lines:
Potential Fixes

One option is to add a branch that defaults the target to the first document:
```r
# use original docnames only when there are two documents
if (ndoc(x) == 2) {
    label <- docnames(x)[order(target, decreasing = TRUE)]
} else {
    if (sum(target) == 1 && !is.null(docnames(x)[target])) {
        label <- c(docnames(x)[target], "reference")
    } else {
        if (sum(target) == length(target)) {
            # every document is a target: keep only the first one
            target[-1] <- FALSE
            target[1] <- TRUE
        }
        label <- c("target", "reference")
    }
}
```
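The fallback branch can be tried in isolation without building a dfm. This is a minimal base-R sketch; `default_target_if_all()` is a hypothetical helper name, not a quanteda function, and it assumes `target` has already been converted to one logical value per document.

```r
# Hypothetical helper (not part of quanteda): applies the "default to the
# first document" fallback to a logical target vector.
default_target_if_all <- function(target) {
    stopifnot(is.logical(target))
    if (all(target)) {
        # every document was selected, so keep only the first as the target;
        # rep(FALSE, 0) is logical(0), so a length-1 input stays length 1
        target <- c(TRUE, rep(FALSE, length(target) - 1))
    }
    target
}

default_target_if_all(c(TRUE, TRUE, TRUE))   # returns c(TRUE, FALSE, FALSE)
default_target_if_all(c(TRUE, FALSE, TRUE))  # returned unchanged
```

Using `target[-1]` or `rep(FALSE, length(target) - 1)` rather than `target[2:length(target)]` avoids one pitfall: when `length(target)` is 1, `2:1` counts down to `c(2, 1)`, so the assignment would silently grow the vector to length 2.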
Alternatively, we could raise an error:
```r
# use original docnames only when there are two documents
if (ndoc(x) == 2) {
    label <- docnames(x)[order(target, decreasing = TRUE)]
} else {
    if (sum(target) == 1 && !is.null(docnames(x)[target])) {
        label <- c(docnames(x)[target], "reference")
    } else {
        if (sum(target) == length(target)) {
            stop("Target documents cannot be the whole corpus")
        }
        label <- c("target", "reference")
    }
}
```
Thanks @jiongweilua, this is a good find. How about you try fixing it as a PR? Here are the steps:
1. Fork the master branch.
2. Create a branch for the fix.
3. Add this to test-textstat_keyness.R:
```r
library("quanteda")
library("testthat")

test_that("keyness works correctly for default, single, and multiple targets", {
    d <- corpus(c(d1 = "a a a a a b b b c c c c ",
                  d2 = "a b c d d d d e f g",
                  d3 = "a b c d d e e e e f g")) %>%
        dfm()
    # works for one document
    textstat_keyness(d, target = docnames(d)[1])
    # default target is first document
    expect_identical(
        textstat_keyness(d),
        textstat_keyness(d, target = docnames(d)[1])
    )
    # for explicit first target
    expect_identical(
        as.integer(textstat_keyness(d, target = docnames(d)[1])[1, "n_target"]),
        5L
    )
    # for two documents as targets
    expect_equivalent(
        textstat_keyness(d, target = docnames(d)[1:2])[1, c("n_target", "n_reference")],
        data.frame(n_target = 6, n_reference = 1)
    )
    # for all documents as targets
    expect_error(
        textstat_keyness(d, target = docnames(d)[1:3]),
        "number of target documents must be < ndoc"
    )
})
```
4. Implement an error check for this condition in `textstat_keyness.dfm()` that calls `stop("number of target documents must be < ndoc")`, so that the new last test passes.
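A minimal base-R sketch of the check in step 4, assuming `target` may arrive as document names, document indices, or a logical vector. `check_keyness_target()` and `doc_names` are hypothetical names, not quanteda code; where the check belongs inside `textstat_keyness.dfm()` depends on how that function converts `target` internally, which is not shown here.

```r
# Hypothetical helper (not quanteda's actual code): normalise `target` to a
# logical vector over the documents and reject a target that covers them all.
check_keyness_target <- function(target, doc_names) {
    if (is.character(target)) {
        # document names -> logical vector over the documents
        target <- doc_names %in% target
    } else if (is.numeric(target)) {
        # document indices -> logical vector over the documents
        target <- seq_along(doc_names) %in% target
    } else if (!is.logical(target) || length(target) != length(doc_names) ||
               anyNA(target)) {
        stop("target must be document names, indices, or one logical per document")
    }
    # at least one document must remain for the reference group
    if (sum(target) >= length(doc_names)) {
        stop("number of target documents must be < ndoc")
    }
    target
}

docs <- c("d1", "d2", "d3")
check_keyness_target("d1", docs)  # returns c(TRUE, FALSE, FALSE)
check_keyness_target(1:2, docs)   # returns c(TRUE, TRUE, FALSE)
tryCatch(check_keyness_target(docs, docs),
         error = function(e) conditionMessage(e))
# returns "number of target documents must be < ndoc"
```

Normalising to a logical vector first means the "all documents" condition is tested once, whichever form the caller used for `target`.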