Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing ... logicals as T or F fails through dfm(x, ...) #721

Closed
kbenoit opened this issue May 6, 2017 · 5 comments
Closed

Passing ... logicals as T or F fails through dfm(x, ...) #721

kbenoit opened this issue May 6, 2017 · 5 comments

Comments

@kbenoit
Copy link
Collaborator

kbenoit commented May 6, 2017

This needs to be more robust, since too many people (sadly) use the (sadly) available shortcuts T and F instead of TRUE and FALSE:

> dfm("This: contains punctuation.", remove_punct = TRUE)
Document-feature matrix of: 1 document, 3 features (0% sparse).
1 x 3 sparse Matrix of class "dfmSparse"
       features
docs    this contains punctuation
  text1    1        1           1
> dfm("This: contains punctuation.", remove_punct = T)
 Error in eval(expr, envir, enclos) : 'nthcdr' needs a list to CDR down 
@koheiw
Copy link
Collaborator

koheiw commented May 7, 2017

It seems that both T and F are converted to a string '..1' when passed through ellipsis to fun2. We cannot do much for this problem. The best solution is to stop translating the old arguments to new...

fun1 <- function(a, b, ...) {
    args <- as.list(match.call())
    cat("fun1 args = \n")
    print(args)
    fun2(b, ...)
}

fun2 <- function(b, ...){
    args <- as.list(match.call())
    cat("fun2 args = \n")
    print(args)
    cat("b =", b , "\n")
}

fun1(b = TRUE, c = TRUE) # OK
fun1(b = FALSE, c = FALSE) # OK

fun1(b = T, c = T)
# fun1 args = 
#     [[1]]
# fun1
# 
# $b
# T
# 
# $c
# T
# 
# fun2 args = 
#     [[1]]
# fun2
# 
# $b
# b
# 
# $c
# ..1
# 
# b = TRUE 

fun1(b = F, c = F)
# fun1 args = 
#     [[1]]
# fun1
# 
# $b
# F
# 
# $c
# F
# 
# fun2 args = 
#     [[1]]
# fun2
# 
# $b
# b
# 
# $c
# ..1
# 
# b = FALSE 

@trinker
Copy link
Contributor

trinker commented May 17, 2017

I believe this is related. If not tell me and I'll file an issue elsewhere:

txt <- c("This is software testing: looking for (word) pairs the dog! peas and carrots",  
         "This [is] a software testing again. For. the dog want the book peas and carrots ",
         "Here: this is more Software Testing, want the book looking again for word pairs. peas and carrots ")

x <- !TRUE

quanteda::dfm(txt, stem = FALSE, verbose = FALSE, remove_numbers = FALSE, remove_punct = TRUE)
quanteda::dfm(txt, stem = FALSE, verbose = FALSE, remove_numbers = FALSE, remove_punct = FALSE)
quanteda::dfm(txt, stem = FALSE, verbose = FALSE, remove_numbers = FALSE, remove_punct =!TRUE)
quanteda::dfm(txt, stem = FALSE, verbose = FALSE, remove_numbers = FALSE, remove_punct =x)

The first two dtm calls work. The later two throw the error:

 Error in eval(a, enclos = parent.frame()) : 
  the ... list does not contain 2 elements 

This is particularly problematic when quanteda is being used by another package and needs to pass these arguments in by reference.

@ArthurSpirling
Copy link

ArthurSpirling commented May 20, 2017

Earlier

dfm("This: contains punctuation.", remove_punct = TRUE)
returned

Error in qatd_cpp_tokens_select(x, types, features_id, 2, padding) : expecting a single value

for me. This was using the github version of quanteda. But the latest CRAN version seems ok.

@kbenoit
Copy link
Collaborator Author

kbenoit commented May 20, 2017

Odd... works fine for me:

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] quanteda_0.9.9-61

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10        lattice_0.20-35     grid_3.4.0          plyr_1.8.4         
 [5] gtable_0.2.0        scales_0.4.1        RcppParallel_4.3.20 ggplot2_2.2.1      
 [9] stringi_1.1.5       lazyeval_0.2.0      data.table_1.10.4   Matrix_1.2-9       
[13] fastmatch_1.1-0     tools_3.4.0         munsell_0.4.3       compiler_3.4.0     
[17] colorspace_1.3-2    tibble_1.3.0       

> dfm("This: contains punctuation.", remove_punct = TRUE)
Document-feature matrix of: 1 document, 3 features (0% sparse).
1 x 3 sparse Matrix of class "dfmSparse"
       features
docs    this contains punctuation
  text1    1        1           1

@kbenoit
Copy link
Collaborator Author

kbenoit commented May 22, 2017

I can fix this, but it will involve rewriting the internal deprecate_argument() in tokens(). I'll get working on this asap.

While personally I believe punishment is warranted for using T and F 😉, @trinker's issue is clearly a problem that needs to be solved.

@kbenoit kbenoit mentioned this issue May 22, 2017
kbenoit added a commit that referenced this issue May 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants