Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selection argument in textstat_simil/dist is not working with interger index #1266

Closed
koheiw opened this issue Mar 16, 2018 · 4 comments
Closed

Comments

@koheiw
Copy link
Collaborator

koheiw commented Mar 16, 2018

d1 <- dfm(c("a b c", "a b c d"))
d2 <- dfm(letters[1:6])
dtest <- dfm_select(d1, d2)


textstat_simil(dtest, margin = 'features', selection = 'a')
textstat_simil(dtest, margin = 'features', selection = 1) # error

textstat_simil(dtest, margin = 'documents', selection = 'text1')
textstat_simil(dtest, margin = 'documents', selection = 1) # error

textstat_dist(dtest, margin = 'features', selection = 'a')
textstat_dist(dtest, margin = 'features', selection = 1) # error

textstat_dist(dtest, margin = 'documents', selection = 'text1')
textstat_dist(dtest, margin = 'documents', selection = 1) # error
@koheiw koheiw added the bug label Mar 16, 2018
@koheiw koheiw self-assigned this Mar 16, 2018
@koheiw
Copy link
Collaborator Author

koheiw commented Mar 16, 2018

Probably this is not a bug but design. The documentation actually says if selection is a numeric vector, it is compares with x. However, taking column or raw index for selection is a more natural behavior.

@kbenoit
Copy link
Collaborator

kbenoit commented Mar 16, 2018

By design it should take a numeric index too, so it's a bug. Adding the ability to select from a numeric index should be implemented. I suggest we add logical selection too.

@kbenoit kbenoit added the bug label Mar 16, 2018
@koheiw
Copy link
Collaborator Author

koheiw commented Mar 16, 2018

If it takes numeric index, there will be an ambiguity whether the input to selection is an index for x or an object for which similarity will be calculated. One solution would be to restrict such objects to be a dfm.

@kbenoit
Copy link
Collaborator

kbenoit commented Mar 18, 2018

I agree that we cannot both allow an integer index and an object supplied as a direct numeric for comparison with x. I think the original idea was to supply an object like the y in proxy::similar(). But since it's called selection, it should be a valid index for the margin specified, and this means character, integer, or logical. So for consistency, we should drop the direct supply of an object whose similarity will be computed, and make this argument behave as its name suggests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants