allow.cartesian when i has no duplicate values #742

Closed
nigmastar opened this issue Jul 25, 2014 · 4 comments

@nigmastar

Please consider the following:

> dt <- data.table(id=rep(letters[1:2], 2), var = rnorm(4), key="id")
> dt
   id       var
1:  a 0.9609685
2:  a 0.1432707
3:  b 1.1276582
4:  b 0.8051821

> dt[letters[1:3], list(var)]
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x),  : 
  Join results in 5 rows; more than 4 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. [...]

> dt[letters[1:3], list(var), by=.EACHI]
   id       var
1:  a 0.9609685
2:  a 0.1432707
3:  b 1.1276582
4:  b 0.8051821
5:  c        NA

The second join also results in 5 rows. Shouldn't the two joins above behave consistently (perhaps both like the second)?

I also wonder: is the concept behind the implementation of allow.cartesian simply 1) that the output must not have more rows than max(nrow(x),nrow(i)), or 2) to avoid duplicates in the key values of i?
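To make the two readings concrete, here is a minimal base R sketch of the arithmetic behind the error message. It does not call data.table at all; the names `x_key`, `i_key`, `join_rows` and `limit` are made up for illustration and only mirror the example above.

```r
# Sketch only: reproduces the row-count arithmetic from the error message.
x_key <- c("a", "a", "b", "b")   # key column of x (dt$id above)
i_key <- c("a", "b", "c")        # values joined as i (letters[1:3])

# Number of matches in x for each value of i.
matches <- vapply(i_key, function(k) sum(x_key == k), integer(1))

# With the default nomatch=NA an unmatched value still yields one NA row.
rows_per_i <- pmax(matches, 1L)

join_rows <- sum(rows_per_i)                     # 2 + 2 + 1 = 5
limit     <- max(length(x_key), length(i_key))   # max(4, 3) = 4

join_rows > limit          # TRUE: reading 1) is violated
anyDuplicated(i_key) > 0   # FALSE: reading 2) is not
```

So the example trips the size check purely because of the unmatched "c", even though i contains no duplicate key values.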

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
# data.table installed today from github
@arunsrinivasan
Member

Thanks @nigmastar. We'll take a look at this asap.

Regarding your last question: I think this SO post should help.

@arunsrinivasan added this to the v1.9.4 milestone Jul 26, 2014
@nigmastar
Author

Hi Arun,

Thanks for the link, it helped. So (apart from what you say in the link about the error being meant only for 'huge' results), don't you think that when i has no duplicates, as above, no error should be thrown?

Thanks,

Michele.

@arunsrinivasan
Member

There've been quite a few issues filed on this topic now. Without giving it too much thought, it does appear to me that a join in which i has no duplicates need not raise an error.

Bumping all allow.cartesian=. issues to 1.9.4 milestone.
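As a rough illustration of the rule being discussed, the sketch below only applies the size check when i actually contains duplicate key values. `needs_cartesian_check` is a hypothetical helper written for this thread, not a data.table function, and it says nothing about how a fix would be implemented internally.

```r
# Hypothetical helper, not part of data.table: sketches the proposed rule.
# i_key     : key values supplied in i
# join_rows : number of rows the join would produce
# nrow_x    : number of rows in x
needs_cartesian_check <- function(i_key, join_rows, nrow_x) {
  has_dups <- anyDuplicated(i_key) > 0L
  has_dups && join_rows > max(nrow_x, length(i_key))
}

# Unique i, as in the example from this issue: no error expected.
needs_cartesian_check(c("a", "b", "c"), join_rows = 5L, nrow_x = 4L)  # FALSE

# Duplicated i joining to the same groups repeatedly: the check still applies.
needs_cartesian_check(c("a", "a", "b"), join_rows = 6L, nrow_x = 4L)  # TRUE
```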

@mgriebe

mgriebe commented Aug 14, 2014

Maybe the error can mention "nomatch=0" as a possible solution.
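For anyone landing here, a small sketch of what that workaround looks like on the example from the original post. It assumes the data.table package is installed. Exact behaviour and output columns may differ between versions, so treat it as illustrative.

```r
library(data.table)

# Same shape of data as in the original report.
dt <- data.table(id = rep(letters[1:2], 2), var = rnorm(4), key = "id")

# nomatch=0 drops the values of i that have no match in x, so "c" no longer
# contributes an NA row and the result stays within nrow(x).
res <- dt[letters[1:3], list(var), nomatch = 0]
nrow(res)
```

Note that this changes the result: the NA row for "c" is gone, so it only helps when unmatched values of i are not needed.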
