min_rank() and row_number() returning very different results inside a filter() #313
Comments
I'd recommend using |
Good suggestion. However, I'm still getting the same strange result. Here's an example with aaronha01, using min_rank:
I can't make heads or tails of what is being ranked. On the other hand, when I use row_number(), I get the expected result:
For what it's worth, I also tried dense_rank(), cume_dist(), percent_rank(), and ntile(). dense_rank() produced the same result as min_rank(), whereas all the others produced the same result as row_number().
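For reference, here is a minimal sketch of how three of these functions are documented to treat ties, which may help explain why dense_rank() tracks min_rank(). The vector `x` is made up for illustration and is not taken from the Batting data.

```r
library(dplyr)

# Hypothetical values with one tie (20 appears twice).
x <- c(10, 20, 20, 30)

# min_rank(): tied values share the lowest rank, and the next rank is skipped.
r_min <- min_rank(x)      # 1 2 2 4

# dense_rank(): tied values share a rank, and no rank is skipped.
r_dense <- dense_rank(x)  # 1 2 2 3

# row_number(): ties are broken by position, so every rank is unique.
r_row <- row_number(x)    # 1 2 3 4
```

min_rank() and dense_rank() both give tied values the same rank, while row_number() never does.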
Update: I put together a simpler (but still large) fake dataset, and am still seeing the same strange behavior with min_rank():
Thanks. Can you reproduce the problem with a smaller dataset?
No problem. I made a dataset with 50 rows:
I upgraded to 0.1.3 before running this, just in case.
I think there was a relic of how grouped data used to be organised (when we used to group groups together). Should be fixed now. @justmarkham can you test the devel version on your data to see if this behaves correctly? Even better if you can come up with some tests to add to the test suite.
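A test along these lines might be a starting point. This is only a sketch: the data frame, the column names `g` and `x`, and the test description are all hypothetical, and the groups are interleaved on purpose so that a group's rows are not contiguous.

```r
library(dplyr)
library(testthat)

# Hypothetical data: rows of groups "a" and "b" alternate.
df <- data.frame(
  g = c("a", "b", "a", "b", "a", "b"),
  x = c(3, 10, 1, 30, 2, 20),
  stringsAsFactors = FALSE
)

test_that("min_rank() inside a grouped filter() ranks within each group", {
  # Keep the two largest x values per group, then sort for a stable comparison.
  res <- filter(group_by(df, g), min_rank(desc(x)) <= 2)
  res <- arrange(ungroup(res), g, desc(x))

  expect_equal(res$g, c("a", "a", "b", "b"))
  expect_equal(res$x, c(3, 2, 30, 20))
})
```

If ranks were computed across the whole column instead of per group, group "a" would return no rows, so the test should catch that case.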
Here's an example from the window-functions.Rmd in the vignettes:
The example presented is: "For each player, find the two years with most hits":
Below are the results. Notice that playerID "aaronto01" has 6 records, not 2:
Let's just look at him:
Below are the results. Looks like the years with most hits are 1962 and 1968.
Let's try row_number() instead of min_rank():
These results look correct. Notice that the data frame has 23,854 rows instead of 32,724 rows:
Perhaps this is a bug with min_rank() or filter()? My apologies if the issue is simply that I'm misunderstanding the functions or the data.
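For comparison, here is a small sketch of how I understand the two functions should differ inside a grouped filter(). The `hits` data frame is made up (it is not from the Lahman data); player "a" has a tie for second-most hits.

```r
library(dplyr)

# Hypothetical toy data: player "a" has two years tied at 20 hits.
hits <- data.frame(
  playerID = c("a", "a", "a", "a", "b", "b", "b"),
  yearID   = c(2001, 2002, 2003, 2004, 2001, 2002, 2003),
  H        = c(10, 30, 20, 20, 5, 15, 25),
  stringsAsFactors = FALSE
)
players <- group_by(hits, playerID)

# min_rank() gives tied values the same rank, so both tied years are kept.
by_min_rank <- filter(players, min_rank(desc(H)) <= 2)

# row_number() breaks ties by position, so each player keeps exactly two rows.
by_row_number <- filter(players, row_number(desc(H)) <= 2)

nrow(by_min_rank)    # 5 (three rows for "a", two for "b")
nrow(by_row_number)  # 4 (two rows per player)
```

So ties alone can make min_rank() return more than two rows per player, but they would not account for kept rows whose H is clearly not among the player's top values.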
Thanks!