Report merge stats in joins (enhancement) #25
Comments
Hey,
x <- inner_join(band_members, band_instruments, by = "name")
#> inner_join: removed one row and added one column (plays)
It's tricky to figure out how many rows are dropped from the "right" data frame, because then tidylog needs to figure out which rows have been merged, no? Of course it would be possible to report the percentage share of rows dropped from the left data frame.
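One way to get at the right-hand side as well, sketched with plain dplyr rather than anything tidylog does internally: an `anti_join` in each direction counts the rows without a match, and dividing by `nrow()` gives the share an inner join would drop. The helper name `dropped_share` is made up for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of tidylog): share of rows in `x` that
# have no match in `y` on the key column(s) `by`.
dropped_share <- function(x, y, by) {
  nrow(anti_join(x, y, by = by)) / nrow(x)
}

# band_members and band_instruments ship with dplyr.
left_share  <- dropped_share(band_members, band_instruments, by = "name")
right_share <- dropped_share(band_instruments, band_members, by = "name")

message(sprintf(
  "inner_join would drop %.0f%% of left rows and %.0f%% of right rows",
  100 * left_share, 100 * right_share
))
```

This costs two extra joins per call, which is presumably the overhead in question; whether that is acceptable for large tables is a separate decision.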
I'd be interested to see:
I know this is much more extensive and verbose than what you have now, but my thinking is that mistakes happen often enough that further logging would be very useful. Some code here
I started working on this on a new branch:
devtools::install_github("elbersb/tidylog@joins")
Now the join commands print something like this:
> tidylog::left_join(flights[1:10000, ], airlines[1:10, ], by = "carrier")
#> left_join: added one column (name)
#>   rows only in x   2,783
#>   rows only in y (     0)
#>   matched rows     7,217
#>                  ========
#>   rows total      10,000
Any time a number is printed in parentheses, it means that those rows are not included in the result. For instance:
> tidylog::anti_join(flights[1:100, ], airlines[1:10, ], by = "carrier")
#> anti_join: added no new columns
#>   rows only in x  34
#>   rows only in y ( 3)
#>   matched rows   (66)
#>                  ====
#>   rows total      34
Not very well tested so far, so there will be issues...
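For readers who want to reproduce the three counts by hand, here is a minimal sketch using only dplyr's filtering joins. It is not the code on the branch; `join_counts` is a hypothetical helper, and "matched" is counted on the `x` side, so it only equals the number of matched result rows when the keys in `y` are unique.

```r
library(dplyr)

# Hypothetical helper (not tidylog's implementation): classify rows by
# whether their key appears on the other side.
join_counts <- function(x, y, by) {
  c(
    only_x  = nrow(anti_join(x, y, by = by)),  # rows of x without a match
    only_y  = nrow(anti_join(y, x, by = by)),  # rows of y without a match
    matched = nrow(semi_join(x, y, by = by))   # rows of x with a match
  )
}

# band_members and band_instruments ship with dplyr.
counts <- join_counts(band_members, band_instruments, by = "name")
print(counts)

# Row totals per join type, assuming unique keys in `y`:
# a left_join keeps only_x + matched, an anti_join keeps only_x.
left_total <- counts[["only_x"]] + counts[["matched"]]
anti_total <- counts[["only_x"]]
```

The same arithmetic lines up with the output above: 2,783 + 7,217 = 10,000 for the left join, and the anti join keeps just the 34 rows that exist only in x.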
This is now on the main branch.
Hey Benjamin,
Thanks for the cool package!
Wouldn't it be useful to report what share of rows have been dropped?
So for this:
show:
33% of left dataframe and 33% of right dataframe's rows dropped.
I find that these joins are often a great source of bugs in my programs, and it often has to do with losing many rows in either the left or the right dataframe.
Best,
Lukas