Samukweku/refactor expand grid #1383

Merged 25 commits on Aug 3, 2024

@samukweku samukweku commented Jul 4, 2024

PR Description

Please describe the changes proposed in the pull request:

  • adds a cartesian_product function, as a saner alternative to expand_grid
  • introduces a few restrictions, and limits assumptions about user input
  • adds an expand method, similar to tidyr's expand function
  • updates complete to build on the expand method
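
For readers unfamiliar with tidyr, the semantics of expand and complete can be sketched in plain pandas. This is only an illustration of the idea, not the code in this PR: the frame and its column names (year, item, qty) are made up, and the janitor methods themselves are not called here.

```python
import pandas as pd

# hypothetical data: the (2021, "b") combination is missing
df = pd.DataFrame(
    {"year": [2020, 2020, 2021], "item": ["a", "b", "a"], "qty": [1, 2, 3]}
)

# "expand": every combination of the unique values of the chosen columns
years = df[["year"]].drop_duplicates()
items = df[["item"]].drop_duplicates()
expanded = years.merge(items, how="cross")

# "complete": left-join the original data onto the expanded grid, so that
# combinations absent from df show up as rows with missing values
completed = expanded.merge(df, on=["year", "item"], how="left")
```

Here expanded holds all four year/item pairs, and completed has one extra row, for (2021, "b"), whose qty is missing.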

This should also resolve the linked discussion.

Performance compared to pd.merge (YMMV):

import pandas as pd
import janitor as jn

# three small frames with distinct column names
df1 = pd.DataFrame({"a": range(1, 3), "b": [2, 1]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
df3 = pd.DataFrame({"r": [2, 3], "s": ["a", "b"]})

# scale up the inputs to 20,000 and 600 rows
df1 = pd.concat([df1] * 10_000)
df2 = pd.concat([df2] * 200)

# the result matches chained cross merges
A = jn.cartesian_product(df1, df2, df3)
B = df1.merge(df2, how="cross").merge(df3, how="cross")
A.equals(B)
True

# this PR 
%timeit jn.cartesian_product(df1,df2,df3)
353 ms ± 4.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df1.merge(df2,how='cross').merge(df3,how='cross')
1.52 s ± 27.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# dev (existing expand_grid)
%timeit jn.expand_grid(others={'df1':df1,'df2':df2,'df3':df3})
438 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit jn.expand_grid(others={'df1':df1,'df2':df2,'df3':df3}).droplevel(level=0,axis=1)
728 ms ± 8.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
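
As a rough illustration of how a cross join can be built without going through merge, here is a minimal sketch that uses positional indices from np.repeat and np.tile. It is an assumption about one possible approach, not necessarily how cartesian_product is implemented in this PR; the helper name cross_join_by_position and the small frames are hypothetical.

```python
import numpy as np
import pandas as pd


def cross_join_by_position(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cross join two frames via positional indexing."""
    n_left, n_right = len(left), len(right)
    # each left row is repeated once per right row ...
    left_pos = np.repeat(np.arange(n_left), n_right)
    # ... and the right rows are cycled once per left row
    right_pos = np.tile(np.arange(n_right), n_left)
    # take the rows by position and place the two blocks side by side
    return pd.concat(
        [
            left.iloc[left_pos].reset_index(drop=True),
            right.iloc[right_pos].reset_index(drop=True),
        ],
        axis=1,
    )


small1 = pd.DataFrame({"a": [1, 2], "b": [2, 1]})
small2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})

result = cross_join_by_position(small1, small2)
expected = small1.merge(small2, how="cross")
```

On these small inputs, result and expected contain the same rows in the same order. The sketch assumes the two frames have no column names in common.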

This PR resolves #1293.

@samukweku samukweku self-assigned this Jul 4, 2024
ericmjl commented Jul 4, 2024

@samukweku samukweku force-pushed the samukweku/refactor_expand_grid branch from 0274f67 to 8978a26 July 12, 2024 11:59
@ericmjl ericmjl left a comment

This is looking great, @samukweku! Thank you for the creative work!

codecov bot commented Jul 31, 2024

Codecov Report

Attention: Patch coverage is 98.75000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.76%. Comparing base (62c57c6) to head (eee05b3).
Report is 34 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1383      +/-   ##
==========================================
- Coverage   94.48%   89.76%   -4.72%     
==========================================
  Files          80       87       +7     
  Lines        4367     5392    +1025     
==========================================
+ Hits         4126     4840     +714     
- Misses        241      552     +311     

@samukweku samukweku merged commit e51d3b2 into dev Aug 3, 2024
4 checks passed
@samukweku samukweku deleted the samukweku/refactor_expand_grid branch August 3, 2024 12:56
Development

Successfully merging this pull request may close these issues.

expand function
2 participants