FEAT Add aggregate and join functions for Pandas and Polars #733
Thanks Vincent, great idea! I think some of these methods can be used inside the fuzzy_join/Joiner (perhaps to do when we add polars compatibility).
Some comments:
@@ -84,6 +84,46 @@ This page lists all available functions and classes of `skrub`.

deduplicate

.. raw:: html
Also, the API page is getting crowded. Maybe we can add subsections with a dropdown menu in the sidebar for the .datasets and .dataframe methods (perhaps as a follow-up PR).
Interesting suggestion, WDYT @GaelVaroquaux?
Co-authored-by: Jovan Stojanovic <62058944+jovan-stojanovic@users.noreply.github.com>
@@ -0,0 +1,41 @@
import pandas as pd
I think you need an __init__.py in this directory (at least with the current approach of having test modules be part of the package); otherwise, pytest fails for me locally and complains that it cannot import skrub.dataframe.
References
This PR aims at simplifying #600 by adding the dataframe._pandas and dataframe._polars namespaces prior to it. It also enables solving issues like #730.
This PR takes into account the latest reviews made in #600.
What does this PR implement?
- skrub.dataframe, so that the join and aggregate functions for Pandas and Polars can be shared more easily across skrub.
- get_df_namespace function and the DataFrameLike, SeriesLike types in separate files within skrub.dataframe.
Additional comments
This PR doesn't take into account the output of the discussion #719 for now.