[Feature proposal] Dataframe merge by ID #690

Open
adamsar opened this issue Aug 5, 2021 · 3 comments
Comments

@adamsar
adamsar commented Aug 5, 2021

I've got a few different dataframes that I'd like to merge when calculating a regression. Right now I do so by converting each one to a matrix of doubles, aligning the rows by id, and then rebuilding a dataframe. Spark and pandas have utility methods that merge dataframes, with a by option to specify which column is used to match the data.

Describe the solution you'd like
Extend the merge method with a simple by option to specify the key to merge on, add a mergeWith method, or add a MergeOptions parameter that contains information such as by (the key to join on) and mergeType (inner vs. outer join, left vs. right join).

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html
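
To make the requested semantics concrete, here is a minimal sketch in plain Python rather than Smile's API. The function name join_by and the by / merge_type parameters are hypothetical and only mirror the wording of the proposal. Rows are modelled as dicts, and the sketch assumes the key column is the only column name the two inputs share.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_by(left: List[Row], right: List[Row], by: str,
            merge_type: str = "inner") -> List[Row]:
    """Join two lists of rows on the column `by` (hypothetical helper)."""
    if merge_type not in {"inner", "left", "right", "outer"}:
        raise ValueError(f"unknown merge_type: {merge_type}")

    # Non-key column names on each side, used to pad unmatched rows with None.
    left_cols = [c for c in (left[0] if left else {}) if c != by]
    right_cols = [c for c in (right[0] if right else {}) if c != by]

    # Index the right-hand rows by key so each lookup is a dict access.
    index = defaultdict(list)
    for row in right:
        index[row[by]].append(row)

    result: List[Row] = []
    matched_keys = set()
    for lrow in left:
        key = lrow[by]
        if key in index:
            matched_keys.add(key)
            for rrow in index[key]:
                result.append({**lrow, **rrow})
        elif merge_type in ("left", "outer"):
            # Keep the left row and fill the right-hand columns with None.
            result.append({**lrow, **{c: None for c in right_cols}})

    if merge_type in ("right", "outer"):
        # Append right rows whose key never appeared on the left.
        for rrow in right:
            if rrow[by] not in matched_keys:
                result.append({by: rrow[by],
                               **{c: None for c in left_cols},
                               **rrow})
    return result


features = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 29}]
labels = [{"id": 3, "y": 0.7}, {"id": 1, "y": 1.2}, {"id": 4, "y": 0.1}]

inner = join_by(features, labels, by="id")
outer = join_by(features, labels, by="id", merge_type="outer")
```

With the sample data, the inner join keeps only ids 1 and 3, while the outer join also keeps ids 2 and 4 with None in the columns that have no match.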

@haifengl
Owner

Are you interested in a join or a simple merge? With the existing API, you can merge two or more data frames, provided that their rows are in the same order.
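
As an illustration of that same-order requirement, here is a small sketch in plain Python, not Smile's merge method. The names merge_by_position and align_then_merge are hypothetical. The first helper combines rows purely by position; the second sorts both inputs by the key first and refuses to merge when the key sets differ, which is the case where a real join is needed.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_by_position(left: List[Row], right: List[Row]) -> List[Row]:
    """Column-wise merge: row i of `left` is combined with row i of `right`."""
    if len(left) != len(right):
        raise ValueError("row counts differ")
    return [{**lrow, **rrow} for lrow, rrow in zip(left, right)]


def align_then_merge(left: List[Row], right: List[Row], by: str) -> List[Row]:
    """Sort both inputs by the key, check the keys agree, then merge by position."""
    left_sorted = sorted(left, key=lambda row: row[by])
    right_sorted = sorted(right, key=lambda row: row[by])
    if [row[by] for row in left_sorted] != [row[by] for row in right_sorted]:
        raise ValueError("key sets differ; a positional merge would mispair rows")
    return merge_by_position(left_sorted, right_sorted)


ages = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
scores = [{"id": 2, "y": 0.4}, {"id": 1, "y": 1.2}]  # same ids, different order
aligned = align_then_merge(ages, scores, by="id")
```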

@adamsar
Author

adamsar commented Aug 10, 2021

More of a join. I've got a lot of dataframes, including some I receive from other departments, and it's sometimes painful to get these into a cohesive, single dataframe that contains the feature set I need.

Edit: this functionality is exactly what I'd like: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html

@haifengl
Owner

haifengl commented Apr 6, 2024

We added smile.data.SQL for database management, which supports joins. The query/join result is returned as a DataFrame. See SQLTest for examples.
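
For readers unfamiliar with the SQL route, the idea can be sketched with Python's built-in sqlite3 module. This is an illustration of joining two tables on an id column only; it does not use smile.data.SQL, whose API is not shown in this thread, and the table and column names are made up for the example.

```python
import sqlite3


def left_join_demo():
    """Load two small tables into an in-memory database and join them on id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.execute("CREATE TABLE labels (id INTEGER PRIMARY KEY, y REAL)")
    conn.executemany("INSERT INTO features VALUES (?, ?)",
                     [(1, 34), (2, 51), (3, 29)])
    conn.executemany("INSERT INTO labels VALUES (?, ?)",
                     [(3, 0.7), (1, 1.2)])

    # LEFT JOIN keeps every features row; a missing label comes back as NULL (None).
    rows = conn.execute(
        "SELECT f.id, f.age, l.y "
        "FROM features AS f LEFT JOIN labels AS l ON f.id = l.id "
        "ORDER BY f.id"
    ).fetchall()
    conn.close()
    return rows


rows = left_join_demo()
```

The row order of the inputs does not matter here, because rows are matched on the id value rather than on their position.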
