I've got a few different dataframes that I'd like to merge when calculating regressions. Right now I do so by converting each one to a matrix of doubles, aligning the rows by id, and then rebuilding a dataframe. Spark and pandas both have utility methods that let you merge dataframes with an option to specify which column is used to match the rows (on in pandas).
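For reference, here is a minimal sketch of the pandas behaviour I have in mind. The frames and the column names (id, x, y) are made up for illustration; only `DataFrame.merge` with its `on` and `how` parameters is the real pandas API.

```python
import pandas as pd

# Two made-up frames that share an "id" column but cover different ids.
features = pd.DataFrame({"id": [1, 2, 3], "x": [0.5, 1.5, 2.5]})
targets = pd.DataFrame({"id": [2, 3, 4], "y": [10.0, 20.0, 30.0]})

# Inner join: keep only the ids present in both frames.
inner = features.merge(targets, on="id", how="inner")

# Outer join: keep every id; cells with no match become NaN.
outer = features.merge(targets, on="id", how="outer")
```

The key column does the row alignment that I currently do by hand on the matrix of doubles.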
Describe the solution you'd like
Extend the merge method with a simple by option to specify the key to merge on, add a mergeWith method, or add a MergeOptions parameter that carries information such as by (the key to join on) and mergeType (inner vs outer join, left vs right join).
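To make the shape of the request concrete, here is a rough sketch in plain Python, not in this library's API. `MergeOptions`, `merge_with`, and the row-as-dict representation are all hypothetical names chosen for illustration. The sketch assumes the key is unique within each input and that the non-key column names do not collide.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]  # hypothetical stand-in for one dataframe row


@dataclass(frozen=True)
class MergeOptions:
    """Hypothetical options object: the join key and the join type."""
    by: str
    merge_type: str = "inner"  # "inner", "left", "right", or "outer"


def merge_with(left: List[Row], right: List[Row], options: MergeOptions) -> List[Row]:
    """Join two lists of rows on options.by, filling unmatched cells with None."""
    if options.merge_type not in {"inner", "left", "right", "outer"}:
        raise ValueError(f"unknown merge type: {options.merge_type}")

    key = options.by
    # Index each side by key (assumes keys are unique within a side).
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Non-key columns of each side, in first-seen order.
    left_cols = list(dict.fromkeys(c for row in left for c in row if c != key))
    right_cols = list(dict.fromkeys(c for row in right for c in row if c != key))

    # Decide which keys survive, depending on the join type.
    if options.merge_type == "inner":
        keys = [k for k in left_by_key if k in right_by_key]
    elif options.merge_type == "left":
        keys = list(left_by_key)
    elif options.merge_type == "right":
        keys = list(right_by_key)
    else:  # outer: left keys first, then keys found only on the right
        keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]

    merged = []
    for k in keys:
        row: Row = {key: k}
        for c in left_cols:
            row[c] = left_by_key.get(k, {}).get(c)
        for c in right_cols:
            row[c] = right_by_key.get(k, {}).get(c)
        merged.append(row)
    return merged
```

With something like this, a call such as `merge_with(features, targets, MergeOptions(by="id", merge_type="outer"))` would replace the manual align-and-rebuild step described above.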
More of a join. I've got a lot of dataframes, including some I receive from other departments, and it's sometimes painful to get these into a cohesive, single dataframe that contains the feature set I need.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html