ENH: also check type of right DataFrame to determine (sub)class of result DataFrame #44054

Open
jorisvandenbossche opened this issue Oct 16, 2021 · 2 comments
Labels: Enhancement; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Subclassing (Subclassing pandas objects)

Comments

@jorisvandenbossche
Member

Currently, we use the type of the left object to construct the result of pd.merge:

typ = self.left._constructor
result = typ(result_data).__finalize__(self, method=self._merge_type)

For GeoPandas users, this means that the order of the arguments matters: pd.merge(df, gdf, ...) (or df.merge(gdf, ...)) returns a DataFrame, while pd.merge(gdf, df, ...) (or gdf.merge(df, ...)) returns a GeoDataFrame. This can be surprising for users.
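
To make the asymmetry concrete without depending on GeoPandas, here is a minimal sketch using a hypothetical subclass, TaggedFrame, as a stand-in for GeoDataFrame. The class name and the sample data are assumptions made for illustration only.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical stand-in for a DataFrame subclass such as GeoDataFrame."""

    @property
    def _constructor(self):
        # Tell pandas to build results of this type from this subclass.
        return TaggedFrame


df = pd.DataFrame({"key": [1, 2], "a": ["x", "y"]})
sub = TaggedFrame({"key": [1, 2], "b": [10.0, 20.0]})

# Same two inputs, only the argument order differs.
left_plain = pd.merge(df, sub, on="key")
left_sub = pd.merge(sub, df, on="key")

print(type(left_plain).__name__)  # result type follows the left (plain) operand
print(type(left_sub).__name__)    # result type follows the left (subclass) operand
```

With the behaviour described in this issue, the first merge should come back as a plain DataFrame and the second as the subclass, even though both contain the same joined data.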

I think it should be fairly easy and safe to also check the type of the right object and use that class when left is a non-subclassed pandas.DataFrame. If left is itself a subclass, it still has "precedence".

API breaking implications

This can change the class of the return value. But for subclasses that are mostly compatible with pandas.DataFrame (and only add functionality), this should not have a big impact.

@erfannariman
Member

Just to be sure I understand you correctly, in both cases pd.merge(df, gdf) or pd.merge(gdf, df) you expect a DataFrame returned. What about gdf.merge(df)?

@jorisvandenbossche
Member Author

in both cases pd.merge(df, gdf) or pd.merge(gdf, df) you expect a DataFrame returned

Not necessarily (the goal for GeoPandas would be to have it return a GeoDataFrame in both cases), but I would like to defer that decision to the subclass' _constructor, which can decide to return a DataFrame or subclass.
Right now we always use left._constructor, so I would like to also use right._constructor when right is a DataFrame subclass and left is not.
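
The selection rule described here could be sketched roughly as follows. This is not the pandas implementation: pick_constructor and merge_symmetric are hypothetical helper names, TaggedFrame is a made-up subclass standing in for GeoDataFrame, and the sketch assumes both inputs are DataFrames (or subclasses).

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass used only for this illustration."""

    @property
    def _constructor(self):
        return TaggedFrame


def pick_constructor(left, right):
    """Choose the result constructor under the proposed rule (a sketch)."""
    # A subclassed left operand keeps precedence, as it does today.
    if type(left) is not pd.DataFrame:
        return left._constructor
    # Left is a plain DataFrame: defer to right if it is a subclass.
    if isinstance(right, pd.DataFrame) and type(right) is not pd.DataFrame:
        return right._constructor
    return left._constructor


def merge_symmetric(left, right, **kwargs):
    """Hypothetical wrapper: merge, then re-wrap when left is a plain DataFrame."""
    result = pd.merge(left, right, **kwargs)
    if type(left) is pd.DataFrame:
        # The subclass' _constructor decides what type actually comes back.
        result = pick_constructor(left, right)(result)
    return result


plain = pd.DataFrame({"key": [1, 2], "a": ["x", "y"]})
tagged = TaggedFrame({"key": [1, 2], "b": [10.0, 20.0]})

print(type(merge_symmetric(plain, tagged, on="key")).__name__)
print(type(merge_symmetric(tagged, plain, on="key")).__name__)
```

Because the final type is produced by calling the chosen _constructor, a subclass remains free to return a plain DataFrame from it, which is the deferral described above.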
