COMPAT: add pandas blockmanager alternative apis for _constructor like things #3080

m-richards · 2023-11-20T10:19:12Z

With 6b4857e we are down to a mere 34992 warnings (commits after are fixes for old pandas).

With 5d7d146 we are down to 346

482c989 is a bit more convoluted than the others. Originally I didn't think we needed this, but there are warnings about it in e.g. test_clip.py (and with it we are down to 311). Since we have quite some logic for GeoSeries._constructor_expanddim, I opted to construct the DataFrame directly. Fortunately we can actually do this in the DataFrame case, because Series._name being undefined as part of from_mgr is not an issue there.

It would not surprise me if this is not the 'right' way of doing some of this. I've kept tabs on some of the relevant pandas PRs, but I'm not across all of it.

geopandas/geodataframe.py

martinfleis

While my knowledge of pandas internals is limited, this looks like a plausible solution to me.

martinfleis · 2023-11-20T18:25:01Z

geopandas/geodataframe.py

+    def _constructor_sliced_from_mgr(self, mgr, axes):
+        is_row_proxy = mgr.index.is_(self.columns)
+
+        assert isinstance(mgr, SingleBlockManager)


Do we need this line for any reason? If we need to do this check, can it be a TypeError instead? Removing it would also allow us to remove an import from pandas.core.internals.

Yeah, I don't think it is needed, this assertion should be guaranteed by pandas (I wrote it in my PR, so I assume that's why Matt copied it)

Yep, borrowed from Joris, but the check below also only makes sense if len(mgr.blocks) == 1, i.e. a SingleBlockManager. If the assertion were ever not to hold, I don't think we gain a lot by having the assertion versus having some exotic crash in the subsequent pandas code. Happy to remove.
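For reference, the difference between the two options discussed in this thread can be sketched in isolation. This is a minimal illustration, not geopandas code: `FakeSingleBlockManager` and `check_manager` are hypothetical stand-ins. The only point is that `assert` statements are skipped when Python runs with `-O`, whereas an explicit `TypeError` is always raised.

```python
class FakeSingleBlockManager:
    """Hypothetical stand-in for pandas' SingleBlockManager."""


def check_manager(mgr):
    # Explicit check: unlike `assert`, this still runs under `python -O`.
    if not isinstance(mgr, FakeSingleBlockManager):
        raise TypeError(
            f"expected FakeSingleBlockManager, got {type(mgr).__name__}"
        )
    return mgr


check_manager(FakeSingleBlockManager())  # passes silently

try:
    check_manager(object())
except TypeError as exc:
    print(exc)  # expected FakeSingleBlockManager, got object
```

If pandas already guarantees the manager type, as suggested above, neither form is needed and dropping the check also drops the private import.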

martinfleis · 2023-11-20T18:28:37Z

geopandas/geoseries.py

@@ -74,6 +68,17 @@ def _geoseries_expanddim(data=None, *args, **kwargs):
    return df


+def _geoseries_expanddim(data=None, *args, **kwargs):
+    # pd.Series._constructor_expanddim == pd.DataFrame


Suggested change (remove this line)

-    # pd.Series._constructor_expanddim == pd.DataFrame

I think the comment is fine to keep to understand what logic _geoseries_expanddim is supposed to emulate

I'll just add a second sentence to give some context to it.

martinfleis · 2023-11-20T18:29:11Z

geopandas/geoseries.py

+    df = pd.DataFrame(data, *args, **kwargs)
+    if isinstance(data, GeoSeries):
+        # pandas default column name is 0, keep convention
+        geo_col_name = data.name if data.name is not None else 0


Can we add a test covering this line?

This also doesn't seem to be covered on main. This block was added in #2296, which was a complex PR and includes a bunch of tests (including a test that ensures to_frame() returns a GeoDataFrame with "0" as column name), so maybe this wasn't needed in the final state of that PR (it went through some iterations)

Update: appears to be redundant in my local tests. Seems plausible to me it could have been redundant in that PR in the end.

Had a quick look and it appears we still have codecov corresponding to that branch, which indicates it was covered then (https://app.codecov.io/gh/geopandas/geopandas/commit/aa80aa851ad5ee385aa67e7c153c9f3626b11951/blob/geopandas/geoseries.py).

I'm guessing this was then some compatibility thing for an old version of pandas which we no longer test against.

Yep, #3001 dropped coverage of this https://app.codecov.io/gh/geopandas/geopandas/commit/b51d43ba182d776cbf073c7a796010efc63eee1d/indirect-changes
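To make the line under discussion concrete, here is a small sketch of the name fallback on its own. `resolve_geo_col_name` is a hypothetical helper written for this illustration; it only mirrors the `data.name if data.name is not None else 0` expression from the diff and says nothing about whether the branch is still reachable.

```python
def resolve_geo_col_name(name):
    # Fall back to 0 (the column label pandas uses for an unnamed Series)
    # only when the name is None. A plain `name or 0` would also replace
    # falsy but valid names such as "".
    return name if name is not None else 0


print(resolve_geo_col_name(None))        # 0
print(resolve_geo_col_name("geometry"))  # geometry
print(repr(resolve_geo_col_name("")))    # ''
```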

jorisvandenbossche · 2023-11-21T08:18:09Z

geopandas/geodataframe.py

+        if not any(isinstance(block.dtype, GeometryDtype) for block in mgr.blocks):
+            return pd.DataFrame._from_mgr(mgr, axes)


This is essentially the same logic as what we have in _geodataframe_constructor_with_fallback, right?

Yes, that's the intent.
I suppose you could alternatively write this as something like

    def _constructor_from_mgr(self, mgr, axes):
        df = GeoDataFrame._from_mgr(mgr, axes)
        return _geodataframe_constructor_with_fallback(df)

to make that more explicit (and/or less confusing). I suppose that's effectively what pandas is doing right now when it throws the warnings. I suppose this is arguably better? (Though if the recommendation for a subclass to implement _constructor_from_mgr is to just call the subclass's _constructor, it seems like the process of splitting this out into its own method is superfluous.)

I did also wonder if we now have enough _constructor variants that it makes sense to collect them together into a separate module, but I think that would be a mess because of partially initialised imports.

I think it is fine to leave it as is. In any case, your snippet above would pass df to the GeoDataFrame constructor again (inside _geodataframe_constructor_with_fallback), so that would be unnecessary for this case.

though if the recommendation for a subclass to implement _constructor_from_mgr is to just call the subclass's _constructor, it seems like the process of splitting this out into its own method is superfluous

I think the problem is that pandas doesn't necessarily know that (and it also doesn't know if the subclass' _from_mgr will return a fully instantiated object)
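The fallback rule discussed in this thread can be modelled without pandas. The sketch below is a toy: `FakeBlock`, `FakeManager` and `constructor_from_mgr` are hypothetical names, dtypes are plain strings, and the function returns a class name rather than building a real object. It only shows the decision: with no geometry-typed block, fall back to a plain DataFrame.

```python
from dataclasses import dataclass


@dataclass
class FakeBlock:
    dtype: str  # e.g. "geometry" or "int64"; stand-in for a real dtype


@dataclass
class FakeManager:
    blocks: list  # list of FakeBlock


def constructor_from_mgr(mgr):
    # No geometry-typed block: the result should be a plain DataFrame.
    if not any(block.dtype == "geometry" for block in mgr.blocks):
        return "DataFrame"
    # At least one geometry block: keep the GeoDataFrame subclass.
    return "GeoDataFrame"


print(constructor_from_mgr(FakeManager([FakeBlock("int64")])))
# DataFrame
print(constructor_from_mgr(FakeManager([FakeBlock("int64"), FakeBlock("geometry")])))
# GeoDataFrame
```

Making this decision on the manager directly avoids building an object and then passing it through a constructor a second time, which is the redundancy pointed out above.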

jorisvandenbossche · 2023-11-21T08:19:20Z

geopandas/geodataframe.py

+    def _constructor_sliced_from_mgr(self, mgr, axes):
+        is_row_proxy = mgr.index.is_(self.columns)
+
+        assert isinstance(mgr, SingleBlockManager)


Yeah, I don't think it is needed, this assertion should be guaranteed by pandas (I wrote it in my PR, so I assume that's why Matt copied it)

jorisvandenbossche · 2023-11-21T08:22:01Z

geopandas/geoseries.py

@@ -74,6 +68,17 @@ def _geoseries_expanddim(data=None, *args, **kwargs):
    return df


+def _geoseries_expanddim(data=None, *args, **kwargs):
+    # pd.Series._constructor_expanddim == pd.DataFrame


I think the comment is fine to keep to understand what logic _geoseries_expanddim is supposed to emulate

jorisvandenbossche · 2023-11-21T08:30:23Z

geopandas/geoseries.py

+    df = pd.DataFrame(data, *args, **kwargs)
+    if isinstance(data, GeoSeries):
+        # pandas default column name is 0, keep convention
+        geo_col_name = data.name if data.name is not None else 0


This also doesn't seem to be covered on main. This block was added in #2296, which was a complex PR and includes a bunch of tests (including a test that ensures to_frame() returns a GeoDataFrame with "0" as column name), so maybe this wasn't needed in the final state of that PR (it went through some iterations)

jorisvandenbossche · 2023-11-21T08:34:44Z

geopandas/geoseries.py

@@ -629,6 +634,11 @@ def _constructor_from_mgr(self, mgr, axes):
    def _constructor_expanddim(self):
        return _geoseries_expanddim

+    def _constructor_expanddim_from_mgr(self, mgr, axes):
+        df = pd.DataFrame._from_mgr(mgr, axes)
+        existing_name = mgr.axes[0]


I would have expected this needs to be mgr.axes[0][0] (first [0] to get the first axis (i.e. the columns' Index, in the BlockManager twisted order of (columns, index)) and the second [0] to get the single scalar name out of that Index object).
But if the tests are not failing ..?

I'll check. I think this is the same point as the missing coverage above: this argument/value might be redundant.
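The indexing question raised above can be illustrated with plain lists standing in for the manager's axes. This is only a sketch under the (columns, index) ordering described in the comment; the `axes` value here is hypothetical and is not a real BlockManager attribute.

```python
# Stand-in for mgr.axes of a one-column frame, in (columns, index) order.
axes = [["geometry"], [0, 1, 2]]

columns = axes[0]  # the whole columns axis, still a sequence
name = axes[0][0]  # the single column label as a scalar

print(columns)  # ['geometry']
print(name)     # geometry
```

With a real manager the first element would be an Index object rather than a list, so using `axes[0]` where a scalar name is expected would pass the whole Index along.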


martinfleis

lgtm and get green CI again! 🚀

martinfleis · 2023-12-30T16:46:01Z

Thanks @m-richards! I believe that all of @jorisvandenbossche's comments were resolved, so merging to get it all green again.

jorisvandenbossche · 2024-01-25T16:47:49Z

@m-richards we should evaluate how safe it is to backport this for a release in 0.14.x.

It would probably then also require backporting one of your open PRs to address the regression?


m-richards · 2024-01-26T02:07:41Z

I created #3159 to have a look at how safely this could be done (I imagine we don't want to merge that PR though because it might be confusing since it's a collection of PRs on main). But if it looks good I'll split out the relevant commit to add.


m-richards added 5 commits November 20, 2023 21:14

MAINT: add _constructor_from_mgr

6b4857e

COMPAT: old pandas versions

2989317

actually pandas 2.1 I think

e89b89e

COMPAT: add _constructor_sliced_from_mge

5d7d146

COMPAT: implement _constructor_expanddim_from_mgr

482c989

m-richards marked this pull request as ready for review November 20, 2023 11:27

CLN

9a5a5c5

martinfleis reviewed Nov 20, 2023


Update geodataframe.py

8ac6bff

martinfleis closed this Nov 20, 2023

martinfleis reopened this Nov 20, 2023

martinfleis reviewed Nov 20, 2023


jorisvandenbossche reviewed Nov 21, 2023


m-richards added 2 commits November 21, 2023 22:05

CLN: remove redundant logic and clean up

e5066ff

Merge branch 'pandas_constructor_apis' of github.com:m-richards/geopa…

8f4e4d7

…ndas into pandas_constructor_apis

martinfleis approved these changes Nov 27, 2023


martinfleis requested a review from jorisvandenbossche November 27, 2023 21:02

Merge remote-tracking branch 'upstream/main' into pr/m-richards/3080

9171793

martinfleis merged commit 1b3a305 into geopandas:main Dec 30, 2023
19 checks passed

m-richards deleted the pandas_constructor_apis branch December 31, 2023 06:23

martinfleis mentioned this pull request Jan 3, 2024

REGR: adding a row to a GeoDataFrame downcasts GeoSeries to Series #3119

Open

jorisvandenbossche mentioned this pull request Jan 5, 2024

MAINT: Reduce warnings in CI #2966

Merged

jorisvandenbossche mentioned this pull request Jan 25, 2024

Release 0.14.x #3068

Open

m-richards added a commit to m-richards/geopandas that referenced this pull request Jan 26, 2024

COMPAT: add pandas blockmanager alternative apis for _constructor lik…

694663b

…e things (geopandas#3080) Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>

jorisvandenbossche pushed a commit to jorisvandenbossche/geopandas that referenced this pull request Jan 27, 2024

COMPAT: add pandas blockmanager alternative apis for _constructor lik…

941cea0

…e things (geopandas#3080) Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>

jorisvandenbossche mentioned this pull request Jan 27, 2024

0.14.3 backports #3161

Merged

jorisvandenbossche pushed a commit that referenced this pull request Jan 31, 2024

COMPAT: add pandas blockmanager alternative apis for _constructor lik…

5898796

…e things (#3080) Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>

snowman2 mentioned this pull request Jan 31, 2024

REGR: geometry column not found after groupby & column select #3165

Closed

2 tasks
