BUG: check CRS authority via SRID number when reading from postgis #3329

nicholas-ys-tan · 2024-06-06T15:08:08Z

Should resolve #2545

It looks at the spatial_ref_sys table in PostGIS to look up the authority name instead of assuming EPSG. If the table does not exist for whatever reason, it falls back to the current behaviour of assuming EPSG.

Not sure if this is the best way, though. I didn't want to pass the connection data into _df_to_geodf, so I read the table into a dataframe first. It would be nice if such a lookup for the authority name could be done purely in pyproj.CRS, but I couldn't find anything.

Please advise if there is a more aesthetic approach to this.
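For readers following along, here is a minimal sketch of the idea described above, not the code in this PR. It assumes an in-memory sqlite3 database standing in for PostGIS, and the helper names `load_srs_table` and `crs_for_srid` are hypothetical. The column names (`srid`, `auth_name`, `auth_srid`) follow the PostGIS `spatial_ref_sys` table; the two rows are illustrative.

```python
import sqlite3


def load_srs_table(con):
    """Return {srid: (auth_name, auth_srid)}, or None if the table is missing."""
    try:
        rows = con.execute(
            "SELECT srid, auth_name, auth_srid FROM spatial_ref_sys"
        ).fetchall()
    except sqlite3.OperationalError:
        # sqlite3 raises OperationalError for a missing table; a real
        # PostGIS driver would raise its own error type here.
        return None
    return {srid: (auth_name, auth_srid) for srid, auth_name, auth_srid in rows}


def crs_for_srid(srs_table, srid):
    """Build an 'AUTH:CODE' string, assuming EPSG when no table is available."""
    if srs_table is None:
        return f"epsg:{srid}"
    # An srid absent from the table raises KeyError; this sketch does not handle it.
    auth_name, auth_srid = srs_table[srid]
    return f"{auth_name}:{auth_srid}"


# sqlite3 stand-in for a PostGIS connection (assumption for this example).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE spatial_ref_sys (srid INTEGER, auth_name TEXT, auth_srid INTEGER)"
)
con.executemany(
    "INSERT INTO spatial_ref_sys VALUES (?, ?, ?)",
    [(4326, "EPSG", 4326), (54052, "ESRI", 54052)],
)

table = load_srs_table(con)
print(crs_for_srid(table, 54052))  # ESRI:54052
print(crs_for_srid(None, 54052))   # epsg:54052
```

Reading the table once and passing the result around keeps the connection out of the conversion step, at the cost of fetching every row.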

m-richards

Thanks @nicholas-ys-tan, this approach looks sound to me. To me it makes sense that we treat postgis rather than the proj.db as the point of truth for data coming out of postgis. I think passing in the dataframe rather than the connection makes sense - especially for the chunksize case as it means this is only queried once upfront.

m-richards · 2024-06-09T01:51:53Z

geopandas/io/sql.py

-                crs = "epsg:{}".format(srid)
-
+                if spatial_ref_sys_df is not None:
+                    entry = spatial_ref_sys_df.loc[spatial_ref_sys_df["srid"] == srid]


Given that spatial_ref_sys is not exhaustive, in principle this could be an empty match. It could be worth checking whether the length of this is zero and falling back to epsg as the authority with a warning. (I expect it's probably not straightforward to find such an example for test coverage, but I think that's okay, as this is an edge case: a CRS that is in proj but not in postgis, or a messed-up postgis where spatial_ref_sys is missing data.)

Thanks @m-richards ,

I've added an additional check that the srid is in the spatial_ref_sys table, otherwise falling back to epsg and added a user warning when it does this. I've written a test to check this using mock.patch
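A rough sketch of the empty-match fallback discussed here, under the assumption that the rows matching the srid have already been selected. The function name `crs_from_matches` and the warning text are hypothetical, not taken from the PR.

```python
import warnings


def crs_from_matches(matches, srid):
    """matches: list of (auth_name, auth_srid) rows already filtered to srid."""
    if len(matches) == 0:
        # Nothing in spatial_ref_sys for this srid: warn and assume EPSG.
        warnings.warn(
            f"Could not find srid {srid} in spatial_ref_sys; assuming EPSG.",
            UserWarning,
            stacklevel=2,
        )
        return f"epsg:{srid}"
    auth_name, auth_srid = matches[0]
    return f"{auth_name}:{auth_srid}"


print(crs_from_matches([("ESRI", 54052)], 54052))  # ESRI:54052
```

Calling `crs_from_matches([], 9999)` returns `"epsg:9999"` and emits a `UserWarning`.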

jorisvandenbossche · 2024-06-10T09:06:07Z

geopandas/io/sql.py

+        spatial_ref_sys_sql = "SELECT * FROM spatial_ref_sys"
+        spatial_ref_sys_df = (
+            pd.read_sql(spatial_ref_sys_sql, con) if crs is None else None
+        )


Do you have an idea how large this table typically is? Because we are now reading this full table every time when calling read_postgis, and ideally we would already do the filtering step here in the query, instead of doing it on the dataframe. I know we don't know the srid value at this point for specifying it in the query, but I wonder if we could maybe pass con to _df_to_geodf and then do this query there with directly filtering on srid. We would then only have to "cache" this somehow for repeated calls to _df_to_geodf from a single read_postgis call.

I don't know for sure if it's typical, but my test postgis database seems to populate this with 8500 rows, which is not ideal to be fetching on every call.

Perhaps it would be cleaner to try and pre-process the first row to determine the srid so it / the crs can be passed in? - though this would be a bit messy in the generator case I guess.

(We could maybe try and parse the input sql and try and paste on a limit clause to do a single fetch for the srid and then the full query, but at that point we may as well be trying to parse the sql to call Find_SRID directly. I'd prefer if we didn't have any sql parsing though)

Mine is 8000 rows, approx. 7.3 MB:

postgres=# SELECT pg_size_pretty( pg_total_relation_size('spatial_ref_sys') );
 pg_size_pretty
----------------
 7280 kB
(1 row)

That's a decent chunk of data to have to transfer each time. I would try to look into querying only the one srid value that is needed.

Thanks @jorisvandenbossche and @m-richards

I've pushed new commits where the connection is passed in and the SRID is queried from the SRS table with a filter in the sql statement, and cached.
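To illustrate the filtered-and-cached lookup in general terms, here is a small sketch. It again uses sqlite3 in place of PostGIS, and the `SrsLookup` class and its `query_count` attribute are hypothetical names invented for the example; the actual change lives in geopandas/io/sql.py and may be structured differently.

```python
import sqlite3


class SrsLookup:
    """Hypothetical helper: at most one filtered query per distinct srid."""

    def __init__(self, con):
        self._con = con
        self._cache = {}
        self.query_count = 0  # kept only so the example can show the caching

    def authority(self, srid):
        if srid not in self._cache:
            self.query_count += 1
            # Filter in SQL so only the needed row crosses the connection.
            self._cache[srid] = self._con.execute(
                "SELECT auth_name, auth_srid FROM spatial_ref_sys WHERE srid = ?",
                (srid,),
            ).fetchone()  # None when the srid is not in the table
        return self._cache[srid]


# sqlite3 stand-in for a PostGIS connection (assumption for this example).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE spatial_ref_sys (srid INTEGER, auth_name TEXT, auth_srid INTEGER)"
)
con.execute("INSERT INTO spatial_ref_sys VALUES (54052, 'ESRI', 54052)")

lookup = SrsLookup(con)
for _chunk in range(3):  # stands in for three chunks of a single read
    result = lookup.authority(54052)
print(result, lookup.query_count)  # ('ESRI', 54052) 1
```

With chunked reads, the cache means the table is queried once per srid rather than once per chunk.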

jorisvandenbossche · 2024-06-17T09:23:44Z

geopandas/io/.sql.py.swp

This file was added by accident?

apologies, missed that, deleted

jorisvandenbossche · 2024-06-17T09:26:59Z

geopandas/io/sql.py

+                    warnings.warn(warning_msg, UserWarning, stacklevel=2)
+                    crs = "epsg:{}".format(srid)
+
+                if not spatial_ref_sys_df.empty:


This will error if spatial_ref_sys_df = None ?
I think you can put this part in an else clause of try/except/else, so that it is only executed if there was no exception

yes, I missed that would happen

added else and removed the consequent redundant spatial_ref_sys_df = None
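As a sketch of the try/except/else shape suggested above (an illustration, not the PR's code): `fetch_entry` is a hypothetical callable that returns `(auth_name, auth_srid)`, returns `None` for an unknown srid, or raises if the lookup itself fails.

```python
import warnings


def crs_with_fallback(fetch_entry, srid):
    try:
        entry = fetch_entry(srid)
    except Exception:
        # The lookup itself failed (for example, the table is missing).
        warnings.warn(
            f"Could not query spatial_ref_sys; assuming EPSG for srid {srid}.",
            UserWarning,
            stacklevel=2,
        )
        crs = f"epsg:{srid}"
    else:
        # Runs only when no exception was raised, so `entry` is always bound here.
        if entry is not None:
            crs = f"{entry[0]}:{entry[1]}"
        else:
            crs = f"epsg:{srid}"
    return crs


print(crs_with_fallback(lambda srid: ("ESRI", 54052), 54052))  # ESRI:54052
print(crs_with_fallback(lambda srid: None, 7))                 # epsg:7
```

Keeping the result handling in the `else` branch avoids touching a value that was never assigned when the query raised.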

jorisvandenbossche · 2024-06-17T09:35:20Z

geopandas/io/tests/test_sql.py

+        assert df.crs == "ESRI:54052"
+
+    @pytest.mark.skipif(not HAS_PYPROJ, reason="pyproj not installed")
+    @mock.patch("shapely.get_srid")


Maybe you could also add a test that mocks _get_spatial_ref_sys_df to raise an error? (This would improve test coverage; otherwise we would have to delete the spatial_ref_sys table to get something similar.)

added additional test that would raise pandas.errors.DatabaseError
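The general pattern for such a test can be sketched as follows, using only the standard library. `Reader`, `_get_srs_entry` and `LookupFailed` are hypothetical stand-ins invented for this example; the real test patches geopandas internals and uses pandas.errors.DatabaseError.

```python
import warnings
from unittest import mock


class LookupFailed(Exception):
    """Stand-in for pandas.errors.DatabaseError in this sketch."""


class Reader:
    """Hypothetical reader with a lookup step that a test can patch."""

    def _get_srs_entry(self, srid):
        return ("ESRI", srid)

    def crs_for(self, srid):
        try:
            auth_name, auth_srid = self._get_srs_entry(srid)
        except LookupFailed:
            warnings.warn(
                f"Could not query spatial_ref_sys; assuming EPSG for srid {srid}.",
                UserWarning,
                stacklevel=2,
            )
            return f"epsg:{srid}"
        return f"{auth_name}:{auth_srid}"


def test_falls_back_when_lookup_raises():
    # side_effect makes the patched method raise instead of returning a row.
    with mock.patch.object(
        Reader, "_get_srs_entry", side_effect=LookupFailed("no table")
    ):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            assert Reader().crs_for(54052) == "epsg:54052"
    assert len(caught) == 1


test_falls_back_when_lookup_raises()
```

Mocking the lookup to raise exercises the fallback branch without having to drop the table from a live database.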

jorisvandenbossche · 2024-06-17T09:38:58Z

Can you also update with latest main to get the CI fix from #3339?

nicholas-ys-tan · 2024-06-17T15:12:04Z

Can you also update with latest main to get the CI fix from #3339?

Rebased, but one CI job still seems to be failing due to codecov on Windows. Is this from my end?

martinfleis · 2024-06-17T15:22:41Z

is this from my end?

ignore that.

jorisvandenbossche

Thanks for the updates, looks good!

nicholas-ys-tan changed the title ~~BUGL check for authority of CRS based on SRID number~~ BUG: check for authority of CRS based on SRID number Jun 6, 2024

nicholas-ys-tan marked this pull request as ready for review June 7, 2024 12:50

m-richards approved these changes Jun 9, 2024

m-richards changed the title ~~BUG: check for authority of CRS based on SRID number~~ BUG: check CRS authority via SRID number when reading from postgis Jun 9, 2024

jorisvandenbossche reviewed Jun 10, 2024

m-richards reviewed Jun 16, 2024

jorisvandenbossche reviewed Jun 17, 2024

nicholas-ys-tan and others added 9 commits June 18, 2024 00:49

BUG: check for authority based on SRID number

d746175

clean up

73323ba

catch if shapely returns srid not in spatial_ref_sys

234ce44

update crs extraction

9e480cf

Co-authored-by: Matt Richards <45483497+m-richards@users.noreply.github.com>

filter srid when reading SRS table

bc219a3

cache SRID, update docstring, update changelog

068d53d

remove duplicate code

942a5c0

Co-authored-by: Matt Richards <45483497+m-richards@users.noreply.github.com>

Fix failing test

52be520

add test, fix for spatial_ref_sys_df = None

0c13933

nicholas-ys-tan force-pushed the issue2545 branch from 73e5935 to 0c13933 Compare June 17, 2024 14:54

jorisvandenbossche approved these changes Jun 17, 2024

Apply suggestions from code review

df08a56

jorisvandenbossche merged commit fcf9bc2 into geopandas:main Jun 17, 2024
20 checks passed
