Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: GeoArrow import #3301

Merged
merged 20 commits into from
Jun 7, 2024

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented May 20, 2024

Counterpart of #3219 for import Arrow data, adding from_arrow class method to GeoSeries/GeoDataFrame (mimicking other constructor methods we have like from_wkb)

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review May 24, 2024 15:12
@martinfleis martinfleis added this to the 1.0 milestone May 27, 2024
Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor notes regarding publicity of the helper functions.

Otherwise it looks okay to my eyes with a big caveat that I am unable to verify arrow-specific parts of the code. Maybe @kylebarron can help out here?

return None


def arrow_to_geopandas(table, geometry=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally public function? All the other IO helpers are considered private.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in this file is public, so maybe we should start with the habit of naming the file with an underscore then?

(by using no underscore for this function, there is some distinction in this file between what are purely helpers only used in this file, and what are the functions actually used elsewhere)

But can certainly just add a underscore for now ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in this file is public, so maybe we should start with the habit of naming the file with an underscore then?

yeah, but there's no indication of that as of now :). Naming the files with an underscore will help, if you want to keep function names without.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on making the file name private

return GeoDataFrame(df, geometry=geometry or geom_fields[0][1])


def arrow_to_geometry_array(arr):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Shouldn't this be private?


Specifically for GeoSeries.from_arrow.
"""
schema_capsule, _ = arr.__arrow_c_array__()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use this approach instead: https://github.com/developmentseed/lonboard/blob/5812fcc4957f84436a97c79580cddcff25655f31/lonboard/_viz.py#L434-L460

This only imports the array once instead of twice, which may be preferred in the case that there's some costly transformation to geoarrow.

@@ -425,6 +549,9 @@ def construct_shapely_array(arr: pa.Array, extension_name: str):
with GeoArrow extension type.

"""
if isinstance(arr, pa.ExtensionArray):
arr = arr.storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arr.storage is guaranteed to always exist on pa.ExtensionArray? That's not added by geoarrow-pyarrow, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arr.storage is guaranteed to always exist on pa.ExtensionArray?

Indeed, that is defined in pyarrow on the base class

@jorisvandenbossche
Copy link
Member Author

Failure is a "urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>", so going to assume this is not related.

@jorisvandenbossche jorisvandenbossche merged commit 600e4f1 into geopandas:main Jun 7, 2024
19 of 20 checks passed
@jorisvandenbossche jorisvandenbossche deleted the geoarrow-import branch June 7, 2024 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants