Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython attempt #210

Closed
wants to merge 7 commits into from
Closed

Cython attempt #210

wants to merge 7 commits into from

Conversation

caspervdw
Copy link
Member

@caspervdw caspervdw commented Oct 10, 2020

@brendan-ward Based on your work and our discussion in #197, the example project in http://github.com/caspervdw/cygeos and your working branch https://github.com/brendan-ward/pygeos/tree/add_cython I put together this PR that (at least locally) works.

And it also uncovers the major shortcoming of having "pygeos.c" in two Extensions: the GeometryObject in pygeos.lib does not equal the GeometryObject in pygeos.cythondemo.

So I guess we'll have to resort to C pointer arrays in PyCapsules for the objects that we need to share between the two extensions. (https://docs.python.org/3/extending/extending.html#providing-a-c-api-for-an-extension-module)

edit: Also this needs Cython >= 0.29

@caspervdw
Copy link
Member Author

@brendan-ward I copied your capsules approach from #197. It seems that adjusting the call signature of GetGeom from GeometryObject * to PyObject * solved the issue of having to define GeometryObject in Cython.

Copy link
Contributor

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caspervdw thanks for continuing to work on this. Based on further testing in my non-capsule branch, I came to the same conclusion. If we include pygeom.c as a source in multiple extensions, we get multiple types. Failures then range between validation errors in get_geom and outright segfaults in GeometryObject_FromGEOS.

Because I saw those segfaults on the creation side, I think this minimal test here needs to do the same, so that we are confident we can both get and set geometries. I think you should just be able to copy / adapt the C API I setup for this in #197 for PyGEOSCreateGeom and associated Cython wrappings.

However, it struck me that #197 gives us a more complete testing and demonstration of using Cython, since it exercises both getting and setting geometries using the C API, and exposing those into Python. Since there are gotchas throughout the stack, I think our first roughout of Cython in pygeos probably should exercise the major parts (#197 does).

Our other major differences are file layout and naming conventions (formatting changes could be reverted). Would you prefer instead for me to bring #197 in line with your layout / naming conventions here, but otherwise keep the implementation of get_parts intact for review?

I don't think there are issues with defining GeometryObject - the C struct - in Cython; it isn't the actual Geometry type, so it doesn't cause all the issues we saw around the type being created twice. Personally, I would keep GeometryObject in Cython because the C type helps make our variables and function signatures more clear - we are dealing in GeometryObjects rather than PyObjects where we have to track implicitly what type we're working with.

Originally I was going to try and make a better case for namespacing all C extension and Cython extension modules under pygeos.lib as a package than module. Your approach here is more in line with other Python / Cython projects (fiona, rasterio, etc).

Putting them all in a package has the advantage that downstream Python usage only needs to know that the functions are defined in pygeos.lib (and when modules are added / updated the relevant things can be imported into pygeos.lib.__init__.py, with some wrapping to avoid circular imports at cythonize / compile time). That feels like a cleaner API, and also gives us the ability to migrate some easy things from C to Cython while keeping all downstream usage exactly as is. Otherwise, downstream usage needs to know what specific Cython module they come from. I'm thinking this is mostly for other pygeos devs adding functions in the Python layer from existing Cython functions; for user-space functions, we always wrap what comes from Cython into Python functions.

Using the pygeos.lib package approach does lead to a slightly bigger changeset.

@caspervdw
Copy link
Member Author

However, it struck me that #197 gives us a more complete testing and demonstration of using Cython, since it exercises both getting and setting geometries using the C API, and exposing those into Python. Since there are gotchas throughout the stack, I think our first roughout of Cython in pygeos probably should exercise the major parts (#197 does).

You're right. Let's focus the work back on #197, I'll close this and start reviewing #197.

I don't think there are issues with defining GeometryObject - the C struct - in Cython; it isn't the actual Geometry type, so it doesn't cause all the issues we saw around the type being created twice. Personally, I would keep GeometryObject in Cython because the C type helps make our variables and function signatures more clear - we are dealing in GeometryObjects rather than PyObjects where we have to track implicitly what type we're working with.

Not sure if I agree on this. Defining C structs on two locations is a liability, and I think that the struct itself might better be opaque outside of pygeom.c. No need to access the ptr if you can use the get_geom function as well. Concerning the get_geom function, I think its call signature should be refactored into (PyObject *, GEOSGeometry **), because we actually use it to typecheck any PyObject. Take for instance

if (!get_geom(*(GeometryObject**)ip1, &in1)) {

Here, the PyObject is cast to a GeometryObject, but we actually can't know at that point if it is in fact a GeometryObject. So I think get_geom should not have GeometryObject in its signature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants