Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GEOSPrepared<PRED>XY functions to C API #677

Merged
merged 5 commits into from Sep 12, 2022

Conversation

dbaston
Copy link
Member

@dbaston dbaston commented Sep 9, 2022

This PR adds a function GEOSPreparedXY to the C API to allow point-in-polygon tests without the overhead of point construction. The significance of this overhead depends on the data. When checking many points against a complex boundary, this PR makes almost no difference:

bin/perf_geospreparedcontains ~/data/australia.txt 100000

# Performing 100000 point-in-polygon tests.
# Reading shapes from /home/dan/data/australia.txt
# Read 1 geometries.
# GEOSPreparedContains: 32931 hits from 100000 points in 202,552
# GEOSPreparedContainsXY: 32931 hits from 100000 points in 196,158

With smaller polygons, it makes a measurable improvement (about 20%)

bin/perf_geospreparedcontains ~/data/wsa_individual.wkt 500

# Performing 500 point-in-polygon tests.
# Reading shapes from /home/dan/data/wsa_individual.wkt
# Read 2768 geometries.
# GEOSPreparedContains: 672198 hits from 500 points in 239,358
# GEOSPreparedContainsXY: 672198 hits from 500 points in 186,093

The C++ code surrounding prepared geometries is relatively complex, so rather than modify it to thread a Coordinate object through, I modified the C API GEOSContextHandle to hold a mutable Point object.

The existing API for modifying a Point is rather complex and slow. This is the best I could do:

struct SetCoordinateValue : public geos::geom::CoordinateFilter {
        SetCoordinateValue(double x, double y) : m_x(x), m_y(y) {}

        void filter_rw(Coordinate *c) const override {
            c->x = m_x;
            c->y = m_y;
        }

        double m_x;
        double m_y;
    };

    SetCoordinateValue filter(x, y);
    extHandle->point2d->apply_rw(&filter);
    extHandle->point2d->geometryChanged();

So I added a Point::setXY method to make this easier and faster.

If there is general support for this approach, I can document the new function and add XY variants in other cases where they would be useful (GEOSPreparedIntersectsXY, etc.)

Copy link
Member

@pramsey pramsey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks A-OK to me.

@dbaston
Copy link
Member Author

dbaston commented Sep 9, 2022

Do the following make sense?

  • GEOSPreparedContainsProperlyXY
  • GEOSPreparedCoveredByXY
  • GEOSPreparedDisjointXY
  • GEOSPreparedIntersectsXY
  • GEOSPreparedTouchesXY

@pramsey
Copy link
Member

pramsey commented Sep 9, 2022

GEOSPreparedCoveredByXY feels a little pointless (ha ha ha) but perhaps in a completionist sense, might as well do it? It is possible to have both true and false results from it I guess.

@dbaston
Copy link
Member Author

dbaston commented Sep 9, 2022

Is it a synonym for GEOSPreparedContains if the second argument is a point?

@dbaston
Copy link
Member Author

dbaston commented Sep 9, 2022

Disregard, I think I'm confused by the wording of our docstrings.

geos/capi/geos_c.h.in

Lines 4299 to 4309 in c68119c

/**
* Using a \ref GEOSPreparedGeometry do a high performance
* calculation of whether the provided geometry is covered by.
* \param pg1 The prepared geometry
* \param g2 The geometry to test
* \returns 1 on true, 0 on false, 2 on exception
* \see GEOSCoveredBy
*/
extern char GEOS_DLL GEOSPreparedCoveredBy(
const GEOSPreparedGeometry* pg1,
const GEOSGeometry* g2);

We are testing if the prepared geometry is covered by the provided geometry, but the text implies (to me) the reverse.

So I should add GEOSPreparedCoversXY, not GEOSPreparedCoveredByXY.

And it is probably misleading to add GEOSPreparedTouchesXY, since it just defaults to the underlying Geometry implementation.

@pramsey
Copy link
Member

pramsey commented Sep 9, 2022

Is this not actually all achievable (performance-wise) by going:

  • Create a prepared geometry based on polygon input.
  • Create a coordinate sequence with one dummy entry.
  • Create a point based on that coordinate sequence.
  • For each point in my data set
    • Call GEOSCoordSeq_setXY() on the coordinate sequence
    • Call GEOSPrepared*() on the polygon and the point

@dbaston
Copy link
Member Author

dbaston commented Sep 9, 2022

There's an assumption that you stop touching the coordinate sequence once you use it to create a geometry. (We should document this!) There are at least two reasons for this:

  • There is no guarantee that the coordinate sequence you provide lives on in the geometry (we might move its contents into a new coordinate sequence.)
  • If you manipulate the coordinates directly, the geometry has no way to know that the envelope should be updated, so any calculation relying on the envelope will be wrong.

@pramsey
Copy link
Member

pramsey commented Sep 9, 2022

In that case does the existence of a CAPI GEOSPoint_setXY() abrogate the need for the multiple GEOSPrepared*XY() signatures?

@dbaston
Copy link
Member Author

dbaston commented Sep 9, 2022

It would meet the same need with fewer signatures. I think it's a bit less discoverable, though.

@pramsey
Copy link
Member

pramsey commented Sep 9, 2022

Another take on the signatures is: is there any substantial use for anything other than PreparedIntersectsXY? (if we want discoverability... all other use cases could be handled with Point_setXY). Or we could just add an example of using Point_setXY() to the examples fleet. At this point I'm just flailing. What does your spidey sense tell you?

@dbaston
Copy link
Member Author

dbaston commented Sep 9, 2022

Point_setXY seems like a useful addition regardless. I think the Prepared*XY signatures are useful because a user doesn't have to know that Point creation overhead can be important, and that reusing a point with Point_setXY is a way around it. You just start looking for how to do a PIP, and the simplest thing to call happens to also be the fastest. Do we need Covers, Contains, ContainsProperly, Intersects, and Disjoint? No, but do they hurt? (I don't think so, but it's a genuine question)

@dr-jts
Copy link
Contributor

dr-jts commented Sep 9, 2022

Do we need Covers, Contains, ContainsProperly, Intersects, and Disjoint? No, but do they hurt? (I don't think so, but it's a genuine question)

More to test, maintain and document.

@dr-jts
Copy link
Contributor

dr-jts commented Sep 9, 2022

If you manipulate the coordinates directly, the geometry has no way to know that the envelope should be updated, so any calculation relying on the envelope will be wrong.

There is a Geometry::geometryChanged method that allows forcing envelope update. It's not exposed in the C API, except indirectly by GEOSGeom_transformXY_r.

Usually I'm not crazy about mutating geometries, but this seems like one case where it's justified.

@dbaston
Copy link
Member Author

dbaston commented Sep 9, 2022

More to test, maintain and document.

Yes, though keep in mind we're talking about five two-line functions.

@dr-jts
Copy link
Contributor

dr-jts commented Sep 9, 2022

Because of the much more limited semantics of Point relationships I suggest the following functions:

  • GEOSPreparedIntersectsXY
  • GEOSPreparedContainsXY
  • GEOSPreparedContainsProperlyXY

Disjoint is just ! Intersects, and I suspect is rarely used.

Touches can be determined by Intersects(Boundary(geom), point). PreparedGeometry could provide a fast implementation for this for Point inputs - but doesn't at present.

@dbaston
Copy link
Member Author

dbaston commented Sep 9, 2022

I removed GEOSPreparedCoversXY and GEOSPreparedDisjointXY.

@dr-jts dr-jts added the Enhancement New feature or feature improvement. label Sep 9, 2022
@dbaston dbaston merged commit 194aed1 into libgeos:main Sep 12, 2022
@jorisvandenbossche
Copy link
Contributor

This is a nice addition! (working on exposing this in shapely)

I am wondering one thing: since this is restricted to checking against a Point, can there ever be a difference between Contains and ContainsProperly?

Contains allows common points on the boundary but requires at least one point in the interior, while ContainsProperly disallows common boundary points (only intersecting the interior).
But since in this case there is only a single point, also for Contains, this point needs to be in the interior, and thus this is equivalent to ContainsProperly?

@dr-jts
Copy link
Contributor

dr-jts commented Sep 24, 2022

I am wondering one thing: since this is restricted to checking against a Point, can there ever be a difference between Contains and ContainsProperly?

Great point (so to speak). You are right, there is no difference between Contains and ContainsProperly for single points. I suggest the GEOSPreparedContainsProperlyXY functions be removed (and some doc pointing out this equivalence be added to GEOSPreparedContainsXY).

dbaston added a commit that referenced this pull request Sep 24, 2022
@dr-jts dr-jts changed the title Add GEOSPreparedContainsXY to C API Add GEOSPrepared_Op_XY functions to C API Oct 5, 2022
@dr-jts dr-jts changed the title Add GEOSPrepared_Op_XY functions to C API Add GEOSPrepared_PRED_XY functions to C API Oct 5, 2022
@dr-jts dr-jts changed the title Add GEOSPrepared_PRED_XY functions to C API Add GEOSPrepared<PRED>XY functions to C API Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or feature improvement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants