
Create GeoSeries.contains_properly method using point_in_polygon. #749

Merged: 104 commits merged on Nov 30, 2022


@thomcom (Contributor) commented Oct 21, 2022

Closes #743
Closes #744

Description

This PR closes the issues above, which relate to creating a .contains method and, more importantly, to resolving inconsistent handling of boundary cases in point_in_polygon.

As it stands, the collinearity test I've added to is_point_in_polygon doubles the runtime of brute-force point_in_polygon and has no visible effect on the runtime of quadtree_point_in_polygon.

- Note: I need to double-check the above benchmark, since I set this project aside for the last few weeks.

This depends on #750; please do not review the C++ code here until that PR is merged. Please do review the Python code.
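The collinearity test mentioned above is what lets a point-in-polygon routine reject points that lie exactly on an edge. The C++ change itself is not reproduced on this page, so the following is only a minimal pure-Python sketch of the general technique, not cuSpatial's is_point_in_polygon: it assumes even-odd ray casting over a single closed ring, and the names on_segment and point_in_polygon_properly are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def on_segment(p: Point, a: Point, b: Point) -> bool:
    """True if p lies on the closed segment a-b (exact collinearity test)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    # The cross product is zero iff p, a and b are collinear.
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if cross != 0:
        return False
    # Collinear: p must also fall within the segment's bounding box.
    return min(ax, bx) <= px <= max(ax, bx) and min(ay, by) <= py <= max(ay, by)


def point_in_polygon_properly(p: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting that returns False for points on the boundary.

    Assumes `ring` is closed (first vertex == last vertex).
    """
    px, py = p
    inside = False
    for a, b in zip(ring[:-1], ring[1:]):
        if on_segment(p, a, b):
            return False  # boundary points are not properly contained
        (ax, ay), (bx, by) = a, b
        # Does this edge straddle the horizontal line y = py?
        if (ay > py) != (by > py):
            # x coordinate where the edge crosses y = py
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside  # the ray to +x crosses this edge
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (0.0, 0.0)]
print(point_in_polygon_properly((2.0, 2.0), square))  # True: interior
print(point_in_polygon_properly((4.0, 2.0), square))  # False: on the right edge
print(point_in_polygon_properly((5.0, 2.0), square))  # False: exterior
```

The exact `cross != 0` comparison is fine for the integer-valued coordinates used here; with arbitrary floating-point input, a real implementation would need a tolerance or robust orientation predicates.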

Benchmark

Benchmark results are in. There appears to be no measurable speed difference between branch-22.12 (before boundary exclusion) and our current implementation:

(rapids) rapids@compose:~/cuspatial/python/cuspatial/benchmarks$ pytest api/bench_api.py::bench_point_in_polygon
================================================== test session starts ===================================================
platform linux -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/tcomer/mnt/NVIDIA/rapids-docker/cuspatial/python/cuspatial/benchmarks, configfile: pytest.ini
plugins: cov-4.0.0, benchmark-4.0.0, cases-3.6.13, xdist-3.0.2, anyio-3.6.2, hypothesis-6.58.1
collected 1 item                                                                                                         

api/bench_api.py .                                                                                                 [100%]


---------------------------------------------- benchmark: 1 tests ---------------------------------------------
Name (time in s)              Min     Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------
bench_point_in_polygon     1.9636  1.9749  1.9678  0.0043  1.9660  0.0045       1;0  0.5082       5           1
---------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
=================================================== 1 passed in 16.28s ===================================================
(rapids) rapids@compose:~/cuspatial/python/cuspatial/benchmarks$ git status
On branch feature/GeoSeries.contains

Compared with branch-22.12:

(rapids) rapids@compose:~/cuspatial/python/cuspatial/benchmarks$ pytest api/bench_api.py::bench_point_in_polygon
================================== test session starts ===================================
platform linux -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/tcomer/mnt/NVIDIA/rapids-docker/cuspatial/python/cuspatial/benchmarks, configfile: pytest.ini
plugins: cov-4.0.0, benchmark-4.0.0, cases-3.6.13, xdist-3.0.2, anyio-3.6.2, hypothesis-6.58.1
collected 1 item                                                                         

api/bench_api.py .                                                                 [100%]


---------------------------------------------- benchmark: 1 tests ---------------------------------------------
Name (time in s)              Min     Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------
bench_point_in_polygon     1.9516  1.9843  1.9730  0.0126  1.9760  0.0127       1;0  0.5068       5           1
---------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
=================================== 1 passed in 16.61s ===================================
(rapids) rapids@compose:~/cuspatial/python/cuspatial/benchmarks$ git status
On branch benchmark/branch-22.12
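As a quick sanity check of the "no measurable difference" conclusion, the following recomputes a few quantities from the numbers printed in the two runs above. It is arithmetic on those reported values only, not a new benchmark.

```python
# Mean and standard deviation (seconds) copied from the two runs above.
feature = {"mean": 1.9678, "stddev": 0.0043}   # feature/GeoSeries.contains
baseline = {"mean": 1.9730, "stddev": 0.0126}  # benchmark/branch-22.12

diff = feature["mean"] - baseline["mean"]
rel_diff = diff / baseline["mean"]
print(f"difference: {diff:+.4f} s ({rel_diff:+.2%})")

# Compare the gap between means with the baseline run's own spread.
print("within baseline stddev:", abs(diff) < baseline["stddev"])

# OPS in the tables is 1 / Mean; this reproduces the reported values.
print(round(1 / feature["mean"], 4), round(1 / baseline["mean"], 4))
```

The feature branch's mean is about 0.005 s (roughly 0.26%) lower, which is less than the 0.0126 s standard deviation of the branch-22.12 run. With only five rounds per run, this is an eyeball comparison rather than a statistical test.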

Still adding:

  • A detailed description of the xfail result.
  • Self-review the existing .contains implementation in Python.
  • Update the .contains docs where necessary.
  • Benchmark again and document the results here.
  • Move binops_with_quadtree.py to the next branch.
  • .contains examples.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

github-actions bot added the libcuspatial (Relates to the cuSpatial C++ library) and Python (Related to Python code) labels Oct 21, 2022
github-actions bot added the cmake (Related to CMake code or build configuration) label Oct 21, 2022
@isVoid (Contributor) left a review:

Just some more style requests, with an open question at the end. Great work!

python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (resolved)
python/cuspatial/cuspatial/core/binops/contains.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
python/cuspatial/cuspatial/core/geoseries.py (resolved)
python/cuspatial/cuspatial/tests/test_contains.py (outdated, resolved)
expected = gpdlhs.contains(gpdrhs).values
assert (got == expected).all()
got = rhs.contains_properly(lhs).values_host
expected = gpdrhs.contains(gpdlhs).values
Comment from a Member:

Hmmm, shouldn't you be using shapely.contains_properly() for the expected result as we discussed in the meeting?

Reply from the PR author (Contributor):

The interfaces are not the same. shapely.contains_properly(x, y) is a function that takes two Shapely geometries and returns True or False, whereas .contains is a GeoSeries method that operates on self and other. Refactoring these tests to use only shapely would not be an apples-to-apples comparison.

Reply from a Contributor:

Naively, you can use a for loop:

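import pandas as pd
import shapely  # shapely.contains_properly requires Shapely >= 2.0
results = []  # one bool per (lhs, rhs) pair of the GeoPandas test series
for lhs, rhs in zip(gpdlhs, gpdrhs):
    results.append(shapely.contains_properly(lhs, rhs))
expected = pd.Series(results)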

@harrism (Member) replied Nov 30, 2022:

But comparing cuspatial.contains_properly to geopandas.contains is comparing apples to oranges.
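The distinction matters more for multipoints than for single points. Under the DE-9IM definitions used by GEOS and Shapely, a polygon contains a single point exactly when the point lies in the polygon's interior, so contains and contains_properly agree there. For a multipoint, contains only requires that no point be exterior and at least one be interior, while contains_properly requires every point to be interior. A minimal sketch of that rule, assuming each point has already been classified as interior, boundary or exterior (the labels and function names are hypothetical):

```python
from typing import Iterable

# Hypothetical location labels for each point of a (multi)point relative to
# one polygon, as a boundary-aware point-in-polygon test might report them.
INTERIOR, BOUNDARY, EXTERIOR = "interior", "boundary", "exterior"


def contains(locations: Iterable[str]) -> bool:
    """DE-9IM contains: no point outside the polygon, at least one inside."""
    locs = list(locations)
    return EXTERIOR not in locs and INTERIOR in locs


def contains_properly(locations: Iterable[str]) -> bool:
    """contains_properly: every point strictly inside (none on the boundary)."""
    locs = list(locations)
    return len(locs) > 0 and all(loc == INTERIOR for loc in locs)


# A single point on an edge: both predicates are False.
print(contains([BOUNDARY]), contains_properly([BOUNDARY]))  # False False
# One interior point plus one boundary point: the predicates disagree.
print(contains([INTERIOR, BOUNDARY]), contains_properly([INTERIOR, BOUNDARY]))  # True False
```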

@harrism (Member) commented Nov 30, 2022:

BTW, does this support multipoint in polygon?

thomcom and others added 7 commits November 30, 2022 08:38
Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
Co-authored-by: Michael Wang <isVoid@users.noreply.github.com>
Co-authored-by: Mark Harris <mharris@nvidia.com>
@thomcom changed the title from "Create GeoSeries.contains method using point_in_polygon." to "Create GeoSeries.contains_properly method using point_in_polygon." on Nov 30, 2022
@thomcom added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Nov 30, 2022
@thomcom (Contributor, Author) commented Nov 30, 2022:

In reply to "BTW, does this support multipoint in polygon?":

Yes, there are tests for it.

@harrism (Member) commented Nov 30, 2022:

@thomcom looks like you may have accidentally deleted all the tests (test_contains.py) in 4a651f1?

@harrism (Member) left a review:

Just a few other tiny things.

are properly contained within the corresponding polygon. Polygon A contains Point B
properly if B intersects the interior of A but not the boundary (or exterior).

Note that polygons must be closed: the first and last vertex of each
Comment from a Member:

Should this be in a Note section as well?
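The docstring excerpt above requires polygons to be closed. Assuming "closed" means each ring's first and last vertices are identical (the excerpt is cut off before it finishes the sentence), a caller could check or enforce this with helpers like the following; is_closed and close_ring are hypothetical names, not cuSpatial API.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers; a ring is a sequence of (x, y) vertices.
Point = Tuple[float, float]


def is_closed(ring: Sequence[Point]) -> bool:
    """Return True if the ring's first and last vertices are identical."""
    return len(ring) > 0 and tuple(ring[0]) == tuple(ring[-1])


def close_ring(ring: Sequence[Point]) -> List[Point]:
    """Return a copy of the ring with its first vertex appended if it is open."""
    closed = [tuple(v) for v in ring]
    if closed and closed[0] != closed[-1]:
        closed.append(closed[0])
    return closed


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(is_closed(triangle))              # False
print(is_closed(close_ring(triangle)))  # True
```

Closure matters for loops that walk edges as consecutive vertex pairs, for example zip(ring[:-1], ring[1:]): on an unclosed ring such a loop silently skips the edge from the last vertex back to the first.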

python/cuspatial/cuspatial/core/geoseries.py (outdated, resolved)
thomcom and others added 2 commits November 30, 2022 15:29
Co-authored-by: Mark Harris <mharris@nvidia.com>
@thomcom (Contributor, Author) commented Nov 30, 2022:

@gpucibot merge

rapids-bot (bot) merged commit 4ca88ff into rapidsai:branch-22.12 on Nov 30, 2022
Labels
cmake (Related to CMake code or build configuration), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), Python (Related to Python code)
Projects
Status: Done
3 participants