ENH: Method to sample points randomly from within geometries #2860

martinfleis · 2023-04-04T20:07:38Z

As discussed in person and during the dev call, we decided to split #2363 into several smaller PRs, each dealing with a single task, to make the review process easier.

This PR implements sampling either from a uniform distribution or using pointpats. It is mostly based on #2363, with some minor changes, and it exposes the seed and random generator for better control.
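To illustrate why exposing a seed helps, here is a minimal standard-library sketch. It is not the code from this PR: `sample_in_bounds` is a hypothetical helper that draws uniform points inside a bounding box, and it only shows that a dedicated, seeded generator makes the draw reproducible.

```python
import random


def sample_in_bounds(bounds, size, seed=None):
    """Draw `size` uniform points inside (xmin, ymin, xmax, ymax).

    Hypothetical helper for illustration; not part of geopandas.
    """
    xmin, ymin, xmax, ymax = bounds
    # A dedicated generator keeps the result independent of global random state.
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(size)]


box = (0.0, 0.0, 2.0, 1.0)
a = sample_in_bounds(box, 5, seed=42)
b = sample_in_bounds(box, 5, seed=42)  # same seed, same points
c = sample_in_bounds(box, 5, seed=7)   # different seed, different points
```

With the same seed the two calls return identical points, which is what makes tests and documentation examples stable.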

martinfleis · 2023-04-04T21:07:31Z

The CI failure is unrelated and is also on main.

jorisvandenbossche

Looking good!

geopandas/base.py

geopandas/tools/_random.py

geopandas/tests/test_geom_methods.py

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

…andas into simple_sampling

geopandas/tools/_random.py

m-richards

Looks good! A couple of minor documentation comments.

This is something I will be quite keen to use myself, and replace where I've written sampling code by hand.

doc/source/docs/user_guide/sampling.ipynb

geopandas/base.py

m-richards · 2023-04-14T10:07:09Z

geopandas/tools/_random.py

+    xmin, ymin, xmax, ymax = geom.bounds
+    candidates = []
+    while len(candidates) < size:
+        batch = points_from_xy(


I think this is fine as a first implementation, but this sampling is potentially quite wasteful. If you need a lot of points and your first batch yields, say, 95% of size, you only need to target another 5%, yet the loop will draw a full batch of size again. (But this is perhaps better than drawing too few points by guessing how many sampled points will be accepted on the next iteration.)
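The concern can be made concrete with a small standard-library sketch of constant-batch rejection sampling. This is an assumption-laden illustration, not the PR's implementation: `point_in_polygon` and `sample_constant_batch` are hypothetical names, the polygon is a plain list of vertices, and the function also counts how many candidates were drawn in total.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_constant_batch(ring, size, seed=None):
    """Draw batches of `size` candidates from the bounding box until
    at least `size` of them fall inside the polygon."""
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    accepted = []
    drawn = 0
    while len(accepted) < size:
        # Every round draws a full batch, even if only a few points are missing.
        batch = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(size)]
        drawn += len(batch)
        accepted.extend(p for p in batch if point_in_polygon(p[0], p[1], ring))
    return accepted[:size], drawn


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
points, drawn = sample_constant_batch(triangle, 100, seed=0)
```

The triangle covers half of its bounding box, so roughly half of each batch is rejected and `drawn` ends up as a multiple of the requested size.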

It is tricky to get the heuristic right if we wanted to use a different batch size, because it depends heavily on the convexity of each polygon. Maybe a less wasteful option would be to start with a number larger than size, to have a higher chance of reaching size in one go. But I'd leave that for later if needed.

Agree, sounds good

Yes, I kept the chunk size constant as a first pass to be conservative.

I think the statistically efficient method is to sample each round proportional to the hit rate times the remaining sample size, but that can cause the size of a round to get very large very quickly.
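One way to sketch an adaptive round size is shown below. It assumes each round draws the remaining count divided by the hit rate observed so far (the expected number of candidates needed), and it adds a hypothetical `max_batch` cap to address the concern that a round can grow very large. `sample_adaptive` and the `contains` predicate are illustrative names, not geopandas API.

```python
import math
import random


def sample_adaptive(contains, bounds, size, seed=None, max_batch=10_000):
    """Rejection sampling whose round size adapts to the observed hit rate.

    `contains(x, y)` is any point-in-region predicate; `bounds` is
    (xmin, ymin, xmax, ymax). Like the constant-batch loop, this never
    terminates if the region cannot be hit at all.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    accepted = []
    drawn = 0
    while len(accepted) < size:
        remaining = size - len(accepted)
        if drawn == 0:
            batch_size = remaining  # no hit-rate estimate yet
        else:
            # Treat zero hits as one hit so the division is always defined.
            hit_rate = max(len(accepted), 1) / drawn
            batch_size = math.ceil(remaining / hit_rate)
        batch_size = min(batch_size, max_batch)  # keep a single round bounded
        for _ in range(batch_size):
            x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
            drawn += 1
            if contains(x, y):
                accepted.append((x, y))
    return accepted[:size], drawn


def in_unit_disc(x, y):
    return x * x + y * y <= 1.0


disc_points, disc_drawn = sample_adaptive(in_unit_disc, (-1.0, -1.0, 1.0, 1.0), 200, seed=3)
```

The cap trades a few extra rounds for a bounded memory footprint per round; where to set it is a judgment call that this sketch does not try to settle.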

jorisvandenbossche · 2023-04-15T07:59:26Z

doc/source/docs/user_guide/sampling.ipynb

+   "source": [
+    "## Variable number of points\n",
+    "\n",
+    "You can also sample different number of points from different geometries if you pass an array specifying the size of the sample per geometry."


Do we allow passing a column name as well? (Instead of just the values; I assume gdf.sample_points(gdf["col"]) works for sure.)

As we have in plotting? Not now. Do we want to?
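The per-geometry size behaviour described in the notebook text can be sketched without geopandas. The helper below is hypothetical (`sample_per_geometry` is not a library function) and uses bounding boxes as stand-ins for geometries; it accepts either one integer for all boxes or one integer per box.

```python
import random


def sample_per_geometry(bounds_list, size, seed=None):
    """Draw uniform points per bounding box.

    `size` is either a single int (used for every box) or a sequence
    with one int per box. Illustration only.
    """
    if isinstance(size, int):
        sizes = [size] * len(bounds_list)
    else:
        sizes = list(size)
        if len(sizes) != len(bounds_list):
            raise ValueError("size must have one entry per geometry")
    rng = random.Random(seed)
    samples = []
    for (xmin, ymin, xmax, ymax), n in zip(bounds_list, sizes):
        samples.append([(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)])
    return samples


boxes = [(0, 0, 1, 1), (10, 10, 12, 11), (5, 5, 6, 6)]
per_geom = sample_per_geometry(boxes, [1, 3, 0], seed=1)  # different count per box
same_for_all = sample_per_geometry(boxes, 2, seed=1)      # scalar applies to every box
```

Accepting a column name in addition to the values would need a lookup step before this normalisation, which is the open question in this thread.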

jorisvandenbossche

Can you just add a changelog note?

martinfleis · 2023-05-01T09:35:04Z

Can you just add a changelog note?

Done.

jorisvandenbossche · 2023-05-01T11:21:00Z

Failures are unrelated: the py38 one is related to pyogrio (if it keeps failing, it might be something with the 0.6.0 release), and the dev build is failing because of scikit-learn/scikit-learn#26290.

jorisvandenbossche · 2023-05-01T11:21:28Z

Thanks @ljwolf and @martinfleis!

…as#2860)

random sampling based on geopandas#2363

ac16159

martinfleis added this to the 0.13 milestone Apr 4, 2023

martinfleis added 4 commits April 4, 2023 22:25

pointpats tests

638ca6e

fix test

f35c511

docs

7fb6e5d

clear notebook

662180f

martinfleis mentioned this pull request Apr 4, 2023

ENH: Method to sample points from within geometries #2363

Open

fix the skipif condition

db5cad8

martinfleis requested a review from jorisvandenbossche April 4, 2023 21:06

jorisvandenbossche reviewed Apr 6, 2023

martinfleis and others added 8 commits April 6, 2023 10:33

Apply suggestions from code review

4120b9b

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

rm generator

b3e308c

Merge branch 'simple_sampling' of https://github.com/martinfleis/geop…

af61379

…andas into simple_sampling

review comments

a16ec1c

simplify line sampling

6a94beb

fix generator test

01cb667

remove generator

e075c1c

Merge remote-tracking branch 'upstream/main' into simple_sampling

4be549d

jorisvandenbossche reviewed Apr 6, 2023

geopandas/tools/_random.py

martinfleis requested a review from m-richards April 10, 2023 18:52

m-richards approved these changes Apr 14, 2023

martinfleis added 3 commits April 14, 2023 14:26

rtd env, fix docs

8ba7be9

Merge remote-tracking branch 'upstream/main' into simple_sampling

52d6b8b

remove problematic cell

d1e568e

jorisvandenbossche reviewed Apr 15, 2023

jorisvandenbossche approved these changes May 1, 2023

jorisvandenbossche changed the title ~~ENH: random sampling based on #2363~~ ENH: Method to sample points randomly from within geometries May 1, 2023

changelog

b2009b5

Merge remote-tracking branch 'upstream/main' into simple_sampling

92a16e5

jorisvandenbossche merged commit e8ddf25 into geopandas:main May 1, 2023
15 of 17 checks passed

This was referenced Aug 9, 2023

ENH: implement st_sample equivalent #2362

Open

seed keyword for random distributions? pysal/pointpats#114

Open

JohnMoutafis pushed a commit to JohnMoutafis/geopandas that referenced this pull request Nov 16, 2023

ENH: Method to sample points randomly from within geometries (geopand…

ba6933c

…as#2860)


martinfleis commented Apr 4, 2023 (edited)

martinfleis commented Apr 4, 2023

jorisvandenbossche left a comment

m-richards left a comment

m-richards Apr 14, 2023

martinfleis Apr 14, 2023

m-richards Apr 14, 2023

ljwolf Apr 15, 2023 (edited)

jorisvandenbossche Apr 15, 2023

martinfleis Apr 15, 2023

jorisvandenbossche left a comment

martinfleis commented May 1, 2023

jorisvandenbossche commented May 1, 2023

jorisvandenbossche commented May 1, 2023
