ENH: Method to sample points randomly from within geometries (#2860)
martinfleis committed May 1, 2023
1 parent 35f7004 commit e8ddf25
Showing 15 changed files with 573 additions and 7 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -20,8 +20,10 @@ New features and improvements:
- Added a ``to_wgs84`` keyword to ``to_json`` allowing automatic re-projecting to follow
the 2016 GeoJSON specification (#416).
- ``to_json`` output now includes a ``"crs"`` field if the CRS is not the default WGS84 (#1774).
- Improve error messages when accessing the `geometry` attribute of GeoDataFrame without an active geometry column
related to the default name `"geometry"` being provided in the constructor (#2577)
- Added ``sample_points`` method to sample random points from Polygon or LineString
geometries (#2860).

Deprecations and compatibility notes:

1 change: 1 addition & 0 deletions ci/envs/310-latest-conda-forge.yaml
@@ -23,6 +23,7 @@ dependencies:
- xyzservices
- scipy
- geopy
- pointpats
# installed in tests.yaml, because not available on windows
# - postgis
- SQLalchemy<2
1 change: 1 addition & 0 deletions ci/envs/311-latest-conda-forge.yaml
@@ -24,6 +24,7 @@ dependencies:
- xyzservices
- scipy
- geopy
- pointpats
# installed in tests.yaml, because not available on windows
# - postgis
- SQLalchemy>=2
1 change: 1 addition & 0 deletions ci/envs/38-latest-conda-forge.yaml
@@ -28,3 +28,4 @@ dependencies:
- libspatialite
- pyarrow
- pyogrio
- pointpats
1 change: 1 addition & 0 deletions ci/envs/39-latest-conda-forge.yaml
@@ -26,6 +26,7 @@ dependencies:
- xyzservices
- scipy
- geopy
- pointpats
# installed in tests.yaml, because not available on windows
# - postgis
- SQLalchemy<2
1 change: 1 addition & 0 deletions doc/environment.yml
@@ -40,6 +40,7 @@ dependencies:
- pygeos
- xyzservices
- packaging
- pointpats
- pip
- pip:
- sphinx-toggleprompt
1 change: 1 addition & 0 deletions doc/source/docs/reference/geoseries.rst
@@ -91,6 +91,7 @@ Constructive methods and attributes
GeoSeries.make_valid
GeoSeries.minimum_bounding_circle
GeoSeries.normalize
GeoSeries.sample_points
GeoSeries.simplify

Affine transformations
1 change: 1 addition & 0 deletions doc/source/docs/user_guide.rst
@@ -21,3 +21,4 @@ Advanced topics can be found in the :doc:`Advanced Guide <advanced_guide>` and f
Aggregation with dissolve <user_guide/aggregation_with_dissolve>
Merging data <user_guide/mergingdata>
Geocoding <user_guide/geocoding>
Sampling points <user_guide/sampling>
268 changes: 268 additions & 0 deletions doc/source/docs/user_guide/sampling.ipynb
@@ -0,0 +1,268 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "f02b5a40-29b6-4d46-abb5-f84d5ee4da56",
"metadata": {},
"source": [
"# Sampling Points"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f2fa64d9-7781-4357-8381-6ff64eff7379",
"metadata": {},
"source": [
"Learn how to sample random points using GeoPandas. \n",
"\n",
"The example below shows you how to sample random locations from shapes in GeoPandas GeoDataFrames."
]
},
{
"cell_type": "markdown",
"id": "ae0cc935-8940-4cfb-9a62-d3174fc77687",
"metadata": {},
"source": [
"## Import Packages\n",
"\n",
"To begin, import the packages we'll use: "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b3b0e6e-221c-4b8e-baf8-92bf07e806ea",
"metadata": {},
"outputs": [],
"source": [
"import geopandas\n",
"import geodatasets"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9e5ee97a-6647-4686-a3a0-f3dfb7228cd1",
"metadata": {},
"source": [
"For this example, we will use the New York Borough example data (`nybb`) provided by geodatasets. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d013124f-4af5-4380-bf45-aa5fd4887c63",
"metadata": {},
"outputs": [],
"source": [
"nybb = geopandas.read_file(geodatasets.get_path(\"nybb\"))\n",
"# simplify geometry to save space when rendering many interactive maps\n",
"nybb.geometry = nybb.simplify(200) "
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d4703589-f6b5-46a3-9540-ec9a52716747",
"metadata": {},
"source": [
"To see what this looks like, visualize the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec65cc25-2baa-431f-adea-9275c231ac47",
"metadata": {},
"outputs": [],
"source": [
"nybb.explore()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "99883485-2e67-4da8-a3d8-d4724ef8b2a1",
"metadata": {},
"source": [
"## Sampling random points\n",
"\n",
"To sample points from within a GeoDataFrame, use the `sample_points()` method.\n",
"To specify the sample sizes, provide an explicit number of points to sample. For example, we can sample 200 points randomly from each feature: "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76d8658d-7ee8-4ef0-84b4-b8883c921687",
"metadata": {},
"outputs": [],
"source": [
"n200_sampled_points = nybb.sample_points(200)\n",
"m = nybb.explore()\n",
"n200_sampled_points.explore(m=m, color='red')"
]
},
{
"cell_type": "markdown",
"id": "14b8628f-3d5e-4fbb-b26e-28cc042ff755",
"metadata": {},
"source": [
"This functionality also works for line geometries. For example, let's look only at the boundary of Manhattan Island:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c47472fc-7ca5-4f69-b86f-93c25a9f2b03",
"metadata": {},
"outputs": [],
"source": [
"manhattan_parts = nybb.iloc[[3]].explode(ignore_index=True)\n",
"manhattan_island = manhattan_parts.iloc[[30]]\n",
"manhattan_island.boundary.explore()"
]
},
{
"cell_type": "markdown",
"id": "54280acd-e449-4528-ac7c-c1b2dbbcdc2f",
"metadata": {},
"source": [
"To sample randomly along this boundary, use the same `sample_points()` method:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "358822ef-a0c6-40cd-9e35-9a688c56f361",
"metadata": {},
"outputs": [],
"source": [
"manhattan_border_points = manhattan_island.boundary.sample_points(200)\n",
"m = manhattan_island.explore()\n",
"manhattan_border_points.explore(m=m, color='red')"
]
},
{
"cell_type": "markdown",
"id": "fa0f5a05-e7cc-44cf-b6ac-40afd9753f23",
"metadata": {},
"source": [
"Keep in mind that sampled points are returned as a single multi-part geometry, and that the distances over the line segments are calculated *along* the line. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0ccad29-d29f-4da1-9151-692bfd20d533",
"metadata": {},
"outputs": [],
"source": [
"manhattan_border_points"
]
},
{
"cell_type": "markdown",
"id": "125f19d9-b3f0-4ab9-b44a-41be1a8cc388",
"metadata": {},
"source": [
"If you want to separate out the individual sampled points, use the `.explode()` method on the returned GeoSeries:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "265f2194-a94f-4a3d-9ae8-f7da55893e90",
"metadata": {},
"outputs": [],
"source": [
"manhattan_border_points.explode(ignore_index=True).head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "21a1a9d5",
"metadata": {},
"source": [
"## Variable number of points\n",
"\n",
"You can also sample a different number of points from each geometry by passing an array that specifies the sample size per geometry."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b76e4cd8",
"metadata": {},
"outputs": [],
"source": [
"variable_size = nybb.sample_points([10, 50, 100, 200, 500])\n",
"m = nybb.explore()\n",
"variable_size.explore(m=m, color='red')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7f041b05-7c12-4a2c-a360-35bc5c79f1f4",
"metadata": {},
"source": [
"## Sampling from more complicated point pattern processes\n",
"\n",
"Finally, the `sample_points()` method can use sampling processes other than those described above, as long as they are implemented in the `pointpats` package for spatial point pattern analysis. For example, a \"cluster-poisson\" process is a spatially random cluster process in which the \"seeds\" of clusters are chosen randomly, and points are then distributed randomly around those seeds. \n",
"\n",
"To see what this looks like, consider the following, where ten points will be distributed around five seeds within each of the boroughs in New York City:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7868221c-ad0a-44e6-9f2f-41c4d6ac0fcf",
"metadata": {},
"outputs": [],
"source": [
"sample_t = nybb.sample_points(method='cluster_poisson', size=50, n_seeds=5, cluster_radius=7500)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abd12d6e-533d-4808-86d7-9df4c097c077",
"metadata": {},
"outputs": [],
"source": [
"m = nybb.explore()\n",
"sample_t.explore(m=m, color='red')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "geopandas_dev",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
},
"vscode": {
"interpreter": {
"hash": "b1fe2ae8565152c84d3dbd08488d3746f754c9bdf2de9b61cf939da5306d3793"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
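
The notebook above samples points uniformly at random from within each polygon. As a rough illustration of what that means, the following standard-library sketch uses rejection sampling: draw candidates uniformly from the bounding box and keep those that fall inside the polygon. This is not the code added in this commit and may differ from how GeoPandas samples internally; the helper names `point_in_polygon` and `sample_points_in_polygon`, the `seed` argument, and the L-shaped test polygon are all hypothetical and exist only for this example.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points_in_polygon(ring, size, seed=None):
    """Hypothetical helper: rejection-sample `size` points uniformly inside `ring`."""
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    points = []
    while len(points) < size:
        # Draw a candidate uniformly from the bounding box ...
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        # ... and keep it only if it falls inside the polygon.
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Illustrative L-shaped polygon (made up for this sketch).
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
samples = sample_points_in_polygon(l_shape, size=200, seed=0)
```

Because accepted candidates are uniform over the bounding box, the kept points are uniform over the polygon. The cost is wasted draws when the polygon fills only a small part of its bounding box.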
5 changes: 4 additions & 1 deletion doc/source/getting_started/install.rst
@@ -150,6 +150,7 @@ Further, optional dependencies are:
- `psycopg2`_ (optional; for PostGIS connection)
- `GeoAlchemy2`_ (optional; for writing to PostGIS)
- `geopy`_ (optional; for geocoding)
- `pointpats`_ (optional; for advanced point sampling)


For plotting, these additional packages may be used:
@@ -267,4 +268,6 @@ More specifically, whether the speedups are used or not is determined by:

.. _PyGEOS: https://github.com/pygeos/pygeos/

.. _packaging: https://packaging.pypa.io/en/latest/

.. _pointpats: https://pointpats.readthedocs.io/en/latest/
3 changes: 3 additions & 0 deletions environment-dev.yml
@@ -40,3 +40,6 @@ dependencies:
- mapclassify
# spatial access methods
- rtree>=0.9
# point sampling
- pointpats
