# Sampling Points

Learn how to sample random points using GeoPandas. 

The example below shows you how to sample random locations from shapes in GeoPandas GeoDataFrames.

## Import Packages

To begin, import the packages used in this example:

In [1]:
import geopandas
import geodatasets




For this example, we will use the New York Borough example data (`nybb`) provided by geodatasets. 

In [2]:
nybb = geopandas.read_file(geodatasets.get_path("nybb"))
# simplify geometry to save space when rendering many interactive maps
nybb.geometry = nybb.simplify(200) 



To see what this looks like, view the dataframe:

In [3]:
nybb

|   | BoroCode | BoroName | Shape_Leng | Shape_Area | geometry |
|---|---|---|---|---|---|
| 0 | 5 | Staten Island | 330470.010332 | 1623820000.0 | MULTIPOLYGON (((970217.022 145643.332, 970547.... |
| 1 | 4 | Queens | 896344.047763 | 3045213000.0 | MULTIPOLYGON (((1029606.077 156073.814, 103074... |
| 2 | 3 | Brooklyn | 741080.523166 | 1937479000.0 | MULTIPOLYGON (((1021176.479 151374.797, 102064... |
| 3 | 1 | Manhattan | 359299.096471 | 636471500.0 | MULTIPOLYGON (((981219.056 188655.316, 980873.... |
| 4 | 2 | Bronx | 464392.991824 | 1186925000.0 | MULTIPOLYGON (((1012821.806 229228.265, 101250... |


Or visualize the data:

In [4]:
nybb.explore()

## Sampling random points

To sample points from within a GeoDataFrame, use the `sample_points()` method.
To set the sample size, pass the number of points to draw from each geometry. For example, we can sample 200 points randomly from each feature:

In [5]:
n200_sampled_points = nybb.sample_points(200)
m = nybb.explore()
n200_sampled_points.explore(m=m, color='red')
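
To build intuition for what uniform sampling inside a polygon involves, here is a minimal standard-library sketch of rejection sampling: draw candidates uniformly from the bounding box and keep only those that fall inside the shape. It is an illustration under stated assumptions, not a description of GeoPandas internals. The helper names (`point_in_polygon`, `sample_in_polygon`) and the L-shaped test polygon are hypothetical.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring given as a list of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray going from (x, y) towards +x.
            if x < x_cross:
                inside = not inside
    return inside


def sample_in_polygon(ring, size, seed=None):
    """Rejection sampling: draw from the bounding box, keep points inside the ring."""
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < size:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical L-shaped polygon, used only for illustration.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
pts = sample_in_polygon(l_shape, size=200, seed=0)
print(len(pts))
```

Rejection sampling is simple and unbiased, but it wastes draws when a shape covers only a small part of its bounding box.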

This functionality also works for line geometries. For example, let's look only at the boundary of Manhattan Island:

In [6]:
manhattan_parts = nybb.iloc[[3]].explode(ignore_index=True)
manhattan_island = manhattan_parts.iloc[[30]]
manhattan_island.boundary.explore()

Sampling randomly from along this boundary can use the same `sample_points()` method:

In [7]:
manhattan_border_points = manhattan_island.boundary.sample_points(200)
m = manhattan_island.explore()
manhattan_border_points.explore(m=m, color='red')

Keep in mind that the points sampled from each geometry are returned as a single multi-part geometry (a `MultiPoint`), and that positions on the line segments are measured by distance *along* the line.

In [8]:
manhattan_border_points

30    MULTIPOINT (979056.964 196224.446, 979296.473 ...
Name: sampled_points, dtype: geometry
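
The idea of measuring *along* the line can be sketched with the standard library alone: draw a random distance between zero and the total line length, then walk the segments to find the matching position. This is a simplified illustration, not the GeoPandas implementation. The function name `sample_along_line` and the example coordinates are hypothetical.

```python
import bisect
import math
import random


def sample_along_line(coords, size, seed=None):
    """Sample points uniformly by arc length along a polyline of (x, y) vertices."""
    rng = random.Random(seed)
    # Cumulative distance from the start of the line to each vertex.
    cumulative = [0.0]
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x2 - x1, y2 - y1))
    total = cumulative[-1]

    points = []
    for _ in range(size):
        d = rng.uniform(0.0, total)
        # Index of the segment that contains distance d.
        i = min(bisect.bisect_right(cumulative, d) - 1, len(coords) - 2)
        seg_len = cumulative[i + 1] - cumulative[i]
        t = (d - cumulative[i]) / seg_len if seg_len > 0 else 0.0
        (x1, y1), (x2, y2) = coords[i], coords[i + 1]
        # Linear interpolation within the segment.
        points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return points


# Hypothetical line: a long horizontal segment (length 9), then a short vertical one (length 1).
line = [(0, 0), (9, 0), (9, 1)]
samples = sample_along_line(line, size=1000, seed=1)
on_long_segment = sum(1 for x, y in samples if y == 0)
print(on_long_segment / len(samples))
```

Because the draw is by distance, longer segments receive proportionally more points, so roughly nine in ten samples should land on the long segment here.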

If you want to separate out the individual sampled points, use the `.explode()` method on the resulting GeoSeries:

In [9]:
manhattan_border_points.explode(ignore_index=True).head()

0    POINT (979056.964 196224.446)
1    POINT (979296.473 195799.014)
2    POINT (979308.191 195778.200)
3    POINT (979314.663 198841.730)
4    POINT (979364.258 199076.565)
Name: sampled_points, dtype: geometry

## Variable number of points

You can also sample a different number of points from each geometry by passing an array that specifies the sample size per geometry.

In [10]:
variable_size = nybb.sample_points([10, 50, 100, 200, 500])
m = nybb.explore()
variable_size.explore(m=m, color='red')
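
One way to choose such an array is to allocate points in proportion to each borough's area. The sketch below uses the `Shape_Area` values from the dataframe shown earlier. The proportional rule and the `target_total` of 1000 are assumptions made for illustration, not something the source prescribes.

```python
# Shape_Area values copied from the dataframe output, in the same row order.
areas = {
    "Staten Island": 1623820000.0,
    "Queens": 3045213000.0,
    "Brooklyn": 1937479000.0,
    "Manhattan": 636471500.0,
    "Bronx": 1186925000.0,
}

target_total = 1000  # hypothetical overall number of points
total_area = sum(areas.values())

# Each size is rounded independently, so the sum may differ slightly from target_total.
# max(1, ...) guarantees that every geometry receives at least one point.
sizes = [max(1, round(target_total * area / total_area)) for area in areas.values()]

for name, size in zip(areas, sizes):
    print(name, size)
```

The resulting list could then be passed in place of the hand-written sizes, for example `nybb.sample_points(sizes)`, provided its order matches the row order of the GeoDataFrame.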

## Sampling from more complicated point pattern processes

Finally, the `sample_points()` method can use sampling processes other than those described above, so long as they are implemented in the `pointpats` package for spatial point pattern analysis. For example, a "cluster-poisson" process is a spatially random cluster process in which the "seeds" of clusters are chosen at random, and points are then distributed, again at random, around those seeds.

To see what this looks like, consider the following, where ten points are distributed around each of five seeds (50 points in total) within each of the boroughs in New York City:

In [11]:
sample_t = nybb.sample_points(method='cluster_poisson', size=50, n_seeds=5, cluster_radius=7500)

In [12]:
m = nybb.explore()
sample_t.explore(m=m, color='red')
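
The two-stage structure of a cluster process can be sketched without any geospatial dependencies: place seeds at random, then scatter an equal share of points within a fixed radius of each seed. This is a deliberately simplified illustration. It ignores polygon boundaries and is not the `pointpats` implementation, which may differ in how it handles the shape and the per-cluster counts. The function name `cluster_poisson_sketch` and the bounding box are hypothetical.

```python
import math
import random


def cluster_poisson_sketch(bounds, size, n_seeds, cluster_radius, seed=None):
    """Place n_seeds random seeds in a box, then scatter size // n_seeds points near each."""
    rng = random.Random(seed)
    min_x, min_y, max_x, max_y = bounds

    # Stage 1: choose cluster seeds uniformly at random inside the bounding box.
    seeds = [
        (rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
        for _ in range(n_seeds)
    ]

    # Stage 2: scatter points uniformly within a disc around each seed.
    per_seed = size // n_seeds
    points = []
    for sx, sy in seeds:
        for _ in range(per_seed):
            # Taking the square root keeps the density uniform over the disc area.
            r = cluster_radius * math.sqrt(rng.random())
            theta = rng.uniform(0.0, 2.0 * math.pi)
            points.append((sx + r * math.cos(theta), sy + r * math.sin(theta)))
    return seeds, points


# Hypothetical bounding box; parameter values mirror the call above.
seeds, points = cluster_poisson_sketch(
    bounds=(0, 0, 100000, 100000), size=50, n_seeds=5, cluster_radius=7500, seed=42
)
print(len(seeds), len(points))
```

With `size=50` and `n_seeds=5`, each seed receives ten points, matching the description above.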