# Spatial joins

This type of operation is often referred to as a join in both database and desktop GIS terminology, and GeoPandas also uses the term join when referring to a spatial join.

We have seen previously that GeoPandas allows us to use a spatial predicate in many places where a logical expression based on attribute values is called for. The same applies to join operations.

Consider the raptor nest table and the county table. If we had the name of the county in the raptor nest table, we could join the county table to the raptor table on the basis of the county name attribute. We do not have the county name in the raptor table, however. What we do have is the geometry of both the raptor nests and the counties, so we can join the two tables on the basis of a spatial relationship rather than an attribute relationship.

In [1]:
%matplotlib inline
import geopandas as gpd

raptor = gpd.read_file("data/Raptor_Nests.shp")
county = gpd.read_file("data/colorado_counties.shp")

We perform a spatial join using the GeoPandas sjoin function. Like the Pandas merge function, it is called on the library module itself (here gpd) and takes a left dataframe and a right dataframe as parameters. We also specify a how parameter, which can take the values "inner", "left", and "right". There is no "outer" option for the how parameter in sjoin.

With sjoin, the how parameter determines not only the type of join but also which GeoDataFrame's geometry is kept in the resulting GeoDataFrame. With an inner or left join the left dataframe's geometry is used. With a right join the right dataframe's geometry is used.
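The effect of the how parameter can be checked on a small toy dataset. The sketch below is illustrative only: the two square "counties" and three "nests" are made-up data, not the notebook's shapefiles, and the call relies on sjoin's default predicate so that it reads the same across GeoPandas versions.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: two adjacent square "counties" and three "nests".
counties = gpd.GeoDataFrame(
    {"name": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)
nests = gpd.GeoDataFrame(
    {"nest_id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],  # nest 3 lies outside both
)

# Same inputs, three join types (default predicate, so no op/predicate keyword needed).
left = gpd.sjoin(nests, counties, how="left")    # keeps every nest, point geometry
inner = gpd.sjoin(nests, counties, how="inner")  # keeps matched nests only, point geometry
right = gpd.sjoin(nests, counties, how="right")  # keeps every county, polygon geometry

print(len(left), len(inner), len(right))
print(set(left.geometry.geom_type), set(right.geometry.geom_type))
```

In the left join the unmatched nest is kept with missing values in the county columns, while the inner join drops it.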

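Finally, we specify the spatial predicate in the op parameter. In the GeoPandas version used here, the only predicates allowed are "intersects", "contains", and "within". Note that GeoPandas 0.10 renamed this parameter to predicate and accepts additional predicates; the op keyword was removed in GeoPandas 1.0, so on a recent installation the calls below need predicate="within" instead.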
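To see how the choice of predicate changes the result, the following sketch joins three made-up points against a single square. The helper sjoin_compat is hypothetical (it is not part of GeoPandas); it simply tries the newer predicate keyword first and falls back to op, on the assumption that one of the two is accepted by the installed version.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def sjoin_compat(left_df, right_df, how, predicate):
    """Hypothetical helper: call gpd.sjoin with `predicate`, else with `op`."""
    try:
        return gpd.sjoin(left_df, right_df, how=how, predicate=predicate)
    except TypeError:
        # Older GeoPandas releases only know the `op` keyword.
        return gpd.sjoin(left_df, right_df, how=how, op=predicate)


# Made-up data: one unit square and three points relative to it.
square = gpd.GeoDataFrame({"name": ["Square"]}, geometry=[box(0, 0, 1, 1)])
points = gpd.GeoDataFrame(
    {"label": ["inside", "on_edge", "outside"]},
    geometry=[Point(0.5, 0.5), Point(1.0, 0.5), Point(2.0, 0.5)],
)

within = sjoin_compat(points, square, how="inner", predicate="within")
intersects = sjoin_compat(points, square, how="inner", predicate="intersects")

print(sorted(within["label"]))      # points strictly inside the square
print(sorted(intersects["label"]))  # also includes the point on the boundary
```

A point lying exactly on a polygon's boundary intersects the polygon but is not within it, which matters for nests recorded on a county line.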

In [2]:
raptor_cnty = gpd.sjoin(raptor, county, how="left", op="within")
raptor_cnty


Unnamed: 0,postgis_fi,lat_y_dd,long_x_dd,lastsurvey,recentspec,recentstat,Nest_ID,geometry,index_right,OBJECTID,COUNTYFP,Shape_Leng,Shape_Area,low,lowmod,LMMI,lowmoduniv,lowmod_pct,NAMELSAD10
0,361.0,40.267502,-104.870872,2012-03-16,Swainsons Hawk,INACTIVE NEST,361,POINT (-104.79595 40.29891),62.0,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,Weld County
1,362.0,40.264321,-104.860255,2012-03-16,Swainsons Hawk,INACTIVE NEST,362,POINT (-104.78897 40.22089),62.0,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,Weld County
2,1.0,38.650081,-105.494251,2014-07-28,Swainsons Hawk,INACTIVE NEST,1,POINT (-105.50223 38.68694),22.0,23.0,043,3.087440,0.410131,10815.0,18520.0,26160.0,36180.0,0.511885,Fremont County
3,2.0,40.309574,-104.932604,2011-01-06,Swainsons Hawk,INACTIVE NEST,2,POINT (-104.84889 40.35215),62.0,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,Weld County
4,3.0,40.219343,-104.729246,2014-07-03,Swainsons Hawk,ACTIVE NEST,3,POINT (-104.74466 40.18571),62.0,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,Weld County
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
874,911.0,40.006950,-104.894370,2015-08-18,Red-tail Hawk,INACTIVE NEST,911,POINT (-104.98394 40.00297),7.0,8.0,014,1.174635,0.009158,9045.0,17315.0,29050.0,60355.0,0.286886,Broomfield County
875,912.0,39.998876,-104.900128,2015-09-01,Red-tail Hawk,INACTIVE NEST,912,POINT (-104.84766 39.96975),0.0,1.0,001,4.321021,0.322758,132545.0,231255.0,334950.0,467200.0,0.494981,Adams County
876,,,,2020-05-08,Northern Harrier,INACTIVE NEST,9991,POINT (-104.95039 40.24432),62.0,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,Weld County
877,,,,2020-05-05,SWHA,INACTIVE NEST,1001,POINT (-104.94502 40.24443),62.0,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,Weld County


Notice that the resulting raptor_cnty GeoDataFrame now contains all the attribute data for the county that contains each point. If we only want a few of the columns from the county GeoDataFrame, we can simply subset that GeoDataFrame by column when we pass it to sjoin. The geometry column must be kept in the subset, since the join depends on it.

In [3]:
raptor_cnty = gpd.sjoin(raptor, county[['NAMELSAD10', 'geometry']], how="left", op="within")
raptor_cnty


Unnamed: 0,postgis_fi,lat_y_dd,long_x_dd,lastsurvey,recentspec,recentstat,Nest_ID,geometry,index_right,NAMELSAD10
0,361.0,40.267502,-104.870872,2012-03-16,Swainsons Hawk,INACTIVE NEST,361,POINT (-104.79595 40.29891),62.0,Weld County
1,362.0,40.264321,-104.860255,2012-03-16,Swainsons Hawk,INACTIVE NEST,362,POINT (-104.78897 40.22089),62.0,Weld County
2,1.0,38.650081,-105.494251,2014-07-28,Swainsons Hawk,INACTIVE NEST,1,POINT (-105.50223 38.68694),22.0,Fremont County
3,2.0,40.309574,-104.932604,2011-01-06,Swainsons Hawk,INACTIVE NEST,2,POINT (-104.84889 40.35215),62.0,Weld County
4,3.0,40.219343,-104.729246,2014-07-03,Swainsons Hawk,ACTIVE NEST,3,POINT (-104.74466 40.18571),62.0,Weld County
...,...,...,...,...,...,...,...,...,...,...
874,911.0,40.006950,-104.894370,2015-08-18,Red-tail Hawk,INACTIVE NEST,911,POINT (-104.98394 40.00297),7.0,Broomfield County
875,912.0,39.998876,-104.900128,2015-09-01,Red-tail Hawk,INACTIVE NEST,912,POINT (-104.84766 39.96975),0.0,Adams County
876,,,,2020-05-08,Northern Harrier,INACTIVE NEST,9991,POINT (-104.95039 40.24432),62.0,Weld County
877,,,,2020-05-05,SWHA,INACTIVE NEST,1001,POINT (-104.94502 40.24443),62.0,Weld County


Notice also that the result includes an index_right column that contains the index of the row in the right GeoDataFrame that satisfied the spatial predicate.
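As a rough illustration of what index_right can be used for, the sketch below uses made-up squares and points (not the notebook's data) and assumes the right GeoDataFrame has a default, unnamed index. In a left join, rows with no match carry a missing value in index_right, which is also why the column shows up as floating point.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: two separate square "counties" and three "nests".
counties = gpd.GeoDataFrame(
    {"name": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1.5, 0, 2.5, 1)],
)
nests = gpd.GeoDataFrame(
    {"nest_id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(2.0, 0.5), Point(5, 5)],  # nest 3 matches nothing
)

joined = gpd.sjoin(nests, counties, how="left")

# Nests that fell outside every county have NaN in index_right.
unmatched = joined[joined["index_right"].isna()]
print(unmatched["nest_id"].tolist())

# For matched nests, index_right points back at the matching row of `counties`.
matched = joined.dropna(subset=["index_right"])
looked_up = counties.loc[matched["index_right"].astype(int), "name"].tolist()
print(looked_up)
```

Checking for unmatched rows before summarising is a cheap way to catch points that fall outside the polygon layer.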

Now we can do things like see how many nests are in each county using basic summary methods.

In [4]:
import pandas as pd

pd.pivot_table(raptor_cnty, index=['NAMELSAD10', 'recentspec'], values='Nest_ID', aggfunc='count')

NAMELSAD10,recentspec,Nest_ID
Adams County,Red-tail Hawk,22
Adams County,Swainsons Hawk,15
Arapahoe County,Red-tail Hawk,3
Arapahoe County,Swainsons Hawk,5
Boulder County,Red-tail Hawk,21
Boulder County,Swainsons Hawk,19
Broomfield County,Red-tail Hawk,4
Broomfield County,Swainsons Hawk,2
Cheyenne County,Swainsons Hawk,1
Denver County,Swainsons Hawk,2


Notice also that we now DO have a column in the raptor_cnty table with the county name, so we COULD join it with the county GeoDataFrame on the basis of an attribute relationship rather than a spatial relationship, which would be more efficient.

In [5]:
pd.merge(raptor_cnty, county, how="left", on="NAMELSAD10")

Unnamed: 0,postgis_fi,lat_y_dd,long_x_dd,lastsurvey,recentspec,recentstat,Nest_ID,geometry_x,index_right,NAMELSAD10,OBJECTID,COUNTYFP,Shape_Leng,Shape_Area,low,lowmod,LMMI,lowmoduniv,lowmod_pct,geometry_y
0,361.0,40.267502,-104.870872,2012-03-16,Swainsons Hawk,INACTIVE NEST,361,POINT (-104.79595 40.29891),62.0,Weld County,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,"MULTIPOLYGON (((-104.97628 40.03305, -104.9752..."
1,362.0,40.264321,-104.860255,2012-03-16,Swainsons Hawk,INACTIVE NEST,362,POINT (-104.78897 40.22089),62.0,Weld County,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,"MULTIPOLYGON (((-104.97628 40.03305, -104.9752..."
2,1.0,38.650081,-105.494251,2014-07-28,Swainsons Hawk,INACTIVE NEST,1,POINT (-105.50223 38.68694),22.0,Fremont County,23.0,043,3.087440,0.410131,10815.0,18520.0,26160.0,36180.0,0.511885,"POLYGON ((-105.39542 38.69753, -105.39281 38.6..."
3,2.0,40.309574,-104.932604,2011-01-06,Swainsons Hawk,INACTIVE NEST,2,POINT (-104.84889 40.35215),62.0,Weld County,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,"MULTIPOLYGON (((-104.97628 40.03305, -104.9752..."
4,3.0,40.219343,-104.729246,2014-07-03,Swainsons Hawk,ACTIVE NEST,3,POINT (-104.74466 40.18571),62.0,Weld County,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,"MULTIPOLYGON (((-104.97628 40.03305, -104.9752..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
874,911.0,40.006950,-104.894370,2015-08-18,Red-tail Hawk,INACTIVE NEST,911,POINT (-104.98394 40.00297),7.0,Broomfield County,8.0,014,1.174635,0.009158,9045.0,17315.0,29050.0,60355.0,0.286886,"MULTIPOLYGON (((-105.09996 39.95799, -105.0999..."
875,912.0,39.998876,-104.900128,2015-09-01,Red-tail Hawk,INACTIVE NEST,912,POINT (-104.84766 39.96975),0.0,Adams County,1.0,001,4.321021,0.322758,132545.0,231255.0,334950.0,467200.0,0.494981,"POLYGON ((-103.86177 40.00123, -103.86147 40.0..."
876,,,,2020-05-08,Northern Harrier,INACTIVE NEST,9991,POINT (-104.95039 40.24432),62.0,Weld County,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,"MULTIPOLYGON (((-104.97628 40.03305, -104.9752..."
877,,,,2020-05-05,SWHA,INACTIVE NEST,1001,POINT (-104.94502 40.24443),62.0,Weld County,63.0,123,5.099761,1.106108,56122.0,100164.0,150579.0,264445.0,0.378771,"MULTIPOLYGON (((-104.97628 40.03305, -104.9752..."


If you expect to join these two tables frequently, especially if the tables are large, then you should consider doing the spatial join once, storing the common field permanently, and then doing an attribute join whenever it is needed in the future.