**[Geospatial Analysis Home Page](https://www.kaggle.com/learn/geospatial)**

---


# Introduction

In this tutorial, you'll explore several techniques for **proximity analysis**. In particular, you'll learn how to:
- measure the distance between points on a map, and
- select all points within some radius of a feature.

```python
import folium
from folium import Marker, GeoJson
from folium.plugins import HeatMap

import pandas as pd
import geopandas as gpd

# Save a folium map to an HTML file and display it inline in the notebook
def embed_map(m, file_name):
    from IPython.display import IFrame
    m.save(file_name)
    return IFrame(file_name, width='100%', height='500px')
```

You'll work with a dataset from the US Environmental Protection Agency (EPA) that tracks releases of toxic chemicals in Philadelphia, Pennsylvania, USA.

```python
releases = gpd.read_file("../input/geospatial-learn-course-data/toxic_release_pennsylvania/toxic_release_pennsylvania/toxic_release_pennsylvania.shp")
releases.head()
```

You'll also work with a dataset that contains readings from air quality monitoring stations in the same city.

```python
stations = gpd.read_file("../input/geospatial-learn-course-data/PhillyHealth_Air_Monitoring_Stations/PhillyHealth_Air_Monitoring_Stations/PhillyHealth_Air_Monitoring_Stations.shp")
stations.head()
```

# Measuring distance

To measure distances between points from two different GeoDataFrames, we first have to make sure that they use the same coordinate reference system (CRS). Fortunately, that is the case here: both use EPSG 2272.

```python
print(stations.crs)
print(releases.crs)
```

We also check the CRS to see which units it uses (meters, feet, or something else).  In this case, EPSG 2272 has units of feet.  (_If you like, you can check this [here](https://epsg.io/2272)._)
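Because EPSG 2272 reports distances in feet, a couple of conversion helpers can make results easier to read. The sketch below assumes EPSG 2272 uses the US survey foot (defined as exactly 1200/3937 m); the helper names are hypothetical and are not part of GeoPandas.

```python
# Hypothetical conversion helpers for distances reported in EPSG 2272 units.
# Assumption: EPSG 2272 uses the US survey foot, defined as exactly 1200/3937 m.
FEET_PER_MILE = 5280
METERS_PER_US_SURVEY_FOOT = 1200 / 3937


def feet_to_miles(feet: float) -> float:
    """Convert a distance in feet to statute miles."""
    return feet / FEET_PER_MILE


def miles_to_feet(miles: float) -> float:
    """Convert a distance in statute miles to feet."""
    return miles * FEET_PER_MILE


def us_survey_feet_to_meters(feet: float) -> float:
    """Convert a distance in US survey feet to meters."""
    return feet * METERS_PER_US_SURVEY_FOOT


print(miles_to_feet(2))  # 10560
print(round(us_survey_feet_to_meters(10560), 1))
```

The difference between the US survey foot and the international foot is about two parts per million, so it rarely matters at city scale.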

Computing distances in GeoPandas is straightforward. The code cell below calculates the distance (in feet) between one relatively recent release incident, `recent_release`, and every station in the `stations` GeoDataFrame.

```python
# Select one release incident in particular
recent_release = releases.iloc[360]

# Measure distance from release to each station
distances = stations.geometry.distance(recent_release.geometry)
distances
```
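For point geometries in a projected CRS such as EPSG 2272, `GeoSeries.distance` returns the straight-line (Euclidean) distance in the CRS's units. The plain-Python sketch below mirrors that calculation for one release and a few stations; the coordinates are made up for illustration and are not taken from the dataset.

```python
import math

# Hypothetical projected (x, y) coordinates in feet, standing in for real points
release_xy = (0.0, 0.0)
station_xys = [(3000.0, 4000.0), (6000.0, 8000.0), (-1500.0, 2000.0)]


def planar_distance(a, b):
    """Euclidean distance between two (x, y) points, in the CRS's units."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


# One distance per station, analogous to stations.geometry.distance(point)
distances = [planar_distance(release_xy, s) for s in station_xys]
print(distances)
```

This only yields feet because the coordinates are already projected; applied to raw latitude and longitude, the same formula would return degrees, which is not a physical length.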

Using the calculated distances, we can obtain summary statistics, such as the mean distance from the release to the monitoring stations.

```python
print('Mean distance to monitoring stations: {} feet'.format(distances.mean()))
```

Or, we can get the closest monitoring station.

```python
print('Closest monitoring station ({} feet):'.format(distances.min()))
# idxmin() returns an index label, so look the row up with .loc
print(stations.loc[distances.idxmin()][["ADDRESS", "LATITUDE", "LONGITUDE"]])
```
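Both statistics reduce to a mean and an arg-min over the distance values. Below is a plain-Python sketch of the same two steps, using hypothetical station labels and distances. Note that pandas' `idxmin()` returns an index *label*, so looking the row up with `.loc` is the safer choice when the index is not the default `0..n-1` range.

```python
from statistics import mean

# Hypothetical distances (in feet) keyed by a station label, like a pandas
# Series whose index labels are not necessarily 0..n-1
distances = {"station_a": 5000.0, "station_b": 10000.0, "station_c": 2500.0}

mean_distance = mean(distances.values())

# Equivalent of Series.idxmin(): the *label* of the smallest value
closest_label = min(distances, key=distances.get)

print(f"Mean distance: {mean_distance} feet")
print(f"Closest station: {closest_label} ({distances[closest_label]} feet)")
```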

# Creating a buffer

If we want to find all points on a map that lie within some radius of a point, the simplest approach is to create a buffer.

The code cell below creates a GeoSeries `two_mile_buffer` containing 12 different Polygon objects. Each polygon is a buffer of 2 miles (2\*5280 = 10,560 feet, in the CRS's units) around a different air monitoring station.

```python
two_mile_buffer = stations.geometry.buffer(2*5280)
two_mile_buffer.head()
```
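A buffer around a point is not a perfect circle but a polygon whose vertices approximate one; by default, Shapely and GeoPandas use 16 segments per quarter circle. The sketch below builds such a polygon in plain Python, assuming the vertices lie on the circle, and compares its area with πr². The function names are hypothetical. Because the polygon sits just inside the true circle, a point extremely close to the 2-mile edge could fall outside the buffer even though its exact distance is under 2 miles.

```python
import math


def buffer_vertices(cx, cy, radius, quad_segs=16):
    """Approximate a circle with a regular polygon of 4 * quad_segs vertices.

    Assumption: the vertices lie exactly on the circle.
    """
    n = 4 * quad_segs
    return [
        (cx + radius * math.cos(2 * math.pi * k / n),
         cy + radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


radius = 2 * 5280  # two miles, in feet
approx_area = polygon_area(buffer_vertices(0.0, 0.0, radius))
exact_area = math.pi * radius ** 2
ratio = approx_area / exact_area
print(f"Polygon covers {ratio:.4%} of the true circle")
```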

We use `folium.GeoJson()` to plot each polygon on a map.  Note that since folium requires coordinates in latitude and longitude, we have to convert the CRS to EPSG 4326 before plotting.

```python
# Create map with release incidents and monitoring stations
m = folium.Map(location=[39.9526,-75.1652], zoom_start=11)
HeatMap(data=releases[['LATITUDE', 'LONGITUDE']], radius=15).add_to(m)
for idx, row in stations.iterrows():
    Marker([row['LATITUDE'], row['LONGITUDE']]).add_to(m)

# Plot each polygon on the map
GeoJson(two_mile_buffer.to_crs(epsg=4326)).add_to(m)

# Show the map
embed_map(m, 'm1.html')
```
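One detail that often trips people up when moving between GeoPandas and folium is coordinate order. GeoJSON and Shapely store points as (x, y), that is (longitude, latitude), while folium's `Marker` and `Map(location=...)` expect `[latitude, longitude]`, which is why the markers are built from the `LATITUDE` and `LONGITUDE` columns in that order. The hypothetical helper below only swaps the order; it does not reproject, so its input must already be in EPSG 4326.

```python
def lonlat_to_folium(coord):
    """Swap a GeoJSON-style (lon, lat) pair into folium's [lat, lon] order.

    Hypothetical helper: it assumes the input is already in EPSG 4326
    (degrees) and performs no reprojection.
    """
    lon, lat = coord
    return [lat, lon]


# Philadelphia's approximate center as (lon, lat), matching the map's location
print(lonlat_to_folium((-75.1652, 39.9526)))  # [39.9526, -75.1652]
```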

Now, to test whether a toxic release occurred within 2 miles of **any** monitoring station, we could run 12 different tests, one per polygon, checking individually whether each contains the point.
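That one-test-per-buffer approach can be sketched in plain Python. For a circular buffer, "the buffer contains the point" is, up to the polygon approximation, the same as "the point is within the radius of the buffer's center", so the sketch tests each station in turn and stops at the first hit. The station coordinates and the function name are hypothetical.

```python
import math

RADIUS_FT = 2 * 5280  # two miles, in feet

# Hypothetical projected (x, y) station coordinates, in feet
stations_xy = [(0.0, 0.0), (20000.0, 0.0), (0.0, 30000.0)]


def within_any_buffer(point, centers, radius=RADIUS_FT):
    """Return True if `point` is within `radius` of at least one center.

    Mirrors running one containment test per buffer polygon, ignoring the
    small error from approximating circles with polygons.
    """
    return any(
        math.hypot(point[0] - cx, point[1] - cy) <= radius
        for cx, cy in centers
    )


print(within_any_buffer((15000.0, 1000.0), stations_xy))  # True
print(within_any_buffer((40000.0, 40000.0), stations_xy))  # False
```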

A more efficient way is to first collapse all of the polygons into a single **MultiPolygon** object. We do this with the `unary_union` attribute. (In GeoPandas 1.0 and later, `unary_union` is deprecated in favor of the `union_all()` method.)

```python
# Turn group of polygons into single multipolygon
my_union = two_mile_buffer.geometry.unary_union
print('Type:', type(my_union))

# Show the MultiPolygon object
my_union
```

We use the `contains()` method to check whether the MultiPolygon contains a point. We'll use the release incident from earlier in the tutorial, which we know is roughly 3781 feet from the closest monitoring station.

```python
# The closest station is less than two miles away
my_union.contains(releases.iloc[360].geometry)
```
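One subtlety of `contains()`: under the DE-9IM rules used by Shapely, a polygon does *not* contain a point lying exactly on its boundary, whereas `covers()` does. For a circular region this is the difference between a strict and a non-strict distance comparison, as the plain-Python analogy below illustrates. The function names are hypothetical, and the circle stands in for a buffer polygon.

```python
import math


def disk_contains(center, radius, point):
    """Analogue of contains(): the point must lie strictly inside."""
    return math.dist(center, point) < radius


def disk_covers(center, radius, point):
    """Analogue of covers(): points on the boundary count as well."""
    return math.dist(center, point) <= radius


center, radius = (0.0, 0.0), 10560.0
on_boundary = (10560.0, 0.0)
print(disk_contains(center, radius, on_boundary))  # False
print(disk_covers(center, radius, on_boundary))    # True
```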

But not all releases occurred within two miles of an air monitoring station!

```python
# The closest station is more than two miles away
my_union.contains(releases.iloc[358].geometry)
```
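In practice you usually want this answer for every release at once rather than one row at a time; in GeoPandas a vectorized predicate such as `releases.geometry.within(my_union)` returns one boolean per row. The plain-Python sketch below performs the same classification using the distance to the nearest station, with hypothetical coordinates, and reports the share of releases that are covered.

```python
import math

RADIUS_FT = 2 * 5280  # two miles, in feet

# Hypothetical projected (x, y) coordinates, in feet
stations_xy = [(0.0, 0.0), (20000.0, 0.0)]
releases_xy = [(1000.0, 1000.0), (19000.0, 5000.0), (50000.0, 0.0), (0.0, -12000.0)]


def is_covered(point, centers, radius=RADIUS_FT):
    """True if the point lies within `radius` of its nearest center."""
    nearest = min(math.dist(point, c) for c in centers)
    return nearest <= radius


flags = [is_covered(p, stations_xy) for p in releases_xy]
share = sum(flags) / len(flags)
print(flags)  # [True, True, False, False]
print(f"{share:.0%} of releases are within two miles of a station")
```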

# Your turn

In the **[final exercise](https://www.kaggle.com/kernels/fork/5822783)**, you'll investigate hospital coverage in New York City.
