
Knowing the distance to coastline for an exposure is crucial for insurance rating applications because it helps insurers assess the risk of hazards like hurricanes, storm surges, and flooding, which are much more prevalent in coastal areas. This information allows insurers to make informed decisions about pricing, underwriting, and reinsurance. Properties closer to the coast are generally at higher risk, which leads to higher premiums for these properties. Insurance rating plans may use distance to coastline directly as an explanatory variable, with rating factors that decrease as distance to the coastline increases.

This article walks through how GeoPandas can be used to calculate distance to coastline for a collection of simulated latitude-longitude pairs in the Florida region, and how these exposure locations can be assigned to different risk levels based on the distance calculation. 

<br>


### Coastal Shapefiles

The United States Census Bureau provides shapefiles for state, county and ZIP Code Tabulation Area (ZCTA) boundaries as well as roads, rails and coastlines (see full list [here](https://www2.census.gov/geo/tiger/TIGER2024/2024_TL_Shapefiles_File_Name_Definitions.pdf)). Shapefiles are a widely used geospatial vector data format that stores the geometric location and attribute information of geographic features, which can be represented as points, lines, or polygons.


We begin by downloading the COASTLINE zip archive available on the Census Bureau's [FTP site](https://www2.census.gov/geo/tiger/TIGER2024/COASTLINE/). The COASTLINE shapefile is loaded into GeoPandas (the STATE shapefile is also downloaded and loaded for later use). We limit our analysis to the contiguous United States (the lower 48 states) with a bounding box and filter out the Great Lakes. Inspecting the first few records:



In [None]:

import numpy as np
import pandas as pd
import geopandas as gpd

np.set_printoptions(suppress=True, precision=5)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


coastline_shp = "tl_2024_us_coastline.zip"
us_shp = "tl_2024_us_state.zip"


# Bounding box of lower 48 states. Remove Great Lakes.
xmin, ymin, xmax, ymax = -125, 24.6, -65, 50
coast = gpd.read_file(coastline_shp)
coast = coast.cx[xmin:xmax, ymin:ymax]
coast = coast[coast.NAME!="Great Lakes"].reset_index(drop=True)

# State boundaries.
states = gpd.read_file(us_shp)[["NAME", "geometry"]]
states = states.cx[xmin:xmax, ymin:ymax].reset_index(drop=True)

print(f"coast.shape : {coast.shape}")
print(f"states.shape: {states.shape}")

coast.head(10)
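The `.cx[xmin:xmax, ymin:ymax]` indexer above keeps geometries that intersect the given box. A simplified way to think about it is a bounding-box overlap test. The sketch below is a rough, conservative analog written in plain pandas, not GeoPandas' implementation: it keeps any row whose bounding box overlaps the query box, even if the geometry itself does not touch the box. It assumes the `minx`/`miny`/`maxx`/`maxy` column layout returned by `GeoSeries.bounds`; the toy rows are hypothetical.

```python
import pandas as pd


def bbox_overlap_filter(bounds: pd.DataFrame, xmin, ymin, xmax, ymax) -> pd.DataFrame:
    """Keep rows whose bounding box overlaps the query box.

    `bounds` is assumed to have minx/miny/maxx/maxy columns, the layout
    returned by GeoSeries.bounds.
    """
    overlaps = (
        (bounds["maxx"] >= xmin) & (bounds["minx"] <= xmax)
        & (bounds["maxy"] >= ymin) & (bounds["miny"] <= ymax)
    )
    return bounds[overlaps]


# Hypothetical bounds: a west-coast segment, an Alaska segment, a Gulf segment.
toy = pd.DataFrame(
    {"minx": [-124.5, -165.0, -90.0],
     "miny": [40.0, 60.0, 29.0],
     "maxx": [-124.0, -160.0, -89.5],
     "maxy": [41.0, 62.0, 29.5]},
    index=["Pacific", "Alaska", "Gulf"],
)

# Same lower-48 box used in the article.
kept = bbox_overlap_filter(toy, -125, 24.6, -65, 50)
print(kept.index.tolist())  # ['Pacific', 'Gulf']
```

The Alaska segment is dropped because its entire extent lies west of the box's western edge.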



<br>

The coastline shapefile consists of roughly 3,000 LINESTRING objects. Let's get a count of geometries by NAME:

In [None]:

coast["NAME"].value_counts().sort_index()




<br> 

We can visualize the coastline by calling the `coast` GeoDataFrame's `plot` method:


In [None]:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(8, 6), tight_layout=True)
ax.set_title("Lower 48 Coastline", fontsize=9)
coast.plot(ax=ax, edgecolor="red", linewidth=1.0)
ax.axis("off")
plt.show()



<br> 

To overlay the coastline on the state boundaries, plot the `states` GeoDataFrame (loaded earlier from the STATE shapefile) together with `coast`:

In [None]:

fig, ax = plt.subplots(1, 1, figsize=(8, 6), tight_layout=True)
ax.set_title("Lower 48 States with Coastline", fontsize=9)
coast.plot(ax=ax, edgecolor="red", linewidth=1.50, linestyle="--")
states.boundary.plot(ax=ax, edgecolor="black", linewidth=0.50)
ax.axis("off")
plt.show()


<br>

Next, let's generate synthetic latitude-longitude pairs from within Florida's bounding envelope. The envelope bounds for each state, including Florida, can be obtained from its geometry as follows:

In [None]:


# Get bounding box for each state.
states["bbox"] = states.geometry.map(lambda gg: gg.envelope.bounds)

# Put coordinates in separate columns.
states[["lon0", "lat0", "lon1", "lat1"]] = pd.DataFrame(states.bbox.tolist(), index=states.index)

states.head()
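A geometry's envelope is the axis-aligned rectangle spanning its minimum and maximum coordinates, so `envelope.bounds` is simply `(min x, min y, max x, max y)` over the geometry's vertices. GeoPandas also exposes `GeoSeries.bounds`, which returns the same four values as `minx`, `miny`, `maxx` and `maxy` columns. The plain-Python sketch below shows the computation; the outline is a hypothetical, very coarse polygon, not real Florida geometry.

```python
from typing import Iterable, Tuple


def envelope_bounds(coords: Iterable[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (minx, miny, maxx, maxy) for a sequence of (x, y) vertices."""
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical coarse outline as (lon, lat) pairs; not real state geometry.
outline = [(-87.6, 31.0), (-81.5, 30.7), (-80.0, 26.5),
           (-81.8, 24.5), (-82.8, 27.9), (-85.0, 29.7)]
print(envelope_bounds(outline))  # (-87.6, 24.5, -80.0, 31.0)
```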



<br>



Let's draw the bounding region using folium:



In [None]:

import folium 

# Florida bounding box. 
lon0, lat0, lon1, lat1 = states[states.NAME=="Florida"].bbox.item()

mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2

m = folium.Map(
    location=[mlat, mlon], 
    zoom_start=6, 
    zoom_control=True, 
    no_touch=True,
    tiles="OpenStreetMap"
    )

folium.Rectangle(
    [(lat0, lon0), (lat1, lon1)], 
    fill_color="blue", fill_opacity=.05
    ).add_to(m)

m



<br>

Sampling uniformly from the bounding region highlighted above would place many points in the Gulf of Mexico or outside Florida altogether. Let's narrow the sampling space to a smaller box in south-central Florida:

In [None]:

# Narrowed sampling box. Note that lon0 is the eastern edge and lon1 the western edge here.
lon0, lat0, lon1, lat1 = (-80.5, 26, -81.75, 28)

mlat, mlon = (lat0 + lat1) / 2, (lon0 + lon1) / 2

m = folium.Map(
    location=[mlat, mlon], 
    zoom_start=7, 
    zoom_control=True, 
    no_touch=True,
    tiles="OpenStreetMap"
    )

folium.Rectangle(
    [(lat0, lon0), (lat1, lon1)], 
    fill_color="blue", fill_opacity=.05
    ).add_to(m)

m

In [None]:

# Sample within bounds defined by lat0, lon0, lat1, lon1. 

nbr_locations = 50

rng = np.random.default_rng(516)

rlats = rng.uniform(low=lat0, high=lat1, size=nbr_locations)
rlons = rng.uniform(low=lon1, high=lon0, size=nbr_locations)
points = list(zip(rlats, rlons))
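Narrowing the box reduces, but does not eliminate, the chance of drawing points that are not on land. One alternative, not used in this article, is rejection sampling: draw uniformly from the bounding box and keep only the points that fall inside a polygon. With GeoPandas one would test containment against the actual state geometry; the self-contained sketch below instead uses an even-odd (ray casting) point-in-polygon test in NumPy and a hypothetical "land" polygon.

```python
import numpy as np


def points_in_polygon(x, y, poly):
    """Even-odd (ray casting) point-in-polygon test.

    x, y : arrays of point coordinates; poly : sequence of (x, y) vertices.
    Points lying exactly on an edge may be classified either way.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    poly = np.asarray(poly, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        # Does this edge straddle the horizontal ray through each point?
        crosses = (y1 > y) != (y2 > y)
        with np.errstate(divide="ignore", invalid="ignore"):
            # x where the edge meets that ray (horizontal edges never "cross").
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Toggle for each edge crossed to the right of the point.
            inside ^= crosses & (x < x_cross)
    return inside


def rejection_sample(poly, n, bounds, rng):
    """Draw n points uniformly from `bounds`, keeping only those inside `poly`."""
    lon_min, lat_min, lon_max, lat_max = bounds
    kept_lon, kept_lat, total = [], [], 0
    while total < n:
        lon = rng.uniform(lon_min, lon_max, size=n)
        lat = rng.uniform(lat_min, lat_max, size=n)
        mask = points_in_polygon(lon, lat, poly)
        kept_lon.append(lon[mask])
        kept_lat.append(lat[mask])
        total += int(mask.sum())
    return np.concatenate(kept_lat)[:n], np.concatenate(kept_lon)[:n]


# Hypothetical "land" polygon (lon, lat) inside the sampling box; not real geometry.
land = [(-81.75, 26.0), (-80.5, 26.0), (-80.5, 28.0), (-81.2, 28.0)]
rng = np.random.default_rng(516)
lats, lons = rejection_sample(land, 50, (-81.75, 26.0, -80.5, 28.0), rng)
print(len(lats), points_in_polygon(lons, lats, land).all())  # 50 True
```

Rejection sampling keeps the distribution uniform over the polygon, at the cost of discarding a share of draws proportional to how much of the box lies outside it.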



<br>


Visualizing the synthetic locations:


In [None]:

m = folium.Map(location=[mlat, mlon], zoom_start=8)

for lat, lon in points:

    folium.CircleMarker(
        location=[lat, lon], 
        radius=5, 
        color="red", 
        fill_color="red", 
        fill=True,
        fill_opacity=1
        ).add_to(m)

m



<br>

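Next, the `points` list needs to be represented as a GeoDataFrame with the generated points as its geometry. We set `crs="EPSG:4326"` (WGS 84), which indicates that the coordinates are unprojected longitude-latitude pairs in degrees. Note that `points_from_xy` takes x (longitude) first and y (latitude) second. A `policy_id` column is included as an identifier for each point.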


In [None]:

dfpoints = pd.DataFrame({
    "policy_id": [str(ii).zfill(7) for ii in range(len(points))],
    "lat": rlats, 
    "lon": rlons, 
})

# Create GeoDataFrame.
points = gpd.GeoDataFrame(
    dfpoints,
    geometry=gpd.points_from_xy(dfpoints.lon, dfpoints.lat),
    crs="EPSG:4326"
)

points.head(10)



<br>

With both the coastline and the point data represented as GeoDataFrames, we use the `sjoin_nearest` spatial join to get the distance from each point to the nearest coastline geometry. First we reproject both layers to a projected coordinate system so that distances are returned in meters instead of degrees. Projected coordinate systems use linear units such as meters or feet, which makes distance measurements straightforward. Here we opt for the NAD83 / Conus Albers equal-area conic projection (EPSG:5069). Because an equal-area projection preserves area rather than distance, the resulting distances carry some scale distortion, which is typically small relative to rating bands measured in miles.

In the call to `sjoin_nearest`, we pass `distance_col="meters"`, so a column named `meters` holds the distance from each point in `points` to the nearest coastline geometry. If a point is equidistant from more than one coastline geometry, `sjoin_nearest` returns all of the tied matches, so duplicate `policy_id` values are possible. A `miles` column is added after the join.
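Since EPSG:5069 is an equal-area projection, its planar distances are close to, but not exactly, true ground distances. A quick sanity check is to compare a few projected distances against great-circle distances. The sketch below implements the haversine formula on a sphere with the mean Earth radius of 6,371,008.8 m. The spherical model is itself an approximation (errors of up to roughly 0.5%), so treat this as a rough check rather than a reference value.

```python
import math

EARTH_RADIUS_M = 6_371_008.8      # mean Earth radius (spherical approximation)
METERS_TO_MILES = 0.000621371     # same conversion factor used in the article


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 69 miles.
d = haversine_m(27.0, -81.0, 28.0, -81.0)
print(round(d * METERS_TO_MILES, 1))  # 69.1
```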


In [None]:

# Reproject both layers from geographic coordinates to NAD83 / Conus Albers (meters).
points = points.to_crs("EPSG:5069")
coast = coast.to_crs("EPSG:5069")

# Perform spatial join. Convert meters to miles.
gdf = gpd.sjoin_nearest(points, coast, how="left", distance_col="meters")
gdf["miles"] = gdf["meters"] * 0.000621371

# Get min, max and average distance to coastline.
min_dist = gdf.miles.min()
max_dist = gdf.miles.max()
avg_dist = gdf.miles.mean()

print(f"min. distance to coastline: {min_dist:,.2f} miles")
print(f"max. distance to coastline: {max_dist:,.2f} miles")
print(f"avg. distance to coastline: {avg_dist:,.2f} miles")
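In projected coordinates, the distance reported for each point is the shortest Euclidean distance to the nearest coastline geometry. For a LineString, that is the minimum over its segments of the point-to-segment distance. The NumPy sketch below illustrates the computation on a hypothetical three-vertex "coastline" in projected meters. It is meant to show what the distance column represents, not how GeoPandas computes it internally.

```python
import numpy as np


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (2-D, projected units)."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    denom = ab @ ab
    # Projection parameter of p onto the line through a and b, clamped to the segment.
    t = 0.0 if denom == 0 else np.clip((p - a) @ ab / denom, 0.0, 1.0)
    closest = a + t * ab
    return float(np.linalg.norm(p - closest))


def point_polyline_distance(p, coords):
    """Shortest distance from p to a polyline given as a sequence of vertices."""
    return min(point_segment_distance(p, coords[i], coords[i + 1])
               for i in range(len(coords) - 1))


# Hypothetical coastline in projected meters: a north-south run, then a jog northeast.
coastline = [(0.0, 0.0), (0.0, 10_000.0), (5_000.0, 15_000.0)]
policy = (-8_046.72, 5_000.0)   # hypothetical location west of the first segment

meters = point_polyline_distance(policy, coastline)
# About 8,046.72 m, or about 5.0 miles, using the article's conversion factor.
print(round(meters, 2), round(meters * 0.000621371, 3))
```

Clamping `t` to `[0, 1]` is what makes this a segment distance rather than a distance to an infinite line: a point beyond either endpoint is measured to that endpoint.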



<br>

### Rate Group Based on Distance to Coastline

Let's imagine a hypothetical rating plan that uses the following distances from the coastline to determine risk groups (each band includes its lower bound and excludes its upper bound):

* 0 to less than 5 miles: very high risk
* 5 to less than 25 miles: high risk
* 25 to less than 50 miles: medium risk
* 50 miles or more: low risk

<br>

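A rolling join via `merge_asof` can assign each policy to a group: for each row in `gdf`, it selects the last row in the right DataFrame (the group thresholds) whose `on` key is less than or equal to the left key. The key is "miles" in both DataFrames. `merge_asof` requires both DataFrames to be sorted by the key, which is why `gdf` is sorted by "miles" before the join.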
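To see how `merge_asof` treats values that fall exactly on a threshold, here is a small self-contained example with hypothetical distances. By default `allow_exact_matches=True`, so a distance of exactly 5.0 miles matches the 5.0 threshold and lands in the "high" band, consistent with lower-bound-inclusive bands.

```python
import pandas as pd

# Same thresholds as the hypothetical rating plan.
dfgroups = pd.DataFrame({
    "risk": ["very high", "high", "medium", "low"],
    "miles": [0.0, 5.0, 25.0, 50.0],
})

# Hypothetical distances, including two that sit exactly on a threshold.
toy = pd.DataFrame({
    "policy_id": ["a", "b", "c", "d", "e"],
    "miles": [3.2, 5.0, 24.9, 50.0, 71.4],
})

# merge_asof requires both frames to be sorted on the join key.
toy = toy.sort_values("miles")
out = pd.merge_asof(toy, dfgroups, on="miles", direction="backward")
print(out[["policy_id", "miles", "risk"]])
```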

In [None]:

# Create dfgroups DataFrame. 
dfgroups = pd.DataFrame({
    "risk": ["very high", "high", "medium", "low"],
    "miles": [0., 5., 25., 50.]
})

# Assign risk group to each policy location.
gdf = gdf.sort_values("miles", ascending=True)
gdf = pd.merge_asof(gdf, dfgroups, on="miles", direction="backward")

gdf.head(10)
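An equivalent way to bin distances, shown here as an alternative rather than the article's method, is `pd.cut` with explicit edges. Setting `right=False` makes each bin `[left, right)`, which matches the backward `merge_asof` assignment, and unlike `merge_asof` it does not require sorting. The distances below are hypothetical.

```python
import numpy as np
import pandas as pd

edges = [0.0, 5.0, 25.0, 50.0, np.inf]
labels = ["very high", "high", "medium", "low"]

miles = pd.Series([3.2, 5.0, 24.9, 50.0, 71.4])

# right=False: bins are [0, 5), [5, 25), [25, 50), [50, inf).
risk = pd.cut(miles, bins=edges, labels=labels, right=False)
print(risk.tolist())  # ['very high', 'high', 'high', 'low', 'low']
```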



<br> 

Counting the number of policies per risk group, ordered from highest to lowest risk:

In [None]:

gdf.risk.value_counts().reindex(["very high", "high", "medium", "low"], fill_value=0)



<br>

Finally, we assign each risk group a different color and visualize the resulting groups with folium:


In [None]:

# Colors for each group.
dcolors = {
    "very high": "#f40002",
    "high": "#ff721f",
    "medium": "#fafc15",
    "low": "#268a6d"
}

gdf["color"] = gdf["risk"].map(dcolors)

m = folium.Map(location=[mlat, mlon], zoom_start=8)

for tt in gdf.itertuples():
    lat, lon, color = tt.lat, tt.lon, tt.color
    folium.CircleMarker(
        location=[lat, lon],
        radius=6, 
        color=color, 
        fill_color=color, 
        fill=True,
        fill_opacity=1
        ).add_to(m)

m



<br>
