### VERGE: Vector-mode Regional Geospatial Embedding

# Prepare coastline data for the region we are handling

For the VERGE effort,
one thing we want to know is the overall "land/water" polygon for a given tile.
OSM makes that a bit tricky to get, partly because world-wide coastlines
are not included in OSM as ready-made polygons.
Instead, we have to use a separately available
shapefile that provides that information. Because it is global, it is unwieldy,
so in this notebook we pull out the parts that are relevant to
our study region.


In [None]:
import geopandas
from rtree import index
import folium

In [None]:
# Define our region of interest.

# New Hampshire and Vermont.
# (lat0, lon0) is the south-west corner, (lat1, lon1) the north-east corner.
lat0, lon0 = 42.670095, -73.419252
lat1, lon1 = 45.386662, -70.897890
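
The two corners above are stored latitude-first, while the bounding boxes used later in this notebook are ordered (minx, miny, maxx, maxy), that is, longitude-first. Mixing the two orders up is an easy mistake, so here is a minimal sketch of a helper that makes the conversion explicit. `corners_to_bounds` is a hypothetical name, not part of the notebook or of any library.

```python
def corners_to_bounds(lat_a, lon_a, lat_b, lon_b):
    """Return (minx, miny, maxx, maxy) for two opposite corners.

    Hypothetical helper: x is longitude and y is latitude. It does not
    handle regions that cross the antimeridian.
    """
    minx, maxx = sorted((lon_a, lon_b))
    miny, maxy = sorted((lat_a, lat_b))
    return (minx, miny, maxx, maxy)


# The New Hampshire / Vermont corners from the cell above.
bounds = corners_to_bounds(42.670095, -73.419252, 45.386662, -70.897890)
print(bounds)
```

Because the helper sorts each axis, the corners can be passed in either order.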


In [None]:
# Read the big file containing world-wide land/water areas.
fname = 'data/land-polygons-split-4326/land_polygons.shp'
global_gdf = geopandas.read_file(fname)
print('%d polygons globally' % len(global_gdf))

In [None]:
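# Put all of those into a spatial index (an R-tree over each polygon's bounding box).
# The ids are positional (0..n-1), so they can be used with .iloc later.
spatial_index = index.Index()

for idx, geom in enumerate(global_gdf.geometry):
    # Skip missing or empty geometries; they have no usable bounds.
    if geom is not None and not geom.is_empty:
        spatial_index.insert(idx, geom.bounds)  # bounds = (minx, miny, maxx, maxy)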
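
The index stores only bounding boxes, so a query reports every polygon whose box overlaps the query box, whether or not the polygon itself reaches the region. The overlap test can be sketched in plain Python as follows. `boxes_intersect` and the sample boxes are illustrative only; a real R-tree organizes the boxes hierarchically so that it does not need to check each one.

```python
def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    return (a_minx <= b_maxx and b_minx <= a_maxx and
            a_miny <= b_maxy and b_miny <= a_maxy)


region = (-73.419252, 42.670095, -70.897890, 45.386662)

# Illustrative boxes, not taken from the shapefile.
boxes = {
    'inside':   (-72.0, 43.0, -71.0, 44.0),
    'overlaps': (-74.0, 42.0, -73.0, 43.0),
    'far_away': (10.0, 50.0, 11.0, 51.0),
}

hits = [name for name, box in boxes.items() if boxes_intersect(box, region)]
print(hits)
```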


In [None]:
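# Get the polygons whose bounding boxes intersect our region.
# rtree expects (minx, miny, maxx, maxy), i.e. longitude before latitude.
query_bounds = (lon0, lat0, lon1, lat1)
# Sort the matches so rows keep the same order as in the global file.
matches = sorted(spatial_index.intersection(query_bounds))
regional_gdf = global_gdf.iloc[matches]
print('%d land/water polygons in our region' % len(regional_gdf))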
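
Because the query matches on bounding boxes, some of the returned polygons may not actually touch the region. If that matters, the candidates can be refined with an exact geometry test. A minimal sketch using shapely, with a made-up triangle standing in for a coastline polygon:

```python
from shapely.geometry import Polygon, box

# Region as a shapely rectangle: box(minx, miny, maxx, maxy).
region = box(-73.419252, 42.670095, -70.897890, 45.386662)

# Made-up triangle whose bounding box overlaps the region's south-west
# corner, although the triangle itself lies entirely outside the region.
triangle = Polygon([(-75.0, 43.0), (-73.0, 41.0), (-75.0, 41.0)])

bbox_hit = box(*triangle.bounds).intersects(region)   # bounding boxes overlap
exact_hit = triangle.intersects(region)               # actual shapes do not
print(bbox_hit, exact_hit)
```

Applied to the GeoDataFrame above, the same refinement would look something like `regional_gdf[regional_gdf.intersects(region)]`. For a visual check or a coarse regional extract, keeping the extra bounding-box candidates is usually harmless.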


In [None]:
# See what we got: plot the selected polygons on an interactive map.
center_lon = (lon0 + lon1) / 2.0
center_lat = (lat0 + lat1) / 2.0

map_center = [center_lat, center_lon]
m = folium.Map(location=map_center, zoom_start=7)
for _, row in regional_gdf.iterrows():
    if row['geometry'].geom_type in ['Polygon', 'MultiPolygon']:
        geo_json = folium.GeoJson(row.geometry)
        geo_json.add_to(m)
m


In [None]:
# Save the regional subset as an ESRI Shapefile.
fname = 'data/coastlines'
regional_gdf.to_file(fname, driver="ESRI Shapefile")