# Notebook 2: Intro to vector data

In Notebook 1, we created a basic land use raster plot of the OU campus and the area surrounding it. In this notebook, we'll start to work with vector data. In particular, we'll work with a shapefile of Michigan city boundaries so that we can see which municipalities the OU campus falls within. As described in the section on vector files, we can find boundary shapefiles at [https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html). I've already downloaded the one for Michigan.

In [None]:
from pathlib import Path
import geopandas as gpd


In [None]:
mi_places_file = Path('../data', 'cb_2022_26_place_500k', 'cb_2022_26_place_500k.shp')
mi_places_gdf = gpd.read_file(mi_places_file)
mi_places_gdf

Notice the `geometry` column contains POLYGON objects corresponding to the boundary for each place.

### Question

Let's find the records for Auburn Hills and Rochester Hills as these are relevant for the OU campus. We can use the pandas `query` method on a `GeoDataFrame` since it's just an extension of a pandas `DataFrame`. Store the answer in a new `GeoDataFrame` named `ou_places_gdf`.

In [None]:
# Get records for Auburn Hills and Rochester Hills

### Answer

In [None]:
ou_places_gdf = mi_places_gdf.query('NAME == "Auburn Hills" or NAME == "Rochester Hills"')
ou_places_gdf
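
As an aside, here is a small sketch of another way to write the same kind of filter. It uses a plain pandas `DataFrame` with made-up rows (`toy_places` and `targets` are hypothetical names, not part of our data) and relies on a `GeoDataFrame` supporting the same pandas operations. Passing a list to `isin`, or using `in @targets` inside `query`, scales better than chaining `or` conditions when the list of places grows.

```python
import pandas as pd

# Hypothetical stand-in for the attribute table; only the NAME column matters here.
toy_places = pd.DataFrame(
    {"NAME": ["Auburn Hills", "Pontiac", "Rochester Hills", "Troy"]}
)

targets = ["Auburn Hills", "Rochester Hills"]

# Option 1: boolean mask built with isin()
by_mask = toy_places[toy_places["NAME"].isin(targets)]

# Option 2: the same filter with query(); @targets refers to the local variable
by_query = toy_places.query("NAME in @targets")

# Both approaches select the same rows
print(by_mask.equals(by_query))
```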

### Plotting vector data in a `GeoDataFrame`

GeoPandas provides a `plot()` method for `GeoDataFrame` objects. As you might have guessed, it uses matplotlib to actually generate the plots. In general, any style options that you use in matplotlib can be passed to `plot()`. A few useful resources from the GeoPandas docs are:

- https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html
- https://geopandas.org/en/stable/docs/user_guide/mapping.html

In [None]:
ou_places_gdf.plot()

It's easy to create a [choropleth map]() by passing in a column name to use as the basis for color selection. Often we might use something like population or some other metric of interest. We don't really have such a column in our `GeoDataFrame`, but we can use any column that has different values for each city if we simply want to make the individual cities distinct. I'll use "GEOID".

In [None]:
ou_places_gdf.plot(column="GEOID")

You can tell by the axis labels that we are working in longitude (x) and latitude (y).

Let's explore the Rochester Hills polygon.

In [None]:
rh_polygon = ou_places_gdf.iloc[1]['geometry']
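
Positional indexing with `iloc` depends on the row order of the filtered `GeoDataFrame`, so it's worth confirming which row is which. Below is a rough sketch of selecting a geometry by name instead. The `toy_gdf` table and its two squares are made up for illustration and are not the real city boundaries.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical stand-in for ou_places_gdf: two toy squares, not real boundaries.
toy_gdf = gpd.GeoDataFrame(
    {"NAME": ["Auburn Hills", "Rochester Hills"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (3, 0), (3, 2), (1, 2)]),
    ],
)

# Select by name so the result does not depend on row order.
rh_toy = toy_gdf.loc[toy_gdf["NAME"] == "Rochester Hills", "geometry"].iloc[0]

print(rh_toy.geom_type)
print(rh_toy.area)
```

The same pattern should carry over to `ou_places_gdf`, assuming each name appears exactly once in the `NAME` column.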

Autoprinting a **shapely** object results in a little shape plot.

In [None]:
rh_polygon

If you want to see the actual vertices, use `print`.

In [None]:
print(rh_polygon)
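
The text that `print` shows is the geometry's well-known text (WKT) representation. As a minimal sketch on a toy triangle (the names `triangle`, `text`, and `rebuilt` are just for illustration), we can grab that string from the `wkt` property and parse it back into a geometry with `shapely.wkt.loads`.

```python
from shapely import wkt
from shapely.geometry import Polygon

triangle = Polygon([(0, 0), (1, 1), (1, 0)])

text = triangle.wkt        # WKT string, the same form that print() shows
rebuilt = wkt.loads(text)  # parse the string back into a geometry

print(text)
print(rebuilt.equals(triangle))
```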

A polygon's boundary is just an ordered collection of points in which the first and last points are identical. The `POLYGON` text we just printed is the text form of a `Polygon` object, which is a class defined in the [shapely]() library. Shapely makes it easy to work with points, curves, and surfaces in Python. Under the hood, Shapely uses the [GEOS](https://libgeos.org/) library:

> GEOS is a C/C++ library for [computational geometry](https://en.wikipedia.org/wiki/Computational_geometry) with a focus on algorithms used in [geographic information systems](https://en.wikipedia.org/wiki/Geographic_information_system) (GIS) software. It implements the OGC [Simple Features](https://en.wikipedia.org/wiki/Simple_Features) geometry model and provides all the spatial functions in that standard as well as many others. GEOS is a core dependency of [PostGIS](https://postgis.net/), [QGIS](), [GDAL](), [Shapely]() and many others.


Let's do some simple shape manipulations to understand vector data a little better.

### Points, lines and polygons

The fundamental building blocks of vector data are points, lines, and polygons. In Shapely, these correspond to the `Point`, `LineString`, and `Polygon` classes. 

In [None]:
from shapely import Point, LineString, LinearRing, Polygon

Let's start with some simple features in the standard x-y coordinate system centered at (0,0).

In [None]:
point_1 = Point(2, 3)
point_2 = Point(1, 4)

Shapely has a bunch of built-in methods and properties for working with geometric objects. For example, points have zero length and zero area.

In [None]:
print(f'point_1: {point_1}')
print(f'point_1 length: {point_1.length}')
print(f'point_1 area: {point_1.area}')
print(f'point_1 type: {point_1.geom_type}')

In [None]:
line_1 = LineString([point_1, point_2])
line_1

The line's x-y bounding box is a (minx, miny, maxx, maxy) tuple.

In [None]:
line_1.bounds

In [None]:
print(f'line_1: {line_1}')
print(f'line_1 length: {line_1.length}')
print(f'line_1 area: {line_1.area}')
print(f'line_1 type: {line_1.geom_type}')
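
As a quick sanity check, the length of a two-point line should match the straight-line distance between its endpoints. Here's a small sketch using the same coordinates as above, with new variable names so it stands on its own.

```python
import math
from shapely.geometry import LineString, Point

p1 = Point(2, 3)
p2 = Point(1, 4)
segment = LineString([p1, p2])

# Euclidean distance computed by hand from the coordinates
expected = math.hypot(p1.x - p2.x, p1.y - p2.y)

print(segment.length)
print(p1.distance(p2))
print(math.isclose(segment.length, expected))
```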

In [None]:
line_1.coords

In [None]:
list(line_1.coords)

In [None]:
line_1.coords[0]

Note that there are no truly "curved" lines in Shapely. Curves are approximated by piecewise linear segments.
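
One way to see this approximation, sketched under the assumption that we use the default `buffer` settings, is to buffer a point by a radius of 1. The result looks like a circle but is really a polygon with many short straight edges, so its area comes out slightly smaller than pi.

```python
import math
from shapely.geometry import Point

# Buffering a point gives a polygon that approximates a circle of radius 1
circle_ish = Point(0, 0).buffer(1.0)

n_vertices = len(circle_ish.exterior.coords)

print(circle_ish.geom_type)
print(n_vertices)
print(circle_ish.area, math.pi)
```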

Let's create a triangle.

In [None]:
polygon_1 = Polygon([(0, 0), (1, 1), (1, 0)])
polygon_1

In [None]:
list(polygon_1.exterior.coords)

Notice how the first point is duplicated as the last point. 

You can make holes in a polygon by passing a list of rings to the `holes` argument.

In [None]:
hole = LinearRing([(1, 0.50), (1.5, 0.50), (1.5, 0.75), (1, 0.75)])
hole

In [None]:
hole.length

In [None]:
polygon_2 = Polygon([(0, 0), (2, 2), (2, 0)], holes=[hole])
polygon_2
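
To convince ourselves the hole is really cut out, here is a small sketch that compares areas. It rebuilds the same shapes under new names (`outer`, `hole_ring`, `with_hole`) so the block stands on its own.

```python
import math
from shapely.geometry import LinearRing, Polygon

outer_coords = [(0, 0), (2, 2), (2, 0)]
hole_ring = LinearRing([(1, 0.50), (1.5, 0.50), (1.5, 0.75), (1, 0.75)])

outer = Polygon(outer_coords)                        # triangle without a hole
with_hole = Polygon(outer_coords, holes=[hole_ring])  # same triangle with the hole

# A LinearRing has no area of its own, so wrap it in a Polygon to measure it
hole_area = Polygon(hole_ring).area

print(outer.area, hole_area, with_hole.area)
print(len(with_hole.interiors))
print(math.isclose(with_hole.area, outer.area - hole_area))
```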

### Challenge: Creating a bounding box

Bounding boxes are commonly used in geospatial analysis to restrict a plot or some analysis to an area of interest. A bounding box is a rectangle, which is just a particular kind of polygon. Given what we learned about working with geometric objects above, create a minimal bounding box as a `Polygon` object that contains Auburn Hills and Rochester Hills. I'm sure there are multiple ways to do this. Then plot the bounding box as well as the Auburn Hills and Rochester Hills polygons on a single plot. Hint: Shapely has some useful plotting functions.

### Answer

In [None]:
ah_polygon = ou_places_gdf.iloc[0]['geometry']

In [None]:
print(ah_polygon.bounds)
print(rh_polygon.bounds)

In [None]:
bbox_xmin = min(ah_polygon.bounds[0], rh_polygon.bounds[0])
bbox_ymin = min(ah_polygon.bounds[1], rh_polygon.bounds[1])
bbox_xmax = max(ah_polygon.bounds[2], rh_polygon.bounds[2])
bbox_ymax = max(ah_polygon.bounds[3], rh_polygon.bounds[3])

# Create the POLYGON box from the bounds
bbox_ah_rh = Polygon([(bbox_xmin, bbox_ymin), 
                      (bbox_xmax, bbox_ymin), 
                      (bbox_xmax, bbox_ymax), 
                      (bbox_xmin, bbox_ymax)])

bbox_ah_rh

### Plotting a bounding box with additional vector data

Now let's plot the bounding box around Rochester Hills and Auburn Hills.

In [None]:
import matplotlib.pyplot as plt
from shapely.plotting import plot_polygon

In [None]:
fig, ax = plt.subplots()
ou_places_gdf.plot(ax=ax)
plot_polygon(bbox_ah_rh, ax=ax, add_points=False, color='green')

Another approach would be to combine the two polygons into a single `MultiPolygon` and then use its `bounds` property. The result should match the bounds of the box we built by hand.

In [None]:
from shapely import MultiPolygon

In [None]:
ah_rh_polygon = MultiPolygon([ah_polygon, rh_polygon])
ah_rh_polygon

In [None]:
ah_rh_polygon.bounds

In [None]:
bbox_ah_rh.bounds
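
Here is a sketch of two more ways to get a bounding box as a polygon, using toy shapes (`shape_a` and `shape_b` are made up and stand in for the two city polygons). Shapely's `box` helper builds a rectangle from a (minx, miny, maxx, maxy) tuple, and the `envelope` property returns the bounding rectangle directly.

```python
from shapely.geometry import MultiPolygon, Polygon, box

# Hypothetical toy shapes standing in for the two city polygons
shape_a = Polygon([(0, 0), (2, 0), (1, 2)])
shape_b = Polygon([(3, 1), (5, 1), (5, 4), (3, 4)])
both = MultiPolygon([shape_a, shape_b])

# Option 1: unpack the bounds tuple into box()
bbox_from_bounds = box(*both.bounds)

# Option 2: ask the geometry for its envelope
bbox_from_envelope = both.envelope

print(both.bounds)
print(bbox_from_bounds.equals(bbox_from_envelope))
```

Either result could be passed to `plot_polygon` in the same way as the hand-built box.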