## 04.5: Working With Vector Data Using `geopandas`

In this notebook, we illustrate how to carry out some basic operations on geospatial data stored as vector data. Vector data differs significantly from raster data, which is the format of geospatial data we've been working with so far. The easiest way to conceptualize vector data is as a table in which the columns represent variables or fields and the rows represent geographic records or "features." Importantly, each geographic record or feature consists of only one fundamental geometry type: points, lines (or segments), or polygons. In GIS platforms like ArcGIS or QGIS, this table is known as the "attribute table," which you can open by right-clicking on a shapefile/layer/coverage in the data explorer window.

Each geographic record is associated with information that defines its location in space. For example, point records are associated (or "related") with their x- and y-coordinates in the coordinate reference system of the dataset. Line records or segments are associated with at least two vertices, each with its own x- and y-coordinates, as well as the length or distance between those vertices. Finally, a polygon record is associated with the vertices and line segments that enclose the polygon, as well as the area those line segments encompass.

The line segments of polygons are a bit more complicated because they also carry information about which side of the segment the polygon lies on as you "navigate" from one vertex to the next in a defined direction along the exterior of the polygon. Most commonly, the line segments are ordered such that the polygon is always to the *_right_* of a line segment when moving clockwise from one line segment to the next.
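To make the idea of vertex ordering concrete, the following is a minimal, pure-Python sketch (not part of `geopandas`) that uses the shoelace formula to compute the signed area of a polygon ring. The function names `signed_area` and `is_clockwise` and the toy square are hypothetical and chosen only for illustration. Under the usual convention with x increasing to the right and y increasing upward, a clockwise ring yields a negative signed area.

```python
def signed_area(ring):
    """Signed area of a ring via the shoelace formula.

    `ring` is a list of (x, y) vertices; the last vertex is implicitly
    connected back to the first. The result is negative for clockwise
    rings and positive for counterclockwise rings.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_clockwise(ring):
    """True if the vertices are ordered clockwise."""
    return signed_area(ring) < 0


# Hypothetical 2 x 2 square traversed north, east, south, west (clockwise)
square_cw = [(0, 0), (0, 2), (2, 2), (2, 0)]

print(is_clockwise(square_cw))        # vertex order of the ring
print(abs(signed_area(square_cw)))    # unsigned area enclosed by the ring
```

Reversing the vertex list flips the sign of the result while leaving the magnitude (the enclosed area) unchanged, which is why the vertex order alone is enough to encode which side of each segment the polygon lies on.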

With this in mind, it becomes clear why we call GIS data a "relational database." The database (table of data) has records (rows) that relate to some geographic feature (a point, line, or polygon). We have already seen that the `pandas` library is very powerful for dealing with tabular data (in our case in the form of time series, where every record consisted of one interval/point in time). The `geopandas` library is an extension of the `pandas` framework that associates each record in a data table with a geographic feature. Although `geopandas` is not a full-featured GIS platform, it provides helpful utilities for creating, editing, and reading vector data and for performing basic geospatial operations (such as reprojection, clipping, and buffering) in Python, without the computational overhead of a full GIS GUI. In the notebook below, we carry out a common workflow that a hydrologist might need to do. Specifically, we will use an existing watershed boundary dataset (a polygon dataset) to clip another dataset that contains the delineated river network (a line dataset) over a broader region.
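Before working with the real datasets, it may help to see the same clipping workflow on toy data. The sketch below builds two small `GeoDataFrame` objects by hand and clips a line layer with a polygon layer using `gpd.clip`. The column names, geometries, and the CRS are all hypothetical; the CRS is an arbitrary projected system chosen only so that lengths are in meters.

```python
import geopandas as gpd
from shapely.geometry import LineString, Polygon

# Hypothetical "watershed": a single 10 m x 10 m square polygon
toy_ws_gdf = gpd.GeoDataFrame(
    {"name": ["toy_basin"]},
    geometry=[Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])],
    crs="EPSG:32611",  # assumed projected CRS, for illustration only
)

# Hypothetical "streams": one line crossing the square, one entirely outside
toy_streams_gdf = gpd.GeoDataFrame(
    {"stream_id": [1, 2]},
    geometry=[
        LineString([(-5, 5), (15, 5)]),   # crosses the polygon
        LineString([(20, 0), (20, 10)]),  # lies outside the polygon
    ],
    crs="EPSG:32611",
)

# Keep only the portions of the lines that fall inside the polygon
toy_clipped_gdf = gpd.clip(toy_streams_gdf, toy_ws_gdf)

print(toy_clipped_gdf)
print(toy_clipped_gdf.length)  # length of each clipped line, in CRS units
```

Each row still carries its attribute (`stream_id`) alongside its geometry, which is the "attribute table plus feature" structure described above. The line that lies outside the polygon is dropped, and the crossing line is trimmed to the polygon boundary.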

## 1. Imports and Definitions

In [None]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt

nhd_file = '../data/NHD_H_1705_HU4_GPKG.gpkg' # Middle Snake (inc. Boise)
ws_file = '../data/ubrb/upper_boise_geometry.shp'

epsg_utm = 32611 # WGS 84 / UTM zone 11N, the zone covering the Boise basin (~116° W)

In [None]:
ws_gdf = gpd.read_file(ws_file)
ws_gdf

In [None]:
ws_gdf.explore()

In [None]:
gpd.list_layers(nhd_file)

In [None]:
nhd_gdf = gpd.read_file(nhd_file, layer='NHDFlowline')
nhd_gdf

In [None]:
fig, ax = plt.subplots(figsize=(14,14))
ws_gdf.plot(color='lightblue', edgecolor='black', ax=ax)
nhd_gdf.plot(color='blue', linewidth=0.5, ax=ax)
ax.set_title('Upper Boise River Basin Hydrography', fontsize=16)
ax.set_xlabel('Longitude', fontsize=14)
ax.set_ylabel('Latitude', fontsize=14)
plt.show()

In [None]:
ws_gdf.crs

In [None]:
nhd_gdf.crs

In [None]:
ws_utm_gdf = ws_gdf.to_crs(epsg=epsg_utm)
ws_utm_gdf.crs

In [None]:
nhd_utm_gdf = nhd_gdf.to_crs(epsg=epsg_utm)
nhd_utm_gdf.crs

In [None]:
nhd_clipped_gdf = gpd.clip(nhd_utm_gdf, ws_utm_gdf)
nhd_clipped_gdf

In [None]:
fig, ax = plt.subplots(figsize=(12,12))
ws_utm_gdf.plot(color='lightblue', edgecolor='black', ax=ax)
nhd_clipped_gdf.plot(color='blue', linewidth=0.5, ax=ax)
ax.set_title('Clipped Upper Boise River Basin Hydrography', fontsize=16)
ax.set_xlabel('Easting (m)', fontsize=14)
ax.set_ylabel('Northing (m)', fontsize=14)
plt.show()