# Peak Summaries from USGS STN Flood Event Data
The United States Geological Survey (USGS) maintains a database of flood event data known as the [Short-Term Network (STN)](https://stn.wim.usgs.gov/stnweb/#/). This database has a convenient [web front-end](https://stn.wim.usgs.gov/FEV/) and also a [RESTful API](). Available data types include instruments, peak summaries, high water marks (HWMs), and sites. This notebook investigates how to work with peak summaries, specifically how to associate them with HWMs.

In [102]:
import pandas as pd
import folium

from pygeohydro import STNFloodEventData
from pygeohydro.us_abbrs import CONTIGUOUS

We will run filtered queries covering every state in the contiguous US.

In [103]:
# Join the contiguous-state abbreviations into a comma-separated string
states = ",".join(CONTIGUOUS)

# Disable caching of the web requests
ar_kwargs = {"disable": True}

This code block retrieves the filtered data for both HWMs and peak summaries. 

In [104]:
hwms_fltrd = STNFloodEventData.get_filtered_data("hwms", async_retriever_kwargs=ar_kwargs, query_params={"States": states})
peaks_fltrd = STNFloodEventData.get_filtered_data("peaks", async_retriever_kwargs=ar_kwargs, query_params={"States": states})

To make the data type explicit, we add a `data_type` column to each dataframe.

In [105]:
hwms_fltrd["data_type"] = "hwm"
peaks_fltrd["data_type"] = "peak"

This section removes rows from the HWM and peak dataframes whose peak summary IDs do not appear in both.

In [106]:
id_field = "peak_summary_id"

ids = set(hwms_fltrd.loc[:, id_field]).intersection(set(peaks_fltrd.loc[:, id_field]))

hwms_with_ids = (
    hwms_fltrd
    .loc[hwms_fltrd.loc[:,id_field].isin(ids), :]
    .reset_index(drop=True)
)
peaks_with_ids = (
    peaks_fltrd
    .loc[peaks_fltrd.loc[:,id_field].isin(ids), :]
    .reset_index(drop=True)
)

print(len(ids), len(hwms_with_ids), len(peaks_with_ids))

10819 15235 10819
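The same filtering pattern can be tried on a small pair of tables. The sketch below is illustrative only: `toy_hwms`, `toy_peaks`, and all of their values are invented, not STN data, and plain pandas DataFrames stand in for the frames returned by `get_filtered_data`.

```python
import pandas as pd

# Hypothetical toy tables; the IDs and values are invented for illustration.
toy_hwms = pd.DataFrame(
    {"peak_summary_id": [1, 1, 2, 5], "hwm_id": [10, 11, 12, 13]}
)
toy_peaks = pd.DataFrame(
    {"peak_summary_id": [1, 2, 3], "peak_stage": [4.2, 3.1, 7.8]}
)

# IDs that are present in both tables
shared_ids = set(toy_hwms["peak_summary_id"]).intersection(
    toy_peaks["peak_summary_id"]
)

# Keep only the rows whose ID is shared, then renumber the index
toy_hwms_shared = toy_hwms.loc[
    toy_hwms["peak_summary_id"].isin(shared_ids)
].reset_index(drop=True)
toy_peaks_shared = toy_peaks.loc[
    toy_peaks["peak_summary_id"].isin(shared_ids)
].reset_index(drop=True)

print(len(shared_ids), len(toy_hwms_shared), len(toy_peaks_shared))
```

In this toy case the counts are `2 3 2`: two HWM rows share ID 1, so the HWM table keeps more rows than there are shared IDs. The real output above shows the same shape, with 15235 HWM rows for 10819 shared IDs.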


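The following section removes geometries that occur in both the peaks and HWMs datasets. It keeps only rows whose geometry appears in exactly one of the two datasets (the symmetric difference). Note that the geometry sets are built from the unfiltered dataframes, while the filter is applied to the ID-matched dataframes from the previous step.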

In [107]:
id_field = 'geometry'
ids = set(hwms_fltrd.loc[:, id_field]).symmetric_difference(set(peaks_fltrd.loc[:, id_field]))

hwms_with_ids = (
    hwms_with_ids
    .loc[hwms_with_ids.loc[:,id_field].isin(ids), :]
    .reset_index(drop=True)
)
peaks_with_ids = (
    peaks_with_ids
    .loc[peaks_with_ids.loc[:,id_field].isin(ids), :]
    .reset_index(drop=True)
)

print(len(ids), len(hwms_with_ids), len(peaks_with_ids))

17793 2146 423
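To make the set operation concrete, here is a minimal sketch of what `symmetric_difference` keeps. It assumes plain coordinate tuples as stand-ins for the geometry objects held in the `geometry` column; the points themselves are invented.

```python
# Hypothetical coordinate tuples standing in for point geometries.
hwm_points = {(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)}
peak_points = {(1.0, 1.0), (3.0, 3.0)}

# Points found in exactly one of the two sets
unique_points = hwm_points.symmetric_difference(peak_points)

# Equivalent formulation: union minus intersection
assert unique_points == (hwm_points | peak_points) - (hwm_points & peak_points)

print(sorted(unique_points))
```

The shared point `(1.0, 1.0)` is dropped from the result, while points unique to either set are kept. Filtering a dataframe with `.isin(unique_points)` therefore discards every row located at a shared point, in both datasets.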


Now we can plot the HWMs that have peak summaries alongside the peak summary points. All geometry duplicates have been removed across the two datasets.

In [108]:
# Use a name other than `map` to avoid shadowing the Python builtin
flood_map = hwms_with_ids.explore(color="green")
flood_map = peaks_with_ids.explore(color="red", m=flood_map)
flood_map

Zooming in to neighborhood scale, the peak points generally appear to be located within water bodies.

Now we will merge peak stages and discharges with the HWMs datasets.

In [115]:
# `join(on=...)` matches the left column against the *index* of the right
# frame, so index the peaks by `peak_summary_id` before joining.
hwm_peaks = hwms_with_ids.join(
    peaks_with_ids.set_index("peak_summary_id").loc[:, ["peak_stage", "peak_discharge"]],
    on="peak_summary_id",
    how="inner",
)
hwm_peaks

Unnamed: 0,peak_summary_id,latitude,longitude,eventName,hwmTypeName,hwmQualityName,verticalDatumName,verticalMethodName,approvalMember,markerName,...,hwm_label,files,approval_id,uncertainty,hwm_uncertainty,geometry,data_type,peak_stage,peak_discharge
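When joining on `peak_summary_id`, it matters what `DataFrame.join` compares the key against: with `on=`, the left column is looked up in the right frame's index, not in a right-hand column of the same name. The sketch below illustrates this with invented toy tables (`left`, `right`, and their values are hypothetical), and shows `merge` as an alternative that matches column to column.

```python
import pandas as pd

# Hypothetical toy tables; values are invented for illustration.
left = pd.DataFrame(
    {"peak_summary_id": [101, 102, 102], "hwm_label": ["a", "b", "c"]}
)
right = pd.DataFrame(
    {"peak_summary_id": [101, 102], "peak_stage": [4.2, 3.1]}
)

# `join(on=...)` looks up the left column's values in the index of `right`.
# With the default 0..n-1 index, neither ID (101, 102) is found.
by_position = left.join(right[["peak_stage"]], on="peak_summary_id", how="inner")

# Indexing `right` by the ID makes the lookup match on the ID itself.
by_id = left.join(
    right.set_index("peak_summary_id")[["peak_stage"]],
    on="peak_summary_id",
    how="inner",
)

# `merge` on the shared column gives the same pairing.
merged = left.merge(right, on="peak_summary_id", how="inner")

print(len(by_position), len(by_id), len(merged))
```

Here the positional lookup returns no rows, while both ID-based variants return all three HWM rows with their peak stage attached. On real data the positional lookup can fail more quietly, because small integer IDs may coincide with row positions and pair HWMs with unrelated peaks.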
