# Time Series Analysis with Pandas Cont.

In hydrology and other Earth and environmental sciences, we're often interested in statistical outliers in our time series data. These often represent extreme events like floods and droughts, for which we need either to design infrastructure (like bridges and culverts) or to develop management strategies. Below we explore some of the helpful tools that `Pandas` offers for time series analysis through two examples: estimating the 100-year flood on the Boise River, and estimating a low-flow metric called the 7Q10 for the same river.

Let's start by getting a relatively long (>100 years) record of daily streamflow.

In [None]:
from dataretrieval import nwis
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

startDate = '1920-01-01'
endDate = '2024-12-31'

gage = '13185000' # Boise River Near Twin Springs

BoiseRiverQ = nwis.get_dv(sites=gage, parameterCd='00060', start=startDate, end=endDate)[0]
BoiseRiverQ['00060_Mean'].plot()
plt.xlabel('Date')
plt.ylabel('Flow [ft\u00b3/s]')
plt.show()

## Annual Maximum Series for Flood Estimation

There are a number of ways to estimate the 100-year flood. By definition, it is the annual maximum daily flow that is equaled or exceeded, on average, once every 100 years; equivalently, it has a 1% chance of being equaled or exceeded in any given year. We will use a relatively straightforward, non-parametric method based on the Weibull plotting position formula, which assigns a non-exceedance probability to every annual maximum flow based on its rank. The Weibull probability for each annual maximum flow is calculated as:

$$
P_{NE} = \frac{r}{N + 1}
$$

where $r$ is the rank of that flow in the annual maximum series sorted in ascending order (so $r = 1$ is the smallest annual maximum), and $N$ is the number of years in the analyzed data record. So, first we need to get the annual maximum series using the helpful `.groupby()` method:

In [None]:
BR_AnnMaxQ = BoiseRiverQ['00060_Mean'].groupby(BoiseRiverQ.index.year).max().values
BR_AnnMaxQ

Next, we need to sort these annual maximum streamflows:

In [None]:
BR_AnnMaxQsort = np.sort(BR_AnnMaxQ)
BR_AnnMaxQsort

Now, we compute a Weibull plotting position non-exceedance probability. Note that the calculation of this probability does __not__ involve the annual maximum flow itself.  

In [None]:
BR_weibull = (np.arange(BR_AnnMaxQsort.size) + 1) / (BR_AnnMaxQsort.size + 1)
BR_weibull
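
To make the rank-only nature of the plotting position concrete, here is a minimal sketch on a hypothetical four-year record. The flow values and the names `ann_max`, `p_ne`, and `return_period` are made up for illustration; the return period is taken as the reciprocal of the exceedance probability $1 - P_{NE}$.

```python
import numpy as np

# Hypothetical annual maxima for a four-year record (illustrative values only)
ann_max = np.array([300.0, 120.0, 450.0, 200.0])

ann_max_sorted = np.sort(ann_max)              # ascending: rank 1 is the smallest
rank = np.arange(1, ann_max_sorted.size + 1)   # r = 1, 2, ..., N
p_ne = rank / (ann_max_sorted.size + 1)        # Weibull non-exceedance probability
return_period = 1.0 / (1.0 - p_ne)             # years, from the exceedance probability

for q, p, t in zip(ann_max_sorted, p_ne, return_period):
    print(f"Q = {q:6.1f}  P_NE = {p:.2f}  T = {t:.2f} yr")
```

Changing any flow value without changing the ordering leaves `p_ne` untouched. Note also that the largest flow in an $N$-year record is assigned a return period of $N + 1$ years, so a record needs at least 99 years before the 1% exceedance level falls within the plotted points.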

The annual maximum flows come in when we plot: the sorted annual maxima go on the x-axis and the corresponding probabilities go on the y-axis. Note that we plot $1 - P_{NE}$, which is the __exceedance__ probability. We can then use this plot to read off, or interpolate, the flow that is equaled or exceeded in only 1 out of 100 years, on average.

In [None]:
fig = plt.figure(figsize=(12,10))
plt.rcParams.update({'font.size': 14})
plt.plot(BR_AnnMaxQsort, 1-BR_weibull,'k.')
plt.hlines(0.01, xmin=1000, xmax=BR_AnnMaxQsort.max(), colors='r')
plt.title('Boise River 100-year Return Period Flood Estimation')
plt.xlabel('Annual Maximum Streamflow [ft\u00b3/s]')
plt.ylabel('Exceedance Probability')
plt.grid(linestyle=':')  # dotted grid lines; a bare ':' would be read as the on/off flag
plt.show()
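
Rather than reading the value off the plot by eye, the interpolation can be sketched with `np.interp`. The helper name `weibull_flow_at_exceedance` and the synthetic input are assumptions for illustration, not part of the analysis above; in practice the unsorted annual maximum array would be passed in instead.

```python
import numpy as np

def weibull_flow_at_exceedance(annual_max, p_exceed):
    """Interpolate the flow at a given exceedance probability (hypothetical helper)."""
    q_sorted = np.sort(np.asarray(annual_max, dtype=float))
    p_ne = np.arange(1, q_sorted.size + 1) / (q_sorted.size + 1)
    target = 1.0 - p_exceed
    # np.interp clamps to the end points instead of extrapolating, so refuse
    # targets that fall outside the range of the plotting positions
    if not (p_ne[0] <= target <= p_ne[-1]):
        raise ValueError("target probability is outside the plotted range")
    # np.interp needs increasing x values, so interpolate on P_NE rather than 1 - P_NE
    return float(np.interp(target, p_ne, q_sorted))

# Synthetic stand-in for the annual maxima: 199 'years' with maxima 1, 2, ..., 199
synthetic_ann_max = np.arange(1.0, 200.0)
q100 = weibull_flow_at_exceedance(synthetic_ann_max, 0.01)
print(q100)
```

Linear interpolation between the two largest observations is only a rough estimate, since the upper tail of the plot is defined by very few points.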


## Boise River 7Q10 Estimation

The so-called 7Q10 is a streamflow time series metric that is often used to assess ecological risk for aquatic ecosystems. It has been shown to correlate with the occurrence of fish kills and other adverse habitat conditions. By definition, the 7Q10 is the annual minimum 7-day average flow that occurs, on average, once every 10 years; that is, an annual minimum with a 10% non-exceedance probability in any given year. Turning this definition into explicit steps to perform on a time series of daily streamflow data, we need to:

1. Compute the 7-day running average streamflow,
2. Find the minimum value for every year of record,
3. Perform a frequency analysis to determine the 7-day average annual minimum that occurs, on average, once every 10 years.

In practice, this only differs from the flood frequency analysis in that we're interested in: (1) annual minimums of (2) 7-day running average streamflow. For the latter point, fortunately `Pandas` offers us the `.rolling()` method.

In [None]:
BR_7dayAnnMinQ = BoiseRiverQ['00060_Mean'].rolling(window=7).mean().groupby(BoiseRiverQ.index.year).min().values
BR_7dayAnnMinQ
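
To see what the chained `.rolling().mean().groupby().min()` call does, here is a small sketch on ten days of made-up flows that span a calendar-year boundary. The dates, values, and variable names are assumptions chosen only to make the behavior easy to check by hand.

```python
import numpy as np
import pandas as pd

# Ten days of made-up flows (1, 2, ..., 10) spanning a calendar-year boundary
dates = pd.date_range("1999-12-28", periods=10, freq="D")
flow = pd.Series(np.arange(1.0, 11.0), index=dates)

# Trailing 7-day mean: each value averages that day and the six days before it,
# so the first six entries are NaN (fewer than seven observations available)
flow_7day = flow.rolling(window=7).mean()

# Annual minimum of the 7-day mean; .min() skips NaN within each year
ann_min_7day = flow_7day.groupby(flow_7day.index.year).min()
print(flow_7day)
print(ann_min_7day)
```

Each 7-day mean is labeled with the last day of its window, and windows do not reset at the start of a year, so the value for 3 January 2000 includes flows from late December 1999. In this toy series, 1999 has no complete window and its annual minimum is `NaN`.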

Again, let's sort these, although it's not strictly necessary for the method we'll use. 

In [None]:
BR_7dayAnnMinQsort = np.sort(BR_7dayAnnMinQ)
BR_7dayAnnMinQsort

Let's use a histogram to get the empirical probability density and cumulative distribution functions.

In [None]:
count, qbins_edge = np.histogram(BR_7dayAnnMinQsort, bins=25)

Use the histogram information to get the cumulative distribution function:

In [None]:
CDF_7qAnnMin = count.cumsum()/count.sum()
CDF_7qAnnMin
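
The step from bin counts to a cumulative distribution can be checked by hand on a tiny example. The ten values and four bins below are made up for illustration.

```python
import numpy as np

# Ten made-up values, binned into four equal-width bins
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], dtype=float)
count, edges = np.histogram(values, bins=4)

# Running total of the counts divided by the sample size gives the fraction of
# values that fall in each bin or any bin below it
cdf = count.cumsum() / count.sum()

print(edges)   # bin edges
print(count)   # number of values per bin
print(cdf)     # cumulative fraction per bin
```

Each entry of `cdf` accounts for all values up to the right edge of its bin, and the last entry is always 1.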

Compute the bin centers:

In [None]:
qbins_ctr = (qbins_edge[0:-1] + qbins_edge[1:])/2

Plot the result and mark the 10% non-exceedance probability.

In [None]:
fig = plt.figure(figsize=(12,10))
plt.rcParams.update({'font.size': 14})
plt.plot(qbins_ctr, CDF_7qAnnMin,'k-')
plt.hlines(0.1, xmin=150, xmax=qbins_edge.max(), colors='r')
plt.title('Boise River 7Q10 Estimation')
plt.xlabel('7-day Average Minimum Streamflow [ft\u00b3/s]')
plt.ylabel('Non-Exceedance Probability')
plt.grid(linestyle=':')  # dotted grid lines; a bare ':' would be read as the on/off flag
plt.show()
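
The value where the curve crosses the 10% line can also be estimated numerically. The following is a minimal sketch that interpolates a histogram-based CDF at bin centers, as in the plot above; the helper name `flow_at_nonexceedance` and the synthetic input are assumptions for illustration, and the annual 7-day minima would be passed in instead.

```python
import numpy as np

def flow_at_nonexceedance(annual_min, p_ne, bins=25):
    """Interpolate a histogram-based CDF at probability p_ne (hypothetical helper)."""
    count, edges = np.histogram(np.asarray(annual_min, dtype=float), bins=bins)
    cdf = count.cumsum() / count.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    # np.interp clamps outside [cdf[0], cdf[-1]]; it also assumes cdf is increasing,
    # so empty bins (flat CDF segments) make the answer less well defined
    return float(np.interp(p_ne, cdf, centers))

# Synthetic stand-in for the annual 7-day minima: 100 'years' with minima 1, 2, ..., 100
synthetic_ann_min = np.arange(1.0, 101.0)
q7_10 = flow_at_nonexceedance(synthetic_ann_min, 0.1, bins=10)
print(q7_10)
```

Pairing each cumulative fraction with the bin center follows the plot above; using the right bin edges (`edges[1:]`) instead would shift the estimate up by half a bin width, and the choice of `bins` also changes the result, so the number should be treated as approximate.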