[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jrkasprzyk/CVEN5393/blob/main/Colab%20Notebooks/streamflow_index-sequential.ipynb)

*This notebook is part of course notes for CVEN 5393: Water Resource Systems and Management, by Prof. Joseph Kasprzyk at CU Boulder.*

This notebook is an implementation of the Index Sequential Method, as published below:

Ouarda, TBMJ, JW Labadie, DG Fontane (1997) "Indexed Sequential Hydrologic Modeling for Hydropower Capacity Estimation" *Journal of the American Water Resources Association* 33(6): 1337-1349. [DOI](https://doi.org/10.1111/j.1752-1688.1997.tb03557.x)

**Getting Ready**

As usual, we import our libraries and read in the data. Make sure the Excel file has been uploaded to your Colab session; a copy is provided on GitHub.

To keep the illustrative plots readable, we work only with the Des Moines data from 1920 to 1940.

In [None]:
import pandas as pd #for dataframes and data processing
import numpy as np #for numerical computation
import matplotlib.pyplot as plt #for plotting
import sys #system functions
from scipy import interpolate #bring in only the interpolate submodule
import plotly.express as px #plotly express for fast interactive plotting
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [None]:
def convert_cms_to_mcm(val):
  # converts from cubic meters per second to millions of cubic meters
  # (assumes a length of time equal to one day)
  return val*24*60*60/1e6
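
As a quick sanity check on the unit conversion, the sketch below repeats the same arithmetic on two made-up flow values. The name `cms_to_mcm_per_day` is hypothetical and is used only so that the example stands alone; it assumes, as above, that each flow value applies for one full day.

```python
def cms_to_mcm_per_day(flow_cms):
    # flow (m^3/s) times the seconds in one day gives m^3 per day;
    # dividing by 1e6 expresses that volume in millions of cubic meters
    seconds_per_day = 24 * 60 * 60
    return flow_cms * seconds_per_day / 1e6

# made-up flows, for illustration only
one_cms = cms_to_mcm_per_day(1.0)
hundred_cms = cms_to_mcm_per_day(100.0)
print(one_cms, hundred_cms)
```

A steady flow of 1 m3/s therefore corresponds to roughly 0.0864 million cubic meters per day.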

In [None]:
dm_data = pd.read_excel('Des_Moines_River_flow.xls', index_col=0)

dm_inflow = convert_cms_to_mcm(dm_data['Average flow (m3/s)'].to_numpy(copy=True))

# only work with the data from 1920 to 1940
dm_short = convert_cms_to_mcm(dm_data['1920':'1940']['Average flow (m3/s)'].to_numpy(copy=True))

# Index Sequential Method

In the index sequential method (ISM), successive 'chunks' of data are taken from different starting points, allowing the record to 'wrap around' to create new sequences.

In the function, we 'double' the original record (i.e., concatenate it with itself), as we did for the Sequent Peak method. In addition to creating each sequence, we save the position in the doubled record that each value came from. We use this to create an illustrative plot in this notebook, but the same bookkeeping also helps with more complicated methods such as K-Nearest Neighbor resampling, which also store the indices of the sampled flow values.
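
The doubling idea can be checked on a record small enough to read by eye. The sketch below uses made-up flow values and hypothetical variable names; it shows that a window taken from the doubled record wraps back to the start of the original record, and that a position in the doubled record can be mapped back to the original record with the modulo operator.

```python
import numpy as np

record = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up flows
n = len(record)
doubled = np.concatenate((record, record), axis=None)

# a window of 4 values starting at position 3 runs past the end of the record
start, length = 3, 4
positions = np.arange(start, start + length)  # positions in the doubled record
window = doubled[positions]

# map each position back to the original record, and flag the repeated part
original_positions = positions % n
is_repeat = positions >= n

print(window)
print(original_positions)
print(is_repeat)
```

The flag `positions >= n` is the same test that separates original data from repeated data in the plot later in this notebook.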

Parameters of the ISM include `k`, the number of timesteps between the starting points of successive traces, and `trace_length`, the desired length of each new sequence. The length of the trace can be related to the time horizon of the plan or decision that you are evaluating, such as the design life of new infrastructure.

In [None]:
def create_ism_traces(inflow, k, trace_length):

  #inputs
  # inflow: a 1d numpy array of flows
  # k: the number of timesteps between the starts of successive traces
  # trace_length: the desired length of each trace

  #return: two 2d numpy arrays (rows are timesteps, columns are traces):
  # traces holds the flow values, and indices holds the position in the
  # doubled record that each value came from

  # the number of traces is a known function of
  # the index k and the total length of the inflow record
  num_traces = int(np.floor(len(inflow)/k))

  #print("num_traces=%d"%num_traces)

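  traces = np.zeros((trace_length,num_traces))

  # positions in the doubled record, stored as integers
  indices = np.zeros((trace_length,num_traces), dtype=int)

  # because the traces 'wrap around', we need a doubled record
  inflow_doubled = np.concatenate((inflow, inflow), axis=None)

  # the last trace must still fit inside the doubled record
  if (num_traces-1)*k + trace_length > len(inflow_doubled):
    raise ValueError("trace_length is too long for a doubled record")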

  j = 0 # the starting point of this trace
  for i in range(num_traces):
    traces[:, i] = inflow_doubled[j:j+trace_length]
    indices[:, i] = range(j, j+trace_length)
    j = j+k

  return traces, indices
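
To see the wrap-around on a small example, the sketch below builds traces from an eight-value toy record. It is a compact illustration rather than the function above: the hypothetical helper `toy_ism` uses modulo indexing in place of a doubled record, which should give the same traces whenever every trace fits inside the doubled record.

```python
import numpy as np

def toy_ism(inflow, k, trace_length):
    # hypothetical compact variant: modulo indexing instead of a doubled record
    n = len(inflow)
    starts = np.arange(n // k) * k  # starting position of each trace
    # rows are timesteps, columns are traces
    positions = starts[np.newaxis, :] + np.arange(trace_length)[:, np.newaxis]
    return inflow[positions % n], positions

toy_record = np.arange(1, 9)  # made-up record with the values 1 through 8
toy_traces, toy_positions = toy_ism(toy_record, k=2, trace_length=5)

# transpose so that each printed row is one trace
print(toy_traces.T)
```

With `k=2` and `trace_length=5`, the four traces start at positions 0, 2, 4, and 6, and the last two run past the end of the record and continue from its beginning.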


We implement the method on the 1920-1940 Des Moines timeseries, which is daily data. A value of `k=3*365` and `trace_length=10*365` means that each trace starts 3 years after the previous one and is 10 years long (treating every year as 365 days).

In [None]:
dm_traces, dm_indices = create_ism_traces(dm_short, 3*365, 10*365)

num_traces=6
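
The number of traces and the amount of repeated data in each trace follow from simple arithmetic on `k`, `trace_length`, and the record length. The sketch below works through that arithmetic for a hypothetical record of exactly 20 years of 365 days; this length is an assumption for illustration and is not read from the Des Moines file.

```python
days_per_year = 365                  # leap days are ignored
record_length = 20 * days_per_year   # hypothetical record length (assumption)
k = 3 * days_per_year                # skip between trace starts
trace_length = 10 * days_per_year    # length of each trace

# one trace for every full skip of k timesteps that fits in the record
num_traces = record_length // k

# starting position of each trace in the record
starts = [i * k for i in range(num_traces)]

# days in each trace that fall past the end of the record (repeated data)
wrapped_days = [max(0, s + trace_length - record_length) for s in starts]

print(num_traces)
print(starts)
print(wrapped_days)
```

Under these assumptions only the traces that start late in the record contain repeated data, since a trace wraps only when its start plus `trace_length` exceeds the record length.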


Now we plot our results, converting them into DataFrames for convenience.

In [None]:
trace_df = pd.DataFrame(dm_traces)
index_df = pd.DataFrame(dm_indices)

In [None]:
num_traces = len(trace_df.columns)

# length of the original record
length_original = len(dm_short)

fig = make_subplots(rows=num_traces,
                    cols=1,
                    shared_yaxes='columns',
                    shared_xaxes=True)

# the data colored blue comes from the original record,
# and the red data comes from after the record begins to repeat
for i in range(num_traces):
  fig.add_trace(go.Scatter(x=trace_df.index[index_df[i]<length_original],
                           y=trace_df[i][index_df[i]<length_original],
                           line_color='blue'
                           ),
              row=i+1, col=1)
  fig.add_trace(go.Scatter(x=trace_df.index[index_df[i]>=length_original],
                           y=trace_df[i][index_df[i]>=length_original],
                           line_color='red'
                           ),
              row=i+1, col=1)

fig.update_layout(height=0.75*1024, width=0.75*1280,
                  showlegend=False,
                  title_text="Index Sequential Results (Red: Repeating Data)")
fig.update_xaxes(tick0=1, dtick=365)

fig.show()
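
The blue and red segments in the plot come from boolean masks on the stored positions. The sketch below shows the same split on a five-value toy trace; the names `toy_trace`, `toy_index`, and `toy_length_original` are hypothetical, and the values are made up.

```python
import pandas as pd

# one toy trace and the doubled-record positions its values came from
toy_trace = pd.Series([5.0, 6.0, 7.0, 8.0, 1.0])
toy_index = pd.Series([4, 5, 6, 7, 8])
toy_length_original = 8  # length of the (made-up) original record

# positions below the record length come from the original record
original_part = toy_trace[toy_index < toy_length_original]

# positions at or beyond the record length are repeated data
repeated_part = toy_trace[toy_index >= toy_length_original]

print(original_part)
print(repeated_part)
```

Each part keeps its original row labels, so the two segments line up on a shared x-axis when they are drawn as separate lines.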

In [None]:
# a plot of the original data used to create traces
#fig_historical = px.line(dm_data['1920':'1940'])
#fig_historical.show()

# the below plots each trace in its own subplot
# the plot works, but it is busy!
#trace_df.plot(subplots=True)

# the below uses plotly express to plot
# all the traces on top of one another
# very busy!
#fig_flow = px.line(trace_df)
#fig_flow.show()