<img src="Images/HSP2.png" />

$\textbf{HSP}^{\textbf{2}}\ \text{and}\ \textbf{HSP2}\ $ Copyright 2017 by RESPEC INC. and released under this [License](LegalInformation/License.txt)

# INTRODUCTION: MIGRATION OF HSPF TO HSP2 AND RUNNING HSP2

This tutorial notebook demonstrates how to use legacy UCI and WDM files to create
an HDF5 file for the new Python HSPF (HSP2). It shows how to use legacy PLTGEN files to plot results. Finally, it demonstrates at a high level the calibration process with HSP2 and IPython.

**Tutorial Contents**

 + Legacy HSPF Migration and File Functionality
     + Section 1: Importing UCI Files into HDF5
     + Section 2: Importing WDM Files into HDF5
     + Section 3: Importing PLTGEN Files and Pandas Module Functionality
 + Section 4: HSP2 Calibration Process
 

### Required Python imports and setup

##### Pandas is used for time series analysis, visualization, and interacting with the HDF5 file

In [None]:
import os
import site
site.addsitedir(os.getcwd().rsplit('\\',1)[0] + '\\')  # adds your path to the HSP2 software.

import numpy as np
import pandas as pd
pd.options.display.max_rows    = 18
pd.options.display.max_columns = 10
pd.options.display.float_format = '{:.3f}'.format  # display 3 digits after the decimal point

import matplotlib.pyplot as plt
%matplotlib inline

from wdmtoolbox import wdmutil

import HSP2
import HSP2tools
HSP2tools.reset_tutorial()    # make a new copy of the tutorial's data
HSP2tools.versions()          # display version information below

Now that HSP2 is imported, we can run the model. IPython's introspection (`?`) and tab completion make it easy to find the modules and review the code.

In [None]:
HSP2.run?

Delete the documentation panel by clicking on its **X** in the upper right corner.

### Setup filenames
Just for this preview Notebook.

In [None]:
uciname = 'TutorialData/TEST10.UCI'          
hdfname = 'TutorialData/tutorial.h5'
wdmname = 'TutorialData/TEST.WDM'

pltname = 'TutorialData/RCH900.6'

Note: make sure no HDF5 files exist at the start of this tutorial (since we want
to make them here!)

In [None]:
!del TutorialData\*.h5
!dir TutorialData\*.h5

## Section 1: Importing UCI Files into HDF5<a id='section1'></a>

HSPF was developed for a user input file (the UCI file) based on 80-column punch cards.
The format of the data on each card was specific to the type of data it contained.
The sequence files for PERLND, IMPLND, and REACHES contain the format specifications needed to read the text file lines, along
with other information such as default values (for unspecified data), maximum and minimum limits per data element, message
strings defining the meaning/use of each data element, and units type (English or Metric).
The UCI Reader uses these sequence files to parse the user's UCI file.
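
As a rough illustration of fixed-column parsing with defaults, the sketch below slices one "card" into typed fields. The field layout, column positions, and default values are hypothetical and are not taken from the HSPF sequence files; `parse_card` is not part of HSP2.

```python
# Hypothetical field layout: (name, start column, end column, type, default).
# Columns are 0-based and end-exclusive; real layouts come from the sequence files.
FIELDS = [
    ('segment', 0, 10, int, 0),
    ('LZSN',   10, 20, float, 0.0),
    ('INFILT', 20, 30, float, 0.0),
]

def parse_card(line, fields=FIELDS):
    """Slice one fixed-width line into typed values, using defaults for blank fields."""
    line = line.ljust(80)                      # pad short lines to the full 80 columns
    record = {}
    for name, start, stop, kind, default in fields:
        text = line[start:stop].strip()
        record[name] = kind(text) if text else default
    return record

# Build a made-up card: segment 1, LZSN 6.5, INFILT left blank.
card = '%10d%10.3f' % (1, 6.5)
print(parse_card(card))
```

The blank INFILT field falls back to its (hypothetical) default, which mirrors how unspecified data would be filled from the defaults described above.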

The UCI Reader is nearly complete. The exception is a few "tables" in the agri-chem modules that span multiple "cards"; these are not yet combined into one table. They will be fixed when the associated HSPF modules are converted.

A few data elements that are obsolete generate error messages to warn that they are being skipped.

The UCI Reader will write its results to the specified HDF5 file. It will create the HDF5 file if necessary.
If the HDF5 file already exists, it overwrites the corresponding UCI information.

The HDF5 file includes the UCI file information (except obsolete elements) plus some new tables. For example, there is now a SAVE table for each module in PERLND,
IMPLND, and REACHES that lists almost every computed timeseries for each segment. A one at the intersection
of a named timeseries and a segment saves those results to the HDF5 file; otherwise they are not saved. This allows fine control over which results are saved for post-run analysis. By default, only the output flux timeseries are saved.
This tutorial shows how to save everything as an example of modifying the SAVE tables from the default.
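
The idea of a SAVE table can be pictured with a small Pandas DataFrame of 0/1 flags. This is only a sketch: the orientation (segments as rows, timeseries names as columns) and the specific names used here are assumptions for illustration, not the layout HSP2 necessarily stores.

```python
import pandas as pd

# Hypothetical SAVE table: rows are segments, columns are timeseries names.
save = pd.DataFrame(0, index=['P001', 'P002'], columns=['PERO', 'SURO', 'AGWS'])

# A one at the intersection of a segment and a timeseries requests saving it.
save.loc['P001', 'PERO'] = 1
print(save)

# Setting every flag to one requests saving all listed timeseries for all segments.
save_all = save.copy()
save_all.loc[:, :] = 1
print(save_all)
```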

A few timeseries that can be trivially computed from the other timeseries are not explicitly named or saved. The View Perlnd, View Implnd, and View Reaches notebooks provide examples of calculating the "missing" timeseries.

### Run the UCI Reader

The following command builds the default HSP2 HDF5 file. This is a maintenance process and is discussed in the HSP2 Maintenance Manual. It is shown for completeness, but is NOT intended to be executed in this tutorial:

`HSP2.makeH5(h2file)`

In [None]:
HSP2tools.readUCI(uciname, hdfname)

Use HDFView to examine the resulting HDF5 file. 

Some UCI tables not currently needed by HSP2 will be missing. They will be added when the associated HSPF modules are converted to HSP2.

## Section 2: Importing WDM Files into HDF5<a id='section2'></a>

The WDM reader must be run after the UCI reader because it uses the EXT_SOURCES table from the HDF5 file to determine
which timeseries to extract from the WDM file.

For each timeseries named in the EXT_SOURCES table, the entire timeseries is extracted from the WDM file, truncated to the simulation's start and ending date/time, converted (aggregated/disaggregated) to the required simulation timestep, and saved in the specified HDF5 file. Its name is converted from a number to a string prefaced by "TS". The UCI reader has already adjusted all references to the timeseries datasets to use this naming convention.
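
The truncate, convert, and rename steps can be sketched with plain Pandas. This is not the WDM reader's code: the data, the dataset number 39, the simulation period, the daily timestep, the choice of summing when aggregating, and the exact form of the name are all assumptions made for illustration.

```python
import pandas as pd

# Made-up hourly values standing in for one WDM dataset (hypothetical number 39).
index = pd.date_range('1976-01-01', periods=96, freq=pd.Timedelta(hours=1))
raw = pd.Series(1.0, index=index)

# Truncate to a hypothetical simulation period (both dates inclusive).
truncated = raw['1976-01-02':'1976-01-03']

# Aggregate the hourly values to a daily timestep by summing.
daily = truncated.resample('D').sum()

# Convert the dataset number to a string prefaced by "TS".
name = 'TS' + str(39)
print(name)
print(daily)
```

Whether values should be summed or averaged when aggregating depends on the kind of data (for example, totals versus rates), so the `sum()` above is only one possibility.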

The WDM reader also extracts the metadata from the WDM file and attaches it to the timeseries group in the HDF5 file along with some other metadata.  Pandas also inserts its own metadata.

### WDM Reader LICENSE and Copyright

The WDM Reader uses Tim Cera's **wdmtoolbox**. RESPEC modified this code (version 0.8.2) because HSPF WDM files that lack Constituent, Location, or Scenario attributes caused it to terminate with an error. Since the default HSPF test files don't have these attributes, HSP2 can't run the test cases without this fix.

Some later wdmtoolbox releases do not easily install under Windows, so this is the only version known to work.

The wdmtoolbox is released under a GPL2 license and Tim Cera retains all rights to his module.

The wdmtoolbox is only used by the optional WDM Reader HSP2 module.

### Run the WDM Reader

In [None]:
HSP2tools.ReadWDM(wdmname, hdfname)

Other tools for legacy files will be discussed in later Tutorials.

### Run test10
Now show that this HDF5 file can run the HSPF Test10 simulation. The argument is the name of the HDF5 file containing all the information to define the run and to store the results.

In [None]:
HSP2.run(hdfname)

Note: The first time HSP2 is run, it takes a bit longer because it performs Just-In-Time (JIT) compilation.
Soon, HSP2 will cache this compiled code so this step will only happen once. Currently, the JIT compilation happens each time you start an HSP2 Notebook.

Run this again to see the difference.

In [None]:
HSP2.run(hdfname)

## Section 3: HSP2 PLTGEN Functionality & Pandas Module

**PLTGEN File Format Assumptions**

 + Text, not binary file
 + Initial 4 characters can be ignored in each line
 + First 25 lines are header information
 
To find the column header information

 + The line containing the word "LINTYP" immediately precedes the column header lines
 + The column headers stop at the first of
     + line 26
     + blank line (ignoring the first 4 characters)
     + finding a line starting with "Time series" (ignoring the first 4 characters)

To find the data (columns of time series data)

 + Line 26 is dummy data
 + Line 27 and on are actual lines with time series data
 + No entry is blank => all lines have the same number of entries (columns)
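
The header rules above can be sketched as a small function. This is an illustrative reading of the stated assumptions, not the implementation of `HSP2tools.readPLTGEN`; the function name `find_column_headers` and the sample lines (including the `XXXX` prefix) are made up.

```python
def find_column_headers(lines):
    """Return the column header lines of a PLTGEN-style text file."""
    # The initial 4 characters of every line can be ignored.
    body = [line[4:] for line in lines]
    # Header lines start right after the line containing "LINTYP".
    start = next(i for i, text in enumerate(body) if 'LINTYP' in text) + 1
    headers = []
    # Index 25 is line 26, so stop before it.
    for text in body[start:25]:
        stripped = text.strip()
        # A blank line or a line starting with "Time series" ends the headers.
        if not stripped or stripped.startswith('Time series'):
            break
        headers.append(stripped)
    return headers

sample = [
    'XXXX Title line',
    'XXXX Label                   LINTYP',
    'XXXX SIMULATED FLOW',
    'XXXX SEDIMENT LOAD',
    'XXXX',
    'XXXX Time series (made-up line)',
]
print(find_column_headers(sample))
```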

     

In [None]:
df = HSP2tools.readPLTGEN(pltname)
df.head()

Change the column headings to be shorter and write the data to an HDF5 file.

In [None]:
df.columns = ['RO', 'ROVOL', 'SSED', 'ROSED', 'TEMP']

In [None]:
df.to_hdf('pltgen.h5', 'RCH600', data_columns=True, format='table')

In [None]:
df = pd.read_hdf('pltgen.h5','RCH600')
df.head()

In [None]:
df.describe()

In [None]:
df['RO'].plot(label = 'simulated flow (cfs)',figsize = (18,6))

Accessing a column of the DataFrame (as above) produces a time series
(Pandas Series). This allows resampling to other periods such as monthly and
annual (as shown in Tutorial 3). The Pandas resampling methods include mean, sum, last, first, max, and min, which cover the PLTGEN methods. (PLTGEN AVER is Pandas mean.) Pandas provides many more methods and also allows user-defined methods.
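
As a small sketch of a user-defined method, the example below applies a custom function to each resampled period. The data are made up and `value_range` is a hypothetical helper, not something provided by HSP2 or PLTGEN.

```python
import pandas as pd

# Made-up hourly flow values covering two days.
index = pd.date_range('1990-01-01', periods=48, freq=pd.Timedelta(hours=1))
flow = pd.Series(range(48), index=index, dtype=float)

def value_range(values):
    """User-defined statistic: spread between the largest and smallest value."""
    return values.max() - values.min()

# Apply the user-defined function to each daily group of values.
daily_range = flow.resample('D').apply(value_range)
print(daily_range)
```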

In [None]:
# resample to monthly mean flow
df['RO'].resample('M').mean()

In [None]:
# resample to annual mean, median, and standard deviation of flow
dff = pd.DataFrame()
dff['mean']   = df['RO'].resample('A').mean()
dff['median'] = df['RO'].resample('A').median()
dff['std']    = df['RO'].resample('A').std()
dff

In [None]:
# resample for a time period and transform to 7-Day mean flow and then plot it 
df['RO']['1990-01-01' : '1995-06-30'].resample('7D').mean().plot(style='r--',figsize = (18,6))

In [None]:
# Define a function that plots the mean flow for each year as a bar chart, then call it
def ybar(series, label):
    # group by the year of each timestamp, average, and plot; returns the matplotlib Axes
    ax = series.groupby(lambda x: x.year).mean().plot(kind='bar', figsize=(18,6))
    plt.ylabel(label)
    return ax

ybar(df['RO'], 'cfs')

In [None]:
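# Define a function that plots the mean flow for each calendar month as a bar chart, then call it
def mbar(series, label):
    # group by the month of each timestamp, average, and plot; returns the matplotlib Axes
    ax = series.groupby(lambda x: x.month).mean().plot(kind='bar', figsize=(18,6))
    plt.ylabel(label)
    return ax

mbar(df['RO'], 'cfs')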

## Section 4: HSP2 Calibration Process

In [None]:
# review simulated flow from master hdf5 file prior to commencing calibration
tsMaster = pd.read_hdf(hdfname, '/RESULTS/RCHRES_R001/HYDR')
tsMaster.RO.describe()

In [None]:
simname  = 'SIM1'
datapath = '/PERLND/PWATER/PARAMETERS'

Read the INFILT and LZSN values currently in hdfname

In [None]:
df = pd.read_hdf(hdfname, datapath)
df[['INFILT', 'LZSN']]

Make a copy of this data and modify the values for the new simulation run.

In [None]:
dfsim = df[['INFILT', 'LZSN']].copy()
dfsim.INFILT = 0.3
dfsim.LZSN = 12
dfsim

Save just the modified simulation data in a group (subdirectory) of the HDF5 file named for the simulation. Note that the group is created automatically.

In [None]:
dfsim.to_hdf(hdfname, simname+datapath, data_columns=True, format='table')

It might be good to check how this looks in the HDF5 file using HDFView or Compass.

Then run the simulation.

In [None]:
HSP2.run(hdfname, simpath=simname, reload=True, saveall=True)

In [None]:
tsSim  = pd.read_hdf(hdfname, simname + '/RESULTS/RCHRES_R001/HYDR')
tsSim.RO.describe()

In [None]:
plt.figure(figsize=(16,8))
plt.plot('RO', 'b--', data=tsMaster, label='Master')
plt.plot('RO', 'r',   data=tsSim,    label='Sim')
plt.title('Flow at Reach 1')
plt.ylabel('Flow (cfs)')
plt.legend(loc='best')

### Water - Fluxes (External Inflows and Outflows)

In [None]:
path = '/RESULTS/PERLND_P001/PWATER'

In [None]:
dfperlnd1 = pd.read_hdf(hdfname, simname+path)
dfperlnd1

The read returned all the data for PERLND1.

Resample the water fluxes (external inflows and outflows) to monthly totals.

In [None]:
dfwex = dfperlnd1.resample('M').sum()
dfwex.T

Sum the months

In [None]:
dfsum = pd.DataFrame()
dfsum['Annual Total']= dfwex.sum()
dfsum

In [None]:
# sum the months (same result as the previous cell, computed from the transposed table)
dfSum = pd.DataFrame()
dfSum['Annual Total'] = dfwex.T.sum(axis=1)
dfSum

In [None]:
with open('print_example.txt', 'w') as f:
    # write both tables to the file, separated by a blank line
    f.write(str(dfwex.T) + '\n\n')
    f.write(str(dfSum) + '\n')

In [None]:
%pycat print_example.txt