<img src="Images/HSP2.png" />
This Jupyter Notebook Copyright 2017 by RESPEC, INC.  All rights reserved.

$\textbf{HSP}^{\textbf{2}}\ \text{and}\ \textbf{HSP2}\ $ Copyright 2017 by RESPEC INC. and released under this [License](LegalInformation/License.txt)

# TUTORIAL 7: Advanced $\textbf{HSP}^\textbf{2}$  Functionality

  + Section 1: [Using CSV files to update HDF5 UCI information](#csv)
  
  + Section 2: [Restarting a simulation at any time within a previous run](#restart)

  + Section 3: [General user information about data and code modules in HSP$^2$](#general)
  
  + Section 4: [Adding new user data to HSP$^2$](#newdata)
  
  + Section 5: [Adding a new module to HSP$^2$](#newmodule)
  
  + Section 6: [Special Functionality](#special)
 
  + Section 7: [Run HSP2 with a workflow (including QA/QC)](#workflow)

### Required Python imports  and setup 

In [1]:
import os
import site
site.addsitedir(os.path.dirname(os.getcwd()))  # adds the parent directory (the path to the HSP2 software).

hdfname = os.path.join('TutorialData', 'Tutorial.h5')

import shutil
import numpy as np
from ipywidgets import Dropdown, Checkbox
import pandas as pd
pd.options.display.max_rows    = 17
pd.options.display.max_columns = 10
pd.options.display.float_format = '{:.2f}'.format  # display 2 digits after the decimal point

from matplotlib import pyplot as plt
%matplotlib inline

import HSP2
import HSP2tools

HSP2tools.reset_tutorial()    # make a new copy of the tutorial's data
HSP2tools.versions()          # display version information below

Unnamed: 0,Version
HSP2,0.7.7
HSP2tools,0.7.6
,
PYTHON,"2.7.14 |Anaconda custom (64-bit)| (default, Oc..."
IPYTHON,5.4.1
,
H5PY,2.7.0
MATPLOTLIB,2.1.0
NETWORKX,2.0
NUMBA,0.35.0+10.g143f70e.dirty


## Section 1: Using CSV files to update HDF5 UCI information
<a id='csv'></a>
CSV and XLSX (Excel spreadsheet) files can be used to create, update, or add data to the HDF5 file.

### EXAMPLE: Modify one Parameter, one Flag, and one State Variable for selected segments

This example uses RCHRES HYDR, but the same approach works for any PERLND, IMPLND, or RCHRES operation.

First, read the original contents of the HDF5 file to review what is there.

In [2]:
pd.read_hdf(hdfname, 'RCHRES/HYDR/PARAMETERS')

Unnamed: 0,DB50,DELTH,FTBUCI,FTBW,KS,LEN,STCOR,IREXIT,IRMINV
R001,0.01,1.0,FT001,0.0,0.5,0.5,0.0,0,0.0
R002,0.01,20.0,FT002,0.0,0.5,0.25,0.0,0,0.0
R003,0.01,30.0,FT003,0.0,0.5,0.25,0.0,0,0.0
R004,0.01,40.0,FT004,0.0,0.5,2.0,0.0,0,0.0
R005,0.01,40.0,FT005,0.0,0.5,3.0,0.0,0,0.0


In [3]:
pd.read_hdf(hdfname, 'RCHRES/HYDR/STATE')

Unnamed: 0,COLIN1,COLIN2,COLIN3,COLIN4,COLIN5,...,OUTDG2,OUTDG3,OUTDG4,OUTDG5,VOL
R001,4.0,5.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,30.0
R002,4.0,4.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,0.0
R003,4.0,4.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,0.0
R004,4.0,4.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,0.0
R005,4.0,4.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,0.0


In [4]:
pd.read_hdf(hdfname, 'RCHRES/HYDR/FLAGS')

Unnamed: 0,AUX1FG,AUX2FG,AUX3FG,FUNCT1,FUNCT2,...,ODGTF2,ODGTF3,ODGTF4,ODGTF5,VCONFG
R001,1,1,1,1,1,...,0,0,0,0,0
R002,1,1,1,1,1,...,0,0,0,0,0
R003,1,1,1,1,1,...,0,0,0,0,0
R004,1,1,1,1,1,...,0,0,0,0,0
R005,1,1,1,1,1,...,0,0,0,0,0


### Now read CSV (XLSX, etc.) being used to update the HDF5 file
This file will have only one HYDR Parameter, one HYDR State, and one HYDR Flag for RCHRES segments.  Only three of the five RCHRES segments are specified. Any other segments will remain unchanged. This simulates a user's desire to modify the HDF5 file's data.

Now look at the CSV file in the TutorialData directory:

In [5]:
csvFile = os.path.join('TutorialData', 'rchres.csv')
pd.read_csv(csvFile)

Unnamed: 0,SEGMENT,DB50,VOL,AUX3FG
0,R001,0.01,25,0
1,R003,0.03,5,0
2,R005,0.05,10,0


Note that there is a corresponding XLSX file:

In [7]:
xlsFile = os.path.join('TutorialData', 'rchres.xlsx')
pd.read_excel(xlsFile)

Unnamed: 0,SEGMENT,DB50,VOL,AUX3FG
0,R001,0.01,25,0
1,R003,0.03,5,0
2,R005,0.05,10,0


The csvReader has the following documentation string.

In [8]:
HSP2tools.csvReader?

You can close the documentation pane using the X in its upper right-hand corner.

Given a CSV file, the HDF5 file can be updated using this function.

In [9]:
HSP2tools.csvReader(hdfname, csvFile, 'RCHRES', 'HYDR')

##### Show the final results by reading the HDF5 file

In [10]:
pd.read_hdf(hdfname, 'RCHRES/HYDR/PARAMETERS')

Unnamed: 0,DB50,DELTH,FTBUCI,FTBW,KS,LEN,STCOR,IREXIT,IRMINV
R001,0.01,1.0,FT001,0.0,0.5,0.5,0.0,0,0.0
R002,0.01,20.0,FT002,0.0,0.5,0.25,0.0,0,0.0
R003,0.03,30.0,FT003,0.0,0.5,0.25,0.0,0,0.0
R004,0.01,40.0,FT004,0.0,0.5,2.0,0.0,0,0.0
R005,0.05,40.0,FT005,0.0,0.5,3.0,0.0,0,0.0


In [11]:
pd.read_hdf(hdfname, 'RCHRES/HYDR/STATE')

Unnamed: 0,COLIN1,COLIN2,COLIN3,COLIN4,COLIN5,...,OUTDG2,OUTDG3,OUTDG4,OUTDG5,VOL
R001,4.0,5.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,25.0
R002,4.0,4.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,0.0
R003,4.0,4.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,5.0
R004,4.0,4.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,0.0
R005,4.0,4.0,4.0,4.0,4.0,...,0.0,0.0,0.0,0.0,10.0


In [12]:
pd.read_hdf(hdfname, 'RCHRES/HYDR/FLAGS')

Unnamed: 0,AUX1FG,AUX2FG,AUX3FG,FUNCT1,FUNCT2,...,ODGTF2,ODGTF3,ODGTF4,ODGTF5,VCONFG
R001,1,1,0,1,1,...,0,0,0,0,0
R002,1,1,1,1,1,...,0,0,0,0,0
R003,1,1,0,1,1,...,0,0,0,0,0
R004,1,1,1,1,1,...,0,0,0,0,0
R005,1,1,0,1,1,...,0,0,0,0,0


Compare to the original values above to see that the parameter, flag, and state variable were changed.

Note: the CSV/Excel file may contain any combinations of PARAMETERS, STATES, or FLAGS. The content will be placed in the appropriate location in the HDF5 file automatically.
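
The routing described in the note can be illustrated with plain pandas. This is only a sketch of the idea, not the code inside `HSP2tools.csvReader`: the helper name `apply_updates`, the small in-memory tables, and the rule "a column goes to the one table that already contains it" are assumptions made for illustration.

```python
import pandas as pd

def apply_updates(tables, updates, key='SEGMENT'):
    """Write each update column into the one table that already holds it."""
    updates = updates.set_index(key)
    for column in updates.columns:
        owners = [t for t in tables.values() if column in t.columns]
        if len(owners) != 1:
            raise KeyError('%s must belong to exactly one table' % column)
        # Only the segments listed in the update file are touched.
        owners[0].loc[updates.index, column] = updates[column]
    return tables

# Small stand-ins for the RCHRES/HYDR tables shown above.
segments = ['R001', 'R002', 'R003', 'R004', 'R005']
tables = {
    'PARAMETERS': pd.DataFrame({'DB50': 0.01, 'KS': 0.5}, index=segments),
    'STATE': pd.DataFrame({'VOL': [30.0, 0.0, 0.0, 0.0, 0.0]}, index=segments),
    'FLAGS': pd.DataFrame({'AUX3FG': 1}, index=segments),
}
updates = pd.DataFrame({'SEGMENT': ['R001', 'R003', 'R005'],
                        'DB50': [0.01, 0.03, 0.05],
                        'VOL': [25, 5, 10],
                        'AUX3FG': [0, 0, 0]})
apply_updates(tables, updates)
print(tables['STATE'])
```

Segments R002 and R004 are absent from the update data, so their rows keep their original values in all three tables.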

You can restart this tutorial and substitute the XLSX file to see that this works as well. The command should look like

``` Python
HSP2tools.csvReader(hdfname, 'TutorialData/rchres.xlsx', 'RCHRES', 'HYDR')
```

### FTABLES
This utility will also allow creating, modifying, or adding to FTABLES.

Again, read an FTable to see the current contents.

In [13]:
pd.read_hdf(hdfname, 'FTABLES/FT001')

Unnamed: 0,Depth,Area,Volume,Disch1,Disch2,Disch3
0,0.0,0.0,0.0,0.0,0.0,0.0
1,2.0,1.21,1.21,0.0,0.0,0.0
2,4.0,2.42,4.85,0.0,0.0,0.0
3,6.0,3.64,10.91,0.0,0.0,0.0
4,8.0,4.85,19.39,0.0,0.0,0.0
5,10.0,6.06,30.3,0.0,0.0,0.0
6,12.0,7.27,43.64,5.0,3.5,0.0
7,14.0,8.48,59.4,6.25,4.38,0.0
8,16.0,9.7,77.58,7.5,5.25,0.0
9,18.0,10.91,98.18,8.75,6.12,0.0


#### This example will modify the line with index 1 (change Depth from 2.0 to 2.5) and will add two new rows to the table
 
View the CSV file to see its contents.  The row with Index 1 will modify the depth. The other two rows add to the end of the FTable.

In [14]:
csvFile = os.path.join('TutorialData', 'ft1.csv')
pd.read_csv(csvFile)

Unnamed: 0,Index,Depth,Area,Volume,Disch1,Disch2,Disch3
0,14,25.0,14.0,200.0,15,9,0
1,15,30.0,25.0,500.0,16,10,20
2,1,2.5,1.21,1.21,0,0,0


Note: the column named Index is required in addition to the other columns found in this FTable. This ensures the rows are placed in the correct location in the HDF5 table.  The rows can be in any order in the CSV or Excel file.
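
The role of the Index column can be sketched with pandas alone. The helper `update_ftable` below is hypothetical and is not how `HSP2tools.csvReader` is implemented; it only shows one way that rows with an existing Index overwrite the stored row while rows with a new Index are added, regardless of the row order in the update file. The three-column FTable is a shortened stand-in.

```python
import pandas as pd

def update_ftable(ftable, updates):
    """Overwrite rows whose Index exists and append rows whose Index is new."""
    updates = updates.set_index('Index')
    merged = updates.combine_first(ftable)   # update values win over stored ones
    return merged.sort_index()[list(ftable.columns)]

ftable = pd.DataFrame({'Depth': [0.0, 2.0, 4.0],
                       'Area': [0.0, 1.21, 2.42],
                       'Volume': [0.0, 1.21, 4.85]})
# Rows are deliberately out of order: Index 3 is new, Index 1 already exists.
updates = pd.DataFrame({'Index': [3, 1],
                        'Depth': [6.0, 2.5],
                        'Area': [3.64, 1.21],
                        'Volume': [10.91, 1.21]})
new_ftable = update_ftable(ftable, updates)
print(new_ftable)
```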

Now use the CSV file to update the FTable.

In [15]:
HSP2tools.csvReader(hdfname, csvFile, 'FTABLES', 'FT001')

In [16]:
pd.read_hdf(hdfname, 'FTABLES/FT001')

Unnamed: 0,Depth,Area,Volume,Disch1,Disch2,Disch3
0,0.0,0.0,0.0,0.0,0.0,0.0
1,2.5,1.21,1.21,0.0,0.0,0.0
2,4.0,2.42,4.85,0.0,0.0,0.0
3,6.0,3.64,10.91,0.0,0.0,0.0
4,8.0,4.85,19.39,0.0,0.0,0.0
5,10.0,6.06,30.3,0.0,0.0,0.0
6,12.0,7.27,43.64,5.0,3.5,0.0
7,14.0,8.48,59.4,6.25,4.38,0.0
8,16.0,9.7,77.58,7.5,5.25,0.0
9,18.0,10.91,98.18,8.75,6.12,0.0


Compare to the original to check that the proper modifications and additions were made.

Restore the tutorial data to the original state for the next section.

In [17]:
HSP2tools.reset_tutorial()    # make a new copy of the tutorial's data

WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'TutorialData'

## Section 2: Restarting a simulation at any time within a previous run
<a id='restart'></a>

This Section will show how to restart a simulation at any date between the original start and stop dates. 

The simulation starts at the specified datetime (or the closest available data point) and runs to the simulation end datetime.

If the timeseries data have been extended first, you can modify the simulation to run further than the original run.

This utility can be used to
+ run the simulation from a later start time to cut the run time for parameter fitting, perhaps also shortening the simulation stop datetime.
+ use HSP2 in a "real-time" scenario where the timeseries have been extended with real or forecast data and the simulation is started a bit before the new data to stabilize.

This utility works by finding the datetime in a previous run of the simulation that is closest to the requested start datetime. It then extracts the values of the calculated state variables at that datetime, updates the HDF5 state tables with these values, and updates the simulation start datetime.
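
The lookup step can be sketched with pandas. This is an illustration under stated assumptions, not the code of `HSP2tools.update_state`: the helper `state_at` is hypothetical, and the small `results` table with a single `VOL` column stands in for the saved state timeseries of one segment.

```python
import pandas as pd

def state_at(results, when):
    """Return the stored datetime nearest to `when` and the row saved there."""
    position = results.index.get_indexer([pd.Timestamp(when)], method='nearest')[0]
    return results.index[position], results.iloc[position]

# Stand-in for state timeseries saved by a previous run.
index = pd.date_range('1976-01-01', periods=5, freq='D')
results = pd.DataFrame({'VOL': [30.0, 28.5, 27.0, 26.2, 25.9]}, index=index)

restart_time, restart_state = state_at(results, '1976-01-03 05:00')
print(restart_time, restart_state['VOL'])
```

The returned row holds the values that would become the new initial state, and the returned datetime would become the new simulation start.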

Note: a previous run must have saved all computed state timeseries.

#### First run the tutorial HDF5 file to ensure calculated timeseries are present.

In [18]:
HSP2.run(hdfname, saveall=True)

TutorialData\Tutorial.h5 HDF5 File Not Found, QUITTING


Look at one calculated timeseries.

In [19]:
ts = pd.read_hdf(hdfname, 'RESULTS/PERLND_P001/SNOW')['RAINF']
ts.plot()

IOError: File TutorialData\Tutorial.h5 does not exist

In many use cases, the original HDF5 simulation file might first be copied to another name. The copy, with its later start datetime, is then used to run the new simulation while the original data is preserved in the first file.

For simplicity, this tutorial will use the same HDF5 file. This updates the simulation "in place".

Now set the new start datetime, update the internal state tables, then run.

In [None]:
start = '04/01/1976 00:00'

HSP2tools.update_state(hdfname, start)

HSP2.run(hdfname)

Now display the resulting simulation data

In [None]:
ts2 = pd.read_hdf(hdfname, 'RESULTS/PERLND_P001/SNOW')['RAINF']
ts2.plot()

#### Check
Plot both timeseries together to show agreement.

In [None]:
ts.plot(color='y')
ts2.plot(color='r')

Reverse the order of the plots to show that one is not hiding the other.

In [None]:
ts2.plot(color='r')
ts.plot(color='y')

In [None]:
HSP2tools.reset_tutorial()    # make a new copy of the tutorial's data

## Section 3: General information about data and code modules in $\textbf{HSP}^\textbf{2}$ 
<a id='general'></a>

At the beginning of each OPSEQ command, the time series required for all the activities in that OPSEQ command are made available to all activities within the command. Many of the time series are specified by the EXT_SOURCES table.  The LINKS table is used to determine if previously calculated results also need to be made available to join segments.

When each activity (like PWATER) within an OPSEQ command is started, the remaining UCI-like required data is automatically made available to it.  This data includes the FLAGS, INITIALIZATIONS, PARAMETERS, FTABLE, and MONTHLY tables (as appropriate). $\textbf{HSP}^\textbf{2}$ adds the activity's SAVE table and allows other information defined by the user.  Only the UCI-like data for the specific activity and segment is made available to the current code module.

Three Python dictionaries are used to transfer data to an activity's module when it is called:
+ **general** - this dictionary contains simulation level information such as the simulation start date/time and current simulation timestep.
+ **ui** - this dictionary contains the FLAGS, INITIALIZATIONS, PARAMETERS, MONTHLY, FTABLE tables data (as appropriate), and the SAVE tables information specific to the activity and segment.
+ **ts** - this dictionary contains time series for the current OPSEQ command. 
  + ts includes the traditional HSPF external time series (like precipitation). These time series may be stored at different frequencies and time intervals but are aggregated/disaggregated to the current timestep and truncated to the simulation start and stop datetimes. These are made available using the EXT_SOURCES table.
  + ts includes time series that allow parameters that are constant in HSPF to vary over time in $\textbf{HSP}^\textbf{2}$, as discussed later in Section 6. They are also specified as entries in the EXT_SOURCES table.
  + ts includes previously calculated time series based on the LINKS table (which combines the HSPF SCHEMATIC and NETWORK tables). These were stored at the timesteps specified by the DELT value in the OP_SEQUENCE table, but are aggregated/disaggregated to the DELT value for the current timestep.
  + ts includes some special time series datasets used internally by $\textbf{HSP}^\textbf{2}$, like LAPSE24, SEASONS, and the Saturated Vapor Pressure table, so that users can easily substitute their own datasets if needed.
  + ts also contains all computed timeseries from each module in the current OPSEQ command so that later activity modules in the same OPSEQ command have this data automatically available.  The SAVE table will transfer only the user-selected time series to the HDF5 file.

At the completion of each OPSEQ command, all three dictionaries are recreated. (The ui dictionary is also recreated for each activity within the OPSEQ command.)

The $\textbf{HSP}^\textbf{2}$ code modules that perform the real engineering are in a subdirectory named *HSP2/HSP2core*. These "core" modules are named for their HSPF counterparts (for example, himpwat.py).
(The other HSP2 code modules provide supporting non-engineering functionality and are found in the directory *HSP2/HSP2support*.)

All core $\textbf{HSP}^\textbf{2}$  modules have this required signature using the dictionaries defined above:
``` Python
    errcnt, errmsg = coremodule(store, general, ui, ts)
```

The store argument is the "file handle" to the simulation's HDF5 file, used for reading or writing data.

The errcnt and errmsg returned values are both Python lists. Corresponding elements of the two lists give the total count of one type of error or warning that occurred when the module ran and the associated message for printing and logging.  All of the HSPF run time warnings and errors are still used in $\textbf{HSP}^\textbf{2}$, and additional messages have been added.

Generally, like HSPF, the $\textbf{HSP}^\textbf{2}$  core modules do not terminate the simulation run if it is possible to continue.
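
A minimal sketch of a module that follows this calling convention is shown here. Everything specific in it is hypothetical: the function name `demo_module`, the `SCALE` entry in ui, the `INFLOW` and `OUTFLOW` series names, the `sim_delt` key, and the message text are made up for illustration and are not part of HSP2. The store argument is not used, so `None` is passed in its place.

```python
import numpy as np

# One message per kind of problem; errcnt holds the matching counts.
ERRMSG = ['Hypothetical warning: negative inflow was reset to zero']

def demo_module(store, general, ui, ts):
    """Toy 'core module': scale an input series and count bad input values."""
    errcnt = [0] * len(ERRMSG)
    scale = float(ui['SCALE'])            # UCI-like data arrives through ui
    inflow = np.asarray(ts['INFLOW'])     # time series arrive through ts
    negative = inflow < 0.0
    errcnt[0] = int(negative.sum())       # count the problem, keep running
    ts['OUTFLOW'] = scale * np.where(negative, 0.0, inflow)
    return errcnt, ERRMSG

general = {'sim_delt': 60}                # hypothetical simulation-level entry
ui = {'SCALE': 2.0}
ts = {'INFLOW': np.array([1.0, -0.5, 3.0])}
errcnt, errmsg = demo_module(None, general, ui, ts)
print(errcnt, errmsg)
```

The computed series is written back into ts, which mirrors the description above of later modules in the same OPSEQ command finding earlier results there.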

## Section 4: Add New Data to an Existing Module in $\textbf{HSP}^\textbf{2}$<a id='newdata'></a>

**NOTE**  It is recommended that you don't change a built-in module, but instead copy the module to a new name, modify that copy, and then treat it as a new module. The process of adding a new module to $\textbf{HSP}^\textbf{2}$ is discussed in Section 5, below.

But sometimes we might just need a temporary fix or a workaround, or need to quickly try out a new concept.

### Use Case 1: An existing module was modified and requires additional data

In this example, imagine an existing module, IWATER, was modified and now requires additional data (PARAMETER, FLAG, INITIALIZATION, FTABLE, or MONTHLY) to operate in a new way.

This is trivial, since all the data in an activity's FLAGS, INITIALIZATIONS, PARAMETERS, and MONTHLY tables are automatically included in the ui dictionary when it is created just prior to calling the activity's code module. The appropriate FTABLES are also included.

So, to add new data, add any number of new data columns to the appropriate tables and you are done. (Remember to use names for your data that are distinct from the existing names in that activity's tables.)

To demonstrate this use case, the HSP2.run() module was copied to HSP2tools.run_Tutorial and print messages were added to show any occurrence of the data element named 'TUTORIALDATA'. Additional prints were added to demonstrate the other use cases in this section.

During normal $\textbf{HSP}^\textbf{2}$  runs, this message is never displayed since no such data exists. This can be verified by running the tutorial.h5 HDF5 file and ensuring this message is not displayed.

In [None]:
HSP2tools.run_Tutorial(hdfname)

Now pick any table in the HDF5 file, for example a table in the IMPLND IWATER directory. Add a column for the new data and then save the table back to the HDF5 file.  The new data column must be named *TUTORIALDATA* to trigger the message. You must provide a value for each segment in the column. For the tutorial.h5 example, there is only one segment, I001.

The modified code can access the new data using the ui dictionary where needed.
``` Python
ui['TUTORIALDATA']
```
To add this data to the PARAMETERS table:

First get the original data

In [None]:
df = pd.read_hdf(hdfname, '/IMPLND/IWATER/PARAMETERS')
df

Now add this TUTORIALDATA to the table.

In [None]:
df['TUTORIALDATA'] = 3.14 
df

Save this back to the hdf5 file, tutorial.h5

In [None]:
df.to_hdf(hdfname, '/IMPLND/IWATER/PARAMETERS', data_columns=True, format='table')

Now run to confirm this new data is seen by IWATER

In [None]:
HSP2tools.run_Tutorial(hdfname)

The "FOUND" message above demonstrates that the new data was made available as expected and has the correct value. 

**NOTE** When the data is imported from legacy UCI and WDM files, the data types of the columns are set automatically: FLAGS tables are set to *int*; INITIALIZATIONS, PARAMETERS, and MONTHLY tables are set to *float*; and SAVE tables are set to *bool*.  When you add the data yourself, either set the data type explicitly or cast the data to the correct type in the modified code - **if necessary.**

For example, this line sets the data type for the TUTORIALDATA column:
``` Python
df.TUTORIALDATA = df.TUTORIALDATA.astype(float)          # existing column can use this syntax
```

or when used in the modified code, this is how to cast the data:
``` Python
mydata = float(ui['TUTORIALDATA'])
```

Adding a new MONTHLY table is essentially the same - except you must add the 12 monthly values for each segment by constructing a MONTHLY table and placing it in the MONTHLY directory with the rest of the MONTHLY tables. Your name for the monthly table is used to access the data in the code.

In test10, there are 4 tables placed under the PERLND PWATER MONTHLY directory. They are named CEPSCM, LZETPM, NSURM, and UZSNM. The PWATER code gets all 12 values in calendar order for NSURM using the ui dictionary:
``` Python
ui['NSURM']
```

You might add a MONTHLY table named *MyMonthly*.  It should look like one of those existing tables. For example, view the NSURM table:

In [None]:
pd.read_hdf(hdfname, '/PERLND/PWATER/MONTHLY/NSURM')

## Section 5: Add a new module to $\textbf{HSP}^\textbf{2}$<a id='newmodule'></a>

The process to add a new module to $\textbf{HSP}^\textbf{2}$ is treated in detail in the $\textbf{HSP}^\textbf{2}$  *Maintenance Manual* (in progress). However, it seems appropriate to discuss the procedure at this point (without all the details) since it is part of the new advanced functionality's design.

#### It is not necessary to modify $\textbf{HSP}^\textbf{2}$ in order to add a new module!

### Use Case 2:  Add a new "activity" module to $\textbf{HSP}^\textbf{2}$

The hard part is to write the module and create the UCI-like data for it. The easy part is to enable  $\textbf{HSP}^\textbf{2}$ to run the module at the appropriate time.

For this discussion, assume the new code file is named *newactivity.py* with a function *newactivity()* and the activity's name is *NEWACTIVITY*.  These correspond to the IMPLND IWATER names himpwat.py, iwater(), and IWATER, respectively. All of the steps below must be completed, but they can be done in any order.

##### Write your new activity's code
The new activity's function, newactivity(), must have the following signature as discussed above:

```
errcnt, errmsg = newactivity(store, general, ui, ts)
```

This new code's file, newactivity.py,  may contain any number of other supporting routines called by newactivity(). Support files do not have a required signature.

##### Create the data for this activity 

In the watershed's HDF5 file, create a new directory named *NEWACTIVITY* under the appropriate operation (PERLND, IMPLND, or RCHRES) for your data. (Actually, the first Pandas DataFrame you save will create the directory if it didn't exist, so this is easy.)

Add your data tables (Pandas DataFrames) to the HDF5 file:
+ Add tables for FLAGS, PARAMETERS, INITIALIZATIONS and MONTHLY data as needed.
+ Add a SAVE table to this directory.
+ Add required time series to the /TIMESERIES directory.
+ Add rows to the NETWORK, SCHEMATIC, and MASS_LINK tables as necessary.
+ Add FTables, if needed, to the /FTABLES directory.

#### Now make the new activity work in $\textbf{HSP}^\textbf{2}$
This is the easy part:
+ Add a column to the operation's ACTIVITY table named NEWACTIVITY with appropriate values (True/False) for each segment. If this activity is to "replace" an existing activity, don't forget to mark that activity's column to False to prevent it from running.
+ Put the code file, newactivity.py, in the appropriate $\textbf{HSP}^\textbf{2}$ directory.
+ Add one line to the $\textbf{HSP}^\textbf{2}$  "init" file to make your new function available:

```
from newactivity import newactivity
```

+ Append one row to the /CONTROL/CONFIGURATION table to "register" the new activity.
+ The **first** time you run $\textbf{HSP}^\textbf{2}$, use the **reload=True** option to force $\textbf{HSP}^\textbf{2}$ to discover the new tables.

DONE!

So almost all of the work is in writing the new code and setting up the new data tables.

#### Examine the  /CONTROL/CONFIGURATION table. 
The forthcoming $\textbf{HSP}^\textbf{2}$ *Maintenance Manual* will explain the details of this table, but looking at a few rows will give you a sense of the new row's content:

In [None]:
pd.read_hdf(hdfname, '/HSP2/CONFIGURATION')

So this should NOT be too difficult.

## Section 6: $\textbf{HSP}^\textbf{2}$  Special Functionality<a id='special'></a>

The original design of HSPF was quite good. However, the limitations of available memory and CPU performance in those days (1980s) required some compromises. Some, but not all, parameters were allowed to vary over time by use of the MONTHLY tables based on some FLAG values. The purpose of HSPF Special Functions was to bypass some of these restrictions selectively while still allowing HSPF to run on the machines of the time. These types of operations can be performed by HSPF special function methods, but $\textbf{HSP}^\textbf{2}$  makes this easier and more obvious.

$\textbf{HSP}^\textbf{2}$ must remain backward compatible with the core HSPF functionality, but it is designed to remove HSPF limitations since even a simple modern laptop has over 5 orders of magnitude more fast memory and speed than the mainframe computers available when HSPF was designed. Technology such as multicore processors and scientific GPUs can provide even greater performance.

This section will discuss some new features in  $\textbf{HSP}^\textbf{2}$.

###  Constant parameters in HSPF can be replaced by time series in $\textbf{HSP}^\textbf{2}$ 


Some HSPF parameters were optionally allowed to vary in time using FLAG and MONTHLY tables. The other HSPF parameters were constants.
When a parameter was allowed to vary, HSPF used the following algorithm to determine its value at any time in the simulation:
+ First interpolate monthly table values to get daily values. 
+ The values at timesteps within each day are set to the day's daily value.
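
These two steps can be sketched with pandas. The code is an illustration under stated assumptions and is not taken from HSPF or HSP2: the twelve monthly values are made up, each monthly value is assumed to apply on the first day of its month, and December is assumed to interpolate toward the following January value.

```python
import pandas as pd

# Hypothetical monthly table values for one segment, January through December.
monthly = [0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.30, 0.25, 0.20, 0.15, 0.12, 0.10]

# Step 1: anchor each value on the first day of its month (assumption) and
# interpolate linearly in time to get one value per day.
anchors = pd.Series(monthly + [monthly[0]],
                    index=pd.date_range('1976-01-01', periods=13, freq='MS'))
daily = anchors.asfreq('D').interpolate(method='time')

# Step 2: every timestep within a day takes that day's value.
hourly_index = pd.date_range('1976-01-01 00:00', '1976-12-31 23:00', freq='h')
hourly = daily.reindex(hourly_index, method='ffill')

print(hourly.head(3))
```

The result is a series with one value per simulation timestep, held constant within each day.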

However, the HSPF Special Functions capability could be used to allow any HSPF parameter to vary over time.

In $\textbf{HSP}^\textbf{2}$, the capability to vary any parameter over time is built in.

#### IMPLEMENTATION in $\textbf{HSP}^\textbf{2}$ 

Internally, $\textbf{HSP}^\textbf{2}$ creates a time series for each parameter over the entire simulation interval at the start of each activity's code.

The rules for creating a time series are simple:
+ Whenever the EXT_SOURCES table directs a time series with the name of an HSPF parameter (in TMEMN) to the current OPSEQ operation and segment (TVOL and TVOLNO), then this time series will be used in place of the parameter. (Because this is different behavior than HSPF, a logged message alerts the user whenever this is done.)
+ Otherwise, if the flag and monthly table information used by HSPF to allow a parameter to vary over time is found in the $\textbf{HSP}^\textbf{2}$ tables, then the HSPF algorithm is used to create a time series over the entire simulation interval.
+ Otherwise, this was a constant parameter in HSPF. The constant value found for the parameter in the PARAMETERS table will be used to fill the array.
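
The order of precedence in these rules can be sketched as a small function. This is not HSP2's internal code: the function name `parameter_series`, its arguments, and the sample values are assumptions for illustration, and the monthly case is assumed to have been expanded to one value per timestep already.

```python
import numpy as np

def parameter_series(name, steps, ts, monthly, parameters, log):
    """Return one value per timestep for `name`, following the three rules."""
    if name in ts:                                   # rule 1: user time series wins
        log.append('%s replaced by a time series' % name)
        return np.asarray(ts[name], dtype=float)
    if name in monthly:                              # rule 2: flag/monthly data
        return np.asarray(monthly[name], dtype=float)
    return np.full(steps, float(parameters[name]))   # rule 3: constant value

log = []
ts = {'INFILT': [0.15, 0.20, 0.10]}                  # series routed by EXT_SOURCES
monthly = {'NSUR': [0.20, 0.20, 0.25]}               # already expanded to timesteps
parameters = {'INFILT': 0.16, 'NSUR': 0.2, 'LZSN': 4.0}

infilt_values = parameter_series('INFILT', 3, ts, monthly, parameters, log)
nsur_values = parameter_series('NSUR', 3, ts, monthly, parameters, log)
lzsn_values = parameter_series('LZSN', 3, ts, monthly, parameters, log)
print(infilt_values, nsur_values, lzsn_values, log)
```

Because every branch returns an array with one entry per timestep, the calling code can treat all parameters the same way.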

There is no additional performance hit to specify a time series for a parameter since all parameters are already treated as time series internally anyway.

### Use Case 3:  Use a time series for the INFILT parameter

#### First, create a time series for INFILT and save it in the HDF5 file's /TIMESERIES directory.

Get the simulation's GLOBAL data to create a time index for this simulation. The new series must span at least the simulation's start and stop datetimes.

In [None]:
gdata = pd.read_hdf(hdfname, '/CONTROL/GLOBAL')['Data']
gdata

The frequency does not need to be at any fixed value - HSP2 will resample (up or down) to make it correct.

In [None]:
start = pd.to_datetime(gdata['sim_start'])
stop  = pd.to_datetime(gdata['sim_end'])

tindex = pd.date_range(start, stop, freq='h')
tindex

Just set some values.

In [None]:
infilt = pd.Series(0.15, index=tindex)                 # set the value of 0.15 at each timestep
infilt['1976-01-01 03:00':'1976-01-01 05:00'] = 0.20   # overwrite for all datetimes in this interval (end points included)
infilt['1976-12-31 18:00':] = 0.10                     # another change.

infilt

Save to the HDF5 file

In [None]:
infilt.to_hdf(hdfname, 'TIMESERIES/infilt')

####  Second, add a row to the EXT_SOURCES table to send this time series to PERLND INFILT for segment P001.

In [None]:
ext = pd.read_hdf(hdfname, '/CONTROL/EXT_SOURCES')
nrows, ncols = ext.shape

nrows, ncols

In [None]:
ext.loc[nrows] = ['*', 'infilt', '', '', 1.0, '', 'PERLND', 'INFILT', '', 'P001', '', 'Adding New series to control infilt']
ext.tail()

In [None]:
ext.to_hdf(hdfname, '/CONTROL/EXT_SOURCES',  data_columns=True, format='table')

Now run the simulation

In [None]:
HSP2tools.run_Tutorial(hdfname)

The infilt timeseries was found twice because it was made available to both SNOW and PWATER.

### MFACTOR and AFACTR may be replaced by a time series

+ The MFACTOR table column is found in the MASS_LINK and EXT_SOURCES tables
+ The AFACTR table column is found in the LINKS table

If an AFACTR or MFACTOR element in a table is a string that starts with an asterisk, then the string after the asterisk is the name of a timeseries to be found in the HDF5 TIMESERIES directory.  It is treated as a sparse array and padded appropriately (aggregation method SAME).

Otherwise, the AFACTR or MFACTOR element should be a floating point number or a string that can be converted into a floating point number. Internally, a timeseries is created with this value in every position.

So either way, any AFACTR or MFACTOR is a timeseries for internal calculation. It is multiplied pointwise by the data timeseries specified by the table.
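
The asterisk convention can be sketched as follows. The helper `factor_series` and the sample data are hypothetical, and the padding of sparse series mentioned above is left out; the sketch only shows how a table entry becomes an array that is multiplied pointwise with the data.

```python
import numpy as np

def factor_series(factor, steps, timeseries):
    """Turn an AFACTR/MFACTOR table entry into one value per timestep."""
    if isinstance(factor, str) and factor.startswith('*'):
        # The text after the asterisk names a stored time series.
        return np.asarray(timeseries[factor[1:]], dtype=float)
    # Otherwise the entry is a number, or a string holding a number.
    return np.full(steps, float(factor))

stored = {'implnd': [3000.0, 3300.0, 3600.0]}   # stand-in for the TIMESERIES directory
data = np.array([1.0, 2.0, 3.0])

varying = factor_series('*implnd', 3, stored)
constant = factor_series('1.5', 3, stored)
print(varying * data, constant * data)
```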


### Use Case 4: Simulate a town growing and replacing  farm land during a simulation.

This scenario is a town (IMPLND segment I001) growing over time replacing farm land (PERLND segment P001). The total area of the two segments must remain constant. This example uses the HSPF test10 HDF5, tutorial.h5.

The total area of the two segments P001 and I001 is 9000 acres. 

The IMPLND area will increase linearly by 20% over the simulation period. That is, the IMPLND segment will grow from 3000 to 3600 acres.
This requires the PERLND segment to shrink from 6000 to 5400 acres.

First, create a timeseries for IMPLND. Name it *implnd* and save in the HDF5 file.

This process uses the tindex computed in the last example.

In [None]:
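implnd = pd.Series(np.nan, index=tindex)      # empty (NaN) series over the simulation period
implnd[tindex[0]] = 3000.
implnd[tindex[-1]] = 1.2 * 3000.
implnd = implnd.interpolate(method='time')    # fill linearly in time between the two end points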

implnd.to_hdf(hdfname, 'TIMESERIES/implnd')

Create a timeseries for PERLND

Start with the original PERLND area and pointwise (in time) subtract the increase in the IMPLND segment.

In [None]:
perlnd = 6000. - (implnd-3000.)         # Note: this is a full vector calculation

perlnd.to_hdf(hdfname, 'TIMESERIES/perlnd')

#### Modify the AFACTR entries in the LINKS table

It is necessary to indicate when and which time series will be used to replace a fixed AFACTR. 

In [None]:
df = pd.read_hdf(hdfname, '/CONTROL/LINKS')
df

The PERLND AFACTR is at table index 0, the IMPLND  AFACTR at index 5.

Modify the AFACTR entries for these two rows and save back to the HDF5 file.

In [None]:
df.loc[0, 'AFACTR'] = '*perlnd'
df.loc[5, 'AFACTR'] = '*implnd'
df.AFACTR = df.AFACTR.astype(str)   # previously all entries were floats, so Pandas made this a float typed column.
df.to_hdf(hdfname, '/CONTROL/LINKS',  data_columns=True, format='table')

df

Now run the simulation and look for the message displayed whenever AFACTR is replaced by a time series.

In [None]:
HSP2tools.run_Tutorial(hdfname)

## Section 7: Run HSP2 with defined workflows (including QA/QC)<a id='workflow'></a>

This section discusses using project defined workflows for 
$\textbf{HSP}^\textbf{2}$.

### Use Case 5: Determine the current status of all work on project xxxx

### Define Workflows

Notebooks should be created for various activities like
+ Preparing a timeseries (remove bad data, estimating missing data from other sources)
+ Creating a Watershed model
+ Sensitivity Analysis
+ Parameter Calibration
+ Analysis of Best Management Practices
+ Analysis methods
 
#### Generic process to Start a new workflow activity
+ Fetch a copy of the appropriate "Master" workflow Notebook from a version control repository.
+ Rename it and save back to the repository or to the associated HDF5 file.
+ Master Notebooks should define a place near the top for the user's name, creation date, purpose of the activity, and any other required metadata.
+ The heading and text (markdown) cells in the Notebook should be a generic specification of that process.
+ Ideally, all data should be processed in the Notebook. The data should be fetched from databases, HDF5 files, and other controlled sources. The processed results should have defined storage locations like a database or HDF5 file.  Add cells as needed for process required computations.
  + Currently, Jupyter Notebooks support 40+ computer languages and run on Windows, Linux, and Macs, so their use should not be an issue.
+ Whenever a process step seems inappropriate, simply document in that or an adjacent cell why the step needs either unusual processing or to be skipped. Document the work "as done" rather than what is hoped for in the Notebook.
+ Periodically, the Notebook should be committed back to the repository or saved to the HDF5 file.
+ Ideally, associated documentation including customer documents and emails should be saved to either the Notebook or to a project defined location where it can be controlled.

### Add QC/QA to Notebook workflows

#### Create Workflow Metadata

Project defined metadata is defined in the master Notebooks.  It should be filled in at the time the Notebook is created for the specific task. The metadata can also be attached to the HDF5 file's top directory.

A very simple Python program can scan a directory (and all its subdirectories) to extract this top level
metadata and build a simple CSV file with the results.

The resulting summary CSV can be displayed in Pandas or EXCEL to have a compact description of all simulations and to find a specific simulation by
its characteristics. You don't need to require the exact same set of metadata for all HDF5 files! Files having additional metadata or not having
the normal common metadata don't break the code. Missing data just leaves a blank and extra data creates a new column in the summary CSV file.
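
The aggregation step described above can be sketched with the standard library alone. This is a minimal sketch, not the project's actual tool: it assumes each HDF5 file's top-level metadata has already been read into a plain dictionary (that reading step is omitted here), and the names `find_hdf5_files`, `summarize_metadata`, `Workflow2.h5`, and `Reviewer` are hypothetical. It is written in Python 3 syntax, unlike the Python 2.7 cells in this tutorial.

```python
import csv
import io
import os


def find_hdf5_files(root):
    """Return the paths of all '.h5' files under root, including subdirectories."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.h5'):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def summarize_metadata(records):
    """Return CSV text with one row per record and one column per key seen."""
    # Collect the union of keys in first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    # restval='' leaves a blank cell wherever a record lacks a column.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval='',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical metadata, as it might be read from two HDF5 files.
records = [
    {'HDF5Name': 'Workflow1.h5', 'Analyst': 'RTH', 'Project': 'My Watershed'},
    {'HDF5Name': 'Workflow2.h5', 'Project': 'My Watershed', 'Reviewer': 'ABC'},
]
summary_csv = summarize_metadata(records)
print(summary_csv)
```

The second record has no `Analyst` and adds a `Reviewer`, so the summary gains a `Reviewer` column and leaves the missing cells blank.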

Here is an example of metadata that is then attached to the HDF5 file.

In [None]:
md = {
    'Notebook': 'Workflow1.ipynb',
    'HDF5Name': 'Workflow1.h5',

    'Analyst': 'RTH',
    'CreationDate': '08/20/2014',
    'Purpose': 'Extract, cleanup, and verify precipitation time series',
    'Project': 'My Watershed',
    'Comment': 'Closest station to PERLND 24 & IMPLND 27',
    'Source': 'USNWS KRAP',

    'DataSourceQuality': 'Good',
    'FinalDataQuality': 'Excellent',
}

In [None]:
pd.DataFrame(md, index=[0]).T

These tutorials have already demonstrated how to save DataFrames to HDF5 or to add annotation to the HDF5 file.

#### Notebook widgets

The Master Notebooks can put "widgets" in some cells at key points for the user to mark the status of work completion or to mark other status such as data quality, peer review completion, etc.

Python callback routines may be written at the project support level to cause the widgets to report their values to a database or other collection point. This makes the status of all ongoing and completed workflows available to the quality team and project management in real time.

**Note**: neither the metadata above nor the widgets below are actually connected to any persistent storage in this tutorial.

Here are examples of two (of many) widgets that might be used:

#### (Example)  Step 6: Get approval from customer.

In [None]:


step6 = Dropdown(description='Status of step 6')
step6.options = {'Not Started':0, 'Done':1, 'Not Applicable':2, 'In Progress':3, 'Blocked':4}
step6

#### (Example) Step 10: Final check of the processing of the precipitation data

In [None]:
done10 = Checkbox(description='Precipitation Data Processing Complete')
done10

Callback routines are given the state of the widget and can connect to a collection point.
Alternatively, the widget state can be read by a tool that periodically scans the filesystem for all Notebooks:

In [None]:
print 'STEP  6 is', step6.value
print 'STEP 10 is', done10.value
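
A callback that reports to a collection point might look like the following. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the project's collection point, the helper `make_status_recorder` and the `status` table are hypothetical, and the code uses Python 3 syntax. In a live notebook the callback would be registered with the widget's `observe` method. Here the change dictionary is simulated by hand so that the sketch runs without widgets.

```python
import sqlite3
from datetime import datetime


def make_status_recorder(connection, notebook, step):
    """Build a callback that stores each new widget value in a status table."""
    connection.execute(
        'CREATE TABLE IF NOT EXISTS status '
        '(notebook TEXT, step TEXT, value TEXT, recorded TEXT)'
    )

    def record(change):
        # `change` mimics the dictionary passed to widget observers;
        # only its 'new' entry (the updated widget value) is used here.
        connection.execute(
            'INSERT INTO status VALUES (?, ?, ?, ?)',
            (notebook, step, str(change['new']), datetime.now().isoformat()),
        )
        connection.commit()

    return record


# Assumption: an in-memory database stands in for the real collection point.
db = sqlite3.connect(':memory:')
record_step6 = make_status_recorder(db, 'Workflow1.ipynb', 'Step 6')

# In a notebook this would be: step6.observe(record_step6, names='value')
# Simulate the user changing the dropdown from 'Not Started' (0) to 'Done' (1).
record_step6({'name': 'value', 'old': 0, 'new': 1})

rows = db.execute('SELECT notebook, step, value FROM status').fetchall()
print(rows)
```

Each change appends a row rather than overwriting the previous one, so the table also serves as a history of when each step changed status.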

Notebooks support the traditional set of widgets including pushbuttons, radio buttons, text boxes, and sliders. Appropriate use of widgets connected to persistent storage can be a significant aid in managing workflows. It ensures consistent reporting, provides real-time status, and allows tools to aggregate information automatically across the project.

Widgets offer many customization options, such as color, size, and font.

**Workflow summary**

 + Create a Master copy of the watershed's HDF5 file.
 + Annotate the Master file with essential project information.
 + Create one or more IPython Notebooks (from a template if possible). These include notebooks for:
 
     + Data preparation (like timeseries data)
     + Setup of data in the Master HDF5 file
     + QA/QC checks
     + Checking run results
     
 + Store associated project documentation (notebooks, scanned pages, Word documents, etc.) into the HDF5 file.
 + Create a Git or Mercurial repository for specialized code (.m or .py files) and for version control of working documents. Save in the Master
 HDF5 file.
 
For each new investigation, calibration, or other study:

 + Clone the Master HDF5 file
 + Remove unnecessary data that can be pointed back to the Master to save storage and ensure proper data integrity.
 + Pack (and possibly compress) the cloned HDF5 file
 + Create one or more IPython Notebooks from templates
 + Modify simulation parameters for this study
 + Save working notebooks and other documentation specific to this simulation into the cloned HDF5 file.
 + Annotate the HDF5 file
 + Create a Mercurial repository for specialized code (.m or .py files) and for version control of working documents. Save in the cloned
 HDF5 file.
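
The cloning step in the list above can be sketched as a guarded file copy. This is a minimal sketch, assuming the clone lives next to the Master file; the helper `clone_master` and the file names are hypothetical, a placeholder file stands in for a real HDF5 file, and the code uses Python 3 syntax. Packing and compressing the clone is left to an external tool such as PyTables' `ptrepack` or the HDF Group's `h5repack`.

```python
import os
import shutil
import tempfile


def clone_master(master_path, study_name):
    """Copy the Master HDF5 file to '<study_name>.h5' in the same directory."""
    target = os.path.join(os.path.dirname(master_path), study_name + '.h5')
    if os.path.exists(target):
        # Never silently replace an existing study file.
        raise IOError('refusing to overwrite existing study file: ' + target)
    shutil.copy2(master_path, target)  # copy2 also preserves timestamps
    return target


# Demonstration with a placeholder standing in for a real Master HDF5 file.
workdir = tempfile.mkdtemp()
master = os.path.join(workdir, 'Master.h5')
with open(master, 'wb') as f:
    f.write(b'placeholder bytes, not real HDF5 content')

clone = clone_master(master, 'Calibration1')
print(os.path.basename(clone))
```

Refusing to overwrite an existing clone protects a study that is already in progress from being replaced by a fresh copy of the Master.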
 
Advantages of HSP2 workflow:

 + All documentation, timeseries and other simulation data, and computed simulation results can be saved into a single HDF5 file to keep
 all the "artifacts" together.
 
 + IPython Notebook templates can be used to define appropriate processes and document the results of each process step. This
 can be a good enhancement to a QA/QC program.
 
 + Data does not need to be duplicated.
 + Widgets make it easy to report the status of processing steps consistently, and that status can be aggregated across all Notebooks for a project.
 + Metadata can be placed in each Notebook and aggregated across all Notebooks for a project.
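
Aggregating metadata across Notebooks can be sketched by reading each `.ipynb` file as JSON. This is a minimal sketch under stated assumptions: the project stores its metadata in the notebook-level `metadata` section under a hypothetical key named `workflow`, the helper `collect_notebook_metadata` is illustrative only, and the code uses Python 3 syntax. Two minimal notebook files are created in a temporary directory so the example is self-contained.

```python
import json
import os
import tempfile


def collect_notebook_metadata(root, key='workflow'):
    """Map each notebook path under root to the metadata stored under `key`."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith('.ipynb'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8') as f:
                notebook = json.load(f)
            # Notebooks without the key contribute an empty record.
            found[path] = notebook.get('metadata', {}).get(key, {})
    return found


# Build two minimal notebook files to scan (hypothetical names and content).
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, 'precip'))
samples = {
    os.path.join(project, 'precip', 'Workflow1.ipynb'):
        {'workflow': {'Analyst': 'RTH'}},
    os.path.join(project, 'Scratch.ipynb'): {},
}
for path, metadata in samples.items():
    with open(path, 'w', encoding='utf-8') as f:
        json.dump({'cells': [], 'metadata': metadata,
                   'nbformat': 4, 'nbformat_minor': 2}, f)

collected = collect_notebook_metadata(project)
for path in sorted(collected):
    print(os.path.relpath(path, project), collected[path])
```

A notebook that lacks the key still appears in the result with an empty record, so untagged Notebooks are easy to spot during a QA/QC review.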