<img src="Images/HSP2.png" />
This Jupyter Notebook Copyright 2017 by RESPEC, INC.  All rights reserved.

$\textbf{HSP}^{\textbf{2}}\ \text{and}\ \textbf{HSP2}\ $ Copyright 2017 by RESPEC INC. and released under this [License](LegalInformation/License.txt)

# TUTORIAL 2: How to use $\textbf{HSP}^\textbf{2}$ files



## Introduction

All data required for an $\textbf{HSP}^\textbf{2}$ simulation run are contained in one (or more) HDF5 files. The results (computed time series) are also stored back into the simulation's HDF5 file along with information about the latest run.   Additional project information including documents should also be stored in the project's HDF5 file to provide a complete record.

This tutorial will cover how to read, view, edit, save, and annotate the computed time series datasets and the various tables and time series data that control the  $\textbf{HSP}^\textbf{2}$  simulation in Python.  A later tutorial will demonstrate how to create CSV (comma separated value) files which can move information in and out of the HDF5 files.

The Pandas Python library is particularly useful for data "munging" activities. It is heavily used in these tutorials.

**Tutorial Contents**

 + Section 1: [Read, Modify, and Write DataFrame Tables to HDF5](#section1)
   + Demonstrated how to see the internal HDF5 structure
   + Demonstrated how to see the file structure (keys)
   + Discussed the structure of the Python $\textbf{HSP}^\textbf{2}\ $ HDF5 file.
   + Demonstrated how to read table data from HDF5
   + Demonstrated how to modify an existing table column
   + Demonstrated how to add a new column to the data
   + Demonstrated how to modify a single element in the table
   + Demonstrated how to add a new row to the data
   + Demonstrated how to save a new or modified table back to HDF5

 + Section 2: [Annotating the HDF5 File (also called Attributes in HDF5)](#section2)
   + Read attributes from a time series.
   + Save a time series attribute to HDF5.
   + Use a time series attribute to control code execution 
   + Save attributes to groups (dictionaries.)
   + Annotating a table row by row
 + Section 3: [User Defined Filters for UCI-Like Data](#section3)  
     + Demonstrate the Land Use filter capability using the fetch() and replace() convenience functions in $\textbf{HSP}^\textbf{2}$.
     + Demonstrate creating a boolean filter based on the data in a table
     + Demonstrate an HDF5 query function to read only the data matching the query selection.
     + Demonstrate creating User Defined filters for any data table
     + Demonstrate using these filters to read, modify, and save data to the HDF5 file
 + Section 4: [Read, Modify, and Write time series to HDF5](#section4)
   + Read a time series from the HDF5 file
   + Perform a computation on the entire time series
   + Write the time series to the HDF5
   + Create new time series
   + Show a subset of the time series data (called slicing) 
 + Section 5: [Working with Computed Datasets from Multiple Segments](#section5)
   + Create a list of keys for the desired data (manually or by filtering)
   + Loop over the HDF5 keys to create a DataFrame with the required time series.
 + Section 6: [Viewing Simulation Results](#section7)
  + ViewPerlnd
  + ViewImplnd
  + ViewRCHRES
 + Section 7: [More $\textbf{HSP}^\textbf{2}$  Convenience Functions](#section8)
  + Add a new segment using information from an existing segment - clone_segment().
  + Remove a segment from all the tables in an HDF5 file - remove_segment().
 + [Final Remarks](#GIS)
  + Discuss using the HDF5 file to store other project information like PDF files and Notebooks.
  + Discuss using the HDF5 file to store Geographic Information Data like shapefiles.

**OPTIONAL**, but highly encouraged, please install   [HDFView](http://www.hdfgroup.org/products/java/release/download.html) or [HDFCompass](https://www.hdfgroup.org/projects/compass/). HDFView is the traditional HDF4 and HDF5 file viewer. It allows the user to make changes to data, copy and move directories, output data as text files, plot, and many other functions. HDFCompass is a newer tool which currently can view, but not change HDF5 files. You can use either or both **free** tools whenever the tutorial suggests using HDFView. Both are tools provided by the non-profit HDF Group which maintains the HDF4 and HDF5 file formats as long term archival data format for scientific data.

> HDFView and HDFCompass (both free) provide a standard Windows installer as well as versions for Linux and Mac. Instructions and documentation are available at the links above from the HDF Group, which maintains HDF4 and HDF5.

> It is possible to modify the HDF5 file using HDFView, but it is a bit clumsy. The tutorials only use it to view the HDF5 file.

> As you run the tutorial, note that neither HDFView nor HDFCompass currently updates its display automatically when the data in the file changes. To see any changes, close the file and reopen it (the recent file shortcut in HDFView makes this quick).

### Required Python imports  and settings<a id='options'></a>
Execute the following cell (right arrow in the cell toolbar or Ctrl-Enter shortcut.) This will configure the Python environment.

In [None]:
import os
import site
site.addsitedir(os.path.dirname(os.getcwd()))  # adds the parent directory (the HSP2 software) to the path on any OS

hdfname = 'TutorialData/tutorial.h5'

import numpy as np
import pandas as pd
pd.options.display.max_rows    = 18
pd.options.display.max_columns = 10
pd.options.display.float_format = '{:.2f}'.format  # display 2 digits after the decimal point

import HSP2
import HSP2tools

HSP2tools.reset_tutorial()    # make a new copy of the tutorial's data
HSP2tools.versions()          # display version information below

As you work in these tutorials, any changes to the data are made to a copy. So if you make a mistake or want to try experiments, don't worry - only the copy is modified. 

Whenever you want a fresh copy of the data files, execute the cell above again. (This is true for all tutorials.)

### Data used in this tutorial

The data used throughout these tutorials are taken from the HSPF test10 and from a real watershed (calleg). The HDF5 filename set in the cell above using the code
```
hdfname = 'TutorialData/tutorial.h5'
```
specifies the HSPF Test10 HDF5 file.  The $\textbf{HSP}^\textbf{2}\ $ tools uciReader and wdmReader were used to convert the test10 UCI and WDM files into the HDF5 file used by $\textbf{HSP}^\textbf{2}$. These tools are discussed in Tutorial 4.  Test10 is a simple watershed model with 1 PERLND, 1 IMPLND and 5 RCHRES segments.

**NOTE**: $\textbf{HSP}^\textbf{2}\ $ allows time series, FTables, segment IDs (like PERLND pids), and all other objects to be named with any "Natural Name", unlike HSPF, which requires numbers as names.

The Natural Naming convention means that a name starts with a letter and is followed by any combination of letters, numbers, and underscores. Upper and lower case letters are distinct. This is exactly the Unix/Linux file naming convention. Unicode characters such as the Greek alphabet or mathematical symbols may NOT be used in names due to limitations of HDF5. 
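The rule above can be checked mechanically. The following is a minimal sketch, assuming the rule exactly as stated (an ASCII letter followed by letters, digits, or underscores); the function name `is_natural_name` is hypothetical and is not part of $\textbf{HSP}^\textbf{2}$.

```python
import re

# Pattern for the rule described above: one ASCII letter, then any mix of
# ASCII letters, digits, and underscores. (Hypothetical helper, not HSP2 code.)
NATURAL_NAME = re.compile(r'[A-Za-z][A-Za-z0-9_]*')

def is_natural_name(name):
    """Return True if the whole string follows the Natural Naming rule."""
    return NATURAL_NAME.fullmatch(name) is not None

for candidate in ['LakeWoBeGone', 'TS14', 'R003', '14', 'my-reach', '\u03b1_reach']:
    print(candidate, is_natural_name(candidate))
```

Names that start with a digit, contain a hyphen, or use Greek letters are rejected by this check.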

The Pandas and PyTables interfaces to HDF5 files can relax this Natural Name requirement for DataFrame column names, but using Natural Names is still recommended.

$\textbf{HSP}^\textbf{2}\ $ doesn't require any specific naming policies. Names can/should be informative. For example, a RCHRES segment name may be LakeWoBeGone. 

The $\textbf{HSP}^\textbf{2}\ $ uciReader and wdmReader tools will convert numbers used as names in HSPF UCI and WDM files when creating the HSP2 HDF5 file.
The time series WDM numbers like 14 and 139 are converted to names like TS14 and TS139. FTable numbers are converted to names like FT001, FT103.  Segment ID (rid, iid, pid) numbers are converted to names like P001, I001, R003 for PERLND, IMPLND, and RCHRES segments respectively. These are not required or even suggested naming conventions. A later version of these tools will allow the user to specify how WDM and UCI numbers are changed into names for legacy watersheds. A future GUI tool will allow the user to directly name objects as they are created for new watersheds.
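The naming patterns in the examples above can be reproduced with ordinary string formatting. This is only an illustrative sketch inferred from the sample names (TS14, FT001, P001); the helper names are hypothetical and this is not the code used by uciReader or wdmReader.

```python
# Hypothetical helpers that mimic the sample names shown above.
SEGMENT_PREFIX = {'PERLND': 'P', 'IMPLND': 'I', 'RCHRES': 'R'}

def timeseries_name(wdm_number):
    """WDM dataset number -> name like TS14 (no zero padding in the samples)."""
    return 'TS{:d}'.format(wdm_number)

def ftable_name(ftable_number):
    """FTable number -> name like FT001 (three digits in the samples)."""
    return 'FT{:03d}'.format(ftable_number)

def segment_name(operation, segment_number):
    """Operation and segment number -> name like P001, I001, R003."""
    return '{}{:03d}'.format(SEGMENT_PREFIX[operation], segment_number)

print(timeseries_name(139), ftable_name(1), segment_name('RCHRES', 3))
```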

## Section 1: Read, Modify, and Write DataFrame Tables for  $\textbf{HSP}^\textbf{2}$ <a id='section1'></a>

The HDF5 file used by $\text{HSP}^2$ contains the data previously contained in HSPF UCI files (flags, parameters, initializations, monthly tables, FTables, etc.). The HDF5 file  also contains the time series data previously contained in WDM files, the results of the simulation run (time series), documentation, and other data to make a self-contained simulation package.

$\textbf{HSP}^\textbf{2}$  files are organized internally into groups and datasets. Both groups and datasets are accessed by specifying a full Linux style path to the required element. The term directory will be often used for an HDF5 group since it conveys the meaning in a better known term.

The Pandas library provides two data structures used in $\textbf{HSP}^\textbf{2}$: the Series, a one-dimensional array that usually has a time index, and the DataFrame, a two-dimensional array that looks like a spreadsheet.
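As a quick illustration of the two structures, the sketch below builds a small Series with a time index and a small DataFrame. The values and the names `rain` and `table` are made up for the example and do not come from the tutorial data.

```python
import pandas as pd

# A Series: one-dimensional values with a (daily) time index.
index = pd.date_range('1976-01-01', periods=4, freq='D')
rain = pd.Series([0.0, 0.1, 0.3, 0.0], index=index, name='rain')

# A DataFrame: two-dimensional, with named columns like a spreadsheet.
table = pd.DataFrame({'Depth': [0.0, 1.0, 2.0], 'Volume': [0.0, 10.0, 25.0]})

print(rain)
print(table)
```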

### Examine the structure of the  $\textbf{HSP}^\textbf{2}$  tutorial HDF5 file

Generally, you don't need to do this for your own files. But if you are given an unknown HDF5 file, it is easy to examine its structure. The HDFView and HDFCompass utilities are very useful for this.

The following example shows how to do this in Python (using Pandas). The information displayed will make more sense as you complete these tutorials.

Over the course of these tutorials, many Python expressions will be explained. So don't worry about expressions that are not yet clear. 

In [None]:
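with pd.HDFStore(hdfname, mode='r') as store:   # pd.get_store() was removed from later Pandas versions
    print(store.info())                         # detailed listing of every group and dataset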

Almost always you **don't** need this level of detail. You can easily look at just the HDF5 group (directory) structure.<a id='keys'></a> The Pandas store.keys() routine will return the internal directory and dataset structure of the HDF5 file.

In [None]:
with pd.HDFStore(hdfname, mode='r') as store:
    keys = sorted(store.keys())  
keys

**Note:** Placing the name of the object, for example "keys" in the cell above, by itself as the last code line in the cell will cause the Notebook to display the object when the cell is run. This Notebook feature is frequently used in these tutorials.  If you want to see more than one object, then you can print as often as you like in a cell.

### Structure of an HDF5 file for $\textbf{HSP}^\textbf{2} \ $ 

The structure of the $\textbf{HSP}^\textbf{2}\ $ HDF5 file is simple and is shown in the keys above.
The following discussion will provide more detail.

The directory **/CONTROL** contains these tables in the following file structure:

+ CONTROL
  + CONFIGURATION
  + EXT_SOURCES
  + GLOBAL
  + OP_SEQUENCE
  + MASS_LINK
  + LINKS (combines SCHEMATIC and NETWORK data from HSPF into one table.)
  
Each table above contains information similar to its HSPF UCI counterpart, but usually with more information.

The directory **/FTABLES** contains all the individual FTables.  So the FTables are organized like:

+ FTABLES
  + FT001
  + FT002
  + FT003

The directory **/TIMESERIES** contains all the time series data (like the data in a WDM file) with each time series in its own table. Like a WDM file, each time series can be at a different timestep and can span longer time durations than the actual simulation duration.

The directory **/RESULTS** contains all the computed time series. Under this directory, there is a directory for every PERLND, IMPLND and RCHRES segment. This directory has subdirectories for each "activity" like PWATER.  For example, a section of this file structure for test10 looks like:

+ RESULTS
  + PERLND_P001
    + PWATER
    + SNOW
    
A table, like SNOW above, contains all the saved, computed time series from the SNOW module for that segment.
  
There is a top directory for each of PERLND, IMPLND, and RCHRES to hold the HSPF UCI-like data.
They all have similar structure. For example, here is a part of the **/PERLND** structure:

+ PERLND
  + ACTIVITY
  + GENERAL_INFO
  + PWATER
    + FLAGS
    + STATE (sometimes called INITIALIZATIONS in HSPF documentation)
    + PARAMETERS
    + MONTHLY (optional)
    + SAVE
  + SNOW 
    + FLAGS
    + STATE (sometimes called INITIALIZATIONS in HSPF documentation)
    + PARAMETERS
    + MONTHLY (optional)
    + SAVE

An $\textbf{HSP}^\textbf{2}$  HDF5 table contains all the associated information from possibly many HSPF UCI tables. For example,
the $\textbf{HSP}^\textbf{2}$  table at PERLND/SNOW/PARAMETERS contains all the parameter information from HSPF UCI SNOW-PARM1 and SNOW-PARM2.  

The SAVE table in every "activity" directory (like PWATER) contains a column for each possible computed time series and a row for each segment ID. The user can put a True or False at each intersection in this table to specify which time series are saved to the HDF5 file during the run. A time series is saved at the same timestep at which it is computed during the simulation (frequently hourly). That is, when the user selects a time series, all of its computed data are saved without modification.

Any computed time series is automatically available to all activities downstream in the same OPSEQ command. The SAVE table defaults are sufficient to supply other modules in later OPSEQ commands.
The primary purpose of the SAVE table is to let the user specify the data needed for post-run analysis.

#### View the HDF5 file with HDFView or HDFCompass

See how the test10 HDF5 file looks in HDFView.
 + start HDFView or Compass.
 + open file 'tutorial.h5' in the TutorialData sub directory where you started these tutorials. You will need to browse to this location since these tools start at the top of the user directory.

Sadly, neither HDFView nor Compass takes command arguments, so this can't be run automatically from this tutorial.

### View and Modify an FTable

In the keys above, you can see the directory '/FTABLES' contains 5 data sets named 'FT001',...,'FT005'.

Pandas can access data tables and time series with the **read_hdf()** function. The **read_hdf()** function takes the HDF filename for its first
argument, and the **full** path within this HDF file to the data set as its second argument.  A **full** path means from the top (**/**) to the name of the dataset (like a Linux directory path). The full path is called a key in HDF5.

**Note:**  Pandas allows you to skip the leading  **'/'**.  It will always be shown in these tutorials because other tools like MATLAB or other software might require it (so stay in the habit of using it.)

Although this example shows techniques to work with an FTable, the same techniques can be used for any dataset (a Pandas DataFrame or Series) stored in the HDF5 file.

#### View FTable  FT001

The appearance of this data in a Jupyter Notebook looks like a spreadsheet so it should make sense. The data is actually in a Pandas DataFrame structure.  HDFView and HDFCompass will show a similar display.

In [None]:
df = pd.read_hdf(hdfname, '/FTABLES/FT001')
df

These lines from the [Required Python imports and settings](#options) cell at the top of this Notebook control the amount of the table presented (and the display rounding).

```
pd.options.display.max_rows    = 18
pd.options.display.max_columns = 10
pd.options.display.float_format = '{:.2f}'.format  # display 2 digits after the decimal point
```

Tables bigger than the maximum set with the display options are shown truncated, and the actual size of the table is then shown below the table.  Ellipses mark where a large table has been truncated, so you always know when this happens.
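The truncation behavior can be seen without touching the global settings. The sketch below uses `pd.option_context`, a standard Pandas context manager that applies display options only inside the `with` block; the table `big` is made up for the example.

```python
import numpy as np
import pandas as pd

# A made-up table with more rows than the display limit used below.
big = pd.DataFrame({'x': np.arange(20.0), 'y': np.arange(20.0) ** 2})

# Options set here apply only inside the with block.
with pd.option_context('display.max_rows', 8,
                       'display.float_format', '{:.1f}'.format):
    text = repr(big)

print(text)   # truncated display; the last line reports the full table size
```

The data in `big` is unchanged; only the text representation is shortened.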

Reset the display options to see how the display changes. (The data itself is unchanged by the display option settings.)

In [None]:
pd.options.display.max_rows    = 8
pd.options.display.max_columns = 4
pd.options.display.float_format = '{:.5f}'.format  # display 5 digits after the decimal point,

Now view the same FTABLE again with these display options.

In [None]:
ftable = pd.read_hdf(hdfname, '/FTABLES/FT001')
ftable

Now reset the original display options

In [None]:
pd.options.display.max_rows    = 18
pd.options.display.max_columns = 10
pd.options.display.float_format = '{:.2f}'.format  # display 2 digits after the decimal point

#### Get the data from a DataFrame column

To get the data for one column in the table, Pandas only needs the column name. For example, we can look at the **Disch1** column data below.

**Note:** This is a view into the table! Changing the view data changes the original table.

In [None]:
ftable.Disch1

However, if the column name doesn't follow the Natural Naming convention, you must use this syntax which always works:

In [None]:
ftable['Disch1']

A few routines in Pandas also require this syntax, but this is decreasing over time. (Legacy issue.)

#### Calculations on a column in a DataFrame

Calculations may be done using standard Python expressions.  Numpy expressions can work on all the elements of an array (such as a DataFrame or Series) at once without explicit looping. For example, increase the Disch1 data by 25%:

In [None]:
ftable.Disch1 = 1.25 * ftable.Disch1
ftable.Disch1

Now look at the original table, to see that the **Disch1** column is modified for every element.

In [None]:
ftable

### Write a DataFrame Table to HDF5

This FTable can now be saved back to the HDF5 file, using the Pandas **to_hdf()** function.
The first argument is the HDF5 filename, the second argument is the path (or key) to the data set in the HDF5 file.
Optional keyword arguments may follow.

The keyword arguments used in $\textbf{HSP}^\textbf{2}\ $ specify that the table will be implemented in the HDF5 file to look like a spreadsheet.

 + **data_columns=True**   makes column headers visible in HDF5,  use HDFView on this FTable to see the headers
 + **format='table'**      specifies  which data format to write into the HDF5. The table format allows queries, appending, etc. The format can also use just a 't' instead of 'table'.

In [None]:
ftable.to_hdf(hdfname, '/FTABLES/FT001', data_columns=True, format='table')

Now view the FT001 using HDFView to see that this data was modified in the HDF5 file. (Don't forget to close and reopen the HDF5 file in HDFView!)

If you are using HDFCompass, close the window with the FT001 table, then reopen it to see the changes. You don't need to close and reopen the file.  **However**, sometimes HDFCompass will get confused and you will need to close and reopen the file. It may be better to always close and reopen the file to avoid this bug.
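The effect of the two keyword arguments can be tried on a throwaway file. The sketch below assumes PyTables is installed (Pandas needs it for HDF5), writes a made-up table to a temporary file rather than the tutorial data, and then uses the `where` argument of **read_hdf()**, which is available because the table format with data columns supports queries.

```python
import os
import tempfile

import pandas as pd

# Throwaway file and made-up table; the tutorial file is not touched.
path = os.path.join(tempfile.mkdtemp(), 'roundtrip_demo.h5')
table = pd.DataFrame({'Depth': [0.0, 1.0, 2.0, 3.0],
                      'Volume': [0.0, 10.0, 25.0, 45.0]})

# Write in table format with queryable column headers.
table.to_hdf(path, key='/FTABLES/FT_demo', format='table', data_columns=True)

# Read everything back, then read only the rows matching a query.
everything = pd.read_hdf(path, '/FTABLES/FT_demo')
deep_only = pd.read_hdf(path, '/FTABLES/FT_demo', where='Depth > 1.0')

print(everything)
print(deep_only)
```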

### Create a new column by calculations from existing columns, then save it back to HDF5

Now add a new discharge column calculated from the other columns. (The calculation is silly; it just shows another vector style calculation and automatic column creation.)

**NOTE:** When a new column is created automatically, the reference must use the 

```
ftable['Disch4']
```

syntax on the left side (before the equal sign) rather than

```
ftable.Disch4
```
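The difference between the two forms can be checked on a small made-up table. In this sketch (the column values and the name `Disch5` are illustrative only), the bracket form creates a column, while the attribute form on a name that is not yet a column only sets a Python attribute on the object.

```python
import warnings

import pandas as pd

table = pd.DataFrame({'Disch1': [0.0, 10.0, 20.0], 'Disch2': [0.0, 1.0, 2.0]})

# Bracket syntax on the left side creates a real column.
table['Disch4'] = 0.8 * table.Disch1 + 0.2 * table.Disch2

# Attribute syntax on a new name does NOT create a column. Pandas warns about
# this, so the warning is silenced here to keep the demonstration output clean.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    table.Disch5 = [1.0, 2.0, 3.0]

print(list(table.columns))
```

Only `Disch4` appears in the column list; `Disch5` is silently missing from the table.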

In [None]:
ftable['Disch4'] = 0.8 * ftable.Disch1 + 0.15 * ftable.Disch2 + 0.05 * ftable.Disch3 + 2.5
ftable

This is not a good discharge since the first row should be zero!
Fix this up by accessing one element in the table. Use the **loc** function to specify the row name and the column name (in that order).

In [None]:
ftable.loc[0, 'Disch4'] = 0.0
ftable

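Alternatively, you can select the specific column, then select the row in the column using this notation. (The Pandas documentation recommends the **loc** form because chained assignment like this can act on a temporary copy, so prefer **loc** when in doubt.)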

In [None]:
ftable.Disch4[0] = 0.0

Now save back to the HDF file. Check this using HDFView.

In [None]:
ftable.to_hdf(hdfname, '/FTABLES/FT001', data_columns=True, format='table')

### Add a row to a DataFrame

Adding a column is easy (almost automatic) as shown above. However, adding a row to a Pandas DataFrame is a bit harder.  It is often convenient to use the shape attribute to get the current number of rows (and columns), then use the **.loc** method to append the new row to the table at the next available index.
 
The following array is to be appended after index 13 in the table above. It should have the same number of entries as the table has columns.

In [None]:
newrow = [30.0, 15.2, 170.0, 16.3, 9.4, 600.0, 45]
newrow

Determine the number of rows currently in the ftable. The Pandas/numpy shape property gives the dimensions of a table or series.

In [None]:
nrows, ncols = ftable.shape
nrows

The **loc** function extends an array when the specified index is not already in the array.

**NOTE:** Python (like C, C++, Java) starts counting at zero. The existing rows are indexed from 0 to 13. Hence the next row to add needs index = 14.  This is always the value of nrows!

In [None]:
ftable.loc[nrows] = newrow
ftable

The **.loc** function is great for adding a single or small number of lines. When a larger number of lines must be added, the technique of creating a new DataFrame with the new rows and then concatenating the original and new DataFrames will give better performance. 

Pandas also provides merge, join, and concatenate methods for DataFrames. These methods are similar to database and spreadsheet functionality.
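The concatenation approach for adding many rows at once might look like the following sketch. The tables are made up for the example; `pd.concat` with `ignore_index=True` renumbers the rows so the result has a continuous index.

```python
import pandas as pd

# Existing table and a second DataFrame holding all the new rows.
table = pd.DataFrame({'Depth': [0.0, 1.0], 'Volume': [0.0, 10.0]})
new_rows = pd.DataFrame({'Depth': [2.0, 3.0, 4.0], 'Volume': [25.0, 45.0, 70.0]})

# One concatenation instead of many single-row .loc assignments.
combined = pd.concat([table, new_rows], ignore_index=True)
print(combined)
```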

Now save the modified FTable back to the HDF5 file:

In [None]:
ftable.to_hdf(hdfname, '/FTABLES/FT001', data_columns=True, format='table')

Now close and reopen HDFView to see the result.
You can also reread the FTABLE directly from the HDF5 file to see the changes:

In [None]:
pd.read_hdf(hdfname, '/FTABLES/FT001')

You may also add a new  row one element at a time.

In [None]:
nrows, ncols = ftable.shape
ftable.loc[nrows, 'Volume'] = 300.0
ftable

Notice that the other DataFrame cells in the new row are filled with NaN, Not a Number, by default.
It is expected that real values be substituted for the NaNs at some point.

NaN (and INF and -INF) are actual IEEE floating point standard numbers.

In modern computer languages, arithmetic operations involving a NaN produce a NaN. HSPF typically used a special value (like -1.0e30) as a placeholder for a missing value or a currently undefined value. Unfortunately, such a number can be used in calculations leaving no trace except incorrect results.  NaN numbers are safer.
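The contrast between the two placeholders can be seen in a small sketch with made-up values. Note that some Pandas reductions, such as `sum()`, skip NaN by default, so the check below uses element-wise arithmetic.

```python
import numpy as np
import pandas as pd

# The same data with the missing value marked two different ways.
missing_as_nan = pd.Series([1.0, np.nan, 3.0])
missing_as_flag = pd.Series([1.0, -1.0e30, 3.0])

doubled_nan = 2.0 * missing_as_nan     # the gap stays visible as NaN
doubled_flag = 2.0 * missing_as_flag   # the placeholder becomes an ordinary wrong number

print(doubled_nan)
print(doubled_flag)
```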

**Section Summary**

  + Demonstrated how to see the internal HDF5 structure
  + Demonstrated how to see the file structure (keys)
  + Discussed the structure of the Python $\textbf{HSP}^\textbf{2}\ $ HDF5 file.
  + Demonstrated how to read table data from HDF5
  + Demonstrated how to modify existing table data column
  + Demonstrated how to add a new column to the data
  + Demonstrated how to modify a single element in the table
  + Demonstrated how to add a new row to the data
  + Demonstrated how to save a new or modified table back to HDF5


## Section 2: Annotating the HDF5 File (also called Attributes in HDF5)<a id='section2'></a>

One of the features of HDF5 is the ability to add annotation (attributes) to any object in the HDF5 files.
These attributes are saved and accessed as key, value pairs.  Keys are generally short strings. There is no fixed limit to the number of key, value pairs. However, the total size in bytes for the annotations (attributes) for a single object should be limited to 64k.

These attributes can be read and used by code. For example, if an attribute to a time series is its measurement unit (like ft or  km), then the code can read this and perform any necessary units conversion before using it.

The downside of these attributes is that they are lost when the object they are attached to is deleted (perhaps before writing an updated version of the object.) They are not lost by appending to the existing data.  The solution is to read and save the attributes first to reattach back to the rewritten object.
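The save-and-reattach approach described above could be sketched with h5py as follows. This example works on a throwaway file with a made-up dataset named `TS_demo`, not on the tutorial data.

```python
import os
import tempfile

import h5py
import numpy as np

# Throwaway file and dataset; names and values are made up for the example.
path = os.path.join(tempfile.mkdtemp(), 'attrs_demo.h5')

with h5py.File(path, 'w') as hdf:
    ds = hdf.create_dataset('/TIMESERIES/TS_demo', data=np.arange(5.0))
    ds.attrs['units'] = 'in/hr'
    ds.attrs['source'] = 'example gauge'

with h5py.File(path, 'a') as hdf:
    saved = dict(hdf['/TIMESERIES/TS_demo'].attrs)   # 1. copy the attributes first
    del hdf['/TIMESERIES/TS_demo']                   # 2. deleting the dataset drops them
    ds = hdf.create_dataset('/TIMESERIES/TS_demo', data=np.arange(5.0) * 2.54)
    lost = len(ds.attrs) == 0                        # the rewritten dataset starts bare
    for key, value in saved.items():                 # 3. reattach the saved attributes
        ds.attrs[key] = value

with h5py.File(path, 'r') as hdf:
    restored = dict(hdf['/TIMESERIES/TS_demo'].attrs)

print(lost, restored)
```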

Many users prefer to create fields in their data to keep this type of attribute information rather than use the attribute metadata. Users may create a table to show all the provenance information (original data source, all operations on the data including the user, tools used, etc.) and data quality assessments. These tables are stored in the HDF5 files with their associated data.  This seems to be the preference of the Pandas HDF5 community - hence the weak support in Pandas for annotation.

######  Pandas uses PyTables to read/write to HDF5 files. For data annotations, the h5py library is much more powerful to use.

In [None]:
import h5py

### Time Series Attributes

It is a good idea to annotate each time series data set with information about the source, data quality,
data processing applied to the data, units, and the like. Annotations are easily viewable with the HDFView program.
They can also be easily read via Pandas or h5py. 

The tool used to import WDM data to HDF5 automatically annotates each time series with the data found in the WDM file.
The type data, ttype, actually comes from the UCI EXT_SOURCES data rather than the WDM file. The units aren't available except as English or Metric (poor!)
in either the WDM or UCI files, so it is set to '??' during the data import. An example of the units ambiguity is that water volume is expressed in both ft$^3$ and acre-ft in different calculations, and both are English units!

Read some attributes of a time series, TS39:

In [None]:
with h5py.File(hdfname, 'a') as hdf:
    ts = hdf['/TIMESERIES/TS39']
    print(ts.attrs['start_date'])   # ISO format by default
    print(ts.attrs['wdm_units'])
    print(ts.attrs['agg_method'])
    print(ts.attrs['units'])

You can easily discover all the attributes attached to an object. (Some are automatically set by Pandas.)

In [None]:
with h5py.File(hdfname, 'a') as hdf:
    ts = hdf['/TIMESERIES/TS39']
    dd = list(ts.attrs.items())    # copy the pairs while the file is still open
    
for key, value in dd:
    print('{} {}'.format(key, value))

**Note:** the Python statement above:

```
with h5py.File(hdfname, 'a') as hdf:
```

The with statement is a common Python idiom to open a file. It ensures the file is closed when the code in the following block has been executed or whenever the code raises an exception out of the enclosed block. The name following the **as** is used as a file "handle" to access elements of the file.

We can set the units ('in/hr') in a similar manner.

In [None]:
with h5py.File(hdfname, 'a') as hdf:
    ts = hdf['/TIMESERIES/TS39']   # first access the object (here the time series)
    ts.attrs['units'] = 'in/hr'    # then define the key, value pairs

This annotation can be seen using HDFView by looking at the '/TIMESERIES/TS39' dataset. The annotations are shown in the Metadata (bottom panel in HDFView). 

Attribute data can be used by the code. For example, consider reading the 'units' attribute for TS39 and then determining if a units conversion is necessary and performing it in the code below.

In [None]:
with h5py.File(hdfname, 'a') as hdf:
    ts = hdf['/TIMESERIES/TS39']    # access the time series dataset
    units = ts.attrs['units']       # read its 'units' attribute
    
ts = pd.read_hdf(hdfname, '/TIMESERIES/TS39')
if units == 'in/hr':
    ts = ts * 2.54  # converts to cm/hr
    print('did in/hr conversion to cm/hr')
elif units == 'ft/hr':
    ts = ts * 12.0 * 2.54
    print('did ft/hr conversion to cm/hr')
elif units == 'cm/hr':
    print('no conversion needed')
else:
    print('No units found, no automatic conversion possible')

**Attributes for all data sets work the same way.
You can attach attributes to Tables, for example.**

Exactly the same technique will work for table data. Try setting an attribute on the FT001 FTable used above. An empty cell is provided below.

### Directory (Group) Attributes

Directories can also have attributes attached.

In the following cell "/" is the name of the top level group. It is a good place to put project level information (metadata) about the HDF5 file.

In [None]:
with h5py.File(hdfname, 'a') as hdf:
    ts = hdf['/']                      # top level group!
    ts.attrs['Simulation Name'] = 'Test 10'
    ts.attrs['Owner'] = 'Me'
    ts.attrs['Notes'] = """ This HDF5 file contains the data necessary to run the HSPF Test 10.
     The HSPF source distribution provides the answer files to check the results of the simulation.
     For the test, the SAVE tables were marked as all ones (True values) to force saving all 
     computed time series, but this is not a good practice for large watersheds."""

Use HDFView to examine the metadata. Click on the "tutorial.h5" label at the top of the file tree (left most panel.)

Remember to close and reopen the HDF5 file in HDFView to see the update.

Read this new annotation of a group using h5py:

In [None]:
with h5py.File(hdfname, 'a') as hdf:
    ts = hdf['/']                      # top level group!
    print(ts.attrs['Notes'])

Attributes for all groups work the same way.
You can try setting an attribute on the **/FTABLES** group.
An empty cell is provided below.

### Annotating a Table

HDF5 annotations apply to the entire group or dataset.  You may set annotations on a DataFrame table in exactly the same way as setting them on a time series or group as shown above.

If you have a table and need to annotate it row by row, HDF5 annotations don't work well since they attach to the entire DataFrame, not individual elements.

Instead, create a new column in the table to hold the annotations and then fill in the annotations as needed.

In [None]:
links = pd.read_hdf(hdfname, '/CONTROL/LINKS')
links

In [None]:
links['MyNotes'] = ''  # sets up a new column with an empty string default

links.loc[0, 'MyNotes'] = 'This segment will be reduced to 3000 Acres by the end of the simulation'
links.loc[1, 'MyNotes'] = 'This segment will grow during the simulation at the expense of P001'
links.loc[6, 'MyNotes'] = 'This is the end node of this watershed model'

links

In [None]:
links.to_hdf(hdfname, '/CONTROL/LINKS2', data_columns=True, format='table')

Check that this was written to the HDF5 file by reading it (or viewing in HDFView).

In [None]:
pd.read_hdf(hdfname, '/CONTROL/LINKS2')

## Section 3: User Defined Filters<a id='section3'></a>

You may select subsets of the data by filters for viewing or modification of the HDF5 data using several techniques.
+ The first technique is to use the user defined land use associated with each segment of a PERLND, IMPLND or RCHRES in $\textbf{HSP}^\textbf{2}\ $. This technique will be demonstrated using two of the convenience functions provided with $\textbf{HSP}^\textbf{2}\ $. However, the convenience functions are outside the core of  $\textbf{HSP}^\textbf{2}\ $ and users can create their own convenience functions using these as examples.  Other convenience functions will be described in a later section.
+ The second technique is to create a boolean expression to select subsets of data based on the data itself.
+ The last technique is to add one or more columns to a data table which can then be used to create custom subsets of the table.

This section shows these approaches.
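As a preview of the second technique, a boolean expression on a column produces a True/False mask that can select rows for viewing or modification. The sketch below uses a made-up table (the segment names and the values of `LEN` and `DB50` are illustrative, not read from the tutorial file).

```python
import pandas as pd

# Hypothetical parameter table indexed by segment ID; not the real RCHRES data.
params = pd.DataFrame(
    {'LEN': [0.5, 0.25, 0.25, 2.0, 3.0],
     'DB50': [0.01, 0.01, 0.02, 0.01, 0.01]},
    index=['R001', 'R002', 'R003', 'R004', 'R005'])

# Boolean filter: one True/False per row, based on the data itself.
long_reaches = params['LEN'] > 1.0

print(params[long_reaches])                 # view only the matching rows
params.loc[long_reaches, 'DB50'] = 0.05     # modify only the matching rows
print(params)
```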

### LAND USE FILTERING
Each PERLND, IMPLND and RCHRES GENERAL_INFO table has a column for the user to specify the land use for each segment. The land use may be any string (or number represented as a string.)  For example, PERLND land use options might be information like:
+ OldFOREST, AlpineFOREST, AGRICULTURE, GRASSLAND
+ 1003, 2010, 9900
+ 10.1.5, 10.3.1, 12.25.20  (a hierarchical naming scheme)
+ HUC codes

In test10, only the RCHRES operation has multiple segments. So this example will use the RCHRES "land use".

First set up a "land use" for RCHRES and save it to the HDF5 file:

In [None]:
df = pd.read_hdf(hdfname, '/RCHRES/GENERAL_INFO')
df['LANDUSE'] = ['Pond', 'Channel', 'Channel', 'Creek', 'Creek']  # bracket syntax works whether or not the column already exists
df.to_hdf(hdfname, '/RCHRES/GENERAL_INFO', data_columns=True, format='table')
df

This example will use the $\textbf{HSP}^\textbf{2}$ convenience functions **fetch()** and **replace()**.

The information about any function in Python can be found easily. After you read this information, close the box by clicking on the X in the upper right corner.

In [None]:
HSP2tools.fetch?

A number of examples will be provided below to demonstrate the fetch() function. Finally, the use of fetch() to filter by land use will be demonstrated.

**Note** fetch() returns two things. The first will be used to save the modified data back to the HDF5 file using the replace() function. You should ignore this until needed. The second returned value is the DataFrame with the requested data.

In [None]:
# fetch all data (Flags, Initializations, Parameters, Monthly) for operation RCHRES, activity HYDR
replaceinfo, df = HSP2tools.fetch(hdfname, 'RCHRES', 'HYDR')
df

In [None]:
# shows how one of the subtype data (Flags, Initializations, Parameters, and Monthly) can be selected
replaceinfo, df = HSP2tools.fetch(hdfname, 'RCHRES', 'HYDR', subtype='PARAMETERS')
df

In [None]:
# shows how several of the subtype data (Flags, Initializations, Parameters, and Monthly) can be selected
replaceinfo, df = HSP2tools.fetch(hdfname, 'RCHRES', 'HYDR', subtype=['PARAMETERS', 'INITIALIZATIONS'])
df

In [None]:
# shows how one of the land use filters can be selected
replaceinfo, df = HSP2tools.fetch(hdfname, 'RCHRES', 'HYDR', landuse='Creek')
df

In [None]:
# shows how several of the land use filters can be selected
replaceinfo, df = HSP2tools.fetch(hdfname, 'RCHRES', 'HYDR', landuse=['Creek', 'Pond'])
df

Any combination of the optional arguments can be used in fetch. Of course the operation argument could be IMPLND or PERLND as needed.

Now show how this can be used. First fetch some data.

In [None]:
replaceinfo, df = HSP2tools.fetch(hdfname, 'RCHRES', 'HYDR', subtype='PARAMETERS', landuse=['Creek', 'Pond'])
df

Modify the data in the DB50 column for the selected land uses.

In [None]:
df.DB50 = df.DB50  * 2.0
df

Now use the replace() convenience function to put this back into the HDF5 file. This requires the replaceinfo returned from the fetch() call.

In [None]:
HSP2tools.replace(replaceinfo, df)

Now check what is really in the HDF5 file to confirm the desired result.

In [None]:
df = pd.read_hdf(hdfname, '/RCHRES/HYDR/PARAMETERS')
df

Compare this with the original DataFrame at the start of this section to see how only the segments with the specified land use were modified.

### Boolean expression filtering

In [None]:
params = pd.read_hdf(hdfname, '/RCHRES/HYDR/PARAMETERS')
params 

Create a Boolean array that selects RCHRES with LEN <= 1.0

Logical operations == (equality test), != (inequality test), & (and), and | (or) are allowed. Each binary logic operation must be enclosed in parentheses.

(Actually, there are many more operations available - but not for this tutorial!)

In [None]:
bool_condition = (params['LEN'] <= 1.0)
bool_condition

#### Show this subset of the original table fitting the Boolean selection

In [None]:
params[bool_condition]

The **~** symbol performs element-wise negation, so the complement set is easy to see.

In [None]:
params[~bool_condition]
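The logical operators listed earlier in this subsection can be combined into larger conditions. The following is a minimal sketch using a small in-memory table as a stand-in for the RCHRES HYDR PARAMETERS table; the values are made up for illustration and are not the test10 data.

```python
import pandas as pd

# Hypothetical stand-in for the RCHRES HYDR PARAMETERS table (made-up values).
params = pd.DataFrame(
    {'LEN': [0.5, 2.0, 3.0, 0.25, 2.0],
     'KS':  [0.5, 0.5, 0.5, 0.5, 0.5]},
    index=['R001', 'R002', 'R003', 'R004', 'R005'])

longest = params['LEN'].max()

# Each comparison sits in its own parentheses because & and | bind more
# tightly than <=, ==, and != in Python.
short_or_longest = (params['LEN'] <= 1.0) | (params['LEN'] == longest)
mid_length = (params['LEN'] > 1.0) & (params['LEN'] != longest)

selected = params[short_or_longest]   # rows matching either condition
others = params[~short_or_longest]    # the complement set
```

In this sketch, `others` and `params[mid_length]` select the same rows, since the second condition is the logical negation of the first.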

Do some computation on the subset in the original array.  The bool_condition array limits the
changes to the rows for which the bool_condition is true.

The .loc operator treats the boolean as an index.

**NOTE** Saving the data back to the original table needs the .loc operator.

In [None]:
params.loc[bool_condition, 'KS'] = params.KS[bool_condition] * 1.10 + 0.02   
params 
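To illustrate why the .loc operator matters here, the sketch below contrasts an edit made on a Boolean-selected copy with an edit made through .loc. The table is a small made-up stand-in, not the tutorial's HDF5 data.

```python
import pandas as pd

# Hypothetical stand-in table with made-up values.
params = pd.DataFrame({'LEN': [0.5, 2.0, 0.25], 'KS': [0.5, 0.5, 0.5]},
                      index=['R001', 'R002', 'R003'])
bool_condition = params['LEN'] <= 1.0

# Boolean selection produces a separate table, so edits stay in that copy.
subset = params[bool_condition].copy()
subset['KS'] = subset['KS'] * 1.10 + 0.02
unchanged = params['KS'].tolist()   # the original table is untouched

# .loc with the same Boolean array assigns into the original table.
params.loc[bool_condition, 'KS'] = params.loc[bool_condition, 'KS'] * 1.10 + 0.02
updated = params['KS'].tolist()
```

Only the rows where the condition is true change in `updated`; the row for R002 keeps its original value.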

Finally, the modified data can be saved back to the HDF5 file.

In [None]:
params.to_hdf(hdfname, '/RCHRES/HYDR/PARAMETERS', data_columns=True, format='table')

Now view this table using HDFView or COMPASS

Remember to close and reopen the file!

#### HDF5 supports query methods

Note: The query is written as a string that defines the selection. It is internally parsed and executed efficiently within the HDF5 code. This allows reading and modifying tables that are too large to fit into the memory of your computer by working with manageable subsets of the data.

Read the RCHRES HYDR PARAMETER data again with a query based on the LEN column:

In [None]:
params = pd.read_hdf(hdfname, '/RCHRES/HYDR/PARAMETERS', where=('LEN > 1.0'))
params

More complex queries are possible - but out of scope for this tutorial.

Just as before, this filtered DataFrame can be modified and saved back to the HDF5 file.

### Create a USER FILTER in a table

Now create a user filter for this table which groups the rows into two sets:
LEN <= 1.0 in one set and LEN > 1.0 in the other.
Call this filter REACH_LENGTH.

This technique uses the numpy "where" function. Traditional loops can be used instead to create the new column.

In [None]:
params  = pd.read_hdf(hdfname, '/RCHRES/HYDR/PARAMETERS')
params['REACH_LENGTH'] = np.where(params['LEN'] <= 1.0, 'SHORT Segment', 'LONG Segment')

params 
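For readers who prefer the traditional loop mentioned above, the following sketch builds the same kind of column both ways on a small made-up table and shows that the results agree.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in table with made-up reach lengths.
params = pd.DataFrame({'LEN': [0.5, 2.0, 3.0, 0.25, 2.0]},
                      index=['R001', 'R002', 'R003', 'R004', 'R005'])

# Vectorized version: one call labels every row.
vectorized = np.where(params['LEN'] <= 1.0, 'SHORT Segment', 'LONG Segment')

# Loop version: label the rows one at a time.
labels = []
for length in params['LEN']:
    if length <= 1.0:
        labels.append('SHORT Segment')
    else:
        labels.append('LONG Segment')

params['REACH_LENGTH'] = labels
```

The vectorized form is shorter and usually faster on large tables, but the loop makes it easy to add more categories with extra `elif` branches.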

Save the DataFrame back to the HDF5 file for future use in selecting data to view, modify, or report.

In [None]:
params.to_hdf(hdfname, '/RCHRES/HYDR/PARAMETERS', data_columns=True, format='table')

So now we have a new filter that we can use to select data.

Use HDFView to see the new filter. Remember to close and reopen the file.

Now use the new filter to read the HDF5 file and show a subset of the data. For fun, use the query style data access.

In [None]:
pd.read_hdf(hdfname, '/RCHRES/HYDR/PARAMETERS', where='REACH_LENGTH == "SHORT Segment"')

Now this filtered table can be used to modify and save the data to the HDF5 file as before. Unlike the land use filter which can be used for any data in the operation, the REACH_LENGTH filter is local to the HYDR PARAMETER table.

Any number of USER DEFINED filters can be established for any table. More complex expressions can use multiple filters at the same time to select the data to be viewed or modified.

Users can modify the code for fetch() and replace() to create their own custom convenience filters since this code is only Pandas and HDF5 - not HSP2 specific. (Except knowing the HDF5 structure to find the data.)
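As a starting point for such custom filters, here is a minimal sketch of a land use selection and write-back done with plain Pandas on in-memory tables. The helper names `select_by_landuse` and `write_back` are hypothetical, the values are made up, and this is not the implementation of fetch() or replace(); reading from and saving to the HDF5 file is left out.

```python
import pandas as pd

# Hypothetical stand-ins for GENERAL_INFO and a parameter table (made-up values).
general_info = pd.DataFrame(
    {'LANDUSE': ['Pond', 'Channel', 'Channel', 'Creek', 'Creek']},
    index=['R001', 'R002', 'R003', 'R004', 'R005'])
params = pd.DataFrame({'DB50': [0.01] * 5}, index=general_info.index)


def select_by_landuse(table, general_info, landuse):
    """Return a copy of the rows of table whose segment has a requested land use."""
    wanted = [landuse] if isinstance(landuse, str) else list(landuse)
    segments = general_info.index[general_info['LANDUSE'].isin(wanted)]
    return table.loc[segments].copy()


def write_back(table, subset):
    """Copy the edited rows of subset into the matching rows of table."""
    table.loc[subset.index, subset.columns] = subset
    return table


subset = select_by_landuse(params, general_info, ['Creek', 'Pond'])
subset['DB50'] = subset['DB50'] * 2.0
params = write_back(params, subset)
```

Because the selection relies only on the shared segment index, the same helper could be pointed at any table of the operation, which mirrors how the land use filter applies across an operation's data.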

**Section Summary**
 + Demonstrated the Land Use filter capability using the fetch() and replace() convenience functions.
     + View data for a specific Land Use or a set of Land Use values.
     + Modify the data selected by the Land Use
     + Save the modified data back to the HDF5 file using replace()
 + Demonstrated creating a boolean filter based on the data in a table column to view, modify, and save the selected data.
 + Demonstrated HDF5 query functions to read only the data matching the query selection.
 + Demonstrated creating User Defined filters for any data table
     + Demonstrated using the User Defined filter to view, modify and save the selected data back to the HDF5 file.
   


## Section 4: Read, Modify, and Write time series to HDF5<a id='section4'></a>

The previous sections showed read, modify and write operations on tables (Pandas DataFrames) in HDF5. 
Another Pandas 'type' is a Series. If the index values are times, then you have a time series.

Pandas provides many options to clean up time series problems like missing data, 
aggregation/disaggregation to specified time intervals, etc. These options are discussed in other tutorials.

This section shows the basic operations to create a time series, read, modify and write time series data.

In [None]:
# read is the same as for tables, except the returned value is a Series
prec = pd.read_hdf(hdfname, '/TIMESERIES/TS39') 
prec

The following shows how to select a part of the time series based on its numerical index (0 to n-1) where n is the length of the array. (The selected index numbers were chosen arbitrarily.)

The slice syntax is [start:stop:step] with defaults for missing values.

In [None]:
prec[1220:1230]

Instead of the numerical index used above, we can also select a *slice* using the dates.

In [None]:
prec['1976-02-20 20:00':'1976-02-21 05:00']
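The two slicing styles differ in how they treat the end of the range. The sketch below uses a small synthetic hourly series (not the TS39 data) and `.iloc` to make the positional intent explicit; positional slices exclude the stop position, while date slices include both end labels.

```python
import pandas as pd

# Synthetic hourly series for illustration only: 48 values starting at midnight.
index = pd.date_range('1976-02-20 00:00', periods=48, freq=pd.Timedelta(hours=1))
prec = pd.Series(range(48), index=index, dtype=float)

# Positional slice: positions 20 through 29 (the stop position 30 is excluded).
by_position = prec.iloc[20:30]

# A third slice value takes every second element of the same range.
every_other = prec.iloc[20:30:2]

# Date slice: both end labels are included in the result.
by_label = prec['1976-02-20 20:00':'1976-02-21 05:00']
```

Here `by_position` and `by_label` hold the same ten values, because position 29 corresponds to 05:00 on the second day.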

An example of modifying a time series was shown in Section 2. For completeness, a calculation example is repeated here.

In [None]:
prec = prec * 2.54   # full vector calculation. 2.54 is multiplied times each element of the time series.
prec['1976-02-20 20:00':'1976-02-21 05:00']  # just show this much

Save to a different name to show creation of a new time series in the HDF5 file.

In [None]:
prec.to_hdf(hdfname, '/TIMESERIES/Example')
prec

View the new time series using HDFView. 

Remember to close and reopen the file.

**Section Summary**

 + Demonstrated reading a time series from the HDF5 file
 + Demonstrated a computation on the entire time series
 + Demonstrated writing the time series to the HDF5 with a different name (Create new time series.)
 + Demonstrated showing a subset of the time series (called slicing) 

## Section 5: Working with Calculated Datasets from Multiple Segments<a id='section5'></a>

Frequently, one needs to analyze calculated time series across multiple segments. This section will demonstrate how this may be done.

(Since test10 doesn't have multiple PERLND or IMPLND segments, we will again use the 5 REACH segments for this example.)

In order to perform the following steps, HSP2 must be run on hydrology to populate the results.  Running HSP2 will be discussed in depth in the following tutorials.

In [None]:
HSP2.run(hdfname, saveall=True)

### Example: RCHRES ROVOL

#### 1. Create the HDF5 paths (keys) to the desired data 

If you already know the RCHRES segments you care about, you can make a list explicitly:

In [None]:
keys = ['/RESULTS/RCHRES_R001/HYDR', '/RESULTS/RCHRES_R005/HYDR']

It is also easy to get all the keys programmatically using any of the filter techniques discussed in Section 3. For example, let's use the RCHRES land use filter to select the RCHRES segments that are Creeks. Pick any activity since the only thing we care about is the land use.

In [None]:
# fetch all data (Flags, Initializations, Parameters, Monthly) for operation RCHRES, activity HYDR
replaceinfo, df = HSP2tools.fetch(hdfname, 'RCHRES', 'HYDR', landuse='Creek')
df

The index array contains the segment ids that we need.

In [None]:
temp = list(df.index)
temp

Now create the keys from the temp list.  This Python expression is called a list comprehension. However, traditional loops work too.

In [None]:
keys = ['/RESULTS/RCHRES_' + k + '/HYDR' for k in temp]
keys
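For comparison, the sketch below builds the same list with a list comprehension and with a traditional loop. The segment ids in `temp` are placeholders chosen for illustration.

```python
temp = ['R004', 'R005']   # placeholder segment ids for illustration

# List comprehension: build every key in one expression.
keys = ['/RESULTS/RCHRES_' + k + '/HYDR' for k in temp]

# Traditional loop: append one key at a time.
keys_loop = []
for k in temp:
    keys_loop.append('/RESULTS/RCHRES_' + k + '/HYDR')
```

Both forms produce identical lists, so the choice is a matter of readability.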

If you want to build a DataFrame for ROVOL for all RCHRES segments, follow the above without the land use argument. That will return a DataFrame with all RCHRES indices which can then be used to extract the keys.

#### 2.  Build a DataFrame with the ROVOL data from the keys created above.

The RCHRES rid can be trivially extracted from a key for use as a column name:

In [None]:
keys[0][16:20]

In [None]:
keys[0]
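The fixed slice `[16:20]` assumes a four-character segment id after a prefix of exactly 16 characters. When ids vary in length, splitting the key on its separators is a possible alternative. The helper name `segment_id` below is hypothetical, and the sketch assumes keys shaped like `/RESULTS/<OPERATION>_<id>/<ACTIVITY>`.

```python
def segment_id(key):
    """Return the segment id from a key like /RESULTS/<OPERATION>_<id>/<ACTIVITY>."""
    group = key.split('/')[2]         # e.g. 'RCHRES_R001'
    return group.split('_', 1)[1]     # split only at the first underscore


rid = segment_id('/RESULTS/RCHRES_R001/HYDR')
```

Splitting only at the first underscore keeps ids that themselves contain an underscore intact.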

Now get the data and put into the DataFrame

In [None]:
ts = pd.DataFrame()
for k in keys:
    colname = k[16:20]
    df = pd.read_hdf(hdfname, k)
    ts[colname] = df.ROVOL
    
ts
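The same table can also be assembled in one step with `pd.concat`. The sketch below substitutes a dictionary of small made-up result tables for the `pd.read_hdf` calls, so the keys, column names other than ROVOL, and values are illustrative only.

```python
import pandas as pd

# Made-up stand-ins for the per-segment result tables read from the HDF5 file.
index = pd.date_range('1976-01-01', periods=3, freq=pd.Timedelta(hours=1))
results = {
    '/RESULTS/RCHRES_R004/HYDR': pd.DataFrame(
        {'ROVOL': [1.0, 2.0, 3.0], 'VOL': [10.0, 11.0, 12.0]}, index=index),
    '/RESULTS/RCHRES_R005/HYDR': pd.DataFrame(
        {'ROVOL': [4.0, 5.0, 6.0], 'VOL': [20.0, 21.0, 22.0]}, index=index),
}

# Map each segment id to its ROVOL series, then join the series as columns.
columns = {key[16:20]: frame['ROVOL'] for key, frame in results.items()}
ts = pd.concat(columns, axis=1)
```

Each dictionary key becomes a column name, and the series are aligned on their shared time index.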

The techniques of Tutorial 5 can be used to plot these results, make reports, change the time interval, and perform other analyses.

**Section Summary**

 + Demonstrated how to create a list of keys to the subset of segments desired.
 + Demonstrated how to construct a DataFrame containing each time series as a column. This makes subsequent analysis easy.

## Section 6: Viewing Simulation Results<a id='section7'></a>


To assist with working with HDF5 files, some Jupyter Notebooks are included with the tutorial Notebooks:

 + ViewPerlnd
 + ViewImplnd
 + ViewRchres
 
Open one of these Notebooks and specify the HDF5 filename and the segment name.
Until you understand these Notebooks, executing them a cell at a time will help.

Later, you can run the entire Notebook in one operation (from the top menu select **Cell**, then select **Run All**).
After a brief pause, the monthly and annual report summaries will be printed and all available data plotted.
 
**Note:** In real simulation runs, the user will usually specify saving only a small subset of the computed time series to the HDF5 file. These Jupyter Notebooks will display the available information.
 
**Note:**  In RCHRES, some segments may or may not have multiple outlets. Thus the saved data may also depend on the specific segments.

## Section 7: More $\textbf{HSP}^\textbf{2}$  Convenience Functions<a id='section8'></a>

Besides the **fetch** and **replace** convenience functions described in Section 3 above, there are two more convenience functions, used to clone new segments from existing segments and to remove a segment entirely from the HDF5 file. These convenience functions are
  + clone_segment(hdfname, operation, clonefrom, cloneto)
  + remove_segment(hdfname, operation, segment)

### Cloning or Removing an Entire Segment<a id='section6'></a>

It is useful to be able to easily add a new segment by copying an existing segment. It will still be necessary to review the EXT_SOURCES, SCHEMATIC, NETWORK and MASS_LINK tables to ensure the appropriate connectivity and data sources for the new segment.

View the tutorial.h5 file which is based on the HSPF test10 data. It has 1 PERLND, 1 IMPLND, and 5 RCHRES segments. Let's add a new PERLND.

The **clone_segment** command has the following arguments:

```
clone_segment(hdfname, operation, clonefrom, cloneto)
```

 + hdfname - name of file in the same directory or full path name of file
 + operation - one of 'PERLND', 'IMPLND', or 'RCHRES'
 + clonefrom - name of segment used as model to duplicate
 + cloneto - name of NEW segment

Use HDFView to check that only one PERLND segment (P001) exists in any PERLND or global table.

#### Clone a New Segment

In [None]:
clonefrom = 'P001'
cloneto   = 'P_AppleOrchard'
operation = 'PERLND'

In [None]:
HSP2tools.clone_segment(hdfname, operation, clonefrom, cloneto)

Now use HDFView to examine the **tutorial.h5** file to see that P_AppleOrchard was added. 

#### Remove a Segment

The **remove_segment** command has the following arguments.

```
remove_segment(hdfname, operation, segment)
```

 + hdfname - name of file in the same directory or full path name of file
 + operation - one of 'PERLND', 'IMPLND', or 'RCHRES'
 + segment - name of segment to be removed

In [None]:
HSP2tools.remove_segment(hdfname, 'PERLND', 'P_AppleOrchard')

Use HDFView to examine **tutorial.h5** to see that the PERLND P_AppleOrchard segment was removed.

**Section Summary**

Demonstrated the use of HSP2 convenience functions:
  + demonstrated how to add a new segment using information from an existing segment (clone)
  + demonstrated how to remove a segment

## Final Remarks<a id='GIS'></a>
In the original tutorial, this section demonstrated how Geographic Information System (GIS) information (like shapefiles) could be stored in the HDF5 file and used to draw interactive street maps overlaid with the shapefiles.

Unfortunately, this demonstration required the user to install a fair number of Python GIS libraries (like geopandas and mplleaflet). This made a simple installation of HSP2 and its Tutorials too complicated. (And all these libraries were only used in this one section!) So this original section was turned into the discussion below. Tutorial 6 does demonstrate using GIS information to analyze and check the watershed network. This requires putting fake GIS data into the test10 HDF5 file.


### Using Other Data in HDF5 Files (like GIS Data)

HDF5 can be used to store any data - not just numerical time series and tables. The documentation (such as PDF files) for a simulation can be conveniently stored in the simulation's HDF5 file. The Jupyter Notebooks used to prepare the data and run the simulations can also be stored in the project's HDF5 file.
This provides a complete archive of all the information needed to understand the simulation requirements, data processing, and the simulation results.

GIS data can be easily stored in an HDF5 file. The HDF5 file can allow tight integration between HSP2 and tools like QGIS by being a single "database." For example, the shapefiles for each segment could be stored in a column in the operation's GENERAL_INFO table. Such information can allow enhanced watershed checking in HSP2. For example, the GIS system can calculate the area for each segment and store it in the GENERAL_INFO table. The AFACTR multipliers in the SCHEMATIC and NETWORK tables can be checked against these values.

Hopefully, many of these libraries will become a core part of the scientific Python stack and not need separate installation over time.

The original section and its code are shown in the text cell below for users who would like to explore GIS. Although installing all the libraries is tedious, the code below shows that using GIS functionality is easy.

####  Simple GIS (Geographic Information System) Example

$\text{HSP}^2$ is designed for future expansion. For example, it is easy to incorporate GIS data into the HDF5 file.

##### Read GIS data

The following reads GIS information (i.e. shape files and projection) for Simi Valley (and surrounding area) from the HDF5 file. 

##### Read the Shape File information

```
import geopandas as gpd
temp = pd.read_hdf(hdfname, '/GIS') # read the shapefile data from the HDF5 file
shapes = gpd.GeoDataFrame(temp)     # convert Pandas DataFrame to GeoPandas DataFrame
shapes.plot()
```

##### Read the map projection information


```
newcrs = pd.read_hdf(hdfname, '/CRS').T
```


##### Display on a MAP

The Python libraries necessary to show the shapefiles overlaid on a street map are beyond the basic $\textbf{HSP}^\textbf{2}\ $ distribution. If the appropriate libraries were installed, then a line of code like the following would display a *leaflet* interactive street map with the shapefiles over the street map.

```
mplleaflet.display(fig=shapes.plot().figure, crs=newcrs)
```
    
Other interactive maps such as Google Maps can also be used.

It is also possible to use matplotlib, bokeh, fiona and other tools to produce static maps showing the shapefiles over a traditional (non interactive) map. 

A suggested method is to use a full GIS tool like QGIS which easily works with Python. QGIS Python plug-ins can exchange data with the HDF5 files to provide good visualizations of the watershed and even check connectivity of the segments by layer.