<img src="Images/HSP2.png" />
This Jupyter Notebook Copyright 2017 by RESPEC, INC.  All rights reserved.

$\textbf{HSP}^{\textbf{2}}\ \text{and}\ \textbf{HSP2}\ $ Copyright 2017 by RESPEC INC. and released under this [License](LegalInformation/License.txt)

# TUTORIAL 6: HSP$^2$ Watershed Network Tool

## Introduction

This tutorial will demonstrate the use of the watershed network tool.
This tool can be used to

 + check the HSPF schematic and network data to find problems such as disconnected elements
 + (future) check that flux leaving a segment is balanced with flux entering other segments
 + create visual representations of the watershed network
 + create the simulation OP_SEQ
 + determine the smallest amount of recalculation required when simulation parameters are changed (SMART RUN)
 
 **Tutorial Contents**

 + Section 1: [Create a Watershed Network Graph](#section1)
 + Section 2: [Create an Operational Sequence (OP_SEQ)](#section2)
 + Section 3: [Create a Smart Operational Sequence](#section3)
 

### Required Python imports

In [None]:
import os
import site
site.addsitedir(os.path.dirname(os.getcwd()))  # adds the parent directory (the HSP2 software location) to the path on any OS

hdfname = 'TutorialData/tutorial.h5'

import shutil
import numpy as np
from IPython.display import Image      # this displays graphic objects on this notebook
import networkx

import pandas as pd
pd.options.display.max_rows    = 18
pd.options.display.max_columns = 10
pd.options.display.float_format = '{:.2f}'.format  # display 2 digits after the decimal point

from matplotlib import pyplot as plt
%matplotlib inline
 
import HSP2
import HSP2tools

HSP2tools.reset_tutorial()    # make a new copy of the tutorial's data
HSP2tools.versions()          # display version information below

## Section 1: Create a Watershed Network Graph<a id='section1'></a>

The watershed flow network is mathematically a Directed Acyclic Graph (DAG).
This tool creates a DAG from the $\textbf{HSP}^\textbf{2}$ NETWORK, SCHEMATIC, and MASS_LINK tables using a graph algorithm library, networkx.

The networkx library can apply many graph algorithms to the DAG that are useful in watershed modeling for checking the integrity of the watershed model. This checking would be far more powerful if geographic information were available to provide shapefiles, areas, elevations, slopes, and coordinates (such as the coordinates of each segment's shapefile centroid). This information can be stored in the HDF5 file.

### Check for Problems in the watershed's DAG
This routine checks to ensure that the watershed's graph is a proper DAG. That is, it looks for disconnected segments and loops (cycles).

**Note:** Additional types of checking will be added soon.

In [None]:
HSP2tools.check_network(hdfname)
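
The kinds of checks described above can be sketched directly with networkx. This is only an illustration on a small made-up graph, not the implementation of `HSP2tools.check_network`; the node names and the choice of checks are assumptions.

```python
import networkx as nx

# Hypothetical toy watershed: two land segments drain to reach R001,
# which drains to R002. R003 is deliberately left unconnected.
g = nx.DiGraph()
g.add_edges_from([
    ('PERLND P001', 'RCHRES R001'),
    ('IMPLND I001', 'RCHRES R001'),
    ('RCHRES R001', 'RCHRES R002'),
])
g.add_node('RCHRES R003')

is_dag = nx.is_directed_acyclic_graph(g)             # False if any cycle exists
isolated = sorted(nx.isolates(g))                    # nodes with no edges at all
pieces = nx.number_weakly_connected_components(g)    # more than 1 means the network is split

print(is_dag, isolated, pieces)

# Introduce a loop to show how a cycle would be reported.
g.add_edge('RCHRES R002', 'RCHRES R001')
cycles = list(nx.simple_cycles(g))                   # each cycle is a list of node names
print(cycles)
```

A real check would build the graph from the NETWORK, SCHEMATIC, and MASS_LINK tables rather than from a hand-written edge list.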

### Display the watershed's DAG

For this example, some additional information will be put into the HDF5 file to simulate adding GIS information that can be used by this tool. First, review the information in the RCHRES GENERAL_INFO table:

In [None]:
df = pd.read_hdf(hdfname, '/RCHRES/GENERAL_INFO')
df

Now add some (fake) data to the GENERAL_INFO tables for PERLND, IMPLND, and RCHRES.
+ GISarea would be the segment's area computed from GIS shapefiles or other sources.
+ GISx and GISy represent scaled coordinates of the segment (perhaps the coordinates of the segment's centroid).

In [None]:
HSP2tools.graphtutoral_test10(hdfname)

Now check that new data is available in the HDF5 file:

In [None]:
df = pd.read_hdf(hdfname, '/RCHRES/GENERAL_INFO')
df

The following code reads the PERLND, IMPLND, and RCHRES GENERAL_INFO tables. It builds Python dictionaries for the colors and coordinates that it finds:

In [None]:
dcolor = {}
dpositions = {}

operations = ['PERLND', 'IMPLND', 'RCHRES']
for operation in operations:
    for i,r in pd.read_hdf(hdfname, operation + '/GENERAL_INFO').iterrows():
        name = operation + '\n' + i
        dcolor[name]     = r.GIScolor
        dpositions[name] = (r.GISx, r.GISy)

Now build the DAG for the test10 watershed. 

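The optional sep argument sets the separator placed between the operation type and the segment ID in each node name. Using a newline splits the long name across two lines so that it fits better in the graph.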

In [None]:
dg = HSP2tools.graph_fromHDF(hdfname, sep='\n')

This cell builds a list of colors, one per DAG node, by looking up each node's name in the dcolor dictionary above.

In [None]:
colors = [dcolor[x] for x in dg.nodes()]

#### Now view the DAG

In [None]:
plt.figure(figsize=[10,10])
plt.axis('off')

networkx.draw_networkx(dg, pos=dpositions, node_color=colors, node_size=4500, node_shape='s')

The DAG shows the connectivity of the watershed, but more can be done.
When the code created the DAG, it also looked for RCHRES segments that had no descendants, no predecessors, or neither. It marked those nodes with special colors to make this visible in the network graph.

In [None]:
for node in dg.nodes():
    dagcolor = dg.nodes[node]['fillcolor']   # dg.node was removed in networkx 2.4; use dg.nodes
    if dagcolor:
        dcolor[node] = dagcolor
colors = [dcolor[x] for x in dg.nodes()]

In [None]:
plt.figure(figsize=[10,10])
plt.axis('off')

networkx.draw_networkx(dg, pos=dpositions, node_color=colors, node_size=4500, node_shape='s')

The gold square identifies a RCHRES that does not feed into another segment. A red node would be an isolated node (no predecessor and no successor). A dark green square would be a RCHRES with no predecessor flowing into it.
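
The three special cases (no successor, no predecessor, or neither) can be identified from each node's in-degree and out-degree. The sketch below assumes a plain networkx `DiGraph`. The `classify` helper and its label names are hypothetical and are not part of HSP2tools.

```python
import networkx as nx

def classify(graph, node):
    """Label a node by whether anything flows into it or out of it."""
    has_in = graph.in_degree(node) > 0
    has_out = graph.out_degree(node) > 0
    if not has_in and not has_out:
        return 'isolated'     # no predecessor and no successor
    if not has_out:
        return 'outlet'       # nothing downstream
    if not has_in:
        return 'headwater'    # nothing upstream
    return 'interior'

# Hypothetical chain R001 -> R002 -> R003 plus an unconnected R004.
g = nx.DiGraph([('R001', 'R002'), ('R002', 'R003')])
g.add_node('R004')

labels = {n: classify(g, n) for n in g.nodes}
print(labels)
```

A label like this could then be mapped to a color before drawing, in the same way the `dcolor` dictionary is used above.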

If GIS area and elevation data were available, it would be easy to extend the tool to perform more rigorous checking of the watershed network model.

The hardest problem with viewing network graphs is creating a layout that is informative. The use of GIS information allows the network graph to be laid out using GIS coordinates. The network graph can even be viewed on top of maps showing streets and other topographic features.

The networkx library allows an arbitrary amount of information to be attached to each node and connecting edge. The network tool sets properties on the node like optype (PERLND, IMPLND or RCHRES) and segment (R004) to assist this process.

Unfortunately, network graph viewers are operating system dependent and fall outside this tutorial.
In general, network graph viewers look for names specific to the viewer to set properties such as edge labels, node labels, node shape, and node colors. So you can write a small piece of code to set these required node properties and then write out the file in a format that can be read by your viewer (using the networkx write functions).


Assume the network graph viewer you selected uses the node property named *shape* with options like 'square', 'circle', and 'diamond'. You want to change the node shape based on the associated operation.

Then code like the following can set this property based on the node's optype (set by this tool):

```
for node in dg.nodes:
   if dg.nodes[node]['optype'] == 'PERLND':
       dg.nodes[node]['shape'] =  'square'
   elif dg.nodes[node]['optype'] == 'IMPLND':
       dg.nodes[node]['shape'] = 'diamond'
   elif dg.nodes[node]['optype'] == 'RCHRES':
       dg.nodes[node]['shape'] = 'circle'
```

Then use one of the many networkx routines to write the DAG for your viewer. For example, to write the graph in GraphML format, use
```
networkx.write_graphml(dg, "test.graphml")
```

Then your network graph viewer can display your watershed's DAG.
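
As a quick sanity check before opening the file in a viewer, the graph can be written and read back to confirm that the node properties survive. This is a minimal sketch on a made-up two-node graph; the node names and the temporary file location are assumptions.

```python
import os
import tempfile
import networkx as nx

# Hypothetical two-node graph with the viewer properties already set.
g = nx.DiGraph()
g.add_node('R001', optype='RCHRES', shape='circle')
g.add_node('P001', optype='PERLND', shape='square')
g.add_edge('P001', 'R001')

# Write to a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'test.graphml')
nx.write_graphml(g, path)

# Read the file back and inspect what a viewer would see.
g2 = nx.read_graphml(path)
print(g2.nodes['R001'])
```

GraphML stores simple values such as strings and numbers, so node properties holding other Python objects may need to be converted before writing.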

**Section Summary**

 + Demonstrated making a Directed Acyclic Graph representing the flows in a watershed from the HDF5 file
 + Demonstrated checking the watershed model for disconnected elements
 + Demonstrated converting the graph into a variety of formats
     + PDF
     + JPEG
     + SVG
     + PNG

## Section 2: Create an Operational Sequence (OP_SEQ) table<a id='section2'></a>

The DAG can be used to create the OP_SEQ table, which is then saved into the HDF5 file to be used in the simulation. Mathematically, the DAG is sorted with a topological sort algorithm.
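
To illustrate what a topological sort does, the sketch below orders a small made-up network so that every operation runs after the operations that feed it. It uses the standard-library `graphlib` module (Python 3.9 or later) rather than the HSP2tools code, and the operation names are hypothetical.

```python
from graphlib import TopologicalSorter   # standard library, Python 3.9+

# Map each operation to the set of operations that must run before it.
upstream = {
    'RCHRES R001': {'PERLND P001', 'IMPLND I001'},
    'RCHRES R002': {'RCHRES R001', 'PERLND P002'},
}

# static_order() yields every node, predecessors always before successors.
order = list(TopologicalSorter(upstream).static_order())
for op in order:
    print(op)
```

A DAG usually has more than one valid topological order, so the exact sequence may differ between tools while still being correct.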

Start by deleting the OP_SEQUENCE table from **tutorial.h5**.

In [None]:
with pd.HDFStore(hdfname) as store:   # pd.get_store has been removed from current pandas
    del store['/CONTROL/OP_SEQUENCE']

Now show the error when trying to read the OP_SEQUENCE table:

In [None]:
pd.read_hdf(hdfname, '/CONTROL/OP_SEQUENCE')

Run the utility to make an operational sequence.

In [None]:
HSP2tools.make_opseq(hdfname)

##### View the OP_SEQ

In [None]:
pd.read_hdf(hdfname, '/CONTROL/OP_SEQUENCE')

##### Run the simulation to check that it works

In [None]:
HSP2.run(hdfname)

## Section 3: Create a "Smart Run" Operational Sequence<a id='section3'></a>

**Use Case:** Rerun the smallest number of operations needed for a simulation.

Many times a simulation will be rerun with changes to only some of the watershed's data.
The "SMART RUN" capability creates an OP_SEQ which only performs the minimum set of operations to save run time.

Copy the tutorial.h5 file to make the Master.h5 and sim1.h5 files for this example.

In [None]:
master =  'TutorialData/Master.h5'
sim1   =  'TutorialData/sim1.h5'

shutil.copyfile(hdfname, master)
shutil.copyfile(hdfname, sim1)

The master file will be the fixed watershed reference.

First, check what happens with no changes:

In [None]:
HSP2tools.smart_opseq(master, sim1)

The sim1 HDF5 file represents one of the HDF5 files you would create while analyzing the watershed.
Now make a change to the **sim1.h5** file as if you were exploring the impact of changing a parameter.

(This works with any number of changes to 

In [None]:
df = pd.read_hdf(sim1, '/RCHRES/HYDR/STATE')
df

In [None]:
df.loc['R003', 'VOL'] = 5  
df

Save the change to the **sim1.h5** HDF5 file:

In [None]:
df.to_hdf(sim1, '/RCHRES/HYDR/STATE', data_columns=True, format='table')

Now that there is at least one difference between the master and sim1 HDF5 files, we can
determine the minimum simulation run (assuming all previous results are available when needed).
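
One way to find which segments differ between two versions of a table is a row-wise comparison of the two DataFrames. The sketch below is an assumption about how such a comparison could look, not the code inside `smart_opseq`; the table values are made up.

```python
import pandas as pd

# Hypothetical STATE-like table indexed by segment ID.
master_df = pd.DataFrame({'VOL': [10.0, 20.0, 30.0]},
                         index=['R001', 'R002', 'R003'])
sim_df = master_df.copy()
sim_df.loc['R003', 'VOL'] = 5.0            # the single change under study

# Element-wise comparison, then flag any row with at least one difference.
# Note: NaN != NaN is True, so missing values would be flagged as changed.
differs = (master_df != sim_df).any(axis=1)
changed = list(differs[differs].index)
print(changed)
```

This comparison requires both tables to have the same index and columns. Tables whose shape has changed would need separate handling.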

#### Smart Operation Sequence

In [None]:
HSP2tools.smart_opseq(master, sim1) 

#### Check to see what the new OP_SEQ table looks like

In [None]:
pd.read_hdf(sim1, '/CONTROL/OP_SEQUENCE')

The OP_SEQUENCE table shows that only a subset of the watershed network must be rerun.

#### Run the simulation with the Smart OPSEQ

In [None]:
HSP2.run(sim1)

Currently, **smart_opseq** checks every table under the "/PERLND", "/IMPLND", and "/RCHRES" groups in the HDF5 file (including tables added by the user), as well as the NETWORK, SCHEMATIC, and EXT_SOURCES tables, to determine
which segments need to be rerun. It automatically reruns all segments downstream from the
changed segments. Any change is considered significant.

It can be extended to check the other tables such as the MASS_LINK table if desired.
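
The idea of rerunning a changed segment and everything downstream of it can be sketched as a breadth-first search over the network edges. The function name, edge list, and segment names below are hypothetical. This is an illustration of the concept, not the `smart_opseq` implementation.

```python
from collections import deque

def downstream_closure(edges, changed):
    """Return the changed segments plus everything reachable downstream of them."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical network and its full operational sequence.
edges = [('P001', 'R001'), ('R001', 'R003'), ('R002', 'R003'),
         ('R003', 'R004'), ('R004', 'R005')]
full_order = ['P001', 'R001', 'R002', 'R003', 'R004', 'R005']

rerun = downstream_closure(edges, ['R003'])
# Keep the original ordering, but only for operations that must be rerun.
smart_order = [op for op in full_order if op in rerun]
print(smart_order)   # ['R003', 'R004', 'R005']
```

Operations upstream of the change (here P001, R001, and R002) are skipped, which is why their previously saved results must still be available.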

#### Tutorial 7 discusses a more advanced capability for the smart_opseq.

**Section Summary**

 + Demonstrated creating the **OP_SEQ** table from the schematic, network, and mass link tables
 + Demonstrated **SMART RUN** to create the minimal calculation **OP_SEQ** table when some simulation parameters are changed