## FWAKit Example 1 - linear referencing of hydrometric stations

fwakit includes functions that locate point features on the stream network by giving each point a `blue_line_key` and a `downstream_route_measure`. ltree type `fwa_watershed_code` and `local_watershed_code` values are also assigned for quick upstream/downstream queries.
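To make the two code types concrete, the sketch below converts a dash-delimited watershed code into a dot-delimited ltree path by dropping the trailing all-zero segments, which is the relationship visible between the `fwa_watershed_code` and `wscode_ltree` columns in the results of this example. The function names are hypothetical (they are not part of fwakit), and the input codes are shortened for illustration rather than complete FWA codes.

```python
def to_ltree(code):
    """Convert a dash-delimited watershed code to a dot-delimited ltree path."""
    parts = code.split("-")
    # drop trailing segments that are all zeros (unused levels of the hierarchy)
    while parts and set(parts[-1]) == {"0"}:
        parts.pop()
    return ".".join(parts)


def is_ancestor_or_equal(a, b):
    """Mimic the ltree `a @> b` test: is path a an ancestor of (or equal to) b?"""
    a_labels, b_labels = a.split("."), b.split(".")
    return b_labels[:len(a_labels)] == a_labels


# shortened, illustrative codes (hypothetical, not complete FWA codes)
mainstem = to_ltree("920-722273-000000-000000")
tributary = to_ltree("920-722273-416851-000000")
print(mainstem, tributary)
print(is_ancestor_or_equal(mainstem, tributary))
```

Because a tributary's path starts with its parent's path, "is this stream part of that system?" becomes a simple prefix test, which is what makes the ltree columns useful for upstream/downstream queries.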

### 1 - FWA setup
Once fwakit and its PostgreSQL/PostGIS/GDAL requirements are installed:

- set the `FWA_DB` environment variable
- download and prep the FWA database as outlined in the README.  
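Since everything that follows depends on `FWA_DB` being a well-formed connection URL, a quick check can save some debugging. This is a minimal sketch using only the standard library; `describe_db_url` is a hypothetical helper, not a fwakit function, and the URL is the one used in this example.

```python
from urllib.parse import urlsplit


def describe_db_url(url):
    """Split a database URL into its parts (the password is deliberately omitted)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_db_url("postgresql://postgres:postgres@localhost:5432/fwakit")
print(info)
```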

To keep things small for this example, we'll use the test data (SALM watershed group) rather than the entire province, and load only a handful of layers (lakes, rivers, manmade waterbodies, streams, watersheds):

In [75]:
%env FWA_DB=postgresql://postgres:postgres@localhost:5432/fwakit

!fwakit create_db            
!fwakit download --source_url https://www.hillcrestgeo.ca/outgoing/fwakit/ --dl_path test_data --files FWA_BC.gdb.zip,FWA_STREAM_NETWORKS_SP.gdb.zip,FWA_WATERSHEDS_POLY.gdb.zip
!fwakit load --dl_path test_data -l fwa_lakes_poly,fwa_stream_networks_sp,fwa_watersheds_poly_sp,fwa_manmade_waterbodies_poly,fwa_rivers_poly
!fwakit clean

env: FWA_DB=postgresql://postgres:postgres@localhost:5432/fwakit
Downloading FWA_BC.gdb.zip
Downloading FWA_STREAM_NETWORKS_SP.gdb.zip
Downloading FWA_WATERSHEDS_POLY.gdb.zip
Loading FWA source data to PostgreSQL database
Loading fwa_lakes_poly
Loading fwa_manmade_waterbodies_poly
Loading fwa_rivers_poly
Loading fwa_stream_networks_sp by watershed group
SALM
fwa_lakes_poly: cleaning
fwa_lakes_poly: adding ltree types
fwa_lakes_poly: indexing
fwa_manmade_waterbodies_poly: cleaning
fwa_manmade_waterbodies_poly: adding ltree types
fwa_manmade_waterbodies_poly: indexing
fwa_rivers_poly: cleaning
fwa_rivers_poly: adding ltree types
fwa_rivers_poly: indexing
fwa_watershed_groups_poly: cleaning
fwa_watershed_groups_poly: indexing
fwa_stream_networks_sp: cleaning
fwa_stream_networks_sp: adding ltree types
fwa_stream_networks_sp: indexing
fwa_watersheds_poly_sp: cleaning
fwa_watersheds_poly_sp: adding ltree types
fwa_watersheds_poly_sp: indexing


### 2 - Get some points

Grab Hydrometric Station points from Environment Canada and load to postgres:


In [3]:
# download
!wget http://dd.weather.gc.ca/hydrometric/doc/hydrometric_StationList.csv

# clean the header
!sed '1s/.*/id,name,lat,lon,prov,timezone/' hydrometric_StationList.csv > hydrostn.csv

# load to postgres
!ogr2ogr \
  --config PG_USE_COPY YES \
  -s_srs EPSG:4326 \
  -t_srs EPSG:3005 \
  -f PostgreSQL 'PG:host=localhost user=postgres dbname=fwakit password=postgres' \
  -lco OVERWRITE=YES \
  -overwrite \
  -lco GEOMETRY_NAME=geom \
  -oo X_POSSIBLE_NAMES=lon* \
  -oo Y_POSSIBLE_NAMES=lat* \
  -oo KEEP_GEOM_COLUMNS=NO \
  -sql "SELECT * FROM hydrostn WHERE prov='BC'" \
  -nln hydrostn \
  hydrostn.csv

--2018-02-25 21:27:08--  http://dd.weather.gc.ca/hydrometric/doc/hydrometric_StationList.csv
Resolving dd.weather.gc.ca (dd.weather.gc.ca)... 205.189.10.47
Connecting to dd.weather.gc.ca (dd.weather.gc.ca)|205.189.10.47|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Mon, 26 Feb 2018 05:27:09 GMT
  Server: Apache
  Last-Modified: Mon, 26 Feb 2018 05:20:53 GMT
  Accept-Ranges: bytes
  Content-Length: 156529
  Content-Type: text/csv
  Via: 1.1 dd.weather.gc.ca
  X-UA-Compatible: IE=Edge
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
Length: 156529 (153K) [text/csv]
Saving to: ‘hydrometric_StationList.csv’


2018-02-25 21:27:09 (229 KB/s) - ‘hydrometric_StationList.csv’ saved [156529/156529]
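The `sed` and `-sql` steps above amount to two operations: replace the header row with short column names, then keep only the BC stations. The sketch below shows the same logic in plain Python. The sample text is made up: its header row, coordinates, time zones and the second station are placeholders, and only the first station's id and name come from this example.

```python
import csv
import io

NEW_HEADER = ["id", "name", "lat", "lon", "prov", "timezone"]


def clean_station_csv(raw_text, province="BC"):
    """Replace the header row and keep only stations in one province."""
    reader = csv.reader(io.StringIO(raw_text))
    next(reader)  # discard the original header row, whatever it contains
    rows = [dict(zip(NEW_HEADER, row)) for row in reader]
    return [r for r in rows if r["prov"] == province]


# made-up sample: header text, coordinates and time zones are placeholders
sample = (
    "original,header,is,replaced,so,irrelevant\n"
    "08HD006,SALMON RIVER NEAR SAYWARD,0.0,0.0,BC,UTC-08:00\n"
    "XXTEST1,PLACEHOLDER STATION,0.0,0.0,AB,UTC-07:00\n"
)
stations = clean_station_csv(sample)
print([s["id"] for s in stations])
```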



Now check that data is loaded and in the right place:

In [3]:
%matplotlib inline


import geopandas as gpd
import matplotlib.pyplot as plt
import mplleaflet
import fwakit as fwa

db = fwa.util.connect()
df = gpd.read_postgis("SELECT id, prov, ST_Transform(geom, 4326) as geom FROM hydrostn", db.engine)
ax = df.plot()
mplleaflet.display(tiles='esri_worldtopo')




### 3 - Match points to stream network

In [26]:
fwa.reference_points('public.hydrostn', 'id', 'public.hydrostn_streams', 100, db=db)                        

What kind of match are we getting? 
Join the result to the source points and map them, then look at the event table itself: how far was each point from a stream?

In [27]:
stngdf = gpd.read_postgis("SELECT a.id, ST_Transform(a.geom, 4326) as geom "
                          "FROM hydrostn a "
                          "INNER JOIN hydrostn_streams b ON a.id=b.id", db.engine)
ax = stngdf.plot()
mplleaflet.display(tiles='esri_worldtopo') 

In [28]:
# look at the event table itself
import pandas as pd

df = pd.read_sql('SELECT * FROM hydrostn_streams', db.engine)
df

| | id | linear_feature_id | wscode_ltree | localcode_ltree | fwa_watershed_code | local_watershed_code | blue_line_key | downstream_route_measure | distance_to_stream |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 08HD006 | 710442195 | 920.722273 | 920.722273.097248 | 920-722273-000000-000000-000000-000000-000000-... | 920-722273-097248-000000-000000-000000-000000-... | 354152994 | 10983.087494 | 63.308356 |
| 1 | 08HD007 | 710448113 | 920.722273.416851 | 920.722273.416851 | 920-722273-416851-000000-000000-000000-000000-... | 920-722273-416851-000000-000000-000000-000000-... | 354148454 | 27.19476 | 82.331672 |
| 2 | 08HD007 | 710447588 | 920.722273 | 920.722273.410699 | 920-722273-000000-000000-000000-000000-000000-... | 920-722273-410699-000000-000000-000000-000000-... | 354152994 | 36576.107628 | 37.614846 |
| 3 | 08HD015 | 710455835 | 920.722273 | 920.722273.606660 | 920-722273-000000-000000-000000-000000-000000-... | 920-722273-606660-000000-000000-000000-000000-... | 354152994 | 54628.950443 | 10.529124 |


Note that the three stations in our test area produce four results: station id `08HD007` has been joined to two streams, `blue_line_key` `354148454` and `354152994`. Both streams are within the specified 100 m threshold distance of this station (82.3 m and 37.6 m away respectively; see the `distance_to_stream` column above). While the closest stream may indeed be the correct match, this will not always be the case. To better match our points to the correct stream, we can check the station name against the stream name.
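Before bringing names into it, the ambiguity itself is easy to detect. The sketch below uses the four rows of the event table above to find stations with more than one candidate stream and to pick the nearest candidate for each, which is the naive baseline that name matching is meant to improve on. It is plain Python for illustration only, not something fwakit does for you.

```python
from collections import defaultdict

# (station id, blue_line_key, distance_to_stream in metres), from the event table above
matches = [
    ("08HD006", 354152994, 63.308356),
    ("08HD007", 354148454, 82.331672),
    ("08HD007", 354152994, 37.614846),
    ("08HD015", 354152994, 10.529124),
]

# collect the candidate streams for each station
by_station = defaultdict(list)
for station_id, blue_line_key, distance in matches:
    by_station[station_id].append((distance, blue_line_key))

# stations matched to more than one stream
multi_matched = sorted(s for s, c in by_station.items() if len(c) > 1)
# naive baseline: keep the closest stream for every station
nearest = {s: min(c)[1] for s, c in by_station.items()}

print(multi_matched)
print(nearest["08HD007"])
```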

### 4 - Improve match of points to stream network

First, with this small example (one station, two matches) it is easy to manually compare the station name to the stream name and the distance to stream:

In [29]:
sql = """ -- find stations with more than one matched stream
          WITH multi_matched AS 
          (
          SELECT id, count(*)
          FROM hydrostn_streams
          GROUP BY id
          HAVING count(*) > 1
          )
         -- get names and distance to stream for these stations
         SELECT 
          m.id, 
          p.name as station_name,
          s.gnis_name as stream_name,
          e.distance_to_stream
        FROM multi_matched m
        INNER JOIN hydrostn p ON m.id = p.id
        INNER JOIN hydrostn_streams e ON p.id = e.id
        INNER JOIN whse_basemapping.fwa_stream_networks_sp s
        ON e.linear_feature_id = s.linear_feature_id"""
df = pd.read_sql(sql, db.engine)
df


| | id | station_name | stream_name | distance_to_stream |
| --- | --- | --- | --- | --- |
| 0 | 08HD007 | SALMON RIVER ABOVE MEMEKAY RIVER | Salmon River | 37.614846 |
| 1 | 08HD007 | SALMON RIVER ABOVE MEMEKAY RIVER | Kay Creek | 82.331672 |


Conveniently, the station is on the Salmon River, and the Salmon River is the closest stream. But we can do better than matching on the closest stream: postgres supplies several methods for string matching. 
As we are only comparing one value (the station name) to at most two or three candidates (the streams within the threshold distance), a string comparison may work well. Let's try the `pg_trgm` extension first (https://www.postgresql.org/docs/10/static/pgtrgm.html). By adding a single line to the previous query, we can ask: is the stream name similar to the station name? (The `%` similarity operator is written `%%` in the Python string so that the database driver does not treat it as a parameter placeholder.)

In [30]:
db.execute('CREATE EXTENSION IF NOT EXISTS pg_trgm')

sql = """WITH multi_matched AS 
          (
          SELECT id, count(*)
          FROM hydrostn_streams
          GROUP BY id
          HAVING count(*) > 1
          )
      
         SELECT 
          m.id, 
          p.name as station_name,
          s.gnis_name as stream_name,
          s.gnis_name %% p.name as trgm_comparison,
          e.distance_to_stream
        FROM multi_matched m
        INNER JOIN hydrostn p ON m.id = p.id
        INNER JOIN hydrostn_streams e ON p.id = e.id
        INNER JOIN whse_basemapping.fwa_stream_networks_sp s
        ON e.linear_feature_id = s.linear_feature_id"""
df = pd.read_sql(sql, db.engine)
df

| | id | station_name | stream_name | trgm_comparison | distance_to_stream |
| --- | --- | --- | --- | --- | --- |
| 0 | 08HD007 | SALMON RIVER ABOVE MEMEKAY RIVER | Salmon River | True | 37.614846 |
| 1 | 08HD007 | SALMON RIVER ABOVE MEMEKAY RIVER | Kay Creek | False | 82.331672 |
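To see why the comparison comes out this way, here is a rough pure-Python approximation of what `pg_trgm` computes, following its documentation: text is lowercased and split into words, each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams in either string. The `%` operator is true when similarity exceeds the threshold (0.3 by default). This is a sketch for intuition only; it handles ASCII letters and digits and is not guaranteed to match the extension in every edge case.

```python
import re


def trigrams(text):
    """Return the set of trigrams for a string, padded per word as pg_trgm documents."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


THRESHOLD = 0.3  # pg_trgm's documented default similarity threshold

station = "SALMON RIVER ABOVE MEMEKAY RIVER"
for stream in ("Salmon River", "Kay Creek"):
    score = similarity(stream, station)
    print(stream, round(score, 3), score > THRESHOLD)
```

Every trigram of "Salmon River" appears in the station name, while "Kay Creek" shares only the fragment it has in common with "MEMEKAY", so it falls well below the threshold.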


This works well for the test station. We can create a new table that refines the matching of station to stream with this logic:

In [32]:
sql = """CREATE TABLE hydrostn_referenced AS      
         SELECT 
          a.*, 
          p.name as station_name,
          str.gnis_name as stream_name
        FROM hydrostn_streams a
        INNER JOIN hydrostn p ON a.id = p.id
        INNER JOIN whse_basemapping.fwa_stream_networks_sp str
        ON a.linear_feature_id = str.linear_feature_id
        WHERE str.gnis_name %% p.name
      """
db.execute(sql)
df = pd.read_sql('SELECT * FROM hydrostn_referenced', db.engine)
df

| | id | linear_feature_id | wscode_ltree | localcode_ltree | fwa_watershed_code | local_watershed_code | blue_line_key | downstream_route_measure | distance_to_stream | station_name | stream_name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 08HD006 | 710442195 | 920.722273 | 920.722273.097248 | 920-722273-000000-000000-000000-000000-000000-... | 920-722273-097248-000000-000000-000000-000000-... | 354152994 | 10983.087494 | 63.308356 | SALMON RIVER NEAR SAYWARD | Salmon River |
| 1 | 08HD007 | 710447588 | 920.722273 | 920.722273.410699 | 920-722273-000000-000000-000000-000000-000000-... | 920-722273-410699-000000-000000-000000-000000-... | 354152994 | 36576.107628 | 37.614846 | SALMON RIVER ABOVE MEMEKAY RIVER | Salmon River |
| 2 | 08HD015 | 710455835 | 920.722273 | 920.722273.606660 | 920-722273-000000-000000-000000-000000-000000-... | 920-722273-606660-000000-000000-000000-000000-... | 354152994 | 54628.950443 | 10.529124 | SALMON RIVER ABOVE CAMPBELL LAKE DIVERSION | Salmon River |


### 5 - Use the referenced points

With the sample points now referenced to the streams, we can use some FWAKit functions to ask stream- and watershed-related questions.

First, what is the length of stream upstream and downstream of each of the stations?


In [58]:
sql = """SELECT 
           id,
           blue_line_key,
           downstream_route_measure,
           fwa_lengthdownstream(blue_line_key, downstream_route_measure) / 1000 AS dnstr_length_km,
           fwa_lengthupstream(blue_line_key, downstream_route_measure) / 1000 AS upstr_length_km 
         FROM hydrostn_referenced"""
df = pd.read_sql(sql, db.engine)
df

| | id | blue_line_key | downstream_route_measure | dnstr_length_km | upstr_length_km |
| --- | --- | --- | --- | --- | --- |
| 0 | 08HD006 | 354152994 | 10983.087494 | 10.983087 | 2206.267682 |
| 1 | 08HD007 | 354152994 | 36576.107628 | 41.580809 | 738.937124 |
| 2 | 08HD015 | 354152994 | 54628.950443 | 60.65515 | 446.664346 |


Note that while these points are all on the same `blue_line_key` (the Salmon River, which flows to the ocean), the `dnstr_length_km` values returned above are *not* always equivalent to the `downstream_route_measure`. For `08HD006` the two agree (10.98 km), but for `08HD007` and `08HD015` the downstream length is higher because all side channel flow is included in the calculations (or, to be precise, side channels having a `local_watershed_code` value are included). 
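The size of the difference can be read straight off the table above. The sketch below subtracts each station's `downstream_route_measure` (converted to km) from its `dnstr_length_km`; on the assumption that the whole difference is attributable to side channels, as described above, the result is the side channel length counted downstream of each station.

```python
# (id, downstream_route_measure in metres, dnstr_length_km), from the table above
stations = [
    ("08HD006", 10983.087494, 10.983087),
    ("08HD007", 36576.107628, 41.580809),
    ("08HD015", 54628.950443, 60.65515),
]

# difference between total downstream length and main-channel measure, in km;
# max() clamps the tiny negative value that comes from rounding in the table
side_channel_km = {
    stn: round(max(dnstr_km - measure_m / 1000, 0.0), 3)
    for stn, measure_m, dnstr_km in stations
}
print(side_channel_km)
```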

We can illustrate this difference by drawing the side channels downstream of station `08HD007` in red:

In [71]:
sql = """SELECT b.linear_feature_id, ST_Transform(b.geom, 4326) as geom
         FROM 
          whse_basemapping.fwa_stream_networks_sp b
         INNER JOIN hydrostn_referenced a ON
         -- downstream criteria 1 - same blue line, lower measure
          (b.blue_line_key = a.blue_line_key AND
           b.downstream_route_measure <= a.downstream_route_measure)
          OR
          -- criteria 2 - watershed code a is a child of watershed code b
          (b.wscode_ltree @> a.wscode_ltree
              AND (
                   -- AND local code is lower
                   b.localcode_ltree < subltree(a.localcode_ltree, 0, nlevel(b.localcode_ltree))
                   -- OR wscode and localcode are equivalent
                   OR b.wscode_ltree = b.localcode_ltree
                   -- OR any missed side channels on the same watershed code
                   OR (b.wscode_ltree = a.wscode_ltree AND
                       b.blue_line_key != a.blue_line_key AND
                       b.localcode_ltree < a.localcode_ltree)
                   )
          )
        WHERE a.id = '08HD007'
        AND b.blue_line_key != b.watershed_key
  """
f, ax = plt.subplots(1)
streams_gdf = gpd.read_postgis(sql, db.engine)
# geopandas expects the target matplotlib axes via the `ax` keyword
streams_gdf.plot(ax=ax, color='r')
stngdf.plot(ax=ax)
mplleaflet.display(tiles='esri_worldtopo') 
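The join condition in the query above is dense, so here is the same downstream test restated in plain Python as a reading aid. ltree paths are modelled as tuples of labels, `@>` becomes a prefix test, and `subltree(x, 0, n)` becomes a slice. Tuple comparison only approximates ltree ordering (it relies on the fixed-width numeric labels used here), `is_downstream` is a hypothetical name, and the side channel record is invented for illustration; the other values come from the tables in this example.

```python
def labels(path):
    """Split a dot-delimited ltree path into a tuple of labels."""
    return tuple(path.split("."))


def is_downstream(seg, pt):
    """Return True if stream segment `seg` is downstream of point `pt`."""
    # criteria 1 - same blue line, lower measure
    if (seg["blue_line_key"] == pt["blue_line_key"]
            and seg["measure"] <= pt["measure"]):
        return True
    seg_ws, seg_loc = labels(seg["wscode"]), labels(seg["localcode"])
    pt_ws, pt_loc = labels(pt["wscode"]), labels(pt["localcode"])
    # criteria 2 - segment's watershed code is an ancestor of (or equal to) the point's
    if pt_ws[:len(seg_ws)] != seg_ws:
        return False
    return (
        seg_loc < pt_loc[:len(seg_loc)]   # local code is lower
        or seg_ws == seg_loc              # wscode and localcode are equivalent
        or (seg_ws == pt_ws
            and seg["blue_line_key"] != pt["blue_line_key"]
            and seg_loc < pt_loc)         # side channel on the same watershed code
    )


station = {"blue_line_key": 354152994, "measure": 36576.107628,
           "wscode": "920.722273", "localcode": "920.722273.410699"}
mainstem_below = {"blue_line_key": 354152994, "measure": 10983.087494,
                  "wscode": "920.722273", "localcode": "920.722273.097248"}
kay_creek = {"blue_line_key": 354148454, "measure": 27.19476,
             "wscode": "920.722273.416851", "localcode": "920.722273.416851"}
# hypothetical side channel: invented blue_line_key and local code
side_channel = {"blue_line_key": 999, "measure": 0.0,
                "wscode": "920.722273", "localcode": "920.722273.300000"}

for name, seg in [("mainstem_below", mainstem_below),
                  ("kay_creek", kay_creek),
                  ("side_channel", side_channel)]:
    print(name, is_downstream(seg, station))
```

The final `blue_line_key != watershed_key` filter in the query is not reproduced here; it is what restricts the mapped result to side channels.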

Another available function is `fwa_lengthinstream`, which returns the length of stream between two locations. For example, what is the length of stream between stations `08HD006` and `08HD015`?

In [44]:
sql = """WITH stn1 AS (SELECT * FROM hydrostn_referenced WHERE id = '08HD006'),
              stn2 AS (SELECT * FROM hydrostn_referenced WHERE id = '08HD015')
         SELECT 
           fwa_lengthinstream(stn1.blue_line_key, 
                                     stn1.downstream_route_measure,
                                     stn2.blue_line_key,
                                     stn2.downstream_route_measure) AS instream_length_m 
         FROM stn1, stn2"""
df = pd.read_sql(sql, db.engine)
df

| | instream_length_m |
| --- | --- |
| 0 | 43645.862949 |


NOTE: `fwa_lengthinstream` returns metres (the 43,645.86 m above is about 43.6 km). When querying points on the same `blue_line_key`, it measures along the main channel only and excludes side channels. This is a valuable metric, but it is not consistent with `fwa_lengthupstream` and `fwa_lengthdownstream`, which include side channels. Perhaps we need to add an option to each function specifying whether to return only main channel distances.
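For these two stations, the result above is exactly the difference between their `downstream_route_measure` values, which is what a main-channel-only distance on a single `blue_line_key` should be. The check below is an observation about the numbers in this example, not a description of how `fwa_lengthinstream` is implemented; `main_channel_distance` is a hypothetical helper.

```python
def main_channel_distance(measure_a, measure_b):
    """Distance in metres between two measures on the same blue_line_key."""
    return abs(measure_a - measure_b)


# downstream_route_measure values for the two stations, from the tables above
stn_08hd006 = 10983.087494
stn_08hd015 = 54628.950443

distance_m = main_channel_distance(stn_08hd006, stn_08hd015)
print(round(distance_m, 6), round(distance_m / 1000, 3))
```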

This example of `fwa_lengthinstream` is very basic, but there are many scenarios where it can be useful. Imagine we have points of diversion and effluent discharge points, both referenced to the stream network. We might want to know all water points of diversion that are within 1000 m downstream of a discharge point. Using this function, we can answer this question by joining the two (hypothetical) tables `a` (discharges) and `b` (points of diversion):


In [73]:
sql = """
-- find all points of diversion downstream of discharge points
WITH dnstr_pod AS (
SELECT
  a.id as id_a,
  a.blue_line_key as blue_line_key_a,
  a.downstream_route_measure as downstream_route_measure_a,
  b.id as id_b,
  b.blue_line_key as blue_line_key_b,
  b.downstream_route_measure as downstream_route_measure_b
FROM discharges a
INNER JOIN points_of_diversion b
-- use a simple spatial query to get candidates for the instream distance measure
ON ST_DWithin(a.geom, b.geom, 1100)
-- less than 1km downstream
AND fwa_lengthinstream(a.blue_line_key,
                       a.downstream_route_measure,
                       b.blue_line_key,
                       b.downstream_route_measure) <= 1000
-- greater than 0 km downstream (ie, not upstream)
AND fwa_lengthinstream(a.blue_line_key,
                       a.downstream_route_measure,
                       b.blue_line_key,
                       b.downstream_route_measure) >= 0
)

-- aggregate the result, one row per discharge point result
SELECT
  id_a,
  array_agg(id_b) as dnstr_pods
FROM dnstr_pod
GROUP BY id_a;
"""