# AEWS Python Notebook 08a: AEWS miscellanea

**Author**: Eric Lehmann, CSIRO Data61  
**Date**:  June 07, 2016

**Note**: The Python code below is rudimentary; priority is given to code interpretability over execution efficiency.

**Note**: this notebook should be accessible and viewable at [https://github.com/eric542/agdc_v2/tree/master/notebooks](https://github.com/eric542/agdc_v2/tree/master/notebooks).

## Summary

Building on the concepts introduced in the previous notebooks in this series, we work out the remaining components of the AEWS implementation, which include:

* (08a) reading the extents of lakes / waterbodies of interest from a `.shp` file
* (08b) checking for data availability over ROIs: check for available dates and for empty time slices
* (08b) working out the "units" and range of values in L5, L7 and L8 data
* (08b) implementing the SWIR "glint" filter
* (08b) implementing green / amber / red thresholds
* (08c) generating the WOFS mask when ROI is across multiple tiles
* (08c) comparing the SWIR vs. WOFS extents ("low water" flagging)
* (08d) using lake polygons from a `.shp` file to generate masks of lake boundaries, and a proper low-water flag (comparison of the SWIR filter output against the polygon extents)
* (08e) basic handling of netCDF (`.nc`) files and data: creating, loading, and updating netCDF datasets
* (08f) checking for new Landsat data after a WQ dataset has been created: checking for the latest available dates for a given ROI, and adding the WQ results to the existing dataset
* (08g) working out what the AGDC API does in terms of PQ masking, and implementing our own PQ masking to avoid issues when the AGDC database contains different dates of NBAR and PQ data
* (08h) working out how to access and combine data from multiple satellites, for joint processing
* (08i) testing of computational requirements and related issues

As indicated at the beginning of each line, these various tasks have been implemented / tested in separate notebooks for ease of use.

**Abstract**: In the notebook below (08a), we simply work out the code necessary to read in .shp files and access the various fields and layers within them.


## Preliminaries

This (Jupyter) notebook was written for use on the NCI's VDI system, with the following pre-loaded module:

```
 $ module use /g/data/v10/public/modules/modulefiles --append
 $ module load agdc-py2-prod 
```

**NOTE**: the specific module loaded here (`agdc-py2-prod`) is different from the module loaded in earlier notebooks (`agdc-py2-dev`)! While the earlier module contained only Landsat 5 data, the `agdc-py2-prod` module links to a (different) AGDC database containing the following NBART/NBAR/PQA datasets:

* Landsat 8: 2013
* Landsat 7: 2013
* Landsat 5: 2006/2007

It is unclear whether the API functions in these two modules are identical or come from different versions.

In [1]:
%%html  # Definitions for some pretty text boxes...
<style>
    div.warn { background-color: #e8c9c9; border-left: 5px solid #c27070; padding: 0.5em }
    div.note { background-color: #cce0ff; border-left: 5px solid #5c85d6; padding: 0.5em }
    div.info { background-color: #ffe680; border-left: 5px solid #cca300; padding: 0.5em }
</style>

In [2]:
from __future__ import print_function   # __future__ imports belong first

import json
from pprint import pprint

import numpy as np
import ogr

## ROIs from shape file

Let's see how we can import regions of interest from a given .shp file. We'll use the file provided by DPI Water containing the lakes and waterbodies of interest in NSW. They provided two shape files, the first of which contains the *vectors* for all lakes in NSW.

In [3]:
lakes_file = '../NSW_lakes/NSW_Lakes.shp'
lakes_vec = ogr.Open(lakes_file)
lakes_lyr = lakes_vec.GetLayer(0)
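
By default (when OGR exceptions are not enabled), `ogr.Open` returns `None` rather than raising an error if the file is missing or unreadable, so a wrong path only shows up later as an `AttributeError`. The following is a minimal sketch of a guarded open. The helper name `open_first_layer` and its `opener` argument are hypothetical and not part of OGR; the `osgeo.ogr` import path is assumed to be available in the loaded module.

```python
try:
    from osgeo import ogr   # packaged import path of the GDAL/OGR bindings
except ImportError:
    ogr = None              # bindings not installed: pass an opener explicitly


def open_first_layer(path, opener=None):
    """Return (dataset, layer 0); raise IOError if the file cannot be opened.

    `opener` is a hypothetical hook (defaults to ogr.Open) that makes the
    helper usable without GDAL, e.g. for testing.
    """
    if opener is None:
        if ogr is None:
            raise ImportError("GDAL/OGR bindings are not available")
        opener = ogr.Open
    dataset = opener(path)
    if dataset is None:   # default OGR behaviour on failure
        raise IOError("could not open vector file: {}".format(path))
    # Return the dataset as well: a layer is only valid while its
    # parent dataset object is still alive.
    return dataset, dataset.GetLayer(0)
```

With the file used above, this would be called as `lakes_vec, lakes_lyr = open_first_layer('../NSW_lakes/NSW_Lakes.shp')`.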

In [4]:
print( "Nr of layers in the dataset:", lakes_vec.GetLayerCount() )
print( "Dataset metadata:", lakes_vec.GetMetadata() )

print( "\nLayer description:", lakes_lyr.GetDescription() )
print( "Layer extents:", lakes_lyr.GetExtent() )
print( "There are", lakes_lyr.GetFeatureCount(), "features in the layer." )
print( "Layer metadata:", lakes_lyr.GetMetadata() )
print( "Layer CRS:\n", lakes_lyr.GetSpatialRef() )

lakes_feat_X = lakes_lyr.GetFeature(100)

print( "\nFeature field count:", lakes_feat_X.GetFieldCount() )
print( "Feature fields:" )
pprint( lakes_feat_X.items() )
print( "\nFeature coordinates:\n", lakes_feat_X.GetGeometryRef() )

#print( "\nFeature in json format:" )
#pprint( json.loads( lakes_feat_X.ExportToJson() ) )   # long outputs...

Nr of layers in the dataset: 1
Dataset metadata: {}

Layer description: NSW_Lakes
Layer extents: (140.98906536096615, 153.5651335231402, -37.47082809926553, -28.217064921946303)
There are 430 features in the layer.
Layer metadata: {'DBF_DATE_LAST_UPDATE': '2016-04-29'}
Layer CRS:
 GEOGCS["GCS_GDA_1994",
    DATUM["Geocentric_Datum_of_Australia_1994",
        SPHEROID["GRS_1980",6378137.0,298.257222101]],
    PRIMEM["Greenwich",0.0],
    UNIT["Degree",0.0174532925199433]]

Feature field count: 26
Feature fields:
{'ATTRIBUTER': '1985/01/01',
 'CAPTUREMET': 8,
 'CAPTURESOU': 1,
 'CLASSSUBTY': 1,
 'CREATEDATE': '1998/12/15',
 'ENDDATE': '3000/01/01',
 'FEATUREMOD': '2006/01/16',
 'FEATUREREL': '1985/01/01',
 'HYDRONAME': 'TARGET',
 'HYDRONAMEO': 16303,
 'HYDRONAMET': 'LAKE',
 'HYDROTYPE': 1,
 'ISPROCESSE': 'Y',
 'LASTUPDATE': '2010/09/18',
 'OBJECTID': 49668,
 'OBJECTMODD': '2006/01/16',
 'PACKETID': 0.0,
 'PERENNIALI': 1,
 'PLANIMETRI': 20.0,
 'RECORDSTAT': None,
 'RELEVANCE': 10,
 'START
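
The field names printed above can also be listed without reading any particular feature, by querying the layer definition. A minimal sketch, assuming `layer` is an OGR layer such as `lakes_lyr`; the helper name `list_field_names` is hypothetical, while `GetLayerDefn`, `GetFieldCount`, `GetFieldDefn` and `GetName` are standard OGR calls.

```python
def list_field_names(layer):
    """Return the attribute field names of an OGR layer, in schema order."""
    defn = layer.GetLayerDefn()          # schema shared by all features
    return [defn.GetFieldDefn(i).GetName()
            for i in range(defn.GetFieldCount())]

# Usage (with the layer opened earlier in this notebook):
#   pprint(list_field_names(lakes_lyr))
```

This is handy for checking which attributes (e.g. `HYDRONAME`) are present before looping over all features.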

The second .shp file contains only the _extents_ of each lake (bounding box):

In [5]:
env_file = '../NSW_lakes/NSW_Lakes_envelope.shp'
env_vec = ogr.Open(env_file)
env_lyr = env_vec.GetLayer(0)

In [6]:
print( "Nr of layers in the dataset:", env_vec.GetLayerCount() )
print( "Dataset metadata:", env_vec.GetMetadata() )

print( "\nLayer description:", env_lyr.GetDescription() )
print( "Layer extents:", env_lyr.GetExtent() )
print( "There are", env_lyr.GetFeatureCount(), "features in the layer." )
print( "Layer metadata:", env_lyr.GetMetadata() )
print( "Layer CRS:\n", env_lyr.GetSpatialRef() )

env_feat_X = env_lyr.GetFeature(429)

print( "\nFeature field count:", env_feat_X.GetFieldCount() )
print( "Feature fields:" )
pprint( env_feat_X.items() )
print( "\nFeature coordinates:\n", env_feat_X.GetGeometryRef() )

print( "\nFeature in json format:" )
pprint( json.loads( env_feat_X.ExportToJson() ) )

Nr of layers in the dataset: 1
Dataset metadata: {}

Layer description: NSW_Lakes_envelope
Layer extents: (140.98906536096626, 153.56513352314028, -37.47082809926553, -28.217064921946303)
There are 430 features in the layer.
Layer metadata: {'DBF_DATE_LAST_UPDATE': '2016-04-29'}
Layer CRS:
 GEOGCS["GCS_GDA_1994",
    DATUM["Geocentric_Datum_of_Australia_1994",
        SPHEROID["GRS_1980",6378137.0,298.257222101]],
    PRIMEM["Greenwich",0.0],
    UNIT["Degree",0.0174532925199433]]

Feature field count: 31
Feature fields:
{'ATTRIBUTER': '2011/09/16',
 'CAPTUREMET': 2,
 'CAPTURESOU': 4,
 'CLASSSUBTY': 1,
 'CREATEDATE': '2013/01/18',
 'ENDDATE': '3000/01/01',
 'EXT_MAX_X': 151.844189421,
 'EXT_MAX_Y': -32.693869181,
 'EXT_MIN_X': 151.770721942,
 'EXT_MIN_Y': -32.7694036189,
 'FEATUREMOD': '2015/08/21',
 'FEATUREREL': '2011/09/16',
 'HYDRONAME': 'GRAHAMSTOWN',
 'HYDRONAMEO': 4380,
 'HYDRONAMET': 'LAKE',
 'HYDROTYPE': 1,
 'ISPROCESSE': 'Y',
 'LASTUPDATE': '2015/08/21',
 'OBJECTID': 466459,


<div class=info>
<b>INFO:</b> DPI Water have since provided an additional dataset contained in two new .shp files, namely 'NSW_WaterBody.shp' and 'NSW_WaterBody_Envelope.shp', which contain data similar to the files used above, but for a total of 1947 water bodies in NSW; these are investigated further in the notebook <i>'AEWS Python Notebook 08d'</i>.
</div>
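
The field listing above for feature 429 shows that the envelope file also stores the bounding box as attributes (`EXT_MIN_X`, `EXT_MAX_X`, `EXT_MIN_Y`, `EXT_MAX_Y`). As a rough sketch, the lon/lat ranges could be read directly from those fields. This assumes that X is longitude and Y is latitude in the layer's CRS, which is consistent with the layer extents printed above but has not been checked against the geometries. The helper name `roi_from_fields` is hypothetical.

```python
def roi_from_fields(props):
    """Build lon/lat ranges from the EXT_* attributes of an envelope feature.

    Assumption: EXT_*_X holds longitudes and EXT_*_Y latitudes (degrees).
    """
    lon_range = (props['EXT_MIN_X'], props['EXT_MAX_X'])
    lat_range = (props['EXT_MIN_Y'], props['EXT_MAX_Y'])
    return {'lon': lon_range, 'lat': lat_range}


# Attribute values copied from the feature 429 listing above (GRAHAMSTOWN)
grahamstown = {'HYDRONAME': 'GRAHAMSTOWN',
               'EXT_MIN_X': 151.770721942, 'EXT_MAX_X': 151.844189421,
               'EXT_MIN_Y': -32.7694036189, 'EXT_MAX_Y': -32.693869181}

roi = roi_from_fields(grahamstown)
print(grahamstown['HYDRONAME'], roi['lon'], roi['lat'])
```

With a real feature, the dictionary would come from `env_feat_X.items()`.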

Using the second .shp file of lake extents, we can extract the desired ROIs for AEWS as follows; the same piece of code can then supply the lon/lat ranges for our AGDC queries:

In [7]:
env_file = '../NSW_lakes/NSW_Lakes_envelope.shp'
env_vec = ogr.Open(env_file)
env_lyr = env_vec.GetLayer(0)
print( "Do we have only one layer?", env_vec.GetLayerCount()==1)
print( "\nLat/lon extents for all features:")

kk_lim = 7   # just to reduce outputs...
n_ftr = env_lyr.GetFeatureCount()
for kk in range( n_ftr ):
    if kk==kk_lim:
        print("  [...]")
    elif kk<kk_lim or kk>n_ftr-kk_lim:
        ftr = env_lyr.GetFeature(kk)
        ftr_json = json.loads(ftr.ExportToJson())
        coords = np.array( ftr_json['geometry']['coordinates'][0] )
        min_lat = min(coords[:,1])
        max_lat = max(coords[:,1])
        min_lon = min(coords[:,0])
        max_lon = max(coords[:,0])
        
        lake_name = ftr_json['properties']['HYDRONAME']
        lake_type = ftr_json['properties']['HYDRONAMET']
        lake_nr = ftr_json['properties']['HYDRONAMEO']
        
        print( "  Feat. {}:".format(kk), lake_type, lake_name, "({}):".format(lake_nr),
               "lat. {}...{} / lon. {}...{}".format(min_lat,max_lat,min_lon,max_lon) )

Do we have only one layer? True

Lat/lon extents for all features:
  Feat. 0: LAKE IRONBARK (21297): lat. -33.6960849334...-33.6939385712 / lon. 150.912791081...150.914756425
  Feat. 1: LAKE DOUJON (21298): lat. -33.8918578038...-33.8895744451 / lon. 150.847428457...150.850120214
  Feat. 2: LAKE PITARPUNGA (8082): lat. -34.4169698285...-34.3186391384 / lon. 143.428375641...143.550909552
  Feat. 3: LAKE COOTRALANTRA (2649): lat. -36.2676662472...-36.2565286189 / lon. 148.882132208...148.896117075
  Feat. 4: LAKE COOPERS (2635): lat. -36.5345346465...-36.5291472767 / lon. 149.135589624...149.142616936
  Feat. 5: LAKE BEARDS (469): lat. -36.6491484954...-36.6296575853 / lon. 149.02652923...149.044251376
  Feat. 6: LAKE GREEN (4427): lat. -37.0269602001...-37.0245491853 / lon. 149.260632167...149.263162143
  [...]
  Feat. 424: LAKE MOONLIGHT (6961): lat. -34.2138197999...-34.1868210278 / lon. 142.910879539...142.942503438
  Feat. 425: LAKE RACECOURSE (8357): lat. -35.618858643...-35.601778