# WOfS Summary Validation Data Analysis - Point-based and Parallel  <img align="right" src="../Supplementary_data/DE_Africa_Logo_Stacked_RGB_small.jpg">

* **Products used:** 
[ga_ls8c_wofs_2](https://explorer.digitalearth.africa/ga_ls8c_wofs_2),
[ga_ls8c_wofs_2_summary](https://explorer.digitalearth.africa/ga_ls8c_wofs_2_summary)

## Background
[Water Observations from Space (WOfS)](https://www.ga.gov.au/scientific-topics/community-safety/flood/wofs/about-wofs) is a product derived from Landsat 8 satellite observations, built on the provisional Landsat 8 Collection 2 surface reflectance, that shows surface water detected across Africa.
Individual water classified images are called Water Observation Feature Layers (WOFLs), and are created in a 1-to-1 relationship with the input satellite data. 
Hence there is one WOFL for each satellite dataset processed for the occurrence of water.

The data in a WOFL is stored as a bit field. This is a binary number, where each digit of the number is independently set or not based on the presence (1) or absence (0) of a particular attribute (water, cloud, cloud shadow etc). In this way, the single decimal value associated with each pixel can provide information on a variety of features of that pixel. 
For more information on the structure of WOFLs and how to interact with them, see the [Water Observations from Space](../Datasets/Water_Observations_from_Space.ipynb) and [Applying WOfS bitmasking](../Frequently_used_code/Applying_WOfS_bitmasking.ipynb) notebooks.
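
As a minimal sketch of how a bit field can be decoded, the example below tests individual bits of a pixel value. The flag names and bit positions in `FLAG_BITS`, and the helper `decode_flags`, are assumptions made for illustration only; the real WOFL bit layout is described in the notebooks linked above.

```python
# Hypothetical flag layout, for illustration only; check the product
# documentation for the real bit positions.
FLAG_BITS = {"cloud_shadow": 5, "cloud": 6, "water": 7}


def decode_flags(value, flag_bits=FLAG_BITS):
    """Return a dict saying which flags are set in one pixel value."""
    return {name: bool((value >> bit) & 1) for name, bit in flag_bits.items()}


# Four example pixel values: nothing set, water, water + cloud, cloud shadow
pixel_values = [0, 128, 192, 32]

# True wherever the (assumed) water bit is set
water_mask = [bool(v & (1 << FLAG_BITS["water"])) for v in pixel_values]
```

Because each attribute occupies its own bit, a single value such as 192 can record two attributes at once (here, the assumed cloud and water bits).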

## Description
This notebook explains how to validate the WOfS annual summary product against a collected ground truth dataset using point-based sampling. 

The notebook demonstrates how to:

1. Load the cleaned validation points, either as a *.csv file or as an ESRI shapefile
2. Query the WOfS annual summary product at the validation points and capture the frequency value available for each point 
***

## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.

After finishing the analysis, you can modify some values in the "Analysis parameters" cell and re-run the analysis to load WOFLs for a different location or time period.

### Load packages
Import Python packages that are used for the analysis.

In [1]:
%matplotlib inline

import time 
import datacube
from datacube.utils import masking, geometry 
import sys
import os
import dask 
import rasterio, rasterio.features
import xarray as xr
import glob
import numpy as np
import pandas as pd
import seaborn as sn
import geopandas as gpd
import subprocess as sp
import matplotlib.pyplot as plt
import scipy, scipy.ndimage
import warnings
warnings.filterwarnings("ignore") #this will suppress the warnings for multiple UTM zones in your AOI 

sys.path.append("../Scripts")
from rasterio.mask import mask
from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
from deafrica_plotting import map_shapefile,display_map, rgb
from deafrica_spatialtools import xr_rasterize
from deafrica_datahandling import wofs_fuser, mostcommon_crs, load_ard
from copy import deepcopy
from deafrica_dask import create_local_dask_cluster

#for parallelisation 
from multiprocessing import Pool, Manager
import multiprocessing as mp
from tqdm import tqdm

### Analysis parameters

To analyse validation points collected by each partner institution, we need to obtain WOfS surface water observation data that corresponds with the labelled input data locations. 

### Loading Dataset

1. Load the validation points for each partner institution as a list of observations, each with a location and a month
    * Load the cleaned validation file as *.csv or as an ESRI *.shp
2. If you are loading a shapefile, skip the following three cells and go to the next section

In [2]:
path2csv = '../Supplementary_data/Validation/Refined/Continent/AfriGIST_inland_5D.csv'
df = pd.read_csv(path2csv,delimiter=",")

In [3]:
#use a separate name so the imported datacube `geometry` module is not overwritten
point_geometry = [Point(xy) for xy in zip(df.LON, df.LAT)]
crs = 'EPSG:4326'
ValPoints = GeoDataFrame(df, crs=crs, geometry=point_geometry)

In [4]:
ValPoints.to_file(filename='../Supplementary_data/Validation/Refined/Continent/AfriGIST_inland_5D.shp') 

### Sample WOfS annual summary product at the ground truth coordinates using point locations 
To load WOfS data, we first create a reusable query that defines the time period of interest, as well as the other parameters needed to load the data correctly. 
As WOFLs are created scene-by-scene, and some scenes overlap, it's important when loading data to `group_by` solar day.

In [2]:
#generate query object 
query ={'resolution':(-30, 30),
        'group_by':'solar_day',
        'output_crs':'EPSG:6933',
        'time':'2018'}

In [3]:
path2shp = '../Supplementary_data/Validation/Refined/Continent/AfriGIST_inland_5D.shp'

In [4]:
input_data = gpd.read_file(path2shp).to_crs('epsg:6933') #reading the table and converting CRS to metric 
#input_data.columns

Index(['Unnamed_ 0', 'PLOT_ID', 'LON', 'LAT', 'CLASS', 'MONTH', 'ACTUAL',
       'CLASS_WET', 'CLEAR_OBS', 'PREDICTION', 'geometry'],
      dtype='object')

In [5]:
input_data= input_data.drop(['Unnamed_ 0'], axis=1)

In [6]:
#input_data.loc[input_data['PLOT_ID'] == 137483175]

In [7]:
Summarize = input_data.groupby('PLOT_ID',as_index=False).last()

In [8]:
coords = [(x,y) for x, y in zip(Summarize.geometry.x, Summarize.geometry.y)]

In [9]:
#Summarize

In [10]:
#function to sample the WOfS annual summary frequency at a single validation point 
def get_WS_for_point(index, row, input_data, query, results_freq):
    #each worker process opens its own datacube connection
    dc = datacube.Datacube(app='WOfS_accuracy')
    #assumes input_data has a default integer index, so label and position agree
    plot_id = input_data.loc[index]['PLOT_ID']
    #copy the shared query so the original is left unchanged 
    dc_query = deepcopy(query) 
    geom = geometry.Geometry(input_data.geometry.values[index].__geo_interface__, 
                             geometry.CRS('EPSG:6933'))
    #add the point location to the copied query
    dc_query.update({"geopolygon":geom})
    
    ds = dc.load(product ="ga_ls8c_wofs_2_annual_summary",
                 measurements = ['frequency'],
                 **dc_query)

    #an empty result has no 'frequency' variable, so that point is skipped
    if 'frequency' in ds:
        freq = ds.frequency.values.squeeze()
        results_freq.update({str(int(plot_id)): float(freq)})
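
The function above copies the shared query before adding the point location, so one point's geometry never leaks into the query used for the next point. A minimal sketch of that pattern is shown below; the helper `build_point_query` and the `"point"` key are hypothetical names used only for illustration, and no datacube call is made.

```python
from copy import deepcopy

# Shared settings, mirroring the kind of query built in this notebook
base_query = {"resolution": (-30, 30), "output_crs": "EPSG:6933", "time": "2018"}


def build_point_query(base, point_xy):
    """Return a per-point copy of the base query (hypothetical helper)."""
    point_query = deepcopy(base)      # the base query is left untouched
    point_query["point"] = point_xy   # hypothetical key, for illustration
    return point_query


q1 = build_point_query(base_query, (1715730.0, 992460.0))
q2 = build_point_query(base_query, (1735080.0, 948570.0))
```

After both calls, `base_query` still holds only the shared settings, while `q1` and `q2` each carry their own location.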

### Parallel Processing 

In [11]:
def _parallel_fun(input_data, query, ncpus):
    
    manager = mp.Manager()
    results_freq= manager.dict()
    
    # progress bar
    pbar = tqdm(total=len(input_data))
        
    def update(*a):
        pbar.update()

    with mp.Pool(ncpus) as pool:
        for index, row in input_data.iterrows():
            pool.apply_async(get_WS_for_point,
                                 [index,
                                 row,
                                 input_data,
                                 query,
                                 results_freq], callback=update)
        pool.close()
        pool.join()
        pbar.close()
        
    return results_freq
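
One caveat with `apply_async` is that an exception raised inside a worker is silently dropped unless the result is inspected or an `error_callback` is supplied. The sketch below shows the same submit-and-callback pattern with an added `error_callback`. It is an illustration under stated assumptions, not the notebook's method: it uses `multiprocessing.pool.ThreadPool`, which offers the same `apply_async` interface as `Pool`, so that the example runs anywhere, and `sample_point` is a hypothetical stand-in for the per-point query.

```python
from multiprocessing.pool import ThreadPool


def sample_point(plot_id):
    """Hypothetical stand-in for a per-point query; fails for negative ids."""
    if plot_id < 0:
        raise ValueError(f"invalid plot id: {plot_id}")
    return plot_id, plot_id / 10


def run_all(plot_ids, nworkers=4):
    results, errors = {}, []

    def on_success(result):
        # Called with the return value of each task that finished normally
        key, value = result
        results[key] = value

    def on_error(exc):
        # Called with the exception of each task that failed
        errors.append(exc)

    with ThreadPool(nworkers) as pool:
        for plot_id in plot_ids:
            pool.apply_async(sample_point, (plot_id,),
                             callback=on_success, error_callback=on_error)
        pool.close()
        pool.join()  # wait until every task and callback has finished
    return results, errors


results, errors = run_all([1, 2, -3, 4])
```

Collecting failures in a list makes it possible to report which points could not be sampled instead of ending up with missing values and no explanation.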

### Testing For-Loop 

In [12]:
# results_freq_test = dict()

# for index, row in Summarize.iterrows():
#     get_WS_for_point(index, row, input_data, query, results_freq_test)

In [14]:
#print(results_freq_test)

### Running in parallel

In [16]:
# %%time
WOfS_annual_frequency= _parallel_fun(Summarize, query, ncpus=15)

100%|██████████| 537/537 [00:10<00:00, 52.40it/s]


In [17]:
#build a table of the sampled frequencies, one row per plot
f = pd.DataFrame.from_dict(WOfS_annual_frequency, orient = 'index')
f = f.rename(columns={0:'Frequency'})
#the dictionary keys are integer strings (e.g. '137482620'); append '.0' so they
#match the string form of the float PLOT_ID column in Summarize
f['PLOT_ID'] = [str(key) + '.0' for key in f.index]
f = f.reset_index(drop=True)
Summarize['PLOT_ID'] = Summarize.PLOT_ID.astype(str)
final_df = pd.merge(Summarize, f, on=['PLOT_ID'], how='outer')
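
The cell above joins the two tables on a string key, which works only if both sides format the plot ID identically. As an alternative sketch, the keys can be converted to integers on both sides before merging. The small tables below are illustrative, and the use of `how="left"` (keep every validation point, leaving `NaN` where no frequency was sampled) is an assumption that differs from the outer join used in this notebook.

```python
import pandas as pd

# Illustrative validation points with float plot IDs, as read from the shapefile
points = pd.DataFrame({
    "PLOT_ID": [137482620.0, 137482621.0, 137482622.0],
    "CLASS": ["Open water - freshwater"] * 3,
})

# Illustrative sampled frequencies keyed by integer strings
freq_by_plot = {"137482620": 0.636364, "137482622": 0.0}

freq = pd.DataFrame({
    "PLOT_ID": [int(key) for key in freq_by_plot],
    "Frequency": list(freq_by_plot.values()),
})

# Use the same integer type on both sides so the keys compare equal
points["PLOT_ID"] = points["PLOT_ID"].astype("int64")
merged = points.merge(freq, on="PLOT_ID", how="left")
```

Points without a sampled frequency can then be found with `merged["Frequency"].isna()`.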

In [18]:
final_df

| | PLOT_ID | LON | LAT | CLASS | MONTH | ACTUAL | CLASS_WET | CLEAR_OBS | PREDICTION | geometry | Frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137482620.0 | 17.782114 | 7.802986 | Open water - freshwater | 12 | 1 | 1.0 | 2.0 | 1 | POINT (1715730.000 992460.000) | 0.636364 |
| 1 | 137482621.0 | 17.982660 | 7.455957 | Open water - freshwater | 12 | 1 | 1.0 | 1.0 | 1 | POINT (1735080.000 948570.000) | 0.636364 |
| 2 | 137482622.0 | 24.357867 | 6.961847 | Open water - freshwater | 12 | 1 | 0.0 | 1.0 | 0 | POINT (2350200.000 886020.000) | 0.000000 |
| 3 | 137482624.0 | 17.091860 | 6.464220 | Open water - freshwater | 12 | 1 | 1.0 | 2.0 | 1 | POINT (1649130.000 822960.000) | 0.785714 |
| 4 | 137482625.0 | 12.917691 | 5.840607 | Open water - freshwater | 12 | 1 | 1.0 | 1.0 | 1 | POINT (1246380.000 743850.000) | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 532 | 137483525.0 | 11.249268 | 6.156226 | Open water - freshwater | 12 | 1 | 1.0 | 1.0 | 1 | POINT (1085400.000 783900.000) | 0.900000 |
| 533 | 137483526.0 | 10.573317 | 5.746073 | Open water - freshwater | 12 | 1 | 1.0 | 1.0 | 1 | POINT (1020180.000 731850.000) | 1.000000 |
| 534 | 137483527.0 | 8.750363 | 4.708335 | Open water - freshwater | 12 | 1 | 0.0 | 1.0 | 0 | POINT (844290.000 600000.000) | 0.714286 |
| 535 | 137483528.0 | 9.411390 | 4.659262 | Open water - freshwater | 2 | 1 | 1.0 | 1.0 | 1 | POINT (908070.000 593760.000) | 1.000000 |


In [19]:
final_df.shape

(537, 11)

In [21]:
final_df.to_csv('../Supplementary_data/Validation/Refined/Continent/Annual_summary_Test.csv')

In [19]:
# dc = datacube.Datacube(app='WOfS_accuracy')

In [20]:
# for index, row in final_df[0:20].iterrows():
#     dc_query = deepcopy(query)
#     geom = geometry.Geometry(Summarize.geometry.values[index].__geo_interface__,
#                              geometry.CRS('epsg:6933'))
#     q = {"geopolygon":geom}
#     dc_query.update(q)
#     wofs = dc.load(product ="ga_ls8c_wofs_2_annual_summary", fuse_func=wofs_fuser,**dc_query).squeeze() 
#     Summarize.at[index,'Annual_Summary'] = wofs.frequency.values 

In [None]:
print(datacube.__version__)

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Last modified:** September 2020

**Compatible datacube version:** 

## Tags
Browse all available tags on the DE Africa User Guide's [Tags Index](https://) (placeholder as this does not exist yet)