# WOfS Validation Count <img align="right" src="../Supplementary_data/DE_Africa_Logo_Stacked_RGB_small.jpg">

* **Products used:** 
[ga_ls8c_wofs_2](https://explorer.digitalearth.africa/ga_ls8c_wofs_2),
[ga_ls8c_wofs_2_summary](https://explorer.digitalearth.africa/ga_ls8c_wofs_2_summary)

## Background
Accuracy assessment of the WOfS product in Africa involves generating a confusion matrix for a binary WOFL classification.
The inputs for estimating the accuracy of a WOfS-derived product are:
- a binary WOFL layer that classifies each pixel as water or non-water;
- a shapefile of validation points collected with the [Collect Earth Online](https://collect.earth/) tool.

The validation points provide the ground truth (actual values). The WOFL value extracted at each point location is the predicted value. The output of this analysis is a confusion matrix reporting overall, producer's and user's accuracy.
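The snippet below is a minimal sketch of how these three measures are read off a confusion matrix. It uses scikit-learn's `confusion_matrix` on two short, hypothetical label lists (1 = water, 0 = non-water), which are illustrative and not validation results. Following scikit-learn's convention, rows are actual classes and columns are predicted classes. Producer's accuracy is therefore the diagonal divided by the row totals, and user's accuracy is the diagonal divided by the column totals. The helper name `accuracy_summary` is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def accuracy_summary(actual, predicted, labels=(0, 1)):
    """Return the confusion matrix plus overall, producer's and user's accuracy.

    Uses scikit-learn's convention: rows = actual class, columns = predicted class.
    """
    cm = confusion_matrix(actual, predicted, labels=list(labels))
    correct = np.diag(cm)
    overall = correct.sum() / cm.sum()
    producers = correct / cm.sum(axis=1)  # per actual class (1 - omission error)
    users = correct / cm.sum(axis=0)      # per predicted class (1 - commission error)
    return cm, overall, dict(zip(labels, producers)), dict(zip(labels, users))


# Hypothetical labels: 1 = water, 0 = non-water
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

cm, oa, pa, ua = accuracy_summary(actual, predicted)
print(cm)
print(f"Overall accuracy: {oa:.2f}")
print("Producer's accuracy:", pa)
print("User's accuracy:", ua)
```

With these labels, 8 of 10 points are correct, so the overall accuracy is 0.80. The water class has a producer's and user's accuracy of 0.75. A class with no actual or no predicted points would yield `nan` (with a runtime warning), which this sketch does not guard against.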

## Description
This notebook explains how to perform an accuracy assessment for WOfS-derived products using a collected ground-truth dataset. It also counts the validation points available at each processing stage.

The notebook demonstrates how to:
1. Generate a confusion matrix for a binary WOFL classification
2. Assess the accuracy of the classification
***

## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.

After finishing the analysis, you can change the input file paths (for example, the AEZ shapefile or the institution's CSV files) and re-run the analysis to count validation points for a different zone or partner institution.

### Load packages
Import Python packages that are used for the analysis.

In [1]:
%matplotlib inline

import time 
import datacube
from datacube.utils import masking, geometry 
import sys
import os
import dask 
import rasterio, rasterio.features
import xarray
import glob
import numpy as np
import pandas as pd
import seaborn as sn
import geopandas as gpd
import subprocess as sp
import matplotlib.pyplot as plt
import scipy, scipy.ndimage
import warnings
warnings.filterwarnings("ignore") #this will suppress the warnings for multiple UTM zones in your AOI 

sys.path.append("../Scripts")
from rasterio.mask import mask
from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.metrics import ConfusionMatrixDisplay, f1_score  # plot_confusion_matrix was removed in scikit-learn 1.2
from deafrica_plotting import map_shapefile,display_map, rgb
from deafrica_spatialtools import xr_rasterize
from deafrica_datahandling import wofs_fuser, mostcommon_crs,load_ard,deepcopy
from deafrica_dask import create_local_dask_cluster

### Loading Dataset

Read in the validation data CSV, clean the table, and rename the columns holding the actual and predicted values.

We need two columns from this table:
- `WATERFLAG`: the water flag, used as the ground truth (actual)
- `CLASS_WET`: the WOfS wet classification (prediction)
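The snippet below is a minimal sketch of the renaming step. It assumes both columns are already coded as 0/1 and that points without an extracted WOfS value should be dropped. The target names `ACTUAL` and `PREDICTION` follow the inland AFRIGIST table later in this notebook. The rows and the helper `to_actual_prediction` are hypothetical.

```python
import pandas as pd

# Hypothetical rows: WATERFLAG is the ground truth, CLASS_WET the WOfS value (stored as float)
points = pd.DataFrame({
    "PLOT_ID": [1, 2, 3],
    "WATERFLAG": [1, 1, 0],
    "CLASS_WET": [1.0, 0.0, None],
})


def to_actual_prediction(df):
    """Rename the ground-truth and WOfS columns and keep only rows that have both values."""
    out = df.rename(columns={"WATERFLAG": "ACTUAL", "CLASS_WET": "PREDICTION"})
    # Drop points with no extracted WOfS value, then store both labels as integers
    return out.dropna(subset=["ACTUAL", "PREDICTION"]).astype({"ACTUAL": "int64", "PREDICTION": "int64"})


clean = to_actual_prediction(points)
print(clean)
```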

### Joining CEO tables and Counting the number of Points 

In [2]:
# Concatenate the per-institution CEO tables into a single continental table
ceo_files = glob.glob('../Supplementary_data/Validation/Refined/Continent/AEZ_count/CEO_*.csv')
Africa = pd.concat([pd.read_csv(f) for f in ceo_files])
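The same join can be wrapped in a small reusable helper. The sketch below is one option; the function name `concat_csvs` and the temporary files are hypothetical. It passes `ignore_index=True` so the combined table gets a fresh 0..n-1 index. The extra `Unnamed: 0...` columns in the outputs below most likely come from writing the index with `to_csv` and reading it back. Passing `index=False` to `to_csv` would avoid them.

```python
import glob
import os
import tempfile

import pandas as pd


def concat_csvs(pattern):
    """Read every CSV matching a glob pattern and stack them into one DataFrame."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files match {pattern!r}")
    # ignore_index=True avoids repeating each file's own 0..n-1 index
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)


# Usage with two hypothetical per-institution files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"PLOT_ID": [1, 1, 2]}).to_csv(os.path.join(tmp, "CEO_A.csv"), index=False)
    pd.DataFrame({"PLOT_ID": [3]}).to_csv(os.path.join(tmp, "CEO_B.csv"), index=False)
    combined = concat_csvs(os.path.join(tmp, "CEO_*.csv"))

print(len(combined), combined["PLOT_ID"].nunique())  # 4 3
```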

In [3]:
Africa.to_csv('../Supplementary_data/Validation/Refined/Continent/AEZ_count/Africa_ValidationPoints.csv')

In [46]:
# Keep one row per PLOT_ID; the number of rows equals the number of unique validation points
AfricaCount = Africa.groupby('PLOT_ID', as_index=False, sort=False).last()
#AfricaCount
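`GroupBy.last()` returns the last non-null value of each column within a group. The row it produces for a plot can therefore mix values from different CEO records of that plot. This does not affect the count, but it matters if the per-plot rows are used later. The sketch below shows the difference on a small hypothetical table. It compares `GroupBy.last()` with `drop_duplicates(keep="last")`, which keeps whole records, and with `nunique()`, which gives the count directly.

```python
import numpy as np
import pandas as pd

# Hypothetical CEO-style table with several rows per PLOT_ID
df = pd.DataFrame({
    "PLOT_ID": [1, 1, 2, 3, 3, 3],
    "MONTH": [1, 2, 5, 7, 8, 9],
    "COMMENT": ["a", np.nan, "b", "c", np.nan, np.nan],
})

per_plot = df.groupby("PLOT_ID", as_index=False, sort=False).last()  # last non-null per column
last_rows = df.drop_duplicates("PLOT_ID", keep="last")              # last complete record

print(df["PLOT_ID"].nunique(), len(per_plot), len(last_rows))  # 3 3 3
print(per_plot.loc[per_plot["PLOT_ID"] == 1, "COMMENT"].item())  # a
```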

### Joining WOfS-based Analysed Tables and Counting the number of Points 

In [63]:
# Concatenate the per-institution analysed tables (with extracted WOfS values) into a single continental table
analysed_files = glob.glob('../Supplementary_data/Validation/Refined/Continent/AEZ_count/Groundtruth_*.csv')
Analysed = pd.concat([pd.read_csv(f) for f in analysed_files])

In [64]:
Analysed.to_csv('../Supplementary_data/Validation/Refined/Continent/AEZ_count/Africa_AnalysedPoints.csv')

In [65]:
AnalysedCount = Analysed.groupby('PLOT_ID',as_index=False,sort=False).last()
AnalysedCount

| | PLOT_ID | Unnamed: 0 | LON | LAT | FLAGGED | ANALYSES | SENTINEL2Y | STARTDATE | ENDDATE | WATER | NO_WATER | BAD_IMAGE | NOT_SURE | CLASS | COMMENT | MONTH | WATERFLAG | geometry | CLASS_WET | CLEAR_OBS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137755802.0 | 5 | -0.260555 | 16.620471 | 0.0 | 1.0 | 2018 | 1/12/2018 | 5/12/2018 | 1351112 | 4910 | 2678 | 0 | I am unsure | | 5 | 1 | POINT (-25140.0000218165 2091479.999484228) | 1.0 | 2.0 |
| 1 | 137755803.0 | 17 | -14.230417 | 16.545471 | 0.0 | 1.0 | 2018 | 1/12/2018 | 5/12/2018 | 23 | 691011 | 1457812 | 0 | I am unsure | | 3 | 1 | POINT (-1373039.999854533 2082299.999785511) | 1.0 | 2.0 |
| 2 | 137755804.0 | 26 | -15.681815 | 16.479809 | 0.0 | 1.0 | 2018 | 1/12/2018 | 5/12/2018 | 2347 | 6 | 1589101112 | 0 | Wetlands - freshwater | | 3 | 1 | POINT (-1513079.999827301 2074259.999913892) | 1.0 | 2.0 |
| 3 | 137755805.0 | 38 | -0.115353 | 16.473684 | 0.0 | 1.0 | 2018 | 1/12/2018 | 5/12/2018 | 131112 | 4 | 2678910 | 5 | Wetlands - freshwater | | 3 | 1 | POINT (-11130.00002520235 2073510.000079819) | 1.0 | 2.0 |
| 4 | 137755806.0 | 50 | -3.217452 | 16.388211 | 0.0 | 1.0 | 2018 | 1/12/2018 | 5/12/2018 | 1112 | 45 | 12679108 | 3 | Wetlands - freshwater | | 11 | 1 | POINT (-310439.9999698089 2063039.999883486) | 1.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2539 | 137387432.0 | 8627 | 39.613404 | -9.090942 | 0.0 | 1.0 | 2018 | | | 1-4,6-12 | 0 | 0 | 5 | Open water - marine | | 12 | 1 | POINT (3822150.000035984 -1155029.999972825) | 1.0 | 1.0 |
| 2540 | 137387433.0 | 8636 | 35.464420 | -9.505899 | 0.0 | 1.0 | 2018 | | | 1,2,4-6,8,10,12 | 0 | 3911 | 7 | Open water - freshwater | | 12 | 1 | POINT (3421829.999860834 -1207290.000058969) | 1.0 | 4.0 |
| 2541 | 137387434.0 | 8651 | 33.947728 | -9.568113 | 0.0 | 1.0 | 2018 | | | 2-12 | 0 | 1 | 0 | Open water - freshwater | | 12 | 1 | POINT (3275489.999618933 -1215120.000045642) | 1.0 | 2.0 |
| 2542 | 137387435.0 | 8657 | 34.424376 | -9.843084 | 0.0 | 1.0 | 2018 | | | 0 | 1,2,7,8-11 | 345612 | 0 | Forest/woodlands | | 9 | 0 | POINT (3321479.999846864 -1249710.000053535) | 0.0 | 1.0 |


### Reading Continental Validation Points and Extracting the Count for the Desired AEZ

In [56]:
ValPoints = '../Supplementary_data/Validation/Refined/Continent/AEZ_count/Africa_ValidationPoints.csv'
df = pd.read_csv(ValPoints,delimiter=",")

In [57]:
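# Build WGS84 point geometries from the LON/LAT columns. The list is not named `geometry`
# so that it does not shadow the datacube.utils.geometry module imported above.
points = [Point(xy) for xy in zip(df.LON, df.LAT)]
AfricaValPoints = GeoDataFrame(df, crs='EPSG:4326', geometry=points)  # {'init': ...} CRS dicts are deprecated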

In [58]:
# Define the agro-ecological zone (AEZ) boundary
aez = '../Supplementary_data/Validation/Refined/Continent/shapefile/AEZs_simple_Central.shp'

In [59]:
outline = gpd.read_file(aez).to_crs('EPSG:4326')
outline

| | Zone | AreaSQKM | geometry |
|---|---|---|---|
| 0 | Central | 5296785.0 | MULTIPOLYGON (((11.77117 -16.79723, 11.76834 -... |


In [60]:
# Keep only the validation points that fall inside the Central AEZ polygon
Zone_points = gpd.clip(AfricaValPoints, outline)
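The snippet below is a self-contained sketch of what `gpd.clip` does in this step. A hypothetical box polygon stands in for the AEZ outline, with three hypothetical points. The points layer and the mask should share a CRS, which is why the outline above is reprojected to EPSG:4326 before clipping.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical zone polygon (a lon/lat box) standing in for an AEZ outline
zone = gpd.GeoDataFrame({"Zone": ["Example"]}, geometry=[box(10, -10, 30, 10)], crs="EPSG:4326")

points = gpd.GeoDataFrame(
    {"PLOT_ID": [1, 2, 3]},
    geometry=[Point(20, 0), Point(35, 0), Point(15, -5)],
    crs="EPSG:4326",
)

# Reproject the mask to the points' CRS (a no-op here), then keep the points inside it
inside = gpd.clip(points, zone.to_crs(points.crs))
print(sorted(inside["PLOT_ID"].tolist()))  # [1, 3]
```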

In [61]:
Zone_points.to_csv('../Supplementary_data/Validation/Refined/Continent/AEZ_count/ValidationPoints_Central.csv')

In [62]:
ZoneCount_org = Zone_points.groupby('PLOT_ID',as_index=False,sort=False).last()
ZoneCount_org

| | PLOT_ID | Unnamed: 0 | Unnamed: 0.1 | LON | LAT | FLAGGED | ANALYSES | SENTINEL2YEAR | STARTDATE | ENDDATE | WATER | NO_WATER | BAD_IMAGE | NOT_SURE | CLASS | COMMENT | MONTH | WATERFLAG | Unnamed: 0.1.1 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137387037.0 | 12 | 12 | 29.875854 | 2.178788 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 10 | 0 | Open water - freshwater | Point is within the river channel | 10 | 2 | 0.0 | POINT (29.87585 2.17879) |
| 1 | 137387038.0 | 25 | 25 | 27.272168 | 0.893874 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 10 | 0 | Open water - freshwater | Point is near the river bank | 10 | 2 | 1.0 | POINT (27.27217 0.89387) |
| 2 | 137387040.0 | 38 | 38 | 28.071867 | -3.271851 | 0.0 | 1.0 | 2018 | | | 13456712 | 8-11 | 2 | 2 | Open water - freshwater | No image for February | 2 | 3 | 3.0 | POINT (28.07187 -3.27185) |
| 3 | 137387041.0 | 50 | 50 | 28.708019 | -5.872515 | 0.0 | 1.0 | 2018 | | | 3-9 | 0 | 0 | 1-2,10-12 | Open water - freshwater | It is difficult to know the month the water wa... | 12 | 3 | 4.0 | POINT (28.70802 -5.87252) |
| 4 | 137387042.0 | 63 | 63 | 29.475693 | -6.510600 | 0.0 | 1.0 | 2018 | | | 25678910 | 0 | 4 | 1341112 | Open water - marine | No Images for the month of January, March, Nov... | 12 | 3 | 5.0 | POINT (29.47569 -6.51060) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 426 | 137711862.0 | 2744 | 231 | 14.238812 | -17.359269 | 0.0 | 1.0 | 2018 | 1/12/2018 | 5/12/2018 | 136781112 | 25 | 4910 | 0 | Barren (Bare soil / Rocky Land) | | 10 | 2 | | POINT (14.23881 -17.35927) |
| 427 | 137711863.0 | 2756 | 232 | 18.526157 | -17.518309 | 0.0 | 1.0 | 2018 | 1/01/2018 | 5/01/2018 | 145679101112 | 0 | 238 | 0 | Open water - freshwater | | 8 | 2 | | POINT (18.52616 -17.51831) |
| 428 | 137711864.0 | 2768 | 233 | 19.172259 | -17.811881 | 0.0 | 1.0 | 2018 | 1/10/2018 | 5/10/2018 | 145679101112 | 0 | 238 | 0 | Open water - freshwater | | 8 | 2 | | POINT (19.17226 -17.81188) |
| 429 | 137711865.0 | 2780 | 234 | 19.847278 | -17.880227 | 0.0 | 1.0 | 2018 | 1/05/2018 | 5/05/2018 | 125678910 | 0 | 341112 | 0 | Open water - freshwater | | 12 | 2 | | POINT (19.84728 -17.88023) |
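The clip above extracts one AEZ at a time. If an AEZ layer held several zones, a spatial join could count the unique plots in every zone in one pass. The sketch below assumes a `Zone` column like the one in the Central outline and uses hypothetical geometries. The `predicate` keyword of `gpd.sjoin` needs geopandas 0.10 or later; older versions used `op`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical two-zone layer and points (PLOT_ID 1 appears twice, as in the CEO tables)
zones = gpd.GeoDataFrame(
    {"Zone": ["West", "East"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    {"PLOT_ID": [1, 1, 2, 3]},
    geometry=[Point(2, 2), Point(2, 2), Point(5, 5), Point(15, 5)],
    crs="EPSG:4326",
)

# Attach the zone name to each point, then count unique plots per zone
joined = gpd.sjoin(points, zones, how="inner", predicate="within")
counts = joined.groupby("Zone")["PLOT_ID"].nunique()
print(counts.to_dict())
```

Here the West zone contains two unique plots and the East zone one.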


### Institution-based Count 

In [34]:
# Read the AFRIGIST ground-truth data produced by the cleaning step
CEO = '../Supplementary_data/Validation/Refined/AFRIGIST/CEO_AFRIGIST_2020-09-15.csv'
df1 = pd.read_csv(CEO)

In [35]:
CEOCount= df1.groupby('PLOT_ID',as_index=False,sort=False).last()
CEOCount

| | PLOT_ID | Unnamed: 0 | LON | LAT | FLAGGED | ANALYSES | SENTINEL2YEAR | STARTDATE | ENDDATE | WATER | NO_WATER | BAD_IMAGE | NOT_SURE | CLASS | COMMENT | MONTH | WATERFLAG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137483175.0 | 12 | 30.463813 | -26.653807 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 2 | 0 | Open water - freshwater | | 2 | 2 |
| 1 | 137483176.0 | 24 | 30.026031 | -26.673227 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 0 | 0 | Open water - Constructed (e.g. aquaculture) | | 12 | 1 |
| 2 | 137483177.0 | 38 | 31.700362 | -26.746737 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 12 | 0 | Open water - freshwater | | 2 | 2 |
| 3 | 137483178.0 | 53 | 31.937287 | -26.801901 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 2712 | 0 | Open water - freshwater | | 12 | 2 |
| 4 | 137483179.0 | 65 | 27.339949 | -26.863925 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 0 | 0 | Open water - freshwater | | 12 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 612 | 137482800.0 | 2571 | 13.848290 | -8.953839 | 0.0 | 1.0 | 2018 | 1/08/2018 | 5/08/2018 | 8 | 1-7,9-12 | 1-7,9-12 | 0 | Open water - freshwater | na | 12 | 2 |
| 613 | 137482801.0 | 2593 | 13.023406 | -9.025955 | 0.0 | 1.0 | 2018 | 1/09/2018 | 5/09/2018 | 511 | 1-4,6-10,12 | 1-4,6-10,12 | 0 | Wetlands - marine (e.g. mangroves) | na | 12 | 2 |
| 614 | 137482802.0 | 2616 | 13.607012 | -9.155465 | 0.0 | 1.0 | 2018 | 1/09/2018 | 5/09/2018 | 8 | 1-7,9-12 | 1-7,9-12 | 0 | Open water - freshwater | na | 12 | 2 |
| 615 | 137482803.0 | 2638 | 20.258942 | -9.360062 | 0.0 | 1.0 | 2018 | 1/09/2018 | 5/09/2018 | 1-12 | 0 | 1-4,6-8,10-12 | 0 | Open water - freshwater | | 12 | 2 |


In [36]:
# Read the AFRIGIST ground-truth data produced by the analysis step (with the extracted WOfS values)
WOfSSample = '../Supplementary_data/Validation/Refined/AFRIGIST/Groundtruth_AfriGIST_PointBased_5D.csv'
df2 = pd.read_csv(WOfSSample)

In [37]:
SampleCount= df2.groupby('PLOT_ID',as_index=False,sort=False).last()
SampleCount 

| | PLOT_ID | Unnamed: 0 | LON | LAT | FLAGGED | ANALYSES | SENTINEL2Y | STARTDATE | ENDDATE | WATER | NO_WATER | BAD_IMAGE | NOT_SURE | CLASS | COMMENT | MONTH | WATERFLAG | geometry | CLASS_WET | CLEAR_OBS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137483175.0 | 12 | 30.463813 | -26.653807 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 2 | 0 | Open water - freshwater | | 12 | 1 | POINT (2939339.999593767 -3281940.000625274) | 1.0 | 1.0 |
| 1 | 137483176.0 | 24 | 30.026031 | -26.673227 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 0 | 0 | Open water - Constructed (e.g. aquaculture) | | 12 | 1 | POINT (2897100.000399006 -3284159.999907182) | 0.0 | 2.0 |
| 2 | 137483177.0 | 36 | 31.700362 | -26.746737 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 12 | 0 | Open water - freshwater | | 10 | 1 | POINT (3058650.000408517 -3292560.000199907) | 1.0 | 2.0 |
| 3 | 137483178.0 | 50 | 31.937287 | -26.801901 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 2712 | 0 | Open water - freshwater | | 10 | 1 | POINT (3081509.999813744 -3298860.000586511) | 0.0 | 2.0 |
| 4 | 137483179.0 | 65 | 27.339949 | -26.863925 | 0.0 | 1.0 | 2018 | | | 1-12 | 0 | 0 | 0 | Open water - freshwater | | 12 | 1 | POINT (2637929.999591611 -3305939.999803136) | 1.0 | 3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 594 | 137482800.0 | 11166 | 13.848290 | -8.953839 | 0.0 | 1.0 | 2018 | 1/08/2018 | 5/08/2018 | 8 | 1-7,9-12 | 1-7,9-12 | 0 | Open water - freshwater | na | 9 | 0 | POINT (1336169.999584316 -1137749.999945619) | 1.0 | 1.0 |
| 595 | 137482801.0 | 11189 | 13.023406 | -9.025955 | 0.0 | 1.0 | 2018 | 1/09/2018 | 5/09/2018 | 511 | 1-4,6-10,12 | 1-4,6-10,12 | 0 | Wetlands - marine (e.g. mangroves) | na | 9 | 0 | POINT (1256580.000172345 -1146840.00002571) | 1.0 | 2.0 |
| 596 | 137482802.0 | 11211 | 13.607012 | -9.155465 | 0.0 | 1.0 | 2018 | 1/09/2018 | 5/09/2018 | 8 | 1-7,9-12 | 1-7,9-12 | 0 | Open water - freshwater | na | 9 | 0 | POINT (1312890.00022547 -1163160.00004016) | 1.0 | 1.0 |
| 597 | 137482803.0 | 11240 | 20.258942 | -9.360062 | 0.0 | 1.0 | 2018 | 1/09/2018 | 5/09/2018 | 1-12 | 0 | 1-4,6-8,10-12 | 0 | Open water - freshwater | | 12 | 1 | POINT (1954709.999782347 -1188930.000040025) | 1.0 | 2.0 |
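The cleaned AFRIGIST table has 616 unique plots (index 0 to 615) and the analysed table has 598 (index 0 to 597), so some points are lost between the two steps. The snippet below is a minimal sketch for listing the plot IDs that went missing. The helper `dropped_plots` and the stage tables are hypothetical. In this notebook, the same call could be applied to `CEOCount` and `SampleCount`.

```python
import pandas as pd


def dropped_plots(before, after, key="PLOT_ID"):
    """Return the sorted plot IDs present in `before` but missing from `after`."""
    return sorted(set(before[key].tolist()) - set(after[key].tolist()))


# Hypothetical stage tables
cleaned = pd.DataFrame({"PLOT_ID": [101, 102, 103, 104]})
analysed = pd.DataFrame({"PLOT_ID": [101, 103, 104]})

print(dropped_plots(cleaned, analysed))  # [102]
```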


In [38]:
# Read the AFRIGIST ground-truth data after removing coastal points with an inland buffer
Inland = '../Supplementary_data/Validation/Refined/AFRIGIST/AfriGIST_inland_5D.csv'
df3 = pd.read_csv(Inland)

In [39]:
InlandCount= df3.groupby('PLOT_ID',as_index=False,sort=False).last()
InlandCount 

| | PLOT_ID | Unnamed: 0 | LON | LAT | CLASS | MONTH | ACTUAL | CLASS_WET | CLEAR_OBS | PREDICTION | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 137483175.0 | 6 | 30.463813 | -26.653807 | Open water - freshwater | 12 | 1 | 1.0 | 1.0 | 1 | POINT (30.46381301 -26.65380659) |
| 1 | 137483176.0 | 14 | 30.026031 | -26.673227 | Open water - Constructed (e.g. aquaculture) | 12 | 1 | 0.0 | 2.0 | 0 | POINT (30.02603057 -26.67322664) |
| 2 | 137483177.0 | 19 | 31.700362 | -26.746737 | Open water - freshwater | 10 | 1 | 1.0 | 2.0 | 1 | POINT (31.70036188 -26.74673726) |
| 3 | 137483178.0 | 24 | 31.937287 | -26.801901 | Open water - freshwater | 10 | 1 | 0.0 | 2.0 | 0 | POINT (31.93728675 -26.80190076) |
| 4 | 137483179.0 | 32 | 27.339949 | -26.863925 | Open water - freshwater | 12 | 1 | 1.0 | 3.0 | 1 | POINT (27.33994919 -26.86392537) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 532 | 137482799.0 | 2758 | 13.254423 | -8.829394 | Urban (Settlements/ roads) | 9 | 0 | 0.0 | 1.0 | 0 | POINT (13.25442329 -8.82939446) |
| 533 | 137482800.0 | 2762 | 13.848290 | -8.953839 | Open water - freshwater | 9 | 0 | 1.0 | 1.0 | 1 | POINT (13.8482901 -8.953838786) |
| 534 | 137482802.0 | 2766 | 13.607012 | -9.155465 | Open water - freshwater | 9 | 0 | 1.0 | 1.0 | 1 | POINT (13.60701228 -9.155465) |
| 535 | 137482803.0 | 2772 | 20.258942 | -9.360062 | Open water - freshwater | 12 | 1 | 1.0 | 2.0 | 1 | POINT (20.25894246 -9.360061605) |
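The inland table (536 plots) already carries `ACTUAL` and `PREDICTION` columns, so the imported `accuracy_score` and `f1_score` can be applied to it directly. The snippet below is a minimal sketch on hypothetical rows with the same column names, treating 1 as the water class. It also uses `pd.crosstab(..., margins=True)` to produce a labelled confusion matrix with row and column totals (rows = actual, columns = predicted).

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical rows with the same ACTUAL / PREDICTION columns as the inland table
inland = pd.DataFrame({
    "ACTUAL": [1, 1, 1, 0, 0, 0],
    "PREDICTION": [1, 0, 1, 1, 0, 0],
})

# Labelled confusion matrix with totals in the "All" row and column
table = pd.crosstab(inland["ACTUAL"], inland["PREDICTION"], margins=True)
print(table)

overall = accuracy_score(inland["ACTUAL"], inland["PREDICTION"])
f1_water = f1_score(inland["ACTUAL"], inland["PREDICTION"], pos_label=1)
print(f"Overall accuracy: {overall:.3f}, F1 (water): {f1_water:.3f}")  # Overall accuracy: 0.667, F1 (water): 0.667
```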


In [2]:
print(datacube.__version__)

1.8.2.dev7+gdcab0e02


***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Last modified:** January 2020

**Compatible datacube version:** 1.8.2.dev7+gdcab0e02

## Tags
Browse all available tags on the DE Africa User Guide's [Tags Index](https://) (placeholder as this does not exist yet)