# Accuracy Assessment of Water Observations from Space (WOfS) Product in Africa<img align="right" src="../Supplementary_data/DE_Africa_Logo_Stacked_RGB_small.jpg">


## Description
Now that we have run the WOfS classification for each AEZ in Africa, it's time to conduct an accuracy assessment. The data used for assessing the accuracy was collected previously and set aside. It is stored in the Results folder: `Results/WOfS_Assessment/Point_Based/Intermediate_Per_AEZ`.

The accuracy assessment for the WOfS product in Africa consists of generating a confusion (error) matrix for the binary WOFL classification.
The inputs for estimating the accuracy of the WOfS-derived product are a binary WOFL classification layer showing water/non-water and a shapefile containing validation points collected with the [Collect Earth Online](https://collect.earth/) tool. The validation points are the ground truth (actual) data, while the value extracted from the WOFL at each location is the predicted value.

This notebook explains how to perform an accuracy assessment for WOfS using the collected ground truth dataset. It outputs a confusion (error) matrix containing the overall, producer's and user's accuracy, along with the F1 score for each class.


## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.

### Load packages
Import Python packages that are used for the analysis.

In [1]:
%matplotlib inline

import sys
import os
import rasterio
import xarray
import glob
import numpy as np
import pandas as pd
import seaborn as sn
import geopandas as gpd
import matplotlib.pyplot as plt
import scipy, scipy.ndimage
import warnings
warnings.filterwarnings("ignore") #this will suppress the warnings for multiple UTM zones in your AOI 

sys.path.append("../Scripts")
from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Point
from sklearn.metrics import confusion_matrix, accuracy_score 
from sklearn.metrics import f1_score  #plot_confusion_matrix was removed in scikit-learn 1.2 and is not used in this notebook
from deafrica_plotting import map_shapefile,display_map, rgb
from deafrica_spatialtools import xr_rasterize
from deafrica_datahandling import wofs_fuser, mostcommon_crs,load_ard,deepcopy
from deafrica_dask import create_local_dask_cluster

### Analysis Parameters 

- CEO : path to the ground truth points file, containing the WOfS classes, the WOfS clear observations and the label assigned by the analyst for each calendar month 
- input_data : dataframe used for further analysis and accuracy assessment 

### Load the datasets
Ground truth points 

In [2]:
#Read the ground truth data 
#For each AEZ
CEO = '../Results/WOfS_Assessment/Point_Based/Intermediate_Per_AEZ/ValidationPoints_Central.csv'
df = pd.read_csv(CEO,delimiter=",")

In [3]:
df.columns

In [4]:
#Removing unnecessary columns 
#errors='ignore' skips any listed column that is not present in your data 
input_data = df.drop(['Unnamed: 0', 'Unnamed: 0.1','Unnamed: 0.1.1','Unnamed__1','FLAGGED', 'ANALYSES','SENTINEL2Y','STARTDATE', 'ENDDATE', 'WATER', 'NO_WATER', 'BAD_IMAGE', 'NOT_SURE','COMMENT','geometry'], axis=1, errors='ignore')
input_data = input_data.rename(columns={'WATERFLAG':'ACTUAL'})

In [5]:
#Counting the number of validation points in AEZ 
countpoints = input_data.groupby('PLOT_ID',as_index=False,sort=False).last()
countpoints

In [6]:
#Setting PREDICTION to '1' where CLASS_WET is 1 or more, otherwise '0'  
input_data['PREDICTION'] = input_data['CLASS_WET'].apply(lambda x: '1' if x >=1 else '0')  
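
The thresholding in the cell above can be sketched on a few made-up values. The numbers below are illustrative only, and the toy dataframe `toy` is a hypothetical stand-in for `input_data`.

```python
import pandas as pd

# Hypothetical CLASS_WET values for four point/month rows (not real data).
toy = pd.DataFrame({"CLASS_WET": [0, 1, 3, 0]})

# Same rule as the notebook cell: 1 or more becomes '1', anything else '0'.
toy["PREDICTION"] = toy["CLASS_WET"].apply(lambda x: "1" if x >= 1 else "0")
print(toy)
```

Note that the labels are stored as strings at this stage, which is why the confusion matrix columns are later addressed as `'0'` and `'1'`.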

In [7]:
#Remove points that were labelled more than once for the same month (e.g. as 0, 1, 2 or 3); all copies are dropped
Duplicate = input_data.duplicated(['LAT', 'LON','MONTH'], keep=False)
input_data = input_data[~Duplicate]
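
As a small sketch of what `keep=False` does here, assuming a toy table with made-up coordinates and labels: every row in a duplicated (LAT, LON, MONTH) group is flagged, so conflicting labels for the same point and month are all discarded rather than one of them being kept.

```python
import pandas as pd

# Hypothetical rows: the first point was labelled twice for month 1.
toy = pd.DataFrame({
    "LAT":    [1.0, 1.0, 1.0, 2.0],
    "LON":    [30.0, 30.0, 30.0, 31.0],
    "MONTH":  [1, 1, 2, 1],
    "ACTUAL": [0, 1, 1, 0],
})

# keep=False marks every member of a duplicated group, not only the repeats.
is_dup = toy.duplicated(["LAT", "LON", "MONTH"], keep=False)
kept = toy[~is_dup]
print(kept)
```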

In [8]:
#Counting the number of validation points remaining in the AEZ after removing duplicates  
countduplicate = input_data.groupby('PLOT_ID',as_index=False,sort=False).last()
countduplicate

In [9]:
#Filter out rows whose label is greater than 1 or that have no clear WOfS/SCL observations  
indexNames = input_data[(input_data['ACTUAL'] > 1) | (input_data['CLEAR_OBS']==0.0) | (input_data['CLEAR_OBS'].isna())].index
input_data.drop(indexNames, inplace=True)
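
The filter above can be illustrated on a toy table. This is a sketch with invented values; it builds the same three-part condition as a boolean mask and keeps only the rows that pass.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: labels above 1 and rows without clear observations are invalid.
toy = pd.DataFrame({
    "ACTUAL":    [0, 1, 2, 1, 0],
    "CLEAR_OBS": [3.0, 1.0, 2.0, 0.0, np.nan],
})

# Same condition as the notebook cell, expressed as a mask.
invalid = (toy["ACTUAL"] > 1) | (toy["CLEAR_OBS"] == 0.0) | (toy["CLEAR_OBS"].isna())
valid = toy[~invalid]
print(valid)
```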

In [10]:
#Counting the number of remaining points 
countfinal = input_data.groupby('PLOT_ID',as_index=False,sort=False).last()
countfinal

To save the table of valid points, run the following cell. Otherwise, skip to the next cell.  

In [11]:
input_data.to_csv('../Results/WOfS_Assessment/Point_Based/ValidPoints_Per_AEZ/ValidationPoints_Central.csv')

### Create a Confusion Matrix 

In [12]:
confusion_matrix = pd.crosstab(input_data['ACTUAL'],input_data['PREDICTION'],rownames=['ACTUAL'],colnames=['PREDICTION'],margins=True)
confusion_matrix

### Calculate Producer's and User's Accuracy 

`Producer's Accuracy` is the map-maker's accuracy: the probability that a given class on the ground is correctly classified on the map. It is the complement of the omission error (producer's accuracy = 100% - omission error). 

In [13]:
confusion_matrix["Producer's"] = [confusion_matrix.loc[0][0] / confusion_matrix.loc[0]['All'] * 100, confusion_matrix.loc[1][1] / confusion_matrix.loc[1]['All'] *100, np.nan]
confusion_matrix
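
To make the row-wise calculation concrete, here is a minimal sketch on an invented 2x2 count table (rows are reference classes, columns are predicted classes). The counts and the names `counts`, `producers` and `omission` are hypothetical and not taken from the validation data.

```python
import numpy as np
import pandas as pd

# Hypothetical counts: rows = ACTUAL, columns = PREDICTION.
counts = pd.DataFrame([[40, 10],
                       [5, 45]],
                      index=["NoWater", "Water"],
                      columns=["NoWater", "Water"])

# Correctly classified sites sit on the diagonal.
correct = pd.Series(np.diag(counts.to_numpy()), index=counts.index)

# Producer's accuracy: correct sites divided by the reference (row) totals.
producers = correct / counts.sum(axis=1) * 100
omission = 100 - producers
print(producers)
print(omission)
```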

`User's Accuracy` is the map-user's accuracy: how often a class shown on the map is actually present on the ground, which makes it a measure of the map's reliability. It is calculated as the number of correctly classified sites for a class divided by the total number of sites classified as that class.

In [14]:
#If this cell raises an error, change the column labels '0' and '1' from strings to numbers (remove the quotation marks) 

users_accuracy = pd.Series([confusion_matrix['0'][0] / confusion_matrix['0']['All'] * 100,
                                confusion_matrix['1'][1] / confusion_matrix['1']['All'] * 100]).rename("User's")

#DataFrame.append was removed in pandas 2.0; pd.concat adds the same row
confusion_matrix = pd.concat([confusion_matrix, users_accuracy.to_frame().T])
confusion_matrix 
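
The column-wise counterpart can be sketched the same way. The count table below is invented for illustration; the only change from a producer's accuracy calculation is that the diagonal is divided by the predicted (column) totals.

```python
import numpy as np
import pandas as pd

# Hypothetical counts: rows = ACTUAL, columns = PREDICTION.
counts = pd.DataFrame([[40, 10],
                       [5, 45]],
                      index=["NoWater", "Water"],
                      columns=["NoWater", "Water"])

correct = pd.Series(np.diag(counts.to_numpy()), index=counts.columns)

# User's accuracy: correct sites divided by the predicted (column) totals.
users = correct / counts.sum(axis=0) * 100
commission = 100 - users
print(users.round(2))
print(commission.round(2))
```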

`Overall Accuracy` shows the proportion of reference (actual) sites that were mapped correctly.

In [15]:
confusion_matrix.loc["User's", "Producer's"] = (confusion_matrix['0'][0] + confusion_matrix['1'][1]) / confusion_matrix['All']['All'] * 100
confusion_matrix
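
As a cross-check on the idea, the sketch below computes overall accuracy by hand on made-up label arrays and compares it with `sklearn.metrics.accuracy_score`, which returns the same quantity as a fraction rather than a percentage. The arrays are illustrative only.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 50 non-water and 50 water reference sites.
actual = np.array([0] * 50 + [1] * 50)
predicted = np.array([0] * 40 + [1] * 10 + [0] * 5 + [1] * 45)

# Overall accuracy: correctly mapped sites over all sites, as a percentage.
n_correct = np.sum(actual == predicted)
overall = n_correct / actual.size * 100

# scikit-learn reports the same value as a fraction between 0 and 1.
overall_sklearn = accuracy_score(actual, predicted) * 100
print(overall, overall_sklearn)
```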

In [16]:
#Converting the prediction labels from strings to integers so they can be compared with ACTUAL
input_data['PREDICTION'] = input_data['PREDICTION'].astype(str).astype(int)

The `F1 score` is the harmonic mean of precision and recall, which correspond to user's accuracy and producer's accuracy respectively. It reaches its best value at 1 (perfect precision and recall) and is calculated as:

In [17]:
fscore = pd.Series([(2*(confusion_matrix.loc["User's"][0]*confusion_matrix.loc[0]["Producer's"]) / (confusion_matrix.loc["User's"][0] + confusion_matrix.loc[0]["Producer's"])) / 100,
                   f1_score(input_data['ACTUAL'],input_data['PREDICTION'])]).rename("F-score")
#DataFrame.append was removed in pandas 2.0; pd.concat adds the same row
confusion_matrix = pd.concat([confusion_matrix, fscore.to_frame().T])

In [18]:
confusion_matrix
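
The relationship between the F1 score, precision and recall can be verified on made-up labels. This is a sketch under the assumption that water (label 1) is the positive class, which is the default for `sklearn.metrics.f1_score`; the arrays are illustrative only.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical labels: 50 non-water and 50 water reference sites.
actual = np.array([0] * 50 + [1] * 50)
predicted = np.array([0] * 40 + [1] * 10 + [0] * 5 + [1] * 45)

# Precision (user's accuracy) and recall (producer's accuracy) for water.
true_pos = np.sum((actual == 1) & (predicted == 1))
precision = true_pos / np.sum(predicted == 1)
recall = true_pos / np.sum(actual == 1)

# Harmonic mean of precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)
f1_sklearn = f1_score(actual, predicted)
print(round(f1_manual, 4), round(f1_sklearn, 4))
```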

### Tidy Confusion Matrix 

- Limit decimal places
- Add readable class names 
- Remove nonsensical values 

In [19]:
confusion_matrix = confusion_matrix.round(decimals=2)

In [20]:
confusion_matrix = confusion_matrix.rename(columns={'0':'NoWater','1':'Water', 0:'NoWater',1:'Water','All':'Total'},index={'0':'NoWater','1':'Water',0:'NoWater',1:'Water','All':'Total'})

In [21]:
confusion_matrix

In [22]:
#saving out the confusion matrix 
confusion_matrix.to_csv('../Results/WOfS_Assessment/Point_Based/ConfusionMatrix/Central_confusion_matrix.csv')

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Last modified:** January 2020

**Compatible datacube version:** 

## Tags
Browse all available tags on the DE Africa User Guide's [Tags Index](https://) (placeholder as this does not exist yet)