# Accuracy assessment of crop-mask results

This notebook generates a confusion matrix for a binary classification. It would need editing to create a confusion matrix for a multi-class classification.
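
As a rough idea of what the multi-class edit could look like, the sketch below builds the matrix and the accuracies for any number of classes. The function name `accuracy_table` and the three-class labels are hypothetical, and it assumes the usual convention that Producer's accuracy is measured against the reference (row) totals and User's accuracy against the predicted (column) totals.

```python
import numpy as np
import pandas as pd


def accuracy_table(actual, predicted):
    """Confusion matrix plus per-class and overall accuracy (in percent).

    Rows are reference ("Actual") classes, columns are predicted classes.
    Hypothetical helper, not part of the notebook.
    """
    labels = sorted(set(actual) | set(predicted))
    # Reindex so the matrix stays square even if a class is never predicted
    counts = (pd.crosstab(pd.Series(actual, name="Actual"),
                          pd.Series(predicted, name="Prediction"))
              .reindex(index=labels, columns=labels, fill_value=0))

    correct = np.diag(counts.to_numpy())
    # Producer's accuracy: correct / reference total (row sums)
    producers = pd.Series(correct / counts.sum(axis=1).to_numpy() * 100, index=labels)
    # User's accuracy: correct / predicted total (column sums)
    users = pd.Series(correct / counts.sum(axis=0).to_numpy() * 100, index=labels)
    overall = correct.sum() / counts.to_numpy().sum() * 100
    return counts, producers, users, overall


# Made-up three-class example (0 = other, 1 = crop, 2 = water)
actual = [0, 0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 0, 2]
counts, producers, users, overall = accuracy_table(actual, predicted)
print(counts)
print(producers.round(1), users.round(1), round(overall, 1), sep="\n")
```

A class with no reference points or no predictions yields `nan` for the corresponding accuracy rather than raising an error.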

Inputs will be:

1. `predicted.tif`: a binary crop/no-crop classification output by the ML script.

2. `cropland_prelim_validation.shp`: a shapefile of crop/no-crop points that serves as the "ground-truth" dataset.

The file paths actually used are set by `pred_tif` and `grd_truth` under Analysis Parameters.

Output will be:

1. A confusion (error) matrix containing the overall, Producer's and User's accuracies.

In [1]:
import rasterio
import pandas as pd
import numpy as np
import seaborn as sn
import matplotlib.pyplot as plt
import geopandas as gpd

## Analysis Parameters

In [2]:
pred_tif = 'results/predicted_12months.tif'
grd_truth = 'data/training_data/CrowdVal/CrowdVal_kenya_final_points.shp'

### Load the datasets

In [3]:
# Load the ground-truth points and reproject them to EPSG:6933
# (assumed to match the CRS of the prediction raster)
ground_truth = gpd.read_file(grd_truth).to_crs('EPSG:6933')

In [4]:
# Rename the class column and drop attribute columns that are not needed
ground_truth = ground_truth.rename(columns={'GRID_CODE': 'Actual'}).drop(columns=['POINTID', 'x', 'y'])

# Reclassify: crop class (GRID_CODE 4) becomes 1, every other class becomes 0
ground_truth['Actual'] = np.where(ground_truth['Actual'] == 4, 1, 0)
ground_truth.head()

| | Actual | geometry |
|---|---|---|
| 0 | 0 | POINT (3316726.660 584092.265) |
| 1 | 0 | POINT (3326768.384 584092.265) |
| 2 | 0 | POINT (3336827.976 584092.265) |
| 3 | 0 | POINT (3346869.699 584092.265) |
| 4 | 0 | POINT (3356929.291 584092.265) |


In [5]:
# Raster of predicted classes
prediction = rasterio.open(pred_tif)

### Extract a list of coordinate values

In [6]:
coords = list(zip(ground_truth.geometry.x, ground_truth.geometry.y))

### Sample the prediction raster at the ground truth coordinates

In [7]:
# Sample the raster at every point location and store values in DataFrame
ground_truth['Prediction'] = [int(x[0]) for x in prediction.sample(coords)]
ground_truth.head()

| | Actual | geometry | Prediction |
|---|---|---|---|
| 0 | 0 | POINT (3316726.660 584092.265) | 1 |
| 1 | 0 | POINT (3326768.384 584092.265) | 0 |
| 2 | 0 | POINT (3336827.976 584092.265) | 1 |
| 3 | 0 | POINT (3346869.699 584092.265) | 0 |
| 4 | 0 | POINT (3356929.291 584092.265) | 0 |
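
Conceptually, sampling a raster at a point means finding the pixel that contains the point and reading its value. The sketch below illustrates that idea with plain NumPy. It is not how `rasterio` implements `sample`; the raster, its origin (`x0`, `y0`), the 10 m `pixel_size` and the helper `sample_points` are all made up for illustration, and a north-up grid with square pixels is assumed.

```python
import numpy as np

# Hypothetical 3x4 north-up raster: top-left corner at (x0, y0), square pixels
x0, y0, pixel_size = 1000.0, 2000.0, 10.0
raster = np.array([[0, 1, 1, 0],
                   [1, 0, 0, 1],
                   [0, 0, 1, 1]])


def sample_points(raster, coords, x0, y0, pixel_size):
    """Return the pixel value under each (x, y) point, or None if outside."""
    n_rows, n_cols = raster.shape
    values = []
    for x, y in coords:
        col = int(np.floor((x - x0) / pixel_size))  # x increases to the right
        row = int(np.floor((y0 - y) / pixel_size))  # y decreases going down
        inside = 0 <= row < n_rows and 0 <= col < n_cols
        values.append(int(raster[row, col]) if inside else None)
    return values


# Three points inside the grid and one to the west of it
coords = [(1005.0, 1995.0), (1035.0, 1985.0), (1025.0, 1975.0), (900.0, 1995.0)]
sampled = sample_points(raster, coords, x0, y0, pixel_size)
print(sampled)
```

The lookup only gives meaningful results when the point coordinates and the raster share the same CRS, which is why the ground-truth points are reprojected before sampling.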


---

## Create a confusion matrix

In [8]:
confusion_matrix = pd.crosstab(ground_truth['Actual'],
                               ground_truth['Prediction'],
                               rownames=['Actual'],
                               colnames=['Prediction'],
                               margins=True)

confusion_matrix

| Actual \ Prediction | 0 | 1 | All |
|---|---|---|---|
| 0 | 3127 | 768 | 3895 |
| 1 | 398 | 71 | 469 |
| All | 3525 | 839 | 4364 |


### Calculate User's and Producer's Accuracy
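
For reference, the three measures can be written as follows, assuming the usual remote-sensing convention in which Producer's accuracy is measured against the reference (`Actual`) totals and User's accuracy against the mapped (`Prediction`) totals. Here $n_{ij}$ is the number of points with actual class $i$ and predicted class $j$, $n_{i+}$ is a row total, $n_{+j}$ is a column total and $n$ is the grand total. The worked values use the crop class (class 1) counts from the matrix above.

```latex
\begin{aligned}
\text{Producer's}_i &= 100 \cdot \frac{n_{ii}}{n_{i+}},
  & \text{Producer's}_{1} &= 100 \cdot \frac{71}{469} \approx 15.1 \\
\text{User's}_j &= 100 \cdot \frac{n_{jj}}{n_{+j}},
  & \text{User's}_{1} &= 100 \cdot \frac{71}{839} \approx 8.5 \\
\text{Overall} &= 100 \cdot \frac{\sum_i n_{ii}}{n},
  & \text{Overall} &= 100 \cdot \frac{3127 + 71}{4364} \approx 73.3
\end{aligned}
```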

`Producer's Accuracy`

In [9]:
# Rows are the reference (Actual) classes, so correct / row total is the Producer's accuracy
confusion_matrix["Producer's"] = [confusion_matrix.loc[0, 0] / confusion_matrix.loc[0, 'All'] * 100,
                                  confusion_matrix.loc[1, 1] / confusion_matrix.loc[1, 'All'] * 100,
                                  np.nan]

`User's Accuracy`

In [10]:
# Columns are the predicted classes, so correct / column total is the User's accuracy
users_accuracy = pd.Series([confusion_matrix.loc[0, 0] / confusion_matrix.loc['All', 0] * 100,
                            confusion_matrix.loc[1, 1] / confusion_matrix.loc['All', 1] * 100],
                           index=[0, 1])

# Add the row by label (DataFrame.append is not available in pandas 2.0+)
confusion_matrix.loc["User's"] = users_accuracy

`Overall Accuracy`

In [11]:
# Store the overall accuracy in the bottom-right cell
confusion_matrix.loc["User's", "Producer's"] = (confusion_matrix.loc[0, 0] +
                                                confusion_matrix.loc[1, 1]) / confusion_matrix.loc['All', 'All'] * 100

### Tidy Confusion Matrix

* Limit decimal places
* Add readable class names
* Remove nonsensical values

In [12]:
# Round to one decimal place
confusion_matrix = confusion_matrix.round(decimals=1)

In [13]:
# Rename the 0/1 class codes to class names
confusion_matrix = confusion_matrix.rename(columns={0: 'Non-crop', 1: 'Crop', 'All': 'Total'},
                                           index={0: 'Non-crop', 1: 'Crop', 'All': 'Total'})

In [14]:
# Cast to object so a text placeholder can sit alongside the numbers
confusion_matrix = confusion_matrix.astype(object)

# Blank out the cells that have no meaning
confusion_matrix.loc['Total', "Producer's"] = '--'
confusion_matrix.loc["User's", 'Total'] = '--'

In [15]:
confusion_matrix

| Actual \ Prediction | Non-crop | Crop | Total | Producer's |
|---|---|---|---|---|
| Non-crop | 3127.0 | 768.0 | 3895 | 80.3 |
| Crop | 398.0 | 71.0 | 469 | 15.1 |
| Total | 3525.0 | 839.0 | 4364 | -- |
| User's | 88.7 | 8.5 | -- | 73.3 |


### Export to CSV

In [16]:
confusion_matrix.to_csv('results/confusion_matrix.csv')