<div class="alert alert-info">
<u><strong>Authors:</strong></u> <b>Alberto Vavassori</b> (alberto.vavassori@polimi.it), <b>Emanuele Capizzi</b> (emanuele.capizzi@polimi.it) - DICA - Politecnico di Milano - GIS GEOLab <br>
Developed within the LCZ-ODC project, funded by the Italian Space Agency (agreement n. 2022-30-HH.0).
</div>

# LCZ classification accuracy assessment

<a id='TOC_TOP'></a>
Notebook structure:  <br>

[Part 1: Classification accuracy on testing samples](#sec1.0)

 1. [Import testing samples](#sec1)  
 2. [Rasterize testing samples](#sec2)
 3. [Import the classified image to be assessed](#sec3)
 4. [Assess classification accuracy on testing samples](#sec4)
 
[Part 2: Inter-comparison with LCZ Generator product](#sec2.0)

 5. [Accuracy of LCZ Generator product](#sec5)
 6. [Extraction of samples for inter-comparison](#sec6)
 7. [Computation of confusion matrix and consistency metrics](#sec7)
 
<hr>

This Notebook assesses the quality of the classification using testing samples, i.e. an external dataset that was not used in the classification step. The testing samples are always defined by the user and can be imported into this Notebook.

### Import libraries

<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
Note: the Notebook relies on the <a href='https://gdal.org/' target='_blank'><em>gdal</em></a> Python library; make sure you have it installed in your environment.
</div>

In [None]:
import warnings
warnings.filterwarnings("ignore")

import glob
import geopandas as gpd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import rasterio
from rasterio.plot import show
from rasterio.mask import mask
from rasterio.windows import Window
from rasterio.enums import Resampling
import numpy as np
import pandas as pd
import seaborn as sns
import ipywidgets as widgets
from osgeo import gdal, ogr, gdalconst, gdal_array, osr
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from shapely.geometry import box

In [None]:
# Import functions and set auto-reload
from functions import *
%load_ext autoreload
%autoreload 2

<div class="alert alert-success">

# <a id='sec1.0'></a> Part 1: Classification accuracy on testing samples
[Back to top](#TOC_TOP)

</div>

## 1. <a id='sec1'></a> Import testing samples
[Back to top](#TOC_TOP)

First, select the PRISMA image date:

In [None]:
date_prisma_w = widgets.Dropdown(
    options=['2023-02-09', '2023-03-22', '2023-04-08', '2023-06-17', '2023-07-10', '2023-08-08'],
    value='2023-02-09',
    description='PRISMA date:',
    disabled=False,
    layout={'width': 'max-content'},
    style = {'description_width': 'initial'}
)
date_prisma_w

In [None]:
sel_prisma_date = date_prisma_w.value
selected_prisma_image = 'PRISMA_outputs/coregistered/PR_'+ sel_prisma_date.replace('-', '') + '_30m.tif'
print(f"The selected date is --> PRISMA: {sel_prisma_date}.")

Set the legend that will be used henceforth for the plots:

In [None]:
legend = {
    2: ['Compact mid-rise', '#D10000'],
    3: ['Compact low-rise', '#CD0000'],
    5: ['Open mid-rise', '#FF6600'],
    6: ['Open low-rise', '#FF9955'],
    8: ['Large low-rise', '#BCBCBC'],
    101: ['Dense trees', '#006A00'],
    102: ['Scattered trees', '#00AA00'],
    104: ['Low plants', '#B9DB79'],
    105: ['Bare rock or paved', '#545454'],
    106: ['Bare soil or sand', '#FBF7AF'],
    107: ['Water', '#6A6AFF']
}
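
As a minimal sketch of how a legend of this form can drive a plot, the snippet below converts a small raster of LCZ codes into an RGB array using the hex colours stored in the dictionary. The function name `colorize_lcz`, the two-entry `legend_subset`, and the toy raster are hypothetical and only serve the illustration; the Notebook's own plotting helpers are imported from `functions` and may work differently.

```python
import numpy as np

# Hypothetical two-class subset with the same structure as `legend`:
# {class code: [class name, hex colour]}
legend_subset = {
    2: ['Compact mid-rise', '#D10000'],
    107: ['Water', '#6A6AFF'],
}

def colorize_lcz(class_raster, legend):
    """Map a 2D array of LCZ codes to a (rows, cols, 3) uint8 RGB array.

    Pixels whose code is not in the legend stay black (0, 0, 0).
    """
    rgb = np.zeros(class_raster.shape + (3,), dtype=np.uint8)
    for code, (_name, hex_color) in legend.items():
        # '#RRGGBB' -> [R, G, B] integers
        channels = [int(hex_color[i:i + 2], 16) for i in (1, 3, 5)]
        rgb[class_raster == code] = channels
    return rgb

# Toy raster: the code 0 is not in the legend and is rendered black
toy_raster = np.array([[2, 107],
                       [0, 2]])
toy_rgb = colorize_lcz(toy_raster, legend_subset)
```

An array like `toy_rgb` could then be passed to `plt.imshow` for display.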

The following function displays the testing samples on a map.

In [None]:
testing_folder = './layers/testing_samples/testing_set_' + sel_prisma_date.replace('-', '') + '.gpkg'
cmm_folder = './layers/CMM.gpkg'

In [None]:
testing, m, shapes = plot_training_samples(testing_folder, cmm_folder, legend)

In [None]:
m

The following function imports the testing samples and computes the area of each LCZ class. It outputs a plot of the total area per LCZ class and returns the path to the vector layer used in the following sections.

<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
<a id='warning'></a> 

**Note**:
Testing samples must be stored in `.gpkg` format as vector multi-polygons, and the file must contain a column `LCZ` with integer values corresponding to the LCZ class, as reported in the dictionary `legend`.

</div>
<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
<a id='warning'></a> Testing samples are time-dependent, especially for the natural classes; they must be updated whenever conditions change (e.g. in the land cover). The following function imports the testing samples specific to the selected PRISMA acquisition date.
</div>
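
The requirement on the `LCZ` column stated above can be checked before running the rest of the Notebook. The sketch below is one possible check, not part of the original workflow: `invalid_lcz_values` and `legend_codes` are hypothetical names, and the code set is copied from the keys of `legend`.

```python
import numbers

# Class codes copied from the keys of the `legend` dictionary
legend_codes = {2, 3, 5, 6, 8, 101, 102, 104, 105, 106, 107}

def invalid_lcz_values(values, valid_codes):
    """Return the distinct values that are not integer codes in valid_codes."""
    invalid = []
    for value in values:
        # Accept Python and NumPy integers, but reject booleans and floats
        is_integer = isinstance(value, numbers.Integral) and not isinstance(value, bool)
        if not is_integer or value not in valid_codes:
            if value not in invalid:
                invalid.append(value)
    return invalid

# A valid column yields an empty list; 4 (unknown code) and 101.0 (float) are flagged
ok_column = invalid_lcz_values([2, 3, 107, 2], legend_codes)
bad_column = invalid_lcz_values([2, 4, 101.0, 4], legend_codes)
```

With GeoPandas, the values to check could be obtained with `gpd.read_file(path)['LCZ'].tolist()`, where `path` points to the testing-sample GeoPackage.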

In [None]:
vector_LCZ_path = testing_area(sel_prisma_date, legend)

## 2. <a id='sec2'></a>Rasterize testing samples
[Back to top](#TOC_TOP)

<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
<a id='warning'></a> If testing samples are already rasterized, skip this section and go to the next one.
</div>

The testing set, provided in vector format as a GeoPackage, must be converted to raster format using the following functions.
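
To illustrate what rasterization onto a reference grid means, the toy sketch below burns class values into a north-up grid: a pixel takes a sample's value when its centre falls inside the sample. It is deliberately limited to axis-aligned rectangles and uses only NumPy. The function `burn_rectangles` and all coordinates are hypothetical; the Notebook's `rasterize_training` is defined in `functions`, handles real polygons, and is not reproduced here.

```python
import numpy as np

def burn_rectangles(rects, origin_x, origin_y, pixel_size, n_rows, n_cols, nodata=0):
    """Burn (xmin, ymin, xmax, ymax, value) rectangles into a north-up grid.

    (origin_x, origin_y) is the top-left corner of the grid. A pixel receives
    the value when its centre lies inside the rectangle.
    """
    # Coordinates of the pixel centres along each axis
    xs = origin_x + (np.arange(n_cols) + 0.5) * pixel_size
    ys = origin_y - (np.arange(n_rows) + 0.5) * pixel_size
    grid = np.full((n_rows, n_cols), nodata, dtype=np.int32)
    for xmin, ymin, xmax, ymax, value in rects:
        in_x = (xs >= xmin) & (xs < xmax)
        in_y = (ys > ymin) & (ys <= ymax)
        grid[np.ix_(in_y, in_x)] = value
    return grid

# 4x4 grid with 10 m pixels and top-left corner at (0, 40);
# one 20 m x 20 m sample of class 2 in the top-left corner
toy_grid = burn_rectangles([(0, 20, 20, 40, 2)], 0, 40, 10, 4, 4)
```

Using the same origin, pixel size, and shape as the reference raster is what keeps the rasterized samples pixel-aligned with the map being assessed.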

In [None]:
raster_reference = 'layers/lcz_generator/comparison_'+ sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip.tif'
output = './layers/testing_samples/testing_set_'+ sel_prisma_date.replace('-', '') + '_10m.tif'
attribute = 'LCZ'
projection = 32632

In [None]:
rasterized_result = rasterize_training(raster_reference, vector_LCZ_path, output, attribute, projection)

In [None]:
#plot_raster_training(output, legend)

In [None]:
bbox = prisma_bbox(selected_prisma_image, sel_prisma_date)

In [None]:
mask_prisma = mask_prisma_image(selected_prisma_image)

In [None]:
study_area = gpd.read_file(bbox)

In [None]:
roi_ds = './layers/testing_samples/testing_set_'+ sel_prisma_date.replace('-', '')+'_10m.tif'
roi_new_path = './layers/testing_samples/testing_set_'+ sel_prisma_date.replace('-', '')+'_10m.tif'
clip_image_study_area(roi_ds, roi_new_path, study_area)

## 3. <a id='sec3'></a>Import the classified image to be assessed
[Back to top](#TOC_TOP)

Select the classification method to choose the image to be assessed:

In [None]:
classification_method = 'RandomForest'
classified_image_path = 'layers/lcz_generator/comparison_'+ sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip.tif'
# Read all bands and close the dataset once the data are in memory
with rasterio.open(classified_image_path) as src:
    classified_image = src.read()
print(f"Selected image shape: {classified_image.shape}")

In [None]:
plot_classified_image(sel_prisma_date, classified_image, classification_method, legend)

## 4. <a id='sec4'></a>Assess classification accuracy on testing samples
[Back to top](#TOC_TOP)

In this section, some accuracy metrics are computed on the testing samples, specifically:
* accuracy: overall accuracy of the model, i.e. the fraction of the total samples that were correctly classified

$$ \frac{TP+TN}{TP+TN+FP+FN} $$

* precision: fraction of the samples predicted as positive that are actually positive

$$ \frac{TP}{TP+FP} $$

* recall: fraction of all positive samples that are correctly predicted as positive

$$ \frac{TP}{TP+FN} $$

* f1-score: combination of precision and recall; mathematically it is the harmonic mean of precision and recall

$$ \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision}+\mathrm{recall}} $$

* support: number of occurrences of each class in the testing sample.
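
The metrics listed above can be derived directly from a confusion matrix. The sketch below uses a hypothetical three-class matrix with reference classes in rows and predicted classes in columns (the same convention as `sklearn.metrics.confusion_matrix`); the numbers are invented for illustration. With more than two classes, the overall accuracy is the sum of the diagonal divided by the total count.

```python
import numpy as np

# Toy confusion matrix (rows: reference class, columns: predicted class)
cm = np.array([[8, 2, 0],
               [1, 7, 2],
               [0, 1, 9]])

tp = np.diag(cm).astype(float)   # correctly classified samples per class
support = cm.sum(axis=1)         # reference samples per class (TP + FN)
predicted = cm.sum(axis=0)       # predictions per class (TP + FP)

precision = tp / predicted
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)
overall_accuracy = tp.sum() / cm.sum()
```

A class that is never predicted, or that has no reference samples, would cause a division by zero here and needs separate handling.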

In [None]:
accuracy, confusion, report, report_df = print_accuracy_lczgenerator(classified_image_path, sel_prisma_date, legend)

In [None]:
confusion

In [None]:
report_df

In [None]:
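# Class labels, in the row/column order of the confusion matrices below
# (rows: testing samples, columns: LCZ Generator map)
classi = ["Compact Mid-rise","Compact Low-rise","Open Mid-rise","Open Low-rise","Large Low-rise",
          "Dense trees","Scattered trees","Low plants","Bare rock or paved","Bare soil or sand","Water"]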

accur_lczgen_9feb = np.array([[9685, 1475,  784,    0,   72,    0,    0,    0,   73,    0,    0],
       [1272, 6292,  394, 1466,  193,    0,    0,    0,    3,    0,    0],
       [1505,  922, 6765, 1092,   27,    0,   15,    0,   41,    0,    0],
       [  95, 1385, 2110, 6732,    2,    0,   68,    2,    0,    0,    0],
       [   0,  585,   66,  160, 8933,    0,    0,    0,  100,  674,    0],
       [   0,    0,   47,    4,    7, 8297, 2047,   39,    0,    0,    0],
       [   0,    0,   55,  591,   44, 1172, 8601,  293,  266,   55,   67],
       [   0,    0,    0,   55,    0,    0,  397, 8608,    0, 1859,    0],
       [   0,   26,    0,  529,   53,    4,   56,  545, 1423,  105,    0],
       [   0,    0,    0,    3,   69,    4,   59, 2455, 1195, 7363,    0],
       [   0,    0,    0,    3,    0,    0,    0,    4,   41,    7, 5788]])
# Convert counts to percentages of the total sample count (same for the other dates below)
accur_lczgen_9feb = accur_lczgen_9feb / np.sum(accur_lczgen_9feb)*100

accur_lczgen_17jun = np.array([[ 9872,1223,840,43,34,0,0,0,    77,0,     0],
       [  925,  6083,   521,  1733,   306,     0,     7,     0,    45,0,     0],
       [ 1551,   925,  6981,   873,    33,     0,     4,     0,     0,0,     0],
       [   70,  1456,  1962,  6825,     0,     0,    81,     0,     0,0,     0],
       [   13,   535,    56,   261,  8408,   100,   143,   393,   215,2245,     0],
       [    0,     0,    47,     4,     0,  5365,  2227,    11,     0,9,     0],
       [    0,     0,     0,   463,    55,  1328,  8530,   281,   270,194,    23],
       [    0,     0,     0,     3,     0,     9,   517, 13946,  1252,329,     0],
       [   16,    41,    16,   283,     1,     4,    93,   588,  1653,46,     0],
       [    0,    12,     0,   230,   319,     0,   399,  7378,    90,4029,     0],
       [    0,     0,     2,     1,     0,     0,     5,     1,    38,18,  5778]])
accur_lczgen_17jun = accur_lczgen_17jun / np.sum(accur_lczgen_17jun)*100

accur_lczgen_8aug = np.array([[11850,     8,   224,     7,     0,     0,     0,     0,     0,0,     0],
       [  156,  7367,   323,   198,   135,     0,     0,     0,     0,0,     0],
       [  105,    11, 10192,    19,    14,    20,     6,     0,     0,0,     0],
       [   20,   119,   335,  8145,     0,   101,    16,     0,     0,0,     0],
       [    8,    63,    67,    56, 10324,     0,     0,     0,     0,0,     0],
       [    0,     0,    13,     0,     3,  4840,   131,     0,     0,0,     0],
       [    0,    10,    53,     8,    56,   428, 10993,   106,    48,0,     6],
       [    0,     0,     0,     6,     0,     0,  2617, 17205,     0,1500,     0],
       [ 2732,   410,   450,    20,   334,     0,    89,   596,   718,29,     0],
       [    0,    36,     4,     0,   256,     0,  1249,  3233,     0,5308,     0],
       [    0,     0,     0,     0,     0,     0,    48,    38,     2,0,  4548]])
accur_lczgen_8aug = accur_lczgen_8aug / np.sum(accur_lczgen_8aug)*100
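
Because each matrix is divided by its grand total, every cell becomes a percentage of all testing samples and the diagonal sums to the overall accuracy in percent. The sketch below shows this on a hypothetical 2x2 matrix and contrasts it with row normalization, which would instead give the per-class recall; the variable names and values are invented for illustration.

```python
import numpy as np

# Toy counts (rows: testing samples, columns: classified map)
toy_counts = np.array([[40, 10],
                       [5, 45]])

# Normalization by the grand total, as applied to the matrices above
toy_percent = toy_counts / np.sum(toy_counts) * 100
overall_accuracy_percent = np.trace(toy_percent)

# Alternative: normalize each row, so the diagonal holds per-class recall in percent
row_percent = toy_counts / toy_counts.sum(axis=1, keepdims=True) * 100
```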

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="white")

# Wrap the matrix in a DataFrame (the heatmap below reads the array directly)
accuracy_prisma = pd.DataFrame(data = accur_lczgen_9feb)

# Set up the matplotlib figure
f, ax1 = plt.subplots(figsize=(10, 10))
plt.tight_layout()

#cbar_ax = f.add_axes([0.25, 0.01, 0.5, 0.02])  # Adjust the position and size as needed

# Use a sequential colormap for the percentage values
cmap = 'Greens'
# Draw the annotated heatmap with square cells
sns.heatmap(accur_lczgen_9feb, ax = ax1, cmap=cmap, vmin = 0, vmax = 14,
            square=True, linewidths=.5, cbar_kws={"shrink": .5, "use_gridspec": True}, #, "orientation": "horizontal"
            annot=True, fmt=".1f", annot_kws={"fontsize": 13, "color": 'black'}) #, cbar_ax = cbar_ax

# Set x and y ticks
ax1.set_xticks(np.arange(len(classi)) + 0.5)
ax1.set_yticks(np.arange(len(classi)) + 0.5)
ax1.set_ylabel("Testing samples", fontsize = 15)
ax1.set_xlabel("LCZ Generator map", fontsize = 15)

# Set x and y tick labels
ax1.set_xticklabels(classi, rotation = 90, ha = "right", fontsize = 15)
ax1.set_yticklabels(classi, rotation = 0, va = "center", fontsize = 15)
ax1.set_title('Confusion matrix computed from testing samples\nLCZ Generator', fontweight = 'bold', fontsize = 15)

plt.subplots_adjust(left = 0.30, right = 0.9, bottom = 0.2, top = 0.9)
plt.savefig('classified_images/accuracy_LCZgen_feb_test_samples.png', dpi = 300)
plt.show()

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="white")

# Wrap the matrix in a DataFrame (the heatmap below reads the array directly)
accuracy_prisma = pd.DataFrame(data = accur_lczgen_17jun)

# Set up the matplotlib figure
f, ax1 = plt.subplots(figsize=(10, 10))
plt.tight_layout()

#cbar_ax = f.add_axes([0.25, 0.01, 0.5, 0.02])  # Adjust the position and size as needed

# Use a sequential colormap for the percentage values
cmap = 'Greens'
# Draw the annotated heatmap with square cells
sns.heatmap(accur_lczgen_17jun, ax = ax1, cmap=cmap, vmin = 0, vmax = 14,
            square=True, linewidths=.5, cbar_kws={"shrink": .5, "use_gridspec": True}, #, "orientation": "horizontal"
            annot=True, fmt=".1f", annot_kws={"fontsize": 13, "color": 'black'}) #, cbar_ax = cbar_ax

# Set x and y ticks
ax1.set_xticks(np.arange(len(classi)) + 0.5)
ax1.set_yticks(np.arange(len(classi)) + 0.5)
ax1.set_ylabel("Testing samples", fontsize = 15)
ax1.set_xlabel("LCZ Generator map", fontsize = 15)

# Set x and y tick labels
ax1.set_xticklabels(classi, rotation = 90, ha = "right", fontsize = 15)
ax1.set_yticklabels(classi, rotation = 0, va = "center", fontsize = 15)
ax1.set_title('Confusion matrix computed from testing samples\nLCZ Generator', fontweight = 'bold', fontsize = 15)

plt.subplots_adjust(left = 0.30, right = 0.9, bottom = 0.2, top = 0.9)
plt.savefig('classified_images/accuracy_LCZgen_jun_test_samples.png', dpi = 300)
plt.show()