<div class="alert alert-info">
<u><strong>Authors:</strong></u> <b>Alberto Vavassori</b> (alberto.vavassori@polimi.it), <b>Emanuele Capizzi</b> (emanuele.capizzi@polimi.it) - DICA - Politecnico di Milano - GIS GEOLab <br>
Developed within the LCZ-ODC project, funded by the Italian Space Agency (agreement n. 2022-30-HH.0).
</div>

# LCZ classification accuracy assessment

<a id='TOC_TOP'></a>
Notebook structure:  <br>

[Part 1: Classification accuracy on testing samples](#sec1.0)

 1. [Import testing samples](#sec1)  
 2. [Rasterize testing samples](#sec2)
 3. [Import the classified image to be assessed](#sec3)
 4. [Assess classification accuracy on testing samples](#sec4)
 
[Part 2: Inter-comparison with LCZ Generator product](#sec2.0)

 5. [Accuracy of LCZ Generator product](#sec5)
 6. [Extraction of samples for inter-comparison](#sec6)
 7. [Computation of confusion matrix and consistency metrics](#sec7)
 
<hr>

This Notebook assesses the quality of the classification using testing samples, i.e. an independent dataset that was not used in the classification step. The testing samples are defined by the user and imported into this Notebook.

### Import libraries

<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
Note: the Notebook relies on the <a href='https://gdal.org/' target='_blank'><em>gdal</em></a> Python library; make sure you have it installed in your environment.
</div>

In [None]:
import warnings
warnings.filterwarnings("ignore")

import glob
import geopandas as gpd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import rasterio
from rasterio.plot import show
from rasterio.mask import mask
from rasterio.windows import Window
from rasterio.enums import Resampling
import numpy as np
import pandas as pd
import seaborn as sns
import ipywidgets as widgets
from osgeo import gdal, ogr, gdalconst, gdal_array, osr
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from shapely.geometry import box

In [None]:
# Import functions and set auto-reload
from functions import *
%load_ext autoreload
%autoreload 2

<div class="alert alert-success">

# <a id='sec1.0'></a> Part 1: Classification accuracy on testing samples
[Back to top](#TOC_TOP)

</div>

## 1. <a id='sec1'></a> Import testing samples
[Back to top](#TOC_TOP)

First, select the PRISMA image date:

In [None]:
date_prisma_w = widgets.Dropdown(
    options=['2023-02-09', '2023-03-22', '2023-04-08', '2023-06-17', '2023-07-10', '2023-08-08'],
    value='2023-02-09',
    description='PRISMA date:',
    disabled=False,
    layout={'width': 'max-content'},
    style = {'description_width': 'initial'}
)
date_prisma_w

In [None]:
sel_prisma_date = date_prisma_w.value
selected_prisma_image = 'PRISMA_outputs/coregistered/PR_'+ sel_prisma_date.replace('-', '') + '_30m.tif'
print(f"The selected date is --> PRISMA: {sel_prisma_date}.")

Set the legend that will be used henceforth for the plots:

In [None]:
legend = {
    2: ['Compact mid-rise', '#D10000'],
    3: ['Compact low-rise', '#CD0000'],
    5: ['Open mid-rise', '#FF6600'],
    6: ['Open low-rise', '#FF9955'],
    8: ['Large low-rise', '#BCBCBC'],
    101: ['Dense trees', '#006A00'],
    102: ['Scattered trees', '#00AA00'],
    104: ['Low plants', '#B9DB79'],
    105: ['Bare rock or paved', '#545454'],
    106: ['Bare soil or sand', '#FBF7AF'],
    107: ['Water', '#6A6AFF']
}
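
The plotting helpers are defined in `functions.py`, which is not shown here. As an illustration only, the sketch below shows one way to turn a dictionary with non-contiguous class codes, such as `legend`, into a discrete Matplotlib colormap. `legend_to_cmap` and `legend_subset` are hypothetical names, and a four-class subset of `legend` is used to keep the example short.

```python
import matplotlib.colors as colors

# Hypothetical four-class subset with the same structure as `legend`: code -> [name, hex colour]
legend_subset = {
    2: ['Compact mid-rise', '#D10000'],
    8: ['Large low-rise', '#BCBCBC'],
    101: ['Dense trees', '#006A00'],
    107: ['Water', '#6A6AFF'],
}

def legend_to_cmap(legend):
    """Build a discrete colormap and a matching norm for non-contiguous integer class codes."""
    codes = sorted(legend)
    cmap = colors.ListedColormap([legend[c][1] for c in codes])
    # Bin edges halfway between consecutive codes, so that each code falls in its own colour bin
    edges = ([codes[0] - 0.5]
             + [(a + b) / 2 for a, b in zip(codes[:-1], codes[1:])]
             + [codes[-1] + 0.5])
    norm = colors.BoundaryNorm(edges, cmap.N)
    return cmap, norm, codes

cmap, norm, codes = legend_to_cmap(legend_subset)
```

With the full `legend`, `plt.imshow(band, cmap=cmap, norm=norm)` would colour each pixel of a classified band according to its LCZ code.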

The following function displays the testing samples on a map.

In [None]:
testing_folder = './layers/testing_samples/testing_set_' + sel_prisma_date.replace('-', '') + '.gpkg'
cmm_folder = './layers/CMM.gpkg'

In [None]:
testing, m, shapes = plot_training_samples(testing_folder, cmm_folder, legend)

In [None]:
m

The following function imports the testing samples and computes the total area of each LCZ class. It displays a plot of the area per class and returns the path to the vector layer used in the next steps.

<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
<a id='warning'></a> 

**Note**:
Testing samples must be stored in `.gpkg` format as vector multi-polygons, and the file must contain a column `LCZ` with integer values corresponding to the LCZ class, as reported in the dictionary `legend`.

</div>
<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
<a id='warning'></a> Testing samples are time-dependent, especially for the natural classes; they must be updated whenever changes occur (e.g. in land cover). The following function imports the testing samples specific to the selected PRISMA acquisition date.
</div>

In [None]:
vector_LCZ_path = testing_area(sel_prisma_date, legend)

## 2. <a id='sec2'></a>Rasterize testing samples
[Back to top](#TOC_TOP)

<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
<a id='warning'></a> If testing samples are already rasterized, skip this section and go to the next one.
</div>

The testing set, provided in vector format as a GeoPackage, must be converted to raster format using the following functions.

In [None]:
raster_reference = 'PCs/PCs_'+ sel_prisma_date.replace('-', '') +'_30m.tif'  # reference raster for the rasterization
output = './layers/testing_samples/testing_set_'+ sel_prisma_date.replace('-', '') + '_30m.tif'
attribute = 'LCZ'     # vector column whose values are burned into the raster
projection = 32632    # EPSG code: WGS 84 / UTM zone 32N

In [None]:
rasterized_result = rasterize_training(raster_reference, vector_LCZ_path, output, attribute, projection)
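
`rasterize_training` is defined in `functions.py` and its implementation is not shown here. For intuition only, the sketch below burns a class value into every pixel whose centre falls inside a polygon; this pixel-centre rule is also GDAL's default when rasterizing vectors (unless the `ALL_TOUCHED=TRUE` option is set). The function `rasterize_polygons` is hypothetical and, as simplifying assumptions, handles only polygon exterior rings (no holes) and north-up grids described by a GDAL-style geotransform.

```python
import numpy as np
from matplotlib.path import Path

def rasterize_polygons(rings, values, geotransform, shape, nodata=0):
    """Burn class values into a (rows, cols) grid using the pixel-centre rule.

    rings: list of (N, 2) arrays with the exterior-ring coordinates (x, y) of each polygon
    values: class value to burn for each polygon (e.g. the LCZ attribute)
    geotransform: GDAL-style (x_origin, pixel_width, 0, y_origin, 0, pixel_height), pixel_height < 0
    """
    x0, dx, rx, y0, ry, dy = geotransform
    if rx != 0 or ry != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    rows, cols = shape
    # Map coordinates of the pixel centres
    xs = x0 + (np.arange(cols) + 0.5) * dx
    ys = y0 + (np.arange(rows) + 0.5) * dy
    xx, yy = np.meshgrid(xs, ys)
    centres = np.column_stack([xx.ravel(), yy.ravel()])
    out = np.full(shape, nodata, dtype=np.int32)
    for ring, value in zip(rings, values):
        inside = Path(ring).contains_points(centres).reshape(shape)
        out[inside] = value  # where polygons overlap, later ones overwrite earlier ones
    return out

# 4 x 4 grid of 30 m pixels; a 60 m x 60 m square covers the top-left 2 x 2 pixels
square = np.array([(0, 60), (60, 60), (60, 120), (0, 120), (0, 60)])
grid = rasterize_polygons([square], [101], (0.0, 30.0, 0.0, 120.0, 0.0, -30.0), (4, 4))
```

For real data, `gdal.RasterizeLayer` or `rasterio.features.rasterize` perform the same operation efficiently on the grid of the reference raster.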

In [None]:
#plot_raster_training(output, legend)

In [None]:
bbox = prisma_bbox(selected_prisma_image, sel_prisma_date)

In [None]:
mask_prisma = mask_prisma_image(selected_prisma_image)

In [None]:
study_area = gpd.read_file(bbox)

In [None]:
# Clip the rasterized testing set to the study area (PRISMA bounding box); input and output paths are the same
roi_ds = './layers/testing_samples/testing_set_'+ sel_prisma_date.replace('-', '')+'_30m.tif'
roi_new_path = './layers/testing_samples/testing_set_'+ sel_prisma_date.replace('-', '')+'_30m.tif'
clip_image_study_area(roi_ds, roi_new_path, study_area)

## 3. <a id='sec3'></a>Import the classified image to be assessed
[Back to top](#TOC_TOP)

Select the classification method, which determines the classified image to be assessed:

In [None]:
classification_method_w = widgets.RadioButtons(
    options=['RandomForest', 'XGBoost', 'AdaBoost', 'GradientBoost'],
    value='RandomForest',
    layout={'width': 'max-content'},
    description='Classifier:',
    disabled=False
)
classification_method_w

In [None]:
classification_method = classification_method_w.value
print(f'Selected classification method: {classification_method}')

In [None]:
classified_image_path = 'classified_images/classified_' + classification_method + '_' + sel_prisma_date.replace('-', '') + '_medianfilter_30m.tif'
print('Selected image: ' + classified_image_path)
# Read all bands as a (bands, rows, cols) array; the context manager closes the file afterwards
with rasterio.open(classified_image_path) as src:
    classified_image = src.read()
print(f"Selected image shape: {classified_image.shape}")

In [None]:
plot_classified_image(sel_prisma_date, classified_image, classification_method, legend)

## 4. <a id='sec4'></a>Assess classification accuracy on testing samples
[Back to top](#TOC_TOP)

In this section, the following accuracy metrics are computed on the testing samples, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives for a given class:
* accuracy: overall accuracy of the model, i.e. the fraction of all samples that are correctly classified (in the multi-class case, the sum of the diagonal of the confusion matrix divided by the total number of samples); for a single class it reads

$$ \frac{TP+TN}{TP+TN+FP+FN} $$

* precision: fraction of the samples predicted as a given class that actually belong to that class

$$ \frac{TP}{TP+FP} $$

* recall: fraction of the samples of a given class that are correctly predicted as that class

$$ \frac{TP}{TP+FN} $$

* f1-score: harmonic mean of precision and recall

$$ F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision}+\text{recall}} $$

* support: number of occurrences (pixels) of each class in the testing samples.
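
`print_accuracy` is defined in `functions.py`. As a cross-check, all the metrics above can be derived from a confusion matrix; the minimal NumPy sketch below (the function name is hypothetical) assumes rows are reference classes and columns predicted classes, the same convention as `sklearn.metrics.confusion_matrix`.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Overall accuracy and per-class precision, recall, F1-score and support.

    cm[i, j] = number of samples of reference class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                   # correctly classified samples per class
    support = cm.sum(axis=1)           # reference samples per class (TP + FN)
    predicted = cm.sum(axis=0)         # samples predicted as each class (TP + FP)
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(predicted > 0, tp / predicted, 0.0)
        recall = np.where(support > 0, tp / support, 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    overall = tp.sum() / cm.sum()      # sum of the diagonal over the total
    return overall, precision, recall, f1, support.astype(int)

# Two-class example: 17 of 20 samples lie on the diagonal, so the overall accuracy is 0.85
oa, prec, rec, f1, sup = metrics_from_confusion([[8, 2], [1, 9]])
```

Classes with no predictions or no reference samples get a score of 0 in this sketch.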

In [None]:
accuracy, confusion, report, report_df = print_accuracy(classification_method, sel_prisma_date, legend)

In [None]:
confusion

In [None]:
report_df

<div class="alert alert-success">

# <a id='sec2.0'></a> Part 2: Inter-comparison with LCZ Generator product
[Back to top](#TOC_TOP)

</div>

## 5. <a id='sec5'></a> Accuracy of LCZ Generator product
[Back to top](#TOC_TOP)

The LCZ Generator evaluates the accuracy of its product internally and provides it as an output. Here, the confusion matrix is read from the CSV file and the accuracy metrics are computed.

In [None]:
#directory = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/data/*cm_average_formatted.csv'
#confusion_matrix_path = glob.glob(directory)[0]

In [None]:
#report = lcz_generator_accuracy(confusion_matrix_path, legend)

## 6. <a id='sec6'></a>Extraction of samples for inter-comparison with LCZ Generator product
[Back to top](#TOC_TOP)

<div class="alert alert-warning" role="alert">
<span>&#9888;</span>
This section needs some pre-processing that can be performed entirely in QGIS. Specifically:
</div>

* Since the two maps have different spatial resolutions (100 m for the LCZ Generator map and 30 m for the PRISMA map), both must be resampled to 10 m resolution in QGIS.
* The LCZ Generator map must be aligned to the PRISMA map.
* Both classified maps must then be post-processed with a median filter of 9 pixels: at 10 m resolution, a 9-pixel window spans 90 m, consistent with the 3-pixel window (3 × 30 m = 90 m) used to post-process the PRISMA map in the LCZ-ODC project workflow.
* Finally, the LCZ Generator map must be reclassified so that the LCZ classes are encoded as in the PRISMA map.
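
The reclassification can also be scripted rather than done in QGIS. A minimal sketch, assuming that the LCZ Generator map follows the usual WUDAPT encoding (LCZ 1 to 10 as codes 1 to 10, LCZ A to G as codes 11 to 17), while the PRISMA map encodes LCZ A to G as 101 to 107 (see `legend`). The function name is hypothetical, and the actual codes should be checked against the LCZ Generator output before use.

```python
import numpy as np

def reclass_lcz_generator(class_map):
    """Re-encode LCZ A to G from codes 11-17 to 101-107; built types 1-10 are left unchanged."""
    class_map = np.asarray(class_map)
    out = class_map.copy()
    natural = (class_map >= 11) & (class_map <= 17)
    out[natural] = class_map[natural] + 90   # 11 -> 101 (Dense trees), ..., 17 -> 107 (Water)
    return out

reclassified = reclass_lcz_generator(np.array([[2, 8, 11], [14, 17, 6]]))
```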

Once the above pre-processing is done, clip the LCZ Generator map to the extent of the PRISMA map (this is crucial because the PRISMA map is rotated with respect to the other one due to the acquisition mode of the satellite).

In [None]:
# Paths to your input rasters (pre-processed in QGIS)
lcz_generator_map_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass.tif'
prisma_map_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/prisma_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9.tif'

In [None]:
with rasterio.open(lcz_generator_map_path) as src1:
    lcz_generator_map = src1.read()
    lcz_generator_map_profile = src1.profile

In [None]:
with rasterio.open(prisma_map_path) as src2:
    prisma_map = src2.read()
    prisma_map_profile = src2.profile

In [None]:
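# Pixels outside the (rotated) PRISMA footprint are assumed to hold negative nodata values:
# set them to NaN and apply the same mask to the LCZ Generator map.
# NaN assignment requires floating-point arrays (e.g. rasters saved as Float32 in QGIS).
prisma_map[prisma_map < 0] = np.nan
nan_indices = np.isnan(prisma_map)
lcz_generator_map[nan_indices] = np.nan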

In [None]:
# Save the masked LCZ Generator map
with rasterio.open('./layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip.tif', 'w', **lcz_generator_map_profile) as dst:
    dst.write(lcz_generator_map)

In [None]:
# Save the masked PRISMA map with its own profile (the two rasters are aligned, so the grids coincide)
with rasterio.open('./layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/prisma_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass.tif', 'w', **prisma_map_profile) as dst:
    dst.write(prisma_map)

The inter-comparison is then carried out following three sampling schemes.

<u>Sampling scheme 1: random sampling with a fixed number of pixels</u>

This first sampling scheme considers a fixed number of pixels, computed with Cochran's formula for large populations.
With a precision level of ±3%, a confidence level of 98% and an estimated proportion of 0.5, the appropriate sample size is about 1500.
In the random sampling, the same number of pixels is extracted from each class, equal to 1500/k, k being the number of classes. In this case, k = 11, i.e. about 136 pixels per class.
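
The sample size quoted above follows from Cochran's formula for large populations, $n_0 = z^2 p (1 - p) / e^2$, where $z$ is the two-sided critical value of the standard normal distribution for the chosen confidence level, $p$ the estimated proportion and $e$ the precision level. A minimal check with the Python standard library; the function name is hypothetical, and truncating 1500/11 to an integer is an assumption about how the per-class number is rounded.

```python
from statistics import NormalDist

def cochran_sample_size(confidence, precision, p=0.5):
    """Cochran's sample size for large populations: n0 = z^2 * p * (1 - p) / e^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value (about 2.326 for 98%)
    return z ** 2 * p * (1 - p) / precision ** 2

n0 = cochran_sample_size(confidence=0.98, precision=0.03)   # about 1503, rounded to 1500 in the text
n_classes = 11                                              # number of LCZ classes in `legend`
pixels_per_class = int(1500 / n_classes)                    # 136 pixels per class (1500 / 11 = 136.4)
```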

In [None]:
raster1_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip.tif'
raster2_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/prisma_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass.tif'

In [None]:
output_raster1_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip_pixels.tif'
output_raster2_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/prisma_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_pixels.tif'

In [None]:
pixels_number = 1500

In [None]:
random_sampling(raster1_path, raster2_path, output_raster1_path, output_raster2_path, pixels_number, legend.keys())

<u>Sampling scheme 2: stratified random sampling with a fixed number of pixels</u>

The second sampling scheme uses the same total sample size, computed with Cochran's formula for large populations (about 1500 pixels for a precision level of ±3%, a confidence level of 98% and an estimated proportion of 0.5), and allocates it to each class (or stratum) in proportion to its size (number of pixels). In the stratified random sampling, the number of pixels to be extracted per class is computed as:

$$ n_h = \frac{N_h}{N} n $$

where $n_h$ is the sample size of the h-th class, $N_h$ is the population size of the h-th class, $N$ is the size of the entire population (number of pixels in the raster, excluding NaN pixels) and $n$ is the size of the entire sample (1500).
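
The allocation formula above can be sketched with NumPy as follows. The function name is hypothetical, rounding each $n_h$ to the nearest integer is an assumption of this sketch (so the allocated sizes may not add up exactly to $n$), and the notebook's `stratified_sampling` helper may proceed differently.

```python
import numpy as np

def proportional_allocation(class_map, codes, n=1500):
    """Stratified allocation n_h = N_h / N * n, computed on the non-NaN pixels of class_map."""
    values = np.asarray(class_map, dtype=float).ravel()
    values = values[~np.isnan(values)]
    N = values.size                                                   # population size (valid pixels)
    N_h = np.array([np.count_nonzero(values == c) for c in codes])    # pixels per class
    return np.rint(N_h / N * n).astype(int)

# Toy map with 6 valid pixels: 3 of class 2, 2 of class 101, 1 of class 107
toy_map = np.array([[2, 2, 2, 101], [101, 107, np.nan, np.nan]])
allocation = proportional_allocation(toy_map, codes=[2, 101, 107], n=60)
```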

In [None]:
raster1_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip.tif'
raster2_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/prisma_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass.tif'

In [None]:
output_raster1_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip_pixels_strat.tif'
output_raster2_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/prisma_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_pixels_strat.tif'

In [None]:
# n: size of the entire sample
n = 1500
pixels_number = n

In [None]:
stratified_sampling(raster1_path, raster2_path, output_raster1_path, output_raster2_path, pixels_number, legend.keys())

<u>Sampling scheme 3: all pixels of the maps</u>

A third option is to use all the pixels of both rasters for the comparison. In this case, set the input paths in the first cell of Section 7 to the outputs of the pre-processing step above (the masked rasters), namely:

In [None]:
# ref_raster_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip.tif'
# raster_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/prisma_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass.tif'

## 7. <a id='sec7'></a>Computation of confusion matrix and consistency metrics
[Back to top](#TOC_TOP)
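
`inter_comparison` is defined in `functions.py`. For reference, the sketch below shows one way to cross-tabulate two aligned class rasters with NumPy, ignoring pixels that are NaN in either map (such as the pixels masked outside the PRISMA footprint); the function name is hypothetical. The overall agreement between the two maps is the trace of the resulting matrix divided by its total.

```python
import numpy as np

def map_confusion(ref_map, other_map, codes):
    """Cross-tabulate two aligned class rasters.

    Rows: classes in ref_map; columns: classes in other_map.
    Pixels that are NaN in either map, or whose codes are not in `codes`, are ignored.
    """
    codes = np.asarray(sorted(codes))
    k = codes.size
    r = np.asarray(ref_map, dtype=float).ravel()
    c = np.asarray(other_map, dtype=float).ravel()
    valid = ~np.isnan(r) & ~np.isnan(c)
    r, c = r[valid], c[valid]
    # Position of each pixel value in the sorted code list (clipped to stay in range)
    ri = np.clip(np.searchsorted(codes, r), 0, k - 1)
    ci = np.clip(np.searchsorted(codes, c), 0, k - 1)
    known = (codes[ri] == r) & (codes[ci] == c)
    # Count each (row, column) pair with a single bincount over flattened indices
    return np.bincount(ri[known] * k + ci[known], minlength=k * k).reshape(k, k)

ref = np.array([[2, 2, np.nan], [101, 101, 107]])
other = np.array([[2, 3, 2], [101, 102, np.nan]])
cm = map_confusion(ref, other, [2, 3, 101, 102, 107])
agreement = np.trace(cm) / cm.sum()   # 2 of the 4 valid pixels agree
```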

In [None]:
ref_raster_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/lcz_generator_map_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_clip_pixels.tif'
raster_path = './layers/lcz_generator/comparison_' + sel_prisma_date.replace('-', '') + '/prisma_' + sel_prisma_date.replace('-', '') + '_10m_align_medianfilter9_reclass_pixels.tif'

In [None]:
accuracy, confusion, report, report_df = inter_comparison(raster_path, ref_raster_path, legend)

In [None]:
report_df

In [None]:
confusion

In [None]:
#####################INTER-COMPARISON WITH LCZ-GENERATOR

In [None]:
classi = ["Compact Mid-rise","Compact Low-rise","Open Mid-rise","Open Low-rise","Large Low-rise",
          "Dense trees","Scattered trees","Low plants","Bare rock or paved","Bare soil or sand","Water"]

accur_prisma_9feb = np.array([[1229,   23,   73,    0,    8,    0,    0,    0,    0,    0,    0],
       [ 114,  785,   75,   92,    4,    0,    0,    0,    0,    0,    0],
       [ 125,   37,  970,   23,    0,    0,    0,    0,    0,    0,    0],
       [   0,  176,   81,  864,    0,    0,   19,    0,   15,    0,    0],
       [   0,    0,    0,    1, 1169,    0,    0,    0,    0,    0,    0],
       [   0,    0,    1,    7,    0,  912,  242,    0,    0,    0,    0],
       [   2,    0,    8,   22,    6,   46, 1149,    0,    4,    0,    0],
       [   0,    0,    0,    0,    0,    0,    8, 1212,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0,    4,    1,  296,    1,    0],
       [   0,    0,    0,    0,    6,    0,    0,    1,    0, 1233,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,  643]])
accur_prisma_9feb = accur_prisma_9feb / np.sum(accur_prisma_9feb)*100

accur_prisma_17jun = np.array([[1187,   17,  133,    0,    4,    0,    0,    0,    0,    0,    0],
       [  82,  779,   70,  126,   11,    0,    0,    0,    0,    0,    0],
       [ 186,   26,  904,   35,    2,    0,    0,    0,    0,    0,    0],
       [   1,  185,   46,  898,    0,    0,   14,    0,    2,    1,    0],
       [   0,    2,    9,    1, 1160,    0,    0,    0,    0,    1,    0],
       [   0,    0,    1,    7,    0,  626,  218,    5,    0,    0,    0],
       [   0,    0,    4,   29,    2,   10, 1147,   22,   21,    5,    0],
       [   0,    0,    0,    0,    0,    0,    2, 1650,    0,  139,    1],
       [   0,    0,    0,    4,    0,    0,    0,    1,  315,    6,    0],
       [   0,    0,    0,    0,    2,    0,    0,    5,    0, 1302,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,  648]])
accur_prisma_17jun = accur_prisma_17jun / np.sum(accur_prisma_17jun)*100

accur_prisma_8aug = np.array([[1177,   42,  114,    0,    4,    0,    0,    0,    0,    0,    0],
       [  65,  664,  103,   45,   28,    0,    0,    0,    0,    0,    0],
       [ 188,   20,  923,   17,    0,    0,    0,    0,    0,    0,    0],
       [   0,  168,  142,  630,    1,    0,   17,    0,   12,    0,    0],
       [   0,    5,   18,    2, 1147,    0,    0,    1,    0,    0,    0],
       [   0,    0,    3,    7,    0,  215,  320,    1,    0,    0,    0],
       [   0,    0,    8,   21,    3,   85, 1132,   37,    3,    0,    4],
       [   0,    0,    0,    0,    0,    0,    4, 2307,    0,   56,    0],
       [   0,    0,    0,    1,   13,    0,    3,    0,  576,    1,    0],
       [   0,    0,    0,    0,   12,    0,    0,    8,    4, 1104,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,  521]])
accur_prisma_8aug = accur_prisma_8aug / np.sum(accur_prisma_8aug)*100

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="white")

# Create a DataFrame from the matrix (not used by the plot itself)
accuracy_prisma = pd.DataFrame(data = accur_prisma_9feb)

# Set up the matplotlib figure
f, ax1 = plt.subplots(figsize=(10, 10))

# Draw the confusion matrix (percentages of the total number of samples) as an annotated heatmap
cmap = 'Greens'
sns.heatmap(accur_prisma_9feb, ax = ax1, cmap=cmap, vmin = 0, vmax = 14,
            square=True, linewidths=.5, cbar_kws={"shrink": .5, "use_gridspec": True},
            annot=True, fmt=".1f", annot_kws={"fontsize": 13, "color": 'black'})

# Place x and y ticks at the cell centres
ax1.set_xticks(np.arange(len(classi)) + 0.5)
ax1.set_yticks(np.arange(len(classi)) + 0.5)
ax1.set_ylabel("Testing samples", fontsize = 15)
ax1.set_xlabel("PRISMA map", fontsize = 15)

# Set x and y tick labels
ax1.set_xticklabels(classi, rotation = 90, ha = "right", fontsize = 15)
ax1.set_yticklabels(classi, rotation = 0, va = "center", fontsize = 15)
ax1.set_title('Confusion matrix computed from testing samples\nPRISMA', fontweight = 'bold', fontsize = 15)

# Leave room for the long class names, then save and display the figure
plt.subplots_adjust(left = 0.30, right = 0.9, bottom = 0.2, top = 0.9)
plt.savefig('classified_images/accuracy_prisma_feb_test_samples.png', dpi = 300)
plt.show()

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="white")

# Create a DataFrame from the matrix (not used by the plot itself)
accuracy_prisma = pd.DataFrame(data = accur_prisma_17jun)

# Set up the matplotlib figure
f, ax1 = plt.subplots(figsize=(10, 10))

# Draw the confusion matrix (percentages of the total number of samples) as an annotated heatmap
cmap = 'Greens'
sns.heatmap(accur_prisma_17jun, ax = ax1, cmap=cmap, vmin = 0, vmax = 14,
            square=True, linewidths=.5, cbar_kws={"shrink": .5, "use_gridspec": True},
            annot=True, fmt=".1f", annot_kws={"fontsize": 13, "color": 'black'})

# Place x and y ticks at the cell centres
ax1.set_xticks(np.arange(len(classi)) + 0.5)
ax1.set_yticks(np.arange(len(classi)) + 0.5)
ax1.set_ylabel("Testing samples", fontsize = 15)
ax1.set_xlabel("PRISMA map", fontsize = 15)

# Set x and y tick labels
ax1.set_xticklabels(classi, rotation = 90, ha = "right", fontsize = 15)
ax1.set_yticklabels(classi, rotation = 0, va = "center", fontsize = 15)
ax1.set_title('Confusion matrix computed from testing samples\nPRISMA', fontweight = 'bold', fontsize = 15)

# Leave room for the long class names, then save and display the figure
plt.subplots_adjust(left = 0.30, right = 0.9, bottom = 0.2, top = 0.9)
plt.savefig('classified_images/accuracy_prisma_jun_test_samples.png', dpi = 300)
plt.show()