# Clean and Extract Training Data


## Background

Existing training data are often collected in a different time period from the study period, so the dataset may not reflect the actual ground cover because of temporal changes. FAO adopted a training data filtering method for any reference year that falls within a given time span (e.g. 5 years) of an existing baseline, and tested the method in the production of land cover maps for Lesotho. The method assumes that the majority of reference labels remain valid from one year to the previous or next. Under this assumption, the reference labels that have changed are a minority and should be detectable with outlier detection methods such as K-Means clustering. More details on the method and how it works for Lesotho can be found in the published paper ([De Simone et al. 2022](https://www.mdpi.com/2072-4292/14/14/3294)).
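
As a rough illustration of this idea, the sketch below clusters a set of reference samples with K-Means and flags candidate points that fall in rare clusters. The data are synthetic, and the helper name `filter_by_cluster_frequency`, its parameters and the two-cluster setup are assumptions made for this example only; they are not taken from the published method.

```python
import numpy as np
from sklearn.cluster import KMeans


def filter_by_cluster_frequency(reference, candidates, n_clusters=2,
                                threshold=0.05, seed=1):
    """Flag candidates that fall in clusters holding at least `threshold`
    of the reference samples (hypothetical helper for illustration)."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(reference)
    # share of reference samples assigned to each cluster
    counts = np.bincount(model.labels_, minlength=n_clusters)
    frequency = counts / counts.sum()
    # a candidate is kept when its cluster is common among the reference samples
    keep = frequency[model.predict(candidates)] >= threshold
    return keep, frequency


rng = np.random.default_rng(0)
# synthetic reference features: 490 "unchanged" samples and 10 "changed" ones
reference = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(490, 2)),
    rng.normal(loc=20.0, scale=0.5, size=(10, 2)),
])
# two candidates resembling the majority and one resembling the minority
candidates = np.array([[0.0, 0.0], [0.5, -0.5], [20.0, 20.0]])
keep, frequency = filter_by_cluster_frequency(reference, candidates)
print(keep, np.sort(frequency))
```

With a 5% threshold, the candidate that resembles the small group of reference samples is the one flagged for removal, which mirrors the assumption that changed labels are the minority.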

## Description

This notebook filters extracted training data on a per-class basis for a target year, using K-Means clustering and a baseline land cover map. The steps are:
1. Load extracted training features
2. Collect stratified random samples and extract features using `random_sampling` and `collect_training_data`
3. Train K-Means models using the features of the random samples
4. Apply clustering on training points and filter out those unlikely to be valid for the target year
5. Export the filtered training data to disk for use in subsequent scripts
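
Steps 3 and 4 can be sketched on synthetic data as follows. This is a minimal example, assuming three well-separated groups of random samples and a small search range for the number of clusters; the helper `best_kmeans` is hypothetical. It standardises the features, picks the number of clusters with the highest Calinski-Harabasz score, and then assigns training points to clusters with the same scaler and model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score
from sklearn.preprocessing import StandardScaler


def best_kmeans(features, k_values, seed=1):
    """Fit one K-Means model per k and return the k and model with the
    highest Calinski-Harabasz score (hypothetical helper)."""
    best_k, best_model, best_score = None, None, -np.inf
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features)
        score = calinski_harabasz_score(features, model.labels_)
        if score > best_score:
            best_k, best_model, best_score = k, model, score
    return best_k, best_model


rng = np.random.default_rng(0)
# synthetic features for the random samples: three groups of 100 points
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
random_samples = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in centres])

# fit the scaler on the random samples only, then search for the best k
scaler = StandardScaler().fit(random_samples)
best_k, model = best_kmeans(scaler.transform(random_samples), range(2, 7))

# training points are transformed with the same scaler before prediction
training_points = np.array([[0.2, -0.1], [9.8, 0.3]])
clusters = model.predict(scaler.transform(training_points))
print(best_k, clusters)
```

Fitting the scaler on the random samples and reusing it for the training points keeps both sets in the same feature space, so the cluster assignments are comparable.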

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.

### Load packages


In [None]:
%matplotlib inline
import os
import datacube
import warnings
import numpy as np
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
import rioxarray
from odc.io.cgroups import get_cpu_quota
from odc.algo import xr_geomedian
from deafrica_tools.datahandling import load_ard
from deafrica_tools.bandindices import calculate_indices
from deafrica_tools.classification import collect_training_data
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from rasterio.enums import Resampling
from random_sampling import random_sampling # adapted from function by Chad Burton: https://gist.github.com/cbur24/04760d645aa123a3b1817b07786e7d9f

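We now find the number of CPUs available in your environment for parallel processing, and set the file paths and attributes of the input data.

In [None]:
# number of CPUs available for parallel processing (passed to collect_training_data below)
# falls back to the machine's CPU count if no CPU quota is reported
ncpus = round(get_cpu_quota() or os.cpu_count())
print('ncpus = ' + str(ncpus))

# file paths and attributes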
traning_points_path = 'Data/trainning_samples_FNDS_II_SOM_2016.geojson'
rf2017_path='Data/Landcover_map_ODC_Brazil_2015_2016.tif'
tiles_shp='Data/Mozambique_tiles_biggest1.shp'
class_name = 'LC_Class_I' # class label in integer format
crs='epsg:32736' # WGS84/UTM Zone 36S

# Load reference land cover survey points and reproject
training_data2017= gpd.read_file(traning_points_path).to_crs(crs) # read training points as geopandas dataframe
training_data2017=training_data2017[[class_name,'geometry']] # select attributes
print('land cover survey points 2017:\n',training_data2017)

# get bounding boxes of tiles
tiles=gpd.read_file(tiles_shp).to_crs(crs)
tile_bboxes=tiles.bounds
print('tile boundaries for Mozambique: \n',tile_bboxes)

The method also requires a baseline land cover map, which is used as a stratification layer to draw the random samples on which the K-Means model is trained. In this example, we use the existing national land cover map for 2016:
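
To make the role of the stratification layer concrete, here is a minimal sketch using a toy class map split into two tiles. It allocates the requested number of samples to each tile in proportion to the number of pixels of the class of interest, then draws random pixel positions of that class. The helper names, the rounding of the per-tile counts and the toy array are assumptions for illustration; the notebook itself relies on the `random_sampling` function.

```python
import numpy as np


def allocate_samples(pixel_counts, n_samples):
    """Split n_samples across tiles in proportion to each tile's class pixel count."""
    pixel_counts = np.asarray(pixel_counts, dtype=float)
    total = pixel_counts.sum()
    if total == 0:
        return np.zeros(len(pixel_counts), dtype=int)
    return np.round(n_samples * pixel_counts / total).astype(int)


def sample_class_pixels(class_map, class_value, n, rng):
    """Draw up to n random (row, col) positions where class_map equals class_value."""
    rows_cols = np.argwhere(class_map == class_value)
    n = min(n, len(rows_cols))
    picked = rng.choice(len(rows_cols), size=n, replace=False)
    return rows_cols[picked]


rng = np.random.default_rng(0)
# toy baseline map, later split into two "tiles" (left and right halves)
baseline = np.zeros((4, 8), dtype=np.uint8)
baseline[:, :2] = 3   # 8 pixels of class 3 in the left tile
baseline[:, 4:] = 3   # 16 pixels of class 3 in the right tile
tiles = [baseline[:, :4], baseline[:, 4:]]

counts = [int(np.sum(tile == 3)) for tile in tiles]
per_tile = allocate_samples(counts, n_samples=6)
samples = [sample_class_pixels(t, 3, n, rng) for t, n in zip(tiles, per_tile)]
print(counts, per_tile)
```

Tiles that contain more pixels of the class receive more samples, so the random samples follow the spatial distribution of the class in the baseline map.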

In [None]:
# load initial classification map
rf_2017_raster = xr.open_dataset(rf2017_path,engine="rasterio").astype(np.uint8).squeeze("band", drop=True)
# # reproject the raster
# rf_2017_raster= rf_2017_raster.rio.reproject(resolution=10, dst_crs=crs,resampling=Resampling.nearest)
rf_2017_raster=rf_2017_raster.band_data
print('Reference land cover classification raster:\n',rf_2017_raster) # note: 255 is nodata

In [2]:
lc_classes=training_data2017[class_name].unique() # get class labels
print('land cover classes:\n',lc_classes)
n_samples=1000 # number of random samples to optimise number of clusters for kmeans
zonal_stats = None
scaler = StandardScaler() # standard scaler for input data standardisation
frequency_threshold=0.05 # minimum cluster frequency; training points in rarer clusters are filtered out
fill_nan_value=-999 # value to replace nans in query results
measurements = ['blue','green','red','red_edge_1','red_edge_2', 'red_edge_3','nir_1','nir_2','swir_1','swir_2']
query = {
    'time': ('2021-01', '2021-12'),
    'measurements': measurements,
    'output_crs': crs,
    'resolution': (-10, 10)
}
# define a function to generate the feature layers
def feature_layers(query): 
    #connect to the datacube
    dc = datacube.Datacube(app='feature_layers')
    ds = load_ard(dc=dc,
                  products=['s2_l2a'],
                  group_by='solar_day',
                  verbose=False,
#                   mask_filters=[("opening", 2)], # morphological opening by 2 pixels to remove small masked regions
                  **query)
    ds = calculate_indices(ds,
                           index=['NDVI'],
                           drop=False,
                           satellite_mission='s2')
    # interpolate nodata using mean of previous and next observation
#     ds=ds.interpolate_na(dim='time',method='linear',use_coordinate=False,fill_value='extrapolate')
#     ds=ds.interpolate_na(dim='time',method='linear',use_coordinate=False)
    # calculate geomedians within each two-month interval
    ds=ds.resample(time='2MS').map(xr_geomedian)
    # replace nan with a value so that the collect_training_data function will work
#     ds=ds.fillna(fill_nan_value)
    # stack multi-temporal measurements and rename them
    n_time=ds.sizes['time'] # number of time steps after resampling
    list_measurements=list(ds.keys())
    ds_stacked=None
    for j in range(len(list_measurements)):
        for k in range(n_time):
            variable_name=list_measurements[j]+'_'+str(k)
            # print ('Stacking band ',list_measurements[j],' at time ',k)
            measure_single=ds[list_measurements[j]].isel(time=k).rename(variable_name)
            if ds_stacked is None:
                ds_stacked=measure_single
            else:
                ds_stacked=xr.merge([ds_stacked,measure_single],compat='override')
    return ds_stacked

land cover classes:
 [3 5 1 2 4]


In [7]:
os.makedirs('Results', exist_ok=True) # make sure the output folder exists
td2021_filtered=None # filtered training data
# filtering training data for each class
# for i in lc_classes[8:]:
for i in lc_classes:
    #i=1 # test for first class
    print('Processing class ',i)
    gpd_samples=None
    n_total=np.sum(rf_2017_raster.to_numpy()==i)
    # generate randomly sampled data to fit and optimise a kmeans clusterer
    for n in range(len(tile_bboxes)):
        print('stratified random sampling from tile ',n)
        da_mask=rf_2017_raster.rio.clip([tiles.iloc[n].geometry],crs=crs,drop=True)
        da_mask=da_mask.rio.reproject(dst_crs=crs,resampling=Resampling.nearest)
        n_samples_tile=n_samples*np.sum(da_mask.to_numpy()==i)/n_total
        gpd_samples_tile=random_sampling(da_mask,n_samples_tile,sampling='manual',
                                         manual_class_ratios={str(i):n_samples_tile},out_fname=None)
        if gpd_samples is None:
            gpd_samples=gpd_samples_tile
        else:
            gpd_samples=pd.concat([gpd_samples,gpd_samples_tile])
    # get data array
#     da_mask=da_mask.where(da_mask==i,np.nan) # replace other class values as nan so they won't be sampled (comment due to large memory required)
#     gpd_samples=random_sampling(da_mask,n_samples,sampling='stratified_random',manual_class_ratios=None,out_fname=None)
#     gpd_samples=random_sampling(da_mask,n_samples,sampling='manual',manual_class_ratios={str(i):n_samples},out_fname=None)
    gpd_samples=gpd_samples.reset_index(drop=True).drop(columns=['spatial_ref','class']) # drop this attribute derived from random_sampling function
    gpd_samples[class_name]=i # add attribute field so that we can use collect_training_data function
    if gpd_samples.crs is None:
        gpd_samples=gpd_samples.set_crs(crs)
    print('randomly sampled points for class ',i,'\n',gpd_samples)
    # extract data for the random samples
    column_names, sampled_data = collect_training_data(gdf=gpd_samples,
                                                          dc_query=query,
                                                          ncpus=ncpus,
#                                                           ncpus=1,
                                                          field=class_name, 
                                                          zonal_stats=zonal_stats,
                                                          feature_func=feature_layers,
                                                          return_coords=False)
    # standardise features
    scaler=scaler.fit(sampled_data[:,1:])
    sampled_data=scaler.transform(sampled_data[:,1:])
#     sampled_data[:,-6:]=sampled_data[:,-6:]*10000
#     sampled_data=sampled_data[:,1:]
    # fit kmeans model using the sample training data
    # first find optimal number of clusters based on Calinski-Harabasz index
    highest_score=-999
    n_cluster_optimal=5
    kmeans_model_optimal=None # initialise optimal model parameters
    labels_optimal=None
    for n_cluster in range(5,26):
        kmeans_model = KMeans(n_clusters=n_cluster, random_state=1).fit(sampled_data)
        labels=kmeans_model.predict(sampled_data)
        score=metrics.calinski_harabasz_score(sampled_data, labels)
#         score=metrics.davies_bouldin_score(sampled_data, labels)
        print('Calinski-Harabasz score for ',n_cluster,' clusters is: ',score)
#         print('Davies-Bouldin score for ',n_cluster,' clusters is: ',score)
        if (highest_score==-999)or(highest_score<score):
#         if (highest_score==-999)or(highest_score>score):
            highest_score=score
            n_cluster_optimal=n_cluster
            kmeans_model_optimal=kmeans_model
            labels_optimal=labels
    print('Best number of clusters for class %s: %s'%(i,n_cluster_optimal))
    
    # subset original training points for this class
    td_single_class=training_data2017[training_data2017[class_name]==i].reset_index(drop=True)
    print('Number of training data collected: ',len(td_single_class))
    column_names, model_input = collect_training_data(gdf=td_single_class,
                                                      dc_query=query,
                                                      ncpus=ncpus,
                                                      field=class_name,
                                                      zonal_stats=zonal_stats,
                                                      feature_func=feature_layers,
                                                      return_coords=True)
    print('Number of training data after removing Nans and Infs: ',model_input.shape[0])
    # first convert the training data to a pandas dataframe
    td_single_class_filtered=pd.DataFrame(data=model_input,columns=column_names)
    # then to geopandas dataframe
    td_single_class_filtered=gpd.GeoDataFrame(td_single_class_filtered, 
                                    geometry=gpd.points_from_xy(model_input[:,-2], model_input[:,-1],
                                                                crs=crs))
    # normalisation before clustering
    model_input=scaler.transform(model_input[:,1:-2])
#     model_input=model_input[:,1:-2]
#     model_input[:,-6:]=model_input[:,-6:]*10000
    # predict clustering labels
    labels_kmeans = kmeans_model_optimal.predict(model_input)
    # append clustering results to pixel coordinates
    td_single_class_filtered['cluster']=labels_kmeans
    # append frequency of each cluster
    labels_optimal=pd.DataFrame(data=labels_optimal,columns=['cluster']) # calculate cluster frequencies of the random samples
    cluster_frequency=td_single_class_filtered['cluster'].map(labels_optimal['cluster'].value_counts(normalize=True))
    td_single_class_filtered['cluster_frequency']=cluster_frequency
#     print('filtered training data: \n',td_single_class_filtered[td_single_class_filtered['cluster_frequency']<frequency_threshold])
    # filter by cluster frequency
    td_single_class_filtered=td_single_class_filtered[td_single_class_filtered['cluster_frequency']>=frequency_threshold]
    print('Number of training data after filtering: ',len(td_single_class_filtered))
    # export filtered training data for this class as shapefile (will encounter 10-character limit for attributes)
#     td_single_class_filtered.to_file('Results/landcover_td2021_filtered_DEAfrica_new_class_'+str(i)+'.shp')
    # export filtered training data for this class as geojson file
    td_single_class_filtered.to_file('Results/landcover_td2021_filtered_class_'+str(i)+'.geojson', driver="GeoJSON")
    # append the filtered training points of this class to final filtered training data
    if td2021_filtered is None:
        td2021_filtered=td_single_class_filtered
    else:
        td2021_filtered=pd.concat([td2021_filtered, td_single_class_filtered])
        
# save training data for all classes
print('filtered training data for 2021:\n',td2021_filtered)
td2021_filtered.to_file('Results/landcover_td2021_filtered.geojson', driver="GeoJSON")

# export the filtered training data as txt file
output_file = "Results/landcover_td2021_filtered.txt"
td2021_filtered.to_csv(output_file, header=True, index=None, sep=' ')

Processing class  3
stratified random sampling from tile  0
Class 3: sampled at 103 coordinates
stratified random sampling from tile  1
Class 3: sampled at 153 coordinates
stratified random sampling from tile  2
Class 3: sampled at 151 coordinates
stratified random sampling from tile  3
Class 3: sampled at 71 coordinates
stratified random sampling from tile  4
Class 3: sampled at 227 coordinates
stratified random sampling from tile  5
Class 3: sampled at 131 coordinates
stratified random sampling from tile  6
Class 3: sampled at 12 coordinates
stratified random sampling from tile  7
Class 3: sampled at 1 coordinates
stratified random sampling from tile  8
Class 3: sampled at 44 coordinates
stratified random sampling from tile  9
Class 3: sampled at 10 coordinates
radomly sampled points for class  3 
                             geometry  LC_Class_I
0     POINT (564331.117 8657665.275)           3
1     POINT (313771.117 8332945.275)           3
2     POINT (327481.117 8597815.275)     

  0%|          | 0/903 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)

Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (892, 67)
Calinski-Harabasz score for  5  clusters is:  121.18073402206959
Calinski-Harabasz score for  6  clusters is:  115.22763779370413
Calinski-Harabasz score for  7  clusters is:  110.11488805222984
Calinski-Harabasz score for  8  clusters is:  104.62979928155579
Calinski-Harabasz score for  9  clusters is:  99.14895585025577
Calinski-Harabasz score for  10  clusters is:  94.31112159334593
Calinski-Harabasz score for  11  clusters is:  89.47470646656937
Calinski-Harabasz score for  12  clusters is:  85.65585932616916
Calinski-Harabasz score for  13  clusters is:  82.20423822247736
Calinski-Harabasz score for  14  clusters is:  80.66235832082143
Calinski-Harabasz score for  15  clusters is:  75.79495030804776
Calinski-Harabasz score for  16  clusters is:  74.6482449432701
Calinski-Harabasz score for  17  clusters is:  71.2351973951531
Calinski-Harabasz score for  18  clusters is:  69.

  0%|          | 0/1291 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)


Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (1273, 69)
Number of training data after removing Nans and Infs:  1273
Number of training data after filtering:  1273
Processing class  5
stratified random sampling from tile  0
Class 5: sampled at 97 coordinates
stratified random sampling from tile  1
Class 5: sampled at 190 coordinates
stratified random sampling from tile  2
Class 5: sampled at 197 coordinates
stratified random sampling from tile  3
Class 5: sampled at 106 coordinates
stratified random sampling from tile  4
Class 5: sampled at 165 coordinates
stratified random sampling from tile  5
Class 5: sampled at 84 coordinates
stratified random sampling from tile  6
Class 5: sampled at 5 coordinates
stratified random sampling from tile  7
Class 5: sampled at 0 coordinates
stratified random sampling from tile  8
Class 5: sampled at 39 coordinates
stratified random sampling from tile  9
Class 5: sampled at 6 coordinates
radomly sampl

  0%|          | 0/889 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)

Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (883, 67)
Calinski-Harabasz score for  5  clusters is:  144.18639982439132
Calinski-Harabasz score for  6  clusters is:  128.13543726901878
Calinski-Harabasz score for  7  clusters is:  116.53747219247207
Calinski-Harabasz score for  8  clusters is:  105.69295452474148
Calinski-Harabasz score for  9  clusters is:  99.81753886119022
Calinski-Harabasz score for  10  clusters is:  93.57639832543944
Calinski-Harabasz score for  11  clusters is:  87.71112260763586
Calinski-Harabasz score for  12  clusters is:  84.46097197713394
Calinski-Harabasz score for  13  clusters is:  81.6384358480126
Calinski-Harabasz score for  14  clusters is:  76.61572762426137
Calinski-Harabasz score for  15  clusters is:  73.28570324580419
Calinski-Harabasz score for  16  clusters is:  71.23521490415997
Calinski-Harabasz score for  17  clusters is:  69.06506176761293
Calinski-Harabasz score for  18  clusters is:  66

  0%|          | 0/662 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)


Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (659, 69)
Number of training data after removing Nans and Infs:  659
Number of training data after filtering:  659
Processing class  1
stratified random sampling from tile  0
Class 1: sampled at 44 coordinates
stratified random sampling from tile  1
Class 1: sampled at 12 coordinates
stratified random sampling from tile  2
Class 1: sampled at 11 coordinates
stratified random sampling from tile  3
Class 1: sampled at 109 coordinates
stratified random sampling from tile  4
Class 1: sampled at 7 coordinates
stratified random sampling from tile  5
Class 1: sampled at 84 coordinates
stratified random sampling from tile  6
Class 1: sampled at 209 coordinates
stratified random sampling from tile  7
Class 1: sampled at 96 coordinates
stratified random sampling from tile  8
Class 1: sampled at 284 coordinates
stratified random sampling from tile  9
Class 1: sampled at 89 coordinates
radomly sampled

  0%|          | 0/945 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)

Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (927, 67)
Calinski-Harabasz score for  5  clusters is:  630.0272931386039
Calinski-Harabasz score for  6  clusters is:  588.961432052608
Calinski-Harabasz score for  7  clusters is:  543.5268978554352
Calinski-Harabasz score for  8  clusters is:  518.1819967306607
Calinski-Harabasz score for  9  clusters is:  481.26736271835466
Calinski-Harabasz score for  10  clusters is:  456.20394787242867
Calinski-Harabasz score for  11  clusters is:  433.0182770774861
Calinski-Harabasz score for  12  clusters is:  407.9174050325895
Calinski-Harabasz score for  13  clusters is:  395.3818439075187
Calinski-Harabasz score for  14  clusters is:  377.91637058087645
Calinski-Harabasz score for  15  clusters is:  370.0486526639676
Calinski-Harabasz score for  16  clusters is:  360.0579353060557
Calinski-Harabasz score for  17  clusters is:  352.4434973608781
Calinski-Harabasz score for  18  clusters is:  346

  0%|          | 0/172 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)


Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (168, 69)
Number of training data after removing Nans and Infs:  168
Number of training data after filtering:  61
Processing class  2
stratified random sampling from tile  0
Class 2: sampled at 148 coordinates
stratified random sampling from tile  1
Class 2: sampled at 192 coordinates
stratified random sampling from tile  2
Class 2: sampled at 73 coordinates
stratified random sampling from tile  3
Class 2: sampled at 44 coordinates
stratified random sampling from tile  4
Class 2: sampled at 107 coordinates
stratified random sampling from tile  5
Class 2: sampled at 172 coordinates
stratified random sampling from tile  6
Class 2: sampled at 27 coordinates
stratified random sampling from tile  7
Class 2: sampled at 3 coordinates
stratified random sampling from tile  8
Class 2: sampled at 65 coordinates
stratified random sampling from tile  9
Class 2: sampled at 23 coordinates
radomly sampled

  0%|          | 0/854 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)


Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (845, 67)
Calinski-Harabasz score for  5  clusters is:  164.98065674498488
Calinski-Harabasz score for  6  clusters is:  152.73888752901402
Calinski-Harabasz score for  7  clusters is:  141.89233961482572
Calinski-Harabasz score for  8  clusters is:  132.43422595862216
Calinski-Harabasz score for  9  clusters is:  122.05461776871876
Calinski-Harabasz score for  10  clusters is:  114.52922831072557
Calinski-Harabasz score for  11  clusters is:  107.56834058012703
Calinski-Harabasz score for  12  clusters is:  101.29941813265646
Calinski-Harabasz score for  13  clusters is:  97.5030978022485
Calinski-Harabasz score for  14  clusters is:  93.99069862689755
Calinski-Harabasz score for  15  clusters is:  90.57570280693758
Calinski-Harabasz score for  16  clusters is:  85.77743297510757
Calinski-Harabasz score for  17  clusters is:  81.32678763743982
Calinski-Harabasz score for  18  clusters is:

  0%|          | 0/339 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)


Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (337, 69)
Number of training data after removing Nans and Infs:  337
Number of training data after filtering:  333
Processing class  4
stratified random sampling from tile  0
Class 4: sampled at 24 coordinates
stratified random sampling from tile  1
Class 4: sampled at 174 coordinates
stratified random sampling from tile  2
Class 4: sampled at 282 coordinates
stratified random sampling from tile  3
Class 4: sampled at 165 coordinates
stratified random sampling from tile  4
Class 4: sampled at 73 coordinates
stratified random sampling from tile  5
Class 4: sampled at 83 coordinates
stratified random sampling from tile  6
Class 4: sampled at 61 coordinates
stratified random sampling from tile  7
Class 4: sampled at 6 coordinates
stratified random sampling from tile  8
Class 4: sampled at 32 coordinates
stratified random sampling from tile  9
Class 4: sampled at 37 coordinates
radomly sampled

  0%|          | 0/937 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)

Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (921, 67)
Calinski-Harabasz score for  5  clusters is:  604.1878025353604
Calinski-Harabasz score for  6  clusters is:  554.6205299310844
Calinski-Harabasz score for  7  clusters is:  526.494582318181
Calinski-Harabasz score for  8  clusters is:  497.5040403392479
Calinski-Harabasz score for  9  clusters is:  472.342480677431
Calinski-Harabasz score for  10  clusters is:  449.93853019754107
Calinski-Harabasz score for  11  clusters is:  426.78275785803095
Calinski-Harabasz score for  12  clusters is:  402.5345159736822
Calinski-Harabasz score for  13  clusters is:  380.7651914996268
Calinski-Harabasz score for  14  clusters is:  364.8797728439266
Calinski-Harabasz score for  15  clusters is:  349.317573378688
Calinski-Harabasz score for  16  clusters is:  334.7366929523318
Calinski-Harabasz score for  17  clusters is:  324.7321769606024
Calinski-Harabasz score for  18  clusters is:  315.95

  0%|          | 0/35 [00:00<?, ?it/s]

CPLReleaseMutex: Error = 1 (Operation not permitted)

Percentage of possible fails after run 1 = 0.0 %
Removed 0 rows wth NaNs &/or Infs
Output shape:  (35, 69)
Number of training data after removing Nans and Infs:  35
Number of training data after filtering:  35
filtered training data for 2021:
     LC_Class_I       blue_0       blue_1       blue_2       blue_3  \
0          3.0   498.881073   400.912933   366.908997   552.580688   
1          3.0   477.645630   349.235870   251.920792   312.284973   
2          3.0   401.701843   371.754852   286.077118   311.395172   
3          3.0   461.556244   352.943115   286.713257   333.606384   
4          3.0   627.746338   373.808105   325.894257   475.023315   
..         ...          ...          ...          ...          ...   
30         4.0   886.718567   761.163391   750.301697   786.533813   
31         4.0  1294.000000   701.280823   322.233154   454.280548   
32         4.0   481.076233   411.874603   601.250610   582.437378   
33         4.0  1100.274414  1028.102539  1044.010254  1