# Example 2 - Sand labelling

<img src="images/banner3.png" width="100%" />

<font face="Calibri">
<br>
<font size="5"> <b>Sand clustering with Silhouette Analysis and KMeans notebook</b></font>

<br>
<font size="4"> <b> Nicolas Pucino; PhD Student @ Deakin University, Australia </b> <br>
<img style="padding:7px;" src="images/sandpiper_sand_retouched.png" width="170" align="right" /></font>

<font size="3">This notebook illustrates how to use Sandpyper to perform Silhouette Analysis and KMeans clustering on all previously extracted points. <br>

<b>This notebook covers the following concepts:</b>

- Silhouette Analysis.
- KMeans clustering.
</font>


</font>

In [5]:
import pandas as pd
import geopandas as gpd
import numpy as np

from sandpyper.outils import coords_to_points 
from sandpyper.labels import get_sil_location, get_opt_k, kmeans_sa

Loading the project-related lists

- loc codes
- crs dict string

In [6]:
# The location codes used throughout the analysis
loc_codes=["mar","leo"]

# The Coordinate Reference Systems used throughout this study
crs_dict_string= {
                 'mar': {'init': 'epsg:32754'},
                 'leo': {'init': 'epsg:32755'},
                 }

## Loading, merging and preparing the tables

The function __get_merged_table__ merges the RGB and z tables and formats the result so that it is ready for further analysis.

In [7]:
%%time

#Loading the tables

rgb_table_path=r"C:\my_packages\doc_data\profiles\rgb.csv"
z_table_path=r"C:\my_packages\doc_data\profiles\elevation.csv"

rgb_table=gpd.read_file(rgb_table_path)
z_table=gpd.read_file(z_table_path)

# As the distance (across-transect) comes from an interpolation, it has too many digits.
# Let's round the distance column of both tables to 2 decimal places and set its data type to "float".

rgb_table["distance"]=np.round(rgb_table.loc[:,"distance"].values.astype("float"),2)
z_table["distance"]=np.round(z_table.loc[:,"distance"].values.astype("float"),2)


Wall time: 43.6 s


Storing GeoDataFrames as CSV is handy, but __we lose the column data type information__.
Especially important is the __geometry column__, which we need to convert back into __Shapely Point objects__.
To do that, the function __coords_to_points__ can be applied across a Series (here, `coordinates`) to rebuild the `geometry` column. This can take quite a bit of time, so, if you have a lot of points, get ready!

In [8]:
rgb_table['geometry']=rgb_table.coordinates.apply(coords_to_points)
z_table['geometry']=z_table.coordinates.apply(coords_to_points)

In [9]:
# Here, we merge the two tables (storing elevation and rgb information)

data_merged = pd.merge(z_table,rgb_table[["band1","band2","band3","point_id"]],on="point_id",validate="one_to_one")

# replace empty values with np.nan (the np.NaN alias was removed in NumPy 2.0)
data_merged=data_merged.replace("", np.nan)

# and convert the z column into floats.
data_merged['z']=data_merged.z.astype("float")
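The `validate="one_to_one"` argument in the merge above makes pandas check that `point_id` is unique in both tables, so a duplicated point raises an error instead of silently multiplying rows. A minimal sketch with made-up toy tables follows; the values and the helper name `safe_merge` are illustrative assumptions, not part of Sandpyper.

```python
import pandas as pd

# Toy stand-ins for the elevation and RGB tables (values are made up).
z_toy = pd.DataFrame({"point_id": ["a", "b", "c"], "z": [1.2, 1.5, 0.9]})
rgb_toy = pd.DataFrame({"point_id": ["a", "b", "c"], "band1": [200, 180, 90]})


def safe_merge(left, right):
    """Merge on point_id; return None if the keys are not one-to-one."""
    try:
        return pd.merge(left, right, on="point_id", validate="one_to_one")
    except pd.errors.MergeError:
        return None


# Unique keys on both sides: the merge succeeds.
merged_ok = safe_merge(z_toy, rgb_toy)

# Duplicating one point_id in the right table breaks the one-to-one assumption.
rgb_dup = pd.concat([rgb_toy, rgb_toy.iloc[[0]]], ignore_index=True)
merged_dup = safe_merge(z_toy, rgb_dup)

print(merged_ok)
print("duplicate keys rejected:", merged_dup is None)
```

If the merge in this notebook fails with a `MergeError`, checking `point_id` for duplicates in either table is a reasonable first step.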

In [10]:
# Here, we add two features, slope and curvature, computed from the elevation series,
# in case we want to use them for KMeans clustering.
# Note that slope and curvature are wrong where one transect ends and the next one begins.
# However, we will clip those areas, as they are in the water or in the backdune.

data_merged["slope"]=np.gradient(data_merged.z)
data_merged["curve"]=np.gradient(data_merged.slope)
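To see why the values at transect boundaries are wrong, and how they could be avoided rather than clipped, here is a small sketch. It assumes the table has a column identifying each transect; the name `tr_id` is hypothetical and may differ from the column in your data.

```python
import numpy as np
import pandas as pd

# Two toy transects stacked in one table, as in data_merged.
# "tr_id" is an assumed column name used only for this illustration.
toy = pd.DataFrame({
    "tr_id": [0, 0, 0, 1, 1, 1],
    "z": [0.0, 1.0, 2.0, 10.0, 11.0, 12.0],
})

# Gradient over the whole column: the jump between transects leaks
# into the last point of one transect and the first point of the next.
toy["slope_all"] = np.gradient(toy["z"].to_numpy())

# Gradient computed separately within each transect.
toy["slope_by_tr"] = toy.groupby("tr_id")["z"].transform(
    lambda s: np.gradient(s.to_numpy())
)

print(toy)
```

Both toy transects have a constant slope of 1, which only the per-transect version recovers. Note that `np.gradient` needs at least two points per transect.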

In [11]:
# Our rasters store no-data values as -32767.0, so we replace them with np.nan.
# Assigning the result back avoids relying on an in-place change through attribute access.
data_merged["z"]=data_merged["z"].replace(-32767.0,np.nan)
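Note that `slope` and `curve` were computed in the previous cell, while the sentinel value was still present in the `z` column, so the derivatives next to a no-data point carry the sentinel's magnitude. The sketch below, with made-up elevations, shows what masking the sentinel before taking the gradient would change.

```python
import numpy as np

NODATA = -32767.0
z = np.array([1.0, 2.0, NODATA, 4.0, 5.0])

# Gradient with the sentinel still present: its neighbours get huge values.
slope_raw = np.gradient(z)

# Masking the sentinel first turns the affected gradients into NaN,
# which are easier to spot and drop later.
z_masked = np.where(z == NODATA, np.nan, z)
slope_masked = np.gradient(z_masked)

print(slope_raw)
print(slope_masked)
```

If the no-data points all fall in areas that are clipped later, the order does not matter; otherwise, replacing the sentinel before computing slope and curvature is the safer choice.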

## Iterative silhouette analysis


The __get_sil_location__ function iteratively performs KMeans clustering and Silhouette Analysis with an increasing number of clusters (k, specified in the `ks` parameter) for every survey, using the features listed in the `feature_set` parameter.

It returns a dataframe of average silhouette scores for each k and each survey, which we use to find a sub-optimal number of clusters with the __get_opt_k__ function.

Then, with the sub-optimal k, we run KMeans with the __kmeans_sa__ function on all the surveys to obtain clustered points, which can be used to visually discriminate between sand and non-sand in QGIS.
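For readers who want to see the idea behind this workflow outside Sandpyper, the following is a minimal sketch using scikit-learn directly. The silhouette of a point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b); the score reported for each k is the average over all points. This is not the implementation of `get_sil_location` or `get_opt_k`: the synthetic data, the helper name `silhouette_by_k`, and the choice of the highest-scoring k are assumptions made for illustration. The upper bound of `ks` is treated as exclusive, matching the output below, where `ks=(2,30)` yields k from 2 to 29.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, ks=(2, 6), random_state=10):
    """Return {k: average silhouette score} for k in range(ks[0], ks[1])."""
    scores = {}
    for k in range(ks[0], ks[1]):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in for the three band columns: two well-separated blobs.
rng = np.random.default_rng(0)
dark = rng.normal(loc=50, scale=5, size=(100, 3))
bright = rng.normal(loc=200, scale=5, size=(100, 3))
X = np.vstack([dark, bright])

scores = silhouette_by_k(X)
best_k = max(scores, key=scores.get)  # naive pick, not the rule used by get_opt_k

for k, s in scores.items():
    print(f"For n_clusters = {k} The average silhouette_score is : {s:.3f}")
print("k with the highest average silhouette:", best_k)
```

On real imagery the highest score often occurs at k = 2, which is usually too coarse to separate sand from everything else; this is presumably why a sub-optimal k is sought instead of the plain maximum.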

In [13]:
%%time
# Iteratively run KMeans + SA

feature_set=["band1","band2","band3"]
sil_df=get_sil_location(data_merged,
                        ks=(2,30),
                        feature_set=feature_set,
                        random_state=10)

Working on : mar, 2019-05-16.





For n_clusters = 2 The average silhouette_score is : 0.7543939665567415




For n_clusters = 3 The average silhouette_score is : 0.5590529540689352




For n_clusters = 4 The average silhouette_score is : 0.5597768376432101




For n_clusters = 5 The average silhouette_score is : 0.5107457195645853




For n_clusters = 6 The average silhouette_score is : 0.49147067768976826




For n_clusters = 7 The average silhouette_score is : 0.4997813084147024




For n_clusters = 8 The average silhouette_score is : 0.4435375962382009




For n_clusters = 9 The average silhouette_score is : 0.44364207240575093




For n_clusters = 10 The average silhouette_score is : 0.4462891892935896




For n_clusters = 11 The average silhouette_score is : 0.42665113680120165




For n_clusters = 12 The average silhouette_score is : 0.40688093936549236




For n_clusters = 13 The average silhouette_score is : 0.41919539998270466




For n_clusters = 14 The average silhouette_score is : 0.4029391158283152




For n_clusters = 15 The average silhouette_score is : 0.40633497868830076




For n_clusters = 16 The average silhouette_score is : 0.4080035324646522




For n_clusters = 17 The average silhouette_score is : 0.403488429430863




For n_clusters = 18 The average silhouette_score is : 0.40403132905170486




For n_clusters = 19 The average silhouette_score is : 0.40313963732265284




For n_clusters = 20 The average silhouette_score is : 0.3768575897290801




For n_clusters = 21 The average silhouette_score is : 0.40622777213304384




For n_clusters = 22 The average silhouette_score is : 0.3621308985121348




For n_clusters = 23 The average silhouette_score is : 0.3820177660081013




For n_clusters = 24 The average silhouette_score is : 0.38930478237942484




For n_clusters = 25 The average silhouette_score is : 0.37236685519185414




For n_clusters = 26 The average silhouette_score is : 0.3702019504047717




For n_clusters = 27 The average silhouette_score is : 0.37128676587797854




For n_clusters = 28 The average silhouette_score is : 0.35680138177328574




For n_clusters = 29 The average silhouette_score is : 0.37529686408938806
Working on : mar, 2019-03-13.





For n_clusters = 2 The average silhouette_score is : 0.6950985612390032




For n_clusters = 3 The average silhouette_score is : 0.5323208581370846




For n_clusters = 4 The average silhouette_score is : 0.5185963377218871




For n_clusters = 5 The average silhouette_score is : 0.4873307642316771




For n_clusters = 6 The average silhouette_score is : 0.48838245012833004




For n_clusters = 7 The average silhouette_score is : 0.45696599660748133




For n_clusters = 8 The average silhouette_score is : 0.44680629560542023




For n_clusters = 9 The average silhouette_score is : 0.4522911170981031




For n_clusters = 10 The average silhouette_score is : 0.44533580281110047




For n_clusters = 11 The average silhouette_score is : 0.4463197435715727




For n_clusters = 12 The average silhouette_score is : 0.43756472499689997




For n_clusters = 13 The average silhouette_score is : 0.3999697713967485




For n_clusters = 14 The average silhouette_score is : 0.3830035947753742




For n_clusters = 15 The average silhouette_score is : 0.376517037347933




For n_clusters = 16 The average silhouette_score is : 0.37884631280081826




For n_clusters = 17 The average silhouette_score is : 0.3663994774586536




For n_clusters = 18 The average silhouette_score is : 0.36666698771063916




For n_clusters = 19 The average silhouette_score is : 0.3662389671050954




For n_clusters = 20 The average silhouette_score is : 0.35901585624189386




For n_clusters = 21 The average silhouette_score is : 0.34185620198773253




For n_clusters = 22 The average silhouette_score is : 0.33616289100540525




For n_clusters = 23 The average silhouette_score is : 0.3525701571315763




For n_clusters = 24 The average silhouette_score is : 0.32137483513490156




For n_clusters = 25 The average silhouette_score is : 0.3234130030500551




For n_clusters = 26 The average silhouette_score is : 0.3234221143691596




For n_clusters = 27 The average silhouette_score is : 0.3207832686410088




For n_clusters = 28 The average silhouette_score is : 0.32660976522939694




For n_clusters = 29 The average silhouette_score is : 0.3142476396516527
Working on : mar, 2019-02-05.





For n_clusters = 2 The average silhouette_score is : 0.7050449079216488




For n_clusters = 3 The average silhouette_score is : 0.5572061332483237




For n_clusters = 4 The average silhouette_score is : 0.48290371610383537




For n_clusters = 5 The average silhouette_score is : 0.47507821018218427




For n_clusters = 6 The average silhouette_score is : 0.47522585493175146




For n_clusters = 7 The average silhouette_score is : 0.45469304663958504




For n_clusters = 8 The average silhouette_score is : 0.46818802537610255




For n_clusters = 9 The average silhouette_score is : 0.44071576400279017




For n_clusters = 10 The average silhouette_score is : 0.41201627081040165




For n_clusters = 11 The average silhouette_score is : 0.3951563615913788




For n_clusters = 12 The average silhouette_score is : 0.39790913358513175




For n_clusters = 13 The average silhouette_score is : 0.3904244951617831




For n_clusters = 14 The average silhouette_score is : 0.3803196152217041




For n_clusters = 15 The average silhouette_score is : 0.3817320237342481




For n_clusters = 16 The average silhouette_score is : 0.3770246920575449




For n_clusters = 17 The average silhouette_score is : 0.3690155267053958




For n_clusters = 18 The average silhouette_score is : 0.36976220215100225




For n_clusters = 19 The average silhouette_score is : 0.37264715005733795




For n_clusters = 20 The average silhouette_score is : 0.37420613288346194




For n_clusters = 21 The average silhouette_score is : 0.37184117728932364




For n_clusters = 22 The average silhouette_score is : 0.3659601978812967




For n_clusters = 23 The average silhouette_score is : 0.35799520769621673




For n_clusters = 24 The average silhouette_score is : 0.37062287045999015




For n_clusters = 25 The average silhouette_score is : 0.33913070361943537




For n_clusters = 26 The average silhouette_score is : 0.3587452862551262




For n_clusters = 27 The average silhouette_score is : 0.3514491056176596




For n_clusters = 28 The average silhouette_score is : 0.34936229394377744




For n_clusters = 29 The average silhouette_score is : 0.3479702713856142
Working on : mar, 2018-12-11.





For n_clusters = 2 The average silhouette_score is : 0.6258245854229769




For n_clusters = 3 The average silhouette_score is : 0.5755844056627462




For n_clusters = 4 The average silhouette_score is : 0.5327273933986343




For n_clusters = 5 The average silhouette_score is : 0.5144378974212911




For n_clusters = 6 The average silhouette_score is : 0.46557477579783846




For n_clusters = 7 The average silhouette_score is : 0.44092180331777275




For n_clusters = 8 The average silhouette_score is : 0.44779285528309654




For n_clusters = 9 The average silhouette_score is : 0.43293603717109586




For n_clusters = 10 The average silhouette_score is : 0.4242816118470407




For n_clusters = 11 The average silhouette_score is : 0.42483958669232763




For n_clusters = 12 The average silhouette_score is : 0.3980237810614489




For n_clusters = 13 The average silhouette_score is : 0.3775442959580456




For n_clusters = 14 The average silhouette_score is : 0.3694741011196338




For n_clusters = 15 The average silhouette_score is : 0.3570456611400179




For n_clusters = 16 The average silhouette_score is : 0.35455427422764224




For n_clusters = 17 The average silhouette_score is : 0.3565658306360779




For n_clusters = 18 The average silhouette_score is : 0.34293287529459987




For n_clusters = 19 The average silhouette_score is : 0.34850132645391874




For n_clusters = 20 The average silhouette_score is : 0.3304317568143361




For n_clusters = 21 The average silhouette_score is : 0.3361209418879207




For n_clusters = 22 The average silhouette_score is : 0.34057431132823773




For n_clusters = 23 The average silhouette_score is : 0.3074423373220782




For n_clusters = 24 The average silhouette_score is : 0.3273794770683494




For n_clusters = 25 The average silhouette_score is : 0.3175403686141698




For n_clusters = 26 The average silhouette_score is : 0.31353112400825767




For n_clusters = 27 The average silhouette_score is : 0.3276884525328776




For n_clusters = 28 The average silhouette_score is : 0.3142168342551592




For n_clusters = 29 The average silhouette_score is : 0.31407552531677463
Working on : mar, 2018-11-13.





For n_clusters = 2 The average silhouette_score is : 0.6545542532179708




For n_clusters = 3 The average silhouette_score is : 0.5710750391468317




For n_clusters = 4 The average silhouette_score is : 0.5675569042692694




For n_clusters = 5 The average silhouette_score is : 0.5079940975804421




For n_clusters = 6 The average silhouette_score is : 0.44455190796772026




For n_clusters = 7 The average silhouette_score is : 0.42851142793853947




For n_clusters = 8 The average silhouette_score is : 0.43470874548460464




For n_clusters = 9 The average silhouette_score is : 0.3990979902898678




For n_clusters = 10 The average silhouette_score is : 0.389548264724396




For n_clusters = 11 The average silhouette_score is : 0.3936997101873047




For n_clusters = 12 The average silhouette_score is : 0.36462292531021673




For n_clusters = 13 The average silhouette_score is : 0.3720578455063357




For n_clusters = 14 The average silhouette_score is : 0.37500501831884253




For n_clusters = 15 The average silhouette_score is : 0.38214108415489567




For n_clusters = 16 The average silhouette_score is : 0.37537796121605255




For n_clusters = 17 The average silhouette_score is : 0.3698913077901671




For n_clusters = 18 The average silhouette_score is : 0.3611784563600014




For n_clusters = 19 The average silhouette_score is : 0.35724986018258187




For n_clusters = 20 The average silhouette_score is : 0.362238512876087




For n_clusters = 21 The average silhouette_score is : 0.3540277702054937




For n_clusters = 22 The average silhouette_score is : 0.35707696831113883




For n_clusters = 23 The average silhouette_score is : 0.3589305387692913




For n_clusters = 24 The average silhouette_score is : 0.3564906988114842




For n_clusters = 25 The average silhouette_score is : 0.35901365049184086




For n_clusters = 26 The average silhouette_score is : 0.34209730847973585




For n_clusters = 27 The average silhouette_score is : 0.343426148392728




For n_clusters = 28 The average silhouette_score is : 0.3397020392216121




For n_clusters = 29 The average silhouette_score is : 0.329197808452119
Working on : mar, 2018-09-25.





For n_clusters = 2 The average silhouette_score is : 0.6744487403277143




For n_clusters = 3 The average silhouette_score is : 0.5392986693421657




For n_clusters = 4 The average silhouette_score is : 0.5026209541415217




For n_clusters = 5 The average silhouette_score is : 0.4761894975049634




For n_clusters = 6 The average silhouette_score is : 0.4493166300011888




For n_clusters = 7 The average silhouette_score is : 0.4374599393793423




For n_clusters = 8 The average silhouette_score is : 0.4126497751695764




For n_clusters = 9 The average silhouette_score is : 0.41642734513800916




For n_clusters = 10 The average silhouette_score is : 0.3905059200920275




For n_clusters = 11 The average silhouette_score is : 0.38461377960831045




For n_clusters = 12 The average silhouette_score is : 0.3952147077399698




For n_clusters = 13 The average silhouette_score is : 0.3775584583332975




For n_clusters = 14 The average silhouette_score is : 0.36665602781989487




For n_clusters = 15 The average silhouette_score is : 0.3476178796028279




For n_clusters = 16 The average silhouette_score is : 0.3577833627430398




For n_clusters = 17 The average silhouette_score is : 0.3410117883866872




For n_clusters = 18 The average silhouette_score is : 0.3305100257828045




For n_clusters = 19 The average silhouette_score is : 0.3436067711400018




For n_clusters = 20 The average silhouette_score is : 0.34390817353489195




For n_clusters = 21 The average silhouette_score is : 0.3461144859125247




For n_clusters = 22 The average silhouette_score is : 0.3477109849140416




For n_clusters = 23 The average silhouette_score is : 0.35826361367616066




For n_clusters = 24 The average silhouette_score is : 0.33482765940443804




For n_clusters = 25 The average silhouette_score is : 0.3349352816169628




For n_clusters = 26 The average silhouette_score is : 0.3471592663721548




For n_clusters = 27 The average silhouette_score is : 0.35138080825858925




For n_clusters = 28 The average silhouette_score is : 0.3456302917536888




For n_clusters = 29 The average silhouette_score is : 0.34222609513264496
Working on : mar, 2018-07-27.





For n_clusters = 2 The average silhouette_score is : 0.5829181945104884




For n_clusters = 3 The average silhouette_score is : 0.4991657608589235




For n_clusters = 4 The average silhouette_score is : 0.43711383211978866




For n_clusters = 5 The average silhouette_score is : 0.40387097504559516




For n_clusters = 6 The average silhouette_score is : 0.37271224444876




For n_clusters = 7 The average silhouette_score is : 0.3459927343513429




For n_clusters = 8 The average silhouette_score is : 0.3625481400819796




For n_clusters = 9 The average silhouette_score is : 0.35077434405169505




For n_clusters = 10 The average silhouette_score is : 0.35806009248213827




For n_clusters = 11 The average silhouette_score is : 0.34021942036608915




For n_clusters = 12 The average silhouette_score is : 0.3415381446081745




For n_clusters = 13 The average silhouette_score is : 0.34488148535163066




For n_clusters = 14 The average silhouette_score is : 0.34158899213141625




For n_clusters = 15 The average silhouette_score is : 0.31828160326889104




For n_clusters = 16 The average silhouette_score is : 0.32368603222964876




For n_clusters = 17 The average silhouette_score is : 0.33240755302727487




For n_clusters = 18 The average silhouette_score is : 0.3308499343846033




For n_clusters = 19 The average silhouette_score is : 0.33437895899201275




For n_clusters = 20 The average silhouette_score is : 0.3294608175313527




For n_clusters = 21 The average silhouette_score is : 0.3345101952483468




For n_clusters = 22 The average silhouette_score is : 0.3315662484986409




For n_clusters = 23 The average silhouette_score is : 0.3254906982224996




For n_clusters = 24 The average silhouette_score is : 0.32817483132908126




For n_clusters = 25 The average silhouette_score is : 0.3283882083783871




For n_clusters = 26 The average silhouette_score is : 0.3247246103991301




For n_clusters = 27 The average silhouette_score is : 0.33258650871622053




For n_clusters = 28 The average silhouette_score is : 0.31839971082280843




For n_clusters = 29 The average silhouette_score is : 0.3259909704309849
Working on : mar, 2018-06-21.





For n_clusters = 2 The average silhouette_score is : 0.6609436299728905




For n_clusters = 3 The average silhouette_score is : 0.5105105123877249




For n_clusters = 4 The average silhouette_score is : 0.5050349613551376




For n_clusters = 5 The average silhouette_score is : 0.45256012955938446




For n_clusters = 6 The average silhouette_score is : 0.4349581921095916




For n_clusters = 7 The average silhouette_score is : 0.41859038867691084




For n_clusters = 8 The average silhouette_score is : 0.4234211024927338




For n_clusters = 9 The average silhouette_score is : 0.39687201470812955




For n_clusters = 10 The average silhouette_score is : 0.3905707154209053




For n_clusters = 11 The average silhouette_score is : 0.3967275609725787




For n_clusters = 12 The average silhouette_score is : 0.38434726585607015




For n_clusters = 13 The average silhouette_score is : 0.3986738515306575




For n_clusters = 14 The average silhouette_score is : 0.3737025902641873




For n_clusters = 15 The average silhouette_score is : 0.38916974209688804




For n_clusters = 16 The average silhouette_score is : 0.3752412371293679




For n_clusters = 17 The average silhouette_score is : 0.3706667320676292




For n_clusters = 18 The average silhouette_score is : 0.35376695869423946




For n_clusters = 19 The average silhouette_score is : 0.3506500852417926




For n_clusters = 20 The average silhouette_score is : 0.35246371799249554




For n_clusters = 21 The average silhouette_score is : 0.34969524318980466




For n_clusters = 22 The average silhouette_score is : 0.35712780513259396




For n_clusters = 23 The average silhouette_score is : 0.36449864065112




For n_clusters = 24 The average silhouette_score is : 0.36259915011923927




For n_clusters = 25 The average silhouette_score is : 0.3632761644662602




For n_clusters = 26 The average silhouette_score is : 0.3615503998187348




For n_clusters = 27 The average silhouette_score is : 0.35910807606656103




For n_clusters = 28 The average silhouette_score is : 0.3633695806923143




For n_clusters = 29 The average silhouette_score is : 0.3502038741046167
Working on : mar, 2018-06-01.





For n_clusters = 2 The average silhouette_score is : 0.578290780043451




For n_clusters = 3 The average silhouette_score is : 0.4504509920689841




For n_clusters = 4 The average silhouette_score is : 0.4249015523189814




For n_clusters = 5 The average silhouette_score is : 0.4141959217333221




For n_clusters = 6 The average silhouette_score is : 0.3725695245735541




For n_clusters = 7 The average silhouette_score is : 0.3904307832590041




For n_clusters = 8 The average silhouette_score is : 0.38825665407100834




For n_clusters = 9 The average silhouette_score is : 0.3621595657627527




For n_clusters = 10 The average silhouette_score is : 0.3587727609149331




For n_clusters = 11 The average silhouette_score is : 0.36672030461453325




For n_clusters = 12 The average silhouette_score is : 0.3702095671177475




For n_clusters = 13 The average silhouette_score is : 0.3509654932786643




For n_clusters = 14 The average silhouette_score is : 0.336122089714643




For n_clusters = 15 The average silhouette_score is : 0.3619735857276992




For n_clusters = 16 The average silhouette_score is : 0.3439128953907145




For n_clusters = 17 The average silhouette_score is : 0.3515310511131326




For n_clusters = 18 The average silhouette_score is : 0.35448912674256056




For n_clusters = 19 The average silhouette_score is : 0.3383383167789686




For n_clusters = 20 The average silhouette_score is : 0.3588742492537112




For n_clusters = 21 The average silhouette_score is : 0.3366145839290013




For n_clusters = 22 The average silhouette_score is : 0.34092521588014096




For n_clusters = 23 The average silhouette_score is : 0.34407015003613445




For n_clusters = 24 The average silhouette_score is : 0.33922677592652745




For n_clusters = 25 The average silhouette_score is : 0.33636386674664476




For n_clusters = 26 The average silhouette_score is : 0.3408016979032742




For n_clusters = 27 The average silhouette_score is : 0.3398205970662597




For n_clusters = 28 The average silhouette_score is : 0.3331857698617566




For n_clusters = 29 The average silhouette_score is : 0.32752843575403423



Working on : leo, 2019-07-31.





For n_clusters = 2 The average silhouette_score is : 0.5518537832303795




For n_clusters = 3 The average silhouette_score is : 0.5150786057848612




For n_clusters = 4 The average silhouette_score is : 0.513486998364905




For n_clusters = 5 The average silhouette_score is : 0.47680759479265894




For n_clusters = 6 The average silhouette_score is : 0.4582328487690697




For n_clusters = 7 The average silhouette_score is : 0.44228542995869785




For n_clusters = 8 The average silhouette_score is : 0.411621151194679




For n_clusters = 9 The average silhouette_score is : 0.3817713441906702




For n_clusters = 10 The average silhouette_score is : 0.36502530439019965




For n_clusters = 11 The average silhouette_score is : 0.371644582370442




For n_clusters = 12 The average silhouette_score is : 0.3584694589253414




For n_clusters = 13 The average silhouette_score is : 0.34121690056210346




For n_clusters = 14 The average silhouette_score is : 0.33050873808547393




For n_clusters = 15 The average silhouette_score is : 0.3171252488805076




For n_clusters = 16 The average silhouette_score is : 0.32201647437053604




For n_clusters = 17 The average silhouette_score is : 0.32539224359804947




For n_clusters = 18 The average silhouette_score is : 0.3238518489798979




For n_clusters = 19 The average silhouette_score is : 0.31309663935999804




For n_clusters = 20 The average silhouette_score is : 0.31735420126470026




For n_clusters = 21 The average silhouette_score is : 0.3225964077502343




For n_clusters = 22 The average silhouette_score is : 0.31575969131507103




For n_clusters = 23 The average silhouette_score is : 0.3177461221811649




For n_clusters = 24 The average silhouette_score is : 0.3141507730878398




For n_clusters = 25 The average silhouette_score is : 0.3111576806410848




For n_clusters = 26 The average silhouette_score is : 0.31403853103298535




For n_clusters = 27 The average silhouette_score is : 0.31352127191491164




For n_clusters = 28 The average silhouette_score is : 0.31188382066667997




For n_clusters = 29 The average silhouette_score is : 0.3065801460480327
Working on : leo, 2019-03-28.





For n_clusters = 2 The average silhouette_score is : 0.5634616425925951




For n_clusters = 3 The average silhouette_score is : 0.5236636698122008




For n_clusters = 4 The average silhouette_score is : 0.5203349956914852




For n_clusters = 5 The average silhouette_score is : 0.4914633214135697




For n_clusters = 6 The average silhouette_score is : 0.4623972530110335




For n_clusters = 7 The average silhouette_score is : 0.4449742830495565




For n_clusters = 8 The average silhouette_score is : 0.4167587607731406




For n_clusters = 9 The average silhouette_score is : 0.3942027060264091




For n_clusters = 10 The average silhouette_score is : 0.3774653770452928




For n_clusters = 11 The average silhouette_score is : 0.3606511601691365




For n_clusters = 12 The average silhouette_score is : 0.36989352137501286




For n_clusters = 13 The average silhouette_score is : 0.3604795080890571




For n_clusters = 14 The average silhouette_score is : 0.34827891116634985




For n_clusters = 15 The average silhouette_score is : 0.3530454178443469




For n_clusters = 16 The average silhouette_score is : 0.3449996335663034




For n_clusters = 17 The average silhouette_score is : 0.3310212230727586




For n_clusters = 18 The average silhouette_score is : 0.31918000625888904




For n_clusters = 19 The average silhouette_score is : 0.3221455501099101




For n_clusters = 20 The average silhouette_score is : 0.33680368324081006




For n_clusters = 21 The average silhouette_score is : 0.32465051208369783




For n_clusters = 22 The average silhouette_score is : 0.3214639502983794




For n_clusters = 23 The average silhouette_score is : 0.32910621330347223




For n_clusters = 24 The average silhouette_score is : 0.31975056128330903




For n_clusters = 25 The average silhouette_score is : 0.31963813504630434




For n_clusters = 26 The average silhouette_score is : 0.3153350415143114




For n_clusters = 27 The average silhouette_score is : 0.31784138336009476




For n_clusters = 28 The average silhouette_score is : 0.3142991975235358




For n_clusters = 29 The average silhouette_score is : 0.3112165656643442
Working on : leo, 2019-02-11.





For n_clusters = 2 The average silhouette_score is : 0.5282994882885715




For n_clusters = 3 The average silhouette_score is : 0.5295847331979272




For n_clusters = 4 The average silhouette_score is : 0.47379611268791894




For n_clusters = 5 The average silhouette_score is : 0.4326705257010293




For n_clusters = 6 The average silhouette_score is : 0.4197133058559315




For n_clusters = 7 The average silhouette_score is : 0.4008491269016066




For n_clusters = 8 The average silhouette_score is : 0.3771677079137611




For n_clusters = 9 The average silhouette_score is : 0.35291870357111876




For n_clusters = 10 The average silhouette_score is : 0.3375654288873137




For n_clusters = 11 The average silhouette_score is : 0.3256210090536853




For n_clusters = 12 The average silhouette_score is : 0.33219359650764335




For n_clusters = 13 The average silhouette_score is : 0.3367969187752316




For n_clusters = 14 The average silhouette_score is : 0.33737725354846687




For n_clusters = 15 The average silhouette_score is : 0.3149574126829252




For n_clusters = 16 The average silhouette_score is : 0.31767854912575916




For n_clusters = 17 The average silhouette_score is : 0.3065665397669977




For n_clusters = 18 The average silhouette_score is : 0.31452087211398766




For n_clusters = 19 The average silhouette_score is : 0.3101889186280614




For n_clusters = 20 The average silhouette_score is : 0.31268153411467214




For n_clusters = 21 The average silhouette_score is : 0.30702775852162406




For n_clusters = 22 The average silhouette_score is : 0.30975068928433946




For n_clusters = 23 The average silhouette_score is : 0.3041787687958906




For n_clusters = 24 The average silhouette_score is : 0.29989827414339865




For n_clusters = 25 The average silhouette_score is : 0.301185682808451




For n_clusters = 26 The average silhouette_score is : 0.29293325977779505




For n_clusters = 27 The average silhouette_score is : 0.2934237427714375




For n_clusters = 28 The average silhouette_score is : 0.2977416853076181




For n_clusters = 29 The average silhouette_score is : 0.29457248708235884
Working on : leo, 2018-09-20.





For n_clusters = 2 The average silhouette_score is : 0.5290878074320715




For n_clusters = 3 The average silhouette_score is : 0.5043780771951185




For n_clusters = 4 The average silhouette_score is : 0.48661424583677965




For n_clusters = 5 The average silhouette_score is : 0.4779911214169779




For n_clusters = 6 The average silhouette_score is : 0.4511960058429212




For n_clusters = 7 The average silhouette_score is : 0.42393376513659453




For n_clusters = 8 The average silhouette_score is : 0.39732880587067204




For n_clusters = 9 The average silhouette_score is : 0.37607787140511445




For n_clusters = 10 The average silhouette_score is : 0.3545358664422001




For n_clusters = 11 The average silhouette_score is : 0.33852366272477286




For n_clusters = 12 The average silhouette_score is : 0.3229471779694178




For n_clusters = 13 The average silhouette_score is : 0.32717829683310523




For n_clusters = 14 The average silhouette_score is : 0.32765356666247325




For n_clusters = 15 The average silhouette_score is : 0.3250691953501139




For n_clusters = 16 The average silhouette_score is : 0.3275686317177245




For n_clusters = 17 The average silhouette_score is : 0.32569298347331876




For n_clusters = 18 The average silhouette_score is : 0.3169563642540465




For n_clusters = 19 The average silhouette_score is : 0.32341944255684935




For n_clusters = 20 The average silhouette_score is : 0.3228541575082305




For n_clusters = 21 The average silhouette_score is : 0.3130903370833637




For n_clusters = 22 The average silhouette_score is : 0.31659527044291735




For n_clusters = 23 The average silhouette_score is : 0.31731461346821166




For n_clusters = 24 The average silhouette_score is : 0.31456631064485746




For n_clusters = 25 The average silhouette_score is : 0.302726759627373




For n_clusters = 26 The average silhouette_score is : 0.30018236756712247




For n_clusters = 27 The average silhouette_score is : 0.3096602531925565




For n_clusters = 28 The average silhouette_score is : 0.30378902276775266




For n_clusters = 29 The average silhouette_score is : 0.3062314982924048
Working on : leo, 2018-07-13.





For n_clusters = 2 The average silhouette_score is : 0.5753864268697486




For n_clusters = 3 The average silhouette_score is : 0.5375197361463815




For n_clusters = 4 The average silhouette_score is : 0.5017786518848011




For n_clusters = 5 The average silhouette_score is : 0.4763433737371254




For n_clusters = 6 The average silhouette_score is : 0.4494917166323217




For n_clusters = 7 The average silhouette_score is : 0.4260131276274154




For n_clusters = 8 The average silhouette_score is : 0.40839500443489574




For n_clusters = 9 The average silhouette_score is : 0.3916889473245145




For n_clusters = 10 The average silhouette_score is : 0.39576221609634094




For n_clusters = 11 The average silhouette_score is : 0.38472330080779166




For n_clusters = 12 The average silhouette_score is : 0.39238748229375925




For n_clusters = 13 The average silhouette_score is : 0.393199195296985




For n_clusters = 14 The average silhouette_score is : 0.3802363882192906




For n_clusters = 15 The average silhouette_score is : 0.37495700925702363




For n_clusters = 16 The average silhouette_score is : 0.3780030571631571




For n_clusters = 17 The average silhouette_score is : 0.36845372501257684




For n_clusters = 18 The average silhouette_score is : 0.3714174340753664




For n_clusters = 19 The average silhouette_score is : 0.35969308369209935




For n_clusters = 20 The average silhouette_score is : 0.3608320765556475




For n_clusters = 21 The average silhouette_score is : 0.3462759582316819




For n_clusters = 22 The average silhouette_score is : 0.34718349870682197




For n_clusters = 23 The average silhouette_score is : 0.3384466996890852




For n_clusters = 24 The average silhouette_score is : 0.3309578697587057




For n_clusters = 25 The average silhouette_score is : 0.3460126222671057




For n_clusters = 26 The average silhouette_score is : 0.3321933742504139




For n_clusters = 27 The average silhouette_score is : 0.3328219463363413




For n_clusters = 28 The average silhouette_score is : 0.33246087294394544




For n_clusters = 29 The average silhouette_score is : 0.32819783249764006
Working on : leo, 2018-06-06.





For n_clusters = 2 The average silhouette_score is : 0.5008678386014929




For n_clusters = 3 The average silhouette_score is : 0.5176453893418258




For n_clusters = 4 The average silhouette_score is : 0.48915366432594615




For n_clusters = 5 The average silhouette_score is : 0.46995713766600206




For n_clusters = 6 The average silhouette_score is : 0.4483522398737253




For n_clusters = 7 The average silhouette_score is : 0.4195879451291704




For n_clusters = 8 The average silhouette_score is : 0.40021927696111975




For n_clusters = 9 The average silhouette_score is : 0.38270233817458005




For n_clusters = 10 The average silhouette_score is : 0.3720573840083791




For n_clusters = 11 The average silhouette_score is : 0.3787622560915184




For n_clusters = 12 The average silhouette_score is : 0.35882123700197605




For n_clusters = 13 The average silhouette_score is : 0.3473311880751708




For n_clusters = 14 The average silhouette_score is : 0.373423481407436




For n_clusters = 15 The average silhouette_score is : 0.3581056860741961




For n_clusters = 16 The average silhouette_score is : 0.36718095623730623




For n_clusters = 17 The average silhouette_score is : 0.35299848526882444




For n_clusters = 18 The average silhouette_score is : 0.344531953822663




For n_clusters = 19 The average silhouette_score is : 0.34947215865786446




For n_clusters = 20 The average silhouette_score is : 0.3381533889905088




For n_clusters = 21 The average silhouette_score is : 0.3273988286615219




For n_clusters = 22 The average silhouette_score is : 0.3326053650447848




For n_clusters = 23 The average silhouette_score is : 0.3335733836359078




For n_clusters = 24 The average silhouette_score is : 0.32235107406792446




For n_clusters = 25 The average silhouette_score is : 0.3227988561097893




For n_clusters = 26 The average silhouette_score is : 0.32268035694219194




For n_clusters = 27 The average silhouette_score is : 0.31950903037311534




For n_clusters = 28 The average silhouette_score is : 0.31749012568521073




For n_clusters = 29 The average silhouette_score is : 0.31870873417125234
Wall time: 23min 10s


## Sub-optimal k

Find a sub-optimal k for each survey by searching for inflexion points in the silhouette curve, where an additional cluster does not considerably degrade the overall clustering performance.

In [14]:
opt_k=get_opt_k(sil_df, sigma=0)
opt_k

{'leo_2018-06-06': 10,
 'leo_2018-07-13': 9,
 'leo_2018-09-20': 12,
 'leo_2019-02-11': 11,
 'leo_2019-03-28': 11,
 'leo_2019-07-31': 10,
 'mar_2018-06-01': 6,
 'mar_2018-06-21': 7,
 'mar_2018-07-27': 7,
 'mar_2018-09-25': 8,
 'mar_2018-11-13': 7,
 'mar_2018-12-11': 7,
 'mar_2019-02-05': 5,
 'mar_2019-03-13': 5,
 'mar_2019-05-16': 3}
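
The idea of an inflexion point can be illustrated with a few lines of NumPy. The sketch below is not the implementation of `get_opt_k`; it is a simplified illustration which assumes that the mean silhouette scores of one survey are available as a list ordered by k. The helper name `inflexion_candidates` is hypothetical. It flags the k values at which the curvature (second difference) of the silhouette curve changes sign.

```python
import numpy as np


def inflexion_candidates(ks, scores):
    """Return the k values just before the curvature of the score curve flips sign."""
    ks = np.asarray(ks)
    scores = np.asarray(scores, dtype=float)
    # Second discrete difference: d2[i] is the curvature centred on ks[i + 1].
    d2 = np.diff(scores, n=2)
    sign = np.sign(d2)
    # Two consecutive curvature values with opposite signs mark an inflexion.
    change = np.where(sign[:-1] * sign[1:] < 0)[0]
    return ks[change + 1].tolist()


# Mean silhouette scores for k=2..12 of the leo 2018-06-06 survey,
# rounded from the log above.
ks = list(range(2, 13))
scores = [0.5009, 0.5176, 0.4892, 0.4700, 0.4484, 0.4196,
          0.4002, 0.3827, 0.3721, 0.3788, 0.3588]
print(inflexion_candidates(ks, scores))
```

On noisy curves such as this one, several candidates are usually returned, which is why smoothing the scores first (possibly what the `sigma` argument controls, although this is an assumption) or inspecting the candidates manually can help.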

If we are not satisfied with the sub-optimal k returned by the algorithm, we can manually specify the k of each survey
by defining a dictionary.

In [12]:
# Based on our observations on a dataset comprising 87 surveys, 10 clusters (k=10) is generally a good tradeoff.

opt_k={'leo_2018-06-06': 10,
 'leo_2018-07-13': 10,
 'leo_2018-09-20': 10,
 'leo_2019-02-11': 10,
 'leo_2019-03-28': 10,
 'leo_2019-07-31': 10,
 'mar_2018-06-01': 10,
 'mar_2018-06-21': 10,
 'mar_2018-07-27': 10,
 'mar_2018-09-25': 10,
 'mar_2018-11-13': 10,
 'mar_2018-12-11': 10,
 'mar_2019-02-05': 10,
 'mar_2019-03-13': 10,
 'mar_2019-05-16': 10}
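
When every survey should use the same k, the dictionary does not have to be typed by hand. A minimal sketch, assuming the keys follow the `<location>_<survey_date>` pattern shown above; the `survey_dates` mapping and the `uniform_k` variable are hypothetical names introduced for this illustration.

```python
# Survey dates per location, as listed in the dictionary above.
survey_dates = {
    "leo": ["2018-06-06", "2018-07-13", "2018-09-20",
            "2019-02-11", "2019-03-28", "2019-07-31"],
    "mar": ["2018-06-01", "2018-06-21", "2018-07-27",
            "2018-09-25", "2018-11-13", "2018-12-11",
            "2019-02-05", "2019-03-13", "2019-05-16"],
}

# Assign the same k to every "<location>_<survey_date>" key.
uniform_k = 10
opt_k = {
    f"{loc}_{date}": uniform_k
    for loc, dates in survey_dates.items()
    for date in dates
}
```

If `opt_k` has already been computed, `dict.fromkeys(opt_k, 10)` produces the same uniform dictionary from its existing keys.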

Alternatively, we can update one value only. For instance, it is unlikely that 3 clusters are enough for the mar_2019-05-16 dataset.<br>
So, we replace only that value with 10.


In [15]:
opt_k['mar_2019-05-16']=10
opt_k

{'leo_2018-06-06': 10,
 'leo_2018-07-13': 9,
 'leo_2018-09-20': 12,
 'leo_2019-02-11': 11,
 'leo_2019-03-28': 11,
 'leo_2019-07-31': 10,
 'mar_2018-06-01': 6,
 'mar_2018-06-21': 7,
 'mar_2018-07-27': 7,
 'mar_2018-09-25': 8,
 'mar_2018-11-13': 7,
 'mar_2018-12-11': 7,
 'mar_2019-02-05': 5,
 'mar_2019-03-13': 5,
 'mar_2019-05-16': 10}

## Optimised K-Means clustering

Using the sub-optimal k dictionary and the same feature set, we can finally cluster the dataset.

In [19]:
feature_set=["band1","band2","band3"]
data_classified=kmeans_sa(data_merged,opt_k, feature_set=feature_set)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_merged.dropna(inplace=True)



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_in["label_k"] = clusterer.fit_predict(minmax_scaled_df)
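
For readers curious about what this step involves, the sketch below clusters each survey separately with scikit-learn. It is not the source code of `kmeans_sa`; the function name `cluster_surveys`, the `random_state` value and the toy table are assumptions made for illustration. The warnings above suggest that the features are min-max scaled before `fit_predict` is called, so the sketch does the same. It also works on an explicit copy of each survey, which is the usual way to avoid the `SettingWithCopyWarning`.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler


def cluster_surveys(points, opt_k, feature_set, random_state=10):
    """Label the points of each survey with KMeans, using the k stored in opt_k."""
    labelled = []
    for (location, survey_date), survey in points.groupby(["location", "survey_date"]):
        k = opt_k[f"{location}_{survey_date}"]
        # Explicit copy: adding a column to it cannot trigger SettingWithCopyWarning.
        survey = survey.dropna(subset=feature_set).copy()
        # Rescale every feature to the [0, 1] range before clustering.
        scaled = MinMaxScaler().fit_transform(survey[feature_set])
        clusterer = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        survey["label_k"] = clusterer.fit_predict(scaled)
        labelled.append(survey)
    return pd.concat(labelled, ignore_index=True)


# Toy table (hypothetical values): three dark and three bright points in one survey.
toy = pd.DataFrame({
    "location": ["leo"] * 6,
    "survey_date": ["2018-06-06"] * 6,
    "band1": [10, 12, 11, 200, 205, 198],
    "band2": [14, 13, 15, 190, 195, 192],
    "band3": [9, 11, 10, 160, 165, 162],
})
labelled = cluster_surveys(toy, {"leo_2018-06-06": 2}, ["band1", "band2", "band3"])
print(labelled[["band1", "label_k"]])
```

Clustering each survey independently means that a given `label_k` value has no meaning across surveys: cluster 3 in one survey is unrelated to cluster 3 in another.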

In [20]:
data_classified=pd.merge(data_classified[["point_id","label_k"]],data_merged, how="left", on="point_id", validate="one_to_one")
data_classified

Unnamed: 0,point_id,label_k,distance,z,tr_id,raw_date,coordinates,location,survey_date,x,y,geometry,band1,band2,band3,slope,curve
0,67143080l2610320eo00,3,0.2,1.105616,47,20180606,POINT (299873.4167173313 5773731.881880409),leo,2018-06-06,299873.4167173313,5773731.881880409,POINT (299873.417 5773731.882),141.0,142.0,132.0,-0.006003,0.002122
1,67142080l2670630eo00,3,0.3,1.101189,47,20180606,POINT (299873.516093276 5773731.893034852),leo,2018-06-06,299873.51609327603,5773731.893034852,POINT (299873.516 5773731.893),148.0,148.0,143.0,-0.003264,0.001769
2,67142080l2600940eo00,3,0.4,1.099089,47,20180606,POINT (299873.6154692209 5773731.904189295),leo,2018-06-06,299873.61546922085,5773731.904189295,POINT (299873.615 5773731.904),140.0,142.0,129.0,-0.002465,0.001138
3,67146080l2650750eo00,6,0.5,1.096259,47,20180606,POINT (299873.7148451657 5773731.915343738),leo,2018-06-06,299873.71484516567,5773731.915343738,POINT (299873.715 5773731.915),162.0,165.0,155.0,-0.000988,0.001301
4,67141080l2600560eo00,3,0.6,1.097113,47,20180606,POINT (299873.8142211105 5773731.92649818),leo,2018-06-06,299873.8142211105,5773731.92649818,POINT (299873.814 5773731.926),152.0,154.0,137.0,0.000136,0.001117
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
185180,60108091m2528500ar21,9,28.1,1.753726,0,20190516,POINT (731474.0709976825 5705142.514267173),mar,2019-05-16,731474.0709976825,5705142.514267173,POINT (731474.071 5705142.514),198.0,190.0,164.0,-0.007479,0.001382
185181,60103091m2518200ar22,4,28.2,1.748035,0,20190516,POINT (731474.1704055312 5705142.503400728),mar,2019-05-16,731474.1704055312,5705142.503400728,POINT (731474.170 5705142.503),196.0,187.0,161.0,-0.006537,0.000006
185182,60107091m2598900ar23,9,28.3,1.740652,0,20190516,POINT (731474.2698133799 5705142.492534284),mar,2019-05-16,731474.2698133799,5705142.492534284,POINT (731474.270 5705142.493),200.0,192.0,165.0,-0.007468,-0.000615
185183,60102091m2588500ar24,9,28.4,1.733099,0,20190516,POINT (731474.3692212285 5705142.481667838),mar,2019-05-16,731474.3692212285,5705142.481667838,POINT (731474.369 5705142.482),200.0,191.0,164.0,-0.007767,-0.000747


### GOOD!

Save the __data_classified__ dataframe as a CSV file and head to the __Example_3_Labels_correction_and_multitemporal_table notebook__.

In [21]:
data_classified.to_csv(r"C:\my_packages\doc_data\labels\data_classified.csv", index=False)

___