# Clustering with stat_tool   
## This notebook illustrates how to cluster vectors with stat_tool.cluster ##

Load the data set chene_sessile.vec from the stat_tool shared data

In [1]:
from openalea.stat_tool import get_shared_data
from openalea.stat_tool.data_transform import SelectVariable
from openalea.stat_tool.comparison import Compare
from openalea.stat_tool.cluster import Clustering

from openalea.stat_tool.vectors import VectorDistance, Vectors

vec = Vectors(get_shared_data("chene_sessile.vec"))



In [2]:
vec.nb_vector

138

In [3]:
vec.nb_variable

6

vec contains 138 vectors of dimension 6 (6 variables per vector).

Discard variables 1, 3 and 6 of vec (Mode="Reject"); the resulting vec2 keeps the remaining 3 variables.

In [4]:
vec2 = SelectVariable(vec, [1, 3, 6], Mode="Reject")
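To make the effect of Mode="Reject" concrete, here is a plain-Python sketch that does not call stat_tool. It assumes, as in stat_tool, that variables are numbered from 1; the listed variables are dropped and the others are kept in order. The helper reject_variables and the toy data are hypothetical.

```python
# Hypothetical toy data: each vector has 6 variables, numbered 1..6.
vectors = [
    [10, 2, 30, 4, 50, 6],
    [11, 3, 31, 5, 51, 7],
]

def reject_variables(vectors, rejected):
    """Keep every variable whose 1-based index is not listed in `rejected`."""
    rejected = set(rejected)
    return [[x for j, x in enumerate(v, start=1) if j not in rejected]
            for v in vectors]

kept = reject_variables(vectors, [1, 3, 6])
print(kept)  # [[2, 4, 50], [3, 5, 51]]
```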

Compute the 138x138 distance matrix between the vectors of vec2, using a standardization procedure.
Each "N" argument of VectorDistance declares one of the 3 remaining variables as numerical.

In [5]:
matrix = Compare(vec2, VectorDistance("N", "N", "N"))

In [6]:
matrix.nb_column, matrix.nb_row

(138, 138)
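Standardization matters because the variables can be on very different scales. The sketch below, in plain Python, shows the idea under stated assumptions: each numerical variable is divided by its standard deviation and the scaled differences are combined as a Euclidean distance. stat_tool's own dispersion measure and distance combination may differ, so these values will not reproduce matrix; the helper name and toy data are hypothetical.

```python
import math

def standardized_distance_matrix(vectors):
    """Pairwise distances after scaling each variable by its standard deviation.

    Assumption: Euclidean combination of standardized differences; stat_tool's
    own standardization may use a different dispersion measure.
    """
    n, p = len(vectors), len(vectors[0])
    sds = []
    for j in range(p):
        col = [v[j] for v in vectors]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        sds.append(sd if sd > 0 else 1.0)  # constant variable: avoid division by zero
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d = math.sqrt(sum(((vectors[a][j] - vectors[b][j]) / sds[j]) ** 2
                              for j in range(p)))
            dist[a][b] = dist[b][a] = d  # the matrix is symmetric
    return dist

# Hypothetical toy data: 3 numerical variables on very different scales.
toy = [[1.0, 100.0, 0.1], [2.0, 300.0, 0.2], [3.0, 200.0, 0.3]]
m = standardized_distance_matrix(toy)
print(len(m), len(m[0]))  # 3 3
```

Because each variable is divided by its own spread, multiplying a variable by a constant (a change of unit) leaves the distances unchanged.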

## Clustering using a partitioning method

In [7]:
clust1 = Clustering(matrix, "Partition", 2)
print(clust1)

cluster 1 (69 vectors): 69, 48, 41, 44, 32, 47, 81, 95, 11, 36, 75, 108, 56, 83, 38, 98, 113, 134, 110, 101, 77, 35, 74, 80, 50, 24, 89, 128, 5, 45, 8, 116, 119, 132, 61, 78, 53, 29, 131, 65, 90, 96, 104, 20, 86, 66, 42, 68, 125, 14, 23, 54, 33, 26, 71, 129, 102, 51, 70, 111, 138, 19, 127, 62, 117, 137, 2, 28, 17
cluster 2 (69 vectors): 100, 13, 133, 105, 72, 9, 93, 109, 30, 115, 63, 7, 55, 37, 15, 114, 106, 46, 73, 18, 3, 87, 58, 43, 60, 76, 52, 6, 39, 31, 12, 99, 121, 123, 22, 79, 94, 88, 21, 97, 25, 40, 57, 136, 67, 49, 10, 4, 120, 92, 27, 91, 64, 124, 16, 130, 84, 107, 126, 103, 122, 112, 59, 1, 82, 34, 135, 118, 85

cluster distance matrix

          | cluster 1 | cluster 2
cluster 1   0.444986   1.46646
cluster 2    1.46646  0.608382

          | within-cluster distance | between-cluster distance | diameter | separation
cluster 1   0.444986   1.46646    1.26444  0.0656281
cluster 2   0.608382   1.46646    2.02706  0.0656281

cluster 1: non-isolated
cluster 2: non-isolated
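The "Partition" method builds a fixed number of clusters from the distance matrix alone. The algorithm stat_tool uses is not described in this notebook, so as an illustration here is a plain k-medoids sketch (a swapped-in technique, not necessarily stat_tool's). It alternates two steps: assign each item to its nearest medoid, then re-elect in each cluster the member with the smallest total distance to the others. The function name, its default initialization and the toy data are hypothetical.

```python
def k_medoids(dist, k, init=None, max_iter=100):
    """Alternating k-medoids on a precomputed (0-based) distance matrix.

    Returns (medoids, assignment): the medoid indices and, for every item,
    the index (0..k-1) of the cluster it belongs to.
    """
    n = len(dist)
    # Assumption: by default the first k items seed the medoids.
    medoids = list(init) if init is not None else list(range(k))

    def assign(meds):
        # Each item joins the cluster of its nearest medoid (first one on ties).
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        assignment = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assignment[i] == c]
            if not members:  # keep the old medoid of an empty cluster
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Hypothetical toy data: two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, assignment = k_medoids(dist, 2)
print(medoids, assignment)  # [1, 4] [0, 0, 0, 1, 1, 1]
```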



Number of clusters

In [8]:
nb_clusters = clust1.get_nb_cluster()
print(nb_clusters)

2


Cluster to which individual ("pattern") 2 is assigned

In [9]:
clust1.get_assignment(2)

1

To get the partition, i.e. one list of individual indices per cluster:

In [10]:
part1 = [[i for i in range(1,vec.nb_vector+1) if clust1.get_assignment(i) == c] for c in range(1,nb_clusters+1)]
print(part1)

[[2, 5, 8, 11, 14, 17, 19, 20, 23, 24, 26, 28, 29, 32, 33, 35, 36, 38, 41, 42, 44, 45, 47, 48, 50, 51, 53, 54, 56, 61, 62, 65, 66, 68, 69, 70, 71, 74, 75, 77, 78, 80, 81, 83, 86, 89, 90, 95, 96, 98, 101, 102, 104, 108, 110, 111, 113, 116, 117, 119, 125, 127, 128, 129, 131, 132, 134, 137, 138], [1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16, 18, 21, 22, 25, 27, 30, 31, 34, 37, 39, 40, 43, 46, 49, 52, 55, 57, 58, 59, 60, 63, 64, 67, 72, 73, 76, 79, 82, 84, 85, 87, 88, 91, 92, 93, 94, 97, 99, 100, 103, 105, 106, 107, 109, 112, 114, 115, 118, 120, 121, 122, 123, 124, 126, 130, 133, 135, 136]]
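The comprehension above calls get_assignment once per individual and per cluster. The same partition can be built in a single pass by grouping individuals by their cluster label. The sketch below is plain Python; the assignment dictionary is hypothetical toy data standing in for the values returned by clust1.get_assignment(i).

```python
from collections import defaultdict

def partition_from_assignment(assignment):
    """Group 1-based individual indices by their 1-based cluster label.

    `assignment` maps individual -> cluster, e.g. as collected with
    {i: clust1.get_assignment(i) for i in range(1, vec.nb_vector + 1)}.
    """
    groups = defaultdict(list)
    for individual in sorted(assignment):  # keep individuals in increasing order
        groups[assignment[individual]].append(individual)
    return [groups[c] for c in sorted(groups)]

# Hypothetical assignment of 6 individuals to 2 clusters.
assignment = {1: 2, 2: 1, 3: 2, 4: 2, 5: 1, 6: 1}
print(partition_from_assignment(assignment))  # [[2, 5, 6], [1, 3, 4]]
```

Unlike the comprehension, this version silently omits a cluster that receives no individual.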


Recreate the clusters from the partition part1

In [11]:
print(matrix.partitioning_clusters(part1))

cluster 1 (69 vectors): 69, 48, 41, 44, 32, 47, 81, 95, 11, 36, 75, 108, 56, 83, 38, 98, 113, 134, 110, 101, 77, 35, 74, 80, 50, 24, 89, 128, 5, 45, 8, 116, 119, 132, 61, 78, 53, 29, 131, 65, 90, 96, 104, 20, 86, 66, 42, 68, 125, 14, 23, 54, 33, 26, 71, 129, 102, 51, 70, 111, 138, 19, 127, 62, 117, 137, 2, 28, 17
cluster 2 (69 vectors): 100, 13, 133, 105, 72, 9, 93, 109, 30, 115, 63, 7, 55, 37, 15, 114, 106, 46, 73, 18, 3, 87, 58, 43, 60, 76, 52, 6, 39, 31, 12, 99, 121, 123, 22, 79, 94, 88, 21, 97, 25, 40, 57, 136, 67, 49, 10, 4, 120, 92, 27, 91, 64, 124, 16, 130, 84, 107, 126, 103, 122, 112, 59, 1, 82, 34, 135, 118, 85

cluster distance matrix

          | cluster 1 | cluster 2
cluster 1   0.444986   1.46646
cluster 2    1.46646  0.608382

          | within-cluster distance | between-cluster distance | diameter | separation
cluster 1   0.444986   1.46646    1.26444  0.0656281
cluster 2   0.608382   1.46646    2.02706  0.0656281

cluster 1: non-isolated
cluster 2: non-isolated
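The summary table reports, per cluster, a within-cluster distance, a between-cluster distance, a diameter and a separation. The sketch below computes such statistics from a distance matrix and a partition, assuming common definitions: within = mean distance between distinct members, between = mean distance from a member to a non-member, diameter = largest within-cluster distance, separation = smallest distance to a non-member. These definitions are assumptions and may differ in detail from stat_tool's; the function name and toy data are hypothetical.

```python
from itertools import combinations

def cluster_summary(dist, partition):
    """Per-cluster statistics from a distance matrix and a 0-based partition."""
    n = len(dist)
    stats = []
    for members in partition:
        inside = set(members)
        outside = [j for j in range(n) if j not in inside]
        pairs = [dist[a][b] for a, b in combinations(members, 2)]  # within the cluster
        cross = [dist[a][b] for a in members for b in outside]      # to other clusters
        stats.append({
            "within": sum(pairs) / len(pairs) if pairs else 0.0,
            "between": sum(cross) / len(cross) if cross else 0.0,
            "diameter": max(pairs, default=0.0),
            "separation": min(cross, default=0.0),
        })
    return stats

# Hypothetical toy data: two groups of three points on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
summary = cluster_summary(dist, [[0, 1, 2], [3, 4, 5]])
print(summary[0])
```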



## Hierarchical clustering and dendrogram using an agglomerative algorithm

In [12]:
clust2 = Clustering(matrix, "Hierarchy", "Agglomerative")
print(clust2)

138 vectors

        | child cluster distance | within-cluster distance | between-cluster distance | diameter | separation | composition
step   1          0          0   0.845491          0  0.0563999   7, 55
step   2          0          0   0.883565          0  0.0345239   24, 89
step   3          0          0   0.911651          0  0.0345239   41, 44
step   4  0.0126479  0.0126479    1.18625  0.0126479  0.0782759   21, 94
step   5  0.0126479  0.0126479   0.760368  0.0126479   0.021876   27, 120
step   6  0.0126479  0.0126479   0.913273  0.0126479  0.0345239   81, 95
step   7  0.0126479  0.0126479   0.762643  0.0126479  0.0345239   103, 122
step   8  0.0126479  0.0126479     1.0078  0.0126479  0.0345239   113, 134
step   9   0.021876   0.021876    1.08646   0.021876  0.0252957   8, 61
step  10   0.021876   0.021876   0.967306   0.021876  0.0345239   11, 38
step  11     0.0282  0.0230159   0.765583  0.0345239  0.0252957   27, 120, 91
step  12   0.021876   0.021876   0.868665   0.021876
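An agglomerative algorithm starts from singleton clusters and, at each step, merges the two closest clusters; the "composition" column lists the members of the merged cluster. The linkage criterion stat_tool uses is not stated here, so the sketch below assumes average linkage (mean distance between the members of the two clusters). Function name and toy data are hypothetical.

```python
def agglomerative(dist, labels=None):
    """Average-linkage agglomerative clustering on a distance matrix.

    Returns the merge steps as (linkage distance, merged member labels) tuples.
    """
    n = len(dist)
    clusters = [[i] for i in range(n)]
    labels = labels or list(range(1, n + 1))  # 1-based labels, as in stat_tool output
    steps = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        steps.append((d, [labels[i] for i in merged]))
        del clusters[b]  # b > a, so index a stays valid
        clusters[a] = merged
    return steps

# Hypothetical toy data: five points on a line.
points = [0, 1, 5, 6, 20]
dist = [[abs(a - b) for b in points] for a in points]
for step, (d, members) in enumerate(agglomerative(dist), start=1):
    print("step", step, d, members)
```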

## Hierarchical clustering and dendrogram using a divisive algorithm

In [13]:
clust3 = Clustering(matrix, "Hierarchy", "Divisive")
print(clust3)

138 vectors

        | child cluster distance | within-cluster distance | between-cluster distance | diameter | separation | composition
step   1          0          0   0.845491          0  0.0563999   7, 55
step   2          0          0   0.883565          0  0.0345239   24, 89
step   3          0          0   0.911651          0  0.0345239   41, 44
step   4  0.0126479  0.0126479    1.18625  0.0126479  0.0782759   21, 94
step   5  0.0126479  0.0126479   0.760368  0.0126479   0.021876   27, 120
step   6  0.0126479  0.0126479   0.913273  0.0126479  0.0345239   81, 95
step   7  0.0126479  0.0126479   0.762643  0.0126479  0.0345239   103, 122
step   8  0.0126479  0.0126479     1.0078  0.0126479  0.0345239   113, 134
step   9   0.021876   0.021876    1.08646   0.021876  0.0252957   8, 61
step  10   0.021876   0.021876   0.967306   0.021876  0.0345239   11, 38
step  11     0.0282  0.0230159   0.765583  0.0345239  0.0252957   27, 120, 91
step  12   0.021876   0.021876   0.868665   0.021876
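A divisive algorithm works top-down: it starts from a single cluster holding all vectors and repeatedly splits one cluster in two. The splitting rule stat_tool applies is not described in this notebook; the sketch below uses the DIANA splinter-group procedure as a swapped-in illustration. It always splits the cluster with the largest diameter and indexes vectors from 0. Function names and toy data are hypothetical.

```python
def diana_split(dist, members):
    """Split one cluster in two with a DIANA-style splinter group."""
    rest = list(members)

    def avg(i, group):
        # Mean distance from i to the other members of `group`.
        others = [j for j in group if j != i]
        return sum(dist[i][j] for j in others) / len(others) if others else 0.0

    # Seed the splinter group with the member farthest, on average, from the rest.
    seed = max(rest, key=lambda i: avg(i, rest))
    splinter = [seed]
    rest.remove(seed)
    while len(rest) > 1:
        # Gain of moving i: how much closer it is to the splinter group than to the rest.
        gains = {i: avg(i, rest) - avg(i, splinter) for i in rest}
        best = max(rest, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        rest.remove(best)
    return sorted(splinter), sorted(rest)

def diameter(dist, cluster):
    return max((dist[a][b] for a in cluster for b in cluster), default=0.0)

def divisive(dist):
    """Split the widest cluster until only singletons remain; return the splits."""
    clusters = [list(range(len(dist)))]
    steps = []
    while any(len(c) > 1 for c in clusters):
        target = max((c for c in clusters if len(c) > 1), key=lambda c: diameter(dist, c))
        clusters.remove(target)
        left, right = diana_split(dist, target)
        clusters += [left, right]
        steps.append((left, right))
    return steps

# Hypothetical toy data: five points on a line.
points = [0, 1, 5, 6, 20]
dist = [[abs(a - b) for b in points] for a in points]
for left, right in divisive(dist):
    print(left, right)
```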

In [14]:
matrix.nb_row

138