# <a id="inicio"></a> Exercises: Evaluation Metrics and Hyperparameter Selection

-----

### **Author:** Glauco Lauria Marques Filho

-----

# <a id="resumo"></a> Summary

#### This file contains the solutions to the exercises from Lecture 9 of the course CEDS-808: Machine Learning.

# <a id="sumario"></a> Table of Contents

* [Top](#inicio)
* [Summary](#resumo)
* [Table of Contents](#sumario)
* [Importing Requirements](#requisitos)

- 1.a [Dataset Aggregation](#dataset_1_a)
- 1.a.I [Dataset Aggregation - K-means](#dataset_1_a_I)
- 1.a.II [Dataset Aggregation - Average Linkage](#dataset_1_a_II)
- 1.a.III [Dataset Aggregation - DBSCAN](#dataset_1_a_III)

<br>

- 1.b [Dataset D31](#dataset_1_b)
- 1.b.I [Dataset D31 - K-means](#dataset_1_b_I)
- 1.b.II [Dataset D31 - Average Linkage](#dataset_1_b_II)
- 1.b.III [Dataset D31 - DBSCAN](#dataset_1_b_III)

<br>

- 1.c [Dataset Pathbased](#dataset_1_c)
- 1.c.I [Dataset Pathbased - K-means](#dataset_1_c_I)
- 1.c.II [Dataset Pathbased - Average Linkage](#dataset_1_c_II)
- 1.c.III [Dataset Pathbased - DBSCAN](#dataset_1_c_III)

<br>

- 1.d [Dataset Flame](#dataset_1_d)
- 1.d.I [Dataset Flame - K-means](#dataset_1_d_I)
- 1.d.II [Dataset Flame - Average Linkage](#dataset_1_d_II)
- 1.d.III [Dataset Flame - DBSCAN](#dataset_1_d_III)

# <a id="requisitos"></a> Importing Requirements

In [69]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.metrics import adjusted_rand_score, jaccard_score, fowlkes_mallows_score
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score

In [70]:
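def dunn_index(X, labels, cluster_centers):
    """Centroid-based Dunn index: smallest distance between two cluster
    centers divided by the largest cluster diameter (higher is better)."""
    unique_labels = np.unique(labels)
    max_intra_cluster_distances = np.zeros(len(unique_labels))
    # Index by position rather than by label value, so non-contiguous labels also work.
    for i, cluster_label in enumerate(unique_labels):
        cluster_points = X[labels == cluster_label]
        max_intra_cluster_distances[i] = np.max(pairwise_distances(cluster_points))

    inter_cluster_distances = pairwise_distances(cluster_centers)
    # Ignore the zero diagonal when looking for the closest pair of centers.
    min_inter_cluster_distance = np.min(inter_cluster_distances[np.nonzero(inter_cluster_distances)])

    return min_inter_cluster_distance / np.max(max_intra_cluster_distances)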
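
The function above measures separation between centroids, so it needs `cluster_centers_`, which `AgglomerativeClustering` and `DBSCAN` do not expose. A minimal sketch of the classical Dunn index, which only needs the points and their labels, could look like the following. The name `dunn_index_pairwise` and the demo data are hypothetical, and Euclidean distance is assumed.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def dunn_index_pairwise(X, labels):
    """Classical Dunn index: smallest distance between points of different
    clusters divided by the largest cluster diameter."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    distances = pairwise_distances(X)            # full n x n Euclidean matrix
    same_cluster = labels[:, None] == labels[None, :]

    # Largest distance between two points that share a cluster (diameter).
    max_diameter = distances[same_cluster].max()
    # Smallest distance between two points that belong to different clusters.
    min_separation = distances[~same_cluster].min()
    return min_separation / max_diameter


# Two tight, well separated groups on a line (illustrative data only).
X_demo = np.array([[0.0], [1.0], [10.0], [11.0]])
labels_demo = np.array([0, 0, 1, 1])
print(dunn_index_pairwise(X_demo, labels_demo))
```

This sketch builds the full distance matrix, so memory grows quadratically with the number of points. It raises an error when there is only one cluster, and for DBSCAN the noise label `-1` would have to be filtered out first, otherwise the noise points are treated as one more cluster.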

# <a id="dataset_1_a"></a> 1.a. Dataset Aggregation

In [71]:
df = pd.read_csv("aggregation.csv", sep=";", names=["x1","x2","y"])
X = df.drop("y", axis=1)
Y = df["y"]

# <a id="dataset_1_a_I"></a> 1.a.I. Dataset Aggregation - K-Means

In [72]:
models_score = {
    "Silouette Score": [0,""],
    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = KMeans(n_clusters=k)
    model_text = f"K-Means com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    dunn_value = dunn_index(X, labels, model.cluster_centers_)
    if dunn_value > models_score["Dunn Score"][0]:
        models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    # Davies-Bouldin is lower-is-better, yet `>` keeps the largest value seen.
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    # jaccard_score compares label ids directly, so it depends on how clusters are numbered.
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    # "Rand Score" here is the adjusted Rand index (chance-corrected).
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


K-Means com k=2
Silouette Score = 0.4560
Dunn Score = 0.6354
Calinski-Harabasz Score = 704.6910
Davies-Bouldin Score = 0.8936
Jaccard Score = 0.0057
Rand Score = 0.3453
Folkes e Mallows Score = 0.6108


K-Means com k=3
Silouette Score = 0.5234
Dunn Score = 0.6553
Calinski-Harabasz Score = 1053.6642
Davies-Bouldin Score = 0.6669
Jaccard Score = 0.1807
Rand Score = 0.6744
Folkes e Mallows Score = 0.7829


K-Means com k=4
Silouette Score = 0.5221
Dunn Score = 0.6912
Calinski-Harabasz Score = 1169.3026
Davies-Bouldin Score = 0.6149
Jaccard Score = 0.0034
Rand Score = 0.7562
Folkes e Mallows Score = 0.8259


K-Means com k=5
Silouette Score = 0.4987
Dunn Score = 0.5457
Calinski-Harabasz Score = 1326.0476
Davies-Bouldin Score = 0.7042
Jaccard Score = 0.2576
Rand Score = 0.7350
Folkes e Mallows Score = 0.7914


K-Means com k=6
Silouette Score = 0.4665
Dunn Score = 0.3972
Calinski-Harabasz Score = 1178.2734
Davies-Bouldin Score = 0.7971
Jaccard Score = 0.2548
Rand Score = 0.6917
Folkes e Mallow

In [73]:
for metric, score in models_score.items():
    print(f"{metric} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5234 - Best Model: K-Means com k=3
Dunn Score = 0.6912 - Best Model: K-Means com k=4
Calinski-Harabasz Score = 1424.3189 - Best Model: K-Means com k=10
Davies-Bouldin Score = 0.8936 - Best Model: K-Means com k=2
Jaccard Score = 0.2576 - Best Model: K-Means com k=5
Rand Score = 0.7633 - Best Model: K-Means com k=7
Folkes e Mallows Score = 0.8259 - Best Model: K-Means com k=4
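
In the summary above, Davies-Bouldin reports k=2 with 0.8936, which is the largest value among the runs. Lower Davies-Bouldin values indicate better separated clusters, so comparing with `>` selects the worst model for that metric. The sketch below shows one way to track the best value in each metric's own direction. The names `HIGHER_IS_BETTER`, `new_best_table` and `update_best` are hypothetical, and the demo only uses the five Davies-Bouldin values whose output is visible above (k = 2..6).

```python
import math

# Direction of each internal metric used in the notebook.
HIGHER_IS_BETTER = {
    "Silhouette": True,
    "Calinski-Harabasz": True,
    "Davies-Bouldin": False,   # lower values mean tighter, better separated clusters
}


def new_best_table():
    """Start every metric at the worst possible value for its direction."""
    return {
        name: [-math.inf if higher else math.inf, ""]
        for name, higher in HIGHER_IS_BETTER.items()
    }


def update_best(best, metric, value, model_text):
    """Keep `value` if it beats the stored one in the metric's own direction."""
    current = best[metric][0]
    improved = value > current if HIGHER_IS_BETTER[metric] else value < current
    if improved:
        best[metric] = [value, model_text]


# Davies-Bouldin values printed above for k = 2..6.
db_printed = {2: 0.8936, 3: 0.6669, 4: 0.6149, 5: 0.7042, 6: 0.7971}
best = new_best_table()
for k, value in db_printed.items():
    update_best(best, "Davies-Bouldin", value, f"K-Means com k={k}")
print(best["Davies-Bouldin"])
```

Starting from plus or minus infinity instead of 0 also lets a negative silhouette value be recorded, which the `[0, ""]` initialisation cannot do.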


# <a id="dataset_1_a_II"></a> 1.a.II. Dataset Aggregation - Average Linkage

In [74]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_text = f"Average Linkage com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    # dunn_value = dunn_index(X, labels, model.cluster_centers_)
    # if dunn_value > models_score["Dunn Score"][0]:
    #     models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    # Davies-Bouldin is lower-is-better, yet `>` keeps the largest value seen.
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    # jaccard_score compares label ids directly, so it depends on how clusters are numbered.
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    # "Rand Score" here is the adjusted Rand index (chance-corrected).
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


Average Linkage com k=2
Silouette Score = 0.4533
Calinski-Harabasz Score = 692.9809
Davies-Bouldin Score = 0.9182
Jaccard Score = 0.0093
Rand Score = 0.3768
Folkes e Mallows Score = 0.6312


Average Linkage com k=3
Silouette Score = 0.5124
Calinski-Harabasz Score = 988.5203
Davies-Bouldin Score = 0.6735
Jaccard Score = 0.2157
Rand Score = 0.6655
Folkes e Mallows Score = 0.7793


Average Linkage com k=4
Silouette Score = 0.5220
Calinski-Harabasz Score = 1141.6939
Davies-Bouldin Score = 0.5968
Jaccard Score = 0.3599
Rand Score = 0.7864
Folkes e Mallows Score = 0.8510


Average Linkage com k=5
Silouette Score = 0.5008
Calinski-Harabasz Score = 1208.0022
Davies-Bouldin Score = 0.6824
Jaccard Score = 0.6916
Rand Score = 0.9358
Folkes e Mallows Score = 0.9516


Average Linkage com k=6
Silouette Score = 0.5124
Calinski-Harabasz Score = 1306.1808
Davies-Bouldin Score = 0.6132
Jaccard Score = 0.3452
Rand Score = 0.9891
Folkes e Mallows Score = 0.9915


Average Linkage com k=7
Silouette Score = 

In [75]:
for metric, score in models_score.items():
    print(f"{metric} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5220 - Best Model: Average Linkage com k=4
Calinski-Harabasz Score = 1306.1808 - Best Model: Average Linkage com k=6
Davies-Bouldin Score = 0.9182 - Best Model: Average Linkage com k=2
Jaccard Score = 0.6916 - Best Model: Average Linkage com k=5
Rand Score = 1.0000 - Best Model: Average Linkage com k=7
Folkes e Mallows Score = 1.0000 - Best Model: Average Linkage com k=7
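
The summary above reports a Rand Score and a Fowlkes-Mallows score of 1.0000 at k=7, while the best Jaccard Score is only 0.6916 at k=5. One likely reason is that `jaccard_score` is a classification metric: it compares label ids directly, so a perfect clustering whose cluster ids are numbered differently from `Y` is penalised. A pair-counting Jaccard index does not have this problem. The sketch below uses only NumPy; the name `pair_counting_jaccard` and the toy labels are hypothetical.

```python
import numpy as np


def pair_counting_jaccard(labels_true, labels_pred):
    """Jaccard index over pairs of points: pairs grouped together in both
    partitions divided by pairs grouped together in at least one of them."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)

    # Contingency table: rows are true classes, columns are predicted clusters.
    _, true_idx = np.unique(labels_true, return_inverse=True)
    _, pred_idx = np.unique(labels_pred, return_inverse=True)
    contingency = np.zeros((true_idx.max() + 1, pred_idx.max() + 1), dtype=np.int64)
    np.add.at(contingency, (true_idx, pred_idx), 1)

    def n_pairs(counts):
        # Number of unordered pairs inside each group, summed over groups.
        return int((counts * (counts - 1) // 2).sum())

    together_both = n_pairs(contingency)
    together_true = n_pairs(contingency.sum(axis=1))
    together_pred = n_pairs(contingency.sum(axis=0))
    together_any = together_true + together_pred - together_both
    return together_both / together_any


# Same partition, different cluster ids.
print(pair_counting_jaccard([1, 1, 2, 2], [1, 1, 0, 0]))
# Everything merged into a single cluster.
print(pair_counting_jaccard([1, 1, 2, 2], [0, 0, 0, 0]))
```

The function divides by zero when no pair is grouped together in either partition (every point alone in both), so that case would need a guard in real use.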


# <a id="dataset_1_a_III"></a> 1.a.III. Dataset Aggregation - DBSCAN

In [76]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}
epsilon_values = [0.3, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

for eps in epsilon_values:

    # #######
    model = DBSCAN(eps=eps, min_samples=8)
    model_text = f'DBSCAN com eps={eps} e min_samples=8'
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_
    try:
        silhouette_value = silhouette_score(X, labels)
        if silhouette_value > models_score["Silouette Score"][0]:
            models_score["Silouette Score"] = [silhouette_value, model_text]
        # dunn_value = dunn_index(X, labels, model.cluster_centers_)
        # if dunn_value > models_score["Dunn Score"][0]:
        #     models_score["Dunn Score"] = [dunn_value, model_text]
        ch_value = calinski_harabasz_score(X, labels)
        if ch_value > models_score["Calinski-Harabasz Score"][0]:
            models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
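        # Davies-Bouldin is lower-is-better, yet `>` keeps the largest value seen.
        db_value = davies_bouldin_score(X, labels)
        if db_value > models_score["Davies-Bouldin Score"][0]:
            models_score["Davies-Bouldin Score"] = [db_value, model_text]
        # jaccard_score compares label ids directly, so it depends on how clusters are numbered.
        jaccard_value = jaccard_score(Y, labels, average='weighted')
        if jaccard_value > models_score["Jaccard Score"][0]:
            models_score["Jaccard Score"] = [jaccard_value, model_text]
        # "Rand Score" here is the adjusted Rand index (chance-corrected).
        rand_value = adjusted_rand_score(Y, labels)
        if rand_value > models_score["Rand Score"][0]:
            models_score["Rand Score"] = [rand_value, model_text]
        fm_value = fowlkes_mallows_score(Y, labels)
        if fm_value > models_score["Folkes e Mallows Score"][0]:
            models_score["Folkes e Mallows Score"] = [fm_value, model_text]
    except ValueError:
        # Raised when DBSCAN yields fewer than 2 labels; the values printed
        # below are then left over from the previous successful evaluation.
        pass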

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
     #   + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


DBSCAN com eps=0.3 e min_samples=8
Silouette Score = 0.4491
Calinski-Harabasz Score = 1288.1706
Davies-Bouldin Score = 0.7227
Jaccard Score = 0.1294
Rand Score = 0.7126
Folkes e Mallows Score = 0.7828


DBSCAN com eps=0.5 e min_samples=8
Silouette Score = 0.4491
Calinski-Harabasz Score = 1288.1706
Davies-Bouldin Score = 0.7227
Jaccard Score = 0.1294
Rand Score = 0.7126
Folkes e Mallows Score = 0.7828


DBSCAN com eps=1.0 e min_samples=8
Silouette Score = -0.3123
Calinski-Harabasz Score = 22.0591
Davies-Bouldin Score = 1.1601
Jaccard Score = 0.0305
Rand Score = 0.0362
Folkes e Mallows Score = 0.3230


DBSCAN com eps=1.5 e min_samples=8
Silouette Score = 0.4385
Calinski-Harabasz Score = 1052.3234
Davies-Bouldin Score = 0.6019
Jaccard Score = 0.1270
Rand Score = 0.9850
Folkes e Mallows Score = 0.9882


DBSCAN com eps=2.0 e min_samples=8
Silouette Score = 0.4120
Calinski-Harabasz Score = 754.2002
Davies-Bouldin Score = 0.6249
Jaccard Score = 0.0000
Rand Score = 0.8089
Folkes e Mallows Scor

In [77]:
for metric, score in models_score.items():
    print(f"{metric} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.4695 - Best Model: DBSCAN com eps=3.0 e min_samples=8
Calinski-Harabasz Score = 1052.3234 - Best Model: DBSCAN com eps=1.5 e min_samples=8
Davies-Bouldin Score = 1.1601 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Jaccard Score = 0.1270 - Best Model: DBSCAN com eps=1.5 e min_samples=8
Rand Score = 0.9850 - Best Model: DBSCAN com eps=1.5 e min_samples=8
Folkes e Mallows Score = 0.9882 - Best Model: DBSCAN com eps=1.5 e min_samples=8
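
The DBSCAN loop above swallows scoring errors and then prints whatever the score variables currently hold, so a run that produced fewer than two clusters may repeat numbers from an earlier evaluation (the blocks for eps=0.3 and eps=0.5 are identical, for example). DBSCAN also labels noise points as `-1`, and the scores above treat them as one more cluster. A minimal sketch of an explicit guard follows, assuming one chooses to score only the clustered points. The function `dbscan_silhouette` and the synthetic data are hypothetical, not part of the exercise.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def dbscan_silhouette(X, eps, min_samples=8):
    """Return (n_clusters, n_noise, silhouette or None) for one DBSCAN run."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    clustered = labels != -1                     # DBSCAN marks noise with -1
    n_clusters = len(set(labels[clustered]))
    n_noise = int((~clustered).sum())
    if n_clusters < 2:
        return n_clusters, n_noise, None         # score undefined, report it as such
    # Score only the points that were actually assigned to a cluster.
    score = silhouette_score(X[clustered], labels[clustered])
    return n_clusters, n_noise, score


# Synthetic stand-in data: two compact blobs far apart (not the course dataset).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])

for eps in (0.001, 0.5, 20.0):
    print(eps, dbscan_silhouette(X_demo, eps))
```

Returning `None` makes the degenerate runs visible in the printed report instead of hiding them behind stale values. Whether noise points should be excluded from the score or penalised is a design choice the exercise does not specify.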


---

# <a id="dataset_1_b"></a> 1.b. Dataset D31

In [78]:
df = pd.read_csv("d31.csv", sep=";", names=["x1","x2","y"])
X = df.drop("y", axis=1)
Y = df["y"]

# <a id="dataset_1_b_I"></a> 1.b.I. Dataset D31 - K-Means

In [79]:
models_score = {
    "Silouette Score": [0,""],
    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 34):

    # #######
    model = KMeans(n_clusters=k)
    model_text = f"K-Means com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    dunn_value = dunn_index(X, labels, model.cluster_centers_)
    if dunn_value > models_score["Dunn Score"][0]:
        models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    # Davies-Bouldin is lower-is-better, yet `>` keeps the largest value seen.
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    # jaccard_score compares label ids directly, so it depends on how clusters are numbered.
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    # "Rand Score" here is the adjusted Rand index (chance-corrected).
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


K-Means com k=2
Silouette Score = 0.3943
Dunn Score = 0.4926
Calinski-Harabasz Score = 2292.8938
Davies-Bouldin Score = 1.0661
Jaccard Score = 0.0021
Rand Score = 0.0593
Folkes e Mallows Score = 0.2437


K-Means com k=3
Silouette Score = 0.4207
Dunn Score = 0.5632
Calinski-Harabasz Score = 2890.5771
Davies-Bouldin Score = 0.7954
Jaccard Score = 0.0031
Rand Score = 0.1132
Folkes e Mallows Score = 0.2956


K-Means com k=4
Silouette Score = 0.4286
Dunn Score = 0.6614
Calinski-Harabasz Score = 3370.7142
Davies-Bouldin Score = 0.7911
Jaccard Score = 0.0061
Rand Score = 0.1640
Folkes e Mallows Score = 0.3345




K-Means com k=5
Silouette Score = 0.4120
Dunn Score = 0.4920
Calinski-Harabasz Score = 3217.5187
Davies-Bouldin Score = 0.8217
Jaccard Score = 0.0067
Rand Score = 0.2052
Folkes e Mallows Score = 0.3681


K-Means com k=6
Silouette Score = 0.4217
Dunn Score = 0.5084
Calinski-Harabasz Score = 3420.6234
Davies-Bouldin Score = 0.8114
Jaccard Score = 0.0067
Rand Score = 0.2524
Folkes e Mallows Score = 0.4058


K-Means com k=7
Silouette Score = 0.4246
Dunn Score = 0.6224
Calinski-Harabasz Score = 3675.6683
Davies-Bouldin Score = 0.8305
Jaccard Score = 0.0000
Rand Score = 0.3049
Folkes e Mallows Score = 0.4425


K-Means com k=8
Silouette Score = 0.4201
Dunn Score = 0.5638
Calinski-Harabasz Score = 3564.6539
Davies-Bouldin Score = 0.8033
Jaccard Score = 0.0000
Rand Score = 0.3418
Folkes e Mallows Score = 0.4693


K-Means com k=9
Silouette Score = 0.4512
Dunn Score = 0.5590
Calinski-Harabasz Score = 3742.6215
Davies-Bouldin Score = 0.7377
Jaccard Score = 0.0152
Rand Score = 0.3546
Folkes e Mallo

In [80]:
for metric, score in models_score.items():
    print(f"{metric} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5634 - Best Model: K-Means com k=32
Dunn Score = 0.6614 - Best Model: K-Means com k=4
Calinski-Harabasz Score = 8984.9337 - Best Model: K-Means com k=32
Davies-Bouldin Score = 1.0661 - Best Model: K-Means com k=2
Jaccard Score = 0.0610 - Best Model: K-Means com k=24
Rand Score = 0.9456 - Best Model: K-Means com k=32
Folkes e Mallows Score = 0.9474 - Best Model: K-Means com k=32
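
`KMeans(n_clusters=k)` draws its initial centroids at random, so re-running the cell above can change the printed scores and, when two values of k score closely, the reported best model. A minimal sketch of making the runs repeatable by fixing `random_state` and `n_init` follows. The helper `fit_kmeans`, the seed, the `n_init=10` setting and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three blobs along the diagonal (not the D31 dataset).
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in (0.0, 3.0, 6.0)])


def fit_kmeans(X, k, seed=0):
    """Fit K-Means with a fixed seed and an explicit number of restarts."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(X)
    return labels, model.inertia_


labels_a, inertia_a = fit_kmeans(X_demo, k=3)
labels_b, inertia_b = fit_kmeans(X_demo, k=3)

# With the same seed both runs should produce the same assignment.
print(np.array_equal(labels_a, labels_b), abs(inertia_a - inertia_b) < 1e-9)
```

Fixing the seed makes a comparison across k reproducible, but it does not make it robust. Repeating the sweep with a few different seeds would show how stable the chosen k is.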


# <a id="dataset_1_b_II"></a> 1.b.II. Dataset D31 - Average Linkage

In [81]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 34):

    # #######
    model = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_text = f"Average Linkage com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    # dunn_value = dunn_index(X, labels, model.cluster_centers_)
    # if dunn_value > models_score["Dunn Score"][0]:
    #     models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    # Davies-Bouldin is lower-is-better, yet `>` keeps the largest value seen.
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    # jaccard_score compares label ids directly, so it depends on how clusters are numbered.
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    # "Rand Score" here is the adjusted Rand index (chance-corrected).
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


Average Linkage com k=2
Silouette Score = 0.3737
Calinski-Harabasz Score = 1985.2568
Davies-Bouldin Score = 1.1500
Jaccard Score = 0.0000
Rand Score = 0.0637
Folkes e Mallows Score = 0.2525


Average Linkage com k=3
Silouette Score = 0.3911
Calinski-Harabasz Score = 2403.6033
Davies-Bouldin Score = 0.8052
Jaccard Score = 0.0036
Rand Score = 0.0987
Folkes e Mallows Score = 0.2866


Average Linkage com k=4
Silouette Score = 0.3855
Calinski-Harabasz Score = 2734.8714
Davies-Bouldin Score = 0.8793
Jaccard Score = 0.0089
Rand Score = 0.1688
Folkes e Mallows Score = 0.3476


Average Linkage com k=5
Silouette Score = 0.3658
Calinski-Harabasz Score = 2350.4588
Davies-Bouldin Score = 0.8332
Jaccard Score = 0.0000
Rand Score = 0.1848
Folkes e Mallows Score = 0.3607


Average Linkage com k=6
Silouette Score = 0.3614
Calinski-Harabasz Score = 2557.7393
Davies-Bouldin Score = 0.8479
Jaccard Score = 0.0000
Rand Score = 0.2285
Folkes e Mallows Score = 0.3951


Average Linkage com k=7
Silouette Score 

In [82]:
for metric, score in models_score.items():
    print(f"{metric} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5666 - Best Model: Average Linkage com k=32
Calinski-Harabasz Score = 8683.2619 - Best Model: Average Linkage com k=32
Davies-Bouldin Score = 1.1500 - Best Model: Average Linkage com k=2
Jaccard Score = 0.1064 - Best Model: Average Linkage com k=29
Rand Score = 0.9307 - Best Model: Average Linkage com k=32
Folkes e Mallows Score = 0.9329 - Best Model: Average Linkage com k=32
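
The same scoring block is repeated in every cell of this notebook, once per algorithm and dataset. A sketch of how it could be factored into a single helper is shown below, assuming ground-truth labels are available. The function `evaluate_clustering`, its dictionary keys and the synthetic data are hypothetical, and Jaccard is left out because `jaccard_score` depends on how the cluster ids are numbered.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    fowlkes_mallows_score,
    silhouette_score,
)


def evaluate_clustering(X, labels_true, labels_pred):
    """Internal and external scores for one clustering, keyed by metric name."""
    return {
        "Silhouette": silhouette_score(X, labels_pred),
        "Calinski-Harabasz": calinski_harabasz_score(X, labels_pred),
        "Davies-Bouldin": davies_bouldin_score(X, labels_pred),
        "Adjusted Rand": adjusted_rand_score(labels_true, labels_pred),
        "Fowlkes-Mallows": fowlkes_mallows_score(labels_true, labels_pred),
    }


# Synthetic stand-in data: two well separated blobs (not the D31 dataset).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.2, (25, 2)), rng.normal(5, 0.2, (25, 2))])
y_demo = np.repeat([0, 1], 25)

for k in range(2, 5):
    labels = AgglomerativeClustering(n_clusters=k, linkage="average").fit_predict(X_demo)
    scores = evaluate_clustering(X_demo, y_demo, labels)
    print(k, {name: round(float(value), 4) for name, value in scores.items()})
```

With a helper like this, each section only has to build the model and call one function, which keeps the metric list and the metric names in a single place.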


# <a id="dataset_1_b_III"></a> 1.b.III. Dataset D31 - DBSCAN

In [83]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}
epsilon_values = list(range(1,10,1))

for eps in epsilon_values:

    # #######
    model = DBSCAN(eps=eps, min_samples=8)
    model_text = f'DBSCAN com eps={eps} e min_samples=8'
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_
    try:
        silhouette_value = silhouette_score(X, labels)
        if silhouette_value > models_score["Silouette Score"][0]:
            models_score["Silouette Score"] = [silhouette_value, model_text]
        # dunn_value = dunn_index(X, labels, model.cluster_centers_)
        # if dunn_value > models_score["Dunn Score"][0]:
        #     models_score["Dunn Score"] = [dunn_value, model_text]
        ch_value = calinski_harabasz_score(X, labels)
        if ch_value > models_score["Calinski-Harabasz Score"][0]:
            models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
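        # Davies-Bouldin is lower-is-better, yet `>` keeps the largest value seen.
        db_value = davies_bouldin_score(X, labels)
        if db_value > models_score["Davies-Bouldin Score"][0]:
            models_score["Davies-Bouldin Score"] = [db_value, model_text]
        # jaccard_score compares label ids directly, so it depends on how clusters are numbered.
        jaccard_value = jaccard_score(Y, labels, average='weighted')
        if jaccard_value > models_score["Jaccard Score"][0]:
            models_score["Jaccard Score"] = [jaccard_value, model_text]
        # "Rand Score" here is the adjusted Rand index (chance-corrected).
        rand_value = adjusted_rand_score(Y, labels)
        if rand_value > models_score["Rand Score"][0]:
            models_score["Rand Score"] = [rand_value, model_text]
        fm_value = fowlkes_mallows_score(Y, labels)
        if fm_value > models_score["Folkes e Mallows Score"][0]:
            models_score["Folkes e Mallows Score"] = [fm_value, model_text]
    except ValueError:
        # Raised when DBSCAN yields fewer than 2 labels; the values printed
        # below are then left over from the previous successful evaluation.
        pass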

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
     #   + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


DBSCAN com eps=1 e min_samples=8
Silouette Score = 0.0808
Calinski-Harabasz Score = 693.4219
Davies-Bouldin Score = 9.4449
Jaccard Score = 0.0000
Rand Score = 0.1378
Folkes e Mallows Score = 0.3206


DBSCAN com eps=2 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=3 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=4 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=5 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846




DBSCAN com eps=6 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=7 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=8 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=9 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846




In [84]:
for metric, score in models_score.items():
    print(f"{metric} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.2304 - Best Model: DBSCAN com eps=2 e min_samples=8
Calinski-Harabasz Score = 693.4219 - Best Model: DBSCAN com eps=1 e min_samples=8
Davies-Bouldin Score = 9.4449 - Best Model: DBSCAN com eps=1 e min_samples=8
Jaccard Score = 0.0000 - Best Model: 
Rand Score = 0.1378 - Best Model: DBSCAN com eps=1 e min_samples=8
Folkes e Mallows Score = 0.3206 - Best Model: DBSCAN com eps=1 e min_samples=8
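
In the D31 output above, every eps from 2 to 9 prints exactly the same numbers, which suggests the integer grid is too coarse for the scale of this dataset. A common heuristic for choosing candidate eps values is the k-distance curve: sort the distance from each point to its `min_samples`-th closest point and look at where the curve bends upward. The sketch below computes that curve with `NearestNeighbors`. The function `k_distance_curve`, the quantiles used as candidates and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def k_distance_curve(X, min_samples=8):
    """Sorted distance from each point to its `min_samples`-th closest point,
    counting the point itself (as DBSCAN's `min_samples` does)."""
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    # Querying with the training data returns each point as its own first
    # neighbour at distance 0, so the last column is the radius a point
    # needs in order to become a core point.
    distances, _ = nn.kneighbors(X)
    return np.sort(distances[:, -1])


# Synthetic stand-in data: two blobs (not the D31 dataset).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(c, 0.2, (50, 2)) for c in (0.0, 4.0)])

curve = k_distance_curve(X_demo)
# Upper quantiles of the curve give a data-driven range of eps values to try.
candidate_eps = np.quantile(curve, [0.50, 0.75, 0.90, 0.95])
print(np.round(candidate_eps, 3))
```

The quantiles only replace the visual inspection of the curve with a rough rule. Plotting `curve` with `plt.plot` and picking the knee by eye is the more usual way to apply the heuristic.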


---

# <a id="dataset_1_c"></a> 1.c. Dataset Pathbased

In [85]:
df = pd.read_csv("pathbased.csv", sep=";", names=["x1","x2","y"])
X = df.drop("y", axis=1)
Y = df["y"]

# <a id="dataset_1_c_I"></a> 1.c.I. Dataset Pathbased - K-Means

In [86]:
models_score = {
    "Silouette Score": [0,""],
    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = KMeans(n_clusters=k)
    model_text = f"K-Means com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    dunn_value = dunn_index(X, labels, model.cluster_centers_)
    if dunn_value > models_score["Dunn Score"][0]:
        models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    # Davies-Bouldin is lower-is-better, yet `>` keeps the largest value seen.
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    # jaccard_score compares label ids directly, so it depends on how clusters are numbered.
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    # "Rand Score" here is the adjusted Rand index (chance-corrected).
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


K-Means com k=2
Silouette Score = 0.5150
Dunn Score = 0.5330
Calinski-Harabasz Score = 371.0553
Davies-Bouldin Score = 0.7523
Jaccard Score = 0.0939
Rand Score = 0.3990
Folkes e Mallows Score = 0.6519


K-Means com k=3
Silouette Score = 0.5406
Dunn Score = 0.6796
Calinski-Harabasz Score = 357.8003
Davies-Bouldin Score = 0.6868
Jaccard Score = 0.0655
Rand Score = 0.4642
Folkes e Mallows Score = 0.6629


K-Means com k=4
Silouette Score = 0.4368
Dunn Score = 0.3696
Calinski-Harabasz Score = 318.8801
Davies-Bouldin Score = 0.8711
Jaccard Score = 0.2047
Rand Score = 0.3816
Folkes e Mallows Score = 0.5801


K-Means com k=5
Silouette Score = 0.3640
Dunn Score = 0.4370
Calinski-Harabasz Score = 298.7949
Davies-Bouldin Score = 0.9606
Jaccard Score = 0.0632
Rand Score = 0.4043
Folkes e Mallows Score = 0.5802


K-Means com k=6
Silouette Score = 0.3871
Dunn Score = 0.4179
Calinski-Harabasz Score = 298.3390
Davies-Bouldin Score = 0.8911
Jaccard Score = 0.0842
Rand Score = 0.5308
Folkes e Mallows Sc

In [87]:
for metric, score in models_score.items():
    print(f"{metric} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5406 - Best Model: K-Means com k=3
Dunn Score = 0.6796 - Best Model: K-Means com k=3
Calinski-Harabasz Score = 371.0553 - Best Model: K-Means com k=2
Davies-Bouldin Score = 0.9606 - Best Model: K-Means com k=5
Jaccard Score = 0.5264 - Best Model: K-Means com k=8
Rand Score = 0.5308 - Best Model: K-Means com k=6
Folkes e Mallows Score = 0.6695 - Best Model: K-Means com k=6
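
The "Rand Score" entries in this notebook are computed with `adjusted_rand_score`, which corrects the plain Rand index for chance agreement. It can therefore be much lower than the unadjusted value, and even negative. The short sketch below contrasts the two on toy labels. It assumes a scikit-learn version that provides `rand_score`, and the label vectors are illustrative only.

```python
from sklearn.metrics import adjusted_rand_score, rand_score

labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]      # the second group is split in two

plain = rand_score(labels_true, labels_pred)
adjusted = adjusted_rand_score(labels_true, labels_pred)
print(f"Rand index          = {plain:.4f}")
print(f"Adjusted Rand index = {adjusted:.4f}")

# Both indices ignore how the cluster ids are numbered.
relabelled = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
print(f"Same partition, swapped ids = {relabelled:.4f}")
```

Labelling the column "Adjusted Rand" in the report would make it clearer which of the two indices is being shown.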


# <a id="dataset_1_c_II"></a> 1.c.II. Dataset Pathbased - Average Linkage

In [88]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_text = f"Average Linkage com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
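    # Dunn index is skipped: AgglomerativeClustering has no cluster_centers_,
    # which dunn_index() requires.
    # dunn_value = dunn_index(X, labels, model.cluster_centers_)
    # if dunn_value > models_score["Dunn Score"][0]:
    #     models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    # NOTE: lower Davies-Bouldin is better, but this comparison keeps the
    # largest value, so the summary reports the worst k for this metric.
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    # NOTE: jaccard_score compares label values directly; cluster ids are
    # arbitrary, so this value depends on how the clusters happen to be numbered.
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    # "Rand Score" is the adjusted Rand index (adjusted_rand_score).
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]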
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


Average Linkage com k=2
Silouette Score = 0.4911
Calinski-Harabasz Score = 316.9184
Davies-Bouldin Score = 0.8007
Jaccard Score = 0.0809
Rand Score = 0.3947
Folkes e Mallows Score = 0.6501


Average Linkage com k=3
Silouette Score = 0.5386
Calinski-Harabasz Score = 353.9570
Davies-Bouldin Score = 0.6309
Jaccard Score = 0.3352
Rand Score = 0.4436
Folkes e Mallows Score = 0.6526


Average Linkage com k=4
Silouette Score = 0.4756
Calinski-Harabasz Score = 286.4206
Davies-Bouldin Score = 0.6237
Jaccard Score = 0.2963
Rand Score = 0.4629
Folkes e Mallows Score = 0.6555


Average Linkage com k=5
Silouette Score = 0.4227
Calinski-Harabasz Score = 241.5164
Davies-Bouldin Score = 0.5724
Jaccard Score = 0.3018
Rand Score = 0.4515
Folkes e Mallows Score = 0.6468


Average Linkage com k=6
Silouette Score = 0.3986
Calinski-Harabasz Score = 232.0410
Davies-Bouldin Score = 0.6029
Jaccard Score = 0.0682
Rand Score = 0.4851
Folkes e Mallows Score = 0.6594


Average Linkage com k=7
Silouette Score = 0.3

In [89]:
for metric, (value, best_model) in models_score.items():
    print(f"{metric} = {value:.4f} - Best Model: {best_model}")

Silouette Score = 0.5386 - Best Model: Average Linkage com k=3
Calinski-Harabasz Score = 353.9570 - Best Model: Average Linkage com k=3
Davies-Bouldin Score = 0.8007 - Best Model: Average Linkage com k=2
Jaccard Score = 0.3352 - Best Model: Average Linkage com k=3
Rand Score = 0.6691 - Best Model: Average Linkage com k=9
Folkes e Mallows Score = 0.7746 - Best Model: Average Linkage com k=9
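
The Dunn index is commented out for Average Linkage, presumably because `AgglomerativeClustering` exposes no `cluster_centers_`. A center-free variant can be computed from pairwise distances alone. The sketch below assumes the common definition (smallest distance between points of different clusters, divided by the largest within-cluster diameter). `dunn_index_pairwise` is a hypothetical helper, and its values may differ from those of the notebook's own `dunn_index`.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def dunn_index_pairwise(X, labels):
    """Dunn index from pairwise distances only (no cluster centers needed)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = pairwise_distances(X)  # full n x n Euclidean distance matrix
    unique = np.unique(labels)

    max_diameter = 0.0       # largest distance between two points of one cluster
    min_separation = np.inf  # smallest distance between points of two clusters
    for i, a in enumerate(unique):
        in_a = labels == a
        max_diameter = max(max_diameter, dist[np.ix_(in_a, in_a)].max())
        for b in unique[i + 1:]:
            in_b = labels == b
            min_separation = min(min_separation, dist[np.ix_(in_a, in_b)].min())
    return min_separation / max_diameter


# Two pairs of points, 10 units apart, each pair 1 unit wide.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels_demo = np.array([0, 0, 1, 1])
print(dunn_index_pairwise(X_demo, labels_demo))
```

Because it only needs `X` and `labels`, the same function could be called for K-Means, Average Linkage and DBSCAN alike, at the cost of holding an n x n distance matrix in memory.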


# <a id="dataset_1_c_III"></a> 1.c.III. Dataset Pathbased - DBSCAN

In [90]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}
epsilon_values = [1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3]

for eps in epsilon_values:

    # #######
    model = DBSCAN(eps=eps, min_samples=8)
    model_text = f'DBSCAN com eps={eps} e min_samples=8'
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_
    try:
        silhouette_value = silhouette_score(X, labels)
        if silhouette_value > models_score["Silouette Score"][0]:
            models_score["Silouette Score"] = [silhouette_value, model_text]
        # dunn_value = dunn_index(X, labels, model.cluster_centers_)
        # if dunn_value > models_score["Dunn Score"][0]:
        #     models_score["Dunn Score"] = [dunn_value, model_text]
        ch_value = calinski_harabasz_score(X, labels)
        if ch_value > models_score["Calinski-Harabasz Score"][0]:
            models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
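        # NOTE: lower Davies-Bouldin is better, but this comparison keeps the
        # largest value, so the summary reports the worst eps for this metric.
        db_value = davies_bouldin_score(X, labels)
        if db_value > models_score["Davies-Bouldin Score"][0]:
            models_score["Davies-Bouldin Score"] = [db_value, model_text]
        # NOTE: jaccard_score compares label values directly; cluster ids
        # (and the noise label -1) are arbitrary with respect to Y.
        jaccard_value = jaccard_score(Y, labels, average='weighted')
        if jaccard_value > models_score["Jaccard Score"][0]:
            models_score["Jaccard Score"] = [jaccard_value, model_text]
        # "Rand Score" is the adjusted Rand index (adjusted_rand_score).
        rand_value = adjusted_rand_score(Y, labels)
        if rand_value > models_score["Rand Score"][0]:
            models_score["Rand Score"] = [rand_value, model_text]
        fm_value = fowlkes_mallows_score(Y, labels)
        if fm_value > models_score["Folkes e Mallows Score"][0]:
            models_score["Folkes e Mallows Score"] = [fm_value, model_text]
    except ValueError:
        # Raised when the labels are not a valid clustering for a metric (for
        # example, a single label). The values printed below are then left
        # over from a previous iteration or cell.
        pass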

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


DBSCAN com eps=1 e min_samples=8
Silouette Score = -0.3911
Calinski-Harabasz Score = 7.4666
Davies-Bouldin Score = 1.5807
Jaccard Score = 0.0433
Rand Score = 0.0423
Folkes e Mallows Score = 0.5228


DBSCAN com eps=1.25 e min_samples=8
Silouette Score = -0.0767
Calinski-Harabasz Score = 21.5301
Davies-Bouldin Score = 1.9202
Jaccard Score = 0.2500
Rand Score = 0.3767
Folkes e Mallows Score = 0.6226


DBSCAN com eps=1.5 e min_samples=8
Silouette Score = 0.1341
Calinski-Harabasz Score = 38.6580
Davies-Bouldin Score = 2.2988
Jaccard Score = 0.0000
Rand Score = 0.6311
Folkes e Mallows Score = 0.7619


DBSCAN com eps=1.75 e min_samples=8
Silouette Score = 0.1636
Calinski-Harabasz Score = 40.9267
Davies-Bouldin Score = 2.0803
Jaccard Score = 0.3038
Rand Score = 0.7854
Folkes e Mallows Score = 0.8556


DBSCAN com eps=2 e min_samples=8
Silouette Score = 0.1682
Calinski-Harabasz Score = 47.8750
Davies-Bouldin Score = 1.6406
Jaccard Score = 0.3370
Rand Score = 0.7873
Folkes e Mallows Score = 0.855

In [91]:
for metric, (value, best_model) in models_score.items():
    print(f"{metric} = {value:.4f} - Best Model: {best_model}")

Silouette Score = 0.2948 - Best Model: DBSCAN com eps=2.25 e min_samples=8
Calinski-Harabasz Score = 75.4146 - Best Model: DBSCAN com eps=2.25 e min_samples=8
Davies-Bouldin Score = 4.5474 - Best Model: DBSCAN com eps=2.5 e min_samples=8
Jaccard Score = 0.3551 - Best Model: DBSCAN com eps=2.25 e min_samples=8
Rand Score = 0.7873 - Best Model: DBSCAN com eps=2 e min_samples=8
Folkes e Mallows Score = 0.8556 - Best Model: DBSCAN com eps=1.75 e min_samples=8
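
With DBSCAN, some `eps` values can leave every point as noise or merge everything into one cluster, in which case `silhouette_score` raises `ValueError`. Rather than catching the error and printing whatever values remain from an earlier iteration, the label count can be checked first. The sketch below is an illustration under assumptions: `dbscan_silhouette` is a hypothetical helper and the two small grids are synthetic data, not the Pathbased dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def dbscan_silhouette(X, eps, min_samples=8):
    """Return the silhouette score, or None when it is undefined for this eps."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_labels = len(np.unique(labels))  # the noise label -1 counts as one label
    # silhouette_score needs between 2 and n_samples - 1 distinct labels.
    if n_labels < 2 or n_labels >= len(labels):
        return None
    return silhouette_score(X, labels)


# Two tight 4x4 grids of points, far apart from each other.
grid = np.array([[i * 0.1, j * 0.1] for i in range(4) for j in range(4)])
X_demo = np.vstack([grid, grid + 10.0])

for eps in [0.01, 1.0, 100.0]:
    print(eps, dbscan_silhouette(X_demo, eps))
```

Returning `None` makes the skipped configurations explicit, so the caller can print a placeholder instead of reusing stale numbers.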


---

# <a id="dataset_1_d"></a> 1.d. Dataset Flame

In [92]:
df = pd.read_csv("flame.csv", sep=";", names=["x1","x2","y"])
X = df.drop("y", axis=1)
Y = df["y"]

# <a id="dataset_1_d_I"></a> 1.d.I. Dataset Flame - K-Means

In [93]:
models_score = {
    "Silouette Score": [0,""],
    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = KMeans(n_clusters=k)  # no random_state set, so scores can vary between runs
    model_text = f"K-Means com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    dunn_value = dunn_index(X, labels, model.cluster_centers_)
    if dunn_value > models_score["Dunn Score"][0]:
        models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
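    # NOTE: lower Davies-Bouldin is better, but this comparison keeps the
    # largest value, so the summary reports the worst k for this metric.
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    # NOTE: jaccard_score compares label values directly; cluster ids are
    # arbitrary, so this value depends on how the clusters happen to be numbered.
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    # "Rand Score" is the adjusted Rand index (adjusted_rand_score).
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]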
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


K-Means com k=2
Silouette Score = 0.3768
Dunn Score = 0.4180
Calinski-Harabasz Score = 154.2215
Davies-Bouldin Score = 1.1219
Jaccard Score = 0.2559
Rand Score = 0.4998
Folkes e Mallows Score = 0.7586


K-Means com k=3
Silouette Score = 0.4106
Dunn Score = 0.5508
Calinski-Harabasz Score = 200.5064
Davies-Bouldin Score = 0.8303
Jaccard Score = 0.0123
Rand Score = 0.5261
Folkes e Mallows Score = 0.7415


K-Means com k=4
Silouette Score = 0.4427
Dunn Score = 0.5951
Calinski-Harabasz Score = 258.0822
Davies-Bouldin Score = 0.6926
Jaccard Score = 0.3009
Rand Score = 0.4316
Folkes e Mallows Score = 0.6728


K-Means com k=5
Silouette Score = 0.3984
Dunn Score = 0.3644
Calinski-Harabasz Score = 229.7607
Davies-Bouldin Score = 0.8786
Jaccard Score = 0.4132
Rand Score = 0.3083
Folkes e Mallows Score = 0.5751


K-Means com k=6
Silouette Score = 0.3688
Dunn Score = 0.3807
Calinski-Harabasz Score = 240.4068
Davies-Bouldin Score = 0.9184
Jaccard Score = 0.1167
Rand Score = 0.2863
Folkes e Mallows Sc

In [94]:
for metric, (value, best_model) in models_score.items():
    print(f"{metric} = {value:.4f} - Best Model: {best_model}")

Silouette Score = 0.4427 - Best Model: K-Means com k=4
Dunn Score = 0.5951 - Best Model: K-Means com k=4
Calinski-Harabasz Score = 259.9930 - Best Model: K-Means com k=8
Davies-Bouldin Score = 1.1219 - Best Model: K-Means com k=2
Jaccard Score = 0.4132 - Best Model: K-Means com k=5
Rand Score = 0.5261 - Best Model: K-Means com k=3
Folkes e Mallows Score = 0.7586 - Best Model: K-Means com k=2
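
The Jaccard values above swing widely between neighbouring k, which is what one would expect when `jaccard_score` is given raw cluster ids: it treats them as class labels, so the result depends on how the clusters are numbered. One possible remedy is to rename each cluster to the class it overlaps most before scoring. The sketch below does this with the Hungarian algorithm from SciPy. `align_labels` is a hypothetical helper, and the toy labels (classes numbered 1 and 2) are an assumption for illustration, not the Flame data.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import jaccard_score


def align_labels(y_true, y_pred):
    """Rename cluster ids to the true class each one overlaps most (one-to-one)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    # overlap[i, j] = number of points in class i that fell into cluster j
    overlap = np.array([[np.sum((y_true == c) & (y_pred == k)) for k in clusters]
                        for c in classes])
    rows, cols = linear_sum_assignment(overlap, maximize=True)
    mapping = {clusters[j]: classes[i] for i, j in zip(rows, cols)}
    # Clusters without a partner class get an id that matches no true class.
    spare = max(classes.max(), clusters.max()) + 1
    return np.array([mapping.get(k, spare) for k in y_pred])


y_true = np.array([1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1])  # same partition, different numbering

raw = jaccard_score(y_true, y_pred, average="weighted")
aligned = jaccard_score(y_true, align_labels(y_true, y_pred), average="weighted")
print(raw, aligned)
```

The adjusted Rand and Fowlkes-Mallows scores do not need this step, because they compare pairs of points rather than label values.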


# <a id="dataset_1_d_II"></a> 1.d.II. Dataset Flame - Average Linkage

In [95]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_text = f"Average Linkage com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
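    # Dunn index is skipped: AgglomerativeClustering has no cluster_centers_,
    # which dunn_index() requires.
    # dunn_value = dunn_index(X, labels, model.cluster_centers_)
    # if dunn_value > models_score["Dunn Score"][0]:
    #     models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    # NOTE: lower Davies-Bouldin is better, but this comparison keeps the
    # largest value, so the summary reports the worst k for this metric.
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    # NOTE: jaccard_score compares label values directly; cluster ids are
    # arbitrary, so this value depends on how the clusters happen to be numbered.
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    # "Rand Score" is the adjusted Rand index (adjusted_rand_score).
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]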
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


Average Linkage com k=2
Silouette Score = 0.3723
Calinski-Harabasz Score = 151.6404
Davies-Bouldin Score = 1.1325
Jaccard Score = 0.0000
Rand Score = 0.4422
Folkes e Mallows Score = 0.7311


Average Linkage com k=3
Silouette Score = 0.3973
Calinski-Harabasz Score = 172.4504
Davies-Bouldin Score = 0.8181
Jaccard Score = 0.5292
Rand Score = 0.6902
Folkes e Mallows Score = 0.8402


Average Linkage com k=4
Silouette Score = 0.4385
Calinski-Harabasz Score = 245.5487
Davies-Bouldin Score = 0.6719
Jaccard Score = 0.1667
Rand Score = 0.5043
Folkes e Mallows Score = 0.7231


Average Linkage com k=5
Silouette Score = 0.3803
Calinski-Harabasz Score = 215.2768
Davies-Bouldin Score = 0.7824
Jaccard Score = 0.4792
Rand Score = 0.4459
Folkes e Mallows Score = 0.6815


Average Linkage com k=6
Silouette Score = 0.3560
Calinski-Harabasz Score = 175.6411
Davies-Bouldin Score = 0.7434
Jaccard Score = 0.1667
Rand Score = 0.4446
Folkes e Mallows Score = 0.6805


Average Linkage com k=7
Silouette Score = 0.3

In [96]:
for metric, (value, best_model) in models_score.items():
    print(f"{metric} = {value:.4f} - Best Model: {best_model}")

Silouette Score = 0.4385 - Best Model: Average Linkage com k=4
Calinski-Harabasz Score = 245.5487 - Best Model: Average Linkage com k=4
Davies-Bouldin Score = 1.1325 - Best Model: Average Linkage com k=2
Jaccard Score = 0.5292 - Best Model: Average Linkage com k=3
Rand Score = 0.6902 - Best Model: Average Linkage com k=3
Folkes e Mallows Score = 0.8402 - Best Model: Average Linkage com k=3
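
The same block of metric calls is repeated in every cell of this notebook. As a sketch of one way to factor it out, the hypothetical helper `score_clustering` below returns all label-independent scores in one dictionary, using the notebook's own metric names as keys. Jaccard is left out on purpose because it depends on cluster numbering, and the six demo points are synthetic, not the Flame data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import (adjusted_rand_score, calinski_harabasz_score,
                             davies_bouldin_score, fowlkes_mallows_score,
                             silhouette_score)


def score_clustering(X, y_true, labels):
    """Compute internal (X, labels) and external (y_true, labels) scores."""
    return {
        "Silouette Score": silhouette_score(X, labels),
        "Calinski-Harabasz Score": calinski_harabasz_score(X, labels),
        "Davies-Bouldin Score": davies_bouldin_score(X, labels),
        "Rand Score": adjusted_rand_score(y_true, labels),  # adjusted Rand index
        "Folkes e Mallows Score": fowlkes_mallows_score(y_true, labels),
    }


# Two well separated groups of three points each.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
y_demo = np.array([1, 1, 1, 2, 2, 2])

model = AgglomerativeClustering(n_clusters=2, linkage="average")
scores = score_clustering(X_demo, y_demo, model.fit_predict(X_demo))
for name, value in scores.items():
    print(f"{name} = {value:.4f}")
```

Each cell would then only need to build the model, call the helper, and update the best-model table from the returned dictionary.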


# <a id="dataset_1_d_III"></a> 1.d.III. Dataset Flame - DBSCAN

In [97]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}
epsilon_values = [0.3, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

for eps in epsilon_values:

    # #######
    model = DBSCAN(eps=eps, min_samples=8)
    model_text = f'DBSCAN com eps={eps} e min_samples=8'
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_
    try:
        silhouette_value = silhouette_score(X, labels)
        if silhouette_value > models_score["Silouette Score"][0]:
            models_score["Silouette Score"] = [silhouette_value, model_text]
        # dunn_value = dunn_index(X, labels, model.cluster_centers_)
        # if dunn_value > models_score["Dunn Score"][0]:
        #     models_score["Dunn Score"] = [dunn_value, model_text]
        ch_value = calinski_harabasz_score(X, labels)
        if ch_value > models_score["Calinski-Harabasz Score"][0]:
            models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
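        # NOTE: lower Davies-Bouldin is better, but this comparison keeps the
        # largest value, so the summary reports the worst eps for this metric.
        db_value = davies_bouldin_score(X, labels)
        if db_value > models_score["Davies-Bouldin Score"][0]:
            models_score["Davies-Bouldin Score"] = [db_value, model_text]
        # NOTE: jaccard_score compares label values directly; cluster ids
        # (and the noise label -1) are arbitrary with respect to Y.
        jaccard_value = jaccard_score(Y, labels, average='weighted')
        if jaccard_value > models_score["Jaccard Score"][0]:
            models_score["Jaccard Score"] = [jaccard_value, model_text]
        # "Rand Score" is the adjusted Rand index (adjusted_rand_score).
        rand_value = adjusted_rand_score(Y, labels)
        if rand_value > models_score["Rand Score"][0]:
            models_score["Rand Score"] = [rand_value, model_text]
        fm_value = fowlkes_mallows_score(Y, labels)
        if fm_value > models_score["Folkes e Mallows Score"][0]:
            models_score["Folkes e Mallows Score"] = [fm_value, model_text]
    except ValueError:
        # Raised when the labels are not a valid clustering for a metric (for
        # example, a single label). The values printed below are then left
        # over from a previous iteration or cell.
        pass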

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


DBSCAN com eps=0.3 e min_samples=8
Silouette Score = 0.2904
Calinski-Harabasz Score = 192.8325
Davies-Bouldin Score = 0.7924
Jaccard Score = 0.1625
Rand Score = 0.2278
Folkes e Mallows Score = 0.4911


DBSCAN com eps=0.5 e min_samples=8
Silouette Score = 0.2904
Calinski-Harabasz Score = 192.8325
Davies-Bouldin Score = 0.7924
Jaccard Score = 0.1625
Rand Score = 0.2278
Folkes e Mallows Score = 0.4911


DBSCAN com eps=1.0 e min_samples=8
Silouette Score = 0.0327
Calinski-Harabasz Score = 37.5791
Davies-Bouldin Score = 2.3819
Jaccard Score = 0.2292
Rand Score = 0.1756
Folkes e Mallows Score = 0.4688


DBSCAN com eps=1.5 e min_samples=8
Silouette Score = 0.2952
Calinski-Harabasz Score = 6.8944
Davies-Bouldin Score = 0.5798
Jaccard Score = 0.0000
Rand Score = 0.0128
Folkes e Mallows Score = 0.7300


DBSCAN com eps=2.0 e min_samples=8
Silouette Score = 0.2952
Calinski-Harabasz Score = 6.8944
Davies-Bouldin Score = 0.5798
Jaccard Score = 0.0000
Rand Score = 0.0128
Folkes e Mallows Score = 0.73

In [98]:
for metric, (value, best_model) in models_score.items():
    print(f"{metric} = {value:.4f} - Best Model: {best_model}")

Silouette Score = 0.2952 - Best Model: DBSCAN com eps=1.5 e min_samples=8
Calinski-Harabasz Score = 37.5791 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Davies-Bouldin Score = 2.3819 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Jaccard Score = 0.2292 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Rand Score = 0.1756 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Folkes e Mallows Score = 0.7300 - Best Model: DBSCAN com eps=1.5 e min_samples=8


---