# <a id="inicio"></a> Exercícios Métricas de avaliação e seleção de hiper parâmetro

-----

### **Autor:** Glauco Lauria Marques Filho

-----

# <a id="resumo"></a> Resumo 

#### Este arquivo contém a resolução dos exercícios da Aula 9 do curso CEDS-808: Aprendizado de Máquina. 

# <a id="sumario"></a> Sumário


* [Início](#inicio)
* [Resumo](#resumo)
* [Sumário](#sumario)
* [Importação de Requisitos](#requisitos)

- 1.a [Dataset Aggregation](#dataset_1_a)
- 1.a.I [Dataset Aggregation - K-means](#dataset_1_a_I)
- 1.a.II [Dataset Aggregation - Average Linkage](#dataset_1_a_II)
- 1.a.III [Dataset Aggregation - DBSCAN](#dataset_1_a_III)

<br>

- 1.b [Dataset D31](#dataset_1_b)
- 1.b.I [Dataset D31 - K-means](#dataset_1_b_I)
- 1.b.II [Dataset D31 - Average Linkage](#dataset_1_b_II)
- 1.b.III [Dataset D31 - DBSCAN](#dataset_1_b_III)

<br>

- 1.c [Dataset Pathbased](#dataset_1_c)
- 1.c.I [Dataset Pathbased - K-means](#dataset_1_c_I)
- 1.c.II [Dataset Pathbased - Average Linkage](#dataset_1_c_II)
- 1.c.III [Dataset Pathbased - DBSCAN](#dataset_1_c_III)

<br>

- 1.d [Dataset Flame](#dataset_1_d)
- 1.d.I [Dataset Flame - K-means](#dataset_1_d_I)
- 1.d.II [Dataset Flame - Average Linkage](#dataset_1_d_II)
- 1.d.III [Dataset Flame - DBSCAN](#dataset_1_d_III)

<br>

- 2. [Dataset Forest Fires](#dataset_2)



# <a id="requisitos"></a> Importação de Requisitos

In [74]:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.metrics import adjusted_rand_score, jaccard_score, fowlkes_mallows_score
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import KFold
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler

In [33]:
def dunn_index(X, labels, cluster_centers):
    max_intra_cluster_distances = np.zeros(len(np.unique(labels)))
    for cluster_label in np.unique(labels):
        cluster_points = X[labels == cluster_label]
        max_intra_cluster_distances[cluster_label] = np.max(pairwise_distances(cluster_points))

    inter_cluster_distances = pairwise_distances(cluster_centers)
    min_inter_cluster_distance = np.min(inter_cluster_distances[np.nonzero(inter_cluster_distances)])

    return min_inter_cluster_distance / np.max(max_intra_cluster_distances)

# <a id="dataset_1"></a> 1.a. Dataset Aggregation

In [34]:
df = pd.read_csv("aggregation.csv", sep=";", names=["x1","x2","y"])
X = df.drop("y", axis=1)
Y = df["y"]

# <a id="dataset_1_a_I"></a> 1.a.I. Dataset Aggregation - K-Means

In [35]:
models_score = {
    "Silouette Score": [0,""],
    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = KMeans(n_clusters=k)
    model_text = f"K-Means com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    dunn_value = dunn_index(X, labels, model.cluster_centers_)
    if dunn_value > models_score["Dunn Score"][0]:
        models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


K-Means com k=2
Silouette Score = 0.4558
Dunn Score = 0.6337
Calinski-Harabasz Score = 704.7859
Davies-Bouldin Score = 0.8964
Jaccard Score = 0.0061
Rand Score = 0.3476
Folkes e Mallows Score = 0.6120


K-Means com k=3
Silouette Score = 0.5229
Dunn Score = 0.6540
Calinski-Harabasz Score = 1053.6429
Davies-Bouldin Score = 0.6681
Jaccard Score = 0.0081
Rand Score = 0.6752
Folkes e Mallows Score = 0.7832


K-Means com k=4
Silouette Score = 0.5221
Dunn Score = 0.6912
Calinski-Harabasz Score = 1169.3026
Davies-Bouldin Score = 0.6149
Jaccard Score = 0.0000
Rand Score = 0.7562
Folkes e Mallows Score = 0.8259


K-Means com k=5
Silouette Score = 0.5003
Dunn Score = 0.5520
Calinski-Harabasz Score = 1325.3818
Davies-Bouldin Score = 0.7017
Jaccard Score = 0.0106
Rand Score = 0.7541
Folkes e Mallows Score = 0.8067


K-Means com k=6
Silouette Score = 0.4680
Dunn Score = 0.3577
Calinski-Harabasz Score = 1203.0246
Davies-Bouldin Score = 0.8002
Jaccard Score = 0.2096
Rand Score = 0.7057
Folkes e Mallow

In [36]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5229 - Best Model: K-Means com k=3
Dunn Score = 0.6912 - Best Model: K-Means com k=4
Calinski-Harabasz Score = 1425.5110 - Best Model: K-Means com k=10
Davies-Bouldin Score = 0.8964 - Best Model: K-Means com k=2
Jaccard Score = 0.3831 - Best Model: K-Means com k=9
Rand Score = 0.7562 - Best Model: K-Means com k=4
Folkes e Mallows Score = 0.8259 - Best Model: K-Means com k=4


# <a id="dataset_1_a_II"></a> 1.a.II. Dataset Aggregation - Average Linkage

In [37]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_text = f"Average Linkage com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    # dunn_value = dunn_index(X, labels, model.cluster_centers_)
    # if dunn_value > models_score["Dunn Score"][0]:
    #     models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


Average Linkage com k=2
Silouette Score = 0.4533
Calinski-Harabasz Score = 692.9809
Davies-Bouldin Score = 0.9182
Jaccard Score = 0.0093
Rand Score = 0.3768
Folkes e Mallows Score = 0.6312


Average Linkage com k=3
Silouette Score = 0.5124
Calinski-Harabasz Score = 988.5203
Davies-Bouldin Score = 0.6735
Jaccard Score = 0.2157
Rand Score = 0.6655
Folkes e Mallows Score = 0.7793


Average Linkage com k=4
Silouette Score = 0.5220
Calinski-Harabasz Score = 1141.6939
Davies-Bouldin Score = 0.5968
Jaccard Score = 0.3599
Rand Score = 0.7864
Folkes e Mallows Score = 0.8510


Average Linkage com k=5
Silouette Score = 0.5008
Calinski-Harabasz Score = 1208.0022
Davies-Bouldin Score = 0.6824
Jaccard Score = 0.6916
Rand Score = 0.9358
Folkes e Mallows Score = 0.9516


Average Linkage com k=6
Silouette Score = 0.5124
Calinski-Harabasz Score = 1306.1808
Davies-Bouldin Score = 0.6132
Jaccard Score = 0.3452
Rand Score = 0.9891
Folkes e Mallows Score = 0.9915


Average Linkage com k=7
Silouette Score = 

In [38]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5220 - Best Model: Average Linkage com k=4
Calinski-Harabasz Score = 1306.1808 - Best Model: Average Linkage com k=6
Davies-Bouldin Score = 0.9182 - Best Model: Average Linkage com k=2
Jaccard Score = 0.6916 - Best Model: Average Linkage com k=5
Rand Score = 1.0000 - Best Model: Average Linkage com k=7
Folkes e Mallows Score = 1.0000 - Best Model: Average Linkage com k=7


# <a id="dataset_1_a_III"></a> 1.a.III Dataset Aggregation - DBSCAN

In [39]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}
epsilon_values = [0.3, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

for k in range(2, 11):

    # #######
    model = DBSCAN(eps=epsilon_values[k-2], min_samples=8)
    model_text = f'DBSCAN com eps={epsilon_values[k-2]} e min_samples=8'
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_
    try:
        silhouette_value = silhouette_score(X, labels)
        if silhouette_value > models_score["Silouette Score"][0]:
            models_score["Silouette Score"] = [silhouette_value, model_text]
        # dunn_value = dunn_index(X, labels, model.cluster_centers_)
        # if dunn_value > models_score["Dunn Score"][0]:
        #     models_score["Dunn Score"] = [dunn_value, model_text]
        ch_value = calinski_harabasz_score(X, labels)
        if ch_value > models_score["Calinski-Harabasz Score"][0]:
            models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
        db_value = davies_bouldin_score(X, labels)
        if db_value > models_score["Davies-Bouldin Score"][0]:
            models_score["Davies-Bouldin Score"] = [db_value, model_text]
        jaccard_value = jaccard_score(Y, labels, average='weighted')
        if jaccard_value > models_score["Jaccard Score"][0]:
            models_score["Jaccard Score"] = [jaccard_value, model_text]
        rand_value = adjusted_rand_score(Y, labels)
        if rand_value > models_score["Rand Score"][0]:
            models_score["Rand Score"] = [rand_value, model_text]
        fm_value = fowlkes_mallows_score(Y, labels)
        if fm_value > models_score["Folkes e Mallows Score"][0]:
            models_score["Folkes e Mallows Score"] = [fm_value, model_text]
    except:
       pass

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
     #   + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


DBSCAN com eps=0.3 e min_samples=8
Silouette Score = 0.4491
Calinski-Harabasz Score = 1288.1706
Davies-Bouldin Score = 0.7227
Jaccard Score = 0.1294
Rand Score = 0.7126
Folkes e Mallows Score = 0.7828


DBSCAN com eps=0.5 e min_samples=8
Silouette Score = 0.4491
Calinski-Harabasz Score = 1288.1706
Davies-Bouldin Score = 0.7227
Jaccard Score = 0.1294
Rand Score = 0.7126
Folkes e Mallows Score = 0.7828


DBSCAN com eps=1.0 e min_samples=8
Silouette Score = -0.3123
Calinski-Harabasz Score = 22.0591
Davies-Bouldin Score = 1.1601
Jaccard Score = 0.0305
Rand Score = 0.0362
Folkes e Mallows Score = 0.3230


DBSCAN com eps=1.5 e min_samples=8
Silouette Score = 0.4385
Calinski-Harabasz Score = 1052.3234
Davies-Bouldin Score = 0.6019
Jaccard Score = 0.1270
Rand Score = 0.9850
Folkes e Mallows Score = 0.9882


DBSCAN com eps=2.0 e min_samples=8
Silouette Score = 0.4120
Calinski-Harabasz Score = 754.2002
Davies-Bouldin Score = 0.6249
Jaccard Score = 0.0000
Rand Score = 0.8089
Folkes e Mallows Scor

In [40]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.4695 - Best Model: DBSCAN com eps=3.0 e min_samples=8
Calinski-Harabasz Score = 1052.3234 - Best Model: DBSCAN com eps=1.5 e min_samples=8
Davies-Bouldin Score = 1.1601 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Jaccard Score = 0.1270 - Best Model: DBSCAN com eps=1.5 e min_samples=8
Rand Score = 0.9850 - Best Model: DBSCAN com eps=1.5 e min_samples=8
Folkes e Mallows Score = 0.9882 - Best Model: DBSCAN com eps=1.5 e min_samples=8


---

# <a id="dataset_1_b"></a> 1.b. Dataset D31

In [41]:
df = pd.read_csv("d31.csv", sep=";", names=["x1","x2","y"])
X = df.drop("y", axis=1)
Y = df["y"]

# <a id="dataset_1_b_I"></a> 1.b.I. Dataset D31 - K-Means

In [42]:
models_score = {
    "Silouette Score": [0,""],
    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 34):

    # #######
    model = KMeans(n_clusters=k)
    model_text = f"K-Means com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    dunn_value = dunn_index(X, labels, model.cluster_centers_)
    if dunn_value > models_score["Dunn Score"][0]:
        models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


K-Means com k=2
Silouette Score = 0.3943
Dunn Score = 0.4926
Calinski-Harabasz Score = 2292.8938
Davies-Bouldin Score = 1.0661
Jaccard Score = 0.0000
Rand Score = 0.0593
Folkes e Mallows Score = 0.2437




K-Means com k=3
Silouette Score = 0.4207
Dunn Score = 0.5632
Calinski-Harabasz Score = 2890.5771
Davies-Bouldin Score = 0.7954
Jaccard Score = 0.0031
Rand Score = 0.1132
Folkes e Mallows Score = 0.2956


K-Means com k=4
Silouette Score = 0.4276
Dunn Score = 0.6604
Calinski-Harabasz Score = 3371.3707
Davies-Bouldin Score = 0.7890
Jaccard Score = 0.0063
Rand Score = 0.1637
Folkes e Mallows Score = 0.3340


K-Means com k=5
Silouette Score = 0.4117
Dunn Score = 0.5479
Calinski-Harabasz Score = 3212.1083
Davies-Bouldin Score = 0.8240
Jaccard Score = 0.0151
Rand Score = 0.2012
Folkes e Mallows Score = 0.3657


K-Means com k=6
Silouette Score = 0.4118
Dunn Score = 0.5296
Calinski-Harabasz Score = 3282.6927
Davies-Bouldin Score = 0.8123
Jaccard Score = 0.0042
Rand Score = 0.2480
Folkes e Mallows Score = 0.4004


K-Means com k=7
Silouette Score = 0.4236
Dunn Score = 0.6319
Calinski-Harabasz Score = 3675.3392
Davies-Bouldin Score = 0.8253
Jaccard Score = 0.0000
Rand Score = 0.3013
Folkes e Mallo

In [43]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5675 - Best Model: K-Means com k=32
Dunn Score = 0.6604 - Best Model: K-Means com k=4
Calinski-Harabasz Score = 9001.3209 - Best Model: K-Means com k=32
Davies-Bouldin Score = 1.0661 - Best Model: K-Means com k=2
Jaccard Score = 0.0863 - Best Model: K-Means com k=31
Rand Score = 0.9434 - Best Model: K-Means com k=32
Folkes e Mallows Score = 0.9452 - Best Model: K-Means com k=32


# <a id="dataset_1_b_II"></a> 1.b.II. Dataset D31 - Average Linkage

In [44]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 34):

    # #######
    model = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_text = f"Average Linkage com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    # dunn_value = dunn_index(X, labels, model.cluster_centers_)
    # if dunn_value > models_score["Dunn Score"][0]:
    #     models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


Average Linkage com k=2
Silouette Score = 0.3737
Calinski-Harabasz Score = 1985.2568
Davies-Bouldin Score = 1.1500
Jaccard Score = 0.0000
Rand Score = 0.0637
Folkes e Mallows Score = 0.2525


Average Linkage com k=3
Silouette Score = 0.3911
Calinski-Harabasz Score = 2403.6033
Davies-Bouldin Score = 0.8052
Jaccard Score = 0.0036
Rand Score = 0.0987
Folkes e Mallows Score = 0.2866


Average Linkage com k=4
Silouette Score = 0.3855
Calinski-Harabasz Score = 2734.8714
Davies-Bouldin Score = 0.8793
Jaccard Score = 0.0089
Rand Score = 0.1688
Folkes e Mallows Score = 0.3476


Average Linkage com k=5
Silouette Score = 0.3658
Calinski-Harabasz Score = 2350.4588
Davies-Bouldin Score = 0.8332
Jaccard Score = 0.0000
Rand Score = 0.1848
Folkes e Mallows Score = 0.3607


Average Linkage com k=6
Silouette Score = 0.3614
Calinski-Harabasz Score = 2557.7393
Davies-Bouldin Score = 0.8479
Jaccard Score = 0.0000
Rand Score = 0.2285
Folkes e Mallows Score = 0.3951


Average Linkage com k=7
Silouette Score 

In [45]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5666 - Best Model: Average Linkage com k=32
Calinski-Harabasz Score = 8683.2619 - Best Model: Average Linkage com k=32
Davies-Bouldin Score = 1.1500 - Best Model: Average Linkage com k=2
Jaccard Score = 0.1064 - Best Model: Average Linkage com k=29
Rand Score = 0.9307 - Best Model: Average Linkage com k=32
Folkes e Mallows Score = 0.9329 - Best Model: Average Linkage com k=32


# <a id="dataset_1_b_III"></a> 1.b.III Dataset D31 - DBSCAN

In [46]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}
epsilon_values = list(range(1,10,1))

for k in range(2, 11):

    # #######
    model = DBSCAN(eps=epsilon_values[k-2], min_samples=8)
    model_text = f'DBSCAN com eps={epsilon_values[k-2]} e min_samples=8'
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_
    try:
        silhouette_value = silhouette_score(X, labels)
        if silhouette_value > models_score["Silouette Score"][0]:
            models_score["Silouette Score"] = [silhouette_value, model_text]
        # dunn_value = dunn_index(X, labels, model.cluster_centers_)
        # if dunn_value > models_score["Dunn Score"][0]:
        #     models_score["Dunn Score"] = [dunn_value, model_text]
        ch_value = calinski_harabasz_score(X, labels)
        if ch_value > models_score["Calinski-Harabasz Score"][0]:
            models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
        db_value = davies_bouldin_score(X, labels)
        if db_value > models_score["Davies-Bouldin Score"][0]:
            models_score["Davies-Bouldin Score"] = [db_value, model_text]
        jaccard_value = jaccard_score(Y, labels, average='weighted')
        if jaccard_value > models_score["Jaccard Score"][0]:
            models_score["Jaccard Score"] = [jaccard_value, model_text]
        rand_value = adjusted_rand_score(Y, labels)
        if rand_value > models_score["Rand Score"][0]:
            models_score["Rand Score"] = [rand_value, model_text]
        fm_value = fowlkes_mallows_score(Y, labels)
        if fm_value > models_score["Folkes e Mallows Score"][0]:
            models_score["Folkes e Mallows Score"] = [fm_value, model_text]
    except:
       pass

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
     #   + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


DBSCAN com eps=1 e min_samples=8
Silouette Score = 0.0808
Calinski-Harabasz Score = 693.4219
Davies-Bouldin Score = 9.4449
Jaccard Score = 0.0000
Rand Score = 0.1378
Folkes e Mallows Score = 0.3206


DBSCAN com eps=2 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=3 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=4 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=5 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846




DBSCAN com eps=6 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=7 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=8 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846


DBSCAN com eps=9 e min_samples=8
Silouette Score = 0.2304
Calinski-Harabasz Score = 268.8047
Davies-Bouldin Score = 0.6332
Jaccard Score = 0.0000
Rand Score = 0.0044
Folkes e Mallows Score = 0.1846




In [47]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.2304 - Best Model: DBSCAN com eps=2 e min_samples=8
Calinski-Harabasz Score = 693.4219 - Best Model: DBSCAN com eps=1 e min_samples=8
Davies-Bouldin Score = 9.4449 - Best Model: DBSCAN com eps=1 e min_samples=8
Jaccard Score = 0.0000 - Best Model: 
Rand Score = 0.1378 - Best Model: DBSCAN com eps=1 e min_samples=8
Folkes e Mallows Score = 0.3206 - Best Model: DBSCAN com eps=1 e min_samples=8


---

# <a id="dataset_1_c"></a> 1.c. Dataset Pathbased

In [48]:
df = pd.read_csv("pathbased.csv", sep=";", names=["x1","x2","y"])
X = df.drop("y", axis=1)
Y = df["y"]

# <a id="dataset_1_c_I"></a> 1.c.I. Dataset Pathbased - K-Means

In [49]:
models_score = {
    "Silouette Score": [0,""],
    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = KMeans(n_clusters=k)
    model_text = f"K-Means com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    dunn_value = dunn_index(X, labels, model.cluster_centers_)
    if dunn_value > models_score["Dunn Score"][0]:
        models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


K-Means com k=2
Silouette Score = 0.5150
Dunn Score = 0.5330
Calinski-Harabasz Score = 371.0553
Davies-Bouldin Score = 0.7523
Jaccard Score = 0.0939
Rand Score = 0.3990
Folkes e Mallows Score = 0.6519


K-Means com k=3
Silouette Score = 0.5420
Dunn Score = 0.6828
Calinski-Harabasz Score = 359.0336
Davies-Bouldin Score = 0.6662
Jaccard Score = 0.0759
Rand Score = 0.4618
Folkes e Mallows Score = 0.6620


K-Means com k=4
Silouette Score = 0.4402
Dunn Score = 0.3491
Calinski-Harabasz Score = 318.0389
Davies-Bouldin Score = 0.8772
Jaccard Score = 0.0556
Rand Score = 0.3867
Folkes e Mallows Score = 0.5840


K-Means com k=5
Silouette Score = 0.3838
Dunn Score = 0.4502
Calinski-Harabasz Score = 308.8591
Davies-Bouldin Score = 0.9403
Jaccard Score = 0.0597
Rand Score = 0.4112
Folkes e Mallows Score = 0.5866


K-Means com k=6
Silouette Score = 0.3663
Dunn Score = 0.3845
Calinski-Harabasz Score = 283.4670
Davies-Bouldin Score = 0.8861
Jaccard Score = 0.0604
Rand Score = 0.3367
Folkes e Mallows Sc

In [50]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5420 - Best Model: K-Means com k=3
Dunn Score = 0.6828 - Best Model: K-Means com k=3
Calinski-Harabasz Score = 371.0553 - Best Model: K-Means com k=2
Davies-Bouldin Score = 0.9403 - Best Model: K-Means com k=5
Jaccard Score = 0.0939 - Best Model: K-Means com k=2
Rand Score = 0.4618 - Best Model: K-Means com k=3
Folkes e Mallows Score = 0.6620 - Best Model: K-Means com k=3


# <a id="dataset_1_c_II"></a> 1.c.II. Dataset Pathbased - Average Linkage

In [51]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_text = f"Average Linkage com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    # dunn_value = dunn_index(X, labels, model.cluster_centers_)
    # if dunn_value > models_score["Dunn Score"][0]:
    #     models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


Average Linkage com k=2
Silouette Score = 0.4911
Calinski-Harabasz Score = 316.9184
Davies-Bouldin Score = 0.8007
Jaccard Score = 0.0809
Rand Score = 0.3947
Folkes e Mallows Score = 0.6501


Average Linkage com k=3
Silouette Score = 0.5386
Calinski-Harabasz Score = 353.9570
Davies-Bouldin Score = 0.6309
Jaccard Score = 0.3352
Rand Score = 0.4436
Folkes e Mallows Score = 0.6526


Average Linkage com k=4
Silouette Score = 0.4756
Calinski-Harabasz Score = 286.4206
Davies-Bouldin Score = 0.6237
Jaccard Score = 0.2963
Rand Score = 0.4629
Folkes e Mallows Score = 0.6555


Average Linkage com k=5
Silouette Score = 0.4227
Calinski-Harabasz Score = 241.5164
Davies-Bouldin Score = 0.5724
Jaccard Score = 0.3018
Rand Score = 0.4515
Folkes e Mallows Score = 0.6468


Average Linkage com k=6
Silouette Score = 0.3986
Calinski-Harabasz Score = 232.0410
Davies-Bouldin Score = 0.6029
Jaccard Score = 0.0682
Rand Score = 0.4851
Folkes e Mallows Score = 0.6594


Average Linkage com k=7
Silouette Score = 0.3

In [52]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.5386 - Best Model: Average Linkage com k=3
Calinski-Harabasz Score = 353.9570 - Best Model: Average Linkage com k=3
Davies-Bouldin Score = 0.8007 - Best Model: Average Linkage com k=2
Jaccard Score = 0.3352 - Best Model: Average Linkage com k=3
Rand Score = 0.6691 - Best Model: Average Linkage com k=9
Folkes e Mallows Score = 0.7746 - Best Model: Average Linkage com k=9


# <a id="dataset_1_c_III"></a> 1.c.III Dataset Pathbased - DBSCAN

In [53]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}
epsilon_values = [1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3]

for k in range(2, 11):

    # #######
    model = DBSCAN(eps=epsilon_values[k-2], min_samples=8)
    model_text = f'DBSCAN com eps={epsilon_values[k-2]} e min_samples=8'
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_
    try:
        silhouette_value = silhouette_score(X, labels)
        if silhouette_value > models_score["Silouette Score"][0]:
            models_score["Silouette Score"] = [silhouette_value, model_text]
        # dunn_value = dunn_index(X, labels, model.cluster_centers_)
        # if dunn_value > models_score["Dunn Score"][0]:
        #     models_score["Dunn Score"] = [dunn_value, model_text]
        ch_value = calinski_harabasz_score(X, labels)
        if ch_value > models_score["Calinski-Harabasz Score"][0]:
            models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
        db_value = davies_bouldin_score(X, labels)
        if db_value > models_score["Davies-Bouldin Score"][0]:
            models_score["Davies-Bouldin Score"] = [db_value, model_text]
        jaccard_value = jaccard_score(Y, labels, average='weighted')
        if jaccard_value > models_score["Jaccard Score"][0]:
            models_score["Jaccard Score"] = [jaccard_value, model_text]
        rand_value = adjusted_rand_score(Y, labels)
        if rand_value > models_score["Rand Score"][0]:
            models_score["Rand Score"] = [rand_value, model_text]
        fm_value = fowlkes_mallows_score(Y, labels)
        if fm_value > models_score["Folkes e Mallows Score"][0]:
            models_score["Folkes e Mallows Score"] = [fm_value, model_text]
    except:
       pass

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
     #   + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


DBSCAN com eps=1 e min_samples=8
Silouette Score = -0.3911
Calinski-Harabasz Score = 7.4666
Davies-Bouldin Score = 1.5807
Jaccard Score = 0.0433
Rand Score = 0.0423
Folkes e Mallows Score = 0.5228


DBSCAN com eps=1.25 e min_samples=8
Silouette Score = -0.0767
Calinski-Harabasz Score = 21.5301
Davies-Bouldin Score = 1.9202
Jaccard Score = 0.2500
Rand Score = 0.3767
Folkes e Mallows Score = 0.6226


DBSCAN com eps=1.5 e min_samples=8
Silouette Score = 0.1341
Calinski-Harabasz Score = 38.6580
Davies-Bouldin Score = 2.2988
Jaccard Score = 0.0000
Rand Score = 0.6311
Folkes e Mallows Score = 0.7619


DBSCAN com eps=1.75 e min_samples=8
Silouette Score = 0.1636
Calinski-Harabasz Score = 40.9267
Davies-Bouldin Score = 2.0803
Jaccard Score = 0.3038
Rand Score = 0.7854
Folkes e Mallows Score = 0.8556


DBSCAN com eps=2 e min_samples=8
Silouette Score = 0.1682
Calinski-Harabasz Score = 47.8750
Davies-Bouldin Score = 1.6406
Jaccard Score = 0.3370
Rand Score = 0.7873
Folkes e Mallows Score = 0.855

In [54]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.2948 - Best Model: DBSCAN com eps=2.25 e min_samples=8
Calinski-Harabasz Score = 75.4146 - Best Model: DBSCAN com eps=2.25 e min_samples=8
Davies-Bouldin Score = 4.5474 - Best Model: DBSCAN com eps=2.5 e min_samples=8
Jaccard Score = 0.3551 - Best Model: DBSCAN com eps=2.25 e min_samples=8
Rand Score = 0.7873 - Best Model: DBSCAN com eps=2 e min_samples=8
Folkes e Mallows Score = 0.8556 - Best Model: DBSCAN com eps=1.75 e min_samples=8


---

# <a id="dataset_1_d"></a> 1.d. Dataset Flame

In [55]:
df = pd.read_csv("flame.csv", sep=";", names=["x1","x2","y"])
X = df.drop("y", axis=1)
Y = df["y"]

# <a id="dataset_1_d_I"></a> 1.d.I. Dataset Flame - K-Means

In [56]:
models_score = {
    "Silouette Score": [0,""],
    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = KMeans(n_clusters=k)
    model_text = f"K-Means com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    dunn_value = dunn_index(X, labels, model.cluster_centers_)
    if dunn_value > models_score["Dunn Score"][0]:
        models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


K-Means com k=2
Silouette Score = 0.3708
Dunn Score = 0.4206
Calinski-Harabasz Score = 148.6281
Davies-Bouldin Score = 1.1171
Jaccard Score = 0.2528
Rand Score = 0.4880
Folkes e Mallows Score = 0.7530


K-Means com k=3
Silouette Score = 0.4096
Dunn Score = 0.5792
Calinski-Harabasz Score = 201.4141
Davies-Bouldin Score = 0.7958
Jaccard Score = 0.4840
Rand Score = 0.4116
Folkes e Mallows Score = 0.6748


K-Means com k=4
Silouette Score = 0.4427
Dunn Score = 0.5951
Calinski-Harabasz Score = 258.0822
Davies-Bouldin Score = 0.6926
Jaccard Score = 0.3009
Rand Score = 0.4316
Folkes e Mallows Score = 0.6728


K-Means com k=5
Silouette Score = 0.4069
Dunn Score = 0.3731
Calinski-Harabasz Score = 238.7274
Davies-Bouldin Score = 0.8397
Jaccard Score = 0.0166
Rand Score = 0.3442
Folkes e Mallows Score = 0.6022


K-Means com k=6
Silouette Score = 0.3845
Dunn Score = 0.4290
Calinski-Harabasz Score = 238.3472
Davies-Bouldin Score = 0.8412
Jaccard Score = 0.3497
Rand Score = 0.2794
Folkes e Mallows Sc

In [57]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.4427 - Best Model: K-Means com k=4
Dunn Score = 0.5951 - Best Model: K-Means com k=4
Calinski-Harabasz Score = 258.0822 - Best Model: K-Means com k=4
Davies-Bouldin Score = 1.1171 - Best Model: K-Means com k=2
Jaccard Score = 0.4840 - Best Model: K-Means com k=3
Rand Score = 0.4880 - Best Model: K-Means com k=2
Folkes e Mallows Score = 0.7530 - Best Model: K-Means com k=2


# <a id="dataset_1_d_II"></a> 1.d.II. Dataset Flame - Average Linkage

In [58]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}

for k in range(2, 11):

    # #######
    model = AgglomerativeClustering(n_clusters=k, linkage='average')
    model_text = f"Average Linkage com k={k}"
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_

    silhouette_value = silhouette_score(X, labels)
    if silhouette_value > models_score["Silouette Score"][0]:
        models_score["Silouette Score"] = [silhouette_value, model_text]
    # dunn_value = dunn_index(X, labels, model.cluster_centers_)
    # if dunn_value > models_score["Dunn Score"][0]:
    #     models_score["Dunn Score"] = [dunn_value, model_text]
    ch_value = calinski_harabasz_score(X, labels)
    if ch_value > models_score["Calinski-Harabasz Score"][0]:
        models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
    db_value = davies_bouldin_score(X, labels)
    if db_value > models_score["Davies-Bouldin Score"][0]:
        models_score["Davies-Bouldin Score"] = [db_value, model_text]
    jaccard_value = jaccard_score(Y, labels, average='weighted')
    if jaccard_value > models_score["Jaccard Score"][0]:
        models_score["Jaccard Score"] = [jaccard_value, model_text]
    rand_value = adjusted_rand_score(Y, labels)
    if rand_value > models_score["Rand Score"][0]:
        models_score["Rand Score"] = [rand_value, model_text]
    fm_value = fowlkes_mallows_score(Y, labels)
    if fm_value > models_score["Folkes e Mallows Score"][0]:
        models_score["Folkes e Mallows Score"] = [fm_value, model_text]

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
        # + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


Average Linkage com k=2
Silouette Score = 0.3723
Calinski-Harabasz Score = 151.6404
Davies-Bouldin Score = 1.1325
Jaccard Score = 0.0000
Rand Score = 0.4422
Folkes e Mallows Score = 0.7311


Average Linkage com k=3
Silouette Score = 0.3973
Calinski-Harabasz Score = 172.4504
Davies-Bouldin Score = 0.8181
Jaccard Score = 0.5292
Rand Score = 0.6902
Folkes e Mallows Score = 0.8402


Average Linkage com k=4
Silouette Score = 0.4385
Calinski-Harabasz Score = 245.5487
Davies-Bouldin Score = 0.6719
Jaccard Score = 0.1667
Rand Score = 0.5043
Folkes e Mallows Score = 0.7231


Average Linkage com k=5
Silouette Score = 0.3803
Calinski-Harabasz Score = 215.2768
Davies-Bouldin Score = 0.7824
Jaccard Score = 0.4792
Rand Score = 0.4459
Folkes e Mallows Score = 0.6815


Average Linkage com k=6
Silouette Score = 0.3560
Calinski-Harabasz Score = 175.6411
Davies-Bouldin Score = 0.7434
Jaccard Score = 0.1667
Rand Score = 0.4446
Folkes e Mallows Score = 0.6805


Average Linkage com k=7
Silouette Score = 0.3

In [59]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.4385 - Best Model: Average Linkage com k=4
Calinski-Harabasz Score = 245.5487 - Best Model: Average Linkage com k=4
Davies-Bouldin Score = 1.1325 - Best Model: Average Linkage com k=2
Jaccard Score = 0.5292 - Best Model: Average Linkage com k=3
Rand Score = 0.6902 - Best Model: Average Linkage com k=3
Folkes e Mallows Score = 0.8402 - Best Model: Average Linkage com k=3


# <a id="dataset_1_d_III"></a> 1.d.III Dataset Flame - DBSCAN

In [60]:
models_score = {
    "Silouette Score": [0,""],
#    "Dunn Score": [0,""],
    "Calinski-Harabasz Score": [0,""],
    "Davies-Bouldin Score": [0,""],
    "Jaccard Score": [0,""],
    "Rand Score": [0,""],
    "Folkes e Mallows Score": [0,""],
}
epsilon_values = [0.3, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

for k in range(2, 11):

    # #######
    model = DBSCAN(eps=epsilon_values[k-2], min_samples=8)
    model_text = f'DBSCAN com eps={epsilon_values[k-2]} e min_samples=8'
    # #######

    clusters = model.fit_predict(X)
    labels = model.labels_
    try:
        silhouette_value = silhouette_score(X, labels)
        if silhouette_value > models_score["Silouette Score"][0]:
            models_score["Silouette Score"] = [silhouette_value, model_text]
        # dunn_value = dunn_index(X, labels, model.cluster_centers_)
        # if dunn_value > models_score["Dunn Score"][0]:
        #     models_score["Dunn Score"] = [dunn_value, model_text]
        ch_value = calinski_harabasz_score(X, labels)
        if ch_value > models_score["Calinski-Harabasz Score"][0]:
            models_score["Calinski-Harabasz Score"] = [ch_value, model_text]
        db_value = davies_bouldin_score(X, labels)
        if db_value > models_score["Davies-Bouldin Score"][0]:
            models_score["Davies-Bouldin Score"] = [db_value, model_text]
        jaccard_value = jaccard_score(Y, labels, average='weighted')
        if jaccard_value > models_score["Jaccard Score"][0]:
            models_score["Jaccard Score"] = [jaccard_value, model_text]
        rand_value = adjusted_rand_score(Y, labels)
        if rand_value > models_score["Rand Score"][0]:
            models_score["Rand Score"] = [rand_value, model_text]
        fm_value = fowlkes_mallows_score(Y, labels)
        if fm_value > models_score["Folkes e Mallows Score"][0]:
            models_score["Folkes e Mallows Score"] = [fm_value, model_text]
    except:
       pass

    text = (
        # #######
        model_text + "\n"
        # #######
        + f"Silouette Score = {silhouette_value:.4f}" + "\n"
     #   + f"Dunn Score = {dunn_value:.4f}" + "\n"
        + f"Calinski-Harabasz Score = {ch_value:.4f}" + "\n"
        + f"Davies-Bouldin Score = {db_value:.4f}" + "\n"
        + f"Jaccard Score = {jaccard_value:.4f}" + "\n"
        + f"Rand Score = {rand_value:.4f}" + "\n"
        + f"Folkes e Mallows Score = {fm_value:.4f}" + "\n\n")
    print(text)


DBSCAN com eps=0.3 e min_samples=8
Silouette Score = 0.2904
Calinski-Harabasz Score = 192.8325
Davies-Bouldin Score = 0.7924
Jaccard Score = 0.1625
Rand Score = 0.2278
Folkes e Mallows Score = 0.4911


DBSCAN com eps=0.5 e min_samples=8
Silouette Score = 0.2904
Calinski-Harabasz Score = 192.8325
Davies-Bouldin Score = 0.7924
Jaccard Score = 0.1625
Rand Score = 0.2278
Folkes e Mallows Score = 0.4911


DBSCAN com eps=1.0 e min_samples=8
Silouette Score = 0.0327
Calinski-Harabasz Score = 37.5791
Davies-Bouldin Score = 2.3819
Jaccard Score = 0.2292
Rand Score = 0.1756
Folkes e Mallows Score = 0.4688


DBSCAN com eps=1.5 e min_samples=8
Silouette Score = 0.2952
Calinski-Harabasz Score = 6.8944
Davies-Bouldin Score = 0.5798
Jaccard Score = 0.0000
Rand Score = 0.0128
Folkes e Mallows Score = 0.7300


DBSCAN com eps=2.0 e min_samples=8
Silouette Score = 0.2952
Calinski-Harabasz Score = 6.8944
Davies-Bouldin Score = 0.5798
Jaccard Score = 0.0000
Rand Score = 0.0128
Folkes e Mallows Score = 0.73

In [61]:
for algoritm, score in models_score.items():
    print(f"{algoritm} = {score[0]:.4f} - Best Model: {score[1]}")

Silouette Score = 0.2952 - Best Model: DBSCAN com eps=1.5 e min_samples=8
Calinski-Harabasz Score = 37.5791 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Davies-Bouldin Score = 2.3819 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Jaccard Score = 0.2292 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Rand Score = 0.1756 - Best Model: DBSCAN com eps=1.0 e min_samples=8
Folkes e Mallows Score = 0.7300 - Best Model: DBSCAN com eps=1.5 e min_samples=8


---

# <a id="dataset_2"></a> 2. Dataset Forest Fires

In [84]:
# Importando o dataset Forest Fires
ffire = fetch_ucirepo(id=162) 
# Definindo dataframe de Features e target
X = ffire.data.features 
y = ffire.data.targets
mes_para_numero = {
    'jan': 1,
    'feb': 2,
    'mar': 3,
    'apr': 4,
    'may': 5,
    'jun': 6,
    'jul': 7,
    'aug': 8,
    'sep': 9,
    'oct': 10,
    'nov': 11,
    'dec': 12
}
X['month'] = X['month'].map(mes_para_numero)
dia_para_numero = {
    'sun': 1,
    'mon': 2,
    'tue': 3,
    'wed': 4,
    'thu': 5,
    'fri': 6,
    'sat': 7,
}
X['day'] = X['day'].map(dia_para_numero)
colunas = X.columns
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
X = pd.DataFrame(X, columns=colunas)
scaler = MinMaxScaler()
y = y.values.reshape(-1, 1)
y = scaler.fit_transform(y).flatten()
y = pd.DataFrame(y, columns=['area'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X['month'] = X['month'].map(mes_para_numero)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X['day'] = X['day'].map(dia_para_numero)


### Melhores resultados encontrados anteriormente:

In [85]:
# Melhores resultados de cada modelo rodado no exercicio numero 6
knn_regressor = KNeighborsRegressor(n_neighbors=11)
dt_regressor = DecisionTreeRegressor(max_depth=3)
svm_regressor = SVR(kernel='linear')
mlp_regressor = MLPRegressor(max_iter=1000)

cv = KFold(n_splits=5, shuffle=True, random_state=42)

knn_r2_scores = []
dt_r2_scores = []
svm_r2_scores = []
mlp_r2_scores = []

for train_index, test_index in cv.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = np.ravel(y.iloc[train_index]), np.ravel(y.iloc[test_index])
    
    knn_regressor.fit(X_train, y_train)
    knn_pred = knn_regressor.predict(X_test)
    knn_r2 = r2_score(y_test, knn_pred)
    knn_r2_scores.append(knn_r2)

    dt_regressor.fit(X_train, y_train)
    dt_pred = dt_regressor.predict(X_test)
    dt_r2 = r2_score(y_test, dt_pred)
    dt_r2_scores.append(dt_r2)

    svm_regressor.fit(X_train, y_train)
    svm_pred = svm_regressor.predict(X_test)
    svm_r2 = r2_score(y_test, svm_pred)
    svm_r2_scores.append(svm_r2)

    mlp_regressor.fit(X_train, y_train)
    mlp_pred = mlp_regressor.predict(X_test)
    mlp_r2 = r2_score(y_test, mlp_pred)
    mlp_r2_scores.append(mlp_r2)

mean_knn_r2 = np.mean(knn_r2_scores)
mean_dt_r2 = np.mean(dt_r2_scores)
mean_svm_r2 = np.mean(svm_r2_scores)
mean_mlp_r2 = np.mean(mlp_r2_scores)

print("KNN:", mean_knn_r2)
print("DT:", mean_dt_r2)
print("SVM:", mean_svm_r2)
print("MLP:", mean_mlp_r2)


KNN: -0.396775777322406
DT: -0.6824019824918329
SVM: -9.284703149034126
MLP: -0.6517952483549271


### Melhores Resultados rodando RandomSearch

In [86]:
knn_params = {'n_neighbors': range(1, 31)}
dt_params = {'max_depth': range(1, 31)}
svm_params = {'C': [0.1, 1, 10, 100, 1000], 'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
mlp_params = {'hidden_layer_sizes': [(50,), (100,), (200,), (50, 50), (100, 100), (200, 200)], 
              'alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100]}

knn_random_search = RandomizedSearchCV(KNeighborsRegressor(), knn_params, n_iter=20, cv=cv, scoring='r2')
dt_random_search = RandomizedSearchCV(DecisionTreeRegressor(), dt_params, n_iter=20, cv=cv, scoring='r2')
svm_random_search = RandomizedSearchCV(SVR(), svm_params, n_iter=20, cv=cv, scoring='r2')
mlp_random_search = RandomizedSearchCV(MLPRegressor(max_iter=1000), mlp_params, n_iter=20, cv=cv, scoring='r2')

knn_random_search.fit(X, np.ravel(y))
dt_random_search.fit(X, np.ravel(y))
svm_random_search.fit(X, np.ravel(y))
mlp_random_search.fit(X, np.ravel(y))

print("KNN - Melhores parametros:", knn_random_search.best_params_)
print("KNN - Melhor R2:", knn_random_search.best_score_)
print("DT - Melhores parametros:", dt_random_search.best_params_)
print("DT - Melhor R2:", dt_random_search.best_score_)
print("SVM - Melhores parametros:", svm_random_search.best_params_)
print("SVM - Melhor R2:", svm_random_search.best_score_)
print("MLP - Melhores parametros:", mlp_random_search.best_params_)
print("MLP - Melhor R2:", mlp_random_search.best_score_)


KNN - Melhores parametros: {'n_neighbors': 26}
KNN - Melhor R2: -0.15364594311289614
DT - Melhores parametros: {'max_depth': 2}
DT - Melhor R2: -0.03758653479985612
SVM - Melhores parametros: {'gamma': 10, 'C': 10}
SVM - Melhor R2: -3.238641965951466
MLP - Melhores parametros: {'hidden_layer_sizes': (200, 200), 'alpha': 0.1}
MLP - Melhor R2: -0.023701640987247984


### Melhores Resultados rodando GridSearch

In [87]:
knn_params = {'n_neighbors': range(1, 31)}
dt_params = {'max_depth': range(1, 31)}
svm_params = {'C': [0.1, 1, 10, 100, 1000], 'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
mlp_params = {'hidden_layer_sizes': [(50,), (100,), (200,), (50, 50), (100, 100), (200, 200)], 
              'alpha': [0.0001, 0.001, 0.01, 0.1, 1, 10, 100]}

knn_grid_search = GridSearchCV(KNeighborsRegressor(), knn_params, cv=cv, scoring='r2')
dt_grid_search = GridSearchCV(DecisionTreeRegressor(), dt_params, cv=cv, scoring='r2')
svm_grid_search = GridSearchCV(SVR(), svm_params, cv=cv, scoring='r2')
mlp_grid_search = GridSearchCV(MLPRegressor(max_iter=1000), mlp_params, cv=cv, scoring='r2')

knn_grid_search.fit(X, np.ravel(y))
dt_grid_search.fit(X, np.ravel(y))
svm_grid_search.fit(X, np.ravel(y))
mlp_grid_search.fit(X, np.ravel(y))

print("KNN - Melhores parametros:", knn_grid_search.best_params_)
print("KNN - Melhor R2:", knn_grid_search.best_score_)
print("DT - Melhores parametros:", dt_grid_search.best_params_)
print("DT - Melhor R2:", dt_grid_search.best_score_)
print("SVM - Melhores parametros:", svm_grid_search.best_params_)
print("SVM - Melhor R2:", svm_grid_search.best_score_)
print("MLP - Melhores parametros:", mlp_grid_search.best_params_)
print("MLP - Melhor R2:", mlp_grid_search.best_score_)
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
scaler = MinMaxScaler()
y = y.values.reshape(-1, 1)
y = scaler.fit_transform(y).flatten()

KNN - Melhores parametros: {'n_neighbors': 26}
KNN - Melhor R2: -0.15364594311289614
DT - Melhores parametros: {'max_depth': 2}
DT - Melhor R2: -0.03758653479985612
SVM - Melhores parametros: {'C': 10, 'gamma': 10}
SVM - Melhor R2: -3.238641965951466
MLP - Melhores parametros: {'alpha': 1, 'hidden_layer_sizes': (200,)}
MLP - Melhor R2: -0.002898184885603583


Sem duvidas todos os valores obtidos foram melhores que os anteriormente obtidos, em qualquer um dos modelos. Porem ainda nao tivemos um R2 positivo.