# **Clustering**

Once the dimensionality of the extracted features has been reduced, the resulting embeddings are clustered with **KMeans** and a **Gaussian Mixture Model**.

## **Load packages**

In [1]:
import numpy as np

## **Load datasets**

In [7]:
train_tsne = np.load("reduction/train_tsne.npy")
train_umap = np.load("reduction/train_umap.npy")
train_numeric_labels = np.load("reduction/train_numeric_labels.npy")

In [3]:
from sklearn.metrics import silhouette_score, rand_score, adjusted_rand_score, mutual_info_score, normalized_mutual_info_score

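def calculate_clustering_metrics(X, cluster_labels, true_labels):
    """Print one internal metric (silhouette) and four label-agreement metrics.

    X: embedded samples; cluster_labels: predicted cluster ids;
    true_labels: ground-truth class ids.
    """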
    silhouette = silhouette_score(X, cluster_labels)  # Silhouette Score
    rand_index = rand_score(true_labels, cluster_labels)  # Rand Index
    adjusted_rand = adjusted_rand_score(true_labels, cluster_labels)  # Adjusted Rand Index
    mutual_info = mutual_info_score(true_labels, cluster_labels)  # Mutual Information
    nmi = normalized_mutual_info_score(true_labels, cluster_labels)  # Normalized Mutual Information
    
    metrics = {
        "Silhouette Score": silhouette,
        "Rand Index (RI)": rand_index,
        "Adjusted Rand Index": adjusted_rand,
        "Mutual Information Score (MI)": mutual_info,
        "Normalized Mutual Information (NMI)": nmi
    }
    
    for metric, score in metrics.items():
        print(f"{metric}: {score:.4f}")
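The label-based scores above compare two partitions of the same samples, so they do not depend on which integer id each cluster receives. As a minimal sketch of that idea, the Rand Index can be computed by hand as the fraction of sample pairs on which both labelings agree (both place the pair together, or both place it apart). The function name `rand_index` and the toy label lists are illustrative assumptions, not part of the notebook; in practice `sklearn.metrics.rand_score` is what the cell above uses.

```python
from itertools import combinations


def rand_index(true_labels, cluster_labels):
    """Fraction of sample pairs on which the two labelings agree.

    Hypothetical helper for illustration; assumes at least two samples.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = cluster_labels[i] == cluster_labels[j]
        # A pair counts as an agreement when both labelings treat it the same way.
        agree += same_true == same_pred
        total += 1
    return agree / total


true_demo = [0, 0, 1, 1, 2, 2]
permuted_demo = [1, 1, 0, 0, 2, 2]   # same partition, different cluster ids
coarse_demo = [0, 0, 0, 1, 1, 1]     # a different partition

print(rand_index(true_demo, permuted_demo))          # 1.0
print(f"{rand_index(true_demo, coarse_demo):.4f}")   # 0.6667
```

Because the raw Rand Index counts the many pairs that are trivially "apart" in both labelings, it tends to sit close to 1 when there are many classes, which is why the chance-corrected Adjusted Rand Index is reported alongside it.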

## **Models**

### **t-SNE | KMeans**

In [8]:
from kmeans_plus_plus import KMeans

num_clusters = len(np.unique(train_numeric_labels))  

kmeans = KMeans(n_clusters=num_clusters, random_state=42)
cluster_labels = kmeans.fit_predict(train_tsne)

calculate_clustering_metrics(train_tsne, cluster_labels, train_numeric_labels)

Silhouette Score: 0.4811
Rand Index (RI): 0.9519
Adjusted Rand Index: 0.7524
Mutual Information Score (MI): 1.9232
Normalized Mutual Information (NMI): 0.8529
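The `kmeans_plus_plus` module imported above is local to this project and its source is not shown here. As a rough sketch of what a k-means++ style implementation typically does, and not the author's actual code, the example below seeds the centers with D²-weighted sampling and then runs plain Lloyd iterations. The names `kmeans_pp_init` and `lloyd`, the `seed` parameter, and the synthetic blobs are all assumptions made for illustration.

```python
import numpy as np


def kmeans_pp_init(X, n_clusters, seed=0):
    """Pick initial centers with k-means++ (D^2-weighted) sampling.

    Hypothetical helper; assumes X contains at least two distinct points.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # First center: one sample chosen uniformly at random.
    centers = [X[rng.integers(n_samples)]]
    # Squared distance from every sample to its nearest chosen center.
    closest_sq = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(1, n_clusters):
        # Samples far from all current centers are more likely to be picked.
        probs = closest_sq / closest_sq.sum()
        idx = rng.choice(n_samples, p=probs)
        centers.append(X[idx])
        new_sq = np.sum((X - X[idx]) ** 2, axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)
    return np.array(centers)


def lloyd(X, centers, n_iter=100):
    """Alternate assignment and mean-update steps until the centers stop moving."""
    for _ in range(n_iter):
        # Squared distances, shape (n_samples, n_clusters).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic, well-separated blobs (20 points each) just to exercise the sketch.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
X_demo = np.vstack([c + rng.normal(scale=0.1, size=(20, 2)) for c in blob_centers])

init_centers = kmeans_pp_init(X_demo, n_clusters=3, seed=42)
labels_demo, centers_demo = lloyd(X_demo, init_centers)
print(np.unique(labels_demo).size)  # number of non-empty clusters found
```

The D²-weighted seeding spreads the initial centers across the data, which usually makes the result less sensitive to the random seed than uniform initialization.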


### **UMAP | KMeans**

In [9]:
from kmeans_plus_plus import KMeans

num_clusters = len(np.unique(train_numeric_labels)) 

kmeans = KMeans(n_clusters=num_clusters, random_state=42)
cluster_labels = kmeans.fit_predict(train_umap)

calculate_clustering_metrics(train_umap, cluster_labels, train_numeric_labels)



Silhouette Score: 0.7799
Rand Index (RI): 0.9846
Adjusted Rand Index: 0.9180
Mutual Information Score (MI): 2.1410
Normalized Mutual Information (NMI): 0.9413
