# KMeans algorithm for time series

## Table of contents

- [Return to the main notebook](#return-to-the-main-notebook---mainipynb)
- [Importing libraries and packages](#importing-libraries-and-packages)
- [Importing datasets](#importing-datasets)
- [Tests with the algorithm](#tests-with-the-algorithm)
- [Applying the KMeans algorithm](#implementação-do-algoritmo)

## Return to the main notebook - main.ipynb

[Link to the main notebook](./main.ipynb)

## Importing libraries and packages

In [1]:
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score
from sklearn.model_selection import GridSearchCV
from tslearn.clustering import TimeSeriesKMeans
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

## Importing datasets

In [2]:
df_ureia = pd.read_csv("../databases/processed/ureia_pivoted.csv", sep = ",", index_col = "subject_id")
df_creatinina = pd.read_csv("../databases/processed/creatinina_pivoted.csv", sep = ",", index_col = "subject_id")

## Tests with the algorithm

In [3]:
# model = TimeSeriesKMeans(n_clusters = 7,
#                          max_iter = 300,
#                          tol = 0.0004,
#                          n_init = 3,
#                          metric = "euclidean", # {"euclidean", "dtw", "softdtw"}
#                          max_iter_barycenter = 300,
#                          n_jobs = -1,
#                          random_state = 42,
#                          init = "k-means++") # "k-means++", "random"
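
The `metric` options listed above differ in how they compare two series. As a rough illustration, the sketch below implements a textbook dynamic time warping (DTW) distance in plain Python and compares it with the Euclidean distance on two made-up series. The helper names `dtw_distance` and `euclidean_distance` are hypothetical, and this is not tslearn's optimized implementation.

```python
import math

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with a squared-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def euclidean_distance(a, b):
    """Point-by-point distance; both series must have the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up series: the same bump, shifted by one time step.
s1 = [0, 0, 1, 2, 1, 0, 0]
s2 = [0, 1, 2, 1, 0, 0, 0]

print(euclidean_distance(s1, s2))  # 2.0
print(dtw_distance(s1, s2))        # 0.0
```

DTW can stretch the time axis, so it treats the two shifted bumps as identical, while the Euclidean distance penalizes the shift. That flexibility comes at a higher computational cost per comparison.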

In [4]:
parametros_grid_search = {
    "n_clusters": list(range(2, 14)),
    "init": ["k-means++", "random"],
    "n_init": [1, 2, 3, 4, 5],  # TimeSeriesKMeans expects an integer here
    "max_iter": list(range(100, 600, 100)),
    "metric": ["euclidean", "dtw", "softdtw"]
}
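
Before launching an exhaustive search it helps to estimate how many models will be trained. The sketch below counts the combinations for an illustrative copy of a grid with the same ranges, using integer `n_init` values only. The helper `count_grid_fits` and the name `example_grid` are hypothetical.

```python
from math import prod

def count_grid_fits(param_grid: dict, cv: int) -> tuple[int, int]:
    """Return (number of parameter combinations, number of cross-validation fits)."""
    n_candidates = prod(len(values) for values in param_grid.values())
    return n_candidates, n_candidates * cv

# Illustrative grid, assumed for this example only.
example_grid = {
    "n_clusters": list(range(2, 14)),           # 12 values
    "init": ["k-means++", "random"],            # 2 values
    "n_init": [1, 2, 3, 4, 5],                  # 5 values
    "max_iter": list(range(100, 600, 100)),     # 5 values
    "metric": ["euclidean", "dtw", "softdtw"],  # 3 values
}

n_candidates, n_fits = count_grid_fits(example_grid, cv=5)
print(n_candidates, n_fits)  # 1800 9000
```

With 5-fold cross-validation this grid already implies thousands of fits, many of them with the slower DTW-based metrics, so narrowing the ranges first is usually worthwhile.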

In [None]:
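# There are no ground-truth labels here, so "homogeneity_score" cannot be computed.
# Each candidate is scored by the silhouette coefficient of its predicted clusters instead.
def silhouette_scorer(estimator, X, y = None):
    return silhouette_score(X, estimator.predict(X))

grid_search = GridSearchCV(estimator = TimeSeriesKMeans(),
                           param_grid = parametros_grid_search,
                           cv = 5,
                           scoring = silhouette_scorer,
                           n_jobs = -1)
grid_search.fit(df_ureia)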
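
Scores such as homogeneity compare predicted clusters against known class labels, while internal indices such as the silhouette coefficient need only the data and the predicted labels. A minimal sketch of the difference, using made-up arrays that are not taken from the notebook's datasets:

```python
import numpy as np
from sklearn.metrics import homogeneity_score, silhouette_score

# Made-up data: six one-dimensional points forming two obvious groups.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
predicted = np.array([0, 0, 0, 1, 1, 1])     # labels a clustering model might return
true_classes = np.array([1, 1, 1, 0, 0, 0])  # hypothetical ground truth, often unavailable

# External score: requires the true classes (label names may be permuted).
homogeneity = homogeneity_score(true_classes, predicted)

# Internal score: computed from the data and the predicted labels only.
silhouette = silhouette_score(X, predicted)

print(f"homogeneity={homogeneity:.3f}  silhouette={silhouette:.3f}")
```

When no reference classes exist, as is typical in exploratory clustering, only the internal kind of score can be evaluated.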

In [None]:
print(grid_search.best_estimator_)
print(grid_search.best_score_)
print(grid_search.best_index_)