
# TSGA: Time Series Grouping Algorithm

This notebook shows how to apply the clustering method proposed in:

Igor Manojlović, Goran Švenda, Aleksandar Erdeljan, Milan Gavrić, *Time Series Grouping Algorithm for Load Pattern Recognition*, Computers in Industry 111: 140-147, 2019, DOI: [10.1016/j.compind.2019.07.009](https://doi.org/10.1016/j.compind.2019.07.009)


In [None]:
import numpy as np
import pandas as pd

from culearn.clustering import *

# First, we need an instance of the TSGA class that implements the algorithm.
clustering = TSGA()
# Default parameter values are set according to the original paper.
# For more options, please look at the class comments.

# Then, we need an input dataset - we can just generate one.
x = pd.DataFrame(
    np.random.rand(1000, 100),
    index=[f'Object_{i}' for i in range(1000)],
    columns=[f'Feature_{i}' for i in range(100)],
)

# Finally, we can perform the clustering.
y = clustering.fit_predict(x)

# The result is a DataFrame with one index and one column,
# where the index contains object identifiers,
# while the column contains cluster identifiers.
y
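
The label frame returned by `fit_predict` can be summarized with plain pandas. The following is a minimal sketch, not part of culearn: `cluster_sizes` is a hypothetical helper, and the toy `labels` frame stands in for `y`, assuming a single column of cluster identifiers indexed by object identifiers as described above. The column name `Cluster` is made up for this illustration.

```python
import pandas as pd


def cluster_sizes(labels: pd.DataFrame) -> pd.Series:
    """Count objects per cluster in a single-column label frame (hypothetical helper)."""
    if labels.shape[1] != 1:
        raise ValueError("expected exactly one column of cluster identifiers")
    # Take the only column, count how often each cluster identifier occurs,
    # and sort by identifier so the output order is stable.
    return labels.iloc[:, 0].value_counts().sort_index()


# Toy stand-in for `y`; the column name 'Cluster' is an assumption.
labels = pd.DataFrame(
    {"Cluster": [0, 1, 0, 2, 1, 0]},
    index=[f"Object_{i}" for i in range(6)],
)
sizes = cluster_sizes(labels)
print(sizes)
```

With a fitted model, the same helper could be applied as `cluster_sizes(y)`, provided `y` has the single-column layout described in the comments above.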

In [None]:
# The properties of the TSGA instance keep information about the clustering process,
# such as the clustering scores obtained during the search for an optimal number of clusters.
print('Cluster Validity Index:', clustering.score)
score_values = pd.DataFrame(clustering.k2score.values())
score_values.plot()
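
The plot above uses the position of each score as the x-axis. A minimal sketch of labelling that axis with the number of clusters instead, assuming `k2score` is a dict-like mapping from a candidate number of clusters to its score (suggested by the attribute name and the `.values()` call, but not confirmed here). `scores_by_k` is a hypothetical helper, and the toy numbers are invented purely to show the shape of the result.

```python
import pandas as pd


def scores_by_k(k2score) -> pd.Series:
    """Turn a {k: score} mapping into a Series indexed by k (hypothetical helper)."""
    series = pd.Series(dict(k2score), name="score").sort_index()
    series.index.name = "k"
    return series


# Invented values, only to illustrate the structure; not real TSGA output.
toy_k2score = {4: 0.52, 2: 0.31, 3: 0.47}
scores = scores_by_k(toy_k2score)
print(scores)

# With a fitted model, under the same assumption about `k2score`:
# scores_by_k(clustering.k2score).plot(marker='o')
```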