### Intro
In this notebook, we explore the time series clustering task and the techniques that can be applied to it. I'd like to share my experience and demonstrate one approach to the task. Let's dive in!

### Content 
- <a href='#1'>1. Data Description</a>
- <a href='#2'>2. Dealing with Missing Data</a>
- <a href='#3'>3. Time Series Feature Extraction</a>
- <a href='#4'>4. Clustering Methods</a>
    - <a href='#4.1'>4.1 Time Series Smoothing</a>
    - <a href='#4.2'>4.2 Time Series Scaling</a>
    - <a href='#4.3'>4.3 Time Series K-Means</a>
    - <a href='#4.4'>4.4 Downsizing Feature Space</a>
        - <a href='#4.4.1'>4.4.1 t-SNE</a>
        - <a href='#4.4.2'>4.4.2 MultiDimensional Scaling (MDS)</a>
    - <a href='#4.5'>4.5 Hierarchical Agglomerative Clustering (HAC)</a>
    - <a href='#4.6'>4.6 Time Series KMeans Results</a>
- <a href='#5'>5. Cluster Series Extraction</a>
    - <a href='#5.1'>5.1 Cluster Series DBA</a>
- <a href='#6'>6. Time Series Embeddings</a>
- <a href='#7'>7. References</a>

In [None]:
# Install the required libraries
! git clone https://github.com/tejaslodaya/timeseries-clustering-vae.git
! pip install tslearn
! pip uninstall scikit-learn --yes 
! pip install scikit-learn==0.24.1

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import os

from tslearn.clustering import TimeSeriesKMeans
from tslearn.barycenters import dtw_barycenter_averaging

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.manifold import TSNE, MDS
from sklearn.cluster import AgglomerativeClustering

from scipy.cluster.hierarchy import dendrogram
from tqdm.autonotebook import tqdm

warnings.filterwarnings("ignore")
sns.set_style("darkgrid")

SEED=23

### <a id='1'>1. Data Description</a>

In total, we have 54 stores that sell 33 product categories. A typical retail task is sales prediction for each product category, or even for an individual product, in a given store.

With this data, we can solve several time series clustering tasks:
- Cluster product categories
- Cluster stores 

All these can help to:
- Find similar patterns in/between categories/stores 
- Reduce the number of models to be trained (cluster models)
- ...

**We are going to find similar stores based on their sales**

In [None]:
# Data Reading 
data = pd.read_csv('../input/store-sales-time-series-forecasting/train.csv', parse_dates=['date'])
data.head()

First of all, let's look at the number of unique stores and unique product categories

In [None]:
print('Number of Unique Stores: ', data['store_nbr'].unique().shape[0])
print('Unique Product Categories: ', data['family'].unique().shape[0])

In [None]:
# Make sure that each store has all these unique product categories
assert data.groupby(['store_nbr', 'family']).size().shape[0] == 54*33

In [None]:
# Time Series data for finding similar stores
store_sales = data.groupby(['date', 'store_nbr'], as_index=False).aggregate({'sales': 'sum'})

# Time Series data for finding similar product categories
product_sales = data.groupby(['date', 'store_nbr', 'family'], as_index=False).aggregate({'sales': 'sum'})

Let's find similar stores based on their sales. First, we have to make sure that the data is correct and has no missing values

### <a id='2'>2. Dealing with Missing Data</a>

In [None]:
# Check if we have stores with zero sales 
stores_to_drop = []
for store in store_sales['store_nbr'].unique():
    if (store_sales.query(f'store_nbr == {store}')['sales'] == 0).all():
        stores_to_drop.append(store)
        
print('Stores with no sales: ', stores_to_drop)

# Exclude these stores
store_sales = store_sales[~store_sales['store_nbr'].isin(stores_to_drop)]

In [None]:
# Let's have a look at 3 random stores
rand_stores = np.random.choice(store_sales['store_nbr'].unique(), 3)
selected_stores = store_sales[store_sales['store_nbr'].isin(rand_stores)][['date', 'store_nbr', 'sales']]

plt.figure(figsize=(20,10))
plt.grid(True)
sns.lineplot(data=selected_stores, x='date', y='sales', hue='store_nbr', ci=False, legend=True, palette='Set1');

There are missing values (days with zero sales). They will affect the clustering, and the results might be biased by the noisy data

Let's apply a simple heuristic: drop every store where more than 30% of the observations are missing (threshold = 0.3)

In [None]:
# Find noisy series 
store_to_drop = []
for store in store_sales['store_nbr'].unique():
    current_store = store_sales.query(f'store_nbr == {store}')
    if current_store.query('sales == 0').shape[0]/current_store.shape[0] > 0.3:
        store_to_drop.append(store)
        
print('Noisy Stores: ', store_to_drop)

# Drop noisy stores
store_sales = store_sales[~store_sales['store_nbr'].isin(store_to_drop)]

We also have to fill the remaining missing values

In [None]:
# First convert zero values to NaN
store_sales['sales'] = store_sales['sales'].replace(0, np.nan)

# Fill missing values store by store
filled_stores = []
for store in store_sales['store_nbr'].unique():
    # Copy the slice so that the assignments below do not act on a view of store_sales
    current_store = store_sales.query(f'store_nbr == {store}').copy()
    # Use interpolation to fill values between dates
    current_store['sales'] = current_store['sales'].interpolate(method='linear', limit_direction='forward')
    # Some stores were probably closed on 01-01-YYYY; fill it with the value of the next day
    current_store['sales'] = current_store['sales'].bfill()
    filled_stores.append(current_store)

# DataFrame.append was removed in pandas 2.0; pd.concat works in every version
store_sales = pd.concat(filled_stores)

In [None]:
# Let's have a look at 3 random stores
rand_stores = np.random.choice(store_sales['store_nbr'].unique(), 3)
selected_stores = store_sales[store_sales['store_nbr'].isin(rand_stores)][['date', 'store_nbr', 'sales']]

plt.figure(figsize=(20,10))
plt.grid(True)
sns.lineplot(data=selected_stores, x='date', y='sales', hue='store_nbr', ci=False, legend=True, palette='Set1');

### <a id='3'>3. Time Series Feature Extraction</a>

In general, time series clustering approaches can be divided into 2 types:
- **Feature-Based approach**: we try to extract everything possible from the signal/time series (feature extraction) and cluster the extracted features
- **Raw-Data-Based approach**: clustering is applied directly to the time series vectors, without transforming them into another feature space

In this notebook, we are going to use the **Raw-Data-Based approach**. It means that we will have a feature matrix where:
- Rows: different time series (stores in our case)
- Columns: time observations (sales)

In this case, we will be clustering in a very high dimensional space and will most likely run into a problem known as the **Curse of Dimensionality**. As a result, obtained clusters may have sparse shapes, overlap with other clusters and so on.

To prevent this, we will need to use **dimensionality reduction methods** (t-SNE, PCA, MDS...)
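
One symptom of the Curse of Dimensionality is that distances "concentrate": the nearest and the farthest neighbour of a point end up almost equally far away, which makes distance-based clustering less informative. The sketch below illustrates this on synthetic Gaussian data, not on the store sales. The helper `relative_contrast` and the dimension sizes are assumptions chosen for illustration only.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """(max - min) / min over the distances from the first point to all others."""
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((n_points, n_dims))
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Roughly the situation of this notebook: ~50 series, each with many daily observations
low_dim_contrast = relative_contrast(n_points=50, n_dims=2)
high_dim_contrast = relative_contrast(n_points=50, n_dims=1500)

print(f"2 dims: {low_dim_contrast:.2f}, 1500 dims: {high_dim_contrast:.2f}")
```

The contrast shrinks sharply in the high-dimensional case, which is the motivation for reducing the feature space before clustering.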

### <a id='4'>4. Clustering Methods</a>

We will focus on the following clustering methods:
- `K-Means/TimeSeriesKMeans (tslearn library)`
- `Hierarchical Agglomerative Clustering`

But any known clustering algorithm can be applied

**If the time series is a signal** (data from various devices), then the best way to extract features would be methods from the `signal processing` area

For example, the Fourier transform (to find the dominant frequencies), spectrograms and wavelet transforms
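
As a minimal sketch of a signal-processing feature, the snippet below estimates the dominant period of a series with NumPy's FFT. The helper `dominant_period` and the synthetic weekly signal are illustrative assumptions and are not used elsewhere in this notebook.

```python
import numpy as np

def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant frequency."""
    series = np.asarray(series, dtype=float)
    # Remove the mean so that the zero-frequency component does not dominate
    spectrum = np.abs(np.fft.rfft(series - series.mean()))
    freqs = np.fft.rfftfreq(len(series), d=1.0)
    peak = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[peak]

# Synthetic daily signal with a weekly cycle (52 full weeks)
days = np.arange(364)
weekly_signal = 10 + 3 * np.sin(2 * np.pi * days / 7)

period = dominant_period(weekly_signal)
print(f"Dominant period: {period:.1f} days")
```

A feature like this could become one column of a feature matrix in the Feature-Based approach.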

**If the series is noisy, then it would be nice to smooth it first** (various smoothing methods) so as not to find false patterns

### <a id='4.1'>4.1 Time Series Smoothing</a>
Nice, we don't have missing values anymore, **but the series still look noisy**. Let's apply a moving average (window size = 7, i.e. one week)

In [None]:
# Time Series Smoothing 
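smoothed_stores = []
for store in store_sales['store_nbr'].unique():
    # Copy the slice so that the new column is not written to a view of store_sales
    current_store = store_sales.query(f'store_nbr == {store}').copy()
    current_store['smoth_7'] = current_store['sales'].rolling(7).mean()
    smoothed_stores.append(current_store[['date', 'store_nbr', 'smoth_7']])

# DataFrame.append was removed in pandas 2.0; pd.concat works in every version
store_sales = pd.concat(smoothed_stores)
# The first 6 observations of each store have no full 7-day window
store_sales = store_sales.dropna()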

In [None]:
# Let's have a look at 3 random stores
rand_stores = np.random.choice(store_sales['store_nbr'].unique(), 3)
selected_stores = store_sales[store_sales['store_nbr'].isin(rand_stores)][['date', 'store_nbr', 'smoth_7']]

plt.figure(figsize=(20,10))
plt.grid(True)
sns.lineplot(data=selected_stores, x='date', y='smoth_7', hue='store_nbr', ci=False, legend=True, palette='Set1');

After smoothing, we can get more insight into the series and judge the similarities between them

The initial preprocessing is done, and we can create the main feature matrix

In [None]:
# Feature matrix with shape (n_series x time_observations)
series_df = store_sales.pivot(index='store_nbr', columns='date', values='smoth_7')
series_df = series_df.dropna(axis='columns')
series_df.head()

### <a id='4.2'>4.2 Time Series Scaling</a>
Scaling must be applied to each series independently
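
To see why per-series scaling matters, consider two stores with the same sales pattern but very different volumes. The sketch below uses a hypothetical helper `zscore_rows` on toy data. It is meant to be equivalent to applying `StandardScaler` to the transposed matrix, since both use the population standard deviation, but treat it as an illustration rather than a replacement for the scaling cell.

```python
import numpy as np

def zscore_rows(matrix):
    """Standardize every row (series) to zero mean and unit variance."""
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=1, keepdims=True)
    std = matrix.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # leave constant series at zero instead of dividing by 0
    return (matrix - mean) / std

# Same shape, sales volumes that differ by a factor of 100
toy = np.array([[1.0, 2.0, 3.0, 2.0],
                [100.0, 200.0, 300.0, 200.0]])
scaled_toy = zscore_rows(toy)
print(scaled_toy)
```

After scaling, both rows are identical, so the clustering compares the shape of the series rather than the size of the store.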

In [None]:
# Scaling
# First transposition - to have series in columns (StandardScaler scales each column independently)
# Second transposition - come back to the initial feature matrix shape (n_series x time_observations)
scaler = StandardScaler()
scaled_ts = scaler.fit_transform(series_df.T).T

### <a id='4.3'>4.3 Time Series K-Means</a>

When using `K-Means` clustering, it is usually better to use the **Feature-Based Approach**: extract a set of features from each series, hope that they describe the series well, and then group the feature vectors with any clustering algorithm. In this notebook, however, I'd like to demonstrate an **out-of-the-box solution** (Raw-Data Approach). If you choose the Feature-Based Approach, these libraries can help:
- <a href='https://github.com/fraunhoferportugal/tsfel'>TSFEL</a>
- <a href='https://github.com/blue-yonder/tsfresh'>tsfresh</a>
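
For a feel of what the Feature-Based Approach looks like without any extra library, here is a minimal hand-rolled sketch. The function `extract_features`, the four chosen features and the toy series are assumptions for illustration; the libraries above compute far richer feature sets.

```python
import numpy as np

def extract_features(series):
    """Describe one series by [mean, std, linear trend slope, lag-7 autocorrelation]."""
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    slope = np.polyfit(t, series, deg=1)[0]
    centered = series - series.mean()
    denom = np.sum(centered ** 2)
    lag7 = np.sum(centered[7:] * centered[:-7]) / denom if denom > 0 else 0.0
    return np.array([series.mean(), series.std(), slope, lag7])

rng = np.random.default_rng(0)
days = np.arange(140)
toy_series = np.vstack([
    50 + 5 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 0.5, days.size),  # weekly pattern
    20 + 0.3 * days + rng.normal(0, 0.5, days.size),                        # upward trend
])

# One row per series, one column per feature
feature_matrix = np.vstack([extract_features(s) for s in toy_series])
print(feature_matrix.round(2))
```

The resulting matrix has one short row per series, so it can be scaled and passed to any clustering algorithm instead of the raw observations.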

How we define the similarity between observations in the feature space matters. With KMeans we can use:
- `Euclidean distance` 
- `Dynamic Time Warping (DTW)`


When using <a href='https://tslearn.readthedocs.io/en/stable/user_guide/dtw.html'>Dynamic Time Warping (DTW)</a>, the **Feature-Based approach is not suitable**, because DTW measures the similarity of the series themselves (how they overlap, and the size, similarity and location of their peaks), so it needs the raw observations
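
The difference between the two metrics is easiest to see on two series that share the same peak at different positions. Below is a minimal, unoptimized sketch of the classic DTW recursion in NumPy. The function `dtw_distance` is written for illustration and is not the tslearn implementation, which should be preferred in practice.

```python
import numpy as np

def dtw_distance(a, b):
    """Square root of the minimal sum of squared differences along a warping path."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cost = np.full((len(a) + 1, len(b) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[-1, -1])

# The same peak, shifted by three time steps
peak_early = np.array([0, 0, 1, 3, 1, 0, 0, 0, 0, 0], dtype=float)
peak_late = np.array([0, 0, 0, 0, 0, 1, 3, 1, 0, 0], dtype=float)

euclidean = np.linalg.norm(peak_early - peak_late)
dtw = dtw_distance(peak_early, peak_late)
print(f"Euclidean: {euclidean:.2f}, DTW: {dtw:.2f}")
```

Euclidean distance compares the series point by point and penalizes the shift, while DTW is allowed to stretch the time axis and aligns the two peaks.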

For `DTW` it is better to downsample the series first using `resampling` (i.e. change the frequency of the series). For example, instead of minute observations/ticks, take 5/10/15-minute ones, keeping in mind that the main patterns (peaks, fluctuations) must survive at the new interval. This keeps the structure of the series while making it shorter, so similar series are identified much faster with `DTW`
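
A minimal sketch of such downsampling with pandas is shown below. The minute-level series is synthetic, and the 5-minute mean is just one possible aggregation; a sum or a max may suit other data better.

```python
import numpy as np
import pandas as pd

# One hour of synthetic minute-level observations
index = pd.date_range("2021-01-01", periods=60, freq="min")
minute_ticks = pd.Series(np.arange(60, dtype=float), index=index)

# Downsample to 5-minute bins: the series becomes 5 times shorter
five_min_mean = minute_ticks.resample("5min").mean()
print(len(minute_ticks), "->", len(five_min_mean))
```

Since the naive DTW recursion grows with the product of the two series lengths, a series that is 5 times shorter needs roughly 25 times fewer cell updates per pair.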


First, apply the KMeans algorithm from the <a href='https://tslearn.readthedocs.io/en/stable/index.html'>tslearn library</a>

In [None]:
# Run KMeans and plot the results 
def get_kmeans_results(data, max_clusters=10, metric='euclidean', seed=23):
    """
    Runs KMeans for every number of clusters from 1 to max_clusters and plots the elbow and silhouette curves

    data: pd.DataFrame or np.array
        Time Series Data
    max_clusters: int
        Maximum number of clusters to try
    metric: str
        Distance metric between the observations
    seed: int
        random seed
    Returns: 
    -------
    None      
    """
    # Main metrics
    distortions = []
    silhouette = []
    clusters_range = range(1, max_clusters+1)
    
    for K in tqdm(clusters_range):
        kmeans_model = TimeSeriesKMeans(n_clusters=K, metric=metric, n_jobs=-1, max_iter=10, random_state=seed)
        kmeans_model.fit(data)
        distortions.append(kmeans_model.inertia_)
        if K > 1:
            # Note: sklearn's silhouette_score uses Euclidean distance by default, whatever `metric` is
            silhouette.append(silhouette_score(data, kmeans_model.labels_))
        
    # Visualization
    plt.figure(figsize=(10,4))
    plt.plot(clusters_range, distortions, 'bx-')
    plt.xlabel('k')
    plt.ylabel('Distortion')
    plt.title('Elbow Method')
    
    plt.figure(figsize=(10,4))
    plt.plot(clusters_range[1:], silhouette, 'bx-')
    plt.xlabel('k')
    plt.ylabel('Silhouette score')
    plt.title('Silhouette');

Let's try finding similar series using the DTW metric

In [None]:
%%time

# Run the algorithm with the DTW metric
get_kmeans_results(data=scaled_ts, max_clusters=5, metric='dtw', seed=SEED)

The Elbow Method tells us very little here, but the Silhouette score suggests that 2 clusters is a good choice

Let's have a look at obtained clusters 

In [None]:
# Visualization for obtained clusters   
def plot_clusters(data, cluster_model, dim_red_algo):
    """
    Plots clusters obtained by clustering model 

    data: pd.DataFrame or np.array
        Time Series Data
    cluster_model: Class
        Clustering algorithm 
    dim_red_algo: Class
        Dimensionality reduction algorithm (e.g. TSNE/PCA/MDS...) 
    Returns:
    -------
    None
    """
    cluster_labels = cluster_model.fit_predict(data)
    # tslearn stores centroids with shape (n_clusters, sz, 1); flatten them to (n_clusters, sz)
    centroids = cluster_model.cluster_centers_.reshape(len(cluster_model.cluster_centers_), -1)
    u_labels = np.unique(cluster_labels)

    # Downsize the data into 2D if needed. The centroids are embedded together with the data,
    # because t-SNE/MDS cannot transform new points after fitting
    data_2d = np.asarray(data)
    if data_2d.shape[1] > 2:
        embedded = dim_red_algo.fit_transform(np.vstack([data_2d, centroids]))
        data_2d, centroids = embedded[:len(data_2d)], embedded[len(data_2d):]

    # Clusters and Centroids Visualization
    plt.figure(figsize=(16, 10))
    for u_label in u_labels:
        cluster_points = data_2d[cluster_labels == u_label]
        plt.scatter(cluster_points[:, 0], cluster_points[:, 1], label=u_label)
    plt.scatter(centroids[:, 0], centroids[:, 1], s=150, color='r', marker="x")

    plt.title('Clustered Data')
    plt.xlabel("Feature space for the 1st feature")
    plt.ylabel("Feature space for the 2nd feature")
    plt.grid(True)
    plt.legend(title='Cluster Labels');

In [None]:
%%time

# let's look at the cluster shape
model = TimeSeriesKMeans(n_clusters=2, metric='dtw', n_jobs=-1, max_iter=10, random_state=SEED)

plot_clusters(data=scaled_ts,
              cluster_model=model,
              dim_red_algo=TSNE(n_components=2, init='pca', random_state=SEED))

The clusters overlap, and the second cluster looks like noise

In [None]:
# let's compare with the euclidean metric
get_kmeans_results(data=scaled_ts, max_clusters=5, metric='euclidean', seed=SEED)

The results are much worse than with the `DTW` metric. Let's try reducing the feature space

### <a id='4.4'>4.4 Downsizing Feature Space</a> 

Let's apply dimensionality reduction methods (t-SNE, MDS, VRAE...)

### <a id='4.4.1'>4.4.1 t-SNE</a> 

In [None]:
# Downsize the features into 2D
tsne = TSNE(n_components=2, init='pca', random_state=SEED)
data_tsne = tsne.fit_transform(scaled_ts)

get_kmeans_results(data=data_tsne, max_clusters=10, metric='euclidean', seed=SEED)

In [None]:
# let's look at the cluster shape
model = TimeSeriesKMeans(n_clusters=2, metric='euclidean', n_jobs=-1, max_iter=10, random_state=SEED)

plot_clusters(data=data_tsne,
              cluster_model=model,
              dim_red_algo=TSNE(n_components=2, init='pca', random_state=SEED))

The cluster shape is relatively good: the observations don't overlap, but they are a bit sparse

### <a id='4.4.2'>4.4.2 MultiDimensional Scaling (MDS)</a> 

In [None]:
mds = MDS(n_components=2, n_init=3, max_iter=100, random_state=SEED)
data_mds = mds.fit_transform(scaled_ts) 

get_kmeans_results(data=data_mds, max_clusters=10, metric='euclidean', seed=SEED)

In [None]:
# let's look at the cluster shape
model = TimeSeriesKMeans(n_clusters=2, metric='euclidean', n_jobs=-1, max_iter=10, random_state=SEED)

plot_clusters(data=data_mds,
              cluster_model=model,
              dim_red_algo=TSNE(n_components=2, init='pca', random_state=SEED))

We can choose between 2 and 5 clusters

### <a id='4.5'>4.5 Hierarchical Agglomerative Clustering (HAC)</a> 

In [None]:
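# HAC clustering (similar to get_kmeans_results function)
# `linkage` must be one of 'ward', 'complete', 'average' or 'single'; `seed` is unused because HAC is deterministic
def get_hac_results(data, max_clusters=10, linkage='ward', seed=23):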
    silhouette = []
    clusters_range = range(2, max_clusters+1)
    for K in tqdm(clusters_range):
        model = AgglomerativeClustering(n_clusters=K, linkage=linkage)
        model.fit(data)
        silhouette.append(silhouette_score(data, model.labels_))
        
    # Plot
    plt.figure(figsize=(10,4))
    plt.plot(clusters_range, silhouette, 'bx-')
    plt.xlabel('k')
    plt.ylabel('Silhouette score')
    plt.title('Silhouette')
    plt.grid(True);

In [None]:
# Look at all results at a time 
features_df = [scaled_ts, data_tsne, data_mds]
for df in features_df:
    get_hac_results(data=df, max_clusters=10, linkage='ward', seed=SEED)

Let's choose 3 clusters with MDS features

In [None]:
def plot_dendrogram(data, model, figsize=(16,10), **kwargs):
    """
    Plots a dendrogram using HAC 

    data: pd.DataFrame or np.array
        Time Series Data
    model: Class
        Clustering Model 
    figsize: tuple
        Figure size
    Returns:
    -------
    None 
    """
    model.fit(data)
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    linkage_matrix = np.column_stack([model.children_, model.distances_, counts]).astype(float)
    
    plt.figure(figsize=figsize, dpi=200)
    dendrogram(linkage_matrix, **kwargs)
    plt.title('Dendrogram')
    plt.xlabel('Objects')
    plt.ylabel('Distance')
    plt.grid(False)
    plt.tight_layout();

In [None]:
# Dendrogram
model = AgglomerativeClustering(n_clusters=3, linkage='ward', affinity='euclidean', compute_distances=True)

plot_dendrogram(data=features_df[-1],
                model=model,
                color_threshold=60)

###  <a id='4.6'>4.6 Time Series KMeans Results</a> 
Finally, we choose TimeSeriesKMeans on the MDS-reduced features with 5 clusters. I think that two clusters are simply not enough: the data is likely diverse, and with 5 clusters each cluster should contain similar series.

In [None]:
# Train TimeSeriesKMeans with MDS
kmeans_model = TimeSeriesKMeans(n_clusters=5, metric='euclidean', n_jobs=-1, max_iter=10, random_state=SEED)
cluster_labels = kmeans_model.fit_predict(data_mds)

ts_clustered = [scaled_ts[(cluster_labels == label), :] for label in np.unique(cluster_labels)]

In [None]:
# Objects distribution in the obtained clusters 
labels = [f'Cluster_{i}' for i in range(len(ts_clustered))]
samples_in_cluster = [val.shape[0] for val in ts_clustered]

plt.figure(figsize=(16,5))
plt.bar(labels, samples_in_cluster);

In [None]:
def plot_cluster_ts(current_cluster):
    """
    Plots time series in a cluster 

    current_cluster: np.array
        Cluster with time series 
    Returns:
    -------
    None 
    """
    fig, ax = plt.subplots(
        int(np.ceil(current_cluster.shape[0]/4)),4,
        figsize=(45, 3*int(np.ceil(current_cluster.shape[0]/4)))
    )
    fig.autofmt_xdate(rotation=45)
    ax = ax.reshape(-1)
    for indx, series in enumerate(current_cluster):
        ax[indx].plot(series)
        plt.xticks(rotation=45)

    plt.tight_layout()
    plt.show();

Let's have a look at the obtained clusters

In [None]:
for cluster in range(len(ts_clustered)):
    print(f"==========Cluster number: {cluster}==========")
    plot_cluster_ts(ts_clustered[cluster])

Most of the series within each cluster look alike, which is what we wanted. We have found that all the stores can be grouped into 5 clusters of stores that share the same sales patterns

### <a id='5'>5. Cluster Series Extraction</a>
Alright, we have clustered the series. What's next? Well, it depends on the task you are dealing with. After clustering, you will probably want a cluster series (a single series that represents all the series in the cluster)

There are several options:
- Use the cluster centroid
- Take the mean of all the series in a cluster
- Take the series that has the minimum distance to the cluster centroid
- <a href='https://tslearn.readthedocs.io/en/stable/variablelength.html#barycenter-computation'>DBA method</a>

We will cover:
- DBA
- Cluster Mean
- Closest Series to Cluster Centroid 

In [None]:
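# Closest Series to Cluster Centroid
# tslearn stores each centroid with shape (sz, 1); flatten it so that the subtraction
# compares two vectors instead of broadcasting to a (sz, sz) matrix
closest_clusters_indxs = [
    np.argmin([np.linalg.norm(cluster_center.ravel() - point, ord=2) for point in data_mds])
    for cluster_center in kmeans_model.cluster_centers_
]

closest_ts = scaled_ts[closest_clusters_indxs, :]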

In [None]:
# DBA
dba_ts = [dtw_barycenter_averaging(cluster_series, max_iter=10, verbose=True) for cluster_series in ts_clustered]

Let's compare how each method affects the final cluster series

**Choose a cluster with a few series. This will help to see the differences between the algorithms!**

In [None]:
CLUSTER_N = 2

plt.figure(figsize=(25, 5))
plt.plot(ts_clustered[CLUSTER_N].T,  alpha = 0.4) # all series in the selected cluster
plt.plot(closest_ts[CLUSTER_N], c = 'r', label='Cluster Time Series')
plt.title(f'Cluster Series - Closest to Cluster Centroid. Cluster {CLUSTER_N}')
plt.legend();

plt.figure(figsize=(25, 5))
plt.plot(ts_clustered[CLUSTER_N].T,  alpha = 0.4) 
plt.plot(np.mean(ts_clustered[CLUSTER_N], axis=0), c = 'r', label='Cluster Time Series')
plt.title(f'Cluster Series - Cluster Mean. Cluster {CLUSTER_N}')
plt.legend();

plt.figure(figsize=(25, 5))
plt.plot(ts_clustered[CLUSTER_N].T,  alpha = 0.4) 
plt.plot(dba_ts[CLUSTER_N], c = 'r', label='Cluster Time Series')
plt.title(f'Cluster Series - DBA. Cluster {CLUSTER_N}')
plt.legend();

Why not choose the first option? Well, the selected series has a big spike and doesn't describe all the series in the cluster. As a remedy, smoothing can be applied (I think smoothing is always a good idea in this case, because a noisy series might be chosen)

Both the DBA and the Mean methods look good. Either can be chosen!

### <a id='5.1'>5.1 Cluster Series DBA</a>
Cluster series extracted by DBA for all clusters

In [None]:
for indx, series in enumerate(dba_ts):
    plt.figure(figsize=(25, 5))
    plt.plot(ts_clustered[indx].T,  alpha = 0.15)
    plt.plot(series, c = 'r', label='Cluster Time Series')
    plt.title(f'Scaled Sales. Cluster {indx}')
    plt.legend();

### <a id='6'>6. Time Series Embeddings</a>

In this approach, we train a neural network (a recurrent auto-encoder with LSTM/GRU blocks) and obtain compressed vector representations of the series (embeddings)

The idea is to train the encoder and decoder so that, across all the variety of input data, similar series receive embeddings that are close to each other and dissimilar series are separated, according to the distance that we choose.

The model is trained in an unsupervised manner. The resulting embeddings are clustered at the end

In [None]:
os.chdir('./timeseries-clustering-vae')

from vrae.vrae import VRAE
from vrae.utils import *

import torch
import plotly
from torch.utils.data import DataLoader, TensorDataset
plotly.offline.init_notebook_mode()

In [None]:
vrae_df = scaled_ts.copy()
dload = '/content/timeseries_clustering_vae/' 

In [None]:
# Model Params
hidden_size = 50
hidden_layer_depth = 1
latent_length = 20
batch_size = 5
learning_rate = 0.005
n_epochs = 40
dropout_rate = 0.1
optimizer = 'Adam' # Adam/SGD
cuda = True # Train on GPU
print_every=30
clip = True 
max_grad_norm=5
loss = 'MSELoss' # SmoothL1Loss/MSELoss
block = 'LSTM' # LSTM/GRU

In [None]:
# We don't use test_df, create train_df using all the data we have
X_train = np.expand_dims(scaled_ts, -1)
train_dataset = TensorDataset(torch.from_numpy(X_train))

sequence_length = X_train.shape[1] 
number_of_features = X_train.shape[2] 

In [None]:
# Model Creation 
vrae = VRAE(sequence_length=sequence_length,
            number_of_features = number_of_features,
            hidden_size = hidden_size, 
            hidden_layer_depth = hidden_layer_depth,
            latent_length = latent_length,
            batch_size = batch_size,
            learning_rate = learning_rate,
            n_epochs = n_epochs,
            dropout_rate = dropout_rate,
            optimizer = optimizer, 
            cuda = cuda,
            print_every=print_every, 
            clip=clip, 
            max_grad_norm=max_grad_norm,
            loss = loss,
            block = block,
            dload = dload)

In [None]:
%%time 

vrae.fit(train_dataset)

In [None]:
# Get embeddings
embeddings = vrae.transform(train_dataset)

# Cluster the embeddings
get_kmeans_results(data=embeddings, max_clusters=10, metric='euclidean', seed=SEED)

In [None]:
model = TimeSeriesKMeans(n_clusters=3, metric='euclidean', n_jobs=-1, max_iter=10, random_state=SEED)
 
plot_clusters(data=embeddings,
              cluster_model=model,
              dim_red_algo=TSNE(n_components=2, init='pca', random_state=SEED))

We obtained a good cluster shape, so this technique can be applied as well!

### <a id='7'>7. References</a>
- https://www.kaggle.com/izzettunc/introduction-to-time-series-clustering/notebook