# Lab 3: Alignment-based metrics in Machine Learning

* Author: Romain Tavenard (@rtavenar)
* License: CC-BY-NC-SA

A lab session from a course on Machine Learning for Time Series at ENSAI.
Lecture notes for this course are available [here](https://rtavenar.github.io/ml4ts_ensai/).

```python
import numpy as np
import matplotlib.pyplot as plt

from tslearn.metrics import dtw, soft_dtw
from tslearn.barycenters import dtw_barycenter_averaging, softdtw_barycenter
```

# Data loading

**Question #1.** Using the [`CachedDatasets`](https://tslearn.readthedocs.io/en/stable/gen_modules/datasets/tslearn.datasets.CachedDatasets.html#tslearn.datasets.CachedDatasets)
utility from ``tslearn``, load the "Trace" time series dataset.
What are the dimensions of an array storing a time series dataset?
Create a new dataset `X_subset` made of 50 time series drawn at random from the training set, among classes indexed 1 to 3 (`y_train < 4`).
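A minimal sketch of the subset selection, assuming the data follows the `(n_timeseries, n_timestamps, n_features)` layout that the skeletons below unpack from `X.shape`. The helper `select_subset` is hypothetical, and the arrays here are synthetic stand-ins with Trace-like shapes. With the real data, `X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")` (from `tslearn.datasets`) provides the arrays to pass in.

```python
import numpy as np


def select_subset(X, y, n_samples=50, max_label=4, seed=0):
    """Hypothetical helper: draw n_samples distinct series whose label is < max_label."""
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(y < max_label)  # indices of series in classes 1..3
    chosen = rng.choice(candidates, size=n_samples, replace=False)
    return X[chosen], y[chosen]


# Synthetic stand-in data: 100 univariate series of length 275, labels 1..4
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 275, 1))
y_demo = np.repeat([1, 2, 3, 4], 25)

X_sub, y_sub = select_subset(X_demo, y_demo)
print(X_sub.shape)  # (50, 275, 1)
```

Sampling without replacement ensures that no series appears twice in the subset.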

# $k$-means clustering

**Question #2.** Implement Lloyd's algorithm for a $k$-means that uses soft-DTW as its base metric.
You can rely on ``tslearn`` functions (see imports above) for "distance" computations and barycenter
estimation.
Your function should return both the current assignments and the barycenters.
Check that it runs smoothly for a few iterations on `X_subset` (leave quantitative evaluation aside for now).

```python
def kmeans_soft_dtw(X, gamma, k, max_iter=10):
    n_timeseries, n_timestamps, n_features = X.shape

    # Init barycenters with k distinct time series drawn at random
    barycenters = X[np.random.choice(n_timeseries, size=k, replace=False)]

    for e in range(max_iter):
        # Assignment step: assign a cluster to each time series
        assign = np.zeros((n_timeseries, ), dtype=int)
        # for i in range(n_timeseries):
        #     TODO
        #     assign[i] = ...
        # Update step: recompute the centroid (barycenter) of each cluster
        # for j in range(k):
        #     TODO
        #     barycenters[j] = ...

    return assign, barycenters
```
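For reference, the two alternating steps of Lloyd's algorithm are sketched below with plain squared Euclidean distance and arithmetic-mean centroids. This is a deliberate swap: it is not the soft-DTW variant the question asks for. The function name `lloyd_euclidean` is hypothetical. Replacing the distance and the mean with the ``tslearn`` calls imported above is the exercise.

```python
import numpy as np


def lloyd_euclidean(X, k, max_iter=10, seed=0):
    """Hypothetical Euclidean k-means on arrays of shape (n, n_timestamps, n_features)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance to every center, shape (n, k)
        diffs = X[:, None] - centers[None]
        dists = (diffs ** 2).reshape(n, k, -1).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Update step: mean of each cluster (an empty cluster keeps its previous center)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign, centers


# Toy usage: two constant groups of series that should be separated
X_toy = np.concatenate([np.zeros((5, 4, 1)), np.full((5, 4, 1), 10.0)])
assign_toy, centers_toy = lloyd_euclidean(X_toy, k=2)
print(assign_toy)
```

Keeping the previous center for an empty cluster is one simple convention among several.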



**Question #3.** Implement Lloyd's algorithm for a $k$-means that uses **DTW** as its base metric.
You can rely on ``tslearn`` functions (see imports above) for "distance" computations and barycenter
estimation.
Your function should return both the current assignments and the barycenters.
Check that it runs smoothly for a few iterations on `X_subset` (leave quantitative evaluation aside for now).

```python
def kmeans_dtw(X, k, max_iter=10):
    n_timeseries, n_timestamps, n_features = X.shape

    # Init barycenters with k distinct time series drawn at random
    barycenters = X[np.random.choice(n_timeseries, size=k, replace=False)]

    for e in range(max_iter):
        # Assignment step: assign a cluster to each time series
        assign = np.zeros((n_timeseries, ), dtype=int)
        # for i in range(n_timeseries):
        #     TODO
        #     assign[i] = ...
        # Update step: recompute the centroid (barycenter) of each cluster
        # for j in range(k):
        #     TODO
        #     barycenters[j] = ...

    return assign, barycenters
```
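To make concrete what the assignment step computes, here is a naive $O(nm)$ dynamic program for DTW. It assumes the convention we believe ``tslearn``'s `dtw` follows: squared Euclidean local costs, with the square root of the accumulated cost returned at the end. If that assumption holds, it should agree with `dtw` on short inputs, which makes it a handy sanity check. The name `dtw_numpy` is hypothetical and this is not meant to replace the optimized library call.

```python
import numpy as np


def dtw_numpy(x, y):
    """Naive DTW: sqrt of the minimal accumulated squared Euclidean cost."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # acc[i, j]: best cost aligning x[:i] with y[:j]
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((x[i - 1] - y[j - 1]) ** 2)
            # Allowed moves: vertical, horizontal, diagonal
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return np.sqrt(acc[n, m])


print(dtw_numpy([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]))  # 0.0 (repeating 1.0 costs nothing)
```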

**Question #4.** Implement a function that assesses the quality of a clustering in terms of
intra-cluster inertia, computed with **DTW** as the base metric.
Your function should take a time series dataset, the corresponding assignments and the barycenters as inputs.

```python
def dtw_cost(X, assign, barycenters):
    n_timeseries, n_timestamps, n_features = X.shape

    total_cost = 0.
    # TODO

    return total_cost
```
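One common convention, mirroring Euclidean $k$-means where inertia is a sum of squared distances, is to sum the squared DTW between each series $\mathbf{x}_i$ and the barycenter $\mathbf{b}_{c(i)}$ of its assigned cluster $c(i)$. This is an assumption about the intended definition; summing unsquared DTW values is another possible choice.

$$\mathcal{I} = \sum_{i=1}^{n} \text{DTW}\left(\mathbf{x}_i, \mathbf{b}_{c(i)}\right)^2$$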


**Question #5.** Compare your $k$-means implementations in terms of DTW inertia.
For a fair comparison, make sure that they are initialized identically by setting your
random number generator seeds appropriately.
What do you observe? Is this expected, and can you explain it?
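Both skeletons draw their initial barycenters from NumPy's global random generator (`np.random`), so resetting the seed to the same value right before each call makes them start from the same series, provided nothing else consumes random numbers in between. The helper `draw_initial_indices` is hypothetical and only mimics the initialization line of the skeletons.

```python
import numpy as np


def draw_initial_indices(n_timeseries, k):
    """Hypothetical helper mimicking the skeletons' initialization step."""
    return np.random.choice(n_timeseries, size=k, replace=False)


np.random.seed(0)
first = draw_initial_indices(50, 3)   # e.g. right before calling kmeans_soft_dtw
np.random.seed(0)
second = draw_initial_indices(50, 3)  # e.g. right before calling kmeans_dtw
print(np.array_equal(first, second))  # True
```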

# Multi-step ahead forecasting

In this section, your goal will be to implement a single-hidden-layer perceptron for time series forecasting.
Your network will be trained to minimize normalized soft-DTW[^1].

To do so, we will rely on a `torch`-compatible implementation of soft-DTW [available in `tslearn`](https://tslearn.readthedocs.io/en/stable/gen_modules/metrics/tslearn.metrics.SoftDTWLossPyTorch.html).

[^1]: Normalized soft-DTW (also called soft-DTW divergence) between time series $\mathbf{x}$ and
$\mathbf{x}^\prime$ is defined as:
$$\text{soft-DTW}(\mathbf{x}, \mathbf{x}^\prime) - \frac{1}{2} \left( \text{soft-DTW}(\mathbf{x}, \mathbf{x}) + \text{soft-DTW}(\mathbf{x}^\prime, \mathbf{x}^\prime) \right)$$
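The footnote's definition can be made concrete with a naive NumPy soft-DTW. This sketch assumes squared Euclidean local costs (the convention we believe ``tslearn``'s `soft_dtw` follows) and the usual soft-min $-\gamma \log \sum_k e^{-a_k/\gamma}$ in place of the hard minimum of DTW. The function names are hypothetical. This version is not differentiable through `torch`, so it only illustrates the quantity that the library's loss optimizes.

```python
import numpy as np


def softmin(values, gamma):
    """Smoothed minimum -gamma * log(sum(exp(-v / gamma))), computed stably."""
    values = np.asarray(values, dtype=float)
    lowest = values.min()
    # Shifting by the minimum avoids overflow; infinite entries contribute exp(-inf) = 0
    return lowest - gamma * np.log(np.sum(np.exp(-(values - lowest) / gamma)))


def soft_dtw_numpy(x, y, gamma=1.0):
    """Naive soft-DTW with squared Euclidean local costs."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((x[i - 1] - y[j - 1]) ** 2)
            R[i, j] = cost + softmin([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]


def soft_dtw_divergence(x, y, gamma=1.0):
    """Normalized soft-DTW, following the footnote's formula."""
    return soft_dtw_numpy(x, y, gamma) - 0.5 * (
        soft_dtw_numpy(x, x, gamma) + soft_dtw_numpy(y, y, gamma)
    )


x = [0.0, 1.0, 2.0]
print(soft_dtw_numpy(x, x))       # can be negative: soft-min lies below the hard min
print(soft_dtw_divergence(x, x))  # 0.0
```

The raw soft-DTW of a series with itself is generally nonzero (and often negative), which is precisely what the normalization subtracts away so that identical series get a divergence of zero.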


**Question #6.** Define a single-hidden-layer MLP that can be trained with normalized soft-DTW
as the criterion to be optimized.
Train your network for 200 epochs on the following forecasting task: given the first 150 elements
of a time series, predict the next 125. You can use the training loop provided below:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_model(model, X, y, epochs, criterion, optimizer):
    # Wrap NumPy inputs/targets as float32 tensors, served in shuffled batches of 10
    dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())
    dataloader = DataLoader(dataset, batch_size=10, shuffle=True)
    for epoch in range(epochs):
        running_loss = 0.0
        for inputs, targets in dataloader:
            # Forward pass; reduce to a scalar (the criterion may return one value per series)
            outputs = model(inputs)
            loss = criterion(outputs, targets).mean()

            # Backward pass and optimization
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        epoch_loss = running_loss / len(dataloader)

        print(f'Epoch [{epoch+1}/{epochs}], Loss: {epoch_loss:.4f}')
```
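A minimal sketch of a compatible model, assuming inputs of shape `(batch, 150, 1)` and targets of shape `(batch, 125, 1)` (the 3D layout used throughout this lab). The class name `ForecastMLP`, the hidden size of 256 and the ReLU activation are arbitrary, untuned choices.

```python
import torch
from torch import nn


class ForecastMLP(nn.Module):
    """Hypothetical single-hidden-layer MLP: 150 input timestamps -> 125 forecast ones."""

    def __init__(self, input_len=150, hidden_dim=256, output_len=125):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),                       # (batch, 150, 1) -> (batch, 150)
            nn.Linear(input_len, hidden_dim),   # the single hidden layer
            nn.ReLU(),
            nn.Linear(hidden_dim, output_len),  # (batch, 125)
        )

    def forward(self, x):
        # Add back a feature axis so outputs match targets of shape (batch, 125, 1)
        return self.layers(x).unsqueeze(-1)


model = ForecastMLP()
print(model(torch.zeros(4, 150, 1)).shape)  # torch.Size([4, 125, 1])
```

With Trace loaded as in Question #1, training could then look like `criterion = SoftDTWLossPyTorch(gamma=0.1, normalize=True)` (imported from `tslearn.metrics`), `optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)` and `train_model(model, X_train[:, :150], X_train[:, 150:], 200, criterion, optimizer)`. The values of `gamma` and `lr` are guesses to be tuned.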

**Question #7.** Use the following code block to qualitatively assess the quality of your
network's predictions.
Do not hesitate to change the time series index to visualize the result for different time series.

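```python
import matplotlib.pyplot as plt

ts_index = 50

# Forecast the last 125 timestamps of every test series from its first 150 ones
y_pred = model(torch.tensor(X_test[:, :150]).float()).detach().numpy()

plt.figure()
plt.plot(X_test[ts_index].ravel(), label="ground truth")
plt.plot(np.arange(150, 275), y_pred[ts_index].ravel(), 'r-', label="forecast")
plt.legend()
plt.show()
```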