# `DTWSampler` for clustering

The original application of the `DTWSampler` was to allow clustering of resampled data using standard ML algorithms.
In this notebook, we explore the case of $k$-means clustering, but other algorithms could be used instead.

In [1]:
import numpy
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

from sampler import DTWSampler

# Prepare the data: load three time series, each of shape (length, d)

data = []
data.append(numpy.loadtxt("data/Xi_ref.txt"))
data.append(numpy.loadtxt("data/Xi_0.txt"))
data.append(numpy.loadtxt("data/Xi_1.txt"))

d = data[0].shape[1]

max_sz = max(ts.shape[0] for ts in data)
n_rep = 5

# NaN-padded array holding n_rep noisy copies of each series
npy_arr = numpy.full((len(data) * n_rep, max_sz, d), numpy.nan)
for idx_rep in range(n_rep):
    for idx, ts in enumerate(data):
        sz = ts.shape[0]
        # Gaussian noise scaled to 10% of each dimension's standard deviation
        noise = 0.1 * numpy.random.randn(sz, d) * ts.std(axis=0)
        npy_arr[idx + idx_rep * len(data), :sz] = ts + noise

In [2]:
print(npy_arr.shape)

(15, 13, 3)
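
Since shorter series are padded with NaN up to `max_sz`, the true length of each series can still be recovered from the padded array. The snippet below is a minimal sketch on a made-up toy array (the sizes and values are assumptions for illustration, not the notebook's data):

```python
import numpy

# Hypothetical toy data: 2 series, padded length 4, 2 features
padded = numpy.full((2, 4, 2), numpy.nan)
padded[0, :4] = 1.0  # first series uses all 4 time steps
padded[1, :2] = 2.0  # second series has only 2 valid time steps

# A time step is valid when none of its features is NaN
valid = ~numpy.isnan(padded).any(axis=2)
lengths = valid.sum(axis=1)
print(lengths)  # [4 2]
```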


As described in the README, data should be a 2-dimensional array, so we should reshape it:

In [3]:
npy_arr = npy_arr.reshape(-1, max_sz * d)
print(npy_arr.shape)

(15, 39)
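
To make the effect of this reshape concrete, here is a small sketch on a toy array (hypothetical sizes, not the notebook's data). It suggests that each flattened row stores the `d` features of time step 0, then those of time step 1, and so on, and that the original layout can be restored with the inverse reshape:

```python
import numpy

# Hypothetical toy sizes: 2 series, 4 time steps, 3 features
n_ts, sz, d = 2, 4, 3
toy = numpy.arange(n_ts * sz * d, dtype=float).reshape(n_ts, sz, d)

# Flatten each series into a single row of length sz * d
flat = toy.reshape(-1, sz * d)
print(flat.shape)  # (2, 12)

# Columns 3..5 of a row hold the features of time step 1
print(numpy.array_equal(flat[0, d:2 * d], toy[0, 1]))  # True

# The inverse reshape recovers the 3-dimensional layout
restored = flat.reshape(-1, sz, d)
print(numpy.array_equal(toy, restored))  # True
```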


Now, we just have to prepare our `Pipeline` and go:

In [4]:
s = DTWSampler(scaling_col_idx=0, reference_idx=0, d=3, interp_kind="linear")
km = KMeans(n_clusters=3)

dtw_kmeans = Pipeline([('dtw_sampler', s), ('l2-kmeans', km)])
labels = dtw_kmeans.fit_predict(npy_arr)
print(labels)

[2 0 1 2 0 1 2 0 1 2 0 1 2 0 1]
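
Because row `i` of `npy_arr` was built from source series `i % 3`, the printed labels can be checked against that construction. The sketch below copies the labels from the output above and tests whether each source series maps to exactly one cluster; the variable names are illustrative only. Note that $k$-means label numbering is arbitrary, so another run may give a different permutation of the same grouping.

```python
import numpy

# Labels copied from the output above
labels = numpy.array([2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1])

# Row i was generated from source series i % n_sources
n_sources = 3
source = numpy.arange(len(labels)) % n_sources

# Set of cluster labels assigned to the noisy copies of each source
labels_per_source = [set(labels[source == s].tolist()) for s in range(n_sources)]

# Each source falls in a single cluster, and the clusters are all different
pure = all(len(ls) == 1 for ls in labels_per_source)
distinct = len(set.union(*labels_per_source)) == n_sources
print(pure and distinct)  # True
```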
