<h1> Time series distances </h1>
<h2> Distances </h2>
There are two main computations we may want to perform when calculating distances
between time series.
<br><br>
The first is computing the distance between the time series x and the
time series y. Our desired return is a single float that represents the distance
between the two specified time series.
<br><br>
The second task is to find the distances between multiple time series, either within a
single matrix or between two matrices (i.e. datasets of time series). This computation
is known as a pairwise distance: for every time series in matrix x, we want
the distance between it and every time series in matrix y (i.e. for each pair of time
series).
The desired return is therefore a 2d array of shape (number of series in x, number of
series in y), where the entry at index [i, j] is the distance between the i-th time
series in x and the j-th time series in y.
<br><br>
Sktime offers solutions to both of these tasks. To find the distance between two time
series, as described first, the 'distance' function can be used. To find the
pairwise distances within a single matrix or between two matrices of time series, the
'pairwise' function can be used. Both are described below.
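
As a minimal, library-independent sketch of the two return shapes, the NumPy snippet below uses a plain Euclidean distance on flattened arrays. The helper names `toy_distance` and `toy_pairwise` are hypothetical and are not part of sktime; sktime's own functions may treat the time and dimension axes differently.

```python
import numpy as np


def toy_distance(x, y):
    """Hypothetical helper: Euclidean distance between two equal-shape series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))


def toy_pairwise(X, Y):
    """Hypothetical helper: distance from every series in X to every series in Y."""
    result = np.zeros((len(X), len(Y)))
    for i, x_series in enumerate(X):
        for j, y_series in enumerate(Y):
            result[i, j] = toy_distance(x_series, y_series)
    return result


X = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]  # three series of length 2
Y = [[0.0, 0.0], [6.0, 8.0]]              # two series of length 2

single = toy_distance(X[1], Y[0])  # first task: one float
matrix = toy_pairwise(X, Y)        # second task: one entry per (x, y) pair

print(single)        # 5.0
print(matrix.shape)  # (3, 2)
```

The single distance is a float, while the pairwise result has one row per series in X and one column per series in Y.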

<h3> distance(x, y, metric_str, **kwargs) </h3>

In [1]:
import numpy as np
from sktime.metrics.distances.distance import distance

In [20]:
# 1d univariate
x_univariate_1d = [0.81217268, -0.30587821, -0.26408588, -0.53648431, 0.43270381]
y_univariate_1d = [-1.15076935, 0.87240588, -0.38060345, 0.15951955, -0.12468519]

# Call distance and use euclidean distance
distance(x_univariate_1d, y_univariate_1d, 'euclidean')

TypeError: No valid mtype could be identified

In [26]:
# 2d univariate
x_univariate_2d = [[0.81217268], [-0.30587821], [-0.26408588], [-0.53648431],
                   [0.43270381]]
y_univariate_2d = [[-1.15076935], [0.87240588], [-0.38060345], [0.15951955],
                   [-0.12468519]]

# Call distance and use euclidean distance
distance(x_univariate_2d, y_univariate_2d, 'euclidean')

NotImplementedError: no conversion defined from type numpyflat to nested_univ

In [4]:
# Generate multivariate example
x_multivariate = [
    [ 0.81217268, -0.30587821, -0.26408588, -0.53648431,  0.43270381],
    [0.73105397, -1.03007035, -0.1612086 , -0.19202718,  0.56688472],
    [-0.55030959,  0.57236185,  0.45079536,  0.25124717,  0.45042797],
    [-0.34583038, -0.19837676, -0.34358635, -0.42260282, -0.33562307],
    [-0.09591778, -0.44381448, -0.37357915,  0.8462273 ,  0.02540388]
]

y_multivariate = [
    [ 0.15008516, -0.17612492, -0.5712591 , -0.17467136, -0.10444712],
    [-0.37719897,  0.62643408,  0.25646491, -0.14904642,  0.24425907],
    [-0.7220569 , -0.25223293,  0.08001853,  0.43808446,  0.15781747],
    [-0.11116407, -0.10037903,  0.0932807 ,  0.20502582,  0.09914986],
    [ 0.59945894,  0.09257821, -0.18764248, -0.3193652 ,  0.21174718]
]

# Call distance and use euclidean distance
distance(x_multivariate, y_multivariate, 'euclidean')

6.4287946355366845
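
The printed value can be reproduced with plain NumPy by summing the Euclidean norms of the row-wise differences. This is an observation drawn from the numbers above, not a statement about how sktime implements the metric internally; which axis represents time and which represents dimensions is an assumption here.

```python
import numpy as np

x = np.array([
    [0.81217268, -0.30587821, -0.26408588, -0.53648431, 0.43270381],
    [0.73105397, -1.03007035, -0.1612086, -0.19202718, 0.56688472],
    [-0.55030959, 0.57236185, 0.45079536, 0.25124717, 0.45042797],
    [-0.34583038, -0.19837676, -0.34358635, -0.42260282, -0.33562307],
    [-0.09591778, -0.44381448, -0.37357915, 0.8462273, 0.02540388],
])

y = np.array([
    [0.15008516, -0.17612492, -0.5712591, -0.17467136, -0.10444712],
    [-0.37719897, 0.62643408, 0.25646491, -0.14904642, 0.24425907],
    [-0.7220569, -0.25223293, 0.08001853, 0.43808446, 0.15781747],
    [-0.11116407, -0.10037903, 0.0932807, 0.20502582, 0.09914986],
    [0.59945894, 0.09257821, -0.18764248, -0.3193652, 0.21174718],
])

# Euclidean norm of each row of the difference, then summed over the rows
row_norms = np.linalg.norm(x - y, axis=1)
total = float(row_norms.sum())

# Euclidean norm over all 25 values at once, for comparison
flat = float(np.linalg.norm(x - y))

print(total)  # approximately 6.42879
print(flat)   # a smaller value than the row-wise sum
```

The row-wise sum matches the value printed above, while the norm over the flattened arrays does not, which suggests the series are compared row by row.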

The above uses the Euclidean metric to compute the distance between the two series.
Many other distances are available. A full list of available distances can be
retrieved by calling the following:

In [5]:
from sktime.metrics.distances.distance import get_available_distances
get_available_distances()

['squared',
 'braycurtis',
 'canberra',
 'chebyshev',
 'cityblock',
 'correlation',
 'cosine',
 'dice',
 'euclidean',
 'hamming',
 'jaccard',
 'jensenshannon',
 'kulsinski',
 'matching',
 'minkowski',
 'rogerstanimoto',
 'russellrao',
 'seuclidean',
 'sokalmichener',
 'sokalsneath',
 'sqeuclidean',
 'yule']

Any of the strings above is a valid metric name to pass to distance (or to
pairwise, see below).
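
Several of these names denote standard vector metrics. The sketch below gives NumPy definitions for four of them on flat vectors, collected in a hypothetical lookup table (`TOY_METRICS` is an illustrative name, not an sktime object). How sktime applies each metric across time points and dimensions is not shown here.

```python
import numpy as np

# Hypothetical lookup table from metric name to a function on flat vectors
TOY_METRICS = {
    "euclidean": lambda a, b: float(np.sqrt(np.sum((a - b) ** 2))),
    "sqeuclidean": lambda a, b: float(np.sum((a - b) ** 2)),
    "cityblock": lambda a, b: float(np.sum(np.abs(a - b))),
    "chebyshev": lambda a, b: float(np.max(np.abs(a - b))),
}

a = np.array([0.0, 3.0])
b = np.array([4.0, 0.0])

# Evaluate every metric in the table on the same pair of vectors
results = {name: metric(a, b) for name, metric in TOY_METRICS.items()}
print(results)
# {'euclidean': 5.0, 'sqeuclidean': 25.0, 'cityblock': 7.0, 'chebyshev': 4.0}
```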

A distance can also be computed by passing a callable or a BaseDistance instance to
distance, as shown below:

In [6]:
# BaseDistance example
from sktime.metrics.distances.base.base import BaseDistance

class AbsoluteDistance(BaseDistance):

    def _distance(self, x: np.ndarray, y: np.ndarray) -> float:
        distance = 0.0

        for i in range(x.shape[0]):
            distance += np.sum(np.abs(x[i] - y[i]))

        return distance

distance(x_multivariate, y_multivariate, AbsoluteDistance())

11.99415645

In [7]:
# Callable example
def absolute_distance_callable(x: np.ndarray, y: np.ndarray) -> float:
    distance = 0.0

    for i in range(x.shape[0]):
        distance += np.sum(np.abs(x[i] - y[i]))

    # Must return a float
    return distance

distance(x_multivariate, y_multivariate, absolute_distance_callable)

11.99415645
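
The explicit loop in both versions is equivalent to a single vectorised NumPy expression. The sketch below checks this on a small made-up input; `absolute_distance_loop` and `absolute_distance_vectorised` are illustrative names.

```python
import numpy as np


def absolute_distance_loop(x: np.ndarray, y: np.ndarray) -> float:
    # Same logic as the callable above: accumulate row by row
    total = 0.0
    for i in range(x.shape[0]):
        total += np.sum(np.abs(x[i] - y[i]))
    return float(total)


def absolute_distance_vectorised(x: np.ndarray, y: np.ndarray) -> float:
    # Sum of absolute differences over every element at once
    return float(np.sum(np.abs(x - y)))


x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[0.0, 0.0], [1.0, 1.0]])

loop_result = absolute_distance_loop(x, y)
vectorised_result = absolute_distance_vectorised(x, y)
print(loop_result, vectorised_result)  # 8.0 8.0
```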

<h4>Passing additional arguments to the metric</h4>

In [8]:
# Note: this works regardless of how the metric is passed.
# The example below passes a function for clarity, but the same
# approach works when passing a str distance or a BaseDistance.
# See the BaseDistance example with parameters at the bottom of the pairwise section.

# NOTE: This is a made-up distance for illustration only; it is not a meaningful metric.
def distance_callable_with_arguments(
        x: np.ndarray,
        y: np.ndarray,
        parama: int,
        paramb: int
):
    distance = 0.0
    print("parama", parama)
    print("paramb", paramb)

    for i in range(x.shape[0]):
        distance += np.sum(x[i] + y[i] - parama + paramb)

    return distance

# Call distance with keyword arguments that are passed on to the metric
distance(x_multivariate,
         y_multivariate,
         distance_callable_with_arguments,
         parama=1,
         paramb=5)

parama 1
paramb 5


99.54869969
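
One way such keyword forwarding can work is sketched below. `toy_distance` is a hypothetical stand-in for the dispatching function, not sktime's implementation; it simply passes any extra keyword arguments through to the callable. `functools.partial` is shown as an alternative that binds the parameters up front.

```python
from functools import partial

import numpy as np


def shifted_sum(x, y, parama, paramb):
    # Made-up metric for illustration only
    return float(np.sum(x + y - parama + paramb))


def toy_distance(x, y, metric, **kwargs):
    """Hypothetical dispatcher: forwards extra keyword arguments to the metric."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return metric(x, y, **kwargs)


x = [[1.0, 2.0], [3.0, 4.0]]
y = [[0.0, 0.0], [1.0, 1.0]]

# Extra keyword arguments travel through the dispatcher to the metric
forwarded = toy_distance(x, y, shifted_sum, parama=1, paramb=5)

# Alternative: bind the parameters first, then pass a two-argument callable
bound_metric = partial(shifted_sum, parama=1, paramb=5)
bound = toy_distance(x, y, bound_metric)

print(forwarded, bound)  # 28.0 28.0
```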

<h3> pairwise(x, y, metric_str, **kwargs) </h3>

In [9]:
from sktime.metrics.distances.distance import pairwise

# Generate two univariate matrices
x_univariate_matrix = [
    [
        [0.81217268],
        [-0.30587821]
    ],
    [
        [-0.26408588],
        [-0.53648431]
    ],
    [
        [0.43270381],
        [0.23648431]
    ],
    [
        [-0.22270381],
        [0.12648431]
    ],
    [
        [0.12370381],
        [0.91348431]
    ]
]

y_univariate_matrix = [
    [
        [0.12217268],
        [0.32187821]
    ],
    [
        [0.22208588],
        [0.53648431]
    ],
    [
        [-0.43270381],
        [0.33648431]
    ],
    [
        [-0.64320381],
        [0.44648431]
    ],
    [
        [0.92130381],
        [-0.91348431]
    ]
]

# Call pairwise and use euclidean distance
pairwise(x_univariate_matrix, y_univariate_matrix, 'euclidean')

array([[1.31775642, 1.43244932, 1.88723901, 2.20773901, 0.71673723],
       [1.24462108, 1.55914038, 1.04158655, 1.36208655, 1.56238969],
       [0.39592503, 0.51061793, 0.96540762, 1.28590762, 1.63856862],
       [0.54027039, 0.85478969, 0.42      , 0.7405    , 2.18397624],
       [0.59313723, 0.47538207, 1.13340762, 1.23390762, 2.62456862]])

In [10]:
# This is also a valid univariate matrix
x_univariate_matrix = [
    [0.81217268, -0.30587821],
    [-0.26408588, -0.53648431],
    [0.43270381, 0.23648431],
    [-0.22270381, 0.12648431],
    [0.12370381, 0.91348431]
]

y_univariate_matrix = [
    [0.12217268, 0.32187821],
    [0.22208588, 0.53648431],
    [-0.43270381, 0.33648431],
    [-0.64320381, 0.44648431],
    [0.92130381, -0.91348431]
]

# Call pairwise and use euclidean distance
pairwise(x_univariate_matrix, y_univariate_matrix, 'euclidean')

array([[1.31775642, 1.43244932, 1.88723901, 2.20773901, 0.71673723],
       [1.24462108, 1.55914038, 1.04158655, 1.36208655, 1.56238969],
       [0.39592503, 0.51061793, 0.96540762, 1.28590762, 1.63856862],
       [0.54027039, 0.85478969, 0.42      , 0.7405    , 2.18397624],
       [0.59313723, 0.47538207, 1.13340762, 1.23390762, 2.62456862]])

In [11]:
from sktime.metrics.distances.distance import pairwise

# Generate two multivariate matrices
x_multivariate_matrix = [
    [
        [0.81217268, 0.21334264, -0.21435289],
        [-0.30587821, -0.32135832, 0.63219321]
    ],
    [
        [-0.26408588, 0.25368532, -0.12345678],
        [-0.53648431, 0.82124242, 0.15327643]
    ],
    [
        [0.43270381, 0.32164321, -0.32163321],
        [0.23648431, 0.12323462, 0.63421133]
    ],
]

y_multivariate_matrix = [
    [
        [0.12217268, 0.21236321, 0.12453321],
        [0.32187821, -0.34543123, 0.53213532]
    ],
    [
        [0.22208588, -0.64312342, -0.53231232],
        [0.53648431, 0.85321234, 0.21843212]
    ],
    [
        [-0.43270381, 0.92421532, 0.82412434],
        [0.33648431, 0.22144332, -0.25312342]
    ],
]

# Call pairwise and use euclidean distance
pairwise(x_multivariate_matrix, y_multivariate_matrix, 'euclidean')

array([[1.40486546, 2.5910411 , 2.99125241],
       [1.95801767, 2.17441617, 2.30746759],
       [1.04166462, 1.90175546, 2.45550794]])
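
The first row of this matrix can be reproduced with plain NumPy under the assumption that each pair of series is compared by summing the Euclidean norms of its row-wise differences. This is inferred from the printed numbers and is not a description of sktime's internals; `toy_pairwise` is a hypothetical helper.

```python
import numpy as np

x = np.array([
    [[0.81217268, 0.21334264, -0.21435289],
     [-0.30587821, -0.32135832, 0.63219321]],
    [[-0.26408588, 0.25368532, -0.12345678],
     [-0.53648431, 0.82124242, 0.15327643]],
    [[0.43270381, 0.32164321, -0.32163321],
     [0.23648431, 0.12323462, 0.63421133]],
])

y = np.array([
    [[0.12217268, 0.21236321, 0.12453321],
     [0.32187821, -0.34543123, 0.53213532]],
    [[0.22208588, -0.64312342, -0.53231232],
     [0.53648431, 0.85321234, 0.21843212]],
    [[-0.43270381, 0.92421532, 0.82412434],
     [0.33648431, 0.22144332, -0.25312342]],
])


def toy_pairwise(X, Y):
    # diff[i, j] holds X[i] - Y[j] via broadcasting
    diff = X[:, None, :, :] - Y[None, :, :, :]
    # Euclidean norm along the last axis, then summed over the remaining rows
    return np.linalg.norm(diff, axis=-1).sum(axis=-1)


matrix = toy_pairwise(x, y)
print(matrix.shape)  # (3, 3)
print(matrix[0])     # approximately [1.40486546 2.5910411 2.99125241]
```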

The above uses the Euclidean metric to compute the pairwise distances between the two
matrices of time series. Many other distances are available. A full list of available
distances can be retrieved by calling the following:

In [12]:
get_available_distances()

['squared',
 'braycurtis',
 'canberra',
 'chebyshev',
 'cityblock',
 'correlation',
 'cosine',
 'dice',
 'euclidean',
 'hamming',
 'jaccard',
 'jensenshannon',
 'kulsinski',
 'matching',
 'minkowski',
 'rogerstanimoto',
 'russellrao',
 'seuclidean',
 'sokalmichener',
 'sokalsneath',
 'sqeuclidean',
 'yule']

Pairwise distances can also be computed by passing a callable or a BaseDistance
instance to pairwise, as shown below:

In [13]:
pairwise(x_multivariate_matrix, y_multivariate_matrix, AbsoluteDistance())

array([[1.78175275, 4.19520656, 5.06470719],
       [3.07946572, 2.96193027, 3.66589662],
       [1.52211331, 2.8318206 , 3.69928073]])

In [14]:
def absolute_pairwise_distance(x, y):
    x_size = x.shape[0]
    y_size = y.shape[0]

    pairwise_matrix = np.zeros((x_size, y_size))

    for i in range(x_size):
        curr_x = x[i]
        for j in range(y_size):
            pairwise_matrix[i, j] = \
                absolute_distance_callable(curr_x, y[j])

    return pairwise_matrix

pairwise(x_multivariate_matrix, y_multivariate_matrix, absolute_pairwise_distance)

array([[1.78175275, 4.19520656, 5.06470719],
       [3.07946572, 2.96193027, 3.66589662],
       [1.52211331, 2.8318206 , 3.69928073]])

As shown above, pairwise distances can be computed between two different time series
matrices. They can also be computed within a single time series matrix, as shown below:

In [15]:
pairwise(x_multivariate_matrix, metric='euclidean')

array([[0.        , 2.34103248, 1.11024509],
       [2.34103248, 0.        , 1.87477203],
       [1.11024509, 1.87477203, 0.        ]])

In [16]:
# Equivalent to the previous call: the same matrix is passed as both x and y
pairwise(x_multivariate_matrix, x_multivariate_matrix, metric='euclidean')

array([[0.        , 2.34103248, 1.11024509],
       [2.34103248, 0.        , 1.87477203],
       [1.11024509, 1.87477203, 0.        ]])
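
For a symmetric metric, a pairwise computation of a matrix against itself gives a symmetric result with a zero diagonal, as in the outputs above. A minimal sketch on made-up data follows; `toy_pairwise` is a hypothetical helper whose `Y=None` default mirrors the single-argument call, and it uses a plain Euclidean distance on flattened series.

```python
import numpy as np


def toy_pairwise(X, Y=None):
    """Hypothetical helper: Euclidean distance between flattened series."""
    X = np.asarray(X, dtype=float)
    # With no second matrix, compare X against itself
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Flatten each series to a single vector
    X_flat = X.reshape(len(X), -1)
    Y_flat = Y.reshape(len(Y), -1)
    diff = X_flat[:, None, :] - Y_flat[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


X = [[[0.0, 0.0]], [[3.0, 4.0]], [[6.0, 8.0]]]

self_matrix = toy_pairwise(X)
explicit_matrix = toy_pairwise(X, X)

print(self_matrix)
print(np.allclose(self_matrix, explicit_matrix))  # True
print(np.allclose(self_matrix, self_matrix.T))    # True
```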

<h4>Passing additional arguments to the metric</h4>

In [17]:
# Note: this works regardless of how the metric is passed.
# The example below passes a BaseDistance, but the same
# approach works when passing a str distance or a function.
# See above for an example using a function and kwargs.

# NOTE: This is a made-up distance for illustration only; it is not a meaningful metric.
class DistanceWithArguments(BaseDistance):

    def __init__(self, parama: int, paramb: int):
        self.parama: int = parama
        self.paramb: int = paramb

    def _distance(
        self,
        x: np.ndarray,
        y: np.ndarray,
    ) -> float:
        distance = 0.0
        for i in range(x.shape[0]):
            distance += np.sum(x[i] + y[i] - self.parama + self.paramb)

        return distance

# Call pairwise with a BaseDistance whose parameters are set in its constructor
pairwise(
    x_multivariate_matrix,
    y_multivariate_matrix,
    DistanceWithArguments(parama=1, paramb=5)
)

array([[25.78377051, 25.47089802, 26.43655917],
       [25.2718286 , 24.95895611, 25.92461726],
       [26.39429547, 26.08142298, 27.04708413]])