# Notebook for studying the Matrix Profile

#### [References]
- Matrix Profile
    - https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html

- STUMPY documentation
    - https://stumpy.readthedocs.io/en/latest/

- GitHub repository of stumpy
    - https://github.com/TDAmeritrade/stumpy

---
## About the z-normalized Euclidean distance
Reference (YouTube): [Time Series data Mining Using the Matrix Profile part 2](https://www.youtube.com/watch?v=LnQneYvg84M&t=374s)

We consider the z-normalized Euclidean distance between two time series $\boldsymbol{x}$ and $\boldsymbol{y}$ of length $n$:

$$
    \boldsymbol{x} = x_{1}, x_{2}, \cdots , x_{n} \\
    \boldsymbol{y} = y_{1}, y_{2}, \cdots , y_{n} \\
$$
The z-normalized Euclidean distance $d(\boldsymbol{x}, \boldsymbol{y})$ is defined as
$$
    d(\boldsymbol{x}, \boldsymbol{y}) := \sqrt{\sum_{i=1}^{n} (\hat{x}_{i} - \hat{y}_{i})^2}
$$
where $\hat{x}_{i}$ and $\hat{y}_{i}$ are the z-normalized values

$$
    \hat{x}_{i} = \frac{x_{i}-\mu_{x}}{\sigma_{x}} \\
    \hat{y}_{i} = \frac{y_{i}-\mu_{y}}{\sigma_{y}} \\
$$
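and $\mu_{x}, \sigma_{x}$ ($\mu_{y}, \sigma_{y}$) denote the mean and the population standard deviation of $\boldsymbol{x}$ ($\boldsymbol{y}$).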
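Expanding the square gives a handy sanity check. Each z-normalized series has mean 0 and, with the population standard deviation, $\sum_i \hat{x}_i^2 = \sum_i \hat{y}_i^2 = n$, so $d(\boldsymbol{x}, \boldsymbol{y})^2 = n + n - 2\sum_i \hat{x}_i \hat{y}_i = 2n(1 - \rho)$, where $\rho$ is the Pearson correlation of $\boldsymbol{x}$ and $\boldsymbol{y}$. The sketch below checks this numerically on two length-4 windows of the example series used later in this notebook. The helper names `z_norm`, `z_norm_distance` and `z_norm_distance_via_corr` are hypothetical and not part of STUMPY.

```python
import numpy as np


def z_norm(x):
    """z-normalize a 1-D array using the population std (ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def z_norm_distance(x, y):
    """d(x, y) computed directly from the definition."""
    diff = z_norm(x) - z_norm(y)
    return float(np.sqrt(np.dot(diff, diff)))


def z_norm_distance_via_corr(x, y):
    """The same distance via d^2 = 2n(1 - rho), rho = Pearson correlation."""
    n = len(x)
    rho = np.corrcoef(x, y)[0, 1]
    return float(np.sqrt(2 * n * (1 - rho)))


# Windows starting at indices 0 and 9 of the example series below (m = 4)
x = np.array([0, 1, 3, 2], dtype=float)
y = np.array([2, 2, 10, 7], dtype=float)
print(z_norm_distance(x, y))           # ≈ 0.642486
print(z_norm_distance_via_corr(x, y))  # ≈ 0.642486
```

Both routes give the value that appears for index 0 in the matrix profile tables below.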

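```python
import numpy as np

def xNorm(x):
    # z-normalize a 1-D array: subtract the mean, divide by the population std (ddof=0)
    return (x - np.mean(x)) / np.std(x, ddof=0)
```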
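z-normalization discards each window's offset and scale, so two windows that differ only by an affine map $a\boldsymbol{x} + b$ with $a > 0$ are at distance 0, even when their raw Euclidean distance is large. A minimal sketch reusing the same `xNorm` definition; the series `y` is made up purely for illustration.

```python
import numpy as np


def xNorm(x):
    # z-normalize with the population std (ddof=0)
    return (x - np.mean(x)) / np.std(x, ddof=0)


x = np.array([0, 1, 3, 2], dtype=float)
y = 3.0 * x + 10.0  # same shape as x, different scale and offset

raw_dist = np.linalg.norm(x - y)                  # plain Euclidean distance
znorm_dist = np.linalg.norm(xNorm(x) - xNorm(y))  # z-normalized distance
print(raw_dist)    # sqrt(696) ≈ 26.38
print(znorm_dist)  # 0 up to floating-point rounding
```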

---
# Checking the first example against the z-normalized Euclidean distance
We check that the Matrix Profile returned by the `stump` function agrees with the distances between subsequences computed directly from the definition of the z-normalized Euclidean distance.

```python
import math
import random
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import stumpy
```

```python
time_series = np.array([0, 1, 3, 2, 9, 1, 14, 15, 1, 2, 2, 10, 7], dtype=float)
n = len(time_series)  # length of the time series
```

### Using the `stump` function

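```python
window_size = 4
# stumpy.stump returns one row per subsequence with 4 columns:
# 0 = matrix profile value, 1 = nearest-neighbor index,
# 2 = left nearest-neighbor index, 3 = right nearest-neighbor index (-1 if none)
matrix_profile = stumpy.stump(time_series, m=window_size)
df_matrix_profile = pd.DataFrame(matrix_profile)
display(df_matrix_profile)  # `display` is available inside Jupyter/IPython
```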

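| index | 0 (matrix profile) | 1 (nearest neighbor) | 2 (left neighbor) | 3 (right neighbor) |
|---|---|---|---|---|
| 0 | 0.642486 | 9 | -1 | 9 |
| 1 | 0.285705 | 8 | -1 | 8 |
| 2 | 1.640169 | 9 | 0 | 9 |
| 3 | 0.898131 | 1 | 1 | 8 |
| 4 | 1.279547 | 9 | 0 | 9 |
| 5 | 1.781965 | 2 | 2 | 9 |
| 6 | 2.987226 | 3 | 3 | 8 |
| 7 | 2.839433 | 4 | 4 | 9 |
| 8 | 0.285705 | 1 | 1 | -1 |
| 9 | 0.642486 | 0 | 0 | -1 |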
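One common way to read this table: the smallest value in column 0 marks the best-matching pair of subsequences (a motif), and the largest marks the subsequence with no close match anywhere else (a discord). A short sketch that uses only the numbers shown above, copied by hand, so no STUMPY call is involved:

```python
import numpy as np

# Columns 0 and 1 of the stump output above
P = np.array([0.642486, 0.285705, 1.640169, 0.898131, 1.279547,
              1.781965, 2.987226, 2.839433, 0.285705, 0.642486])
I = np.array([9, 8, 9, 1, 9, 2, 3, 4, 1, 0])

motif_idx = int(np.argmin(P))  # first index holding the smallest profile value
motif_pair = (motif_idx, int(I[motif_idx]))
discord_idx = int(np.argmax(P))  # index holding the largest profile value
print(motif_pair, discord_idx)  # (1, 8) 6
```

Indices 1 and 8 point at each other, which is what a motif pair looks like in columns 0 and 1.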


### Computing the z-normalized Euclidean distance by hand

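```python
def xNorm(x):
    # z-normalize: subtract the mean, divide by the population std (ddof=0)
    return (x - np.mean(x)) / np.std(x, ddof=0)

def EuclideanDistance(x, y):
    # Plain Euclidean distance; `**` is exponentiation (`^` would be bitwise XOR)
    return math.sqrt(np.sum((x - y) ** 2))
```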

```python
# Without z-normalization (plain Euclidean distance)
window_size = 4
ret = np.array([])
for i in range(n - window_size + 1):
    tmp = np.array([])
    for j in range(n - window_size + 1):
        diff = time_series[i:i+window_size] - time_series[j:j+window_size]
        t = np.sqrt(np.dot(diff, diff))
        tmp = np.append(tmp, t)
    # After sorting, tmp[0] is the self-match (distance 0); take the next smallest
    tmp = np.sort(tmp)
    ret = np.append(ret, tmp[1])

print('ret =', ret)
```

```text
ret = [ 6.8556546   1.41421356  6.164414    7.93725393 11.40175425 13.56465997
 14.07124728 13.96424004  1.41421356  6.164414  ]
```


As shown above, computing the distance without z-normalization reproduces the Matrix Profile given on the [Matrix Profile](https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html) tutorial page.
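The double loop above can also be written with NumPy broadcasting. This sketch assumes NumPy 1.20 or later for `numpy.lib.stride_tricks.sliding_window_view`; setting the diagonal to infinity plays the role of skipping the self-match, like taking `tmp[1]` after sorting.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

time_series = np.array([0, 1, 3, 2, 9, 1, 14, 15, 1, 2, 2, 10, 7], dtype=float)
window_size = 4

subs = sliding_window_view(time_series, window_size)  # shape (10, 4), one row per window
# Pairwise raw Euclidean distances between all windows, shape (10, 10)
D = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
np.fill_diagonal(D, np.inf)  # ignore each window's distance to itself
P_raw = D.min(axis=1)
print(P_raw)
```

This builds the full 10 x 10 distance matrix, which is fine for a toy series but grows quadratically with the series length.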

```python
# With z-normalization
window_size = 4
ret = np.array([])
for i in range(n - window_size + 1):
    tmp = np.array([])
    for j in range(n - window_size + 1):
        x_hat = xNorm(time_series[i:i+window_size])
        y_hat = xNorm(time_series[j:j+window_size])
        diff = x_hat - y_hat
        t = np.sqrt(np.dot(diff, diff))
        tmp = np.append(tmp, t)
    # After sorting, tmp[0] is the self-match (distance 0); take the next smallest
    tmp = np.sort(tmp)
    ret = np.append(ret, tmp[1])

print('ret =', ret)
```

```text
ret = [0.64248634 0.28570485 1.64016944 0.89813064 1.27954715 1.78196466
 2.05831901 2.05831901 0.28570485 0.64248634]
```


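```python
display(df_matrix_profile)
```

| index | 0 (matrix profile) | 1 (nearest neighbor) | 2 (left neighbor) | 3 (right neighbor) |
|---|---|---|---|---|
| 0 | 0.642486 | 9 | -1 | 9 |
| 1 | 0.285705 | 8 | -1 | 8 |
| 2 | 1.640169 | 9 | 0 | 9 |
| 3 | 0.898131 | 1 | 1 | 8 |
| 4 | 1.279547 | 9 | 0 | 9 |
| 5 | 1.781965 | 2 | 2 | 9 |
| 6 | 2.987226 | 3 | 3 | 8 |
| 7 | 2.839433 | 4 | 4 | 9 |
| 8 | 0.285705 | 1 | 1 | -1 |
| 9 | 0.642486 | 0 | 0 | -1 |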


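The two results agree for every subsequence except indices 6 and 7. There, the hand-written loop returns 2.05831901, which is the distance between subsequences 6 and 7 themselves. `stump` does not accept such adjacent, overlapping neighbors (trivial matches): by default it excludes every index $j$ with $|i - j| \le \lceil m/4 \rceil$, which is $i \pm 1$ for $m = 4$, and therefore reports 2.987226 (neighbor 3) and 2.839433 (neighbor 4) instead.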
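To reproduce rows 6 and 7 as well, the hand-written computation needs an exclusion zone. The sketch below assumes STUMPY's default zone half-width of $\lceil m/4 \rceil$ (here 1) and masks every $j$ with $|i - j| \le \lceil m/4 \rceil$ before taking the minimum. `naive_matrix_profile` is a hypothetical helper that costs $O(n^2 m)$ and is meant only for checking small examples, not as STUMPY's actual algorithm.

```python
import numpy as np


def z_norm(x):
    # z-normalize with the population std (ddof=0), as in xNorm
    return (x - np.mean(x)) / np.std(x)


def naive_matrix_profile(T, m, excl_zone):
    """Brute-force z-normalized matrix profile with an exclusion zone."""
    T = np.asarray(T, dtype=float)
    l = len(T) - m + 1
    subs = np.array([z_norm(T[i:i + m]) for i in range(l)])
    P = np.full(l, np.inf)
    I = np.full(l, -1, dtype=int)
    for i in range(l):
        # Distance from window i to every window j
        d = np.sqrt(np.sum((subs - subs[i]) ** 2, axis=1))
        # Mask the self-match and trivial matches |i - j| <= excl_zone
        lo, hi = max(0, i - excl_zone), min(l, i + excl_zone + 1)
        d[lo:hi] = np.inf
        I[i] = int(np.argmin(d))
        P[i] = d[I[i]]
    return P, I


time_series = np.array([0, 1, 3, 2, 9, 1, 14, 15, 1, 2, 2, 10, 7], dtype=float)
m = 4
excl_zone = int(np.ceil(m / 4))  # assumed STUMPY default: ceil(m / 4)
P, I = naive_matrix_profile(time_series, m, excl_zone)
print(np.round(P, 6))
print(I)
```

With the zone in place, every row matches the `stump` table, including 2.987226 (neighbor 3) and 2.839433 (neighbor 4) for rows 6 and 7.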