Merge branch 'dtw' into dev

koyo922 · Oct 19, 2019 · 994a119
2 parents d4a31e5 + 0534555
commit 994a119
Showing 10 changed files with 1,086 additions and 4 deletions.
diff --git a/.gitignore b/.gitignore
@@ -103,3 +103,7 @@ venv.bak/
 .mypy_cache/
 
 .idea
+data
+lb_kim.result
+*.swp
+*.lprof
diff --git a/CHANGELOG b/CHANGELOG
@@ -5,9 +5,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [PEP440](https://www.python.org/dev/peps/pep-0440/)
 
 ## [Unreleased]
+...
+
+1.2.0dev1 - 2019-10-20
+---
 ### Added
-- pandas display more columns/col_width
-- test speed up
+- UCR DTW implementation
 
 1.1.0 - 2019-09-29
 ---

diff --git a/docs/time_series.md b/docs/time_series.md
@@ -0,0 +1,43 @@
+# Time Series Analysis Tool
+
+## Python implementation of the famous UCR DTW algorithm
+
+For theoretical details,
+see the [paper](https://www.cs.ucr.edu/~eamonn/SIGKDD_trillion.pdf) or this [article]().
+
+Usage below:
+
+```python
+from kinoko.time_series.dtw import UCR_DTW
+import numpy as np
+import pytest
+
+# `dist_cb` is the callback function for the point-wise distance; you can also try `abs(x-y)` etc.
+# `window_frac` is the fraction of the query length used as the window in the various lower-bound (LB) calculations
+# for Euclidean distance instead of DTW, set `window_frac` to zero
+ucr_dtw = UCR_DTW(dist_cb=lambda x,y: (x-y)**2, window_frac=0.05)
+
+x1 = np.linspace(0, 50, 100, endpoint=False)
+y1 = 3.1 * np.sin(x1 / 1.5) + 3.5
+
+x2 = np.linspace(0, 25, 50, endpoint=False)  # half slice of x1
+y2 = 3.1 * np.sin((x2 + 4) / 1.5) + 3.5  # same function, shifted 4 units to the left
+
+# `content` can be an `Iterable` (stream) of float/int of any length
+# `query` should be a fixed-length sequence; it is loaded into RAM in full
+loc, dist, _stat = ucr_dtw.search(content=y1, query=y2)
+assert 8 == loc  # 4 units / 0.5 step = 8
+assert pytest.approx(0) == dist  # almost zero
+```
+
+!!!caution "Known Issues"
+
+    - Currently, it only supports ==float/int== sequences
+        - vectors or even `object`s cannot be uniformly `norm`ed
+        - a hook mechanism to support them seems like "Premature Optimization"
+
+    - Speed is approximately the same as [another Python implementation](https://github.com/JozeeLin/ucr-suite-python/blob/master/DTW.ipynb).
+      A complete time/memory efficiency comparison with the [original C implementation](https://github.com/klon/ucrdtw/blob/master/src/ucrdtw.c)
+      has not been conducted.
+
+    - It is ==NOT production-ready==; use it with caution.
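
Note on `docs/time_series.md`: the roles of `dist_cb` and `window_frac` described in the usage comments can be illustrated with a brute-force sketch. This is not the `kinoko` implementation and it omits the lower bounds and early abandoning that make UCR DTW fast. The function names `znorm`, `banded_dtw`, and `naive_search` are hypothetical, as are the rounding rule for the window and the per-window z-normalization, which are assumed here based on the UCR suite's general description.

```python
import math


def znorm(seq):
    """Z-normalize a sequence (zero mean, unit variance when possible)."""
    values = [float(v) for v in seq]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    scale = std if std > 0 else 1.0  # constant sequences are only centered
    return [(v - mean) / scale for v in values]


def banded_dtw(a, b, window, dist_cb=lambda x, y: (x - y) ** 2):
    """DTW cost between equal-length sequences, warping limited to `window` steps."""
    n = len(a)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within `window` of the diagonal are reachable.
        for j in range(max(1, i - window), min(n, i + window) + 1):
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist_cb(a[i - 1], b[j - 1]) + best_prev
    return cost[n][n]


def naive_search(content, query, window_frac=0.05,
                 dist_cb=lambda x, y: (x - y) ** 2):
    """Brute-force subsequence search: no lower bounds, no early abandoning."""
    content = list(content)
    q = znorm(query)
    m = len(q)
    window = int(window_frac * m)  # assumed rounding rule, not taken from kinoko
    best_loc, best_dist = -1, float("inf")
    for loc in range(len(content) - m + 1):
        d = banded_dtw(znorm(content[loc:loc + m]), q, window, dist_cb)
        if d < best_dist:
            best_loc, best_dist = loc, d
    return best_loc, best_dist


# Same sine-wave data as the usage example, built without NumPy.
y1 = [3.1 * math.sin(0.5 * i / 1.5) + 3.5 for i in range(100)]
y2 = [3.1 * math.sin((0.5 * j + 4) / 1.5) + 3.5 for j in range(50)]
loc, dist = naive_search(y1, y2)
print(loc, dist)
```

With `window_frac=0` the band collapses to the diagonal, so `banded_dtw` returns the sum of point-wise `dist_cb` values, which for the squared-difference callback is the squared Euclidean distance. On the sine-wave data the sketch finds the query at offset 8 with zero distance, matching the assertions in the usage example.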
diff --git a/kinoko/time_series/__init__.py b/kinoko/time_series/__init__.py
@@ -0,0 +1,10 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# vim: tabstop=4 shiftwidth=4 expandtab number
+"""
+TODO: module_docstring$ qianweishuo$ 2019/10/17
+
+Authors: qianweishuo<qzy922@gmail.com>
+Date:    2019/10/17 2:27 PM
+"""
+