Merge branch 'dtw' into dev
koyo922 committed Oct 19, 2019
2 parents d4a31e5 + 0534555 commit 994a119
Showing 10 changed files with 1,086 additions and 4 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -103,3 +103,7 @@ venv.bak/
.mypy_cache/

.idea
data
lb_kim.result
.swp
*.lprof
7 changes: 5 additions & 2 deletions CHANGELOG
@@ -5,9 +5,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [PEP440](https://www.python.org/dev/peps/pep-0440/)

## [Unreleased]
...

1.2.0dev1 - 2019-10-20
---
### Added
- pandas display more columns/col_width
- test speed up
- UCR DTW implementation

1.1.0 - 2019-09-29
---
43 changes: 43 additions & 0 deletions docs/time_series.md
@@ -0,0 +1,43 @@
# Time Series Analysis Tool

## Python implementation of the famous UCR DTW algorithm

For theoretical details,
please read the [paper](https://www.cs.ucr.edu/~eamonn/SIGKDD_trillion.pdf) or this [article]()

Usage below:

```python
from kinoko.time_series.dtw import UCR_DTW
import numpy as np
import pytest

# `dist_cb` is the distance callback; you can also try `abs(x - y)` etc.
# `window_frac` is the fraction of the query length used as the warping window
# in the various lower-bound (LB) calculations.
# For Euclidean distance instead of DTW, set `window_frac` to zero.
ucr_dtw = UCR_DTW(dist_cb=lambda x,y: (x-y)**2, window_frac=0.05)

x1 = np.linspace(0, 50, 100, endpoint=False)
y1 = 3.1 * np.sin(x1 / 1.5) + 3.5

x2 = np.linspace(0, 25, 50, endpoint=False)  # first half of x1
y2 = 3.1 * np.sin((x2 + 4) / 1.5) + 3.5  # same function, shifted 4 units to the left

# `content` can be an `Iterable` (stream) of float/int, of any length
# `query` must be a fixed-length sequence; it is loaded into RAM in full
loc, dist, _stat = ucr_dtw.search(content=y1, query=y2)
assert 8 == loc  # 4 units / 0.5 sample spacing = 8 samples
assert pytest.approx(0) == dist  # almost zero
```
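
To make the roles of `dist_cb` and `window_frac` more concrete, here is a minimal sketch of plain windowed DTW between two equal-length sequences. It is not the `kinoko` implementation: the function name `dtw_distance` and the rounding of the window to `int(window_frac * n)` are assumptions made for illustration, and the sketch leaves out the lower bounds and early abandoning described in the paper.

```python
import math


def dtw_distance(a, b, dist_cb=lambda x, y: (x - y) ** 2, window_frac=0.05):
    """Windowed DTW cost between two equal-length sequences (illustrative only)."""
    n = len(a)
    if len(b) != n:
        raise ValueError("this sketch expects sequences of equal length")

    # Half-width of the warping band; the rounding rule is an assumption.
    w = int(window_frac * n)

    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (n + 1)
        # Only cells with |i - j| <= w are allowed.
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = dist_cb(a[i - 1], b[j - 1])
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[n]
```

With `window_frac=0` only the diagonal cells are reachable, so the result reduces to the sum of `dist_cb` over aligned pairs, which is the squared Euclidean distance for the default callback.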

!!!caution "Known Issues"

- Currently, it only supports ==float/int== sequences
    - vectors or even `object`s cannot be uniformly `norm`ed
    - a hook mechanism for this seems like "Premature Optimization"

- Speed is approximately the same as [another Python implementation](https://github.com/JozeeLin/ucr-suite-python/blob/master/DTW.ipynb).
  A complete time/memory efficiency comparison with the [original C implementation](https://github.com/klon/ucrdtw/blob/master/src/ucrdtw.c)
  has not been conducted.

- It is ==NOT production-ready==; use it with caution.
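
The float/int restriction is easier to see with a sketch of the normalization step. Assuming `norm` above refers to z-normalization as used in the UCR Suite, each value is shifted by the mean and scaled by the standard deviation, and both operations are only straightforward for scalars. The helper `z_normalize` and its `eps` guard are hypothetical and are not part of `kinoko`.

```python
import numpy as np


def z_normalize(seq, eps=1e-12):
    """Return a zero-mean, unit-variance copy of a scalar sequence (illustrative only)."""
    arr = np.asarray(seq, dtype=float)
    std = arr.std()
    if std < eps:
        # A constant sequence has no scale; map it to zeros instead of dividing by ~0.
        return np.zeros_like(arr)
    return (arr - arr.mean()) / std
```

For vectors or arbitrary objects there is no single obvious mean and standard deviation to apply here, which is the difficulty the first known issue points to.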
10 changes: 10 additions & 0 deletions kinoko/time_series/__init__.py
@@ -0,0 +1,10 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim: tabstop=4 shiftwidth=4 expandtab number
"""
TODO: module_docstring$ qianweishuo$ 2019/10/17
Authors: qianweishuo<qzy922@gmail.com>
Date: 2019/10/17 2:27 PM
"""
