GitHub - nchrys/haskell-sequence-kernels: Implementation in Haskell of the one-sided mean kernel and other sequence kernels

About

This is a very small toy project that I made to experiment with the Haskell programming language.

It reads a set of sequences from file example_data.txt, computes a Gram matrix of these sequences using the one-sided mean alignment kernel, computes the eigenvalues of this matrix and displays them. As the one-sided mean alignment kernel is provably positive definite, the eigenvalues are all positive.

The project also contains code to generate random dummy time series.

Usage

If you do not have stack installed you can on OS X with Homebrew:
```
brew update
brew install haskell-stack
```

Then build and run the program:

cd haskell-sequence-kernels
stack setup
stack build
stack exec haskell-sequence-kernels-exe

Performance

Performance is terrible.

This is because currently dynamic programming is implemented by keeping intermediate results in a Data.Map structure which does not have constant access time like vectors. In the future I plan on using a mutable vector or matrix with the ST monad. Nevertheless, on this dataset the Haskell program compiled with O2 optimization level is quite surprisingly almost faster than the Python equivalent.

About the one-sided mean alignment kernel

Well-known kernel methods such as for example SVM and Kernel PCA rely on a kernel which is used to compute a Gram matrix which can generally be interpreted as a matrix of similarity values between any pair of samples in the dataset. These algorithms require that the kernel be positive definite, which guarantees that the resulting optimization programs are convex whatever the samples.

When dealing with vector data the most common choice is generally a Gaussian kernel, but when the samples are time-series (or more generally sequences) custom kernels must be used. There are not many sensible choices, as merely using a classic dynamic time warping distance does not lead to a positive definite kernel.

To this end we propose the one-sided mean kernel, which has many advantages:

Provably positive definite,
Faster than competing techniques with a time complexity of O(l × (m - l)) instead of O(l × m) for sequences of length l < m,
Consistent with a vector kernel when time series are of equal length,
Does not suffer from issues of diagonal dominance like the global alignment kernel for example.

For comparison an implementation in Python is available here.

TODO

Rewrite the algorithm using the ST monad,
Implement other kernels such as classic dynamic time warping and the global alignment kernel,
Add support for multivariate time-series,
Automatic selection of the kernel bandwith,

References

N. Chrysanthos, P. Beauseroy, H. Snoussi, E. Grall-Maës, Theoretical properties and implementation of the one-sided mean kernel for time series, in: Neurocomputing 169 (2015) 196–204
M. Cuturi, J.-P. Vert, O. Birkenes, T. Matsui, A kernel for time series based on global alignments, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007
J.-P. Vert, The Optimal Assignment Kernel is not Positive Definite in: arXiv:0801. 4061

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
app		app
src		src
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Setup.hs		Setup.hs
example_data.txt		example_data.txt
haskell-sequence-kernels.cabal		haskell-sequence-kernels.cabal
stack.yaml		stack.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Usage

Performance

About the one-sided mean alignment kernel

TODO

References

About

Releases

Packages

Languages

License

nchrys/haskell-sequence-kernels

Folders and files

Latest commit

History

Repository files navigation

About

Usage

Performance

About the one-sided mean alignment kernel

TODO

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages