Wildly varying time to run STAMP with query #36

Closed

ginoa opened this issue Dec 10, 2018 · 4 comments

ginoa commented Dec 10, 2018

Description
The running time of STAMP varies wildly with the size of the data. For some data sizes it even appears to hang without ever showing the progress bar.
For example, computing the distance profile between a dataset and a query of length 34 with a window size of 13, I observe the following:

  • average of 1 it/s for dataset of length 620579
  • average of 0.5 it/s for dataset of length 620578
  • average of 0.3 it/s for dataset of length 620577
  • average of 2.5 it/s for dataset of length 620576
  • average of 0.04 it/s for dataset of length 620574
  • for some dataset sizes, stamp appears to hang without an error; more likely it simply takes so long to reach the point of showing the progress bar that I lose patience and try to interrupt it, and I then have to hard-kill RStudio to stop the evaluation.

In other words, changing the dataset size by a single element has a large, nonlinear, and unintuitive effect on the execution time of stamp, and for some sizes the function appears to hang without an error.

Working examples
I could reproduce this behavior with random data.

ref_data <- rnorm(620576)
query_data <- rnorm(34)
tst <- tsmp::stamp(ref_data, query_data, window_size = 13)
## execution speed: 0.4 it/s

ref_data <- rnorm(620577)
query_data <- rnorm(34)
tst <- tsmp::stamp(ref_data, query_data, window_size = 13)
## execution speed: 2.5 it/s

Expected behavior
I would expect the running time to get shorter as the data gets smaller.

franzbischoff (Member) commented

Confirmed. This behaviour is not present with STOMP.
I'll have to profile the code and check where the bottleneck is.

Thanks.

franzbischoff self-assigned this Dec 10, 2018
franzbischoff added this to the v0.3.5 milestone Dec 10, 2018
franzbischoff (Member) commented

The bottleneck is stats::fft(); this may have something to do with the FFT algorithm. Padding the data with zeroes (the right amount) might solve it. I need to look into this.
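
For context, a minimal sketch of why padding helps (an illustration, not the package code): R's stats::fft() uses a mixed-radix algorithm and is documented to be fastest when the series length is highly composite, so a length with large prime factors can be dramatically slower than a slightly different one. Padding to the next power of two with stats::nextn() sidesteps this:

x <- rnorm(620574)                     # one of the "slow" lengths reported above
## 620574 == 2 * 3 * 293 * 353: large prime factors make the mixed-radix FFT slow
n_pad <- stats::nextn(length(x), 2)    # next power of two >= length(x)
x_pad <- c(x, rep(0, n_pad - length(x)))
system.time(stats::fft(x))             # slow at this length
system.time(stats::fft(x_pad))         # fast, despite transforming more points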

franzbischoff (Member) commented

Implementing mass_v3 solves this problem:

https://www.cs.unm.edu/~mueen/MASS_V3.m
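
For readers following along, here is a minimal sketch of the MASS_V3 idea in R (my own illustration under the assumed names mass_v2 and mass_v3, not the tsmp implementation): the data is processed in overlapping chunks whose length k is a power of two, so every FFT runs on an FFT-friendly size no matter what length the original series has.

mass_v2 <- function(data, query) {
  n <- length(data); m <- length(query)
  # sliding dot products via FFT: reverse the query, zero-pad to length n
  conv <- fft(fft(data) * fft(c(rev(query), rep(0, n - m))), inverse = TRUE) / n
  qt <- Re(conv)[m:n]
  # moving mean and sd of the data over windows of length m, via cumulative sums
  cs  <- cumsum(c(0, data)); cs2 <- cumsum(c(0, data^2))
  mu  <- (cs[(m + 1):(n + 1)] - cs[1:(n - m + 1)]) / m
  sig <- sqrt((cs2[(m + 1):(n + 1)] - cs2[1:(n - m + 1)]) / m - mu^2)
  qmu  <- mean(query)
  qsig <- sqrt(mean((query - qmu)^2))   # population sd, as in MASS
  # z-normalised Euclidean distance profile
  sqrt(pmax(2 * (m - (qt - m * mu * qmu) / (sig * qsig)), 0))
}

mass_v3 <- function(data, query, k = 2^16) {
  n <- length(data); m <- length(query)
  stopifnot(k >= m)
  dp <- numeric(n - m + 1)
  start <- 1
  while (start <= n - m + 1) {
    chunk <- data[start:min(start + k - 1, n)]   # at most k points, FFT-friendly
    d <- mass_v2(chunk, query)
    dp[start:(start + length(d) - 1)] <- d
    start <- start + k - m + 1                   # overlap by m - 1 so no window is split
  }
  dp
}

dp <- mass_v3(rnorm(620574), rnorm(13))   # distance profile at a previously "slow" length

With a window-length query this returns the same profile as a single full-length MASS call, but each fft() sees a power-of-two length, which is the property the fix relies on (the final, shorter chunk is left unpadded in this sketch).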

franzbischoff (Member) commented

Changes STAMP to use MASS_V3
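
If the fix works as intended, re-running the reproduction above (assuming no API change) should give comparable speeds for adjacent data lengths:

for (n in 620574:620579) {
  ref_data <- rnorm(n)
  query_data <- rnorm(34)
  tst <- tsmp::stamp(ref_data, query_data, window_size = 13)
  ## expected: roughly uniform it/s across all six lengths
}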
