
[REVIEW]: STUMPY: A Powerful and Scalable Python Library for Time Series Data Mining #1504

Closed
33 of 36 tasks
whedon opened this issue Jun 13, 2019 · 45 comments
Labels: accepted, published, recommend-accept, review

whedon commented Jun 13, 2019

Submitting author: @seanmylaw (Sean Law)
Repository: https://github.com/TDAmeritrade/stumpy
Version: 1.0.0
Editor: @mbobra
Reviewers: @ejolly, @hooman650
Archive: 10.5281/zenodo.3340125

Status


Status badge code:

HTML: <a href="http://joss.theoj.org/papers/eb91faaf9219d46c9acd373cfee8ac29"><img src="http://joss.theoj.org/papers/eb91faaf9219d46c9acd373cfee8ac29/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/eb91faaf9219d46c9acd373cfee8ac29/status.svg)](http://joss.theoj.org/papers/eb91faaf9219d46c9acd373cfee8ac29)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@ejolly & @hooman650, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @mbobra know.

Please try to complete your review in the next two weeks.

Review checklist for @ejolly

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: 1.0.0
  • Authorship: Has the submitting author (@seanmylaw) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @hooman650

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: 1.0.0
  • Authorship: Has the submitting author (@seanmylaw) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

whedon commented Jun 13, 2019

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @ejolly, @hooman650 it looks like you're currently assigned to review this paper 🎉.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, with GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands


whedon commented Jun 13, 2019

Attempting PDF compilation. Reticulating splines etc...


whedon commented Jun 13, 2019


mbobra commented Jun 13, 2019

@ejolly @hooman650 Thank you for agreeing to review this submission! Whedon generated a checklist and linked a reviewer guide above -- let me know if you have any questions.


mbobra commented Jul 2, 2019

👋 @ejolly @hooman650 How is it going? Would you like more time to review? Do you have any questions? Please let me know!


ejolly commented Jul 6, 2019

Sorry @mbobra, I’ve been traveling but I can have this done by the end of this week if that’s ok.


hooman650 commented Jul 13, 2019

Ok, I have done a preliminary review of STUMPY.

Summary of work from my perspective:

STUMPY computes the Euclidean distance between a segment of a given window length and every subsequence of the data. While conceptually simple, this operation requires a lot of computational time and space. STUMPY builds upon the ideas published in several papers that employ the FFT and algebra to reduce the computational time of this process, while the space complexity is handled by simply storing the smallest value for each comparison. In general, the work is interesting and can be handy for finding patterns in large time series.
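
To make the operation concrete, here is a naive brute-force sketch of a matrix profile computation (illustration only; it uses plain Euclidean distance, whereas STUMPY's stump() uses z-normalized distances and far faster FFT/algebra-based algorithms):

import numpy as np

def naive_matrix_profile(T, m):
    """Brute force: for every length-m subsequence of T, store the distance
    to (and index of) its nearest non-overlapping neighbor."""
    n = len(T) - m + 1
    P = np.full(n, np.inf)           # smallest distance found so far
    I = np.full(n, -1, dtype=int)    # index of that nearest neighbor
    for i in range(n):
        for j in range(n):
            if abs(i - j) < m:       # skip trivial (overlapping) matches
                continue
            d = np.linalg.norm(T[i:i + m] - T[j:j + m])
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I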

Comments:

  1. I installed stumpy from PyPI and everything went well on Python 3.6 running on 64-bit Windows.

  2. The author has prepared a nice documentation page as well as contributing guidelines and examples. However, I ran into an exception when I tried to run the following example from the documentation:

import numpy as np
import stumpy

your_time_series = np.random.rand(10000)
window_size = 50  # Approximately, how many data points might be found in a pattern

matrix_profile = stumpy.stump(your_time_series, m=window_size)

left_matrix_profile_index = matrix_profile[2]
right_matrix_profile_index = matrix_profile[3]
idx = 10  # Subsequence index for which to retrieve the anchored time series chain

anchored_chain = stumpy.atsc(left_matrix_profile_index, right_matrix_profile_index, idx)

all_chain_set, longest_unanchored_chain = stumpy.allc(left_matrix_profile_index, right_matrix_profile_index)

Here is the exception I got:

index 10 is out of bounds for axis 0 with size 4

Of course, the array has size 4 along that axis while the index being requested is 10. Please fix.

  3. A comment regarding the performance comparison graph mentioned here: how did the authors of GPU-STOMP implement their algorithm? In TensorFlow? A good GPU implementation seems entirely feasible in TensorFlow, and given that almost every heavy computation now runs far better on GPUs than on CPUs, I have difficulty believing that a good GPU implementation would be beaten by CPUs. But I might be wrong.

  4. My next comment is regarding the "window size" input of STUMPY. It can be compared to the kernel size in CNNs, and it will always be challenging to know prior to analysis. Any suggestions for determining it?

  5. There is no version release number in the repository.

Overview:

In general, I feel that the author has done a good job and STUMPY can be a good contribution to the time-series analysis tool-chain. The documentation looks good, but more work should be done to make sure all the examples run smoothly and correctly.


seanlaw commented Jul 15, 2019

@hooman650 Thank you for your thorough review.

Regarding 2:

The author has prepared a nice documentation page as well as contributing guidelines and examples. However, I ran into an exception when I tried to run the following example from the documentation:

Indeed, this was a typo and we have submitted an issue and fixed it accordingly in both our README and ReadTheDocs. Thank you for pointing this out!
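
For reference, a sketch of the corrected indexing (assuming the fix selects the left/right index columns of the stump() output rather than rows; the column layout is matrix profile, index, left index, right index):

import numpy as np
import stumpy

your_time_series = np.random.rand(10000)
window_size = 50

matrix_profile = stumpy.stump(your_time_series, m=window_size)

# Select the left/right matrix profile index columns (not rows)
left_matrix_profile_index = matrix_profile[:, 2]
right_matrix_profile_index = matrix_profile[:, 3]
idx = 10  # subsequence index for which to retrieve the anchored time series chain

anchored_chain = stumpy.atsc(left_matrix_profile_index, right_matrix_profile_index, idx)
all_chain_set, longest_unanchored_chain = stumpy.allc(left_matrix_profile_index, right_matrix_profile_index)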

Regarding 3:

A comment regarding the performance comparison graph mentioned here: how did the authors of GPU-STOMP implement their algorithm? In TensorFlow? A good GPU implementation seems entirely feasible in TensorFlow, and given that almost every heavy computation now runs far better on GPUs than on CPUs, I have difficulty believing that a good GPU implementation would be beaten by CPUs. But I might be wrong.

Unfortunately, I am not the original author of the GPU-STOMP publication/code and the numbers shown were simply extracted from their published paper for comparison. Currently, I only have access to CPUs but it is a top priority in our project roadmap to port this work over to GPUs. We are in the process of looking for assistance and resources from folks at NVIDIA. The initial goal of our scalable CPU implementation was to allow non-tech-savvy scientists to be able to get up and running quickly without needing access to any specialized hardware. We believe that we have achieved this goal.

Regarding 4:

My next comment is regarding the "window size" input of STUMPY. It can be compared to the kernel size in CNNs, and it will always be challenging to know prior to analysis. Any suggestions for determining it?

This is an excellent question and has been discussed by the original authors (not me) in Section D of their paper matrix profile II. In summary, the window size is certainly a user input that requires some level of domain expertise. However, the original authors have demonstrated that the matrix profile is robust to varying window sizes and that being "in the ballpark" is often enough to find motifs.
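
As a hypothetical sanity check of that robustness argument, one could compute the matrix profile for a few window sizes near a rough guess and verify that the reported motif location stays stable (a sketch, assuming stump()'s first output column holds the matrix profile values):

import numpy as np
import stumpy

T = np.random.rand(10000)
for m in (40, 50, 60):                                  # "in the ballpark" window sizes
    mp = stumpy.stump(T, m=m)
    motif_idx = int(np.argmin(mp[:, 0].astype(float)))  # location of the best motif pair
    print(f"m={m}: motif at index {motif_idx}")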

Regarding 5:

There is no version release number in the repository.

We currently provide a version number in the standard setup.py file and the version number is also accessible within Python via stumpy.__version__. Perhaps there is a better or more standard place to specify the version release number? Any guidance would be greatly appreciated.
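
For example, the installed release can be checked with:

import stumpy

print(stumpy.__version__)  # prints the installed release, e.g. "1.0.0"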


mbobra commented Jul 15, 2019

👋 @seanlaw

GitHub is showing that there aren't any releases in the repository. Could you please go ahead and create a release?

Could you also include your responses to points 3 and 4 above in the text of the paper along with the appropriate references?


seanlaw commented Jul 16, 2019

@mbobra Thank you for pointing me to the helpful resources. I've gone back and tagged the commit that coincides with the first upload to PyPI (May 3rd) as v1.0.0. Let me know if that is sufficient.

Regarding points 3 and 4, both references (Matrix Profile I and Matrix Profile II) were already included in the original article proof. Note that the references I mention above are just the preprints available directly on the original authors' group website, while the references in the article proof point to the published IEEE manuscripts.


ejolly commented Jul 17, 2019

Hi @mbobra sorry for the delay on my review!

First of all, I'd like to note that Stumpy is a great addition to the time-series analyst's toolkit and is very well-documented, explained, and referenced. I also rather enjoyed the talk. Really nice @seanlaw!

My testing was done using Python 3.6 on macOS 10.14.2 and everything installed without issues. I updated my installation following the most recent changes made in response to @hooman650's review and can attest that the fix for comment 2 now works.

I was unable to test the performance claims given my limited access to a distributed compute system at this time, but I was at least able to test the functionality by running a local Dask server without issues.
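
A rough sketch of that kind of local functionality test (assuming the distributed stumped() entry point and a Dask client backed by a local cluster; exact argument names should be checked against the STUMPY documentation):

import numpy as np
import stumpy
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    with Client(LocalCluster()) as dask_client:  # local "distributed" scheduler and workers
        your_time_series = np.random.rand(10000)
        window_size = 50
        matrix_profile = stumpy.stumped(dask_client, your_time_series, m=window_size)
        print(matrix_profile[:5])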

Just a few minor suggestions:

  1. Since the tutorial examples are provided as Jupyter notebooks and the prerendered file for Tutorial_1 includes inline plots, it would be good to add the %matplotlib inline command to the top code cell of that notebook so that anyone downloading and running the notebooks immediately reproduces the prerendered files on GitHub.
  2. In its present form, Tutorial_2 is a bit lacking in context and explanation. While it's nice to see how to use some additional functionality and be provided links to the original paper, I might suggest at least a bit more of an explanation as to what time-series chains are and why one might use them. Tutorial_1, for example, does a great job of providing a high-level overview of the matrix profile and elucidating the impact of the window size free parameter.
  3. More of a minor suggestion, but it might be nice to have the text link to the documentation a bit higher up on the GitHub README page, for example in the website field of the repo description, or to edit the text in the first paragraph to clearly mention the documentation site. While the badge and "matrix normal" links work, the actual documentation site itself isn't discussed until halfway through the README. Feel free to ignore this if you prefer.

With those minor changes I think this would make a great addition to JOSS.


seanlaw commented Jul 17, 2019

@ejolly Thank you for your constructive and useful feedback. Please see my responses below:

Regarding 1:

Since the tutorial examples are provided as Jupyter notebooks and the prerendered file for Tutorial_1 includes inline plots, it would be good to add the %matplotlib inline command to the top code cell of that notebook so that anyone downloading and running the notebooks immediately reproduces the prerendered files on GitHub.

This is an excellent suggestion and we've filed/fixed/closed this issue as per your recommendation. For completeness, we have also provided interactive Binder notebooks in addition to the pre-rendered notebooks so that the user can "try before installing".

Regarding 2:

In its present form, Tutorial_2 is a bit lacking in context and explanation. While it's nice to see how to use some additional functionality and be provided links to the original paper, I might suggest at least a bit more of an explanation as to what time-series chains are and why one might use them. Tutorial_1, for example, does a great job of providing a high-level overview of the matrix profile and elucidating the impact of the window size free parameter.

We completely agree and it is one of our older issues dating back to May 18th that we are hoping to get some help on identifying a good example dataset for and writing up a more complete tutorial like Tutorial 1. Currently, the tutorial only demonstrates the time series chains API and we'd really like to provide some more intuitive insight with a better data set than the current Taxi data set.

In all fairness, the goal of the STUMPY software is to faithfully implement the algorithms based on the published papers (not written by us), and so we strongly recommend that the user read the papers (clearly referenced), as the papers can provide far more detail and insight than STUMPY can. One needs to keep in mind that, without STUMPY, there is really no scalable, performant, and easy-to-install implementation for computing the matrix profile, so our current focus is to provide a suite of tools based on the published papers and to save the user the time and headache of having to implement those papers themselves (which are not without errors and missing important implementation details). Eventually, once we've created a community/user base and developed most of the published features, we will certainly spend more time improving the tutorials. It's probably important to point out that STUMPY was created and is currently maintained by a single person (me) and, for better or for worse, this is mostly done on my personal time; without additional assistance, one person can only do so much.

While I completely and wholeheartedly agree that the tutorials could be better (and they will be once the feature set stabilizes), I would respectfully argue that the JOSS requirements make no mention of tutorials, so they are a "nice to have" but should not be used as a criterion to judge the completeness of the software. From an API documentation, unit testing/code coverage, installation instructions, and example usage standpoint, we humbly believe that this open source software meets the JOSS requirements.

Regarding 3:

More of a minor suggestion, but it might be nice to have the text link to the documentation a bit higher up on the GitHub README page, for example in the website field of the repo description, or to edit the text in the first paragraph to clearly mention the documentation site. While the badge and "matrix normal" links work, the actual documentation site itself isn't discussed until halfway through the README. Feel free to ignore this if you prefer.

This is good feedback. We've filed/fixed/closed a new issue and added a clearer link in the opening paragraph of the README.


ejolly commented Jul 17, 2019

@seanlaw the Binder addition is a great one and I think it will be very helpful for new users.

Regarding my point 2:

I apologize as I should have been more clear. I completely agree with your response that publication in JOSS should not be contingent on you adding a more comprehensive Tutorial_2; my comment was more of a suggestion for something that would improve the tutorials, i.e. it would be "nice to have". From my review, you have already done a fantastic job of documenting, testing, and providing the requisite high-level explanation of the package functionality as per the JOSS requirements.

I completely understand that creating and maintaining a solo project is a huge demand on your time and it will be great to see how tutorials and functionality grow with the community base!


seanlaw commented Jul 17, 2019

@ejolly No need to apologize as I assumed no ill intent. Thank you (as well as to @hooman650 and @mbobra) for taking the time to review! I really appreciate the valuable feedback.


mbobra commented Jul 17, 2019

@hooman650 and @ejolly Thank you so much for reviewing! We really appreciate your time and effort ☀️

@seanlaw We're almost there! Can you please archive your release on Zenodo to obtain a DOI and then put that in your README.rst file? After that I think we're done 🎉


seanlaw commented Jul 17, 2019

@mbobra I've added the DOI as a badge at the top of the README.rst. Is that what you mean?


mbobra commented Jul 17, 2019

@whedon set 10.5281/zenodo.3340125 as archive


whedon commented Jul 17, 2019

OK. 10.5281/zenodo.3340125 is the archive.


mbobra commented Jul 17, 2019

@whedon set 1.0.0 as version


whedon commented Jul 17, 2019

OK. 1.0.0 is the version.


mbobra commented Jul 17, 2019

@whedon check references


whedon commented Jul 17, 2019

Attempting to check references...


whedon commented Jul 17, 2019


OK DOIs

- 10.1109/ICDM.2016.0179 is OK
- 10.1109/ICDM.2016.0085 is OK
- 10.1109/ICDM.2017.66 is OK
- 10.1109/ICDM.2017.79 is OK

MISSING DOIs

- None

INVALID DOIs

- None


mbobra commented Jul 17, 2019

@whedon generate pdf


whedon commented Jul 17, 2019

Attempting PDF compilation. Reticulating splines etc...


whedon commented Jul 17, 2019


mbobra commented Jul 17, 2019

@openjournals/joss-eics This paper is ready for acceptance! Nice work @seanlaw 🎉


seanlaw commented Jul 18, 2019

Thanks @mbobra, @hooman650, and @ejolly! This was a wonderful and pleasant submission experience. Hopefully, I will run into you at a conference one day!

@danielskatz

@whedon accept


whedon commented Jul 18, 2019

Attempting dry run of processing paper acceptance...


whedon commented Jul 18, 2019

Check final proof 👉 openjournals/joss-papers#842

If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#842, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true


seanlaw commented Jul 18, 2019

@danielskatz The final proof looks good. Is there anything else that I need to do or is whedon’s command for you to handle? I just want to make sure I am not holding up the process.

Sent with GitHawk

@danielskatz

It's fine - between being on a plane where I couldn't see the final PDF to check it, and then driving and sleeping, I've just gotten back to it :)

@danielskatz

Thanks to @ejolly & @hooman650 for reviewing and to @mbobra for editing

@danielskatz

@whedon accept deposit=true


whedon commented Jul 18, 2019

Doing it live! Attempting automated processing of paper acceptance...


whedon commented Jul 18, 2019

🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦


whedon commented Jul 18, 2019

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.01504 joss-papers#843
  2. Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.01504
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...


whedon commented Jul 18, 2019

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](http://joss.theoj.org/papers/10.21105/joss.01504/status.svg)](https://doi.org/10.21105/joss.01504)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.01504">
  <img src="http://joss.theoj.org/papers/10.21105/joss.01504/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: http://joss.theoj.org/papers/10.21105/joss.01504/status.svg
   :target: https://doi.org/10.21105/joss.01504

This is how it will look in your documentation:

[DOI badge]

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the following:


seanlaw commented Aug 16, 2019

@danielskatz I just spotted a minor typo in the PDF. Is there some way that I can fix it?


seanlaw commented Aug 16, 2019

@whedon generate pdf


whedon commented Aug 16, 2019

Attempting PDF compilation. Reticulating splines etc...


seanlaw commented Aug 24, 2019

@arfon I have fixed the typo in the original source repository. Can you please take a look?


arfon commented Aug 24, 2019

@arfon I have fixed the typo in the original source repository. Can you please take a look?

Done. It could take a few hours to show up as fixed on the JOSS site as there's caching in place for the PDFs.


seanlaw commented Aug 24, 2019

Thanks, @arfon! It looks good now

@whedon added the "published" and "recommend-accept" labels on Mar 2, 2020