New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW]: STUMPY: A Powerful and Scalable Python Library for Time Series Data Mining #1504
Comments
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @ejolly, @hooman650 it looks like you're currently assigned to review this paper 🎉. ⭐ Important ⭐ If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿 To fix this do the following two things:
For a list of things I can do to help you, just type:
|
|
@ejolly @hooman650 Thank you for agreeing to review this submission! Whedon generated a checklist and linked a reviewer guide above -- let me know if you have any questions. |
👋 @ejolly @hooman650 How is it going? Would you like more time to review? Do you have any questions? Please let me know! |
Sorry @mbobra, I’ve been traveling but I can have this done by the end of this week if that’s ok. |
Ok, I have done a preliminary review of STUMPY. Summary of work from my perspective: Stumpy simply computes the euclidean distance between a segment with length window to a sequence of data. While a simple operation, it requires a high computational time and space. Stumpy builds upon the ideas published in several papers that employ FFT and algebra to improve the computational time of such process. The space complexity is handled by simply storing the smallest value for each comparison. In general, the work done sounds interesting and can be handy to find patterns in large time-series. Comments:
I copy the exception :
Of course the arrays are of size 4 and the index is asking for 10. Please fix.
Overview: In general, I feel like the author has done a good job and STUMPY can be a good contribution to the time-series analysis tool-chain. The documentation look good but more work should be done to make sure all the examples run smoothly and correctly. |
@hooman650 Thank you for your thorough review. Regarding 2:
Indeed, this was a typo and we have submitted an issue and fixed it accordingly in both our README and ReadTheDocs. Thank you for pointing this out! Regarding 3:
Unfortunately, I am not the original author of the GPU-STOMP publication/code and the numbers shown were simply extracted from their published paper for comparison. Currently, I only have access to CPUs but it is a top priority in our project roadmap to port this work over to GPUs. We are in the process of looking for assistance and resources from folks at NVIDIA. The initial goal of our scalable CPU implementation was to allow non-tech-savvy scientists to be able to get up and running quickly without needing access to any specialized hardware. We believe that we have achieved this goal. Regarding 4:
This is an excellent question and has been discussed by the original authors (not me) in Section D of their paper matrix profile II. In summary, the window size is certainly a user input that requires some level of domain expertise. However, the original authors have demonstrated that the matrix profile is robust to varying window sizes and that being "in the ballpark" is often enough to find motifs. Regarding 5:
We currently provide a version number in the standard |
👋 @seanlaw Github is showing that there aren't any releases in the repository. Could you please go ahead and create a release? Could you also include your responses to points 3 and 4 above in the text of the paper along with the appropriate references? |
@mbobra Thank you for pointing me to the helpful resources. I've gone back and tagged the commit that coincides with the first upload to PyPI (May 3rd) as v1.0.0. Let me know if that is sufficient. Regarding points 3 and 4, both references (Matrix Profile I and Matrix Profile II) were already included in the original article proof (note that the references that I mention above are just the preprints that are available directly on the original author's group website and the references in the original article proof are referencing the published IEEE manuscripts). |
Hi @mbobra sorry for the delay on my review! First of all, I'd like to note that Stumpy is a great addition to the time-series analyst's toolkit and is very well-documented, explained, and referenced. I also rather enjoyed the talk. Really nice @seanlaw! My testing was done using Python 3.6 on Mac OS 10.14.2 and everything installed without issues. I updated my installation given the most recent changes made as per @hooman650's review and can attest that fix for comment 2 now works. I was unable to test out the performance claims given limited current access to a distributed compute system at this time, but I was able to at least test the functionality by running a local dask server without issues. Just a few minor suggestions:
With those minor changes I think this would make a great addition to JOSS. |
@ejolly Thank you for your constructive and useful feedback. Please see my responses below: Regarding 1:
This is an excellent suggestion and we've filed/fixed/closed the this issue as per your recommendation. For completeness, we have also provided interactive Binder notebooks in addition to the pre-rendered notebooks so that the user can "try before installing". Regarding 2:
We completely agree and it is one of our older issues dating back to May 18th that we are hoping to get some help on identifying a good example dataset for and writing up a more complete tutorial like Tutorial 1. Currently, the tutorial only demonstrates the time series chains API and we'd really like to provide some more intuitive insight with a better data set than the current Taxi data set. In all fairness, the goal of the STUMPY software is to faithfully implement the algorithms based on the published papers (not written by us) and so we strongly recommend that the user read the papers (clearly referenced) as the papers can provide far more detail and insight than STUMPY can. One needs to keep in mind that without STUMPY, there is really no scalable, performant, and easy to install implementation for computing the matrix profile and so our current focus is to provide a suite of tools based on the published papers and to save the user the time and headache from having to implement the published papers (which are not without errors and missing important implementation details). Eventually, once we've created a community/user base and developed most of the published features then we will certainly spend more time improving the tutorials. It's probably important to point out that STUMPY was created and currently maintained by a single person (me) and, for better or for worse, it is mostly done on my personal time and, without additional assistance, one person can only do so much. While I completely and wholeheartedly agree that the tutorials could be better (and they will be once the feature set stabilizes), I would respectfully argue that the JOSS requirements make no mention of tutorials and so they are a "nice to have" but should not be used as a criteria to judge the completeness of the software. From an API documentation, unit testing/code coverage, installation instructions, and example usage standpoint, we humbly believe that this open source software meets the JOSS requirements. Regarding 3:
This is good feedback. We've filed/fixed/closed the following new issue and added a clearer link in the opening paragraph of the README. |
@seanlaw the binder addition is a great one and I think will be great for new users. Regarding my point 2: I apologize as I should have been more clear. I completely agree with your response that publication in JOSS should be not contingent on you adding a more comprehensive tutorial 2 and my comment was more of a suggestion for something that would improve the tutorials, i.e. would be "nice to have". From my review, you have already done a fantastic job of documenting, testing, and providing the requisite high-level explanation of the package functionality as per JOSS requirements. I completely understand that creating and maintaining a solo project is a huge demand on your time and it will be great to see how tutorials and functionality grow with the community base! |
@ejolly No need to apologize as I assumed no ill intent. Thank you (as well as to @hooman650 and @mbobra) for taking the time to review! I really appreciate the valuable feedback. |
@hooman650 and @ejolly Thank you so much for reviewing! We really appreciate your time and effort ☀️ @seanlaw We're almost there! Can you please archive your release on Zenodo to obtain a DOI and then put that in your README.rst file? After that I think we're done 🎉 |
@mbobra I've added the DOI as a badge a the top of the README.rst. Is that what you mean? |
@whedon set 10.5281/zenodo.3340125 as archive |
OK. 10.5281/zenodo.3340125 is the archive. |
@whedon set 1.0.0 as version |
OK. 1.0.0 is the version. |
@whedon check references |
|
|
@whedon generate pdf |
|
@openjournals/joss-eics This paper is ready for acceptance! Nice work @seanlaw 🎉 |
Thanks @mbobra, @hooman650, and @ejolly! This was a wonderful and pleasant submission experience. Hopefully, I will run into you at a conference one day! |
@whedon accept |
|
Check final proof 👉 openjournals/joss-papers#842 If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#842, then you can now move forward with accepting the submission by compiling again with the flag
|
@danielskatz The final proof looks good. Is there anything else that I need to do or is whedon’s command for you to handle? I just want to make sure I am not holding up the process. Sent with GitHawk |
It's fine - between being on a plane where I couldn't see the final PDF to check it, and then driving and sleeping, I've just gotten back to it :) |
Thanks to @ejolly & @hooman650 for reviewing and to @mbobra for editing |
@whedon accept deposit=true |
|
🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦 |
🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨 Here's what you must now do:
Any issues? notify your editorial technical team... |
🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉 If you would like to include a link to your paper from your README use the following code snippets:
This is how it will look in your documentation: We need your help! Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the the following:
|
@danielskatz I just spotted a minor typo in the PDF. Is there some way that I can fix it? |
@whedon generate pdf |
|
@arfon I have fixed the typo in the original source repository. Can you please take a look? |
Done. It could take a few hours to show up as fixed on the JOSS site as there's caching in place for the PDFs. |
Thanks, @arfon! It looks good now |
Submitting author: @seanmylaw (Sean Law)
Repository: https://github.com/TDAmeritrade/stumpy
Version: 1.0.0
Editor: @mbobra
Reviewer: @ejolly, @hooman650
Archive: 10.5281/zenodo.3340125
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@ejolly & @hooman650, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mbobra know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @ejolly
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
paper.md
file include a list of authors with their affiliations?Review checklist for @hooman650
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
paper.md
file include a list of authors with their affiliations?The text was updated successfully, but these errors were encountered: