Skip to content

Conversation

@NimaSarajpoor
Copy link
Collaborator

Corresponding to Issue #374

Pull Request Checklist

Below is a simple checklist but please do not hesitate to ask for assistance!

  • Fork, clone, and checkout the newest version of the code
  • Create a new branch
  • Make necessary code changes
  • Install black (i.e., python -m pip install black or conda install -c conda-forge black)
  • Install flake8 (i.e., python -m pip install flake8 or conda install -c conda-forge flake8)
  • Install pytest-cov (i.e., python -m pip install pytest-cov or conda install -c conda-forge pytest-cov)
  • Run black . in the root stumpy directory
  • Run flake8 . in the root stumpy directory
  • Run ./setup.sh && ./test.sh in the root stumpy directory
  • Reference a Github issue (and create one if one doesn't already exist)

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw:
I didn't do ./setup.sh && ./test.sh in the root stumpy directory as it takes a long time on my PC and I remembered you said it is not necessary when doing Tutorial. Right?

And, as you suggested, I tried to submit the first PR on this issue that includes only small changes. I will consider that 50 lines rule of thumb for subsequent pushes.

@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2021

Codecov Report

Merging #397 (109efe0) into main (f6d0a37) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main      #397   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           34        35    +1     
  Lines         2713      2719    +6     
=========================================
+ Hits          2713      2719    +6     
Impacted Files Coverage Δ
stumpy/__init__.py 100.00% <100.00%> (ø)
stumpy/aamp.py 100.00% <100.00%> (ø)
stumpy/aamp_motifs.py 100.00% <100.00%> (ø)
stumpy/aampdist_snippets.py 100.00% <100.00%> (ø)
stumpy/aamped.py 100.00% <100.00%> (ø)
stumpy/aampi.py 100.00% <100.00%> (ø)
stumpy/core.py 100.00% <100.00%> (ø)
stumpy/floss.py 100.00% <100.00%> (ø)
stumpy/gpu_stimp.py 100.00% <100.00%> (ø)
stumpy/maamped.py 100.00% <100.00%> (ø)
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f6d0a37...109efe0. Read the comment docs.

@@ -1,5 +1,29 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ninimama I think this is pretty good and here are a couple of comments:

  • In your opening sentence, it is claimed that it is a "common question" but it would be nice to provide some (obvious) real-world examples if possible
  • I like the point that you are trying to make about snippets vs clustering but the clustering point could be a bit clearer
  • "Although a clustering" should be "Although clustering"
  • Since we acknowledge the original papers in the first sentence, there is no need to repeat things like "Authors in Matrix Profile XIII". We make no claims of these ideas being our own original ideas
  • So, I don't think the final tutorial will go through the creation of the toy data. We will just create it in Zenodo.ipynb and then download the final data to analyze

Reply via ReviewNB

@seanlaw
Copy link
Contributor

seanlaw commented Jun 3, 2021

I didn't do ./setup.sh && ./test.sh in the root stumpy directory as it takes a long time on my PC and I remembered you said it is not necessary when doing Tutorial. Right?

Yes, that should be fine

And, as you suggested, I tried to submit the first PR on this issue that includes only small changes. I will consider that 50 lines rule of thumb for subsequent pushes.

Sounds good! Thanks

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw

  • So, I don't think the final tutorial will go through the creation of the toy data. We will just create it in Zenodo.ipynb and then download the final data to analyze

In my recent commit, I keep the toy data for now and just add textual context and improve the current flow/figures. I tried to follow the same structure used in the other tutorials.

I can take care of retrieving the data from Zenodo in the future commits (If you have any suggestion regarding this matter, please let me know)

================================================

I have one question that has been bothering me:

Sometimes I try to limit the changes into one particular area to avoid the term "and" in the commit message. In other words, I would like to consider each commit for one particular issue. So, it would be easier for the reviewer (you) to go through the changes in the commit. However, in some occasions, there are only a few changes that need to be done for a particular issue. For instance, adding the heading to the notebook is probably less than 10 lines. Or, modifying figures (and change their orders) is again less than 10 lines. This can easily happen in different scenarios, when I try to improve different parts of the Tutorial.

Is that okay to do one git commit for all those small changes? (This is what I did in my recent commit that I pushed)

OR: Can I do three small commits (each corresponds to small changes). Then, after a few commits, I can do git push. Since I don't know how the review process looks like on your end, I have no idea if the latter approach makes sense or not.

@seanlaw
Copy link
Contributor

seanlaw commented Jun 4, 2021

@ninimama I can handle the Zenodo upload part. Let's just keep it as is for now

Regarding commits, I wouldn't worry so much about tutorials. I think you can bundle a few things into a single commit for a notebook and then let me know in this comments section if there is anything that I need to know.

I can tell you that I am not looking at every single commit. In fact, I'm only looking at the latest version via the "ReviewNB" purple button above.

@@ -1,8 +1,51 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need a header called "Snippets".


Reply via ReviewNB

@@ -1,8 +1,51 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • You said "common question" but you've only presented a statement.
  • Please remove "large" in "large time series" and then combine that sentence with the one following it
  • Please replace "his sleep" with "their sleep"

This paragraph seems somewhat forced and out of place:

For instance, the respiration data of a person recorded during his sleep can be reported by showing only the top-2 snippets. As an another example, snippets can be used to find different stages in the walking-nordic walking-skipping-running cycle of a person. Particularly, the y-axis acceleration data recorded by a chest-worn sensor can be summarized into top-4 snippets where each snippet shows a different type of exercise.

This may be something that we'll need to think through and provide proper motivation without getting into too much unnecessary detail.

I think that the It is worthwhile to note what snippet is NOT part should not be in the starting section and should be its own section header

This sentence isn't so clear:

one should specify the beginning and ending of subsequences and then apply clustering on the set of subseuqences

Reply via ReviewNB

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw

This may be something that we'll need to think through and provide proper motivation without getting into too much unnecessary detail.

I am still thinking about it (I put "?" in the corresponding part in the notebook.

===================================================

Just to confirm: Would you like to reproduce the result of another data (other than the toy data currently provided in the notebook)?

@seanlaw
Copy link
Contributor

seanlaw commented Jun 7, 2021

Just to confirm: Would you like to reproduce the result of another data (other than the toy data currently provided in the notebook)?

I like the toy data as it is simple/easy to understand and it helps to lay the foundation for explaining things. However, I am open to adding a slightly more interesting example from the paper (if it exists) AFTER the toy data set. Of course, this runs the risk of making the tutorial too long/redundant. Unfortunately, it's one of those things that we'll need to decide whether it makes the tutorial "better"/"worse" only after it is added. How does that sound?

@NimaSarajpoor
Copy link
Collaborator Author

. Of course, this runs the risk of making the tutorial too long/redundant.

Exactly! In fact, I, as a reader, prefer to understand the basics quickly and easily, and then go and explore my own data.

==================================================================

The reason I asked was that I misinterpreted one of our previous conversations and I thought we should add a real data set. So, for now, I can try to polish the tutorial and see if I can find a good example of snippets application.

@@ -1,8 +1,84 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added a Matplotlib style file so the following lines:

plt.rcParams["figure.figsize"] = [20, 6]  # width, height
plt.rcParams['xtick.direction'] = 'out'

can be replaced with:

plt.style.use('stumpy.mplstyle')

Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw
I have been busy with my research lately. Will take care of it and the other things soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been busy with my research lately. Will take care of it and the other things soon.

@ninimama There is absolutely no rush. Quality over quantity! Please take your time

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw

According to the github guide, I did git fetch upstream to synch my fork (on github) with the origin repo but it shows the following error:

fatal: couldn't find remote ref refs/heads/master

According to a stackoverflow question, I checked out the .git/config file (shown below) but things sound okay to me:

image

I also tried git fetch and git fetch origin. But, none of them works.

Could you please help me resolve this issue? (So, I can import mplstyle in my local repo)

@seanlaw
Copy link
Contributor

seanlaw commented Jun 18, 2021

@ninimama I'm not entirely sure but it may be due to the fact that refs/heads/master does not exist in the TDAmeritrade/stumpy.git repo since we actually use refs/heads/main (note the difference is master vs main). I recommend following this documentation:

https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw
I did follow that before. In fact, git remote -v shows the upstreams. But, I just removed them and added them again.

And, now it works!
Thanks

@seanlaw
Copy link
Contributor

seanlaw commented Jun 19, 2021

I did follow that before. In fact, git remote -v shows the upstreams. But, I just removed them and added them again.

After asking around, I also learned that there is a button in your fork:

E4M-4IYXoAwDUUA

@seanlaw
Copy link
Contributor

seanlaw commented Jun 19, 2021

@ninimama Did you also happen to follow step 5 in https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/syncing-a-fork? It is one step to sync your fork with the TDAmeritrade/stumpy.git repo but it is another step to sync your branch with your fork's main.

I think this important process is missing from our contributor's guide. I've added a new issue #414

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Jun 19, 2021

@seanlaw

I think I did. And, if I remember correctly, the stumpy.mplstyle appears in the main branch (the default local branch on my laptop); however, the same file doesn't appear in the branch Snippet_Tutorial.

So, I did git switch Snippet_Tutorial && git rebase main and now it is okay.

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw
I checked out @alvii147 PR and I think he already updated the contributing doc.

As I said, I should have first checked out the Development branch first and then do the merge.

NOTE:
I am not sure if git rebase is better than git merge or not.

@seanlaw
Copy link
Contributor

seanlaw commented Jun 19, 2021

@ninimama I believe that both will work. However, git merge main is probably preferred as it preserves history. I came across this article that explains it quite well:

https://www.atlassian.com/git/tutorials/merging-vs-rebasing

Having said that, please don't be worried about doing it "perfectly". I am not familiar with this part of the process either and I am learning a lot about the challenges that a contributor is facing and this experience will help us make better documentation so that future contributors can benefit! So, you are doing us a huge favor by sharing your process. Thank you!

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Jul 8, 2021

Should I now think of writing the unit tests for these modules?

Yes. I think you should start with writing the unit test for _get_mask_slices first. If you look inside tests/test_core.py you'll see tests for each function from core.py. I would like to see a naive implementation of _get_mask_slices using simple Python `for-loops

I finished the unit tests. But before committing and pushing it, I would like to know if you have any particular definition of the concept of naive code.

Below is my implementation:

def naive_get_mask_slices(mask):
    if np.sum(mask)==0:
      return np.empty(shape=(0,2), dtype="int64")

    else:
        slices_list = []
        prev_val = False
        
        for i in range(len(mask)):
            if mask[i] and ~prev_val and (i < len(mask)-1): 
                #finding the start index of a new slice
                new_slice = [i]
                prev_val = mask[i]
            elif mask[i] and ~prev_val and (i ==len(mask)-1): 
                #finding the start index of a new slice, mask is exhausted
                slices_list.append([i,i+1])
            elif ~mask[i] and prev_val:
                #finding the end index of the current slice
                new_slice.append(i)
                slices_list.append(new_slice)
                new_slice = []
                prev_val = mask[i]
            elif (mask[i] and prev_val and (i==len(mask)-1)):
                #finding the end index of the current slice, mask is exhausted
                new_slice.append(i+1)
                slices_list.append(new_slice)
            else:
                continue

I also wrote a test function to compare it with the output of _get_mask_slices. And, it works

===================================================

However, after a quick search, I found a piece of code that is used in more_itertools package. And, I just used it for our purpose as follows:

bool_arr = np.array([True, True, True, False, False, True, True, False, False, True])

def consecutive_groups(iterable, ordering=lambda x: x):
  from itertools import groupby
  from operator import itemgetter
  for k, g in groupby(
    enumerate(iterable), key=lambda x: x[0] - ordering(x[1])
    ):
    yield map(itemgetter(1), g)
  
all_slices=[]
for group in consecutive_groups(np.where(bool_arr)[0]):
    lst = list(group)
    all_slices.append([lst[0], lst[-1]+1])

np.array(all_slices)

# output:
array([[ 0,  3],
       [ 5,  7],
       [ 9, 10]])

And, both itertools and operator are python standard libraries. Do you think the latter piece of code is better (for the purpose of the unit test)?

@seanlaw
Copy link
Contributor

seanlaw commented Jul 8, 2021

However, after a quick search, I found a piece of code that is used in more_itertools package. And, I just used it for our purpose as follows:

So, we strongly prefer not adding any additional dependencies for testing as it affects our ability to maintain it long term. Self contained code is always preferred. I think your function is fine though the many elif statements seems to be a red-flag to me. I'm guessing that there might be a way to simplify this.

@seanlaw
Copy link
Contributor

seanlaw commented Jul 8, 2021

Can you check if something like this might suffice?

def naive_get_mask_slices(mask):
    tmp = np.r_[0, mask]
    idx = []
    for i, val in enumerate(np.diff(tmp)):
        if val == 1:
            idx.append(i)
        if val == -1:
            idx.append(i)

    if tmp[-1]:
        idx.append(len(mask))

    return np.array(idx).reshape(len(idx)//2, 2)

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Jul 8, 2021

@seanlaw

Can you check if something like this might suffice?

your implementation of the function naive_get_mask_slices worked !

I haven't tested this yet so it may require some adjustment

Regarding the snippets_regimes, I just did a minor adjustment to correct the output.

=================================

You can check out the recently-pushed commits.

stumpy/core.py Outdated
Parameters
----------
mask: ndarray
A boolean 1D array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation looks off here

stumpy/core.py Outdated
Returns
-------
slices: ndarray
slices of indices where the mask is True. Each slice has a size of two:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation looks off here

npt.assert_array_equal(left, right)


def naive_get_mask_slices(mask):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this naive implementation to the top of the test file. See other test files to see the location of the naive implementation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this naive implementation to the top of the test file. See other test files to see the location of the naive implementation

I moved it to the TOP of the test_core.py file.
FYI: Just in case you missed it, NOT all naive implementations are at the TOP. there are some test functions between the naive implementations.

Copy link
Contributor

@seanlaw seanlaw Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, would you mind creating a new issue to track this and then we can handle it in a separate issue from this one? I think it's best when the naive implementations are at the top of the file and then followed by the tests below it



def test_get_mask_slices():
bool_lst = [False, True]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw
I will take a look at the test_snippets.py and test_aampdist_snippets.py to include the test for the snippets_regime.

I noticed you used the naive implementation of mpdist in the unit test of snippets. I was wondering if I should move the function naive_get_mask_slices to the naive.py file and reuse it again later in the unit test of snippets modules.

@seanlaw
Copy link
Contributor

seanlaw commented Jul 9, 2021

I will take a look at the test_snippets.py and test_aampdist_snippets.py to include the test for the snippets_regime.

I noticed you used the naive implementation of mpdist in the unit test of snippets. I was wondering if I should move the function naive_get_mask_slices to the naive.py file and reuse it again later in the unit test of snippets modules.

Yes, if you plan to use naive_get_mask_slices in more than one test file then please move naive_get_mask_slices into naive.py. naive.py is there to help us avoid writing the same piece of code in multiple places.i is

@seanlaw
Copy link
Contributor

seanlaw commented Jul 14, 2021

@ninimama Nit trying to rush you as I'm sure you are busy but I just wanted to check in in case you need something from me. Please let me know how I can help

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw

@ninimama Nit trying to rush you as I'm sure you are busy but I just wanted to check in in case you need something from me. Please let me know how I can help

Thanks for checking in. I took a look but then I got a little busy doing my research. I will try to finish this at the end of this week.

@seanlaw seanlaw mentioned this pull request Jul 16, 2021
@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Jul 19, 2021

@seanlaw

I revised the test_snippets.py and test_aampdist_snippets.py to include regimes in the unit test. I also revise the naive.mpdist_snippets.py and naive.aampdist_snippets.py to include the regimes. However, I still get error and cannot figure it out why. I took a look at the ./test.sh file and it seems you use -rx option which should show the details when a test fails. But, this is what I get when I run ./test.sh

image

I am trying to fix this before doing the git push. If you think I should better do git push so you can take a look, please let me know.

===================================================================

UPDATE:

I ran the pytest only on test_snippets.py, test_aampdist_snippets.py , and test_core.py. All three were okay. There were just some warnings for test_core.py (please see below)

image

====================================================================

So, I don't know what causes the error.

@seanlaw
Copy link
Contributor

seanlaw commented Jul 19, 2021

Yes, please push your latest changes and I can take a look.

@seanlaw
Copy link
Contributor

seanlaw commented Jul 19, 2021

@ninimama So, it looks like the first thing that needs to be fixed is the code formatting. For STUMPY, we use the black code formatter and I can see that it has already been installed on your system as evidenced by the earlier error:

Oh no! 💥 💔 💥
1 file would be reformatted, 74 files would be left unchanged.
Error: Process completed with exit code 1.

By running cd contributions\stumpy && black ., this will tell black to automatically correct the formatting of your code if it finds any inconsistencies that it can resolve. Note that black does not change the actual logic of the code. It just makes sure that things like indentation, spacing, and line length are the same across the entire codebase but this ensures that everything looks the same aesthetically.

After you've resolved this, please commit the changes and push again.

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Jul 19, 2021

@seanlaw

we use the black code formatter and I can see that it has already been installed on your system

I thought black can automatically fix things and save them. But, after your note, I realized I have to do it and then do git add/ commit. I also noticed that I previously didn't use . in black .

==========================================================

There we go! Finally! :) Thanks for your input.

==========================================================

Just one note:

The intro part of the tutorial notebook for snippet needs to be revised. It still misses a good real-world yet easily understandable example for the use of the snippet algorithm.

@seanlaw
Copy link
Contributor

seanlaw commented Jul 20, 2021

I thought black can automatically fix things and save them. But, after your note, I realized I have to do it and then do git add/ commit. I also noticed that I previously didn't use . in black .

As mentioned in the Contributing guide, for Python code, you should run both black . and then flake8 .

The intro part of the tutorial notebook for snippet needs to be revised. It still misses a good real-world yet easily understandable example for the use of the snippet algorithm.

Do you have any suggestions for this? I think the current example is simple and easy to understand but I agree that it would be nice to have a "real" example

@NimaSarajpoor
Copy link
Collaborator Author

Do you have any suggestions for this? I think the current example is simple and easy to understand but I agree that it would be nice to have a "real" example

I, unfortunately, don't have any nice examples in mind. The authors provided several examples in the paper; however, I think it is not completely easy to understand them.

One of the examples was about the electricity demand and it is somehow understandable why the electricity demand is different in summer and winter (two snippets). Since I am an electrical engineer, I might be a little biased.

I think we can leave it as it is. In the future, some of the users might come forward and share their stories and provide a nice example for the application of this algorithm.

@seanlaw
Copy link
Contributor

seanlaw commented Jul 20, 2021

I think we can leave it as it is. In the future, some of the users might come forward and share their stories and provide a nice example for the application of this algorithm.

Sounds good. This PR is focused on adding the Introduction so we'll continue to work on this tutorial until it is ready. The unit tests look great! The naive implementation isn't "naive" enough for my liking (i.e., it is too similar to the original code) and so I'm going to adjust it after the PR is merged. Thanks again for working on this and please look out for the next release v1.9.0 coming out really soon!

I really appreciate all of your contributions @ninimama and I have enjoyed collaborating with you!

@seanlaw seanlaw merged commit 210c2ef into stumpy-dev:main Jul 20, 2021
@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Jul 20, 2021

@seanlaw
I really learned a lot. That was a really good experience and thanks for your support. Although I haven't used MP-based algorithms (and STUMPY) in my own research as I realized it might not be the proper tool for the purpose of my research, I enjoyed the algorithms and their application, and that's why I really enjoyed contributing to STUMPY package in the first place. And then, it was your support and how well you tried to keep things organized. Really Great Work!!!

the naive implementation isn't "naive" enough for my liking (i.e., it is too similar to the original code) and so I'm going to adjust it after the PR is merged

I will definitely check it out later to see the changes and learn more from you!

@NimaSarajpoor NimaSarajpoor deleted the Snippets_Tutorial branch July 20, 2021 04:11
@seanlaw
Copy link
Contributor

seanlaw commented Jul 20, 2021

@ninimama I made some changes/corrections to some of these files in this commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants