[WIP] Add tutorial for MPdist #433

asifmallik · 2021-07-10T00:04:32Z

This completes #290

Just noticed that I incorrectly capitalized the d in MPdist, so I need to correct that. Another thing to decide is whether to include hierarchical clustering for Euclidean distance. I have not been able to replicate the figure from the paper so far. I am currently finalizing a notebook that showcases multiple attempts to replicate the figure and will post it in Issues.

Pull Request Checklist

Skipped ./test.sh because this is a documentation-only pull request

review-notebook-app · 2021-07-10T00:04:36Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

alvii147 · 2021-07-11T21:15:23Z

docs/Tutorial_MPDist.ipynb

@@ -0,0 +1,309 @@
+{


Consider replacing the for loop with this:
data = loadmat('MaryBethAnhLisa_data.mat')['XSsY4']
dfs = {data[1][i][0]: pd.DataFrame(data[0][i].flatten()) for i in range(data[0].shape[0])}

I think the for-loop is easier to read

Yea I agree, the dictionary comprehension is far from readable. Maybe I was just challenging myself to write a one liner.....

In any case, the final version should look like neither since we'll upload the datasets to Zenodo. Maybe you could do the data wrangling locally and THEN upload the cleaned dataset to Zenodo, so that on the final tutorial, it's just a matter of indexing a single dimension.

Yeah, cleaning it first sounds like a good idea, I will fix this later

Is uploading to Zenodo something I can do for this? @seanlaw

@asifmallik I can help you with that. I'm thinking that we'll just create 6 separate files (one for each name)?

seanlaw · 2021-07-12T13:58:10Z

@asifmallik So far, so good! Good work. I like where this is going

codecov-commenter · 2021-07-12T14:16:57Z

Codecov Report

Merging #433 (96fe7d5) into main (66b3402) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##              main      #433   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           35        35           
  Lines         2766      2781   +15     
=========================================
+ Hits          2766      2781   +15

Impacted Files	Coverage Δ
stumpy/core.py	`100.00% <0.00%> (ø)`
stumpy/motifs.py	`100.00% <0.00%> (ø)`
stumpy/aamp_motifs.py	`100.00% <0.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 66b3402...96fe7d5.

asifmallik · 2021-07-30T03:33:17Z

@seanlaw @alvii147 Made some changes and incorporated some of your suggestions and corrections. Still have a bit left though

seanlaw · 2021-07-30T12:50:13Z

@asifmallik No problem! I value quality over quantity and what you've done so far is really taking shape

asifmallik · 2021-08-01T12:23:56Z

@seanlaw ready for a more thorough review

asifmallik · 2021-08-01T12:29:33Z

docs/Tutorial_MPDist.ipynb

@@ -0,0 +1,380 @@
+{


*of their subsequences

Instead of "determining whether most" maybe it should be "determining whether a limited subset of subsequences - as parameterized by a threshold - are similar"

For instance, if two time series is made up of the same repeating subsequences of window length m, then they would MPdist of 0 (if window size for MPdist is set to m), even if they are phase shifted. On the other hand, the Euclidean distance would be non-zero as long as they are phase shifted.
I find this sentence to be very confusing/abstract as it is not anchored to anything and expects the user to already have some prior knowledge

Consider how the original paper tries to describe this and paraphrase from there

For instance, if two time series are made up of the same repeating subsequences of window length m, then they would have an MPdist of 0...

I agree it's a bit difficult to follow this definition, although it's tough to come up with an alternative phrasing. Maybe something like: For instance, for two time series that are made up of the same periodic subsequences but are phase shifted, the MPdist would be zero, while the Euclidean distance would be non-zero.

I understand the significance of m, but maybe in this exact definition you can remove the mention of it? Idk, it seems cleaner for the sake of explaining.
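The phase-shift claim discussed in this thread can be checked with a small brute-force sketch. Note that `naive_mpdist` is a hypothetical helper written for illustration, not STUMPY's implementation: it assumes z-normalized Euclidean distances between subsequences and returns the k-th smallest nearest-neighbor distance, with k taken as 5% of the combined length of the two series. The sine waves, the shift of 5 samples, and the window size of 20 are made-up values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def naive_mpdist(T_A, T_B, m, percentage=0.05):
    """Brute-force MPdist-style distance (illustrative sketch only)."""
    # All subsequences of length m, each z-normalized
    # (assumes no subsequence is constant, so the std is never zero)
    A = sliding_window_view(T_A, m)
    B = sliding_window_view(T_B, m)
    A = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)
    B = (B - B.mean(axis=1, keepdims=True)) / B.std(axis=1, keepdims=True)

    # Pairwise distances between every subsequence of A and every one of B
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

    # Nearest-neighbor distances in both directions, pooled and sorted
    P_ABBA = np.sort(np.concatenate([D.min(axis=1), D.min(axis=0)]))

    # Take the k-th smallest value instead of the maximum
    k = min(int(percentage * (len(T_A) + len(T_B))), len(P_ABBA) - 1)
    return P_ABBA[k]


t = np.arange(200)
T_A = np.sin(2 * np.pi * t / 20)        # periodic signal, period 20
T_B = np.sin(2 * np.pi * (t + 5) / 20)  # same signal, shifted by 5 samples
m = 20

print("MPdist-style distance:", naive_mpdist(T_A, T_B, m))
print("Euclidean distance:   ", np.linalg.norm(T_A - T_B))
```

Under these assumptions every subsequence in one series has an essentially identical counterpart in the other, so the pooled nearest-neighbor distances are all near zero, while the point-by-point Euclidean distance is clearly not.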

seanlaw · 2021-08-01T14:22:03Z

@seanlaw ready for a more thorough review

@asifmallik I will try to find some time to review it. Thank you

seanlaw · 2021-08-03T01:32:22Z

docs/Tutorial_MPDist.ipynb

@@ -0,0 +1,380 @@
+{


Line #4. t_1 = base[:50]

Please use T_A and T_B to refer to time series A and time series B

seanlaw · 2021-08-03T01:32:22Z

docs/Tutorial_MPDist.ipynb

@@ -0,0 +1,380 @@
+{


Line #7. plt.style.use(mplstyle_url)
Please remove the above line

seanlaw · 2021-08-03T01:32:22Z

docs/Tutorial_MPDist.ipynb

@@ -0,0 +1,380 @@
+{


I wonder if it would make more sense to have the long name first, followed by the short parts

Yes, I think that's a good idea too, changing the ordering

seanlaw · 2021-08-03T01:32:22Z

docs/Tutorial_MPDist.ipynb

@@ -0,0 +1,380 @@
+{


It feels like you are saying a lot here but you aren't showing the evidence for it until later. It is okay to make a statement and show the evidence inline.

seanlaw · 2021-08-03T01:32:23Z

docs/Tutorial_MPDist.ipynb

@@ -0,0 +1,380 @@
+{


What do the colors in the dendrogram mean?

Reading the docs, I am not quite sure what the colors mean. It seems that a link is colored depending on whether the distance between its child cluster nodes exceeds a certain threshold (the default is 70% of the max distance). This doesn't seem particularly relevant for this case, so I can just set link_color_func to a function that returns the same color for every cluster.
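As a rough sketch of the `link_color_func` idea, the snippet below compares SciPy's default coloring with a function that returns one fixed color for every link. The random 2-D points, the `average` linkage method, and the color `"C0"` are placeholder assumptions and are not taken from the tutorial; `no_plot=True` is used so that only the computed colors are inspected.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Placeholder data: six random 2-D points standing in for six time series
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
Z = linkage(points, method="average")

# Default behavior: link colors depend on color_threshold
default = dendrogram(Z, no_plot=True)

# Same dendrogram, but every link is given the same color
uniform = dendrogram(Z, no_plot=True, link_color_func=lambda link_id: "C0")

print("default colors:", set(default["color_list"]))
print("uniform colors:", set(uniform["color_list"]))
```

Dropping `no_plot=True` would draw the dendrogram with matplotlib using the same colors.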

seanlaw · 2021-08-03T01:36:11Z

@asifmallik I really like the story with both the Euclidean distance vs MPdist. It "feels" complete.

I was wondering if it would be possible to start with a simple example that only focuses on using MPdist to compare just two time series first. This way the focus is purely on MPdist and its output. After we've gone through MPdist, we can talk about the name example. Otherwise, the dendrogram work might overshadow the point of this tutorial, which is to learn about MPdist, why it is useful, and how it works.

@alvii147 I am curious as to your thoughts here as well!

asifmallik · 2021-08-10T04:40:17Z

@asifmallik I really like the story with both the Euclidean distance vs MPdist. It "feels" complete.

I was wondering if it would be possible to start with a simple example that only focuses on using MPdist to compare just two time series first. This way the focus is purely on MPdist and its output. After we've gone through MPdist, we can talk about the name example. Otherwise, the dendrogram work might overshadow the point of this tutorial, which is to learn about MPdist, why it is useful, and how it works.

Do you mean I should replace the current introduction which starts with two randomly generated time series with the time series for two of the names instead?

seanlaw · 2021-08-10T07:10:03Z

Do you mean I should replace the current introduction which starts with two randomly generated time series with the time series for two of the names instead?

I would like to replace the random data example and use the data from Figure 1 (and maybe Figure 2) and also motivate and explain the pitfalls of other measures/methods by following what the authors covered/discussed in the introduction.

alvii147 · 2021-08-13T15:20:15Z

docs/Tutorial_MPDist.ipynb

@@ -0,0 +1,380 @@
+{


It would look nicer if the repeated subsequences were highlighted, or more visible somehow.

Try this (or something similar):

from matplotlib.patches import Rectangle

rect = Rectangle((0, 0), 20, 50, facecolor='lightgrey')
axs[0].add_patch(rect)
axs[0].plot(np.arange(20), t_1[:20], color='aquamarine', linewidth=10, alpha=0.5)

rect = Rectangle((5, 0), 20, 50, facecolor='lightgrey')
axs[1].add_patch(rect)
axs[1].plot(np.arange(5, 25), t_2[5:25], color='aquamarine', linewidth=10, alpha=0.5)

seanlaw · 2022-02-06T10:34:58Z

@asifmallik Any updates on this?

Add tutorial for MPDist

cf41468

asifmallik changed the title ~~[WIP] Add tutorial for MPDist~~ [WIP] Add tutorial for MPdist Jul 10, 2021

alvii147 reviewed Jul 11, 2021

seanlaw reviewed Jul 12, 2021

asifmallik added 2 commits July 29, 2021 22:22

Merge branch 'main' into tutorial_mpdist

49ee847

Various fixes and minor changes:

b7b2409

- Correct spelling and capitalization errors
- Make explanations better and less ambiguous
- More succinct code
- Variable renaming

seanlaw reviewed Jul 31, 2021

Various improvements and fixes:

618d5d4

- Add dendrogram for Euclidean
- Add explanation for what we expect in cluster
- Add explanation for difference in Euclidean and MPdist result
- Remove mention of error in paper
- Improve code quality
- Align different names
- Other minor fixes

asifmallik changed the title ~~[WIP] Add tutorial for MPdist~~ Add tutorial for MPdist Aug 1, 2021

- Remove pandas and scipy.signal dependency and replace mention of pandas with numpy
- Grammar fixes
- Complete explanation for MPdist

ff2e309

asifmallik commented Aug 1, 2021

seanlaw reviewed Aug 3, 2021

alvii147 reviewed Aug 13, 2021

asifmallik added 3 commits September 1, 2021 10:47

Rename tutorial file to be consistent with paper

08292d7

Configure matplotlib styling only once at the top

5e3846e

Minor fixes

96fe7d5

seanlaw changed the title ~~Add tutorial for MPdist~~ [WIP] Add tutorial for MPdist Sep 22, 2021

seanlaw merged commit 3e991ac into stumpy-dev:main Mar 17, 2022
