Skip to content

add isconstant param to mstump#871

Merged
seanlaw merged 31 commits intostumpy-dev:mainfrom
NimaSarajpoor:add_isconstant_mstump
Jul 11, 2023
Merged

add isconstant param to mstump#871
seanlaw merged 31 commits intostumpy-dev:mainfrom
NimaSarajpoor:add_isconstant_mstump

Conversation

@NimaSarajpoor
Copy link
Copy Markdown
Collaborator

No description provided.

@NimaSarajpoor NimaSarajpoor force-pushed the add_isconstant_mstump branch from 137d831 to ad32d18 Compare May 29, 2023 04:39
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 29, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (c9a84f9) 99.12% compared to head (767b5e8) 99.13%.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #871    +/-   ##
========================================
  Coverage   99.12%   99.13%            
========================================
  Files          83       83            
  Lines       13940    14068   +128     
========================================
+ Hits        13818    13946   +128     
  Misses        122      122            
Impacted Files Coverage Δ
stumpy/floss.py 100.00% <ø> (ø)
stumpy/mstump.py 100.00% <100.00%> (ø)
stumpy/mstumped.py 100.00% <100.00%> (ø)
tests/naive.py 100.00% <100.00%> (ø)
tests/test_mstump.py 100.00% <100.00%> (ø)
tests/test_mstumped.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Copy Markdown
Collaborator Author

@NimaSarajpoor NimaSarajpoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw
Could you please check out the only comment I left for now? Afterwards, I start with adding test function and then adding the param isconstant to the performant module.

Comment thread tests/naive.py Outdated
Copy link
Copy Markdown
Collaborator Author

@NimaSarajpoor NimaSarajpoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw
I have a few comments. Can you please go through them and let me know if you have any feedback?

Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py
Comment thread stumpy/mstump.py Outdated
Comment thread tests/naive.py
@NimaSarajpoor
Copy link
Copy Markdown
Collaborator Author

NimaSarajpoor commented Jun 5, 2023

There is still one function that needs to support the parameter T_subseq_isconstant, and that is the function subspace.

In this function, we can see the following lines:

https://github.com/TDAmeritrade/stumpy/blob/e91d813f553f14a7207c310ccaef1887fea02ab5/stumpy/mstump.py#L185-L193

where the variables subseqs and neighbors seem to have the shape (d, m). In other words, they are just a multi-dim subseq (or 2D subseq).

Hence, the D in line 193, i.e. D = np.linalg.norm(disc_subseqs - disc_neighbors, axis=1) , is an array of length d representing the z-norm euclidean distance between subseqs and neighbors (AFTER being discretized). Hence, D[idx] is basically the z-norm euclidean distance between subseqs[idx] and neighbors[idx] (AFTER being discretized).

Now the question is: How should we apply the "isconstant" concept here? One straightforward approach is to just say if both subseqs[idx] and neighbors[idx] are constant, their distance is 0, i.e. D[idx]=0.

@seanlaw
To me, if a user says a subseq is constant, then it is constant (I mean...it should be treated like an all-zero subseq). So, if we discretize this all-zero subseq, it should still be constant. Hence, it makes sense to just set D[idx]=0 when both subseqs[idx] and neighbors[idx] are constant before the descritization.

What do you think? Is there anything that I am missing here?

Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
@seanlaw
Copy link
Copy Markdown
Contributor

seanlaw commented Jun 5, 2023

One straightforward approach is to just say if both subseqs[idx] and neighbors[idx] are constant, their distance is 0, i.e. D[idx]=0.

That sounds reasonable to me.

To me, if a user says a subseq is constant, then it is constant (I mean...it should be treated like an all-zero subseq). So, if we discretize this all-zero subseq, it should still be constant. Hence, it makes sense to just set D[idx]=0 when both subseqs[idx] and neighbors[idx] are constant before the descritization. What do you think? Is there anything that I am missing here?

Yeah, that sounds fine. At the end of the day, the user providing T_subseq_isconstant is likely rare and it is important to ensure that the result they get by applying the default (i.e., doing nothing) is identical to when they provide a function that is identical to the default rolling_isconstant function. Prior to this PR, what would happen to the subspace calculation with just default values? We need to make sure that we don't change the behavior of the outputs of the PREVIOUS defaults with this PR. Does that make sense?

In case it matters, you would likely apply the same logic to mdl as well.

@NimaSarajpoor
Copy link
Copy Markdown
Collaborator Author

and it is important to ensure that the result they get by applying the default (i.e., doing nothing) is identical to when they provide a function that is identical to the default rolling_isconstant function.

Right! I should check that out in the main branch and just get some sense about the output and see what happens in default.

Prior to this PR, what would happen to the subspace calculation with just default values? We need to make sure that we don't change the behavior of the outputs of the PREVIOUS defaults with this PR. Does that make sense?

Yep! I will go through it more carefully to make sure that we are not messing with the default mode, i.e. output of existing version.

In case it matters, you would likely apply the same logic to mdl as well.

Right! I missed that while I was going through the script. Thanks for pointing that out.

@NimaSarajpoor
Copy link
Copy Markdown
Collaborator Author

NimaSarajpoor commented Jun 8, 2023

I have been thinking about the discretizing process and if we should be worried about how it handles (user-defined) constant subseqs. I now think it should be straightforward. This is what I noticed:

Before discretization, we first z-normalize the subseqs. Now, something very cool is happening here: the mean of the z-normalized subseq is zero. This means that the z-normalized subseq is either ALL zero or it has BOTH positive and negative values. (Because otherwise, the mean cannot be zero).

Therefore, when we discretize a subseq S, I think we cannot have identical values in the discretized version unless the non-discretized version, i.e. S, is all zero. For instance, let's assume a is the original subsequence which is NOT an all-zero array. Hence, its z-normalized version, az, has BOTH positive and negative values . Also, let's assume the value of param n_bit in the function mstump._inverse_norm is 1. This result in bin with edge 0, which makes sense since 0 is exactly the middle of normal distribution. Hence, negative values in az are replaced with 0 and positive values are replaced with 1. In other words, a non-constant subsequence has a normalized version that has both neg and pos values, and therefore the discretized version has at least one occurance of 1 and at least one occurance of 0. Hence, it cannot be constant.

Now, what if user defines constant subseqs by passing isconstant param? I think it should be fine. The only thing we need to do is to replace those subseq with all-zero subseq, and that's it. It will be handled correctly via discretization, and the distance will be computed according to what is implemented already in the code. Also, it is important to note that we do not z-normalize subsequence after discretization. Hence, the constant-related issue does not matter at that point, i.e. after discretization, as the discretization process reflects the constant subsequence as explained earlier.

@seanlaw
Copy link
Copy Markdown
Contributor

seanlaw commented Jun 8, 2023

Now, what if user defines constant subseqs by passing isconstant param? I think it should be fine. The only thing we need to do is to replace those subseq with all-zero subseq, and that's it. It will be handled correctly via discretization, and the distance will be computed according to what is implemented already in the code. Also, it is important to note that we do not z-normalize subsequence after discretization. Hence, the constant-related issue does not matter at that point, i.e. after discretization, as the discretization process reflects the constant subsequence as explained earlier.

Ahh, I see what you mean. Since we z-normalize prior to discretization then we should be mostly safe (at least in the case where the subsequence is "truly" constant). In the case where the user provides isconstant then we "fake" the z-normalization and simply set the z-normalized subsequence to all-zero and, to the discretization, this subsequence is indistinguishable from the first case ("truly" constant). Is that correct?

If so, then it sounds like this greatly simplifies things! Thanks for thinking through it @NimaSarajpoor. It's been so long that I've forgotten all of the inner workings. Hopefully, the code/logic wasn't too painful to understand?

@NimaSarajpoor
Copy link
Copy Markdown
Collaborator Author

"fake" the z-normalization and simply set the z-normalized subsequence to all-zero and, to the discretization, this subsequence is indistinguishable from the first case ("truly" constant). Is that correct?

That is correct!

Hopefully, the code/logic wasn't too painful to understand?

I think it was good overal :)

Copy link
Copy Markdown
Collaborator Author

@NimaSarajpoor NimaSarajpoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw
I think we are close to finishing this PR. I left a few comments (some of them are for me. So, you can ignore those as I will take care of them for sure unless you have a suggestion / opposite opinion regarding them)

Comment thread stumpy/mstump.py Outdated
Comment thread tests/test_mstump.py Outdated
Comment thread tests/test_mstump.py
Comment thread tests/test_mstump.py
@NimaSarajpoor
Copy link
Copy Markdown
Collaborator Author

@seanlaw
The comments are addressed. The support for param "isconstant" is also added to the "mstumped" module.

@seanlaw
Copy link
Copy Markdown
Contributor

seanlaw commented Jun 10, 2023

@seanlaw
The comments are addressed. The support for param "isconstant" is also added to the "mstumped" module.

Thank you. I will find some time to review it

Copy link
Copy Markdown
Contributor

@seanlaw seanlaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NimaSarajpoor Let's start with these comments

Comment thread stumpy/mstump.py
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstump.py Outdated
Comment thread stumpy/mstumped.py
Comment thread stumpy/mstumped.py Outdated
Comment thread stumpy/mstumped.py Outdated
Comment thread stumpy/mstumped.py
Comment thread stumpy/mstumped.py
@seanlaw
Copy link
Copy Markdown
Contributor

seanlaw commented Jun 17, 2023

@NimaSarajpoor Do I need to review anything yet?

@NimaSarajpoor
Copy link
Copy Markdown
Collaborator Author

@NimaSarajpoor Do I need to review anything yet?

No, I haven't made the changes yet. Hopefully, I will push them in upcoming days.

@NimaSarajpoor NimaSarajpoor force-pushed the add_isconstant_mstump branch from 45b965a to 4e0bd04 Compare July 8, 2023 21:40
Comment thread stumpy/mstump.py Outdated
@NimaSarajpoor
Copy link
Copy Markdown
Collaborator Author

@seanlaw
I think this PR is good. Plz let me know any comment you may have.

@seanlaw
Copy link
Copy Markdown
Contributor

seanlaw commented Jul 10, 2023

@seanlaw I think this PR is good. Plz let me know any comment you may have.

Thank you. I will find some time to review it

Copy link
Copy Markdown
Contributor

@seanlaw seanlaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comment

Comment thread tests/naive.py Outdated
@seanlaw seanlaw merged commit 029a6d2 into stumpy-dev:main Jul 11, 2023
@seanlaw
Copy link
Copy Markdown
Contributor

seanlaw commented Jul 11, 2023

@NimaSarajpoor this was a more challenging one. Thank you for your contribution and persistence!

@NimaSarajpoor
Copy link
Copy Markdown
Collaborator Author

@NimaSarajpoor this was a more challenging one. Thank you for your contribution and persistence!

Thank you for the guidance!

@NimaSarajpoor NimaSarajpoor deleted the add_isconstant_mstump branch September 4, 2023 03:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants