Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How avoid NaN and Infinity values in the distance profile? #17

Open
barrybecker4 opened this issue Jan 18, 2019 · 6 comments
Open

How avoid NaN and Infinity values in the distance profile? #17

barrybecker4 opened this issue Jan 18, 2019 · 6 comments
Labels
bug Something isn't working

Comments

@barrybecker4
Copy link
Collaborator

While adding more tests, I noticed that there are a number of cases where the distance part of the matrix profile can have Infinity and NaN values. One simple example (but not the only one) is a straight line.

If I run matrix profile on this straight line series of 10 y values

1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2, 1.2

with a window of 5, then I get MP distance values of

NaN, NaN, NaN, POSITIVE_INFINITY, 3.1623, POSITIVE_INFINITY

And MP index values of

2.0000, 3.0000, 4.0000, 5.0000, 1.0000, 5.0000

There are 6 values in the profile (as expected). The length should be 10 – windowSize + 1 = 6.
I don’t understand yet why there are NaN values. I was expecting all 0’s for the distances. Maybe it has to do with z-normalization.

I will look at the code more closely and try to make a proposal and perhaps PR to avoid these values.

@barrybecker4 barrybecker4 added the question Further information is requested label Jan 18, 2019
@barrybecker4 barrybecker4 changed the title How avoid NaN and Infinity values in the distance profile How avoid NaN and Infinity values in the distance profile? Jan 18, 2019
@barrybecker4
Copy link
Collaborator Author

The source of the problem seems to be movStd in Mass.java.
The moving standard deviation can be 0 or even slightly less than zero in some cases (perhaps because of numerical error). Then later, at the very end of the mass method, we divide by the stdDev vector. If it has 0's (or NaN's) in it, then the result will have Infinity (or NaN's).
I propose that we never let the values in the stdDev vector ever go completely to zero. Instead use some epsilon. I tried this and it seems to work for my simple straight line example. At least it seems to give a better result in my opinion.

If I make the epsilon 1e-10 or 1e-40, the computed result is now

3.1623, 3.1623, 3.1623, 3.1623, 3.1623, 3.1623

for distances, and

5.0000, 5.0000, 5.0000, 0, 1.0000, 2.0000

for indices.

@barrybecker4
Copy link
Collaborator Author

Though the above suggestion is an improvement, it does not fix all the cases of NaN and Infinity.
Another case, is for example, an upward sloping line like this:

1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4

For this the profile is:

NaN, NaN, 0.0035, NaN, NaN, POSITIVE_INFINITY, POSITIVE_INFINITY, POSITIVE_INFINITY
3.0000, 4.0000, 4.0000, 5.0000, 6.0000, 7.0000, 7.0000, 7.0000

The problem in this case seems to be at the end of the mass method, where we do the following

        INDArray dot = Nd4j.create(realDot, new int[]{1, realDot.length});
        INDArray res = dot.get(NDArrayIndex.interval(m - 1, dot.length())).div(stdv);
        INDArray r1 = res.neg().add(m).mul(2);
        INDArray r2 = sqrt(r1);
        return r2;

The values in res are all very close to the value of m, but not quite, so when r1 is computed the values are very near 0, and even less than 0 in some cases. The sqrt makes those values NaN.

If I change add(m) to add(m + 0.01), then the matrix profile is this

0.4472, 0.4472, 0.4472, 0.4472, 0.4472, 0.4471, 0.4471, 0.4471
5.0000, 5.0000, 5.0000, 6.0000, 1.0000, 1.0000, 1.0000, 1.0000

This seems better, but I don't know if its correct. It also effects all other test cases slightly, and there are still other cases where NaN and Infinity appear.

@ensozos
Copy link
Owner

ensozos commented Jan 20, 2019

I 'm not sure if this the way to fix the issue. I saw that MASS has different distance profile computation from ours. But there is also this implementation which is similar to the current version we have. We have to make sure if these versions are equivalent.

@barrybecker4
Copy link
Collaborator Author

I just read the slides about MASS in https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part2.pdf again. There is a slide on the 3 sources of numerical error that can arise. I think the current java code does not have safeguards against all these forms of errors. I will try to add some.

@barrybecker4
Copy link
Collaborator Author

I was able to address the second 2 sources of numerical instability by adding
BooleanIndexing.replaceWhere(res, EPS, Conditions.lessThanOrEqual(EPS));
Right before the 2 places where sqrt is taken in the java mass method. This prevents the stdDev from becoming 0, and prevents a negative distance from becoming NaN in the final result. All results for distances and indices in my different manually added simple test cases work much better now. There are no more NaN's or Infinities in the results.
The first case of instability is not something we have encountered yet in our test datasets, and am not much concerned about it for the time being. PR coming shortly.

barrybecker4 pushed a commit to esi-mineset/Matrix-Profile that referenced this issue Jan 21, 2019
Done by preventing the stdv (standard deviation) from going all the way to 0.
barrybecker4 pushed a commit to esi-mineset/Matrix-Profile that referenced this issue Jan 21, 2019
Address 2 sources of numerical instability by adding
BooleanIndexing.replaceWhere(res, EPS, Conditions.lessThanOrEqual(EPS));
Right before the 2 places where sqrt is taken in the java mass method. This prevents the stdDev from becoming 0, and prevents a negative distance from becoming NaN in the final result. All results for distances and indices in simple test cases work much better now. There are no more NaN's or Infinities in the results.
Refer to https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part2.pdf to see the slide on 3 possible soources of numerical instability.
Added some additional tests that had lots of NaNa and Infinities before this change, but now behave much better.
barrybecker4 pushed a commit to esi-mineset/Matrix-Profile that referenced this issue Jan 21, 2019
Bump mineset version to 0.0.6-mineset.
@barrybecker4 barrybecker4 added bug Something isn't working and removed question Further information is requested labels Jan 21, 2019
ensozos added a commit that referenced this issue Jan 22, 2019
@ensozos
Copy link
Owner

ensozos commented Jan 22, 2019

Fixed ,Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants