Problem:
I was trying to use stumpy.match and found a small numerical discrepancy (perhaps caused by rounding or something else) that prevents stumpy from finding the match for a query in one of my cases. Below, I show the distance between a query and a pattern of the same size, calculated in three ways:
With normalize=False:
1. numpy.linalg.norm() gives 46.54073160107859
2. sklearn.metrics.pairwise_distances() gives 46.54073160107859
3. stumpy.match gives 46.5407316010786
As you can see, the value calculated by stumpy is slightly higher (46.5407316010786 is 46.54073160107860, about 1e-14 above the other two). So when I set max_distance to 46.54073160107859, stumpy.match returned an empty array.
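A minimal sketch of why the comparison fails, using only the two values reported above. The second half shows one plausible (unconfirmed) source of such a gap: a distance computed from the expanded form `|q|^2 + |t|^2 - 2*q.t` can differ in the last bits from one computed directly from the differences. The function names are hypothetical and are not part of stumpy.

```python
import math

# Values copied from the report above.
d_reference = 46.54073160107859  # numpy.linalg.norm / sklearn result
d_stumpy = 46.5407316010786      # distance returned by stumpy.match

# The two literals are distinct doubles; the stumpy value is the larger one.
difference = d_stumpy - d_reference
passes_strict = d_stumpy <= d_reference  # the kind of check that excludes the match


def euclidean_direct(q, t):
    """Euclidean distance computed from element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, t)))


def euclidean_expanded(q, t):
    """Same distance via |q|^2 + |t|^2 - 2*q.t (an assumed alternative formula).

    Mathematically identical to the direct form, but it rounds differently,
    so the last few bits of the result are not guaranteed to agree.
    """
    qq = sum(a * a for a in q)
    tt = sum(b * b for b in t)
    qt = sum(a * b for a, b in zip(q, t))
    # Clamp at zero: rounding can make the expression slightly negative.
    return math.sqrt(max(qq + tt - 2.0 * qt, 0.0))


print(difference > 0, passes_strict)
```

Whether stumpy actually uses the expanded form in this code path is an assumption here; the point is only that two correct implementations need not agree to the last digit.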
FYI: in one step of my study I calculate the pairwise distances between Q (of length L) and some, but not all, subsequences of length L. I then take the minimum distance d among them and set max_distance to d, expecting at least one matching subsequence. (Why at least one? Because this time I search the whole time series, not just some of its subsequences.)
Solution (?):
I resolved it by simply adding 1e-6 to the max_distance value.
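A fixed 1e-6 works here, but a pad that scales with the distance may be more robust for very large or very small values. The helper below is a hypothetical sketch, not a stumpy function, and its default tolerances are assumptions that should be tuned to the data.

```python
def padded_max_distance(d, rel_tol=1e-9, abs_tol=1e-12):
    """Return d enlarged by a small tolerance before using it as a threshold.

    The pad is the larger of a relative term (scales with d) and an absolute
    floor (so that d == 0 still gets a non-zero pad).
    """
    return d + max(rel_tol * abs(d), abs_tol)


d_min = 46.54073160107859        # minimum distance from numpy / sklearn
d_stumpy = 46.5407316010786      # distance stumpy reported for the same pair

threshold = padded_max_distance(d_min)
print(d_stumpy <= d_min, d_stumpy <= threshold)
```

The padded value would then be passed as max_distance in place of the raw minimum.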
Also:
If I understand correctly, you use <= when comparing the distance between two patterns with max_distance.
However, the docstring of the stumpy.match function describes the return value out as follows:
out : numpy.ndarray
The first column consists of distances of subsequences of T whose distances
to Q are smaller than max_distance ...
The term "smaller than" should be changed to "less than or equal to", matching the wording at the beginning of the same docstring.