Description
```python
import numpy as np

seed = 13
np.random.seed(seed)
T = np.random.uniform(-1000, 1000, [64]).astype(np.float64)
Q = T[44:48]
```
The distance between a sequence and itself should be zero. However, this is not true for the Q shown above if we use the Pearson approach. Let's see:
```python
m = Q.shape[0]  # subsequence length

QT = np.dot(Q, Q)
μ_Q = np.mean(Q)
M_T = np.mean(Q)  # the matching subsequence of T is Q itself
σ_Q = np.std(Q)
Σ_T = np.std(Q)

denom = m * σ_Q * Σ_T
ρ = (QT - m * μ_Q * M_T) / denom
D_squared = np.abs(2 * m * (1.0 - ρ))
D = np.sqrt(D_squared)
```
And we have:

```python
>>> ρ
0.9999999999999999
>>> D_squared
8.881784197001252e-16
>>> D
2.9802322387695312e-08
```
Note that D should have been 0. Although `npt.assert_almost_equal(D, 0)` does not raise an error, D is strictly greater than 0. This can become an issue when testing the snippets module (explained below).
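To make the point concrete, here is a minimal sketch showing why the almost-equal check passes while a strict comparison against zero still fails (using the D value obtained above; `assert_almost_equal` with its default `decimal=7` tolerates differences up to about 1.5e-07):

```python
import numpy as np
import numpy.testing as npt

D = 2.9802322387695312e-08  # the value of D computed above

# Passes: |D - 0| is below the default decimal=7 tolerance (~1.5e-07)
npt.assert_almost_equal(D, 0)

# Yet D is not exactly zero, so any strict comparison still sees a nonzero value
assert D > 0
```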
The performant snippets implementation uses the performant `_mpdist_vect`, which computes the MPdist profile using `_mass`. The naive snippets implementation uses the naive `mpdist_vect`, which computes distances using the naive `stump`. So, the distance between Q and itself is 2.9802322387695312e-08 in the naive approach; in the performant approach, however, it is 0. This small difference can result in a considerable change in the boolean array `mask`, which, in turn, results in a considerable change in `np.sum(mask)` and, consequently, `snippet_fractions`.

Note that `np.sum(mask)` is an integer. So, we are not talking about an approx. 1e-8 loss of precision here! That small loss of precision in D can change `np.sum(mask)` by at least 1.
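A minimal, hypothetical illustration of how this plays out (the comparison against a `threshold` of 0 here is a stand-in for the actual mask construction inside snippets, not the real code path):

```python
import numpy as np

s = 2.9802322387695312e-08  # naive distance between Q and itself
threshold = 0.0             # performant distance between Q and itself is exactly 0

# Hypothetical mask construction: a distance is "covered" if it is <= threshold
naive_mask = np.array([s]) <= threshold        # False, since 2.98e-08 > 0
performant_mask = np.array([0.0]) <= threshold  # True

# The integer mask sums now differ by a whole 1, not by ~1e-8
print(int(np.sum(naive_mask)), int(np.sum(performant_mask)))  # 0 1
```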
If we look at the errors in https://github.com/TDAmeritrade/stumpy/actions/runs/4636087916/jobs/8203665926?pr=823, we can see that all of them are related to `snippets_fractions`, and, for the reason explained above, they are not negligible.