distance between Q and itself is not zero #828

@NimaSarajpoor

import numpy as np

seed = 13
np.random.seed(seed)
T = np.random.uniform(-1000, 1000, [64]).astype(np.float64)

Q = T[44:48]

The distance between a sequence and itself should be zero. However, this does not hold for the Q shown above when the distance is computed via the Pearson correlation. Let's see:

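m = len(Q)  # window length

# Q is compared against itself, so the "T" statistics are those of Q
QT = np.dot(Q, Q)
μ_Q = np.mean(Q)
M_T = np.mean(Q)

σ_Q = np.std(Q)
Σ_T = np.std(Q)

denom = m * σ_Q * Σ_T
ρ = (QT - m * μ_Q * M_T) / denom  # Pearson correlation
D_squared = np.abs(2 * m * (1.0 - ρ))
D = np.sqrt(D_squared)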
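For reference, the computation above can be written out as follows. This is only a restatement of the snippet, using $\mu_Q$, $M_T$, $\sigma_Q$, $\Sigma_T$ for the same quantities as the code variables; the closing identity is the textbook relation between the population variance and the sum of squares, not something taken from the STUMPY source.

```latex
\rho = \frac{QT - m\,\mu_Q\,M_T}{m\,\sigma_Q\,\Sigma_T},
\qquad
D = \sqrt{\left|\,2m\,(1-\rho)\,\right|}

% When the subsequence is Q itself: M_T = \mu_Q, \Sigma_T = \sigma_Q, QT = \sum_i q_i^2, so
QT - m\,\mu_Q^2 = \sum_{i=1}^{m} q_i^2 - m\,\mu_Q^2 = m\,\sigma_Q^2
\;\Longrightarrow\;
\rho = 1,\quad D = 0
```

In exact arithmetic the numerator and denominator are equal, so any deviation of ρ from 1 comes purely from floating-point rounding.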

And, we have:

>>> ρ
0.9999999999999999

>>> D_squared
8.881784197001252e-16

>>> D
2.9802322387695312e-08

Note that D should have been 0. Although npt.assert_almost_equal(D, 0) does not raise an error, D is strictly greater than 0. This can become an issue when testing the snippets module, as explained below.
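The size of the residue is not arbitrary. As a minimal sketch, assuming ρ ends up exactly one rounding step below 1.0 (which matches the printed value 0.9999999999999999), the error of 2**-53 is multiplied by 2 * m and then enlarged by the square root, which yields exactly the D reported above. The variable names here are illustrative only.

```python
import numpy as np
import numpy.testing as npt

m = 4  # window length of Q in the example above

# Assumption: rho is the largest float64 below 1.0, i.e. 1 - 2**-53
rho = np.nextafter(1.0, 0.0)

D_squared = np.abs(2 * m * (1.0 - rho))  # 8 * 2**-53 == 2**-50
D = np.sqrt(D_squared)                   # 2**-25, about 2.98e-08

# The default tolerance (decimal=7) accepts this value as "almost" zero ...
npt.assert_almost_equal(D, 0)

# ... but an exact comparison still tells it apart from zero
is_exactly_zero = bool(D == 0.0)
print(rho, D_squared, D, is_exactly_zero)
```

So a one-ulp error in ρ becomes an error of about 3e-08 in D: small enough to pass an approximate-equality test, large enough to fail any exact comparison.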


The performant snippets implementation uses the performant _mpdist_vect, which computes the MPdist profile using _mass. The naive snippets implementation uses the naive mpdist_vect, which computes distances using the naive stump. As a result, the distance between Q and itself is 2.9802322387695312e-08 in the naive approach but exactly 0 in the performant approach. This small difference can cause a considerable change in the boolean array mask:

https://github.com/TDAmeritrade/stumpy/blob/2d003ff2c2e0212eba32b22211a743306bafa992/stumpy/snippets.py#L285

which, in turn, results in a considerable change in np.sum(mask) and, consequently, in snippet_fractions:

https://github.com/TDAmeritrade/stumpy/blob/2d003ff2c2e0212eba32b22211a743306bafa992/stumpy/snippets.py#L286

Note that np.sum(mask) is an integer, so we are not talking about a loss of precision of approx. 1e-8 here! Whenever that small error in D flips an entry of mask, np.sum(mask) changes by at least 1.
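The effect can be reproduced in isolation. The following is a toy sketch, not the actual code in snippets.py: the arrays, the `<=` comparison, and the names naive_profile, performant_profile, and reference are all hypothetical and only stand in for the kind of elementwise comparison that builds a boolean mask.

```python
import numpy as np
import numpy.testing as npt

# Hypothetical distance profiles that differ only by the ~3e-08 residue
naive_profile = np.array([2.9802322387695312e-08, 0.5, 0.0])
performant_profile = np.array([0.0, 0.5, 0.0])

# The two profiles agree under the usual approximate comparison
npt.assert_almost_equal(naive_profile, performant_profile)

# Hypothetical reference profile used to build the boolean mask
reference = np.zeros(3)

naive_mask = naive_profile <= reference            # [False, False, True]
performant_mask = performant_profile <= reference  # [True, False, True]

naive_count = int(np.sum(naive_mask))              # 1
performant_count = int(np.sum(performant_mask))    # 2

# Fractions derived from the counts differ by 1/3, not by ~1e-08
naive_fraction = naive_count / reference.shape[0]
performant_fraction = performant_count / reference.shape[0]
print(naive_count, performant_count, naive_fraction, performant_fraction)
```

Even though the two profiles are "almost equal", the counts differ by a whole unit, and any quantity normalized by the profile length inherits a difference far larger than the original rounding error.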

If we look at the errors in https://github.com/TDAmeritrade/stumpy/actions/runs/4636087916/jobs/8203665926?pr=823 we can see that all of them are related to snippets_fractions, and that they are not negligible, for the reason explained above.
