Description
In many functions, we currently pre-compute a T_subseq_isfinite or T_subseq_isconstant array. However, as the length of the time series increases, these data structures grow proportionally in length. For a typical long time series, we'd expect:
T_subseq_isfinite will be mostly filled with True
T_subseq_isconstant will be mostly filled with False
From a memory standpoint, it is probably best to capture/store only the minority cases (i.e., T_subseq_isinfinite, note INFINITE here, and T_subseq_isconstant), as they will take up the least amount of space/memory. Possibly the most efficient way to handle this (yet to be tested) is to simply use Python sets that hold the indices of the minority cases.
Here is a trivial example:
def test(T_subseq_isinfinite, T_subseq_isconstant):
    # Both arguments are sets of subsequence start indices
    for i in range(100_000_000):
        if i in T_subseq_isinfinite and i in T_subseq_isconstant:
            pass
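To make the memory argument a little more concrete, here is a minimal sketch that builds both representations for a small series. The helper `rolling_isfinite` and the toy data are assumptions made for illustration only and are not part of stumpy.

```python
import numpy as np

def rolling_isfinite(T, m):
    """Hypothetical helper: True where the length-m subsequence starting at i is all finite."""
    finite = np.isfinite(T)
    # A subsequence is finite only if every one of its m points is finite
    windows = np.lib.stride_tricks.sliding_window_view(finite, m)
    return windows.all(axis=1)

rng = np.random.default_rng(0)
T = rng.random(10_000)  # toy time series (assumed data)
T[500] = np.nan         # one missing value
T[7_000] = np.inf       # one infinite value
m = 50

# Dense representation: one boolean per subsequence
T_subseq_isfinite = rolling_isfinite(T, m)

# Sparse representation: only the start indices of the minority (non-finite) cases
T_subseq_isinfinite = set(np.flatnonzero(~T_subseq_isfinite).tolist())

print(len(T_subseq_isfinite), "booleans vs", len(T_subseq_isinfinite), "set entries")
```

Note that each entry in a Python set costs far more than the single byte used by a NumPy boolean, so the set is only expected to win when the minority cases are rare relative to the length of the time series.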
Sadly, support for passing Python sets directly into numba (i.e., reflected sets) is being deprecated. However, a typed.List has been added:
from numba import njit
from numba.typed import List

@njit
def foo(x):
    x.append(10)

a = [1, 2, 3]
# Copy the Python list into a numba typed list
typed_a = List()
for x in a:
    typed_a.append(x)

foo(typed_a)  # typed_a is modified in place
A typed.Set is expected to be implemented soon. For now, passing plain Python sets as in the following also works, but it comes with a deprecation warning:
from numba import njit

@njit
def test(T_subseq_isinfinite, T_subseq_isconstant):
    for i in range(100_000_000):
        # Membership tests against the Python sets passed in by the caller
        if i in T_subseq_isinfinite and i in T_subseq_isconstant:
            pass
Once typed.Set is added to numba, we should be able to save a lot of memory!
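In the meantime, one set-free way to store only the minority cases is a sorted integer array of indices with a binary-search membership test. This is an alternative technique and not the set-based approach proposed above; the function name `contains` is hypothetical. It is written in plain NumPy here, and whether it performs well inside an `@njit` function is untested.

```python
import numpy as np

def contains(sorted_idx, i):
    """Hypothetical helper: True if i is present in the sorted 1-D integer array sorted_idx."""
    # Binary search for the leftmost position where i could be inserted
    pos = np.searchsorted(sorted_idx, i)
    return bool(pos < len(sorted_idx) and sorted_idx[pos] == i)

# Start indices of the minority cases, kept sorted (assumed toy values)
T_subseq_isinfinite = np.array([3, 4, 5, 120, 121], dtype=np.int64)

print(contains(T_subseq_isinfinite, 120))
print(contains(T_subseq_isinfinite, 6))
```

Each lookup costs O(log k) for k stored indices instead of the expected O(1) of a set, but the array uses a fixed 8 bytes per stored index.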
Note: This would mean replacing T_subseq_isfinite with T_subseq_isinfinite in stumpy.mass and stumpy.match (and in the rest of the public API) as well as in all internal functions, and only allowing T_subseq_isconstant to be a set. Also, if a function is used instead, it must return a list of the indices where the condition is True, and that list gets converted to a set internally (rather than a NumPy array).
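The conversion described in the note above could look roughly like the sketch below. Both `constant_subseq_indices` (standing in for a user-supplied function) and `as_index_set` (the internal normalization step) are hypothetical names that do not exist in stumpy.

```python
import numpy as np

def constant_subseq_indices(T, m):
    """Hypothetical user function: start indices of constant length-m subsequences."""
    windows = np.lib.stride_tricks.sliding_window_view(T, m)
    # A window is constant when its maximum equals its minimum
    return np.flatnonzero(windows.max(axis=1) == windows.min(axis=1)).tolist()

def as_index_set(T_subseq_isconstant, T, m):
    """Hypothetical helper: normalize a set, a list of indices, or a function to a set."""
    if callable(T_subseq_isconstant):
        # A function must return the indices where the condition is True
        T_subseq_isconstant = T_subseq_isconstant(T, m)
    return {int(i) for i in T_subseq_isconstant}

T = np.array([0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0, 4.0])
m = 3

idx = as_index_set(constant_subseq_indices, T, m)
print(sorted(idx))  # [1, 2, 6]
```

Accepting either an index collection or a function keeps the user-facing behavior flexible while the internals only ever see a set.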
Note: One would need to check whether sets are available for cuda!