Convert Boolean Arrays to Sets #801

@seanlaw

In many functions, we currently pre-compute a T_subseq_isfinite or T_subseq_isconstant. However, as the length of the time series increases, these data structures also increase proportionately in length. For a typically long time series, we'd expect:

  1. T_subseq_isfinite will be mostly filled with True
  2. T_subseq_isconstant will be mostly filled with False

From a memory standpoint, it is probably best to store only the minority cases (i.e., T_subseq_isinfinite, note INFINITE here, and T_subseq_isconstant), since they will take up the least amount of space/memory. The most efficient way to handle this (yet to be tested) may be to simply use Python sets that hold the indices of those minority cases.

Here is a trivial example:

def test(T_subseq_isinfinite, T_subseq_isconstant):
    for i in range(100_000_000):
        if i in T_subseq_isinfinite and i in T_subseq_isconstant:
            pass
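
To make the idea concrete, here is a minimal sketch of how the existing boolean arrays might be converted into index sets. The helper name `indices_where` and the toy data are hypothetical and not part of STUMPY; "infinite" is assumed here to mean "not finite". The helper only iterates over its input, so it should accept a NumPy boolean array as well as a plain list.

```python
def indices_where(flags, value=True):
    """Return the set of positions i where bool(flags[i]) == value.

    Hypothetical helper for illustration only; not a STUMPY function.
    """
    return {i for i, flag in enumerate(flags) if bool(flag) == value}


# Toy data: 10 subsequences, two of them non-finite and one constant
T_subseq_isfinite = [True, True, False, True, True, True, True, False, True, True]
T_subseq_isconstant = [False, False, False, True, False, False, False, False, False, False]

# Keep only the minority cases as sets of indices
T_subseq_isinfinite = indices_where(T_subseq_isfinite, value=False)
T_subseq_isconstant_set = indices_where(T_subseq_isconstant, value=True)

# Membership tests replace boolean-array lookups
for i in range(len(T_subseq_isfinite)):
    assert (i in T_subseq_isinfinite) == (not T_subseq_isfinite[i])
    assert (i in T_subseq_isconstant_set) == T_subseq_isconstant[i]
```

Note that each entry in a Python set costs considerably more memory than one element of a boolean array, so the savings only materialize when the minority cases really are rare; this would need to be measured.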

Sadly, passing plain Python sets (or lists) directly into numba-compiled functions relies on "reflection", which numba has deprecated. A typed.List has been added as the replacement for lists:

from numba import njit
from numba.typed import List

@njit
def foo(x):
    x.append(10)

a = [1, 2, 3]
typed_a = List()
for x in a:  # populate the typed list element by element
    typed_a.append(x)
foo(typed_a)  # foo appends 10 to typed_a in place

A typed.Set is expected to be implemented soon. For now, passing regular Python sets into an njit function, as in the following, also works but emits a deprecation warning:

@njit
def test(T_subseq_isinfinite, T_subseq_isconstant):
    for i in range(100_000_000):
        if i in T_subseq_isinfinite and i in T_subseq_isconstant:
            pass

Once typed.Set is added to numba, then we should be able to save a ton on storage!
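
Until typed.Set exists, one possible stopgap (not proposed in this issue and untested against STUMPY) would be to store the minority indices in a sorted sequence and test membership with a binary search. The sketch below uses the standard-library `bisect` module; the helper name `contains_sorted` and the toy indices are hypothetical. In numba, the same idea would presumably use a sorted integer array with `np.searchsorted`, but that is an assumption.

```python
from bisect import bisect_left


def contains_sorted(sorted_idx, i):
    """Binary-search membership test on a sorted sequence of indices.

    Hypothetical helper for illustration only; not a STUMPY function.
    """
    pos = bisect_left(sorted_idx, i)
    return pos < len(sorted_idx) and sorted_idx[pos] == i


# Sorted indices of the minority cases (toy values)
T_subseq_isinfinite = [2, 7]
T_subseq_isconstant = [3, 7]

# Same check as in the examples above, using binary search instead of sets
both = [
    i
    for i in range(10)
    if contains_sorted(T_subseq_isinfinite, i)
    and contains_sorted(T_subseq_isconstant, i)
]
```

Each lookup is O(log n) rather than the average O(1) of a set, but the storage is a flat run of integers.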

Note: This would mean replacing T_subseq_isfinite with T_subseq_isinfinite in stumpy.mass and stumpy.match (and in the rest of the public API) as well as in all internal functions, and only allowing T_subseq_isconstant to be a set. Also, if a function is passed instead, it must return a list of the indices where the condition is True (rather than a NumPy array), and that list gets converted to a set internally.

Note: One would need to check whether sets are available for CUDA!
