Convert Boolean Arrays to Sets

In many functions, we currently pre-compute a `T_subseq_isfinite` or `T_subseq_isconstant`. However, as the length of the time series increases, these data structures also increase proportionately in length. For a typically long time series, we'd expect:

1. `T_subseq_isfinite` will be mostly filled with `True`
2. `T_subseq_isconstant` will be mostly filled with `False`

From a memory standpoint, it is probably best to capture/store the minority cases (i.e., `T_subseq_isinfinite`, note INIFINITE here, and `T_subseq_isconstant`) as they will take up the least amount of space/memory. The most efficient way to handle this (yet to be tested) is to simply use Python sets.

Here is a trivial example:

```
def test(T_subseq_isinfinite, T_subseq_isconstant):
    for i in range(100_000_000):
        if i in T_subseq_isinfinite and i in T_subseq_isconstant:
            pass
```

Sadly, support for [using Python sets directly in numba is being deprecated](https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types). Though, a `typed.List` has been added:

```
from numba import njit
from numba.typed import List

@njit
def foo(x):
    x.append(10)

a = [1, 2, 3]
typed_a = List()
[typed_a.append(x) for x in a]
foo(typed_a)
```

and `typed.Set` is expected to be implemented soon but, for now, something like this also works but comes with a deprecation warning:

```
@njit
def test(T_subseq_isinfinite, T_subseq_isconstant):
    for i in range(100_000_000):
        if i in T_subseq_isinfinite and i in T_subseq_isconstant:
            pass
```

Once `typed.Set` is added to `numba`, then we should be able to save a ton on storage!  

Note: This would mean replacing the `T_subseq_isfinite` with `T_subseq_isinfinite` in `stumpy.mass` and `stumpy.match` (and in other public API) as well as all internal functions and only allowing `T_subseq_isconstant` be a set. Also, if a function is used, it must return a list where `True` and it gets converted to a set internally (rather than a NumPy array).

Note: One would need to check if sets are available for `cuda`!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Convert Boolean Arrays to Sets #801

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Convert Boolean Arrays to Sets #801

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions