
speedup _numba_sampen kernel #22

Closed
FirefoxMetzger opened this issue Jan 17, 2023 · 9 comments · Fixed by #25
Labels
enhancement New feature or request

Comments

@FirefoxMetzger
Contributor

I recently implemented my own version of _numba_sampen as a C extension, mainly because I did not know about this package, but also because I thought it would be a good way to learn how to write C extensions for numpy.

From my benchmarks, it looks like my implementation is slightly faster than the current one, so I was wondering if you would be interested in a PR to update the kernel. It would also reduce the memory requirement: the current implementation allocates several copies of the array, whereas my kernel uses constant memory (with respect to array size and window size).

Here is a benchmark figure:

(benchmark figure: runtime comparison of the two implementations)

And here is a pure python implementation of my kernel:

import numpy as np
from math import log


def sample_entropy(data, window_size: int, tolerance: float) -> float:
    """Calculate Sample Entropy

    Sample entropy is a measure of irregularity of a series. In other words,
    it quantifies how difficult it is ("on average") to predict the next value
    of a series from a window of ``window_size`` known values.

    Read more at: https://en.wikipedia.org/wiki/Sample_entropy

    Parameters
    ----------
    data : array_like
        The input series.
    window_size : int
        How many elements to include in the template.
    tolerance : float
        How much sampling noise is tolerable and can be "ignored".

    Returns
    -------
    entropy : float
        A measure of how irregular the series is.

    """

    # instead of sliding two windows independently, this implementation
    # slides both windows at a constant offset (inner loop) and then varies
    # the offset (outer loop). The constant offset between windows has the
    # advantage that a lot of computation between two applications can be
    # reused. This way, we only need to track what slides out of the window
    # and what slides in.

    # further, instead of keeping the full window, we only need to know how
    # many pairs in the current window above the threshold. If there is at
    # least one, we don't count the window else we do. This way, we only
    # need to track if a pair > threshold moved out/in and keep one counter
    # per window length.

    # sliding a m-size and (m+1)-size window also has overlap, so we can
    # apply some trickery to share intermediate results when computing the
    # numerator and denominator

    sequence = np.asarray(data)
    size = sequence.size
    sequence = sequence.tolist()

    numerator = 0
    denominator = 0

    for offset in range(1, size - window_size):
        n_numerator = (
            abs(sequence[window_size] - sequence[window_size + offset]) > tolerance
        )
        n_denominator = 0

        for idx in range(window_size):
            n_numerator += abs(sequence[idx] - sequence[idx + offset]) > tolerance
            n_denominator += abs(sequence[idx] - sequence[idx + offset]) > tolerance

        if n_numerator == 0:
            numerator += 1
        if n_denominator == 0:
            denominator += 1

        prev_in_diff = (
            abs(sequence[window_size] - sequence[offset + window_size]) > tolerance
        )
        for idx in range(1, size - offset - window_size):
            out_diff = (
                abs(sequence[idx - 1] - sequence[idx + offset - 1]) > tolerance
            )
            in_diff = (
                abs(
                    sequence[idx + window_size]
                    - sequence[idx + offset + window_size]
                )
                > tolerance
            )
            n_numerator += in_diff - out_diff
            n_denominator += prev_in_diff - out_diff
            prev_in_diff = in_diff

            if n_numerator == 0:
                numerator += 1
            if n_denominator == 0:
                denominator += 1

        # one extra idx for the denominator
        idx = size - offset - window_size
        out_diff = (
            abs(
                sequence[idx - 1]
                - sequence[size - window_size - 1]
            )
            > tolerance
        )
        n_denominator = n_denominator - out_diff + prev_in_diff
        if n_denominator == 0:
            denominator += 1

    # one extra offset for the denominator
    offset = size - window_size
    n_denominator = 0
    for idx in range(window_size):
        n_denominator += abs(sequence[idx] - sequence[idx + offset]) > tolerance
    if n_denominator == 0:
        denominator += 1

    return -log(numerator / denominator)
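
For illustration, a quick usage sketch of the function above (illustrative only; any reasonably long series works, and the 0.2 * std tolerance mirrors the convention used elsewhere in this thread):

import numpy as np

series = np.random.default_rng(0).normal(size=1000)
# window_size (a.k.a. order) of 2 and the usual r = 0.2 * std tolerance
print(sample_entropy(series, window_size=2, tolerance=0.2 * np.std(series)))
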
@raphaelvallat added the enhancement label on Jan 18, 2023
@raphaelvallat
Owner

Hi @FirefoxMetzger,

Thanks for opening the issue. This sounds great. Two questions:

  • Would adding your C implementation to antropy require any new dependencies?
  • Have you validated that the output is the same as antropy and/or MNE-features?

Assuming that the answer to these questions is No and Yes, then please do feel free to work on a PR!

Thanks,
Raphael

@FirefoxMetzger
Contributor Author

Would adding your C implementation to antropy require any new dependencies?

No additional dependencies beyond numpy. That said, I would implement this in numba for this repo, since we currently don't have a pipeline to build and distribute C code. The main novelty in my version of the kernel is that I use a different set of tricks to count matching windows, which reuses intermediate results more efficiently and thus reduces the constant overhead. This improvement should also be visible when implementing it in numba instead of pure C.
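
To make the idea concrete, here is a minimal sketch of what the innermost counting loop could look like in numba (hypothetical helper name, not part of antropy; the actual kernel would run this for both window sizes and all offsets, as in the pure-python version above):

from numba import jit


@jit(nopython=True)
def _count_similar_windows(sequence, window_size, tolerance, offset):
    # Count the window pairs at a fixed `offset` whose element-wise
    # differences are all within `tolerance`. Instead of rescanning each
    # window, track how many "over-tolerance" element pairs it contains
    # and update that count as one element slides out and one slides in.
    matches = 0
    n_over = 0
    for idx in range(window_size):
        if abs(sequence[idx] - sequence[idx + offset]) > tolerance:
            n_over += 1
    if n_over == 0:
        matches += 1
    for idx in range(1, sequence.size - offset - window_size + 1):
        # element pair leaving the window
        if abs(sequence[idx - 1] - sequence[idx + offset - 1]) > tolerance:
            n_over -= 1
        # element pair entering the window
        if abs(sequence[idx + window_size - 1] - sequence[idx + offset + window_size - 1]) > tolerance:
            n_over += 1
        if n_over == 0:
            matches += 1
    return matches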

Have you validated that the output is the same as antropy and/or MNE-features?

I just compared it to antropy and, unfortunately, it produces different results. That said, it does produce the same result as my naive (but easy to read) reference implementation, which I used to test my algorithm against:

from local._lib.transforms import sample_entropy as c_sampleEn
from antropy.entropy import _numba_sampen, _app_samp_entropy
import numpy as np
from math import log


def reference_implementation(sequence, window_size, tolerance):
    """Easy to read but slow reference implementation."""
    sequence = np.asarray(sequence)
    size = sequence.size

    numerator = 0
    for idxA in range(size - (window_size + 1) + 1):
        for idxB in range(size - (window_size + 1) + 1):
            if idxA == idxB:
                continue

            winA = sequence[idxA : (idxA + (window_size + 1))]
            winB = sequence[idxB : (idxB + (window_size + 1))]

            if np.max(np.abs(winA - winB)) <= tolerance:
                numerator += 1

    denominator = 0
    for idxA in range(size - window_size + 1):
        for idxB in range(size - window_size + 1):
            if idxA == idxB:
                continue

            winA = sequence[idxA : (idxA + window_size)]
            winB = sequence[idxB : (idxB + window_size)]

            if np.max(np.abs(winA - winB)) <= tolerance:
                denominator += 1

    return -log(numerator / denominator)


def mne_sampleEn(data, window):
    phi = _app_samp_entropy(data, order=window, approximate=False)
    return -np.log(np.divide(phi[1], phi[0]))

def antropy_sampleEn(data, window):
    return _numba_sampen(data, order=window, r=(0.2 * np.std(data)))


# compile numba
data = np.random.randint(0, 50, 10).astype(float)
antropy_sampleEn(data, 2)

test_sequence = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1,], dtype=float)
test_std = np.std(test_sequence)

reference_value = reference_implementation(test_sequence, 2, .2*test_std)
c_result = c_sampleEn(test_sequence, 2, .2*test_std)
antropy_result = antropy_sampleEn(test_sequence, 2)
mne_result = mne_sampleEn(test_sequence, 2)

np.testing.assert_almost_equal(c_result, reference_value)
np.testing.assert_allclose(c_result, antropy_result)

@raphaelvallat Could I trouble you to have a look at the reference implementation (code above) and see if you spot a mistake in it?

@raphaelvallat
Owner

This improvement should also be visible when implementing in numba instead of pure C.

Got it.

Could I trouble you to have a look at the reference implementation (code above) and see if you spot a mistake in it?

I am out of office and unavailable until mid-February so this is unfortunately going to have to wait, but yes I can dive into it after this date.

Thanks again,
Raphael

@FirefoxMetzger
Contributor Author

I am out of office and unavailable until mid-February so this is unfortunately going to have to wait, but yes I can dive into it after this date.

Nice, thanks!

Just to avoid confusion, I'm talking about the code from this comment: #22 (comment) . It tries to be a 1:1 implementation of the definition so that a human can quickly find any deviations. Once we have that right, I can use fuzz testing to compare it with my optimized (and less readable) implementation to see if it does the right thing, too.

For reference: One difference that I found is that my code uses distance(templateA, templateB) <= r whereas yours uses distance(templateA, templateB) < r. Wikipedia uses either; <= for the example code and < for the mathematical definition. Unfortunately though, this is not the only difference, as results still differ after adjusting that.
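
To make the plan concrete, the comparison I have in mind looks roughly like this (illustrative harness only; reference_implementation and sample_entropy are the functions from my earlier comments):

import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    # fairly long sequences so that some templates match and the log is defined
    data = rng.integers(0, 10, size=200).astype(float)
    tolerance = 0.2 * np.std(data)
    np.testing.assert_allclose(
        sample_entropy(data, 2, tolerance),
        reference_implementation(data, 2, tolerance),
    )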

@FirefoxMetzger
Contributor Author

FirefoxMetzger commented Jan 19, 2023

@raphaelvallat I think antropy's current implementation is off by one.

(For reference, the algorithm works by counting all pairs of windows/templates (except the self-match) for which the Chebyshev distance between the windows is smaller than r. We do this for two window sizes (order and order+1), which antropy's implementation tracks in variables a and b. a is the number of order+1-length pairs and b is the number of order-length pairs.)

Consider a window/template size of order=2 and the example sequence: [1., 7., 0., 8., 8., 0., 9., 2., 8., 8.], which has r = 0.2*std(sequence) = 0.496.... A value of r < 1 means that we only count a pair if it is an exact match.

If we now compute a for the sequence above, we look at all pairs of size 3 windows into the array and count the number of pairs that are identical:

# all size 3 windows
array([[1., 7., 0.],
       [7., 0., 8.],
       [0., 8., 8.],
       [8., 8., 0.],
       [8., 0., 9.],
       [0., 9., 2.],
       [9., 2., 8.],
       [2., 8., 8.]])

No two rows are identical, and hence a = 0. This works correctly across all implementations (antropy, reference, and my C kernel). If we then compute b we do the same, but for all size 2 windows into the array:

# all size 2 windows
array([[1., 7.],
       [7., 0.],
       [0., 8.],
       [8., 8.],
       [8., 0.],
       [0., 9.],
       [9., 2.],
       [2., 8.],
       [8., 8.]])

We find a match for the pair [3, :] and [8, :] (with value (8, 8)) and one for its inverse. The optimized implementation of antropy doesn't count inverse pairs (they always exist, so we count each pair once and multiply by 2 in the end), which means we would expect it to produce b = 1. However, it currently produces b = 0 (whereas my reference and my C kernel produce b = 1).
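
These counts can be checked directly with numpy (standalone snippet, independent of the implementations under discussion):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

seq = np.array([1., 7., 0., 8., 8., 0., 9., 2., 8., 8.])
r = 0.2 * np.std(seq)

def count_pairs(windows, r):
    # unordered window pairs (i < j) whose Chebyshev distance is <= r
    n = len(windows)
    return sum(
        np.max(np.abs(windows[i] - windows[j])) <= r
        for i in range(n)
        for j in range(i + 1, n)
    )

print(count_pairs(sliding_window_view(seq, 3), r))  # a: no size-3 matches
print(count_pairs(sliding_window_view(seq, 2), r))  # b: one match, the (8, 8) pair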

I've tried this for different sequences, and whenever there is a mismatch between my implementation and antropy's, it is because of matches with the last order-length window, which causes a difference in the count of b.


Looking into antropy's wrapper around MNE-features, I noticed that it explicitly excludes the last window before computing order-length pairs:

(note that approximate=False is hardcoded for sample entropy)

antropy/antropy/entropy.py

Lines 384 to 389 in dc84ed1

# compute phi(order, r)
_emb_data1 = _embed(x, order, 1)
if approximate:
    emb_data1 = _emb_data1
else:
    emb_data1 = _emb_data1[:-1]

We can use MNE-features' implementation to compute a and b: count1 is the number of order-length pairs including the self-match and inverse matches. Similarly for count2 with (order+1)-length pairs. We can use this to work out MNE's counts for a and b via:

b = (count1 - 1) / 2
a = (count2 - 1) / 2

If we do that, it confirms the difference explained above: if we exclude the last window, we get antropy's count of order-length windows (b); if we don't, we get my reference implementation's count of order-length windows.
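
A standalone way to check this relation (using sklearn's KDTree as a stand-in for the internals and summing over templates; the snippet names are mine, not from MNE-features):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.neighbors import KDTree

seq = np.array([1., 7., 0., 8., 8., 0., 9., 2., 8., 8.])
r = 0.2 * np.std(seq)
order = 2

emb1 = sliding_window_view(seq, order)      # order-length templates
emb2 = sliding_window_view(seq, order + 1)  # (order+1)-length templates

# per-template neighbour counts within r (Chebyshev), self-matches included
count1 = KDTree(emb1, metric="chebyshev").query_radius(emb1, r, count_only=True)
count2 = KDTree(emb2, metric="chebyshev").query_radius(emb2, r, count_only=True)

# drop the self-matches and halve to count each unordered pair once;
# the last order-length template is included here, so this matches the
# reference implementation rather than antropy
b = (count1 - 1).sum() // 2
a = (count2 - 1).sum() // 2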


I'm actually not sure which approach is correct. Do we have to exclude the last window in our computation? If so, why?

I think we shouldn't, but I do see the argument for "if we do then we have the same number of windows for a and b". Is this correct/wise though?


Edit: I have excluded counts for the last order-length window and now my implementation(s) match the results produced by antropy.

@raphaelvallat
Owner

Hey @FirefoxMetzger — Thanks for the deep dive into this, and apologies about my delayed response.

To be honest, I am not sure whether the last window should be excluded or not. Looking at the neurokit2 implementation, I see that the last window is also explicitly excluded:

https://github.com/neuropsychology/NeuroKit/blob/8c564938c5fc2b4a377fd89bb14ac07a430983b9/neurokit2/complexity/utils.py#L106-L107

Perhaps @DominiqueMakowski might be able to chime in on this issue?

FYI, I have also run some tests, and for larger arrays (N > 100) the exclusion/inclusion of the last window has very little impact on the final entropy value.

PS2: neurokit2 uses r = 0.2 * np.std(x, ddof=1) while antropy uses ddof=0. It makes little difference on the output, but I think it would be good to homogenize the implementations. I'm happy to switch to ddof=1.
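
For a sense of scale, a quick illustration of the difference (arbitrary example data):

import numpy as np

x = np.random.default_rng(0).normal(size=100)
print(0.2 * np.std(x, ddof=0))  # tolerance as antropy currently computes it
print(0.2 * np.std(x, ddof=1))  # tolerance as neurokit2 computes it (~0.5 % larger for N = 100)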

@DominiqueMakowski

If I remember correctly, that's because sample entropy compares a time-delayed embedding to its m+1 version, for which the last row is unavailable (because of the bigger dimension), hence we remove it from the first so that the two match in terms of array shape.
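
A quick way to see the shape mismatch (illustration only, using numpy's sliding_window_view in place of the actual embedding helper):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10.0)
emb_m = sliding_window_view(x, 2)    # shape (9, 2): templates of length m
emb_m1 = sliding_window_view(x, 3)   # shape (8, 3): templates of length m + 1

# the m-length embedding has one extra row; dropping it aligns the two
emb_m = emb_m[:-1]                   # shape (8, 2)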

Hope that helps :)

Regarding the homogenization, yeah, I'd definitely agree that it's a good idea (the discrepancies across the implementations were kind of the starting point for NeuroKit's version 😅)

@raphaelvallat
Owner

Thanks @DominiqueMakowski!

@FirefoxMetzger I think that for now we can keep the removal of the last row, feel free to submit a PR with your improved numba implementation!

@FirefoxMetzger
Contributor Author

FirefoxMetzger commented Mar 7, 2023

@raphaelvallat Thanks for the update :) I won't get a chance to work on it this week, but should be able to get it going in the coming one.
