Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Type of variable 'argmax' cannot be determined #5547

Closed
LostInDarkMath opened this issue Apr 10, 2020 · 7 comments
Labels
question Notes an issue as a question

Comments

@LostInDarkMath
Copy link

Hi,
I try to speed up my python code by using numba. But it fails with the following error message:

Traceback (most recent call last):
  File "E:/Studium/Masterarbeit/Masterarbeit/code/fast_simulation.py", line 152, in <module>
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
  File "C:\Python\lib\site-packages\numba\dispatcher.py", line 401, in _compile_for_args
    error_rewrite(e, 'typing')
  File "C:\Python\lib\site-packages\numba\dispatcher.py", line 344, in error_rewrite
    reraise(type(e), e, None)
  File "C:\Python\lib\site-packages\numba\six.py", line 668, in reraise
    raise value.with_traceback(tb)
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Type of variable 'argmax' cannot be determined, operation: $300unpack_sequence.5, location: E:/Studium/Masterarbeit/Masterarbeit/code/fast_simulation.py (126)

File "fast_simulation.py", line 126:
def monte_carlo(n: int, m: int, alpha: float, epsilon_max: float, delta: float, aufloesung: int, x_position: float):
    <source elided>

            argmax, max_value = get_max_ks(random_values)

My code is

@njit
def insort(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted.
    If x is already in a, insert it to the right of the rightmost x.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid+1
    a.insert(lo, x)

@njit
def get_max_ks(data: list) -> (float, float):
    def f(x, data):
        return sqrt(len(data)) * abs(np.searchsorted(data, x, side='right') / n - x)

    max_value = -1000
    argmax = 0.0
    epsilon = 1 / (n * 10 ** 6)
    for i in range(n):
        data_i = data[i]
        fx1 = f(data_i, data)
        fx2 = f(data_i + epsilon, data)
        fx3 = f(data_i - epsilon, data)
        if fx1 > max_value:
            max_value = fx1
            argmax = data_i
        if fx2 > max_value:
            max_value = fx2
            argmax = data_i + epsilon
        if fx3 > max_value:
            max_value = fx3
            argmax = data_i - epsilon
    return argmax, max_value

@njit
def get_max_vn(data: list) -> (float, float):
    def g(x, data):
        return sqrt(len(data)) * abs(np.searchsorted(data, x, side='right') / n - x) / sqrt(
            x * (1 - x))

    max_value = -1000
    argmax = 0.0
    epsilon = 1 / (n * 10 ** 6)
    for i in range(n):
        data_i = data[i]
        fx1 = g(data_i, data)
        fx2 = g(data_i + epsilon, data)
        fx3 = g(data_i - epsilon, data)
        if fx1 > max_value:
            max_value = fx1
            argmax = data_i
        if fx2 > max_value:
            max_value = fx2
            argmax = data_i + epsilon
        if fx3 > max_value:
            max_value = fx3
            argmax = data_i - epsilon
    return argmax, max_value


@njit
def get_critical_value_vn(alpha: float, n: int) -> float:
    loglogn = log(log(n))
    an = sqrt(2 * loglogn)
    dn = 2 * loglogn + 0.5 * log(loglogn) - 0.5 * log(pi)
    return (dn - log(-0.5 * log(1 - alpha))) / an


@njit
def monte_carlo(n: int, m: int, alpha: float, epsilon_max: float, delta: float, aufloesung: int, x_position: float):
    epsilons = np.linspace(min(0.0, epsilon_max), max(0.0, epsilon_max), aufloesung)
    res_ks = np.zeros(n)
    res_vn = np.zeros(n)
    res_ln = np.zeros(n)
    ks_critical_value = 1.2238478702170836  # TODO
    vn_critical_value = get_critical_value_vn(alpha, n)
    ln_critical_value = 2.7490859400901955  # TODO

    for epsilon in epsilons:
        for i in range(m):
            uniform_distributed_values = np.random.uniform(0.0, 1.0, n)
            random_values = []
            for x in uniform_distributed_values:
                if x < max(0, x_position - delta) or x > min(1, x_position + delta):
                    insort(random_values, x)
                elif max(0, x_position - delta) <= x and x <= x_position + epsilon:
                    insort(random_values,
                                  (x - x_position - epsilon) / (1 + (epsilon / min(delta, x_position))) + x_position)
                else:
                    insort(random_values,
                                  (x - x_position - epsilon) / (
                                              1 - (epsilon / min(delta, 1 - x_position))) + x_position)

            argmax, max_value = get_max_ks(random_values)
            vn_argmax, vn_max = get_max_vn(random_values)
            ks_statistic = max_value
            vn_statistic = vn_max
            ln_statistic = max_value / sqrt(argmax * (1 - argmax))

            if ks_statistic > ks_critical_value:  # if test dismisses H_0
                res_ks[i] += 1 / m
            if vn_statistic > vn_critical_value:  # if test dismisses H_0
                res_vn[i] += 1 / m
            if ln_statistic > ln_critical_value:  # if test dismisses H_0
                res_ln[i] += 1 / m
    return epsilons, res_ks, res_vn, res_ln


if __name__ == '__main__':
    # some code
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
    # some other code

How I can fix this? I don't understand the error message.
If it was not right to open aan issue for my problem, I apologize.
I didn't know how to help myself anymore.

@stuartarchibald
Copy link
Contributor

Thanks for the report. The error message is trying to tell you that the "type" of the variable "argmax" cannot be worked out from the code in its present state. I think it is essentially a convoluted variation of this problem http://numba.pydata.org/numba-doc/dev/user/troubleshoot.html#my-code-has-an-untyped-list-problem

In the code, random_values is declared as an empty list, and then there is no use of it in the scope that would hint at the type of the elements in the list, so the type inference algorithms in numba cannot work out what the type of the data in it will be when it goes into get_max_ks. get_max_ks then uses the items of data for computing argmax, but again, numba doesn't know what the type of data is so argmax ends up being of unknown type and this is what eventually breaks the type inference alg.

The fix is to use a typed.List container, these are a good replacement for reflected lists, http://numba.pydata.org/numba-doc/dev/reference/pysupported.html#typed-list and also permit direct specification of the type of the data the list will hold.

Here's some fixed code:

from numba import njit, types, typed
import numpy as np
from math import sqrt, log, pi

@njit
def insort(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted.
    If x is already in a, insert it to the right of the rightmost x.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid+1
    a.insert(lo, x)

@njit
def get_max_ks(data: list) -> (float, float):
    def f(x, data):
        return sqrt(len(data)) * abs(np.searchsorted(data, x, side='right') / n - x)

    max_value = -1000
    argmax = 0.0
    epsilon = 1 / (n * 10 ** 6)
    for i in range(n):
        data_i = data[i]
        fx1 = f(data_i, data)
        fx2 = f(data_i + epsilon, data)
        fx3 = f(data_i - epsilon, data)
        if fx1 > max_value:
            max_value = fx1
            argmax = data_i
        if fx2 > max_value:
            max_value = fx2
            argmax = data_i + epsilon
        if fx3 > max_value:
            max_value = fx3
            argmax = data_i - epsilon
    return argmax, max_value

@njit
def get_max_vn(data: list) -> (float, float):
    def g(x, data):
        return sqrt(len(data)) * abs(np.searchsorted(data, x, side='right') / n - x) / sqrt(
            x * (1 - x))

    max_value = -1000
    argmax = 0.0
    epsilon = 1 / (n * 10 ** 6)
    for i in range(n):
        data_i = data[i]
        fx1 = g(data_i, data)
        fx2 = g(data_i + epsilon, data)
        fx3 = g(data_i - epsilon, data)
        if fx1 > max_value:
            max_value = fx1
            argmax = data_i
        if fx2 > max_value:
            max_value = fx2
            argmax = data_i + epsilon
        if fx3 > max_value:
            max_value = fx3
            argmax = data_i - epsilon
    return argmax, max_value


@njit
def get_critical_value_vn(alpha: float, n: int) -> float:
    loglogn = log(log(n))
    an = sqrt(2 * loglogn)
    dn = 2 * loglogn + 0.5 * log(loglogn) - 0.5 * log(pi)
    return (dn - log(-0.5 * log(1 - alpha))) / an


@njit
def monte_carlo(n: int, m: int, alpha: float, epsilon_max: float, delta: float, aufloesung: int, x_position: float):
    epsilons = np.linspace(min(0.0, epsilon_max), max(0.0, epsilon_max), aufloesung)
    res_ks = np.zeros(n)
    res_vn = np.zeros(n)
    res_ln = np.zeros(n)
    ks_critical_value = 1.2238478702170836  # TODO
    vn_critical_value = get_critical_value_vn(alpha, n)
    ln_critical_value = 2.7490859400901955  # TODO

    for epsilon in epsilons:
        for i in range(m):
            uniform_distributed_values = np.random.uniform(0.0, 1.0, n)
            random_values = typed.List.empty_list(types.float64)
            # ^---------------- Use a typed list!
            for x in uniform_distributed_values:
                if x < max(0, x_position - delta) or x > min(1, x_position + delta):
                    insort(random_values, x)
                elif max(0, x_position - delta) <= x and x <= x_position + epsilon:
                    insort(random_values,
                                  (x - x_position - epsilon) / (1 + (epsilon / min(delta, x_position))) + x_position)
                else:
                    insort(random_values,
                                  (x - x_position - epsilon) / (
                                              1 - (epsilon / min(delta, 1 - x_position))) + x_position)

            argmax, max_value = get_max_ks(random_values)
            vn_argmax, vn_max = get_max_vn(random_values)
            ks_statistic = max_value
            vn_statistic = vn_max
            ln_statistic = max_value / sqrt(argmax * (1 - argmax))

            if ks_statistic > ks_critical_value:  # if test dismisses H_0
                res_ks[i] += 1 / m
            if vn_statistic > vn_critical_value:  # if test dismisses H_0
                res_vn[i] += 1 / m
            if ln_statistic > ln_critical_value:  # if test dismisses H_0
                res_ln[i] += 1 / m
    return epsilons, res_ks, res_vn, res_ln


if __name__ == '__main__':
    # some code
    n = 1
    m = 1
    alpha = 1.0
    epsilon_max= 1.0
    delta= 1.0
    aufloesung = 1
    fehlerposition = 1.0
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
    # some other code

hope this helps?!

@stuartarchibald stuartarchibald added the question Notes an issue as a question label Apr 13, 2020
@LostInDarkMath
Copy link
Author

Yes, it finally works! Thank you so much @stuartarchibald !

Some last question (which has nothing to do with this issue here, you can close this): The speed up isn't as great as I expected. Do you have some tips for me how I could make this specific piece of code faster?

@LostInDarkMath
Copy link
Author

LostInDarkMath commented Apr 13, 2020

I was happy too early.

Python crashed completly with your Code:
crash

I just copy & pasted your code and changed the parameters as follows:

if __name__ == '__main__':
    n = 200
    m = 10000
    alpha = 0.1
    epsilon_max = 0.1
    delta = 1.0
    aufloesung = 30
    fehlerposition = 0.5
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
    print(epsis)
    print(ks)
    print(vn)
    print(ln)

None of these print statements is executed correctly.

@stuartarchibald
Copy link
Contributor

@LostInDarkMath If the JIT is disabled and the code runs solely in Python and NumPy it does this:

$ NUMBA_DISABLE_JIT=1 python issue5547.py 
Traceback (most recent call last):
  File "issue5547.py", line 131, in <module>
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
  File "issue5547.py", line 117, in monte_carlo
    res_vn[i] += 1 / m
IndexError: index 201 is out of bounds for axis 0 with size 200

which suggests there's an out of bounds access which is probably manifesting as an access violation on your windows machine.

@stuartarchibald
Copy link
Contributor

@LostInDarkMath can we close this as resolved? Thanks!

@LostInDarkMath
Copy link
Author

LostInDarkMath commented Apr 13, 2020

Thanks! That was a bug in my code.

Yes, I'm happy now :)

@stuartarchibald
Copy link
Contributor

Great, thanks for reporting back... Closing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Notes an issue as a question
Projects
None yet
Development

No branches or pull requests

2 participants