numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Type of variable 'argmax' cannot be determined #5547

LostInDarkMath · 2020-04-10T19:50:34Z

Hi,
I try to speed up my python code by using numba. But it fails with the following error message:

Traceback (most recent call last):
  File "E:/Studium/Masterarbeit/Masterarbeit/code/fast_simulation.py", line 152, in <module>
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
  File "C:\Python\lib\site-packages\numba\dispatcher.py", line 401, in _compile_for_args
    error_rewrite(e, 'typing')
  File "C:\Python\lib\site-packages\numba\dispatcher.py", line 344, in error_rewrite
    reraise(type(e), e, None)
  File "C:\Python\lib\site-packages\numba\six.py", line 668, in reraise
    raise value.with_traceback(tb)
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Type of variable 'argmax' cannot be determined, operation: $300unpack_sequence.5, location: E:/Studium/Masterarbeit/Masterarbeit/code/fast_simulation.py (126)

File "fast_simulation.py", line 126:
def monte_carlo(n: int, m: int, alpha: float, epsilon_max: float, delta: float, aufloesung: int, x_position: float):
    <source elided>

            argmax, max_value = get_max_ks(random_values)

My code is

@njit
def insort(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted.
    If x is already in a, insert it to the right of the rightmost x.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid+1
    a.insert(lo, x)

@njit
def get_max_ks(data: list) -> (float, float):
    def f(x, data):
        return sqrt(len(data)) * abs(np.searchsorted(data, x, side='right') / n - x)

    max_value = -1000
    argmax = 0.0
    epsilon = 1 / (n * 10 ** 6)
    for i in range(n):
        data_i = data[i]
        fx1 = f(data_i, data)
        fx2 = f(data_i + epsilon, data)
        fx3 = f(data_i - epsilon, data)
        if fx1 > max_value:
            max_value = fx1
            argmax = data_i
        if fx2 > max_value:
            max_value = fx2
            argmax = data_i + epsilon
        if fx3 > max_value:
            max_value = fx3
            argmax = data_i - epsilon
    return argmax, max_value

@njit
def get_max_vn(data: list) -> (float, float):
    def g(x, data):
        return sqrt(len(data)) * abs(np.searchsorted(data, x, side='right') / n - x) / sqrt(
            x * (1 - x))

    max_value = -1000
    argmax = 0.0
    epsilon = 1 / (n * 10 ** 6)
    for i in range(n):
        data_i = data[i]
        fx1 = g(data_i, data)
        fx2 = g(data_i + epsilon, data)
        fx3 = g(data_i - epsilon, data)
        if fx1 > max_value:
            max_value = fx1
            argmax = data_i
        if fx2 > max_value:
            max_value = fx2
            argmax = data_i + epsilon
        if fx3 > max_value:
            max_value = fx3
            argmax = data_i - epsilon
    return argmax, max_value


@njit
def get_critical_value_vn(alpha: float, n: int) -> float:
    loglogn = log(log(n))
    an = sqrt(2 * loglogn)
    dn = 2 * loglogn + 0.5 * log(loglogn) - 0.5 * log(pi)
    return (dn - log(-0.5 * log(1 - alpha))) / an


@njit
def monte_carlo(n: int, m: int, alpha: float, epsilon_max: float, delta: float, aufloesung: int, x_position: float):
    epsilons = np.linspace(min(0.0, epsilon_max), max(0.0, epsilon_max), aufloesung)
    res_ks = np.zeros(n)
    res_vn = np.zeros(n)
    res_ln = np.zeros(n)
    ks_critical_value = 1.2238478702170836  # TODO
    vn_critical_value = get_critical_value_vn(alpha, n)
    ln_critical_value = 2.7490859400901955  # TODO

    for epsilon in epsilons:
        for i in range(m):
            uniform_distributed_values = np.random.uniform(0.0, 1.0, n)
            random_values = []
            for x in uniform_distributed_values:
                if x < max(0, x_position - delta) or x > min(1, x_position + delta):
                    insort(random_values, x)
                elif max(0, x_position - delta) <= x and x <= x_position + epsilon:
                    insort(random_values,
                                  (x - x_position - epsilon) / (1 + (epsilon / min(delta, x_position))) + x_position)
                else:
                    insort(random_values,
                                  (x - x_position - epsilon) / (
                                              1 - (epsilon / min(delta, 1 - x_position))) + x_position)

            argmax, max_value = get_max_ks(random_values)
            vn_argmax, vn_max = get_max_vn(random_values)
            ks_statistic = max_value
            vn_statistic = vn_max
            ln_statistic = max_value / sqrt(argmax * (1 - argmax))

            if ks_statistic > ks_critical_value:  # if test dismisses H_0
                res_ks[i] += 1 / m
            if vn_statistic > vn_critical_value:  # if test dismisses H_0
                res_vn[i] += 1 / m
            if ln_statistic > ln_critical_value:  # if test dismisses H_0
                res_ln[i] += 1 / m
    return epsilons, res_ks, res_vn, res_ln


if __name__ == '__main__':
    # some code
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
    # some other code

How I can fix this? I don't understand the error message.
If it was not right to open aan issue for my problem, I apologize.
I didn't know how to help myself anymore.

The text was updated successfully, but these errors were encountered:

stuartarchibald · 2020-04-13T08:47:16Z

Thanks for the report. The error message is trying to tell you that the "type" of the variable "argmax" cannot be worked out from the code in its present state. I think it is essentially a convoluted variation of this problem http://numba.pydata.org/numba-doc/dev/user/troubleshoot.html#my-code-has-an-untyped-list-problem

In the code, random_values is declared as an empty list, and then there is no use of it in the scope that would hint at the type of the elements in the list, so the type inference algorithms in numba cannot work out what the type of the data in it will be when it goes into get_max_ks. get_max_ks then uses the items of data for computing argmax, but again, numba doesn't know what the type of data is so argmax ends up being of unknown type and this is what eventually breaks the type inference alg.

The fix is to use a typed.List container, these are a good replacement for reflected lists, http://numba.pydata.org/numba-doc/dev/reference/pysupported.html#typed-list and also permit direct specification of the type of the data the list will hold.

Here's some fixed code:

from numba import njit, types, typed
import numpy as np
from math import sqrt, log, pi

@njit
def insort(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted.
    If x is already in a, insert it to the right of the rightmost x.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """

    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid+1
    a.insert(lo, x)

@njit
def get_max_ks(data: list) -> (float, float):
    def f(x, data):
        return sqrt(len(data)) * abs(np.searchsorted(data, x, side='right') / n - x)

    max_value = -1000
    argmax = 0.0
    epsilon = 1 / (n * 10 ** 6)
    for i in range(n):
        data_i = data[i]
        fx1 = f(data_i, data)
        fx2 = f(data_i + epsilon, data)
        fx3 = f(data_i - epsilon, data)
        if fx1 > max_value:
            max_value = fx1
            argmax = data_i
        if fx2 > max_value:
            max_value = fx2
            argmax = data_i + epsilon
        if fx3 > max_value:
            max_value = fx3
            argmax = data_i - epsilon
    return argmax, max_value

@njit
def get_max_vn(data: list) -> (float, float):
    def g(x, data):
        return sqrt(len(data)) * abs(np.searchsorted(data, x, side='right') / n - x) / sqrt(
            x * (1 - x))

    max_value = -1000
    argmax = 0.0
    epsilon = 1 / (n * 10 ** 6)
    for i in range(n):
        data_i = data[i]
        fx1 = g(data_i, data)
        fx2 = g(data_i + epsilon, data)
        fx3 = g(data_i - epsilon, data)
        if fx1 > max_value:
            max_value = fx1
            argmax = data_i
        if fx2 > max_value:
            max_value = fx2
            argmax = data_i + epsilon
        if fx3 > max_value:
            max_value = fx3
            argmax = data_i - epsilon
    return argmax, max_value


@njit
def get_critical_value_vn(alpha: float, n: int) -> float:
    loglogn = log(log(n))
    an = sqrt(2 * loglogn)
    dn = 2 * loglogn + 0.5 * log(loglogn) - 0.5 * log(pi)
    return (dn - log(-0.5 * log(1 - alpha))) / an


@njit
def monte_carlo(n: int, m: int, alpha: float, epsilon_max: float, delta: float, aufloesung: int, x_position: float):
    epsilons = np.linspace(min(0.0, epsilon_max), max(0.0, epsilon_max), aufloesung)
    res_ks = np.zeros(n)
    res_vn = np.zeros(n)
    res_ln = np.zeros(n)
    ks_critical_value = 1.2238478702170836  # TODO
    vn_critical_value = get_critical_value_vn(alpha, n)
    ln_critical_value = 2.7490859400901955  # TODO

    for epsilon in epsilons:
        for i in range(m):
            uniform_distributed_values = np.random.uniform(0.0, 1.0, n)
            random_values = typed.List.empty_list(types.float64)
            # ^---------------- Use a typed list!
            for x in uniform_distributed_values:
                if x < max(0, x_position - delta) or x > min(1, x_position + delta):
                    insort(random_values, x)
                elif max(0, x_position - delta) <= x and x <= x_position + epsilon:
                    insort(random_values,
                                  (x - x_position - epsilon) / (1 + (epsilon / min(delta, x_position))) + x_position)
                else:
                    insort(random_values,
                                  (x - x_position - epsilon) / (
                                              1 - (epsilon / min(delta, 1 - x_position))) + x_position)

            argmax, max_value = get_max_ks(random_values)
            vn_argmax, vn_max = get_max_vn(random_values)
            ks_statistic = max_value
            vn_statistic = vn_max
            ln_statistic = max_value / sqrt(argmax * (1 - argmax))

            if ks_statistic > ks_critical_value:  # if test dismisses H_0
                res_ks[i] += 1 / m
            if vn_statistic > vn_critical_value:  # if test dismisses H_0
                res_vn[i] += 1 / m
            if ln_statistic > ln_critical_value:  # if test dismisses H_0
                res_ln[i] += 1 / m
    return epsilons, res_ks, res_vn, res_ln


if __name__ == '__main__':
    # some code
    n = 1
    m = 1
    alpha = 1.0
    epsilon_max= 1.0
    delta= 1.0
    aufloesung = 1
    fehlerposition = 1.0
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
    # some other code

hope this helps?!

LostInDarkMath · 2020-04-13T09:32:27Z

Yes, it finally works! Thank you so much @stuartarchibald !

Some last question (which has nothing to do with this issue here, you can close this): The speed up isn't as great as I expected. Do you have some tips for me how I could make this specific piece of code faster?

LostInDarkMath · 2020-04-13T09:46:48Z

I was happy too early.

Python crashed completly with your Code:

I just copy & pasted your code and changed the parameters as follows:

if __name__ == '__main__':
    n = 200
    m = 10000
    alpha = 0.1
    epsilon_max = 0.1
    delta = 1.0
    aufloesung = 30
    fehlerposition = 0.5
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
    print(epsis)
    print(ks)
    print(vn)
    print(ln)

None of these print statements is executed correctly.

stuartarchibald · 2020-04-13T09:58:27Z

@LostInDarkMath If the JIT is disabled and the code runs solely in Python and NumPy it does this:

$ NUMBA_DISABLE_JIT=1 python issue5547.py 
Traceback (most recent call last):
  File "issue5547.py", line 131, in <module>
    epsis, ks, vn, ln = monte_carlo(n, m, alpha, epsilon_max, delta, aufloesung, fehlerposition)
  File "issue5547.py", line 117, in monte_carlo
    res_vn[i] += 1 / m
IndexError: index 201 is out of bounds for axis 0 with size 200

which suggests there's an out of bounds access which is probably manifesting as an access violation on your windows machine.

stuartarchibald · 2020-04-13T11:00:37Z

@LostInDarkMath can we close this as resolved? Thanks!

LostInDarkMath · 2020-04-13T11:01:02Z

Thanks! That was a bug in my code.

Yes, I'm happy now :)

stuartarchibald · 2020-04-13T11:03:58Z

Great, thanks for reporting back... Closing!

stuartarchibald added the question Notes an issue as a question label Apr 13, 2020

stuartarchibald closed this as completed Apr 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Type of variable 'argmax' cannot be determined #5547

numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Type of variable 'argmax' cannot be determined #5547

LostInDarkMath commented Apr 10, 2020

stuartarchibald commented Apr 13, 2020

LostInDarkMath commented Apr 13, 2020

LostInDarkMath commented Apr 13, 2020 •

edited

stuartarchibald commented Apr 13, 2020

stuartarchibald commented Apr 13, 2020

LostInDarkMath commented Apr 13, 2020 •

edited

stuartarchibald commented Apr 13, 2020

numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Type of variable 'argmax' cannot be determined #5547

numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Type of variable 'argmax' cannot be determined #5547

Comments

LostInDarkMath commented Apr 10, 2020

stuartarchibald commented Apr 13, 2020

LostInDarkMath commented Apr 13, 2020

LostInDarkMath commented Apr 13, 2020 • edited

stuartarchibald commented Apr 13, 2020

stuartarchibald commented Apr 13, 2020

LostInDarkMath commented Apr 13, 2020 • edited

stuartarchibald commented Apr 13, 2020

LostInDarkMath commented Apr 13, 2020 •

edited

LostInDarkMath commented Apr 13, 2020 •

edited