Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numba does not error out when an if statement does not account for all conditions #7327

Open
e-dvl opened this issue Aug 20, 2021 · 8 comments

Comments

@e-dvl
Copy link

e-dvl commented Aug 20, 2021

  • [Yes] I have tried using the latest released version of Numba (most recent is
    visible in the change log (https://github.com/numba/numba/blob/master/CHANGE_LOG).

  • [Yes] I have included a self contained code sample to reproduce the problem.
    i.e. it's possible to run as 'python bug.py'.

I discovered that I had a hanging if statement which, when Numba was disabled, encountered a condition that should have been covered by an 'else' part of the conditional but wasn't, causing the function to fail. Numba, however, never threw an error, which led me to investigate why.
It turns out that Numba will implicitly assign and return the variable. I tried searching to see if this is expected/designed behavior but couldn't find anything.

I want to show that a variable will be successfully returned even in the event of a hanging if statement (incompletely enumerated conditions)

import numba
from numba import njit, types
import numpy as np

@njit(parallel=False)
def reproduce_bug_bool(test_condition_bool):

    if test_condition_bool:
        test_var = True
        
    return test_var

@njit(parallel=False)
def reproduce_bug_float(test_condition_bool):

    if test_condition_bool:
        test_var = 10.0
        
    return test_var

@njit(parallel=False)
def reproduce_bug_int(test_condition_bool):

    if test_condition_bool:
        test_var = 10
        
    return test_var

num_trials = 10
test_cases = np.asarray([True, False, True, False, True, False, True, False, True, False])

for t in range(num_trials):
    print("Test case: ", test_cases[t])
    returned_bool = reproduce_bug_bool(test_cases[t])
    returned_float = reproduce_bug_float(test_cases[t])
    returned_int = reproduce_bug_int(test_cases[t])
    
    print("Returned bool var: ", returned_bool)
    print("Returned float var: ", returned_float)
    print("Returned int var: ", returned_int)

All results seem both arbitrary and dangerous; I would expect them to throw errors.
In test case 1 (bool), the unaccounted-for condition implicitly takes on the same value as in the accounted-for condition (True). (I originally ran my code in version 0.46, and the value came back as False, which seemed dangerous but plausibly correct.)
In test case 2 (float) the unaccounted-for condition implicitly takes on the value of 0.0, rather arbitrarily.
In test case 3 (int), the unaccounted-for condition implicitly takes on the same value as in the accounted-for condition (10).

@e-dvl e-dvl closed this as completed Aug 20, 2021
@e-dvl e-dvl reopened this Aug 20, 2021
@gmarkall
Copy link
Member

I can reproduce this behaviour. I'd expected to find a duplicate issue but haven't found one so far.

@gmarkall gmarkall added bug - incorrect behavior Bugs: incorrect behavior needtriage labels Aug 20, 2021
@e-dvl
Copy link
Author

e-dvl commented Aug 20, 2021

I was rather surprised not to find this already as well!

@vpagano1
Copy link

I can reproduce the strange behavior as well. I would like to contribute by writing a pull request to fix this problem directly. I believe @sklam is in charge with exception handling... where might the issue be caused?

@gmarkall
Copy link
Member

I can reproduce the strange behavior as well. I would like to contribute by writing a pull request to fix this problem directly. I believe @sklam is in charge with exception handling... where might the issue be caused?

Unfortunately, determining where in the compilation process that this issue is caused will be the first step in writing a PR to fix the issue.

@gmarkall gmarkall added feature_request and removed needtriage bug - incorrect behavior Bugs: incorrect behavior labels Aug 23, 2021
@vpagano1
Copy link

I can reproduce the strange behavior as well. I would like to contribute by writing a pull request to fix this problem directly. I believe @sklam is in charge with exception handling... where might the issue be caused?

Unfortunately, determining where in the compilation process that this issue is caused will be the first step in writing a PR to fix the issue.

Right. So if you try returning test_var without numba an error is thrown:
UnboundLocalError: local variable 'test_var' referenced before assignment.
I will look into the python documentation for the code snippet where this appears, and hopefully the implementation will be similar.

@sklam
Copy link
Member

sklam commented Aug 23, 2021

Tracking unbounded variables

@vpagano1, the handling of unbound variable will be very different from cpython. I tried to find the issue or discussion we had on how to track unbounded variables but I cannot currently find the record. I will try to explain the challenges in tracking unbounded variables in Numba.

In CPython, local variables are kept in the active frame as a list of PyObject*. (See LOAD_FAST, which is the most common case). The PyObject* can be NULL to indicate that it is never defined.

In Numba, local variables are stored as "plain-old data" like in a C program. An int is usually just 8-bytes. A ndarray is a C-structure. However, they are not nullable. To track unbounded variables, the compiler will need to insert additional storage for the "liveness" marker.

Currently, Numba does not track the liveness at runtime. It is always zero-initializing all variables at the start of the function.

Unexpected results

With the zero-initialization, I would expect the result to be stable. If the branch is not taken, all bytes in that variable should be zeros. Somehow, I think it is doing something different. For instance, the boolean function has the following in the cpython wrapper:

  %.33 = call i32 @"_ZN8__main__22reproduce_bug_bool$241Eb"(i8* nonnull %.28, { i8*, i32, i8* }** undef, i8 %.32) #2
  %.58 = call i8* @PyBool_FromLong(i64 1)   ; constant one!?!?!?!
  ret i8* %.58

but the lowered function body has:

define i32 @"_ZN8__main__22reproduce_bug_bool$241Eb"(i8* noalias nocapture %retptr, { i8*, i32, i8* }** noalias nocapture readnone %excinfo, i8 %arg.test_condition_bool) local_unnamed_addr #0 {
entry:
  %.7.not = icmp ne i8 %arg.test_condition_bool, 0
  %.22 = zext i1 %.7.not to i8
  store i8 %.22, i8* %retptr, align 1
  ret i32 0
}

The actual function is correctly computing base on the input. But the wrapper is bypassing the returned value and always returning True. I think we have another bug.

@sklam
Copy link
Member

sklam commented Aug 23, 2021

I found the problem for the unexpected result. Type inference (or constant propagation) is incorrectly assuming the return type to be a constant because there is no other assignment.

Following is output from reproduce_bug_bool.inspect_types()

    # --- LINE 11 ---
    #   jump 10
    # label 10
    #   del $4pred
    #   $12return_value.1 = cast(value=test_var)  :: Literal[bool](True)
    #   del test_var
    #   return $12return_value.1

    return test_var

@dlee992
Copy link
Contributor

dlee992 commented Apr 16, 2024

Liveness analysis is a nice feature for numba.

Shall we consider add a liveness analysis compilation pass for numba? I feel there should exist some mature liveness analysis algorithms. Perhaps we can choose one to implement.

I did meet some segfault issues at runtime, whose root cause is UnboundedError/NamedError in this test case.
I used gdb_init and etc to find the root cause, but it's much slower than if numba directly throws an error during compilation.

Updates:
I found a doc: https://numba.readthedocs.io/en/stable/developer/live_variable_analysis.html

Looks like numba has sth related to this feature in #1611:

Provide warning for undefined variables: live_map for the entry block will contain all required variables. Subtracting that from the available arguments gives all potentially undefined variables. (With "potentially", I mean some variables that are defined only if a particular path is visited; and is undefined in another. raising_usecase1 and raising_usecase2 in test_obj_lifetime.py are examples.)

AH, just found the phi-node in numba SSA has already known some branches are undefined.

y.1 = phi(incoming_values=[Undefined, Var(y, test_name_error.py:14)], incoming_blocks=[0, 6]) ['y', 'y.1']

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants