Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bounds checking #4432

Merged
merged 39 commits into from Dec 18, 2019
Merged

Bounds checking #4432

merged 39 commits into from Dec 18, 2019

Conversation

@asmeurer
Copy link
Contributor

@asmeurer asmeurer commented Aug 9, 2019

This is still a work in progress, but feedback is welcome. I'm still new to LLVM so let me know if I should be generating the code in a better way.

I have changed the API of get_item_pointer[2] to include the context argument. This is required to raise an exception.

Still TODO:

  • Add tests
  • Add the flag to the public API (right now it is enabled by default for
    testing purposes).
  • Document the flag
  • Figure out why the parallel tests fail with bounds checking enabled
  • Figure out why the cuda tests fail with bounds checking enabled
  • Add a CI run with bounds checking globally enabled
  • See if missing broadcast errors are related to this #4432 (comment)
  • Fix memory leak in the tests
  • Add support for boundschecking fancy indexing

TODOs that should probably wait for a future PR:

  • Make the error message match object mode. This would require being more fancy in the way the exception is generated. For now, if NUMBA_FULL_TRACEBACK=1 is set, the index, shape, and axis are printed as a debug message.
  • Make the error message show the location in the code

I'll need help on how to do the last item.

There is also a boundcheck flag in some places in the code, which doesn't seem to do anything. I have named my flag boundscheck with an s, as that seemed better, but I don't particularly care if you decide another name is better.

asmeurer added 2 commits Aug 9, 2019
This is required to be able to raise exceptions, for instance, for bounds
checking.
So far we need to:

- Add tests
- Make the error message match object mode
- Add the flag to the public API (right now it is enabled by default for
  testing purposes)
numba/cgutils.py Outdated
with if_unlikely(builder, out_of_bounds_upper):
context.call_conv.return_user_exc(builder, IndexError, (msg,))
out_of_bounds_lower = builder.icmp_signed('<', ind, ind.type(0))
with if_unlikely(builder, out_of_bounds_lower):
Copy link
Contributor Author

@asmeurer asmeurer Aug 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I merge these two ifs? I didn't see how to do a boolean OR in the llvmlite documentation.

Copy link
Contributor

@stuartarchibald stuartarchibald Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You possibly want something like this:

            out_of_bounds_upper = builder.icmp_signed('>=', ind, dimlen)
            out_of_bounds_lower = builder.icmp_signed('<', ind, ind.type(0))
            predicate = builder.or_(out_of_bounds_upper, out_of_bounds_lower)
            with if_unlikely(builder, predicate):
                context.call_conv.return_user_exc(builder, IndexError, (msg,))

Docs: http://llvmlite.pydata.org/en/latest/user-guide/ir/ir-builder.html#llvmlite.ir.IRBuilder.or_

Copy link
Contributor Author

@asmeurer asmeurer Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It says or_ is bitwise OR (like | instead of ||). I don't know if that matters here. I mainly want to make sure I generate code that LLVM optimizes correctly (how can I check that?).

Copy link
Contributor

@stuartarchibald stuartarchibald Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed bitwise or, suggested that for same net effect, though it will likely generate an or op whereas a logical or would likely generate jump(s). To get something logical or-like (i.e. ||) it's is a bit like what you have already but the branches need nesting so that the second condition is only checked if the first passes, I'm assuming that's what you are thinking of?

You can check the LLVM IR with NUMBA_DUMP_LLVM=1 as an env var, or the .inspect_llvm method on the dispatcher. The assembly can be seen with the .inspect_asm method on the dispatcher. How were you hoping it to optimise?

Copy link
Contributor Author

@asmeurer asmeurer Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if the index is already known to be less than the dimension or nonnegative it should optimize out the if.

Copy link
Contributor

@stuartarchibald stuartarchibald Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading the LLVM correctly (and I'm not very good at reading LLVM, so I could be wrong), it saves the IndexError as a predefined constant. So that adds a bit of overhead to every function call (though it looks like it does it for every exception, so maybe that's not my problem to solve here). Also it implies that it would be more efficient to have only one return_user_exc here.

I'm not sure how this adds overhead to a function call, any chance you could please expand on this? Thanks.

Copy link
Contributor Author

@asmeurer asmeurer Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think under very specific conditions it might be possible to do this, but all the information would likely have to be constant at compile time. Also, any if from the pattern being added here will likely create a branch in a tight indexing loop and prevent vectorization.

So if I have something like

A = 0
for i in range(x.shape[0]):
    A += x[i]

it won't be able to deduce that 0 <= i < x.shape[0] and so the bounds checking can be optimized away?

I'm not sure how this adds overhead to a function call, any chance you could please expand on this? Thanks.

I think I misread the LLVM. The exceptions are saved as constant pickles on the functions, but are only unpickled when needed. So it adds a memory overhead to the function definition, but that's it. It's probably better to not have the exception there twice but maybe not a huge deal.

More importantly, since the exceptions are precomputed, I don't see how to raise an exception with a dynamic message, namely including the dimension in the error message like NumPy does. Would we need to add support for pickling a function that returns the exception with the given message? Or is this already something that can be done? If it's not easy to do this, I might skip it for this PR, if that's alright.

Copy link
Contributor

@stuartarchibald stuartarchibald Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think under very specific conditions it might be possible to do this, but all the information would likely have to be constant at compile time. Also, any if from the pattern being added here will likely create a branch in a tight indexing loop and prevent vectorization.

So if I have something like

A = 0
for i in range(x.shape[0]):
    A += x[i]

it won't be able to deduce that 0 <= i < x.shape[0] and so the bounds checking can be optimized away?

In theory, yes, but it depends on what code gets generated. I think that if an unsigned comparison operator were used in the bounds check then LLVM is more likely to be able to work it out, if a signed comparison is done then it won't. I just generated the assembly for:

  1. no boundscheck
  2. boundscheck with signed comparison
  3. boundscheck with unsigned comparison

Cases 1. and 3.are reasonably similar but 1. has a deeper loop unroll. 2. has no vectorisation because there's a test/jump in the middle of the loop.

I think more can be done in terms of identifying unsigned ranges and suspect that this is likely to help.

I'm not sure how this adds overhead to a function call, any chance you could please expand on this? Thanks.

I think I misread the LLVM. The exceptions are saved as constant pickles on the functions, but are only unpickled when needed. So it adds a memory overhead to the function definition, but that's it. It's probably better to not have the exception there twice but maybe not a huge deal.

More importantly, since the exceptions are precomputed, I don't see how to raise an exception with a dynamic message, namely including the dimension in the error message like NumPy does. Would we need to add support for pickling a function that returns the exception with the given message? Or is this already something that can be done? If it's not easy to do this, I might skip it for this PR, if that's alright.

Numba stores exception strings as compile time constants so you cannot raise an exception with a dynamically generated string. So yes, skipping it is fine :)

Copy link
Contributor

@stuartarchibald stuartarchibald Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made all range(x) calls unsigned in #4446 and was reminded of the amount of work needed to implement this correctly due to the NumPy casting rule (generalizing as):
uint* <op> int* -> float.

Copy link
Contributor Author

@asmeurer asmeurer Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numba stores exception strings as compile time constants so you cannot raise an exception with a dynamically generated string. So yes, skipping it is fine :)

I think you could serialize a function that creates the IndexError rather than the IndexError itself. Then modify return_user_exc so it can call the function to create the exception before raising it.

The thing I'm not sure about are the builder instructions for converting the index, shape, and axis into strings so they can be put in the message.

numba/cgutils.py Outdated Show resolved Hide resolved
numba/cgutils.py Outdated Show resolved Hide resolved
@stuartarchibald
Copy link
Contributor

@stuartarchibald stuartarchibald commented Aug 12, 2019

There is also a boundcheck flag in some places in the code, which doesn't seem to do anything. I have named my flag boundscheck with an s, as that seemed better, but I don't particularly care if you decide another name is better.

Agree, boundscheck makes sense, I also think renaming the current flag to that and reusing it if convenient would be good, else just remove the current flag as IIRC it's dead.

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Aug 12, 2019

I guess this isn't accounting for the numpy shape:

>>> import numba
>>> import numpy
>>> @numba.jit
... def test(x):
...     return x[:, 0]
>>> test(numpy.empty((0, 1)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index is out of bounds

This is the source of the test failures.

Is the numpy shape saved somewhere? If not, I guess we can just disable bounds checks if one of the shapes is 0. We can't get segfaults or nonsense memory values in that case. It mainly would help for type checking.

@stuartarchibald
Copy link
Contributor

@stuartarchibald stuartarchibald commented Aug 15, 2019

I guess this isn't accounting for the numpy shape:

>>> import numba
>>> import numpy
>>> @numba.jit
... def test(x):
...     return x[:, 0]
>>> test(numpy.empty((0, 1)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index is out of bounds

This is the source of the test failures.

Is the numpy shape saved somewhere? If not, I guess we can just disable bounds checks if one of the shapes is 0. We can't get segfaults or nonsense memory values in that case. It mainly would help for type checking.

The shape is an argument to the get_item_pointer2 function, I guess you can generate a branch based on that? I'm not sure that disabling bounds checking on arrays where one of the shape elements is 0 would be desirable, you can still access outside the allocated region and indeed the page, after all it's just a GEP that's being emitted. e.g. with no bounds check:

import numba
import numpy
@numba.jit
def test(x):
    big_index = 10000000000000000
    return x[big_index, big_index]

print(test(numpy.empty((0, 1))))

will likely segfault.


This little snippet might help you with debugging, it lets you print an iterable from the compiled code:

        def print_iterable(builder, name, fmt, iterable):
            printf(builder, "{}\n".format(name))
            for x in iterable:
                printf(builder, fmt, x)
            printf(builder, "\n")

        print_iterable(builder, "shape", "%d, ", shape)
        print_iterable(builder, "indices", "%d, ", indices)

if you paste that into the if boundscheck: branch body and run the failing example above you should see:

shape
0, 1, 
indices
0, 0, 
Traceback (most recent call last):
  File "test_4432.py", line 21, in <module>
    print(test(numpy.empty((0, 1))))
IndexError: index is out of bounds branch1

To accommodate the 0d case I suspect a:

builder.if_else(zero_d_predicate) as (then, otherwise):
  with then:
    <stuff>
  with otherwise:
    <stuff>

might be useful?

Hope this helps?

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Aug 15, 2019

Oh the problem isn't that it's removing the numpy shape as I was assuming. It's that bounds checking for an empty dimension shouldn't do the index >= dimension check.

Also regarding the error thing, I hadn't realized that it reuses the pickled exception if it is used multiple times. I had seen it twice in there but only because I had different error messages for the upper and lower out of bounds at the time. So I was worried that you would get 2n pickled exceptions saved to the compiled function for n array accesses, but that isn't the case.

Thanks for the debug suggestion. I didn't notice the printf function.

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Aug 15, 2019

Can we distinguish these two cases inside numba?

>>> a = np.empty((0, 1))
>>> a[:, 0]
array([], dtype=float64)
>>> a[0, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 0 is out of bounds for axis 0 with size 0

The above printf code shows the shape as (0, 1) and indices as (0, 0) for both.

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Aug 15, 2019

This code does not segfault for me (in numba master)

from numba import njit
@njit
def test(x):
    return x[:, 10000000000000000000]
import numpy as np
a = np.empty((0, 1))
test(a)

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Aug 15, 2019

I think what we have to do is to implement separate bounds checking in basic_indexing() in arrayobj.py (and possibly other places if they deal with slices as well). That function already does its own wraparound logic (it's the reason there is a flag for it in get_item_pointer, so it can avoid doing it twice), so it's reasonable for it to do bounds checking as well. As far as I can tell the input to get_item_pointer for a[:, 0] and a[0, 0] is identical. Only basic_indexing() knows when the index is a slice, and hence can be used on an empty dimension.

I don't actually understand how numpy handles shape-0 arrays. Does it just create a placeholder pointer?

In NumPy with a = np.empty((0, 1)), a[0, 0] raises IndexError, but a[:, 0]
does not.

It is impossible to distinguish these two cases in get_item_pointer, which
both want a pointer to the "start" of the empty array. So I have factored out
the bounds checking into a helper function, and called it separately in
basic_indexing() (the same as is already done with wraparound correction).
That way, we turn of bounds checking for slice indices on shape 0 axes. Note
that slice indices are already not bounds checked from above, e.g., a[0:100]
works even if a has fewer than 100 elements. It works this way even on Python
lists.
@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Aug 15, 2019

I've implemented that in the latest commit.

asmeurer added 5 commits Aug 15, 2019
Python/NumPy doesn't do any bounds checking on slices.
This is only temporary, until we can include it in the error message proper. I
have enabled it when NUMBA_FULL_TRACEBACKS is set (I don't want to add a new
debug flag for it since it should hopefully be removed soon).
@stuartarchibald
Copy link
Contributor

@stuartarchibald stuartarchibald commented Aug 16, 2019

>>> a = np.empty((0, 1))
>>> a[:, 0]
array([], dtype=float64)
>>> a[0, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 0 is out of bounds for axis 0 with size 0

By the time we get to GEP I think it's probably too late, the contextual slice info is lost. I think this is because a slice looks like accessing index 0, i.e. "where does the data start for a slice == what is index 0". This https://github.com/numba/numba/pull/4432/files#diff-36c9ab8fa31f756e0da52905db0a95deR717-R722 is just working out where in data block the relevant data is.

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Aug 16, 2019

I figured out that issue (see my most recent comments/commits).

Right now I'm trying to figure out the test failures, which are presently failing because bounds checking is still enabled by default. It looks like there are some issues with both cuda and parallel.

Here's an example:

from numba import njit
import numpy as np

@njit(parallel=True)
def test(X):
    X[:0] += 1
    return X

test(np.ones(10))

Enabling the debug output which I just added, you get

debug: IndexError: index 1 is out of bounds for axis 0 with size 0
debug: IndexError: index 0 is out of bounds for axis 0 with size 0
debug: IndexError: index 7 is out of bounds for axis 0 with size 0
debug: IndexError: index 5 is out of bounds for axis 0 with size 0
debug: IndexError: index 9 is out of bounds for axis 0 with size 0
debug: IndexError: index 4 is out of bounds for axis 0 with size 0
debug: IndexError: index 3 is out of bounds for axis 0 with size 0
debug: IndexError: index 6 is out of bounds for axis 0 with size 0
debug: IndexError: index 8 is out of bounds for axis 0 with size 0
debug: IndexError: index 2 is out of bounds for axis 0 with size 0
IndexError: index is out of bounds

I guess parallel takes slices and converts them into explicit indices. But bounds checking should not be enabled for slices, so I need to figure out where that is happening and how to disable it for that code path, similar to what I did for the single core slice code path.

I haven't looked at the cuda issues yet, but I think they are similar.

As an aside, it could be a good idea to have a CI build that runs the test suite with bounds checking enabled. I already found one function in numba that failed with it enabled (see 0b95b7c).

@stuartarchibald
Copy link
Contributor

@stuartarchibald stuartarchibald commented Aug 16, 2019

I figured out that issue (see my most recent comments/commits).

Right now I'm trying to figure out the test failures, which are presently failing because bounds checking is still enabled by default. It looks like there are some issues with both cuda and parallel.

Here's an example:

from numba import njit
import numpy as np

@njit(parallel=True)
def test(X):
    X[:0] += 1
    return X

test(np.ones(10))

Enabling the debug output which I just added, you get

debug: IndexError: index 1 is out of bounds for axis 0 with size 0
debug: IndexError: index 0 is out of bounds for axis 0 with size 0
debug: IndexError: index 7 is out of bounds for axis 0 with size 0
debug: IndexError: index 5 is out of bounds for axis 0 with size 0
debug: IndexError: index 9 is out of bounds for axis 0 with size 0
debug: IndexError: index 4 is out of bounds for axis 0 with size 0
debug: IndexError: index 3 is out of bounds for axis 0 with size 0
debug: IndexError: index 6 is out of bounds for axis 0 with size 0
debug: IndexError: index 8 is out of bounds for axis 0 with size 0
debug: IndexError: index 2 is out of bounds for axis 0 with size 0
IndexError: index is out of bounds

I guess parallel takes slices and converts them into explicit indices. But bounds checking should not be enabled for slices, so I need to figure out where that is happening and how to disable it for that code path, similar to what I did for the single core slice code path.

Were I to guess, this is where I'd start looking. github.com/numba/numba/blob/f629736997eaf83040caecd1fea7322577b7fd89/numba/parfor.py#L1921-L1934
Try here:

numba/numba/parfor.py

Lines 2214 to 2220 in f629736

if isinstance(value_typ, types.npytypes.Array):
value_var = ir.Var(scope, mk_unique_var("$value_var"), loc)
self.typemap[value_var.name] = value_typ.dtype
getitem_call = ir.Expr.getitem(value, index_var, loc)
self.calltypes[getitem_call] = signature(
value_typ.dtype, value_typ, index_var_typ)
true_block.body.append(ir.Assign(getitem_call, value_var, loc))

I haven't looked at the cuda issues yet, but I think they are similar.

As an aside, it could be a good idea to have a CI build that runs the test suite with bounds checking enabled. I already found one function in numba that failed with it enabled (see 0b95b7c).

If we have the compute capacity then I think this makes sense.

@stuartarchibald
Copy link
Contributor

@stuartarchibald stuartarchibald commented Aug 29, 2019

Note to self: need to find a way to get an ir.Loc to the site the exception is being created else the exception will correctly report out of bounds, but the site it will be raised from will be the jitted function, not the corresponding source line.

@stuartarchibald
Copy link
Contributor

@stuartarchibald stuartarchibald commented Sep 3, 2019

@DrTodd13 This is the issue with parfors and bounds check, if you have any ideas they'd be appreciated, thanks #4432 (comment) !

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Sep 3, 2019

The bounds check should only happen when the getitem is physically called, unless I am misunderstanding how things work. Is the parfor generated from the slice doing unnecessary redundant getitems? I wasn't able yet to debug why it happens (any suggestions for how to debug this sort of thing, beyond "learn how to read LLVM"?)

And if it is and they're not easy to remove, is there some way to annotate the getitem so that the get_item_pointer can know it shouldn't be bounds checked? The parfor logic happens in a completely different stage from the get_item_pointer.

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Nov 26, 2019

The parfor slice thing is the only issue left here that I'm aware of. I still need to run the full test suite with bounds checking enable to make sure there aren't other failures I missed.

As an aside, one test in the parfor tests confirmed that bounds checking does indeed break vectorization.

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Nov 26, 2019

I just noticed that the test_parfor_slice19 bug is fixed by #4648. I fixed all other known issues on this PR. Please put it in the review queue. I guess we should wait until #4648 is merged before merging this.

When testing this, the tests should be run with NUMBA_BOUNDSCHECK=1 to ensure there aren't any latent out of bounds issues in the library functions.

numba/cgutils.py Outdated

def do_boundscheck(context, builder, ind, dimlen, axis=None):
# Boundschecking is always disabled for CUDA
from .cuda.target import CUDATargetContext
Copy link
Contributor

@sklam sklam Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. Can we rely on turning off boundchecking at the numba/cuda/compiler.py.

And it is likely causing other errors because the cuda subpackage is auto-initializing the CUDA Driver.

Copy link
Contributor Author

@asmeurer asmeurer Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I have it this way is because I have it so that the environment variable overrides the flag.

Copy link
Contributor

@sklam sklam Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import is causing error like: https://dev.azure.com/numba/numba/_build/results?buildId=3349&view=logs&jobId=eb149424-352d-52d9-b16d-aeffaad6858a&taskId=cdac5993-002e-5764-1ca7-e646ebe78687&lineStart=1728&lineEnd=1760&colStart=1&colEnd=99

(^ the link highlights in the log but doesn't seem to scroll to it)


Traceback (most recent call last):
  File "/home/vsts/work/1/s/numba/errors.py", line 717, in new_error_context
    yield
  File "/home/vsts/work/1/s/numba/lowering.py", line 260, in lower_block
    self.lower_inst(inst)
  File "/home/vsts/work/1/s/numba/lowering.py", line 303, in lower_inst
    val = self.lower_assign(ty, inst)
  File "/home/vsts/work/1/s/numba/lowering.py", line 467, in lower_assign
    return self.lower_expr(ty, value)
  File "/home/vsts/work/1/s/numba/lowering.py", line 1055, in lower_expr
    expr.index_var, signature)
  File "/home/vsts/work/1/s/numba/lowering.py", line 612, in lower_getitem
    res = impl(self.builder, castvals)
  File "/home/vsts/work/1/s/numba/targets/base.py", line 1132, in __call__
    res = self._imp(self._context, builder, self._sig, args, loc=loc)
  File "/home/vsts/work/1/s/numba/targets/base.py", line 1157, in wrapper
    return fn(*args, **kwargs)
  File "/home/vsts/work/1/s/numba/targets/arrayobj.py", line 429, in getitem_arraynd_intp
    aryty, ary, (idxty,), (idx,))
  File "/home/vsts/work/1/s/numba/targets/arrayobj.py", line 403, in _getitem_array_generic
    basic_indexing(context, builder, aryty, ary, index_types, indices)
  File "/home/vsts/work/1/s/numba/targets/arrayobj.py", line 359, in basic_indexing
    cgutils.do_boundscheck(context, builder, ind, shapes[ax], ax)
  File "/home/vsts/work/1/s/numba/cgutils.py", line 674, in do_boundscheck
    from .cuda.target import CUDATargetContext
  File "/home/vsts/work/1/s/numba/cuda/target.py", line 17, in <module>
    from . import codegen, nvvmutils
  File "/home/vsts/work/1/s/numba/cuda/nvvmutils.py", line 5, in <module>
    from .api import current_context
  File "/home/vsts/work/1/s/numba/cuda/api.py", line 298, in <module>
    event_elapsed_time = driver.event_elapsed_time
AttributeError: module 'numba.cuda.simulator.cudadrv.driver' has no attribute 'event_elapsed_time'

Copy link
Contributor Author

@asmeurer asmeurer Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I didn't realize that the import could fail like that. What would be the best way to fix this?

Copy link
Contributor Author

@asmeurer asmeurer Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some attribute to the context that lets us detect that it is CUDA without importing it? For instance, context.name. Or I can just check type(context).__name__.

Copy link
Contributor

@sklam sklam Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this function should do the check at all. If we turn off boundscheck in cuda context (#4432 (comment)) and make sure the array implementation follow the context setting, we can avoid the check here.

Here's what I think the changes should look like:
https://gist.github.com/sklam/38a31d9a5e9124dab7a77def216e0a4c
(I haven't fully tested it)

Copy link
Contributor Author

@asmeurer asmeurer Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check it out. Actually this reminds me that I was going to add a test that the NUMBA_BOUNDSCHECK environment variable overrides the flag correctly.

Copy link
Contributor Author

@asmeurer asmeurer Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to modify it so that the NUMBA_BOUNDSCHECK environment variable still works. I did this by making enable_boundscheck a property on the context class, so that it can use the config.BOUNDSCHECK as a default and so it can be modified at runtime by the override_env_config context manager for testing.

@@ -37,7 +37,7 @@ def compile_cuda(pyfunc, return_type, args, debug, inline):
flags.set('no_compile')
flags.set('no_cpython_wrapper')
if debug:
flags.set('boundcheck')
flags.set('boundscheck')
Copy link
Contributor

@sklam sklam Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should let this be off and maybe check that the flag is never turned on.

Copy link
Contributor Author

@asmeurer asmeurer Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already raises an error if the flag is set (there is a test for this).

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Dec 4, 2019

I've addressed the review comments here.

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Dec 4, 2019

This issue shouldn't block this pull request, but regarding getting the location of the exception, it seems that the most straightforward way is to store it on the context. Here is an example implementation:

diff --git a/numba/cgutils.py b/numba/cgutils.py
index cdce25131..c89a70565 100644
--- a/numba/cgutils.py
+++ b/numba/cgutils.py
@@ -691,12 +691,12 @@ def do_boundscheck(context, builder, ind, dimlen, axis=None):
     with if_unlikely(builder, out_of_bounds_upper):
         if config.FULL_TRACEBACKS:
             _dbg()
-        context.call_conv.return_user_exc(builder, IndexError, (msg,))
+        context.call_conv.return_user_exc(builder, IndexError, (msg,), loc=context.loc)
     out_of_bounds_lower = builder.icmp_signed('<', ind, ind.type(0))
     with if_unlikely(builder, out_of_bounds_lower):
         if config.FULL_TRACEBACKS:
             _dbg()
-        context.call_conv.return_user_exc(builder, IndexError, (msg,))
+        context.call_conv.return_user_exc(builder, IndexError, (msg,), loc=context.loc)


 def get_item_pointer2(context, builder, data, shape, strides, layout, inds,
diff --git a/numba/lowering.py b/numba/lowering.py
index 3fb48c03f..824f61208 100644
--- a/numba/lowering.py
+++ b/numba/lowering.py
@@ -254,6 +254,7 @@ class BaseLower(object):
         self.pre_block(block)
         for inst in block.body:
             self.loc = inst.loc
+            self.context.loc = inst.loc
             defaulterrcls = partial(LoweringError, loc=self.loc)
             with new_error_context('lowering "{inst}" at {loc}', inst=inst,
                                    loc=self.loc, errcls_=defaulterrcls):
diff --git a/numba/targets/base.py b/numba/targets/base.py
index a7b52d69e..c89b1e56c 100644
--- a/numba/targets/base.py
+++ b/numba/targets/base.py
@@ -254,6 +254,8 @@ class BaseContext(object):

         self._boundscheck = False

+        self.loc = None
+
         self.data_model_manager = datamodel.default_manager

         # Initialize
from numba import njit
import numpy as np
@njit(boundscheck=True)
def test(x):
    return x[1]
test(np.array([1]))
$ python test-loc.py
Traceback (most recent call last):
  File "test-loc.py", line 6, in <module>
    test(np.array([1]))
  File "test-loc.py", line 5, in test
    return x[1]
IndexError: index is out of bounds

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Dec 5, 2019

This issue shouldn't block this pull request, but regarding getting the location of the exception, it seems that the most straightforward way is to store it on the context. Here is an example implementation:

I ran the tests with this and there were no failures. Should I push it up here?

@sklam
Copy link
Contributor

@sklam sklam commented Dec 10, 2019

RE: #4432 (comment)
Let's keep it a different PR.

@asmeurer
Copy link
Contributor Author

@asmeurer asmeurer commented Dec 10, 2019

Will do. I also have some thoughts on getting the error message to include a more complete error message (which I haven't implemented yet). I'll start a new PR once this one is merged.

@seibert
Copy link
Contributor

@seibert seibert commented Dec 17, 2019

running internal CI: numba_77

@seibert
Copy link
Contributor

@seibert seibert commented Dec 18, 2019

this has passed internal CI

sklam
sklam approved these changes Dec 18, 2019
Copy link
Contributor

@sklam sklam left a comment

Great work! This feature will help user avoid segfaults.

@seibert seibert merged commit bcb94db into numba:master Dec 18, 2019
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants