## Is this just magic?  What is Numba doing to make code run quickly?

Let's define a trivial example function.

In [1]:
from numba import jit

In [2]:
@jit
def add(a, b):
    return a + b

In [3]:
add(1, 1)

2

Numba analyzes the function's Python bytecode and translates it into its own intermediate representation (IR), inferring a type for every value along the way. To view this typed IR, first run (and thereby compile) `add`, then call its `inspect_types` method.
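Numba's starting point is the ordinary bytecode that CPython itself executes. A minimal sketch using the standard-library `dis` module (not part of Numba) to view that bytecode for a plain-Python copy of the function; `add_py` is a hypothetical name chosen so it does not shadow the jitted `add`. Opcode names differ between Python versions (for example `BINARY_ADD` before 3.11, `BINARY_OP` from 3.11 on), so no exact listing is shown.

```python
import dis

def add_py(a, b):
    return a + b

# Print the CPython bytecode that Numba's front end starts from.
dis.dis(add_py)

# Collect just the opcode names for closer inspection.
opnames = [instr.opname for instr in dis.get_instructions(add_py)]
print(opnames)
```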

In [4]:
add.inspect_types()

add (int64, int64)
--------------------------------------------------------------------------------
# File: <ipython-input-2-7a2ac56f16b6>
# --- LINE 1 --- 
# label 0
#   del b
#   del a
#   del $0.3

@jit

# --- LINE 2 --- 

def add(a, b):

    # --- LINE 3 --- 
    #   a = arg(0, name=a)  :: int64
    #   b = arg(1, name=b)  :: int64
    #   $0.3 = a + b  :: int64
    #   $0.4 = cast(value=$0.3)  :: int64
    #   return $0.4

    return a + b




Ok. Numba has correctly inferred the types of the arguments as `int64`, and the compiled function runs smoothly.

(What happens if you do `add(1., 1.)` and then `inspect_types`?)

In [5]:
add(1., 1.)

2.0

In [6]:
add.inspect_types()

add (int64, int64)
--------------------------------------------------------------------------------
# File: <ipython-input-2-7a2ac56f16b6>
# --- LINE 1 --- 
# label 0
#   del b
#   del a
#   del $0.3

@jit

# --- LINE 2 --- 

def add(a, b):

    # --- LINE 3 --- 
    #   a = arg(0, name=a)  :: int64
    #   b = arg(1, name=b)  :: int64
    #   $0.3 = a + b  :: int64
    #   $0.4 = cast(value=$0.3)  :: int64
    #   return $0.4

    return a + b


add (float64, float64)
--------------------------------------------------------------------------------
# File: <ipython-input-2-7a2ac56f16b6>
# --- LINE 1 --- 
# label 0
#   del b
#   del a
#   del $0.3

@jit

# --- LINE 2 --- 

def add(a, b):

    # --- LINE 3 --- 
    #   a = arg(0, name=a)  :: float64
    #   b = arg(1, name=b)  :: float64
    #   $0.3 = a + b  :: float64
    #   $0.4 = cast(value=$0.3)  :: float64
    #   return $0.4

    return a + b




### What about the actual LLVM code?

You can see the LLVM IR that Numba generates with the `inspect_llvm()` method. It returns a `dict` keyed by signature, so iterating over its items and printing each one is easier to read than displaying the raw dict.

In [7]:
for k, v in add.inspect_llvm().items():
    print(k, v)

(int64, int64) ; ModuleID = 'add'
source_filename = "<string>"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.const.add = internal constant [4 x i8] c"add\00"
@".const.Fatal error: missing _dynfunc.Closure" = internal constant [38 x i8] c"Fatal error: missing _dynfunc.Closure\00"
@PyExc_RuntimeError = external global i8
@".const.missing Environment" = internal constant [20 x i8] c"missing Environment\00"

; Function Attrs: norecurse nounwind
define i32 @"_ZN8__main__7add$241Exx"(i64* noalias nocapture %retptr, { i8*, i32 }** noalias nocapture readnone %excinfo, i8* noalias nocapture readnone %env, i64 %arg.a, i64 %arg.b) local_unnamed_addr #0 {
entry:
  %.15 = add nsw i64 %arg.b, %arg.a
  store i64 %.15, i64* %retptr, align 8
  ret i32 0
}

define i8* @"_ZN7cpython8__main__7add$241Exx"(i8* %py_closure, i8* %py_args, i8* nocapture readnone %py_kws) local_unnamed_addr {
entry:
  %.5 = alloca i8*, align 8
  %.6 = alloca i8*, align

## But there's a caveat

Now watch what happens when we try something that is natural in Python but has nothing to do with numerical work: adding two strings.

In [22]:
def add_strings(a, b):
    return a + b

In [23]:
add_strings_jit = jit()(add_strings)

In [24]:
add_strings_jit('a', 'b')

'ab'

It worked, but what does `inspect_types` tell us?

In [25]:
add_strings_jit.inspect_types()

add_strings (str, str)
--------------------------------------------------------------------------------
# File: <ipython-input-22-d1008a9d4aa2>
# --- LINE 1 --- 
# label 0
#   del b
#   del a
#   del $0.3

def add_strings(a, b):

    # --- LINE 2 --- 
    #   a = arg(0, name=a)  :: pyobject
    #   b = arg(1, name=b)  :: pyobject
    #   $0.3 = a + b  :: pyobject
    #   $0.4 = cast(value=$0.3)  :: pyobject
    #   return $0.4

    return a + b




In [26]:
%%timeit
add_strings('a', 'b')

158 ns ± 5.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [27]:
%%timeit
add_strings_jit('a', 'b')

8.2 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


## What's all this pyobject business?  

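This means the function was compiled in object mode, where every value is handled as a generic Python object. Object mode can beat plain Python when Numba manages loop lifting (compiling the loops inside the function in nopython mode), but even then the gain is limited, and as the timings above show it can be much slower: 8.2 µs versus 158 ns here.
We want those `pyobject`s to be `int64` or another type that Numba can infer. Your best bet is to force `nopython` mode: Numba then raises an error instead of falling back to object mode, so you _know_ when it can't give you speed.

(These outputs come from an older Numba release. Recent releases support strings in nopython mode and make `nopython=True` the default for `@jit`, so on a current install this particular example may compile natively.)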

For the full list of supported Python and NumPy features in `nopython` mode, see the Numba documentation here: http://numba.pydata.org/numba-doc/latest/reference/pysupported.html

## Figuring out what isn't working

In [28]:
%%file nopython_failure.py
from numba import jit

@jit
def add(a, b):
    for i in range(100):
        c = i
        f = i + 7
        l = c + f
        
    return a + b

add('a', 'b')

Overwriting nopython_failure.py


In [29]:
!numba --annotate-html fail.html nopython_failure.py


[fail.html](fail.html)

## Forcing `nopython` mode

In [30]:
add_strings_jit = jit(nopython=True)(add_strings)

In [31]:
add_strings_jit('a', 'b')

TypingError: Failed at nopython (nopython frontend)
[33m[1m[33m[1mInvalid usage of + with parameters (str, str)
Known signatures:
 * (int64, int64) -> int64
 * (int64, uint64) -> int64
 * (uint64, int64) -> int64
 * (uint64, uint64) -> uint64
 * (float32, float32) -> float32
 * (float64, float64) -> float64
 * (complex64, complex64) -> complex64
 * (complex128, complex128) -> complex128
 * (uint64,) -> uint64
 * (uint8,) -> uint64
 * (uint16,) -> uint64
 * (uint32,) -> uint64
 * (int32,) -> int64
 * (int64,) -> int64
 * (int8,) -> int64
 * (int16,) -> int64
 * (float32,) -> float32
 * (float64,) -> float64
 * (complex64,) -> complex64
 * (complex128,) -> complex128
 * parameterized[0m
[0m[37m[1m[1] During: typing of intrinsic-call at <ipython-input-22-d1008a9d4aa2> (2)[0m
[37m[1m
File "<ipython-input-22-d1008a9d4aa2>", line 2:[0m
[34m[1mdef add_strings(a, b):
[31m[1m    return a + b
[0m    [32m[1m^[0m[0m

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile

If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new


The `njit` decorator is shorthand for `jit(nopython=True)`, so it fails in the same way:

In [32]:
from numba import njit

In [33]:
add_strings_jit = njit(add_strings)

In [34]:
add_strings_jit('a', 'b')

TypingError: Failed at nopython (nopython frontend)
[33m[1m[33m[1mInvalid usage of + with parameters (str, str)
Known signatures:
 * (int64, int64) -> int64
 * (int64, uint64) -> int64
 * (uint64, int64) -> int64
 * (uint64, uint64) -> uint64
 * (float32, float32) -> float32
 * (float64, float64) -> float64
 * (complex64, complex64) -> complex64
 * (complex128, complex128) -> complex128
 * (uint64,) -> uint64
 * (uint8,) -> uint64
 * (uint16,) -> uint64
 * (uint32,) -> uint64
 * (int32,) -> int64
 * (int64,) -> int64
 * (int8,) -> int64
 * (int16,) -> int64
 * (float32,) -> float32
 * (float64,) -> float64
 * (complex64,) -> complex64
 * (complex128,) -> complex128
 * parameterized[0m
[0m[37m[1m[1] During: typing of intrinsic-call at <ipython-input-22-d1008a9d4aa2> (2)[0m
[37m[1m
File "<ipython-input-22-d1008a9d4aa2>", line 2:[0m
[34m[1mdef add_strings(a, b):
[31m[1m    return a + b
[0m    [32m[1m^[0m[0m

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile

If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new


## Other compilation flags

There are two other important compilation flags for `@jit`:

```python
cache=True
```

Use this if you don't want to pay the compilation cost on every run. Numba saves the compiled function to disk, in files similar to `.pyc` files in the `__pycache__` directory next to your source file, so later sessions can load the machine code instead of recompiling it.

```python
nogil=True
```

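This releases the GIL while the compiled function runs, so other Python threads can run at the same time (it only takes effect in `nopython` mode). Note, however, that it doesn't do anything else, such as making your program thread-safe or creating threads for you. You have to manage those things on your own (for example with `concurrent.futures`).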