## Exercise 2.1 - Inspection

Here we will try inspecting the outputs of various stages of Numba’s compilation pipeline. First, we need a function to work with, so we’ll invent this sum function:

In [1]:
from numba import jit

@jit
def sum(x):
    s = 0
    for i in range(len(x)):
        s += x[i]
    return s

Now inspect its types:

In [2]:
sum.inspect_types()

There is no output: no typing information exists yet, because we have not forced a compilation by calling the function. So let’s do that and try inspecting types again:

In [3]:
import numpy as np

a = np.arange(10)

sum(a)

sum.inspect_types()

sum (array(int64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-1-430493d5276a>
# --- LINE 3 --- 

@jit

# --- LINE 4 --- 

def sum(x):

    # --- LINE 5 --- 
    # label 0
    #   x = arg(0, name=x)  :: array(int64, 1d, C)
    #   $const0.1 = const(int, 0)  :: int64
    #   s = $const0.1  :: int64
    #   del $const0.1

    s = 0

    # --- LINE 6 --- 
    #   jump 6
    # label 6
    #   $6.1 = global(range: <class 'range'>)  :: range
    #   $6.2 = global(len: <built-in function len>)  :: len
    #   $6.4 = call $6.2(x)  :: (array(int64, 1d, C),) -> int64
    #   del $6.2
    #   $6.5 = call $6.1($6.4)  :: (int64,) -> range_state64
    #   del $6.4
    #   del $6.1
    #   $6.6 = getiter(value=$6.5)  :: range_iter64
    #   del $6.5
    #   $phi25.1 = $6.6  :: range_iter64
    #   del $6.6
    #   jump 25
    # label 25
    #   $25.2 = iternext(value=$phi25.1)  :: pair<int64, bool>
    #   $25.3 = pair_first(value=$2

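Calling the function again with arguments of a different type compiles a second specialisation with its own typing. `inspect_types()` prints every compiled signature in turn, so the output below starts with the earlier int64 version (the listing is truncated before the float32 version appears):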

In [4]:
a = np.arange(10, dtype=np.float32)
sum(a)
sum.inspect_types()

sum (array(int64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-1-430493d5276a>
# --- LINE 3 --- 

@jit

# --- LINE 4 --- 

def sum(x):

    # --- LINE 5 --- 
    # label 0
    #   x = arg(0, name=x)  :: array(int64, 1d, C)
    #   $const0.1 = const(int, 0)  :: int64
    #   s = $const0.1  :: int64
    #   del $const0.1

    s = 0

    # --- LINE 6 --- 
    #   jump 6
    # label 6
    #   $6.1 = global(range: <class 'range'>)  :: range
    #   $6.2 = global(len: <built-in function len>)  :: len
    #   $6.4 = call $6.2(x)  :: (array(int64, 1d, C),) -> int64
    #   del $6.2
    #   $6.5 = call $6.1($6.4)  :: (int64,) -> range_state64
    #   del $6.4
    #   del $6.1
    #   $6.6 = getiter(value=$6.5)  :: range_iter64
    #   del $6.5
    #   $phi25.1 = $6.6  :: range_iter64
    #   del $6.6
    #   jump 25
    # label 25
    #   $25.2 = iternext(value=$phi25.1)  :: pair<int64, bool>
    #   $25.3 = pair_first(value=$2

The different typings have different code generated. We can use `inspect_llvm()` to see the generated LLVM IR. It returns a dict keyed on the argument types, so it is helpful to define an additional function that shows the LLVM code for each signature:

In [5]:
def show_llvm(func):
    llvm_code = func.inspect_llvm()
    for k, v in llvm_code.items():
        print('-' * 80)
        print("Signature:", k)
        print('-' * 80)
        print(v)
        print()

Now we can easily look at the LLVM code for sum:

In [6]:
show_llvm(sum)

--------------------------------------------------------------------------------
Signature: (array(float32, 1d, C),)
--------------------------------------------------------------------------------
; ModuleID = '<string>'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@PyExc_RuntimeError = external global i8
@.const.sum = internal constant [4 x i8] c"sum\00"
@".const.Fatal_error:_missing__dynfunc.Closure" = internal constant [38 x i8] c"Fatal error: missing _dynfunc.Closure\00"
@.const.missing_Environment = internal constant [20 x i8] c"missing Environment\00"

define i32 @"__main__.sum$2.array(float32,_1d,_C)"(double* noalias nocapture %retptr, { i8*, i32 }** noalias nocapture readnone %excinfo, i8* noalias nocapture readnone %env, i8* %arg.x.0, i8* nocapture readnone %arg.x.1, i64 %arg.x.2, i64 %arg.x.3, float* nocapture readonly %arg.x.4, i64 %arg.x.5.0, i64 %arg.x.6.0) {
entry:
  %.4.i = icmp eq i8* %arg.x.0, null
  br i1 %.4

That is quite a lot of code! The generated code includes a wrapper function which does part of the work of marshalling the arguments, and calls to the Numba Runtime, which manages memory allocation within compiled code.

Generated assembly can similarly be viewed:

In [7]:
def show_asm(func):
    asm_code = func.inspect_asm()
    for k, v in asm_code.items():
        print('-' * 80)
        print("Signature:", k)
        print('-' * 80)
        print(v)
        print()

show_asm(sum)

--------------------------------------------------------------------------------
Signature: (array(float32, 1d, C),)
--------------------------------------------------------------------------------
	.text
	.file	"<string>"
	.globl	"__main__.sum$2.array(float32,_1d,_C)"
	.align	16, 0x90
	.type	"__main__.sum$2.array(float32,_1d,_C)",@function
"__main__.sum$2.array(float32,_1d,_C)":
	.cfi_startproc
	pushq	%rbx
.Ltmp0:
	.cfi_def_cfa_offset 16
	subq	$16, %rsp
.Ltmp1:
	.cfi_def_cfa_offset 32
.Ltmp2:
	.cfi_offset %rbx, -16
	movq	%rdi, %rbx
	movq	48(%rsp), %rax
	testq	%rcx, %rcx
	je	.LBB0_2
	lock
	incq	(%rcx)
.LBB0_2:
	xorps	%xmm1, %xmm1
	testq	%rax, %rax
	jle	.LBB0_5
	movq	40(%rsp), %rdx
	incq	%rax
	xorps	%xmm1, %xmm1
	.align	16, 0x90
.LBB0_4:
	movss	(%rdx), %xmm0
	cvtss2sd	%xmm0, %xmm0
	addsd	%xmm0, %xmm1
	decq	%rax
	addq	$4, %rdx
	cmpq	$1, %rax
	jg	.LBB0_4
.LBB0_5:
	testq	%rcx, %rcx
	je	.LBB0_8
	movq	$-1, %rax
	lock
	xaddq	%rax, (%rcx)
	cmpq	$1, %rax
	je	.LBB0_7
.LBB0_8:
	movsd	%xmm1, (%rbx

The generated assembly code is a lot shorter than the LLVM code - this is because it has been transformed by LLVM’s optimisation passes, which have in part simplified the code so that it executes more quickly.

## Summary

- The typing of variables in the Python source and Numba IR can be viewed with `inspect_types()`.
- The generated LLVM and assembly code can be retrieved using the `inspect_llvm()` and `inspect_asm()` methods, each of which returns a dict keyed on the argument types.
- The LLVM output is very large in comparison with the assembly code. This is because the optimisation passes simplify and eliminate a large amount of code.
- The generated code also handles marshalling Python arguments to native types, and book-keeping for reference-counting.