## A demonstration of the PIXIE project.

This is a demonstration of the PIXIE (<b>P</b>ortable <b>I</b>nstructions e<b>X</b>changed <b>I</b>n <b>E</b>xecutable) project, which is currently at the proof-of-concept stage.

<b>WARNING: This is a proof-of-concept technology demonstration only and not suitable for general use!</b>


The aim of the PIXIE project is to provide tools for creating Ahead-Of-Time (AOT) compiled libraries that remain Just-In-Time (JIT) compiler "friendly" and that put their performance profile into the hands of the user.

This demonstration includes:

1. [Compiling some LLVM IR via PIXIE into a shared library.](#1.-Compiling-some-LLVM-IR-via-PIXIE-into-a-shared-library.)
2. [Using PIXIE to make a shared library available as a Python C-Extension.](#2.-Using-PIXIE-to-make-a-shared-library-available-as-a-Python-C-Extension.)
3. [Exploring the `__PIXIE__` dictionary in a PIXIE created C-Extension.](#3.-Exploring-the-__PIXIE__-dictionary-in-a-PIXIE-created-C-Extension.)
4. [Calling functions from the `__PIXIE__` dictionary with Numba via `ctypes` and LLVM bitcode.](#4.-Calling-functions-from-the-__PIXIE__-dictionary-with-Numba-via-ctypes-and-LLVM-bitcode.)
5. [Host (or other target) specialisation and automatic rewiring of the `__PIXIE__` dictionary.](#5.-Host-(or-other-target)-specialisation-and-automatic-rewiring-of-the-__PIXIE__-dictionary.)
6. [Numba++'s Minimum Viable Product (MVP)](#6.-Numba++'s-Minimum-Viable-Product-(MVP))
   1. ["Blending" input languages through AOT compiled modules. Example application, a "custom" solver.](#6A.-"Blending"-input-languages-through-AOT-compiled-modules.-Example-application,-a-"custom"-solver.)
   2. ["Blending" performance through AOT compiled modules. Example application, a DAXPY-like function call.](#6B.-"Blending"-performance-through-AOT-compiled-modules.-Example-application,-a-DAXPY-like-function-call.)

<b>NOTE: PIXIE and this demonstration currently target x86_64 CPUs running Linux operating systems. Other CPUs and operating systems will be supported in the future.</b>

### 1. Compiling some LLVM IR via PIXIE into a shared library.

First, import some PIXIE compiler components:

In [None]:
from pixie import PIXIECompiler, TranslationUnit, ExportConfiguration

PIXIE consumes LLVM intermediate representation (IR) or LLVM bitcode and, just like LLVM, operates on translation units. A translation unit is approximately a module; in clang/GCC compilation of C language code, for example, a translation unit is typically a "file". Create some LLVM IR as input source to put into a translation unit:

In [None]:
# This LLVM IR is the equivalent to the C code:
# void foo(double* a, double* b, double* out)
# {
#      *out = *a + *b;
# }

foo_double_double = """
define void @"_Z3fooPdS_"(double* %".1", double* %".2", double* %".out")
{
entry:
    %.3 = load double, double * %.1
    %.4 = load double, double * %.2
    %"res" = fadd double %".3", %".4"
    store double %"res", double * %".out"
    ret void
}
"""

Translation units are stored in an iterable, which is analogous to a command line invocation such as:

`$CC <translation unit 1>... <translation unit n>`

Create an iterable containing a translation unit holding the LLVM IR from above:

In [None]:
tus = [TranslationUnit("foo_double_double" , foo_double_double),]
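
In this demonstration a `TranslationUnit` is given textual LLVM IR (a `str`, as above) and, later on, raw LLVM bitcode (`bytes`). How PIXIE itself tells the two apart is not shown here. The following is only a sketch of one way to do it, based on the magic numbers of the LLVM bitcode file format; the helper name `classify_llvm_source` is hypothetical and is not part of PIXIE.

```python
# Magic numbers from the LLVM bitcode file format.
RAW_BITCODE_MAGIC = b"BC\xc0\xde"      # 'B', 'C', 0xC0, 0xDE
WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"    # 0x0B17C0DE stored little-endian


def classify_llvm_source(source):
    """Return 'ir' or 'bitcode' for a translation unit payload (hypothetical helper)."""
    if isinstance(source, str):
        return "ir"
    if source.startswith((RAW_BITCODE_MAGIC, WRAPPER_MAGIC)):
        return "bitcode"
    # Bytes without a bitcode magic number are assumed to be UTF-8 encoded textual IR.
    source.decode("utf-8")  # raises UnicodeDecodeError if this is not text
    return "ir"


print(classify_llvm_source("define void @f() {\n    ret void\n}"))
print(classify_llvm_source(RAW_BITCODE_MAGIC + b"\x00" * 4))
```

A check like this is cheap enough to run before handing data to a compiler, so that a wrong file type fails early with a clear cause.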

Next, PIXIE needs to be informed about what is to be exported (i.e. which symbols are global). This is done via an export configuration.

In [None]:
# This is an export configuration object, the kwarg "versioning_strategy" is not used yet,
# but will eventually inform how to create ISA specific versions of the code being compiled.
export_config = ExportConfiguration(versioning_strategy='embed_dso')
export_config.add_symbol(python_name="foo", symbol_name="_Z3fooPdS_", signature="void(double*, double*, double*)")
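
A `symbol_name` given to `add_symbol` has to match a function that is actually defined in one of the translation units. As a minimal sketch of a sanity check that could be run before compiling, the following scans textual LLVM IR for `define` lines with a regular expression. The helper `defined_symbols` is hypothetical (it is not a PIXIE API), and the pattern only covers simple definitions like the one in this demonstration.

```python
import re

# Matches lines such as: define void @"_Z3fooPdS_"(double* ...)
_DEFINE_RE = re.compile(r'^define\b[^@\n]*@"?([^"(\s]+)"?\s*\(', re.MULTILINE)


def defined_symbols(ir_text):
    """Return the names of functions defined (not merely declared) in textual LLVM IR."""
    return _DEFINE_RE.findall(ir_text)


ir = '''
declare double @cos(double)

define void @"_Z3fooPdS_"(double* %".1", double* %".2", double* %".out")
{
entry:
    ret void
}
'''

# python_name -> symbol_name, mirroring the arguments passed to add_symbol
exported = {"foo": "_Z3fooPdS_"}
missing = [sym for sym in exported.values() if sym not in defined_symbols(ir)]
print(defined_symbols(ir), missing)
```

An empty `missing` list means every exported symbol has a definition in the IR that was scanned.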

Now initialize a PIXIE compiler instance and call `.compile()` to compile the source into a shared library.

In [None]:
from pixie.cpus import x86

# Create a compiler instance
libfoo_compiler = PIXIECompiler(library_name='foo_library', # the name for the output shared library
                                translation_units=tus, # translation units
                                export_configuration=export_config, # export config
                                baseline_cpu='nocona', # The "baseline" CPU, this is the CPU that the library itself will target
                                baseline_features=x86.sse3, # feature set(s) that the library will target
                                python_cext=False, # whether the output shared library should be a C-Extension
                                output_dir='.',) # directory in which to put the shared library once it has been created

# ... and compile the shared library
libfoo_compiler.compile()

All PIXIE is doing is using LLVM's MCJIT to create object files and then linking them with a native linker. The output is just a standard shared library (NOTE: there's a bug in the naming scheme at present that means the shared libraries end up with C-Extension-like names). Take a look at the shared library:

In [None]:
!file foo_library.*so

This shared library can be loaded with `ctypes` calls and then the symbol from the function defined in the LLVM IR above can be called.

In [None]:
from ctypes import CDLL, byref, c_double, POINTER
import os

# Set up binding, load the library from the PWD
dso = CDLL(os.path.join('.', libfoo_compiler._output_file))
dso._Z3fooPdS_.argtypes = (POINTER(c_double),) * 3
dso._Z3fooPdS_.restype = None

# Stage call
a = c_double(3)
b = c_double(5)
out = c_double(0)
dso._Z3fooPdS_(*(byref(x) for x in (a, b, out)))
print(f"{a.value} + {b.value} = {out.value}")
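
The same pass-by-reference calling pattern can be tried without PIXIE. The sketch below assumes a Linux-like system where the C math library can be found, and calls `modf`, which returns one result directly and writes another through a `double*` argument, just as `foo` writes through `out`.

```python
import ctypes
import ctypes.util

# Locate the C math library; fall back to symbols already loaded in this process.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# double modf(double x, double *iptr): returns the fractional part of x and
# writes the integral part through the pointer.
libm.modf.argtypes = (ctypes.c_double, ctypes.POINTER(ctypes.c_double))
libm.modf.restype = ctypes.c_double

integral = ctypes.c_double(0.0)
fractional = libm.modf(3.5, ctypes.byref(integral))
print(fractional, integral.value)
```

Setting `argtypes` and `restype` matters here for the same reason as in the PIXIE example: without them `ctypes` assumes integer arguments and an `int` return value.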

### 2. Using PIXIE to make a shared library available as a Python C-Extension.

It's easy for PIXIE to build a shared library as a Python C-Extension (importable module): just set `python_cext=True` in the compiler invocation. The output is a binary module that can be imported and that has a "special" `__PIXIE__` dictionary in the module root.

In [None]:
# Create a compiler instance
libfoo_compiler = PIXIECompiler(
                                #########################
                                # CHANGE NAME FOR C-EXT #
                                #########################
                                library_name='cext_foo_library', # the name for the output shared library
                                translation_units=tus, # translation units
                                export_configuration=export_config, # export config
                                baseline_cpu='nocona', # The "baseline" CPU, this is the CPU that the library itself will target
                                baseline_features=x86.sse3, # feature set(s) that the library will target
                                ####################
                                # THIS IS NOW TRUE #
                                ####################
                                python_cext=True, # whether the output shared library should be a C-Extension
                                output_dir='.',) # directory in which to put the shared library once it has been created

# ... and compile the shared library
libfoo_compiler.compile()

### 3. Exploring the `__PIXIE__` dictionary in a PIXIE created C-Extension.

As this new PIXIE library is a C-Extension, it can just be imported like any other module. This imports the C-Extension compiled above and runs `dir` on it!

In [None]:
import cext_foo_library
dir(cext_foo_library)

What's in the `__PIXIE__` dictionary?

In [None]:
from pprint import pprint
pprint(cext_foo_library.__PIXIE__)

Interesting things to note are:

* The `bitcode` key, which has the LLVM bitcode used to compile the module as its value.
* The `c_header` key, which is currently unused, but will be a place to store a C-header compatible with the bitcode/source used to compile the library.
* The keys `is_specialized` and `specialize`, which are to do with "specialisation" of this library (see [section 5](#5.-Host-(or-other-target)-specialisation-and-automatic-rewiring-of-the-__PIXIE__-dictionary.)).
* The key `linkage` records linkage, e.g. `('m',)` might indicate `libm` linkage.
* The `symbols` key which contains a symbol-table-like dictionary. It contains information in the form:
  ```
   <python function name> : <dictionary of signatures to details> 
  ```
  and each `<dictionary of signatures to details>` contains a `signature` as the key, which maps to a dictionary containing the following:
  * `address`: the address of the function in the current process.
  * `baseline_feature`: the baseline feature set for this function.
  * `cfunc`: a callable `ctypes.CFUNCTYPE` wrapper for this function.
  * `feature_variants`: currently unused.
  * `metadata`: a storage container for anything additional a user might want to store about this function.
  * `module`: the Python module source for this function.
  * `source_file`: the source file for this function.
  * `symbol`: the symbol name associated with this signature.
* The `uuid` key, this is used to make the libraries "unique" and helps when loading "specialisations".
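
The nesting described in the list above can be easier to follow with a concrete shape in hand. The following is a hand-written stand-in, not output from PIXIE: every value is a placeholder and only some of the per-signature fields are included. The helpers `lookup` and `signatures` are hypothetical.

```python
# A mock of the layout described above; all values are placeholders.
mock_pixie = {
    "bitcode": b"",
    "c_header": None,
    "is_specialized": False,
    "linkage": None,
    "symbols": {
        "foo": {
            "void(double*, double*, double*)": {
                "address": 0,
                "cfunc": None,
                "metadata": None,
                "symbol": "_Z3fooPdS_",
            },
        },
    },
    "uuid": "placeholder",
}


def lookup(pixie_dict, python_name, signature, field):
    """Return one field of the entry for python_name/signature (hypothetical helper)."""
    return pixie_dict["symbols"][python_name][signature][field]


def signatures(pixie_dict, python_name):
    """List the signatures registered under a Python-level function name."""
    return sorted(pixie_dict["symbols"][python_name])


print(signatures(mock_pixie, "foo"))
print(lookup(mock_pixie, "foo", "void(double*, double*, double*)", "symbol"))
```

The lookup is two levels deep under `symbols`: first the Python-level name, then the signature string.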

The same symbol that was called previously in the "shared library" can now be called more easily as the `ctypes` binding is already done as part of what's in the `__PIXIE__` dictionary.

In [None]:
# Walk through the `__PIXIE__` dictionary and fetch the `cfunc`
foo_symbols = cext_foo_library.__PIXIE__['symbols']['foo']
foo_specialisation = foo_symbols['void(double*, double*, double*)']
cfunc = foo_specialisation['cfunc']

# Stage the call with the same inputs as before, but with a fresh output to make sure it works
out = c_double(0)
cfunc(*(byref(x) for x in (a, b, out)))
print(f"{a.value} + {b.value} = {out.value}")

### 4. Calling functions from the `__PIXIE__` dictionary with Numba via `ctypes` and LLVM bitcode.

It's also possible to call the `cfunc` from above using Numba... this just uses the standard `ctypes` support that is already built in to Numba:

In [None]:
from numba import njit
import numpy as np

@njit
def call_foo():
    a = np.array([3.])
    b = np.array([5.])
    out = np.array([0.])
    cfunc(a.ctypes, b.ctypes, out.ctypes)
    print("result:", out[0])
    
call_foo()

If the control flow graph of the LLVM IR Numba generates is inspected, the use of `numba.dynamic.globals.<hash>` can be seen. This is the `ctypes` bound function call (it's just calling a raw address).

In [None]:
call_foo.inspect_cfg(call_foo.signatures[0])

It's also possible to use the `bitcode` from the PIXIE library to make the same call, but this time Numba's JIT compiler can "see" the function being called and optimise the call site and the body of the function.

In [None]:
# Note: this is a little involved as the PIXIE function has `double *` in its signature
# and so Python values need to be passed-by-reference.

from numba.extending import intrinsic
from numba.core import cgutils
from llvmlite import binding as llvm
from llvmlite import ir as llvmir

@intrinsic
def recompiled_foo(tyctx, ty_x, ty_y):
    assert ty_x == ty_y
    sig = ty_x(ty_x, ty_y)
    def codegen(cgctx, builder, sig, llargs):
        # This loads the bitcode from the PIXIE dictionary and adds it to the code library
        bitcode = cext_foo_library.__PIXIE__['bitcode']
        cgctx.active_code_library.add_llvm_module(llvm.parse_bitcode(bitcode))
        # Create an appropriate LLVM IR function signature
        double_ptr = llvmir.DoubleType().as_pointer()
        fnty = llvmir.FunctionType(llvmir.VoidType(), (double_ptr,) * 3)
        # Fetch the name of the symbol associated with this function
        foo_symbols = cext_foo_library.__PIXIE__['symbols']['foo']
        sym_data = foo_symbols['void(double*, double*, double*)']
        sym_name = sym_data['symbol']
        # Create the function in the LLVM IR for this module
        fn = cgutils.get_or_insert_function(builder.module, fnty, sym_name)
        # Stack allocate some space and assign in the values coming from the Python call
        x_ptr = cgutils.alloca_once_value(builder, llargs[0])
        y_ptr = cgutils.alloca_once_value(builder, llargs[1])
        # Allocate a stack slot for the result
        r_ptr = cgutils.alloca_once(builder, llargs[0].type)
        # stage the call to the function
        builder.call(fn, (x_ptr, y_ptr, r_ptr))
        # load and return the result as a value
        return builder.load(r_ptr)
    return sig, codegen

@njit
def call_bitcode_foo():
    return recompiled_foo(1.0, 2.0)

print(f"result {call_bitcode_foo()}")

Take a look at the control flow graph of the LLVM IR Numba generates now that the bitcode variant of the `foo` function has been used:

In [None]:
call_bitcode_foo.inspect_cfg(call_bitcode_foo.signatures[0])

... LLVM has managed to optimise the entire function `call_bitcode_foo` into storing the value `3.0` into Numba's `retptr` (return value) variable.

### 5. Host (or other target) specialisation and automatic rewiring of the `__PIXIE__` dictionary.

PIXIE libraries can effectively "recompile" themselves. Recall that `cext_foo_library` was originally compiled for a `nocona` series CPU targeting `SSE3` instructions, so the AOT symbol calls will be running with these instructions. PIXIE libraries can trivially be re-specialised (by default to the host machine). The C-Extension loading mechanism of the PIXIE libraries is aware of specialisations, so the `__PIXIE__` dictionary will reflect a specialised library if one is present. This means that Python users of C-Extension PIXIE libraries do not need to change their imports after "specialisation" occurs! This idea is perhaps more easily explained in code...

First, dump the `foo` symbol in the original `cext_foo_library`, note the SSE3 instructions (this assumes `objdump` is present on `$PATH` and it supports `--disassemble=`).

In [None]:
!objdump --disassemble=_Z3fooPdS_ -j .text cext_foo_library*.so
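
Several cells in this demonstration shell out to external tools (`objdump` here, `clang` later on). As a small sketch, a check like the following could be run first so that a missing tool is reported clearly. The helper name `require_tool` is hypothetical, and the check only covers presence on `$PATH`, not support for particular flags such as `--disassemble=`.

```python
import shutil


def require_tool(name):
    """Return the full path of an executable on PATH, or raise a descriptive error."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"required tool {name!r} was not found on PATH")
    return path


for tool in ("objdump", "clang"):
    try:
        print(tool, "->", require_tool(tool))
    except RuntimeError as exc:
        print(exc)
```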

Now call the `specialize` function that is in the `__PIXIE__` dictionary. This will recompile the library to target the host CPU and create a new library with the same name as the original but with `_pixie_specialized` appended.

In [None]:
cext_foo_library.__PIXIE__['specialize']()

Note in the output of the above that the library has been specialised to the CPU present in the host machine. Check the `foo` symbol in the new `specialized` variant of the library. Assuming the machine on which this notebook is running has instructions later than `SSE3` available, the machine code in the symbol should make use of them (anything with `AVX` onwards will probably show a notable difference).

In [None]:
!objdump --disassemble=_Z3fooPdS_ -j .text cext_foo_library_pixie_specialized*.so

The `__PIXIE__` dictionaries are "specialisation aware": if a specialised version of the C-Extension is present, it is automatically wired into the `__PIXIE__` dictionary during import. Users therefore do not have to change their code to use the specialised version of the library.
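
How PIXIE performs this wiring internally is not shown in this demonstration. Purely as an illustration of the idea, the sketch below prefers a file named `<basename>_pixie_specialized*.so` over `<basename>*.so` when both exist. The helper `pick_library` and the file names used are assumptions made for the example.

```python
import glob
import os
import tempfile


def pick_library(directory, basename):
    """Prefer '<basename>_pixie_specialized*.so' over '<basename>*.so' (hypothetical)."""
    pattern = os.path.join(directory, f"{basename}_pixie_specialized*.so")
    specialised = sorted(glob.glob(pattern))
    if specialised:
        return specialised[0], True
    baseline = sorted(glob.glob(os.path.join(directory, f"{basename}*.so")))
    if not baseline:
        raise FileNotFoundError(f"no shared library found for {basename!r}")
    return baseline[0], False


with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files; real outputs carry platform-specific suffixes.
    open(os.path.join(tmp, "cext_foo_library.so"), "w").close()
    before = pick_library(tmp, "cext_foo_library")
    open(os.path.join(tmp, "cext_foo_library_pixie_specialized.so"), "w").close()
    after = pick_library(tmp, "cext_foo_library")

print(os.path.basename(before[0]), before[1])
print(os.path.basename(after[0]), after[1])
```

The caller asks for the same base name both times, which mirrors the point made above: the import statement does not change once a specialised library exists.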

To demonstrate... reload the module in process (do not do this in practice!), note how it picks up the specialisation on re-import:

In [None]:
import sys
sys.modules.pop('cext_foo_library', None)
import cext_foo_library

The `__PIXIE__` dictionary is present as before, but it's actually wired through to the specialised version (note the `is_specialized` key now has value `True`).

In [None]:
cext_foo_library.__PIXIE__['is_specialized']

It's possible to demonstrate that the specialised version of the library is being referenced by the imported module by loading the specialised version through `ctypes` and checking that the addresses match (the link-loader invoked through the `CDLL` call will "see" that the specialised version is already loaded in process and just return it, which is why the addresses will be the same).

In [None]:
from ctypes import cast, c_void_p
addr = cext_foo_library.__PIXIE__['symbols']['foo']['void(double*, double*, double*)']['address']
print(f"Address from __PIXIE__ dictionary: {hex(addr)}")
specialized_name = libfoo_compiler._output_file.replace(libfoo_compiler._basename, f"{libfoo_compiler._basename}_pixie_specialized")
specialized_dso = CDLL(os.path.join('.', specialized_name))
print(f"Address from ctypes CDLL load    : {hex(cast(specialized_dso._Z3fooPdS_, c_void_p).value)}")

This means that any user code that calls through symbols in the `__PIXIE__` dictionary will now automatically get the specialised versions and improved performance. The package author (or packager) shipped `nocona` + `SSE3`, but the user easily specialised the library to their local hardware with no changes to their application code.
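
The address comparison used earlier in this section relies on the dynamic loader handing back a library that is already loaded rather than mapping a second copy. That behaviour can be observed without PIXIE. The sketch below assumes a Linux-like system and uses the C math library's `cos` as a stand-in symbol.

```python
import ctypes
import ctypes.util
from ctypes import cast, c_void_p

libm_path = ctypes.util.find_library("m")

# Two independent CDLL objects that refer to the same underlying library.
first = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)
second = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

addr_first = cast(first.cos, c_void_p).value
addr_second = cast(second.cos, c_void_p).value
print(hex(addr_first), hex(addr_second), addr_first == addr_second)
```

The two Python objects are distinct, but the symbol resolves to one address in the process, which is the property the check in this section depends on.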

### 6. Numba++'s Minimum Viable Product (MVP)

What does all this mean for Numba++ (next generation Numba)? This graph shows the anticipated capabilities for Numba++'s "minimum viable product".

In [None]:
def numba_mvp_graph():
    import graphviz
    cluster_font = {'fontcolor':'#191970', 'fontname':'Arial'}
    f = graphviz.Digraph('numba_mvp')
    f.attr(rankdir='TB')
    f.attr('node', shape='box', style='rounded', fontname='Arial')
    f.attr('edge', fontname='Arial', fontsize='11')

    with f.subgraph(name='cluster_C_pipeline') as c_pipe:
        c_pipe.attr(label='C language\nAOT pipeline', bgcolor='#aaffee60', **cluster_font)
        c_pipe.attr('node', fillcolor='#ffffff', style='filled')
        c_pipe.node('C/C++ library')
        c_pipe.node('clang')
        c_pipe.node(label='PIXIE', name='PIXIE1')
        c_pipe.edge('C/C++ library', 'clang', label='C source')
        c_pipe.edge('clang', 'PIXIE1', label='LLVM IR')

    with f.subgraph(name='cluster_Python_AOT_pipeline') as py_pipe:
        py_pipe.attr(label='Python language\nAOT pipeline', bgcolor='#98fb9860', **cluster_font)
        py_pipe.attr('node', fillcolor='#ffffff', style='filled')
        py_pipe.node('Python')
        py_pipe.node('Numba AOT Compiler')
        py_pipe.node(label='PIXIE', name='PIXIE2')
        py_pipe.edge('Python', 'Numba AOT Compiler', label='Python bytecode')
        py_pipe.edge('Numba AOT Compiler', 'PIXIE2', label='LLVM IR')

    with f.subgraph(name='cluster_Python_pipeline') as user_pipe:
        user_pipe.attr(label='User code', bgcolor='#aa22ee60', **cluster_font)
        user_pipe.attr('node', fillcolor='#ffffff', style='filled')
        user_pipe.node(label='Python', name='user_python')

    with f.subgraph(name='cluster_Numba_JIT') as nbjit:
        buf = ' ' * 40
        nbjit.attr(label=f'{buf}Numba JIT Compiler', bgcolor='#ffa50060', **cluster_font)
        nbjit.attr('node', fillcolor='#ffffff', style='filled')
        nbjit.node(label='LLVM cross module optimisation', name='opt')
        nbjit.node(label='JIT pipeline', name='jit_pipeline')
        nbjit.node(label='JIT Engine', name='jit_engine')
        nbjit.node(label='Execution', name='exec')
        nbjit.edge('jit_pipeline', 'opt')
        nbjit.edge('opt', 'jit_engine', label='Optimised\nLLVM IR')
        nbjit.edge('jit_engine', 'exec', label='Executable\nbinaries')

    f.edge('PIXIE1', 'opt', label='PIXIE library')
    f.edge('PIXIE2', 'opt', label='PIXIE library')
    f.edge('user_python', 'jit_pipeline', label='Python bytecode')
    return f

numba_mvp_graph()

PIXIE fits into the plan/graph because:
* PIXIE is about making it so that package authors/packagers can ship AOT deployable libraries but at the same time leave options for users to gain better performance.
* PIXIE is indifferent to what the input source language was so long as it can be converted to LLVM bitcode.
* Numba++ MVP is about making it possible to combine languages at the package author level and AOT/JIT compilation at the user level in a way that best suits the end application.

### 6A. "Blending" input languages through AOT compiled modules. Example application, a "custom" solver.

This example demonstrates the use of a Newton-Raphson based solver to compute the roots of the function `cos(x) + 1`. The purpose of this example is to emulate code being written in different languages and packaged by different parties and yet the user is able to make choices about how to incorporate the packaged code.

1. One "author" writes the objective function and its derivative in C code and compiles into a PIXIE library shipped by their package.
2. Another "author" writes a Newton-Raphson solver in Python and compiles into another PIXIE library shipped by their package.
3. The libraries in 1. and 2. are used by a third author in their custom application, which is compiled with Numba's JIT compiler, and choices can be made about whether to call via bitcode or via symbols.
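
Before any compilation is involved, the numerical behaviour of this arrangement can be previewed in plain Python. This is only a sketch: it assumes the objective function is `cos(x) + 1` with derivative `-sin(x)`, and the solver here is an ordinary Python function rather than a PIXIE library.

```python
import math


def f(x):
    return math.cos(x) + 1.0


def dfdx(x):
    return -math.sin(x)


def newton_raphson(f, dfdx, x0, eps, max_it):
    """Plain-Python Newton-Raphson iteration; raises if it does not converge."""
    for _ in range(max_it):
        y = f(x0)
        yp = dfdx(x0)
        if abs(yp) < eps:  # derivative too small to divide by safely
            break
        x1 = x0 - y / yp
        if abs(x1 - x0) <= eps:  # step size below tolerance: accept x1
            return x1
        x0 = x1
    raise RuntimeError("Solution did not converge")


root = newton_raphson(f, dfdx, 0.5, 1e-7, 50)
print(root, abs(root - math.pi))
```

The root at `pi` is a double root (both the function and its derivative vanish there), so the error only roughly halves on each iteration instead of converging quadratically. This is why a few dozen iterations are needed for a tolerance of `1e-7`.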

#### Compiling an objective functions module:

The objective functions module comprises the C code:

In [None]:
!cat objective_function.c

In [None]:
!cat objective_function_derivative.c

A PIXIE based C compiler is needed. A quick way to obtain one is to essentially just wrap `clang` calls to get some LLVM IR and then do a compilation via the standard PIXIE tool chain. The example below creates such a compilation chain and compiles the objective function and its derivative from above into a C-extension module called `objective_functions`.

In [None]:
import subprocess, tempfile, os

def tu_from_c_source(fname):
    with tempfile.TemporaryDirectory() as build_dir:
        outfile = os.path.join(build_dir, 'tmp.bc')
        cmd = ('clang', '-x', 'c++', '-fPIC', '-mcmodel=small', '-emit-llvm', fname, '-o', outfile, '-c')
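        # check=True raises if clang exits with an error, instead of failing later on a missing output file
        subprocess.run(cmd, check=True)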
        with open(outfile, 'rb') as f:
            data = f.read()
        return TranslationUnit(fname, data)
    
def compile_mvp_c_code():
    src = ('objective_function.c', 'objective_function_derivative.c')
    tus = [tu_from_c_source(s) for s in src]
    export_config = ExportConfiguration(versioning_strategy='embed_dso')
    export_config.add_symbol(python_name='f',
                             symbol_name='_Z1fPdS_',
                             signature='void(double*, double*)')
    export_config.add_symbol(python_name='dfdx',
                             symbol_name='_Z4dfdxPdS_',
                             signature='void(double*, double*)')
    compiler = PIXIECompiler(library_name='objective_functions',
                             translation_units=tus,
                             export_configuration=export_config,
                             baseline_cpu='nocona',
                             baseline_features=x86.sse3,
                             python_cext=True,
                             output_dir='.')
    compiler.compile()
    
compile_mvp_c_code()

# Check the objective functions module imports without error
import objective_functions

#### Compiling a root finding module:

The root finding module is to be defined in Python source and compiled using a new Numba AOT compiler prototype. The `NR_root` function below implements a Newton-Raphson solver based on code derived from [here](https://github.com/numba/numba-examples/blob/cc0304f9fa75530809dc19fb7168de32b3d1a931/tutorials/nasa_apps_oct_2019/1%20-%20Numba%20basics.ipynb) under the terms of the license (see following `Details`):

<details>

LICENSE location:
https://github.com/numba/numba-examples/blob/cc0304f9fa75530809dc19fb7168de32b3d1a931/LICENSE
    
Copy of license
```
BSD 2-Clause License

Copyright (c) 2017, Numba
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
```
</details>

In [None]:
from pixie_numba_compiler import aot
from numba import types

# A function type that takes a value of double type as an argument and returns a value of double type.
fn_type = types.FunctionType(types.double(types.double))

# Say hello to a new Numba AOT compiler!
@aot((types.double(fn_type, fn_type, types.double, types.double, types.intp)),)
def NR_root(f, dfdx, x0, eps, max_it):
    converged = False
    for i in range(max_it):
        y = f(x0)
        yp = dfdx(x0)
        if abs(yp) < eps:
            break
        x1 = x0 - y / yp
        if abs(x1 - x0) <= eps:
            converged = True
            break
        x0 = x1
    if converged:
        return x1
    else:
        raise RuntimeError("Solution did not converge")

Now compile the `@aot` declared Python function using some Numba derived bindings around PIXIE tools so as to create a C-Extension module called `optimiser`.

In [None]:
from pixie_numba_compiler import TranslationUnit, Library

# Create a translation unit
translation_unit = TranslationUnit()
# Add the NR_root function to it
translation_unit.add(NR_root)
# Create library instance
opt_lib = Library('optimiser', (translation_unit,), outdir='.')
# Compile
opt_lib.compile()

# Check it imports
import optimiser

#### Custom user code to achieve "blended compilation":

This defines the custom user code that will consume the modules that have been AOT compiled from the multiple source languages. Binding to code where C is the input language is currently a little more involved, but can be achieved through the use of a Numba `@intrinsic`. The objective functions are bound in the following:

In [None]:
def gen_objective_func_callsite(pixie_lib, pysym, pixie_sig):
    @intrinsic
    def bind_call(tyctx, ty_x):
        sig = ty_x(ty_x)
        
        def codegen(cgctx, builder, sig, llargs):
            # This loads the bitcode from the PIXIE dictionary and adds it to the code library
            bitcode = pixie_lib.__PIXIE__['bitcode']
            cgctx.active_code_library.add_llvm_module(llvm.parse_bitcode(bitcode))
            # Create an appropriate LLVM IR function signature
            double_ptr = llvmir.DoubleType().as_pointer()
            fnty = llvmir.FunctionType(llvmir.VoidType(), (double_ptr,) * 2)
            # Fetch the name of the symbol associated with this function
            fn_symbols = pixie_lib.__PIXIE__['symbols'][pysym]
            sym_name = fn_symbols[pixie_sig]['symbol']
            # Create the function in the LLVM IR for this module
            fn = cgutils.get_or_insert_function(builder.module, fnty, sym_name)
            # Stack allocate some space and assign in the values coming from the Python call
            x_ptr = cgutils.alloca_once_value(builder, llargs[0])
            # Allocate a stack slot for the result
            r_ptr = cgutils.alloca_once(builder, llargs[0].type)
            # stage the call to the function
            builder.call(fn, (x_ptr, r_ptr))
            # load and return the result as a value
            return builder.load(r_ptr)
        
        return sig, codegen
    
    # make sure the functions can be used as first-class function types
    # by wrapping them in a `@njit` function. This function can be inlined
    # and doesn't need to be callable from CPython.
    @njit(forceinline=True, no_cpython_wrapper=True)
    def trampoline(x):
        return bind_call(x)

    return trampoline

# Create the binding instances
import objective_functions
f = gen_objective_func_callsite(objective_functions, 'f', 'void(double*, double*)')
dfdx = gen_objective_func_callsite(objective_functions, 'dfdx', 'void(double*, double*)')

The optimiser code that was compiled via the Numba AOT prototype compiler can be more easily accessed as the compiler stored some meta-data into the `__PIXIE__` dictionary about the call site. A `pixie_converter` function uses this meta-data and makes the Numba AOT compiled functions in the `optimiser` module available for calling inside Numba compiled code (and also in the interpreter, they are `@numba.njit` wrapped).

In [None]:
import optimiser
from numba_helpers import pixie_converter
# Convert the PIXIE module functions into ones that Numba can call
numba_optimiser = pixie_converter(optimiser)

The exported symbols appear as a `namedtuple` with members `jit` and `aot` so as to let the user decide how to stage the call, `.jit` for call via `bitcode` and `.aot` for call via symbol.

In [None]:
numba_optimiser.NR_root
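
The shape of such an object can be sketched with the standard library alone. The field names `jit` and `aot` follow the description above; the type name `Variants`, the helper `make_variants`, and the two stand-in functions are placeholders and say nothing about how `pixie_converter` is implemented.

```python
from collections import namedtuple

# Field names follow the text above; the type name is a placeholder.
Variants = namedtuple("Variants", ["jit", "aot"])


def make_variants(jit_impl, aot_impl):
    """Bundle two implementations of the same function (hypothetical helper)."""
    return Variants(jit=jit_impl, aot=aot_impl)


def add_inlineable(a, b):  # stands in for the bitcode-backed variant
    return a + b


def add_opaque(a, b):  # stands in for the symbol-backed variant
    return a + b


add = make_variants(add_inlineable, add_opaque)
print(add._fields, add.jit(1.0, 2.0), add.aot(1.0, 2.0))
```

Both members have the same call signature, so switching between them at a call site is a one-word change.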

Finally, call the `NR_root` function to do root finding for functions `f` and `dfdx` originally from the C code. The `.jit` variant will use the bitcode from the PIXIE library and take part in optimisation at compile time. The `.aot` variant will use the symbol from the PIXIE library and not take part in optimisation at compile time as it is an opaque call.

In [None]:
@njit
def specialized_find_roots(eps=1e-7, max_it=50):
    # call jit version
    result_jit = numba_optimiser.NR_root.jit(f, dfdx, 0.5, eps, max_it)
    # call aot version, but vary input to get a noticeably different output
    result_aot = numba_optimiser.NR_root.aot(f, dfdx, 0.8, 1e-3, max_it) 
    return result_jit, result_aot

# Should be ~pi.
print(specialized_find_roots())

This concludes an example of mixing source languages for use in Numba via PIXIE. Unfortunately, due to a current limitation in Numba's handling of first-class function types, there's no real performance gain when using the `.jit` version in comparison to the `.aot` version. However, the next example aims to demonstrate such an effect.

### 6B. "Blending" performance through AOT compiled modules. Example application, a DAXPY-like function call.

The following `@aot` compiled function defines a DAXPY-like (see BLAS libraries) call.

In [None]:
@aot(types.double[::1](types.double[::1], types.double, types.double),)
def daxpy(a, x, y):
    for i in range(len(a)):
        a[i] = a[i] * x + y
    return a

translation_unit = TranslationUnit()
translation_unit.add(daxpy)
daxpy_lib = Library('daxpy', (translation_unit,), outdir='.')
daxpy_lib.compile()

# check it imports
import daxpy

For demonstration purposes only, these functions do a large iteration count loop over operations on a small vector, namely a `DAXPY` call and then a reduction via `sum()`. The `.aot` variant will end up with a call to a symbol in the loop that will prevent a variety of optimisations as well as incurring the cost of a call. The `.jit` variant will have the `DAXPY` call take part in the loop body optimisation and should provide a faster executing loop. The `pixie_converter` is used again to easily let the functions in the PIXIE library be called by Numba.

In [None]:
numba_daxpy = pixie_converter(daxpy)

n = 100000  # number of loop iterations used by the functions below (not the vector length)

# define a NumPy variant to use to check the result is correct.
def call_numpy(a, x, y):
    acc = 0.
    for i in range(n):
        a = a * x + y
        acc += a.sum()
    return acc

# define a Numba version that calls the `.aot` version.
@njit
def call_aot(a, x, y):
    acc = 0.
    for i in range(n):
        numba_daxpy.daxpy.aot(a, x, y) # call via symbol
        acc += a.sum()
    return acc

# define a Numba version that calls the `.jit` version.
@njit
def call_jit(a, x, y):
    acc = 0.
    for i in range(n):
        numba_daxpy.daxpy.jit(a, x, y) # call via bitcode, should in-line and be optimised
        acc += a.sum()
    return acc

# Generates inputs (a, x, y)
def gen_input(n):
    return np.arange(1, n + 1).astype(np.float64), .1, .2

First check that the implementations all produce the same result:

In [None]:
def check():
    n = 10  # vector length for this check; the loop count is still the module-level `n`
    np_args = gen_input(n)
    expected = call_numpy(*np_args)
    aot_args = gen_input(n)
    aot_result = call_aot(*aot_args)
    jit_args = gen_input(n)
    jit_result = call_jit(*jit_args)
    
    np.testing.assert_allclose(expected, aot_result)
    np.testing.assert_allclose(expected, jit_result)
    
# check the results!
check()
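
Two properties of the DAXPY-like kernel explain how the check above is written. It updates its array argument in place, which is why fresh inputs are generated for each implementation, and with `|x| < 1` repeated application settles at the fixed point `y / (1 - x)`, so a large iteration count does not overflow. The sketch below shows both in pure Python; the helpers are stand-ins and not the compiled functions.

```python
def daxpy_inplace(a, x, y):
    """Pure-Python stand-in for the DAXPY-like kernel: updates the list in place."""
    for i in range(len(a)):
        a[i] = a[i] * x + y
    return a


def make_input(n):
    """Same values as the demonstration inputs, but as a plain list."""
    return [float(v) for v in range(1, n + 1)], 0.1, 0.2


a, x, y = make_input(4)
original = list(a)
daxpy_inplace(a, x, y)
mutated = a != original  # the caller's list has changed

# Repeated application contracts towards the fixed point y / (1 - x) when |x| < 1.
for _ in range(200):
    daxpy_inplace(a, x, y)
fixed_point = y / (1.0 - x)
print(mutated, a, fixed_point)
```

Reusing one input array across the NumPy, `.aot` and `.jit` calls would therefore compare results computed from different starting values.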

Choose a deliberately short "internal" vector size such that the call via symbol is disproportionately expensive.

In [None]:
short_vector_len = 4

Check the performance of the implementations... (the approximate scale of the performance results is noted in brackets).

In [None]:
args = gen_input(short_vector_len)
%timeit call_numpy(*args) # (typically "X" time units)

In [None]:
args = gen_input(short_vector_len)
%timeit call_aot(*args) # (typically "X/100" time units)

In [None]:
args = gen_input(short_vector_len)
%timeit call_jit(*args) # (typically "X/1000" time units)

### This concludes the demonstration of the current state of the PIXIE project.

Feedback is welcome; please do open issues or post on Numba's discourse forum!