# Circuit Compilers Advanced Topics

Recall that a circuit compiler is a function that takes an arithmetic circuit and compiles it into a program, specifically a `RawProgram` object.

There are several circuit compilers provided by CK, and it is possible to write custom circuit compilers. A circuit compiler is a callable with the signature:
```
def my_circuit_compiler(
    *result: CircuitNode,
    input_vars: InputVars = InferVars.ALL,
    circuit: Optional[Circuit] = None,
) -> RawProgram:
    ...  # body omitted: only the signature matters here
```

That is, the callable takes zero or more positional arguments (the circuit result nodes) and optional keyword arguments. All result nodes must belong to the same circuit.

Parameter `input_vars` specifies how to determine the function input variables. The default is to use all circuit variables, in index order. Other options are documented in the module `ck.circuit_compiler.support.input_vars`.

Parameter `circuit` is rarely needed as each result node keeps track of the circuit it belongs to. However, in some circumstances, when there are no result nodes, the circuit needs to be provided. If the `circuit` parameter is used, then the supplied circuit must be the same as that of the result nodes.
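
The rules for the result nodes and the `circuit` parameter can be sketched in plain Python. This is only an illustration under stated assumptions: `ToyCircuit`, `ToyNode`, the `.circuit` attribute, and `resolve_circuit` are hypothetical stand-ins invented for this sketch, not CK classes or functions, and CK's own checks may differ in detail.

```python
from typing import Optional


class ToyCircuit:
    """Hypothetical stand-in for a circuit (not CK's `Circuit`)."""


class ToyNode:
    """Hypothetical stand-in for a result node that remembers its circuit."""

    def __init__(self, circuit: ToyCircuit):
        self.circuit = circuit


def resolve_circuit(*result: ToyNode, circuit: Optional[ToyCircuit] = None) -> ToyCircuit:
    """Work out which circuit a compile request refers to."""
    # Collect the distinct circuits that own the result nodes.
    owners = {id(node.circuit): node.circuit for node in result}
    if len(owners) > 1:
        raise ValueError("result nodes must all belong to the same circuit")
    if not owners:
        # No result nodes: the circuit cannot be inferred, so it must be given.
        if circuit is None:
            raise ValueError("no result nodes, so the circuit must be supplied")
        return circuit
    (owner,) = owners.values()
    if circuit is not None and circuit is not owner:
        raise ValueError("supplied circuit differs from that of the result nodes")
    return owner
```

In this sketch, `resolve_circuit(circuit=some_circuit)` is the only way to succeed with zero result nodes, which mirrors the one situation where the `circuit` parameter is needed.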


CK provides many circuit compilers, each using a different algorithm. Each provided circuit compiler is a `NamedCircuitCompiler` enum member.

Here are the named circuit compilers, which are explained in the next sections.

In [1]:
from ck.circuit_compiler import NamedCircuitCompiler

for compiler in NamedCircuitCompiler:
    print(compiler.name)

LLVM_STACK
LLVM_TMPS
LLVM_VM
CYTHON_VM
INTERPRET


## LLVM_STACK
Use the LLVM compiler to compile to a native binary function, where no temporary working memory is explicitly allocated at compile time or requested at run time. All temporary variables are allocated on the stack as determined by the LLVM compiler.

This compiler creates an extremely efficient run time. However, the compile time can be prohibitive, even for moderately sized circuits.

## LLVM_TMPS
Use the LLVM compiler to compile to a native binary function, where temporary working memory is allocated at compile time.

## LLVM_VM
Use the LLVM compiler to compile a virtual CPU as a native binary function, where instructions for the virtual CPU are determined by traversing the circuit and stored as a constant array by the LLVM compiler.

This compiler creates a moderately efficient run time. The compile times can be significantly better than `LLVM_STACK` and `LLVM_TMPS`.

## CYTHON_VM
Use a Cython implementation of a virtual CPU as a native binary function, where instructions for the virtual CPU are determined by traversing the circuit and provided to the Cythonised virtual CPU by the raw program.

This compiler creates a moderately efficient run time. The compile times are generally very fast, and are significantly better than those of the LLVM compilers.

## INTERPRET
Use a Python implementation of a virtual CPU, where instructions for the virtual CPU are determined by traversing the circuit and provided to the virtual CPU by the raw program.

This compiler creates an inefficient run time, but is easy to inspect and debug (as it is Python). The compile times are generally very fast.
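
To make "instructions determined by traversing the circuit" concrete, here is a minimal sketch of a post-order traversal that turns an expression tree into a flat instruction list. It assumes a toy tuple-based tree and a made-up text format for instructions; neither is CK's circuit representation or its actual instruction encoding.

```python
from typing import List

# Toy expression tree (an assumption for this sketch, not CK's circuit classes):
#   ("var", index) | ("const", value) | ("mul", left, right) | ("sum", left, right)


def emit_instructions(top: tuple) -> List[str]:
    """Post-order traversal: operands are emitted before the operation that uses them."""
    instructions: List[str] = []

    def visit(node: tuple) -> str:
        kind = node[0]
        if kind == "var":
            return f"var[{node[1]}]"
        if kind == "const":
            return str(node[1])
        if kind not in ("mul", "sum"):
            raise ValueError(f"unknown node kind: {kind!r}")
        left = visit(node[1])
        right = visit(node[2])
        # Each operation writes to a fresh temporary slot.
        dest = f"tmp[{len(instructions)}]"
        instructions.append(f"{dest} = {kind} {left} {right}")
        return dest

    visit(top)
    return instructions


a, b, c, d = (("var", i) for i in range(4))
top = ("sum", ("sum", ("mul", a, b), ("mul", c, d)), ("const", 56.23))
program = emit_instructions(top)
for line in program:
    print(line)
```

The toy emitter writes every operation to a temporary, including the last one. A real compiler would also decide where results live and could reuse temporaries; those details are left out here.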

Here is a demonstration of the named circuit compilers. This code shows the compile time and program execution time for each compiler, using a circuit created from an example PGM.

In [2]:
import timeit
from ck.example import Insurance
from ck.pgm_compiler import DEFAULT_PGM_COMPILER
from ck.pgm_circuit import PGMCircuit
from ck.circuit import CircuitNode
from ck.program.program_buffer import ProgramBuffer

pgm = Insurance()
pgm_cct: PGMCircuit = DEFAULT_PGM_COMPILER(pgm)
top: CircuitNode = pgm_cct.circuit_top

print('  Compiler  Compile-time   Run-time')
print('  --------  ------------   --------')
for compiler in NamedCircuitCompiler:
    # Time compilation
    start_time = timeit.default_timer()
    raw_program = compiler(top)
    stop_time = timeit.default_timer()
    compile_time = (stop_time - start_time) * 1000  # as milliseconds

    # Time running the program
    program = ProgramBuffer(raw_program)
    start_time = timeit.default_timer()
    program.compute()
    stop_time = timeit.default_timer()
    run_time = (stop_time - start_time) * 1000  # as milliseconds

    print(f'{compiler.name:>10}  {compile_time:10.3f}ms {run_time:8.3f}ms')


  Compiler  Compile-time   Run-time
  --------  ------------   --------
LLVM_STACK    2655.639ms    0.014ms
 LLVM_TMPS    3057.663ms    0.071ms
   LLVM_VM      92.588ms    0.178ms
 CYTHON_VM      49.203ms    0.155ms
 INTERPRET      39.676ms   18.790ms
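
The timings above trade compile time against run time. As a rough worked example, the sketch below estimates how many program executions are needed before a slow-to-compile but fast-to-run compiler pays off. It uses only the numbers printed above, which are single measurements from one machine, so the result is indicative at best. The helper names `total_ms` and `break_even_runs` are invented for this sketch.

```python
import math

# (compile_ms, run_ms) copied from the table above; single measurements only.
TIMINGS = {
    "LLVM_STACK": (2655.639, 0.014),
    "LLVM_TMPS": (3057.663, 0.071),
    "LLVM_VM": (92.588, 0.178),
    "CYTHON_VM": (49.203, 0.155),
    "INTERPRET": (39.676, 18.790),
}


def total_ms(name: str, runs: int) -> float:
    """Compile once, then execute the program `runs` times."""
    compile_ms, run_ms = TIMINGS[name]
    return compile_ms + runs * run_ms


def break_even_runs(slow_to_compile: str, quick_to_compile: str) -> int:
    """Smallest number of runs at which the slow-to-compile option is no worse overall."""
    compile_a, run_a = TIMINGS[slow_to_compile]
    compile_b, run_b = TIMINGS[quick_to_compile]
    if run_a >= run_b:
        raise ValueError("the slow-to-compile option never catches up")
    return max(0, math.ceil((compile_a - compile_b) / (run_b - run_a)))


runs = break_even_runs("LLVM_STACK", "CYTHON_VM")
print(f"LLVM_STACK overtakes CYTHON_VM after about {runs} runs")
```

With these particular measurements the break-even point is roughly 18,500 executions, which suggests `LLVM_STACK` is mainly worthwhile when a compiled program is run many thousands of times.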


It is possible to dump a raw program for debugging and demonstration purposes. Here are some simple examples.

In [3]:
from ck.circuit import Circuit

cct = Circuit()
a, b, c, d = cct.new_vars(4)
top = a * b + c * d + 56.23

raw_program = NamedCircuitCompiler.LLVM_STACK(top)
raw_program.dump()

LLVMRawProgram
signature = [4] -> [1]
temps = 0
dtype = <class 'ctypes.c_double'>
var_indices = (0, 1, 2, 3)
optimisation level = 2
LLVM program size = 24
LLVM program:
  ; ModuleID = ""
  target triple = "unknown-unknown-unknown"
  target datalayout = ""
  
  define void @"main"(double* %".1", double* %".2", double* %".3")
  {
  entry:
    %".5" = getelementptr double, double* %".1", i32 0
    %".6" = load double, double* %".5"
    %".7" = getelementptr double, double* %".1", i32 1
    %".8" = load double, double* %".7"
    %".9" = fmul double %".6", %".8"
    %".10" = getelementptr double, double* %".1", i32 2
    %".11" = load double, double* %".10"
    %".12" = getelementptr double, double* %".1", i32 3
    %".13" = load double, double* %".12"
    %".14" = fmul double %".11", %".13"
    %".15" = fadd double %".9", %".14"
    %".16" = fadd double %".15", 0x404c1d70a3d70a3d
    %".17" = getelementptr double, double* %".3", i32 0
    store double %".16", double* %".17"
    ret void
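
The operand `0x404c1d70a3d70a3d` in the final `fadd` above is the constant `56.23` from the circuit: LLVM IR text prints a double that has no short exact decimal form as the hexadecimal bit pattern of its IEEE 754 encoding. A small standard-library sketch for decoding such a constant while reading a dump (the helper name `llvm_hex_to_float` is invented here):

```python
import struct


def llvm_hex_to_float(text: str) -> float:
    """Decode a 64-bit hexadecimal double constant as printed in LLVM IR."""
    bits = int(text, 16)  # accepts the leading "0x" when the base is 16
    # Reinterpret the 8 bytes (big-endian) as an IEEE 754 double.
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]


print(llvm_hex_to_float("0x404c1d70a3d70a3d"))
```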
  

In [4]:
raw_program = NamedCircuitCompiler.LLVM_TMPS(top)
raw_program.dump()

LLVMRawProgram
signature = [4] -> [1]
temps = 3
dtype = <class 'ctypes.c_double'>
var_indices = (0, 1, 2, 3)
optimisation level = 0
LLVM program size = 36
LLVM program:
  ; ModuleID = ""
  target triple = "unknown-unknown-unknown"
  target datalayout = ""
  
  define void @"main"(double* %".1", double* %".2", double* %".3")
  {
  entry:
    %".5" = getelementptr double, double* %".1", i32 0
    %".6" = load double, double* %".5"
    %".7" = getelementptr double, double* %".1", i32 1
    %".8" = load double, double* %".7"
    %".9" = fmul double %".6", %".8"
    %".10" = getelementptr double, double* %".2", i32 0
    store double %".9", double* %".10"
    %".12" = getelementptr double, double* %".1", i32 2
    %".13" = load double, double* %".12"
    %".14" = getelementptr double, double* %".1", i32 3
    %".15" = load double, double* %".14"
    %".16" = fmul double %".13", %".15"
    %".17" = getelementptr double, double* %".2", i32 1
    store double %".16", double* %".17"
    %".19" 

In [5]:
raw_program = NamedCircuitCompiler.LLVM_VM(top)
raw_program.dump()

LLVMRawProgramWithArrays
signature = [4] -> [1]
temps = 3
dtype = <class 'ctypes.c_double'>
var_indices = (0, 1, 2, 3)
optimisation level = 2
LLVM program size = 130
LLVM program:
  ; ModuleID = ""
  target triple = "unknown-unknown-unknown"
  target datalayout = ""
  
  define void @"main"(double* %".1", double* %".2", double* %".3")
  {
  entry:
    %".5" = load double*, double** @"consts"
    %".6" = load i8*, i8** @"instructions"
    %"idx" = alloca i8
    %"num_args" = alloca i8
    %"accumulator" = alloca double
    %"arrays" = alloca double*, i32 4
    %".7" = getelementptr double*, double** %"arrays", i8 0
    store double* %".1", double** %".7"
    %".9" = getelementptr double*, double** %"arrays", i8 1
    store double* %".2", double** %".9"
    %".11" = getelementptr double*, double** %"arrays", i8 2
    store double* %".3", double** %".11"
    %".13" = getelementptr double*, double** %"arrays", i8 3
    store double* %".5", double** %".13"
    store i8 0, i8* %"idx"
    br 

In [6]:
raw_program = NamedCircuitCompiler.CYTHON_VM(top)
raw_program.dump()

CythonRawProgram
signature = [4] -> [1]
temps = 3
dtype = <class 'numpy.float64'>
var_indices = (0, 1, 2, 3)
number of instructions = 4


In [7]:
raw_program = NamedCircuitCompiler.INTERPRET(top)
raw_program.dump()

InterpreterRawProgram
signature = [4] -> [1]
temps = 4
dtype = <class 'numpy.float64'>
var_indices = (0, 1, 2, 3)
number of instructions = 4
instructions:
  tmp[0] = mul var[0] var[1]
  tmp[1] = mul var[2] var[3]
  tmp[2] = sum tmp[0] tmp[1]
  result[0] = sum tmp[2] 56.23
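
The instruction listing above is simple enough to execute by hand. The following sketch is a toy virtual CPU that parses the dumped text and runs it. It is not CK's interpreter: it assumes the text format shown in the dump, supports only the `mul` and `sum` operations that appear there, and the function name `run_listing` is invented for illustration.

```python
import math
import re
from typing import List, Sequence

# Matches references such as var[0], tmp[2] or result[0].
_REF = re.compile(r"(var|tmp|result)\[(\d+)\]")

LISTING = [
    "tmp[0] = mul var[0] var[1]",
    "tmp[1] = mul var[2] var[3]",
    "tmp[2] = sum tmp[0] tmp[1]",
    "result[0] = sum tmp[2] 56.23",
]


def run_listing(
    listing: Sequence[str],
    var: Sequence[float],
    num_tmps: int,
    num_results: int = 1,
) -> List[float]:
    """Execute dumped instructions over three arrays: var, tmp and result."""
    memory = {
        "var": list(var),
        "tmp": [0.0] * num_tmps,
        "result": [0.0] * num_results,
    }

    def read(token: str) -> float:
        match = _REF.fullmatch(token)
        if match is None:
            return float(token)  # a literal constant such as 56.23
        return memory[match.group(1)][int(match.group(2))]

    for line in listing:
        dest, _equals, op, *args = line.split()
        values = [read(arg) for arg in args]
        if op == "mul":
            value = math.prod(values)
        elif op == "sum":
            value = sum(values)
        else:
            raise ValueError(f"unsupported operation: {op!r}")
        match = _REF.fullmatch(dest)
        if match is None:
            raise ValueError(f"bad destination: {dest!r}")
        memory[match.group(1)][int(match.group(2))] = value

    return memory["result"]


# a=1, b=2, c=3, d=4 gives 1*2 + 3*4 + 56.23
print(run_listing(LISTING, [1.0, 2.0, 3.0, 4.0], num_tmps=4))
```

The `signature = [4] -> [1]` and `temps = 4` lines of the dump correspond to the sizes of the `var`, `result`, and `tmp` arrays in this sketch.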


The default circuit compiler is available as `DEFAULT_CIRCUIT_COMPILER`, which is a `NamedCircuitCompiler` enum member.

In [8]:
from ck.circuit_compiler import DEFAULT_CIRCUIT_COMPILER

DEFAULT_CIRCUIT_COMPILER.name

'CYTHON_VM'