# Expression Evaluation (User Defined Functions)

So far we have seen that ironArray supports evaluating expressions that are passed as strings or as simple Python statements.  There is another, more flexible way of evaluating expressions called User Defined Functions, or UDFs for short.

UDFs are small functions that can be expressed in a simple subset of Python.  These functions are passed to the internal LLVM compiler in ironArray, which generates a binary optimized specifically for the CPU of the local machine.  In addition, this binary makes use of the [Intel SVML library](https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-short-vector-math-library-operations/overview-intrinsics-for-short-vector-math-library-svml-functions.html) to accelerate the evaluation of transcendental functions.

Let's see how this works.

In [1]:
%load_ext memprofiler
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import iarray as ia

In [2]:
%%time
precip1 = ia.load("precip1.iarr")
precip2 = ia.load("precip2.iarr")
precip3 = ia.load("precip3.iarr")

CPU times: user 106 ms, sys: 201 ms, total: 308 ms
Wall time: 308 ms


Now, let's define a simple function that computes the mean for this data:

In [3]:
from iarray.udf import jit, Array, float32, float64

@jit()
def mean(out: Array(float32, 3),
         p1: Array(float32, 3),
         p2: Array(float32, 3),
         p3: Array(float32, 3)) -> int:

    l = out.shape[0]
    m = out.shape[1]
    n = out.shape[2]

    for i in range(l):
        for j in range(m):
            for k in range(n):
                value = p1[i,j,k] + p2[i,j,k] + p3[i,j,k]
                out[i,j,k] = value / 3

    return 0

This function is known as a _User Defined Function_ (_UDF_ for short) and it has a syntax that is a small subset of Python.  The order of the parameters passed matters: first comes the output and then a variable number of inputs (3 in this case).  The type of the parameters is always an `Array`, where you specify the data type (currently `float32` or `float64`), and the dimensions (3 in this case).

Finally, you can use the `shape` attribute of the parameters to find out the size of the _window_ to which the UDF is applied.  It is important to keep in mind that the parameters do not give access to the whole arrays passed to the function, but only to one part of them (the _window_).  Later we will see an example where we work more explicitly with windows.
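
To make the window idea concrete, here is a minimal pure-NumPy sketch of how a kernel like `mean` could be applied piece by piece.  It is only an illustration under stated assumptions: the helper `apply_by_windows`, the stand-in `mean_kernel` and the window shape are hypothetical, and the real ironArray engine (compiled, multithreaded, working on compressed data) is certainly organized differently.

```python
import itertools
import numpy as np

def mean_kernel(out, p1, p2, p3):
    """Same body as the `mean` UDF, run as plain (slow) Python on one window."""
    l, m, n = out.shape
    for i in range(l):
        for j in range(m):
            for k in range(n):
                out[i, j, k] = (p1[i, j, k] + p2[i, j, k] + p3[i, j, k]) / 3
    return 0

def apply_by_windows(kernel, inputs, window_shape):
    """Hypothetical helper: call `kernel` once per window of the inputs."""
    shape = inputs[0].shape
    out = np.empty(shape, dtype=inputs[0].dtype)
    # Start coordinates of every window along each dimension.
    starts = [range(0, dim, step) for dim, step in zip(shape, window_shape)]
    for start in itertools.product(*starts):
        # Slices are clipped at the array edge, so border windows may be smaller.
        window = tuple(slice(s, s + w) for s, w in zip(start, window_shape))
        # `out[window]` is a view, so the kernel writes straight into `out`.
        kernel(out[window], *(a[window] for a in inputs))
    return out

# Small synthetic inputs (not the precipitation data).
rng = np.random.default_rng(0)
a, b, c = (rng.random((4, 6, 5), dtype=np.float32) for _ in range(3))
windowed = apply_by_windows(mean_kernel, [a, b, c], window_shape=(2, 4, 3))

# The kernel only ever sees one window, yet the full result is the plain mean.
np.testing.assert_allclose(windowed, (a + b + c) / 3, rtol=1e-6)
```

Note that the kernel reads `out.shape` rather than assuming a fixed size; this is what lets it cope with the smaller windows at the array borders.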

Let's create the ironArray expression from this User Defined Function with:

In [4]:
%%time
precip_expr = ia.expr_from_udf(mean, [precip1, precip2, precip3])

CPU times: user 23.2 ms, sys: 4.46 ms, total: 27.7 ms
Wall time: 27.5 ms


As can be seen, converting the user defined function into a native ironArray expression is pretty fast.  And as always, in order to do the actual evaluation, we have to call `.eval()` on the expression:

In [5]:
%%mprof_run iarray-mean
precip_mean = precip_expr.eval()
precip_mean

<IArray (720, 721, 1440) np.float32>

memprofiler: used 496.22 MiB RAM (peak of 496.22 MiB) in 0.7525 s, total RAM usage 1376.03 MiB


Let's compare this time with the evaluation via a regular lazy expression:

In [6]:
precip_expr2 = (precip1 + precip2 + precip3) / 3

In [7]:
%%mprof_run iarray-mean2
precip_mean2 = precip_expr2.eval()
precip_mean2

<IArray (720, 721, 1440) np.float32>

memprofiler: used 393.84 MiB RAM (peak of 393.84 MiB) in 0.7497 s, total RAM usage 1769.88 MiB


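OK, so the times are nearly identical and the memory consumption is in the same range (about 496 MiB for the UDF versus about 394 MiB for the lazy expression).  It turns out that UDFs and lazy expressions are compiled and executed in ironArray using the very same LLVM machinery, which explains why the times are so similar.  It is up to the user to choose one or the other depending on their needs.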

User Defined Functions also have access to a good assortment of math functions, which you can reach via the usual Python `math` module.  Let's see an example for our dataset.  Although the computation does not make much sense for precipitation data, it gives an indication of the efficiency of the computational engine inside ironArray:

In [8]:
import math

@jit()
def trans(out: Array(float32, 3),
          p1: Array(float32, 3),
          p2: Array(float32, 3),
          p3: Array(float32, 3)) -> int:

    l = out.shape[0]
    m = out.shape[1]
    n = out.shape[2]

    for i in range(l):
        for j in range(m):
            for k in range(n):
                value = math.sin(p1[i,j,k]) * math.sin(p2[i,j,k]) + math.cos(p2[i,j,k])
                value *= math.tan(p1[i,j,k])
                value += math.sqrt(p3[i,j,k]) * 2
                out[i,j,k] = value

    return 0

In [9]:
%%time
precip_expr = ia.expr_from_udf(trans, [precip1, precip2, precip3])

CPU times: user 19.7 ms, sys: 4.94 ms, total: 24.6 ms
Wall time: 24.5 ms


In [10]:
%%mprof_run iarray-trans
precip_trans = precip_expr.eval()
precip_trans

<IArray (720, 721, 1440) np.float32>

memprofiler: used 650.31 MiB RAM (peak of 650.31 MiB) in 1.1076 s, total RAM usage 2420.74 MiB


In this case we see that the overhead of using transcendental functions is small: about 1.1 s, versus about 0.75 s for the plain arithmetic operations (addition, subtraction, multiplication, division...).  This is a very significant fact, because traditionally transcendental functions took a really long time compared with plain arithmetic; not anymore, thanks to the combination of LLVM and Intel SVML.  The good mix of compiler optimization (via LLVM) and SIMD usage (via SVML) makes this couple shine.

For the sake of comparison, let's compute the same expression with NumPy:

In [11]:
%%time
p1_ = precip1.data
p2_ = precip2.data
p3_ = precip3.data

CPU times: user 10.5 s, sys: 1.87 s, total: 12.4 s
Wall time: 4.42 s


In [12]:
%%mprof_run np_trans
np_result = (np.tan(p1_) * (np.sin(p1_) * np.sin(p2_) + np.cos(p2_)) + np.sqrt(p3_) * 2)

memprofiler: used 2852.12 MiB RAM (peak of 8555.11 MiB) in 8.4781 s, total RAM usage 13842.86 MiB


This is really slow (about 8.5 s), but that is to be expected: NumPy does not have support for SVML or multithreading (at this time at least), and transcendental functions have traditionally been expensive to evaluate on a regular CPU.
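
The `trans` UDF builds its result in three statements, while the NumPy version is a single expression.  A small sanity check on synthetic data (not the precipitation arrays) can confirm that both describe the same formula.  In this sketch `trans_scalar` is a hypothetical pure-Python stand-in for the UDF body, and the sample values are assumed to lie in [0, 1) so that `sqrt` and `tan` are well behaved.

```python
import math
import numpy as np

def trans_scalar(x1, x2, x3):
    """Pure-Python stand-in for the body of the `trans` UDF (one element)."""
    value = math.sin(x1) * math.sin(x2) + math.cos(x2)
    value *= math.tan(x1)
    value += math.sqrt(x3) * 2
    return value

rng = np.random.default_rng(42)
# Small float64 samples in [0, 1); sqrt needs non-negative input.
q1, q2, q3 = (rng.random(1000) for _ in range(3))

# Element-by-element evaluation, mirroring the loops in the UDF.
looped = np.array([trans_scalar(x1, x2, x3) for x1, x2, x3 in zip(q1, q2, q3)])
# Whole-array evaluation, mirroring the NumPy expression.
vectorized = np.tan(q1) * (np.sin(q1) * np.sin(q2) + np.cos(q2)) + np.sqrt(q3) * 2

# Raises an AssertionError if the two formulations disagree.
np.testing.assert_allclose(looped, vectorized, rtol=1e-12)
```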

## Working with data windows

We already mentioned that User Defined Functions can access only a part (a window) of the dataset.  Here we will take a more in-depth look at how this works and how to squeeze all the functionality out of it.

Let's start by creating a zeroed array, and let's populate it with a UDF:

In [13]:
e = ia.zeros((10, 10), chunks=(5, 5), blocks=(2, 2), dtype=np.float64)

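The UDF below uses the `window_start` attribute to translate the window-local indices `i` and `j` into positions in the whole array, so that it can tell whether an element lies on the main diagonal: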

In [14]:
@jit()
def fill_diag(out: Array(float64, 2), in1: Array(float64, 2)) -> int:
    n = out.shape[0]
    m = out.shape[1]
    start_n = out.window_start[0]
    start_m = out.window_start[1]
    for i in range(n):
        for j in range(m):
            out[i, j] = 1. if i + start_n == j + start_m else in1[i, j]

    return 0

In [15]:
fill_expr = ia.expr_from_udf(fill_diag, [e])
result = fill_expr.eval()

In [16]:
print(result.data)

[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
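
To see why the `window_start` offsets are needed, the following pure-NumPy sketch emulates calling the kernel once per window.  It rests on assumptions: `fill_diag_window` is a hypothetical stand-in that receives the window start as an explicit argument, and the array is simply tiled into 2x2 windows, which only loosely mirrors `blocks=(2, 2)`; the partitioning actually used by ironArray follows its own chunk and block layout.

```python
import itertools
import numpy as np

def fill_diag_window(out, in1, window_start):
    """Pure-Python stand-in for the UDF; `window_start` is passed explicitly."""
    n, m = out.shape
    start_n, start_m = window_start
    for i in range(n):
        for j in range(m):
            # Compare global coordinates, not window-local ones.
            out[i, j] = 1.0 if i + start_n == j + start_m else in1[i, j]
    return 0

src = np.zeros((10, 10), dtype=np.float64)
dst = np.empty_like(src)
window_shape = (2, 2)  # assumed window size, for illustration only

# Start coordinates of every window along each dimension.
starts = [range(0, dim, step) for dim, step in zip(src.shape, window_shape)]
for start in itertools.product(*starts):
    window = tuple(slice(s, s + w) for s, w in zip(start, window_shape))
    fill_diag_window(dst[window], src[window], start)

# Only the global main diagonal has been set to one.
assert np.array_equal(dst, np.eye(10))
```

Dropping the `start_n` and `start_m` offsets from the comparison would instead draw a small diagonal inside every window, because `i` and `j` always restart at zero.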
