# Expression Evaluation (User Defined Functions)

So far we have seen that ironArray supports evaluating expressions that are passed as strings or as simple Python statements.  There is another, more flexible way of evaluating expressions called User Defined Functions, or UDFs for short.

UDFs are small functions written in a simple subset of Python.  ironArray passes these functions to its internal compiler, which generates a binary optimized for the CPU of the local machine.  In addition, this binary makes use of the SIMD hardware available in the CPU to accelerate transcendental functions.

Let's see how this works.

In [1]:
%load_ext memprofiler
%matplotlib inline
import numpy as np
import iarray as ia

In [2]:
%%time
precip1 = ia.load("precip1.iarr")
precip2 = ia.load("precip2.iarr")
precip3 = ia.load("precip3.iarr")

CPU times: user 357 ms, sys: 853 ms, total: 1.21 s
Wall time: 3.06 s


Now, let's define a simple function that computes the element-wise mean of these three arrays:

In [3]:
from iarray.udf import jit, Array, float32

@jit()
def mean(out: Array(float32, 3),
         p1: Array(float32, 3),
         p2: Array(float32, 3),
         p3: Array(float32, 3)) -> int:

    l = p1.window_shape[0]
    m = p1.window_shape[1]
    n = p1.window_shape[2]

    for i in range(l):
        for j in range(m):
            for k in range(n):
                value = p1[i,j,k] + p2[i,j,k] + p3[i,j,k]
                out[i,j,k] = value / 3

    return 0
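Note that the function loops over `p1.window_shape` rather than over the full array shape. One way to picture this is sketched below in plain NumPy, under the assumption (made here only for illustration) that the engine calls the function once per block of the array. `apply_blockwise` and `mean_kernel` are hypothetical names, not part of ironArray, and the real calling convention may differ.

```python
import numpy as np

def mean_kernel(out, p1, p2, p3):
    # Same loop body as the UDF; out.shape stands in for window_shape here.
    l, m, n = out.shape
    for i in range(l):
        for j in range(m):
            for k in range(n):
                value = p1[i, j, k] + p2[i, j, k] + p3[i, j, k]
                out[i, j, k] = value / 3
    return 0

def apply_blockwise(kernel, inputs, block_len=2):
    # Hypothetical driver: walk the first axis in blocks and run the kernel
    # on each block. The slices are views, so the kernel writes into `out`.
    out = np.empty_like(inputs[0])
    for start in range(0, out.shape[0], block_len):
        sl = slice(start, start + block_len)
        kernel(out[sl], *(a[sl] for a in inputs))
    return out

rng = np.random.default_rng(0)
# Small random stand-ins for the precipitation arrays.
a, b, c = (rng.random((5, 4, 3), dtype=np.float32) for _ in range(3))
result = apply_blockwise(mean_kernel, [a, b, c])

# The block-by-block result matches the vectorised mean.
assert np.allclose(result, (a + b + c) / 3)
```

Because the kernel only ever sees one block, its loops must be bounded by the block's own shape, which is the role `window_shape` plays in the UDF.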

Next, create the ironArray expression from this User Defined Function:

In [4]:
%%time
precip_expr = ia.expr_from_udf(mean, [precip1, precip2, precip3])

CPU times: user 45.8 ms, sys: 7.09 ms, total: 52.9 ms
Wall time: 58.1 ms


As can be seen, converting the user defined function into a native ironArray expression is pretty fast (about 58 ms of wall time here).  And as always, in order to do the actual evaluation, we have to call `.eval()` on the expression:

In [5]:
%%mprof_run iarray::mean_UDF
precip_mean = precip_expr.eval()
precip_mean

<IArray (720, 721, 1440) np.float32>

memprofiler: used 1352.39 MiB RAM (peak of 1352.39 MiB) in 8.2615 s, total RAM usage 1771.35 MiB


Let's compare this time with the evaluation via a regular lazy expression:

In [6]:
precip_expr2 = (precip1 + precip2 + precip3) / 3

In [7]:
%%mprof_run iarray::mean_lazy
precip_mean2 = precip_expr2.eval()
precip_mean2

<IArray (720, 721, 1440) np.float32>

memprofiler: used -202.64 MiB RAM (peak of 45.60 MiB) in 9.5278 s, total RAM usage 1326.20 MiB


Ok, so the times are very close (about 8.3 s for the UDF versus 9.5 s for the lazy expression).  UDFs and lazy expressions are compiled and executed by the very same internal compiler in ironArray, which explains why the times are similar.  Which one to use is up to you, depending on your needs.

Let's see how NumPy does on this:

In [8]:
%%mprof_run
np_precip1 = precip1.data
np_precip2 = precip2.data
np_precip3 = precip3.data

memprofiler: used 3778.74 MiB RAM (peak of 3804.30 MiB) in 17.7812 s, total RAM usage 4763.00 MiB


In [9]:
%%mprof_run numpy::mean
np_result = (np_precip1 + np_precip2 + np_precip3) / 3

memprofiler: used 160.56 MiB RAM (peak of 2538.25 MiB) in 49.0238 s, total RAM usage 3968.13 MiB


## Transcendental functions in User Defined Functions

Now, let's use expressions with some transcendental functions.  The result has no physical meaning for precipitation data; we do this only as an indication of the efficiency of the computational engine inside ironArray:

In [10]:

import math

@jit()
def trans(out: Array(float32, 3),
          p1: Array(float32, 3),
          p2: Array(float32, 3),
          p3: Array(float32, 3)) -> int:

    l = p1.window_shape[0]
    m = p1.window_shape[1]
    n = p1.window_shape[2]

    for i in range(l):
        for j in range(m):
            for k in range(n):
                value = math.sin(p1[i,j,k]) * math.sin(p2[i,j,k]) + math.cos(p2[i,j,k])
                value *= math.tan(p1[i,j,k])
                value += math.sqrt(p3[i,j,k]) * 2
                out[i,j,k] = value

    return 0

In [11]:
%%time
precip_expr = ia.expr_from_udf(trans, [precip1, precip2, precip3])

CPU times: user 46.7 ms, sys: 23.9 ms, total: 70.6 ms
Wall time: 110 ms


In [12]:
%%mprof_run iarray::trans_UDF
precip_mean = precip_expr.eval()
precip_mean

<IArray (720, 721, 1440) np.float32>

memprofiler: used -306.43 MiB RAM (peak of 0.00 MiB) in 10.6055 s, total RAM usage 2529.17 MiB


In this case we see that the overhead of using transcendental functions is pretty low compared with plain arithmetic operations (addition, subtraction, multiplication, division...): about 10.6 s here versus 8.3 s for the mean.

Let's see how a regular lazy expression behaves:

In [13]:
%%mprof_run iarray::trans_lazy
lazy_expr = ia.tan(precip1) * (ia.sin(precip1) * ia.sin(precip2) + ia.cos(precip2)) + ia.sqrt(precip3) * 2
lazy_result = lazy_expr.eval()
lazy_result

<IArray (720, 721, 1440) np.float32>

memprofiler: used 583.92 MiB RAM (peak of 583.92 MiB) in 14.4773 s, total RAM usage 2609.19 MiB


Let's compare this against NumPy:

In [14]:
%%mprof_run numpy::trans
p1_ = np_precip1
p2_ = np_precip2
p3_ = np_precip3
np_result = (np.tan(p1_) * (np.sin(p1_) * np.sin(p2_) + np.cos(p2_)) + np.sqrt(p3_) * 2)

memprofiler: used 633.16 MiB RAM (peak of 4987.94 MiB) in 167.7880 s, total RAM usage 3242.38 MiB


Ok, this is really slow (about 168 s), but this is somewhat expected, as ironArray comes with support for evaluating transcendental functions using the SIMD capabilities of the CPU.
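As a sanity check on what the NumPy expression computes, the sketch below evaluates it on small random arrays (hypothetical stand-ins for the precipitation data) and compares it with an element-by-element loop using the `math` module, which is the style of code a UDF contains. It runs in plain Python with no compilation, so it says nothing about performance.

```python
import math
import numpy as np

rng = np.random.default_rng(42)
# Small random stand-ins for the three input arrays.
p1, p2, p3 = (rng.random((4, 3, 2), dtype=np.float32) for _ in range(3))

# Vectorised form, as in the NumPy cell above.
vectorised = np.tan(p1) * (np.sin(p1) * np.sin(p2) + np.cos(p2)) + np.sqrt(p3) * 2

# Scalar form: one element at a time with the math module.
scalar = np.empty_like(p1)
for idx in np.ndindex(p1.shape):
    x, y, z = float(p1[idx]), float(p2[idx]), float(p3[idx])
    value = math.sin(x) * math.sin(y) + math.cos(y)
    value *= math.tan(x)
    value += math.sqrt(z) * 2
    scalar[idx] = value

# Both forms agree up to float32 rounding.
assert np.allclose(vectorised, scalar, rtol=1e-4)
```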

## Resource consumption

As a summary, let's plot the resource consumption of the different kinds of computation.  First, for a regular mean:

In [15]:
%mprof_plot iarray::mean_UDF iarray::mean_lazy numpy::mean -t "Mean computation"

And here is the resource consumption for the transcendental expression:

In [16]:
%mprof_plot iarray::trans_UDF iarray::trans_lazy numpy::trans -t "Transcendental expression"

As the timings show, User Defined Functions not only run much faster than NumPy (about 6x for the mean and about 16x for the transcendental expression), but are also similar in speed to (if not faster than) regular lazy expressions in ironArray.  UDFs are a very powerful feature of ironArray, so make sure that you leverage them.
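To put numbers on this comparison, the short sketch below derives speed-up ratios from the wall-clock times printed by memprofiler in the cells above. The NumPy figures do not include the roughly 17.8 s spent converting the ironArray containers into NumPy arrays, and timings from a single run on one machine should be read as indicative only.

```python
# Wall-clock times in seconds, copied from the memprofiler output above.
timings = {
    "mean": {"UDF": 8.2615, "lazy": 9.5278, "NumPy": 49.0238},
    "transcendental": {"UDF": 10.6055, "lazy": 14.4773, "NumPy": 167.7880},
}

speedups = {}
for name, t in timings.items():
    # A ratio above 1 means the first method named in the label is faster.
    speedups[name] = {
        "UDF vs NumPy": t["NumPy"] / t["UDF"],
        "lazy vs NumPy": t["NumPy"] / t["lazy"],
        "UDF vs lazy": t["lazy"] / t["UDF"],
    }
    for label, ratio in speedups[name].items():
        print(f"{name:>15} | {label:<13} | {ratio:5.1f}x")
```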
