# Expressions

ironArray has strong support for expression evaluation: sums, products, divisions, and a fairly complete range of transcendental functions (e.g. `exp`, `sin`, `asin`, `tanh`...).  Fast evaluation of (large) arrays is one of the features that received the most attention during development.  Performance comes from a combination of:

1. Use of [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html) for accelerating transcendental functions.

2. Use of [Intel SVML](https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-short-vector-math-library-operations/overview-intrinsics-for-short-vector-math-library-svml-functions.html) for computing vector math functions.

3. Use of multi-threading capabilities.

4. Leveraging the 2-level partitioning (chunks and blocks) of ironArray arrays, so that most of the computation happens inside the private caches of each core (L1, L2), which benefits multi-threading performance.
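To make points 3 and 4 more concrete, here is a minimal pure-Python sketch of block-wise, multi-threaded evaluation.  It is not ironArray's implementation (which runs compiled code); `blocked_eval` and `mean3` are hypothetical names used only for illustration.  Each task evaluates the whole expression on one block, so every intermediate result is only one block long instead of the size of the full array.  Note that in CPython the GIL prevents these threads from actually running the arithmetic in parallel; the point here is the structure of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def blocked_eval(block_func, inputs, blocksize, nthreads=4):
    """Evaluate block_func over equally sized 1-D sequences, one block at a time.

    block_func receives one block (a list) per input and returns the output block.
    Every intermediate it creates is at most `blocksize` long, which is what lets
    a compiled engine keep its working set inside a core's private cache.
    """
    n = len(inputs[0])
    if any(len(x) != n for x in inputs):
        raise ValueError("all inputs must have the same length")
    out = [0.0] * n

    def work(start):
        stop = min(start + blocksize, n)
        blocks = [list(x[start:stop]) for x in inputs]
        # Each task writes a disjoint slice of `out`.
        out[start:stop] = block_func(*blocks)

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # list() waits for all tasks and re-raises any exception from a worker.
        list(pool.map(work, range(0, n, blocksize)))
    return out


def mean3(a, b, c):
    # Block-local temporary: `s` only ever holds len(a) elements.
    s = [x + y for x, y in zip(a, b)]
    return [(x + z) / 3 for x, z in zip(s, c)]


print(blocked_eval(mean3, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]], blocksize=2))
# prints [4.0, 5.0, 6.0]
```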

You can access the evaluation capabilities of ironArray in different ways, which we are going to briefly show in this tutorial.  To do that, we are going to use the dataset that we created in our reductions tutorial.  Let's go:

In [1]:
%%time
import iarray as ia
ia_precip = ia.load("precip-3m.iarr")

CPU times: user 395 ms, sys: 117 ms, total: 512 ms
Wall time: 686 ms


Now, in order to evaluate some expressions on this data, let's have a quick look at how the array is partitioned, and then put the data for each month in a separate array:

In [2]:
print(ia_precip.chunkshape)
print(ia_precip.blockshape)
ia_precip

(1, 64, 64, 128)
(1, 16, 16, 64)


<IArray (3, 720, 721, 1440) np.float32>
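A quick back-of-the-envelope calculation, using only the shapes printed above and the 4-byte size of `float32`, shows how this array is partitioned.  This is plain arithmetic, not an ironArray API call:

```python
import math

# Values printed by the cell above; float32 takes 4 bytes per element.
shape = (3, 720, 721, 1440)
chunkshape = (1, 64, 64, 128)
blockshape = (1, 16, 16, 64)
itemsize = 4

# Chunks per dimension, counting a partial chunk at the edge as a whole one.
chunks_per_dim = tuple(math.ceil(s / c) for s, c in zip(shape, chunkshape))
nchunks = math.prod(chunks_per_dim)

blocks_per_chunk = math.prod(c // b for c, b in zip(chunkshape, blockshape))
chunk_bytes = math.prod(chunkshape) * itemsize
block_bytes = math.prod(blockshape) * itemsize

print(chunks_per_dim, nchunks)        # (3, 12, 12, 12) 5184
print(blocks_per_chunk)               # 32
print(chunk_bytes // 2**20, "MiB")    # 2 MiB
print(block_bytes // 2**10, "KiB")    # 64 KiB
```

So each uncompressed block is 64 KiB.  Exact cache sizes vary between CPUs, but a few such blocks (e.g. one per operand) can fit in a typical per-core L2 cache, which is what the 2-level partitioning aims for.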

In [3]:
%%time
precip1 = ia_precip[0].copy()
precip2 = ia_precip[1].copy()
precip3 = ia_precip[2].copy()

CPU times: user 19.6 s, sys: 5.41 s, total: 25 s
Wall time: 20.8 s


With that, let's compute something simple, like a new array with the mean of the three months.  For that, we are going to use the internal evaluation engine:

In [4]:
%%time

precip_expr = ia.expr_from_string("(p1 + p2 + p3) / 3", {'p1': precip1, 'p2': precip2, 'p3': precip3})
precip_expr

CPU times: user 18.8 ms, sys: 2.16 ms, total: 21 ms
Wall time: 21 ms


<iarray.expression.Expr at 0x7f9090b03900>

Ok, that was fast, but we have not actually evaluated anything yet; we have only built the expression to compute.  This way you can combine different expressions (possibly made of views) and delay the actual evaluation until it is necessary (TODO: is that really possible?).
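The idea of building an expression first and evaluating it later can be illustrated with a small, hypothetical `LazyExpr` class.  This is only a sketch of deferred evaluation in pure Python, under the assumption that operands are equally sized 1-D lists; it is not how `ia.expr_from_string` works internally.  Constructing the object only parses the string; the elementwise work happens in `.eval()`:

```python
import math


class LazyExpr:
    """Hypothetical deferred expression: building it does no arithmetic."""

    _funcs = {name: getattr(math, name) for name in ("sin", "cos", "tan", "sqrt", "exp")}

    def __init__(self, source, operands):
        lengths = {len(v) for v in operands.values()}
        if len(lengths) != 1:
            raise ValueError("operands must be non-empty and of equal length")
        # Parse once, up front; no element is touched yet.
        self.code = compile(source, "<expr>", "eval")
        self.operands = operands

    def eval(self):
        names = list(self.operands)
        n = len(self.operands[names[0]])
        out = []
        for i in range(n):
            # Bind each operand name to its i-th element, plus the math functions.
            env = dict(self._funcs)
            env.update({k: self.operands[k][i] for k in names})
            out.append(eval(self.code, {"__builtins__": {}}, env))
        return out


mean = LazyExpr("(p1 + p2 + p3) / 3", {"p1": [3.0, 6.0], "p2": [0.0, 3.0], "p3": [0.0, 0.0]})
print(mean.eval())  # [1.0, 3.0]
```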

In order to do the actual evaluation, we have to call `.eval()` on an expression:

In [5]:
%%time

precip_mean = precip_expr.eval()
precip_mean

CPU times: user 10.5 s, sys: 2.68 s, total: 13.1 s
Wall time: 1.64 s


<IArray (720, 721, 1440) np.float32>

Cool, our first evaluation is done.  Now let's see how its performance compares with NumPy, and especially whether the outcome is correct.  First, we get NumPy arrays out of the ironArray ones via `.data`:

In [6]:
np_precip1 = precip1.data
np_precip2 = precip2.data
np_precip3 = precip3.data
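Keep in mind that `.data` gives back regular NumPy arrays, i.e. uncompressed data living in memory.  A quick calculation from the array shape (720, 721, 1440) and the 4-byte `float32` item size shows how much RAM that takes:

```python
import math

# Shape of each monthly array, as shown in the outputs above.
month_shape = (720, 721, 1440)
itemsize = 4  # np.float32

nbytes = math.prod(month_shape) * itemsize
print(f"{nbytes / 1e9:.2f} GB per month")              # 2.99 GB per month
print(f"{3 * nbytes / 1e9:.2f} GB for three months")   # 8.97 GB for three months
```

So the NumPy and numexpr comparisons below need roughly 9 GB of RAM just for their three input arrays, on top of the results they produce.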

In [7]:
%%time
np_precip_mean = (np_precip1 + np_precip2 + np_precip3) / 3

CPU times: user 1.22 s, sys: 566 ms, total: 1.79 s
Wall time: 1.79 s


So, ironArray times are quite competitive with NumPy (slightly faster, in fact: 1.64 s vs 1.79 s wall time).  How about the correctness of the outcome?  Let's see:

In [8]:
import numpy as np
np.testing.assert_almost_equal(np_precip_mean, precip_mean.data)
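`np.testing.assert_almost_equal` uses an absolute tolerance: with the default `decimal=7`, each pair of elements must satisfy `abs(desired - actual) < 1.5 * 10**(-7)`.  The pure-Python function below is only a sketch of that criterion (it ignores the NaN handling and error reporting of the real function); `almost_equal` is a hypothetical name:

```python
def almost_equal(actual, desired, decimal=7):
    """Sketch of the elementwise check behind np.testing.assert_almost_equal."""
    if len(actual) != len(desired):
        raise ValueError("inputs must have the same length")
    threshold = 1.5 * 10.0 ** (-decimal)
    # Absolute tolerance: the magnitude of the values plays no role.
    return all(abs(d - a) < threshold for a, d in zip(actual, desired))


print(almost_equal([0.0125, 0.5], [0.01250001, 0.5]))  # True
print(almost_equal([1000.0], [1000.0001]))              # False
```

Because the tolerance is absolute, it suits small-magnitude data like these precipitation values, but it can be too strict for large `float32` values; `np.testing.assert_allclose`, with its `rtol` parameter, is the relative-tolerance alternative.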

Cool, the results are the same.  That means that ironArray can be faster than NumPy for this kind of operation, even though it works on compressed data.  But let's see how it performs against a parallel evaluation engine like `numexpr`:

In [9]:
%%time
import numexpr as ne
np_precip_mean = ne.evaluate("(p1 + p2 + p3) / 3", {'p1': np_precip1, 'p2': np_precip2, 'p3': np_precip3})

CPU times: user 3.66 s, sys: 14.6 s, total: 18.3 s
Wall time: 2.95 s


As you see, ironArray is even faster than numexpr here (1.64 s vs 2.95 s wall time) and, again, remember that ironArray is dealing with compressed data transparently.

Now, let's use expressions with some transcendental functions.  This does not make physical sense for precipitation data, but it gives an indication of the speed of ironArray in this area too:

In [10]:
%%time

result = ia.expr_from_string("(tan(p1) * (sin(p1) * sin(p2) + cos(p2)) + sqrt(p3) * 2)",
                             {'p1': precip1, 'p2': precip2, 'p3': precip3}
                             ).eval()
result


CPU times: user 18.3 s, sys: 4.66 s, total: 22.9 s
Wall time: 2.72 s


<IArray (720, 721, 1440) np.float32>
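This expression involves ten elementwise operations (five function calls and five arithmetic operators).  An engine that evaluates one operation at a time over whole arrays, as plain NumPy does, can create up to one full-size temporary per operation; NumPy sometimes reuses temporaries in place, so this is an upper bound.  Each such temporary here is a (720, 721, 1440) `float32` array of about 3 GB.  The helper below, a hypothetical `count_elementwise_ops` built on Python's `ast` module, counts those operations:

```python
import ast

EXPR = "(tan(p1) * (sin(p1) * sin(p2) + cos(p2)) + sqrt(p3) * 2)"


def count_elementwise_ops(source):
    """Count function calls and arithmetic operators in an expression string."""
    tree = ast.parse(source, mode="eval")
    return sum(isinstance(node, (ast.Call, ast.BinOp, ast.UnaryOp))
               for node in ast.walk(tree))


print(count_elementwise_ops(EXPR))                   # 10
print(count_elementwise_ops("(p1 + p2 + p3) / 3"))   # 3
```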

Let's compare this against NumPy:

In [11]:
%%time
p1_ = np_precip1
p2_ = np_precip2
p3_ = np_precip3
np_result = (np.tan(p1_) * (np.sin(p1_) * np.sin(p2_) + np.cos(p2_)) + np.sqrt(p3_) * 2)

CPU times: user 12.1 s, sys: 7.69 s, total: 19.8 s
Wall time: 20.1 s


Ok, this is really slow: more than 7 times slower than ironArray (20.1 s vs 2.72 s wall time).  Let's see how numexpr performs in this case:

In [12]:
%%time
np_result = ne.evaluate("(tan(p1) * (sin(p1) * sin(p2) + cos(p2)) + sqrt(p3) * 2)",
                        {'p1': np_precip1, 'p2': np_precip2, 'p3': np_precip3})

CPU times: user 11 s, sys: 1.61 s, total: 12.6 s
Wall time: 1.79 s


This time numexpr is somewhat faster than ironArray (1.79 s vs 2.72 s wall time), although ironArray is working on compressed data.  Do not forget to check for correctness:

In [13]:
import numpy as np
# Compare numexpr's result with ironArray's one
np.testing.assert_almost_equal(np_result, result.data)
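To put the timings of this tutorial side by side, the snippet below computes how long each engine took relative to ironArray.  The numbers are the wall times of the single `%%time` runs above, on one machine, so treat them as indicative only:

```python
# Wall-clock times (seconds) from the single %%time runs in this notebook.
timings = {
    "(p1 + p2 + p3) / 3": {"ironArray": 1.64, "NumPy": 1.79, "numexpr": 2.95},
    "transcendental": {"ironArray": 2.72, "NumPy": 20.1, "numexpr": 1.79},
}

for expr, times in timings.items():
    base = times["ironArray"]
    # A ratio > 1 means the engine took longer than ironArray.
    ratios = {engine: round(t / base, 2) for engine, t in times.items()}
    print(expr, ratios)
```

For the mean, NumPy and numexpr take about 1.09× and 1.8× as long as ironArray; for the transcendental expression, the ratios are about 7.39× for NumPy and 0.66× for numexpr.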


TODO: Introduce lazy expressions and UDFs, and an 'Optimization tips' section...

TODO: add a comparison with xarray or dask, for out-of-core reductions