# Reductions

ironArray supports a broad range of reductions, such as `sum`, `min`, `max` and `mean`, among others.  They work on any single dimension or group of dimensions.  One interesting aspect is that the implementation leverages the multi-threading capabilities of ironArray, so they can be pretty fast (although sometimes they need some help from the user).

In order to exercise some of this functionality, let's use precipitation data covering a period of 3 months.  If the data has not been downloaded yet, fetching it should not take more than a couple of minutes.  Let's go:

In [1]:
import numpy as np
import iarray as ia

In [2]:
%run fetch_data.py

Dataset precip-3m.iarr is already here!


In [3]:
!du -sh precip*.iarr
!du -sh precip*.zarr

333M	precip1.iarr
280M	precip2.iarr
268M	precip3.iarr
788M	precip-3m.iarr
966M	precip-3m-optimal.iarr
363M	precip1.zarr
357M	precip2.zarr
355M	precip3.zarr
1,1G	precip-3m.zarr


The whole dataset is now stored in a single file of less than 1 GB, which is about 10x smaller than the original dataset thanks to compression.  That's a big win!  In addition, there is an assortment of smaller files used in the tutorials.  For comparison purposes, you will also find the datasets in Zarr format, another popular array format.



Ok.  Now, let's import this data into ironArray before proceeding with reductions:

In [4]:
%%time

ia_precip = ia.load("precip-3m.iarr")
print(ia_precip)
print("cratio: ", round(ia_precip.cratio, 2))

<IArray (3, 720, 721, 1440) np.float32>
cratio:  13.17
CPU times: user 68.5 ms, sys: 383 ms, total: 451 ms
Wall time: 409 ms


Ok, so ironArray achieves a compression ratio of more than 10x, which is a big win in terms of memory consumption.  Now, let's have a look at how reduction works:

In [5]:
%%time
reduc0 = ia.mean(ia_precip, axis=(0, 2, 3)).data

CPU times: user 1min 51s, sys: 987 ms, total: 1min 52s
Wall time: 5.71 s


Ok, so that's pretty slow.  Now it is time to remember that ironArray uses chunked storage, even when it holds data in-memory.  In this case, we have been traversing the array in a very inefficient way.  In general, with chunked storage it is better to start reducing along the largest dimension, and we did the opposite.  With this in mind, let's try a more reasonable order:

In [6]:
%%time
reduc0 = ia.mean(ia_precip, axis=(3, 2, 0)).data

CPU times: user 8.3 s, sys: 198 ms, total: 8.5 s
Wall time: 672 ms


Ok, that's much better.  Now, for the sake of comparison, let's do the same computation with NumPy.  Let's first get the NumPy array from the ironArray one:

In [7]:
%%time
precip = ia_precip.data

CPU times: user 10.7 s, sys: 1.85 s, total: 12.5 s
Wall time: 4.18 s


and now the actual reduction:

In [8]:
%%time
reduc1 = np.mean(precip, axis=(3, 2, 0))

CPU times: user 504 ms, sys: 0 ns, total: 504 ms
Wall time: 504 ms


In [9]:
np.testing.assert_almost_equal(reduc0, reduc1)
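
The order of the axes changes how the data is traversed, not the value of the result.  As a minimal NumPy-only sketch of that idea, the snippet below reduces one axis at a time in both orders and checks that the results agree.  The array `small`, its shape and the helper `mean_in_order` are made up for illustration; they are not part of ironArray.

```python
import numpy as np

# Hypothetical stand-in for the real dataset: same number of dimensions, tiny shape.
rng = np.random.default_rng(0)
small = rng.random((3, 8, 9, 16))

def mean_in_order(arr, axes):
    """Reduce `arr` one axis at a time, in the order given by `axes`."""
    out = arr
    removed = []
    for ax in axes:
        # Axes that were already reduced shift the position of the remaining ones.
        shifted = ax - sum(1 for r in removed if r < ax)
        out = out.mean(axis=shifted)
        removed.append(ax)
    return out

slow_order = mean_in_order(small, (0, 2, 3))
fast_order = mean_in_order(small, (3, 2, 0))

# Both orders give the same values, up to floating-point rounding.
np.testing.assert_allclose(slow_order, fast_order)
print(slow_order.shape)
```

Only axis 1 survives the reduction, so the result has one value per entry along that dimension.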

So, by default, ironArray provides good enough performance out of the box.  If you need to compute this sort of reduction many times, you may want to fine-tune your ironArray array parameters.  If so, just keep reading.  But before proceeding further, let's save our data for later reuse:

## Optimization Tips

As we know, most of ironArray's optimization stems from its two levels of partitioning.  Previously, we have been using automatic values for chunks and blocks (based on the CPU's cache sizes).  But for maximum speed, there is no replacement for fine-tuning chunk and block shapes manually.

Let's start by loading the array that we previously stored on-disk:

In [10]:
%%time
import iarray as ia
ia_precip = ia.open("precip-3m.iarr")

CPU times: user 0 ns, sys: 23.7 ms, total: 23.7 ms
Wall time: 22.7 ms


A few milliseconds is not much for opening a 9 GB array from disk.  It turns out that, by default, `open()` only loads data when it is needed (for a full in-memory load, see the `load()` function).  So, only a tiny portion of the file (the metadata) is read in order to figure out how to access the data.
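
The 9 GB figure follows from the shape and dtype printed earlier.  Here is a small sketch of the arithmetic, assuming that the `788M` reported by `du -sh` for `precip-3m.iarr` means mebibytes (the usual convention for `du -h`, but an assumption here):

```python
import numpy as np

shape = (3, 720, 721, 1440)                 # shape printed for precip-3m.iarr
itemsize = np.dtype(np.float32).itemsize    # bytes per element

# Uncompressed size of the array in bytes.
nbytes = int(np.prod(shape, dtype=np.int64)) * itemsize
print(f"uncompressed: {nbytes / 1e9:.2f} GB")

# Assumption: du's "788M" is 788 MiB.
file_bytes = 788 * 2**20
ratio = nbytes / file_bytes
print(f"uncompressed size / file size: {ratio:.1f}")
```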

Now, let's start with a first attempt to optimize the speed of the reduction by favoring it when creating another container:

In [11]:
%%time
ia.set_config(favor=ia.Favors.SPEED)
new_precip = ia_precip.copy()

CPU times: user 96.3 ms, sys: 334 ms, total: 431 ms
Wall time: 390 ms


And let's try the reduction with the new array:

In [12]:
%%time
reduc1 = ia.mean(new_precip, axis=(3, 2, 0)).data

CPU times: user 8.18 s, sys: 203 ms, total: 8.38 s
Wall time: 663 ms


Well, we did not get much improvement here, but in general you should always try the `favor=ia.Favors.SPEED` setting when looking for speed.

Another path for improvement is to fine tune the chunk and block shapes. Let's see the current chunk and block shapes:

In [13]:
print(ia_precip.chunks)
print(ia_precip.blocks)

(1, 128, 128, 256)
(1, 16, 32, 64)


After several iterations trying different chunk shapes and block shapes, one can reach a sort of optimal configuration:

In [14]:
%%time
with ia.config(chunks=(1, 360, 128, 1440), blocks=(1, 8, 8, 720)):
    new_precip = ia_precip.copy()


CPU times: user 13.7 s, sys: 4.41 s, total: 18.1 s
Wall time: 8.57 s


In [15]:
%%time
reduc1 = ia.mean(new_precip, axis=(3, 2, 0)).data

CPU times: user 5.22 s, sys: 23.9 ms, total: 5.25 s
Wall time: 326 ms


So, this new time is quite good (somewhat faster than NumPy, actually).  That means that, with a bit of manual tuning, you can end up with quite fast computations with ironArray.  And, most importantly, using (potentially much) less memory.
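
To get a feeling for what the two configurations mean, the following sketch counts chunks and blocks for the automatic and the hand-tuned shapes shown above.  The helper `partition_summary` is hypothetical (it is not an ironArray function), and it assumes that partial chunks at the edges count as whole chunks:

```python
import math

shape = (3, 720, 721, 1440)
itemsize = 4  # bytes per float32 element

def partition_summary(shape, chunks, blocks, itemsize):
    """Return (chunks in the array, blocks per chunk, bytes per uncompressed chunk)."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    blocks_per_chunk = math.prod(math.ceil(c / b) for c, b in zip(chunks, blocks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, blocks_per_chunk, chunk_bytes

default = partition_summary(shape, (1, 128, 128, 256), (1, 16, 32, 64), itemsize)
tuned = partition_summary(shape, (1, 360, 128, 1440), (1, 8, 8, 720), itemsize)

print("default:", default)
print("tuned:  ", tuned)
```

The tuned chunks span the whole last dimension (1440), so a reduction that starts along that axis never has to cross a chunk boundary.  That is plausibly part of why this layout is faster here, although the best shapes depend on the machine and on the reduction being computed.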

## Expressions with on-disk operands

So far we have been working with in-memory operands, but ironArray can seamlessly work with on-disk arrays too.  Let's save the previous array:

In [16]:
%%time
ia.save("precip-3m-optimal.iarr", new_precip)

CPU times: user 1.54 ms, sys: 0 ns, total: 1.54 ms
Wall time: 1.47 ms


and let's reopen it in 'lazy' mode and do the reduction from disk:

In [17]:
new_precip_opt = ia.open("precip-3m-optimal.iarr")
new_precip_opt

<IArray (3, 720, 721, 1440) np.float32>

In [18]:
%%time
reduc1 = ia.mean(new_precip_opt, axis=(3, 2, 0)).data

CPU times: user 5.68 s, sys: 994 ms, total: 6.67 s
Wall time: 806 ms


So, although this on-disk time is considerably slower than the in-memory one, it is still not bad at all.  Of course, the filesystem cache is probably saving quite a bit of time, and modern SSDs can also improve speeds quite significantly.  This is interesting because it means that out-of-core operations can consume far less memory while still keeping reasonable times.
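
The out-of-core idea itself does not depend on ironArray.  As a rough illustration using a different technique, the sketch below uses `np.memmap` on a small temporary file (no compression, no chunking) and accumulates the reduction one slab at a time, so that only one slab needs to be resident in memory.  The shape and file name are made up for the example.

```python
import os
import tempfile
import numpy as np

shape = (3, 16, 17, 32)  # made-up small shape with the same number of dimensions
path = os.path.join(tempfile.mkdtemp(), "demo.dat")

# Write some random data to disk through a memory map.
rng = np.random.default_rng(0)
on_disk = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
on_disk[:] = rng.random(shape, dtype=np.float32)
on_disk.flush()
del on_disk

# Reopen read-only; the OS pages data in as slices are touched.
lazy = np.memmap(path, dtype=np.float32, mode="r", shape=shape)

# Accumulate the sum over axes (3, 2, 0), one slab along axis 0 at a time.
total = np.zeros(shape[1], dtype=np.float64)
for slab in lazy:  # each slab has shape (16, 17, 32)
    total += slab.sum(axis=(2, 1), dtype=np.float64)
out_of_core_mean = total / (shape[0] * shape[2] * shape[3])

# Reference: load everything and reduce in one go.
reference = np.mean(np.asarray(lazy), axis=(3, 2, 0), dtype=np.float64)
np.testing.assert_allclose(out_of_core_mean, reference)
```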



