# Reductions

ironArray supports a broad range of reductions, such as `sum`, `min`, `max`, `mean` and others, and they can operate over any dimension or group of dimensions.  One interesting aspect is that the implementation leverages the multi-threading capabilities of ironArray, so reductions can be pretty fast (although sometimes they need some help from the user).

In order to exercise some of this functionality, let's use precipitation data covering a period of 3 months.  If the data has not been downloaded yet, fetching it should not take more than a couple of minutes.  Let's go:

In [10]:
import numpy as np
import iarray as ia

In [8]:
%run fetch_data.py

Dataset precip-3m.iarr is already here!


The whole dataset is now stored in a single file of about 500 MB, which is about 20x smaller than the original dataset thanks to compression.  That's a big win!  In addition, there is an assortment of other, smaller files used in the tutorials.  For comparison purposes, you will also find the datasets in Zarr format, another popular array format.



In [6]:
!du -sh precip*.iarr
!du -sh precip*.zarr

1.1G	precip-3m-optimal.iarr
512M	precip-3m.iarr
176M	precip1.iarr
176M	precip2.iarr
224M	precip3.iarr
474M	precip-3m.zarr
162M	precip1.zarr
158M	precip2.zarr
157M	precip3.zarr


Ok.  Now, let's import this data into ironArray before proceeding with reductions:

In [7]:
%%time

ia_precip = ia.load("precip-3m.iarr")
print(ia_precip)
print("cratio: ", round(ia_precip.cratio, 2))

<IArray (3, 720, 721, 1440) np.float32>
cratio:  20.9
CPU times: user 124 ms, sys: 202 ms, total: 326 ms
Wall time: 325 ms
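
As a rough sanity check on these figures, the uncompressed size can be derived from the shape and the 4-byte `np.float32` item size alone. The sketch below uses only the values printed above; the variable names are illustrative and the compressed size is an estimate from the rounded ratio, not a measurement.

```python
from math import prod

shape = (3, 720, 721, 1440)   # shape printed for ia_precip
itemsize = 4                  # bytes per np.float32 element
cratio = 20.9                 # compression ratio printed above (rounded)

n_elements = prod(shape)
nbytes = n_elements * itemsize        # uncompressed size in bytes
compressed_bytes = nbytes / cratio    # estimate of the compressed size

print(f"elements:              {n_elements:,}")
print(f"uncompressed:          {nbytes / 1e9:.2f} GB")
print(f"compressed (estimate): {compressed_bytes / 1e9:.2f} GB")
```

This gives about 9 GB of uncompressed data and an estimated compressed size of roughly 0.43 GB.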


Ok, so ironArray achieves a compression ratio of around 20x, which is a big win in terms of memory consumption.  Now, let's have a look at how reduction works:

In [9]:
%%time
reduc0 = ia.mean(ia_precip, axis=(0, 2, 3)).data

CPU times: user 5min 43s, sys: 3.42 s, total: 5min 47s
Wall time: 42.8 s


Ok, so that's pretty slow.  Now, it is time to remember that ironArray uses chunked storage, even when it holds data in-memory.  In this case, we have been traversing the array in a very inefficient way.  As a rule of thumb, with chunked storage it is better to start reducing along the largest dimension, and we did the opposite.  With this in mind, let's try with a more reasonable order:

In [10]:
%%time
reduc0 = ia.mean(ia_precip, axis=(3, 2, 0)).data

CPU times: user 12 s, sys: 614 ms, total: 12.6 s
Wall time: 1.68 s


Ok, that's much better.  This time is pretty competitive.  Now, let's compare exactly the same computation with NumPy.  Let's first get the NumPy array from the ironArray one:

In [11]:
%%time
precip = ia_precip.data

CPU times: user 16 s, sys: 2.19 s, total: 18.2 s
Wall time: 7.2 s


and now the actual reduction:

In [12]:
%%time
reduc1 = np.mean(precip, axis=(3, 2, 0))

CPU times: user 809 ms, sys: 929 ms, total: 1.74 s
Wall time: 1.78 s


In [13]:
np.testing.assert_almost_equal(reduc0, reduc1)
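
Going back to the two axis orders tried earlier, one way to see why the order can matter is to count how many elements each intermediate result holds. The sketch below assumes that the axes are reduced one after another in the order given; that is an assumption made for illustration, not a description of ironArray internals, and `intermediate_sizes` is a hypothetical helper.

```python
from math import prod

def intermediate_sizes(shape, axes):
    """Element counts of the intermediate results when reducing one axis at a time.

    Assumes (for illustration only) that the axes are reduced sequentially,
    in the order given; `axes` refers to positions in the original shape.
    """
    remaining = dict(enumerate(shape))  # original axis index -> length
    sizes = []
    for axis in axes:
        del remaining[axis]             # this axis disappears after its reduction
        sizes.append(prod(remaining.values()))
    return sizes

shape = (3, 720, 721, 1440)
slow_order = intermediate_sizes(shape, (0, 2, 3))
fast_order = intermediate_sizes(shape, (3, 2, 0))

print("axis=(0, 2, 3):", slow_order)
print("axis=(3, 2, 0):", fast_order)
```

Under this assumption, reducing axis 0 first leaves an intermediate of about 747 million elements, whereas reducing axis 3 first leaves about 1.6 million, so the later steps have far less data to go through.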

If you need to compute this sort of reduction many times, you may want to fine-tune the parameters of your ironArray array.  If so, just keep reading.  But before proceeding further, let's save our data for later reuse:

## Optimization Tips

As we know, most of ironArray's optimizations stem from its two levels of partitioning (chunks and blocks).  So far, we have been using automatic values for chunk and block shapes (based on the CPU's cache sizes).  But for maximum speed, there is no substitute for fine-tuning chunk and block shapes manually.

Let's start by loading the array that we previously stored on-disk:

In [14]:
%%time
import iarray as ia
ia_precip = ia.open("precip-3m.iarr")

CPU times: user 1.71 ms, sys: 38.6 ms, total: 40.3 ms
Wall time: 38 ms


So, not much time for loading a 9 GB array from disk, huh?  Well, the thing is that `open()` by default only loads data when it is needed (for a full in-memory load, see the `load()` function).  So, only a tiny portion of the file (the metadata) is read in order to figure out how to access the data.

Now, let's start with a first attempt to optimize the speed of the reduction by favoring speed when creating another container:

In [16]:
%%time
with ia.config(favor=ia.Favors.SPEED):
    new_precip = ia_precip.copy()

CPU times: user 19.8 s, sys: 3.46 s, total: 23.3 s
Wall time: 17.9 s


And let's try the reduction with the new array:

In [17]:
%%time
reduc1 = ia.mean(new_precip, axis=(3, 2, 0)).data

CPU times: user 9.14 s, sys: 187 ms, total: 9.32 s
Wall time: 1.21 s


Well, we got a good improvement (1.21 s versus 1.68 s), which is not bad for a first attempt.

Another path for improvement is to fine-tune the chunk and block shapes. Let's see the current chunk and block shapes:

In [23]:
print(ia_precip.chunks)
print(ia_precip.blocks)

(1, 128, 128, 256)
(1, 32, 32, 128)


After several iterations trying different chunk shapes and block shapes, one can reach a sort of optimal configuration:

In [18]:
%%time
with ia.config(chunks=(1, 360, 128, 1440), blocks=(1, 8, 8, 720), favor=ia.Favors.SPEED):
    new_precip = ia_precip.copy()


CPU times: user 20.6 s, sys: 2.78 s, total: 23.4 s
Wall time: 15.2 s


In [19]:
%%time
reduc1 = ia.mean(new_precip, axis=(3, 2, 0)).data

CPU times: user 7.29 s, sys: 45.2 ms, total: 7.34 s
Wall time: 835 ms


So, this new time is quite competitive with NumPy (about 2x faster in this case: 835 ms versus 1.78 s).  That means that, with a bit of manual tuning, you can end up with very fast solutions using ironArray arrays.  **And**, especially, using (potentially much) less memory.
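
To get a feel for what the two partitionings look like, the following sketch counts chunks and blocks and computes their uncompressed sizes from the shapes shown above. It is plain arithmetic on the shapes, with edge chunks counted as whole chunks; `partition_stats` is a hypothetical helper, not part of the ironArray API.

```python
from math import ceil, prod

def partition_stats(shape, chunks, blocks, itemsize=4):
    """Summarize a two-level partition (counts and uncompressed sizes in bytes).

    Edge chunks/blocks that stick out of the array are counted as whole ones.
    """
    chunk_grid = [ceil(s / c) for s, c in zip(shape, chunks)]
    block_grid = [ceil(c / b) for c, b in zip(chunks, blocks)]
    return {
        "chunk_grid": chunk_grid,
        "n_chunks": prod(chunk_grid),
        "blocks_per_chunk": prod(block_grid),
        "chunk_bytes": prod(chunks) * itemsize,
        "block_bytes": prod(blocks) * itemsize,
    }

shape = (3, 720, 721, 1440)
automatic = partition_stats(shape, chunks=(1, 128, 128, 256), blocks=(1, 32, 32, 128))
tuned = partition_stats(shape, chunks=(1, 360, 128, 1440), blocks=(1, 8, 8, 720))

for name, stats in (("automatic", automatic), ("tuned", tuned)):
    print(name, stats)
```

With these numbers, the automatic partition has 648 chunks of 16 MiB each, while the tuned one has 36 chunks of about 253 MiB, each spanning the whole last axis (the first one to be reduced).  This does not prove why the tuned layout is faster, but it shows how different the two layouts are.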

## Expressions with on-disk operands

So far we have been reducing in-memory arrays, but ironArray can seamlessly work with on-disk arrays too.  Let's save the previous array:

In [20]:
%%time
ia.save("precip-3m-optimal.iarr", new_precip)

CPU times: user 1.12 ms, sys: 132 µs, total: 1.26 ms
Wall time: 1.18 ms


and let's re-open it in 'lazy' mode and do the reduction from disk:

In [21]:
new_precip_opt = ia.open("precip-3m-optimal.iarr")
new_precip_opt

<IArray (3, 720, 721, 1440) np.float32>

In [22]:
%%time
reduc1 = ia.mean(new_precip_opt, axis=(3, 2, 0)).data

CPU times: user 8.68 s, sys: 3.77 s, total: 12.4 s
Wall time: 2.79 s


So, although this on-disk time is considerably slower than the in-memory one, it is still not bad at all.  Of course, the filesystem cache is probably saving quite a bit of time here, although modern SSDs also help significantly with read speeds.  This is interesting because it means that out-of-core operations consume far less memory while still achieving reasonable times.
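
To put all the measurements side by side, the sketch below takes the wall times reported in the cells above and expresses each one relative to the NumPy reduction.  The labels are only illustrative, the numbers come from single runs on one machine, and the NumPy figure does not include the 7.2 s spent decompressing the array into memory first.

```python
# Wall times in seconds, copied from the cell outputs above (single runs).
wall_times = {
    "ironArray, axis=(0, 2, 3)": 42.8,
    "ironArray, axis=(3, 2, 0)": 1.68,
    "ironArray, favor=SPEED": 1.21,
    "ironArray, tuned chunks/blocks": 0.835,
    "ironArray, tuned, on-disk": 2.79,
    "NumPy, axis=(3, 2, 0)": 1.78,
}

baseline = wall_times["NumPy, axis=(3, 2, 0)"]
# Values above 1 mean faster than the NumPy reduction, below 1 mean slower.
relative = {name: baseline / seconds for name, seconds in wall_times.items()}

for name, speedup in relative.items():
    print(f"{name:32s} {speedup:5.2f}x")
```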

