# Reductions

ironArray supports a broad range of reduction facilities, such as `sum`, `min`, `max`, `mean` and others, and they work on any dimension or group of dimensions.  One interesting aspect is that the implementation leverages the multi-threading capabilities of ironArray, so reductions can be pretty fast (although sometimes they need some help from the user).

In order to exercise some of this functionality, let's download some open data from the network.  In this case, we are interested in precipitation data covering a period of 3 months.  With a decent connection, this should not take more than a minute to download.  Let's go:

In [1]:
import zarr
import xarray as xr
import numpy as np
import s3fs
import iarray as ia

In [2]:
def open_zarr(year, month, datestart, dateend):
    fs = s3fs.S3FileSystem(anon=True)

    datestring = 'era5-pds/zarr/{year}/{month:02d}/data/'.format(year=year, month=month)

    precip_zarr = xr.open_dataset(s3fs.S3Map(datestring + 'precipitation_amount_1hour_Accumulation.zarr/',
                                             s3=fs),
                                 engine="zarr")
    precip_zarr = precip_zarr.sel(time1=slice(np.datetime64(datestart), np.datetime64(dateend)))

    return precip_zarr.precipitation_amount_1hour_Accumulation

In [3]:
%%time

precip_m1 = open_zarr(1987, 10, '1987-10-01', '1987-10-30 23:59')
precip_m2 = open_zarr(1987, 11, '1987-11-01', '1987-11-30 23:59')
precip_m3 = open_zarr(1987, 12, '1987-12-01', '1987-12-30 23:59')

CPU times: user 1.02 s, sys: 75.7 ms, total: 1.09 s
Wall time: 22.1 s


Let's see what one of these arrays of monthly precipitation looks like:

In [4]:
repr(precip_m1)

"<xarray.DataArray 'precipitation_amount_1hour_Accumulation' (time1: 720, lat: 721, lon: 1440)>\n[747532800 values with dtype=float32]\nCoordinates:\n  * lat      (lat) float32 90.0 89.75 89.5 89.25 ... -89.25 -89.5 -89.75 -90.0\n  * lon      (lon) float32 0.0 0.25 0.5 0.75 1.0 ... 359.0 359.2 359.5 359.8\n  * time1    (time1) datetime64[ns] 1987-10-01 ... 1987-10-30T23:00:00\nAttributes:\n    long_name:       Total precipitation\n    nameCDM:         Total_precipitation_1hour_Accumulation\n    nameECMWF:       Total precipitation\n    product_type:    forecast\n    shortNameECMWF:  tp\n    standard_name:   precipitation_amount\n    units:           m"

Ok, so that's about 3 GB per month, so the full dataset is about 9 GB.
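The size estimate can be checked with a little arithmetic.  This is just a sketch based on the shape and dtype shown in the `repr` output; the variable names are made up for illustration:

```python
# Back-of-the-envelope size of the uncompressed data (float32 = 4 bytes).
hours = 30 * 24          # 30 days of hourly data -> 720 time steps
lat, lon = 721, 1440     # grid size reported by xarray
itemsize = 4             # bytes per float32 value

values_per_month = hours * lat * lon
bytes_per_month = values_per_month * itemsize
bytes_total = 3 * bytes_per_month

print(values_per_month)                       # 747532800, as in the repr above
print(round(bytes_per_month / 1e9, 2), "GB")  # 2.99 GB
print(round(bytes_total / 1e9, 2), "GB")      # 8.97 GB
```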

Now, let's build a NumPy array out of these (be careful if you plan to run this on your local machine, as you will need at least 16 GB of free memory):

In [5]:
%%time

precip = np.stack((precip_m1.values, precip_m2.values, precip_m3.values))
precip.shape

CPU times: user 43.1 s, sys: 17.4 s, total: 1min
Wall time: 2min 32s


(3, 720, 721, 1440)
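To see where the leading dimension of size 3 comes from, here is a small sketch with tiny stand-in arrays (the names `m1`, `m2` and `m3` are hypothetical).  `np.stack` joins the inputs along a new first axis and copies them into a fresh array:

```python
import numpy as np

# Three small stand-ins for the monthly arrays, each shaped (time, lat, lon).
m1 = np.zeros((2, 3, 4), dtype=np.float32)
m2 = np.ones((2, 3, 4), dtype=np.float32)
m3 = np.full((2, 3, 4), 2, dtype=np.float32)

stacked = np.stack((m1, m2, m3))         # the new axis 0 indexes the month
print(stacked.shape)                     # (3, 2, 3, 4)
print(stacked.nbytes == 3 * m1.nbytes)   # True: the result holds a copy of every input
```

Because the result is a copy, the inputs and the stacked array coexist in memory for a while, which is worth keeping in mind with arrays of this size.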

Now, let's suppose that we want to compute the mean precipitation for each hour over our period of time.  As the time axis is axis 1 in our new `precip` array, we want to reduce over all axes except that one.  That is:

In [6]:
%%time

reduc0 = np.mean(precip, axis=(0, 2, 3))

CPU times: user 2.18 s, sys: 8.14 s, total: 10.3 s
Wall time: 12 s
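If the `axis` tuple looks opaque, a tiny stand-in array with the same axis layout may help.  This is only an illustrative sketch (the array `small` is made up); it shows that only the time axis survives and that the result matches a per-time-step mean:

```python
import numpy as np

# Tiny stand-in with the same axis layout: (month, time, lat, lon).
small = np.arange(2 * 5 * 3 * 4, dtype=np.float32).reshape(2, 5, 3, 4)

# Reduce over months, lat and lon; only the time axis (axis 1) survives.
per_hour = np.mean(small, axis=(0, 2, 3))
print(per_hour.shape)   # (5,)

# Same result computed explicitly for each time step.
manual = np.array([small[:, t, :, :].mean() for t in range(small.shape[1])])
print(np.allclose(per_hour, manual))   # True
```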


Ok.  Now, let's import this data into ironArray before proceeding with reductions:

In [7]:
%%time

ia_precip = ia.numpy2iarray(precip)
print(ia_precip)
print("cratio: ", round(ia_precip.cratio, 2))

<IArray (3, 720, 721, 1440) np.float32>
cratio:  11.72
CPU times: user 1min 7s, sys: 9.67 s, total: 1min 16s
Wall time: 26.8 s
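To put the `cratio` value printed above in perspective, here is a rough estimate of the in-memory footprint.  It assumes that `cratio` is the usual ratio of uncompressed size to compressed size:

```python
# Assumption: compression ratio = uncompressed size / compressed size.
uncompressed_bytes = 3 * 720 * 721 * 1440 * 4   # shape (3, 720, 721, 1440), float32
cratio = 11.72                                  # value printed by the cell above

compressed_bytes = uncompressed_bytes / cratio
print(round(uncompressed_bytes / 1e6), "MB uncompressed")   # 8970 MB uncompressed
print(round(compressed_bytes / 1e6), "MB compressed")       # 765 MB compressed
```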


Ok, so ironArray achieves a compression ratio of almost 12x, which is a big win in terms of memory consumption.  Now, let's have a look at how reduction works:

In [8]:
%%time
reduc1 = ia.mean(ia_precip, axis=(0, 2, 3)).data

CPU times: user 2min 48s, sys: 1.59 s, total: 2min 49s
Wall time: 28.9 s


Ok, so that's pretty slow.  Now, it is time to remember that ironArray uses chunked storage, even when it holds data in-memory.  In this case, we have been traversing the array in a very inefficient way.  In general, with chunked storage, it is better to start reducing along the largest dimension, and we took the inverse order.  With this in mind, let's try a more reasonable order:

In [9]:
%%time
reduc1 = ia.mean(ia_precip, axis=(3, 2, 0)).data

CPU times: user 7.74 s, sys: 63.2 ms, total: 7.8 s
Wall time: 1.34 s
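One way to build intuition for why the axis order matters is to count how much data is left after each step.  The sketch below is a simplified model, not ironArray's actual implementation: it assumes the reduction proceeds one axis at a time in the given order, and the helper `intermediate_sizes` is hypothetical:

```python
from math import prod

def intermediate_sizes(shape, order):
    """Element counts left after reducing one axis at a time, in `order`."""
    remaining = list(shape)
    sizes = []
    for axis in order:
        remaining[axis] = 1          # this axis has been collapsed
        sizes.append(prod(remaining))
    return sizes

shape = (3, 720, 721, 1440)
print(intermediate_sizes(shape, (0, 2, 3)))   # [747532800, 1036800, 720]
print(intermediate_sizes(shape, (3, 2, 0)))   # [1557360, 2160, 720]
```

Under this model, starting with the largest axis shrinks the first intermediate result by a factor of 1440 instead of 3, so the later steps have far less data to traverse.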


Ok, that's much better.  This time is pretty competitive.  Now, let's compare exactly the same computation with NumPy:

In [10]:
%%time

reduc0 = np.mean(precip, axis=(3, 2, 0))

CPU times: user 2.17 s, sys: 8.03 s, total: 10.2 s
Wall time: 11.7 s


In [11]:
np.testing.assert_almost_equal(reduc0, reduc1)

Ok, so the mean calculation has worked correctly.
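For reference, `np.testing.assert_almost_equal` uses an absolute tolerance: with its default `decimal=7`, it requires differences below `1.5e-7`.  A minimal sketch with made-up values:

```python
import numpy as np

expected = np.array([1.0, 2.0, 3.0])
close = expected + 1e-9      # differences far below the default tolerance
off = expected + 1e-3        # differences far above it

np.testing.assert_almost_equal(expected, close)   # passes silently

try:
    np.testing.assert_almost_equal(expected, off)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

print(mismatch_detected)   # True
```

When the values being compared are themselves very small, a relative check such as `np.testing.assert_allclose` with an explicit `rtol` may be a stricter alternative.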

If you need to compute these sorts of reductions many times, you may want to fine-tune your ironArray array parameters.  If so, just keep reading.  But before proceeding further, let's save our data for later reuse:

In [12]:
%%time

ia.save("precip1.iarr", ia_precip[0])
ia.save("precip2.iarr", ia_precip[1])
ia.save("precip3.iarr", ia_precip[2])
ia.save("precip-3m.iarr", ia_precip)
%ls -lh precip*.iarr

-rw-r--r--  1 aleix11alcacer  staff   660M Apr 16 08:40 precip-3m.iarr
-rw-r--r--  1 aleix11alcacer  staff   219M Apr 16 08:40 precip1.iarr
-rw-r--r--  1 aleix11alcacer  staff   285M Apr 16 08:40 precip2.iarr
-rw-r--r--  1 aleix11alcacer  staff   283M Apr 16 08:40 precip3.iarr
-rw-r--r--  1 aleix11alcacer  staff   219M Apr 16 08:17 precip_slicing.iarr
CPU times: user 49.9 s, sys: 11.5 s, total: 1min 1s
Wall time: 28.1 s


The dataset is now stored in a single file of about 660 MB, which is roughly 13x smaller than the original 9 GB dataset thanks to compression.  That's a big win!

Finally, let's save the data in zarr format for other tutorials too:

In [13]:
%%time

zarr.save("precip1.zarr", precip_m1.values)
zarr.save("precip2.zarr", precip_m2.values)
zarr.save("precip3.zarr", precip_m3.values)
zarr.save("precip-3m.zarr", precip)
!du -sh precip*.zarr

1.0G	precip-3m.zarr
363M	precip1.zarr
357M	precip2.zarr
355M	precip3.zarr
358M	precip_slicing.zarr
CPU times: user 26.2 s, sys: 26.8 s, total: 53 s
Wall time: 46.8 s


## Optimization Tips

As we know, most of ironArray's optimization stems from its two levels of partitioning.  Previously, we have been using automatic values for chunkshape and blockshape (based on the CPU's cache sizes).  But for maximum speed, there is no replacement for fine-tuning chunk and block shapes manually.

Let's start by loading the array that we previously stored on-disk:

In [14]:
%%time
import iarray as ia
ia_precip = ia.open("precip-3m.iarr")

CPU times: user 2.75 ms, sys: 173 ms, total: 176 ms
Wall time: 174 ms


So, not too much time for loading a 9 GB array from disk, huh?  Well, the thing is that `open()` by default only loads data when it is needed (for a full in-memory load, see the `load()` function).  So, only a tiny portion of the file (the metadata) is read in order to figure out how to access the data.
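As a loose analogy only (this is NumPy memory mapping, not how ironArray implements `open()`), the same "read on demand" idea can be sketched with `np.load` and `mmap_mode`.  The file name `demo.npy` is hypothetical:

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.npy")   # hypothetical file name
    np.save(path, np.arange(1_000_000, dtype=np.float32))

    lazy = np.load(path, mmap_mode="r")       # maps the file; data is paged in on access
    is_memmap = isinstance(lazy, np.memmap)
    first_mean = float(lazy[:10].mean())      # touches only the first ten values

    del lazy                                  # release the mapping before cleanup

print(is_memmap)    # True
print(first_mean)   # 4.5
```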

Now, let's see the current chunk and block shapes:

In [15]:
print(ia_precip.chunkshape)
print(ia_precip.blockshape)

(1, 256, 256, 512)
(1, 32, 32, 64)
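To get a feel for what these shapes mean, the sketch below counts partitions and their uncompressed sizes.  It assumes that chunks tile the array on a regular grid (with partially filled chunks at the edges) and that blocks tile each chunk in the same way:

```python
from math import ceil, prod

shape = (3, 720, 721, 1440)
chunkshape = (1, 256, 256, 512)
blockshape = (1, 32, 32, 64)
itemsize = 4  # bytes per float32 value

# Chunks needed to cover the array (edge chunks are only partially filled).
nchunks = prod(ceil(s / c) for s, c in zip(shape, chunkshape))
# Blocks inside one full chunk.
blocks_per_chunk = prod(ceil(c / b) for c, b in zip(chunkshape, blockshape))

chunk_mib = prod(chunkshape) * itemsize // 2**20
block_kib = prod(blockshape) * itemsize // 2**10

print(nchunks)                       # 81
print(blocks_per_chunk)              # 512
print(chunk_mib, "MiB per chunk")    # 128 MiB per chunk
print(block_kib, "KiB per block")    # 256 KiB per block
```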


Now, let's make a first attempt at optimizing the reduction by favoring speed when creating another container:

In [16]:
%%time
with ia.config(favor=ia.Favors.SPEED):
    new_precip = ia_precip.copy()

CPU times: user 28.8 s, sys: 13.3 s, total: 42.1 s
Wall time: 33.8 s


And let's try the reduction with the new array:

In [17]:
%%time
reduc1 = ia.mean(new_precip, axis=(3, 2, 0)).data

CPU times: user 7.59 s, sys: 2.6 s, total: 10.2 s
Wall time: 1.94 s


Well, we have got some good improvement, which is not bad for a first attempt.  Another possibility is to fine tune the chunk and block shapes. After several iterations, one can reach a sort of optimal configuration:

In [18]:
%%time
with ia.config(chunkshape=(1, 360, 128, 1440), blockshape=(1, 8, 8, 720), favor=ia.Favors.SPEED):
    new_precip = ia_precip.copy()


CPU times: user 35.3 s, sys: 8.74 s, total: 44.1 s
Wall time: 37.4 s


In [19]:
%%time
reduc1 = ia.mean(new_precip, axis=(3, 2, 0)).data

CPU times: user 7.76 s, sys: 1.75 s, total: 9.51 s
Wall time: 1.72 s


So, this new time is very competitive with NumPy (almost 7x faster in this case: 1.72 s versus 11.7 s).  That means that, with a bit of manual tuning, you can end up with very fast solutions with ironArray arrays.  **And**, especially, using (potentially much) less memory.
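One visible difference between the automatic and the hand-tuned chunk shapes is how many chunks are needed along each axis.  The helper `chunks_per_axis` below is hypothetical and assumes a regular chunk grid:

```python
from math import ceil

def chunks_per_axis(shape, chunkshape):
    """Number of chunks needed along each axis (edge chunks may be partial)."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunkshape))

shape = (3, 720, 721, 1440)
default = chunks_per_axis(shape, (1, 256, 256, 512))
tuned = chunks_per_axis(shape, (1, 360, 128, 1440))

print(default)   # (3, 3, 3, 3)
print(tuned)     # (3, 2, 6, 1)
```

With the tuned shape, every chunk spans the whole last axis, so reducing along axis 3 never has to combine partial results coming from different chunks.  That may be part of the reason why this configuration behaves well here.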

## Expressions with on-disk operands

So far we have been playing with reductions on in-memory operands, but ironArray can seamlessly work with on-disk arrays too.  Let's save the previous array:

In [20]:
%%time
ia.save("precip-3m-optimal.iarr", new_precip)

CPU times: user 1min 25s, sys: 19 s, total: 1min 44s
Wall time: 34.8 s


Now let's re-open it in 'lazy' mode and do the reduction from disk:

In [21]:
new_precip_opt = ia.open("precip-3m-optimal.iarr")
new_precip_opt

<IArray (3, 720, 721, 1440) np.float32>

In [22]:
%%time
reduc1 = ia.mean(new_precip_opt, axis=(3, 2, 0)).data

CPU times: user 7.54 s, sys: 2.49 s, total: 10 s
Wall time: 2.72 s


So, although this on-disk time is considerably slower than the in-memory one, it is still not bad at all.  Of course, the filesystem cache is probably saving quite a bit of time here, but modern SSDs can also improve speeds quite significantly.  This is interesting because it means that out-of-core operations can consume far less memory while still keeping reasonable times.
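The general idea behind an out-of-core reduction can be sketched with plain NumPy: process one slab at a time and keep only small running totals.  This is a generic streaming technique shown under simplifying assumptions, not ironArray's internal algorithm, and a small in-memory array stands in for the on-disk data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((3, 6, 5, 4), dtype=np.float32)   # (month, time, lat, lon) stand-in

total = np.zeros(data.shape[1], dtype=np.float64)   # running sum per time step
count = 0                                           # values accumulated per time step

for month in range(data.shape[0]):
    slab = data[month]                  # only this slab needs to be in memory
    total += slab.sum(axis=(1, 2), dtype=np.float64)
    count += slab.shape[1] * slab.shape[2]

streamed = total / count
print(np.allclose(streamed, data.mean(axis=(0, 2, 3))))   # True
```

The memory needed at any time is one slab plus the accumulators, regardless of how many slabs the full dataset has.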

