# Reductions

ironArray supports a broad range of reductions, like `sum`, `min`, `max`, `mean` and others.  They work on any single dimension or group of dimensions.  One interesting aspect is that the implementation leverages the multi-threading capabilities of ironArray, so reductions can be pretty fast (although sometimes they need some help from the user).

To exercise some of this functionality, let's download some open data from the network.  In this case, we are interested in precipitation data covering a period of 3 months.  With a decent connection, the download should not take more than a minute.  Let's go:

In [6]:
import zarr
import xarray as xr
import numpy as np
import s3fs
import iarray as ia

In [10]:
def open_zarr(year, month, datestart, dateend):
    fs = s3fs.S3FileSystem(anon=True)

    datestring = 'era5-pds/zarr/{year}/{month:02d}/data/'.format(year=year, month=month)

    precip_zarr = xr.open_dataset(s3fs.S3Map(datestring + 'precipitation_amount_1hour_Accumulation.zarr/',
                                             s3=fs),
                                 engine="zarr")
    precip_zarr = precip_zarr.sel(time1=slice(np.datetime64(datestart), np.datetime64(dateend)))

    return precip_zarr.precipitation_amount_1hour_Accumulation

In [11]:
%%time

precip_m1 = open_zarr(1987, 10, '1987-10-01', '1987-10-30 23:59')
precip_m2 = open_zarr(1987, 11, '1987-11-01', '1987-11-30 23:59')
precip_m3 = open_zarr(1987, 12, '1987-12-01', '1987-12-30 23:59')

CPU times: user 381 ms, sys: 16 ms, total: 397 ms
Wall time: 16.3 s


Let's see what one of these arrays of monthly precipitation looks like:

In [15]:
repr(precip_m1)

"<xarray.DataArray 'precipitation_amount_1hour_Accumulation' (time1: 720, lat: 721, lon: 1440)>\n[747532800 values with dtype=float32]\nCoordinates:\n  * lat      (lat) float32 90.0 89.75 89.5 89.25 ... -89.25 -89.5 -89.75 -90.0\n  * lon      (lon) float32 0.0 0.25 0.5 0.75 1.0 ... 359.0 359.2 359.5 359.8\n  * time1    (time1) datetime64[ns] 1987-10-01 ... 1987-10-30T23:00:00\nAttributes:\n    long_name:       Total precipitation\n    nameCDM:         Total_precipitation_1hour_Accumulation\n    nameECMWF:       Total precipitation\n    product_type:    forecast\n    shortNameECMWF:  tp\n    standard_name:   precipitation_amount\n    units:           m"

Ok, so that's about 3 GB per month, so the full dataset is about 9 GB.
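The size estimate follows directly from the shape and dtype shown in the repr above.  A quick check, assuming 4 bytes per `float32` value and decimal gigabytes:

```python
import numpy as np

# Shape and dtype taken from the repr of one monthly array above
shape = (720, 721, 1440)
itemsize = np.dtype(np.float32).itemsize  # 4 bytes per value

n_values = int(np.prod(shape, dtype=np.int64))
month_gb = n_values * itemsize / 1e9   # uncompressed size of one month
total_gb = 3 * month_gb                # three months stacked together

print(n_values)
print(round(month_gb, 2), round(total_gb, 2))
```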

Now, let's build a NumPy array out of these (be careful if you plan to run this on your local machine, as you will need at least 16 GB of free memory):

In [5]:
%%time

precip = np.stack((precip_m1.values, precip_m2.values, precip_m3.values))
precip.shape

CPU times: user 13.4 s, sys: 2.19 s, total: 15.6 s
Wall time: 22.7 s


(3, 720, 721, 1440)

Now, let's suppose that we want to compute the mean precipitation per hour during our period of time.  As the time axis is axis 1 in our new `precip` array, we want to reduce over all axes except that one.  That is:

In [6]:
%%time

reduc0 = np.mean(precip, axis=(0, 2, 3))

CPU times: user 788 ms, sys: 0 ns, total: 788 ms
Wall time: 788 ms
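To make the axis selection concrete, here is a minimal sketch on a tiny random array with the same axis layout as `precip` (month, time, lat, lon).  The array `small` and its shape are made up for illustration:

```python
import numpy as np

# Small stand-in with the same axis layout as `precip`: (month, time, lat, lon)
rng = np.random.default_rng(0)
small = rng.random((3, 4, 5, 6), dtype=np.float32)

# Reduce over months, lat and lon; axis 1 (time) is the one that is kept
per_hour = np.mean(small, axis=(0, 2, 3))

# The same value computed by hand for the first time step
first = small[:, 0, :, :].mean()

print(per_hour.shape)
```

The result has one value per entry of the time axis, which is why the real reduction above yields an array of length 720.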


Ok.  Now, let's import this data into ironArray before proceeding with reductions:

In [7]:
%%time

ia_precip = ia.numpy2iarray(precip)
print(ia_precip)
print("cratio: ", round(ia_precip.cratio, 2))

<IArray (3, 720, 721, 1440) np.float32>
cratio:  23.46
CPU times: user 20.6 s, sys: 3.41 s, total: 24 s
Wall time: 7.9 s


Ok, so ironArray achieves a compression ratio of more than 20x (23.46 in this run), which is a big win in terms of memory consumption.  Now, let's have a look at how reductions work:

In [8]:
%%time
reduc1 = ia.mean(ia_precip, axis=(0, 2, 3)).data

CPU times: user 4min 52s, sys: 392 ms, total: 4min 53s
Wall time: 14.8 s


Ok, so that's pretty slow.  Now, it is time to remember that ironArray uses chunked storage, even when it holds data in-memory.  In this case, we have been traversing the array in a very inefficient way.  In general, with chunked storage it is better to start reducing along the largest dimension, and we did the opposite: the axes were given as `(0, 2, 3)`, starting with the smallest one.  With this in mind, let's try a more reasonable order:

In [10]:
%%time
reduc1 = ia.mean(ia_precip, axis=(3, 2, 0)).data

CPU times: user 9.83 s, sys: 212 ms, total: 10 s
Wall time: 678 ms


Ok, that's much better.  This time is pretty competitive.  Now, let's compare actual data with NumPy (just in case):

In [11]:
np.testing.assert_almost_equal(reduc0, reduc1)

Ok, so the mean calculation has worked correctly.
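The order in which the axes are reduced changes how the data is traversed, not the result.  The following sketch checks this with plain NumPy on a small random array (it only demonstrates the numerical equivalence and says nothing about how ironArray traverses chunks internally):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((3, 4, 5, 6))  # (month, time, lat, lon), float64

# All three axes reduced in a single call
all_at_once = np.mean(a, axis=(0, 2, 3))

# The same reduction one axis at a time, largest axis first: 3, then 2, then 0
step = np.mean(a, axis=3)      # shape (3, 4, 5)
step = np.mean(step, axis=2)   # shape (3, 4)
step = np.mean(step, axis=0)   # shape (4,)

# A mean of means equals the overall mean here because every slice
# holds the same number of elements
np.testing.assert_allclose(all_at_once, step)
```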

If you need to compute these sorts of reductions many times, you may want to fine-tune your ironArray array parameters.  If so, just keep reading.  But before proceeding further, let's save our data for later reuse:

In [25]:
%%time

ia.save(ia_precip[0].copy(), "precip1.iarr")
ia.save(ia_precip[1].copy(), "precip2.iarr")
ia.save(ia_precip[2].copy(), "precip3.iarr")
ia.save(ia_precip, "precip-3m.iarr")
%ls -lh precip*.iarr


The dataset is now stored in a single file of 440 MB, which is more than 20x smaller than the original dataset thanks to compression.  That's a big win!

Finally, let's save the data in zarr format for other tutorials too:

In [27]:
%%time

zarr.save("precip1.zarr", precip_m1.values)
zarr.save("precip2.zarr", precip_m2.values)
zarr.save("precip3.zarr", precip_m3.values)
!du -sh precip*.zarr

1.1G	precip-3m.zarr
363M	precip1.zarr
357M	precip2.zarr
355M	precip3.zarr
CPU times: user 7.85 s, sys: 795 ms, total: 8.65 s
Wall time: 3.77 s


## Optimization Tips

As we know, most of ironArray's optimization potential stems from its two levels of partitioning (chunks and blocks).  So far, we have been using automatic values for chunkshape and blockshape (based on the CPU's cache sizes).  But for maximum speed, there is no replacement for fine-tuning chunk and block shapes manually.

Let's start by loading the array that we previously stored on-disk:

In [13]:
%%time
import iarray as ia
ia_precip = ia.open("precip-3m.iarr")

CPU times: user 4.22 ms, sys: 13 ms, total: 17.2 ms
Wall time: 12.9 ms


So, not too much time for loading a 9 GB array from disk, huh?  Well, the thing is that `open()` by default only loads data when it is needed (for a full in-memory load, see the `load()` function).  So, only a tiny portion of the file (the metadata) is read in order to figure out how to access the data.
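As a loose analogy only, the sketch below shows the same lazy-loading idea with NumPy memory mapping rather than ironArray: opening the file is cheap, and data is read from disk when a slice is actually used.  The file name `demo.npy` and the array contents are made up for illustration:

```python
import os
import tempfile
import numpy as np

# Write a small array to a temporary .npy file
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.npy")
np.save(path, np.arange(24, dtype=np.float32).reshape(2, 3, 4))

# Open it lazily: only the header is parsed here, no array data is loaded yet
lazy = np.load(path, mmap_mode="r")
is_lazy = isinstance(lazy, np.memmap)

# Data is pulled from disk only for the part that is actually accessed
partial_mean = float(lazy[0].mean())
print(is_lazy, lazy.shape, partial_mean)
```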

Now, let's see the current chunk and block shapes:

In [14]:
print(ia_precip.chunkshape)
print(ia_precip.blockshape)

(1, 128, 128, 512)
(1, 16, 16, 128)
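To get a feel for what these shapes imply, the following sketch counts how many chunks cover the array and how many blocks fit in a chunk.  It assumes that partial partitions at the array edges count as whole ones (hence the ceiling division); `partitions` is a hypothetical helper, not part of the ironArray API:

```python
import math

def partitions(shape, partshape):
    """Number of partitions along each dimension (ceiling division)."""
    return tuple(math.ceil(s / p) for s, p in zip(shape, partshape))

shape = (3, 720, 721, 1440)
chunkshape = (1, 128, 128, 512)
blockshape = (1, 16, 16, 128)

chunks_per_dim = partitions(shape, chunkshape)       # chunks covering the array
blocks_per_dim = partitions(chunkshape, blockshape)  # blocks inside one chunk

n_chunks = math.prod(chunks_per_dim)
n_blocks_per_chunk = math.prod(blocks_per_dim)

print(chunks_per_dim, n_chunks)
print(blocks_per_dim, n_blocks_per_chunk)
```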


Now, let's make a first attempt at finding chunk and block shapes that optimize the reductions.  Initially, one may think that making the last dimension (the largest) as large as possible could be a good decision.  Let's try that by creating another container:

In [15]:
%%time
with ia.config(chunkshape=(1, 64, 64, 1440), blockshape=(1, 16, 16, 360)) as cfg:
    new_precip = ia_precip.copy(cfg=cfg)

CPU times: user 26.6 s, sys: 1.01 s, total: 27.6 s
Wall time: 11 s


And let's try the reduction with the new array:

In [16]:
%%time
reduc1 = ia.mean(new_precip, axis=(3, 2, 0)).data

CPU times: user 8.29 s, sys: 503 ms, total: 8.79 s
Wall time: 992 ms


Well, this first attempt uses somewhat less CPU time than before (8.29 s vs 9.83 s of user time), although the wall time in this run did not improve.  After several iterations, one can reach a sort of optimal configuration:

In [29]:
%%time
with ia.config(chunkshape=(1, 360, 128, 1440), blockshape=(1, 8, 8, 720)) as cfg:
    new_precip = ia_precip.copy(cfg=cfg)


CPU times: user 21.5 s, sys: 2.41 s, total: 23.9 s
Wall time: 10.6 s


In [30]:
%%time
reduc1 = ia.mean(new_precip, axis=(3, 2, 0)).data

CPU times: user 9.11 s, sys: 69.4 ms, total: 9.18 s
Wall time: 503 ms


So, this new time (503 ms) is competitive with NumPy (788 ms); somewhat better, in fact.  That means that, when you have to deal with large arrays, you should not assume that working with NumPy arrays is the fastest thing you can get.  With the right tools, and a bit of manual tuning, you can end up with very fast solutions for this scenario too.
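When iterating on configurations like the ones above, it can help to look at the uncompressed size of a single chunk and a single block.  The sketch below computes both for the three configurations used in this section, assuming 4-byte `float32` values; it makes no claim about why one configuration is faster than another:

```python
import math

ITEMSIZE = 4  # bytes per float32 value

# (chunkshape, blockshape) pairs taken from the cells above
configs = {
    "automatic": ((1, 128, 128, 512), (1, 16, 16, 128)),
    "first attempt": ((1, 64, 64, 1440), (1, 16, 16, 360)),
    "tuned": ((1, 360, 128, 1440), (1, 8, 8, 720)),
}

# Uncompressed bytes per chunk and per block for each configuration
sizes = {
    name: (math.prod(chunk) * ITEMSIZE, math.prod(block) * ITEMSIZE)
    for name, (chunk, block) in configs.items()
}

for name, (chunk_bytes, block_bytes) in sizes.items():
    print(f"{name:>13}: chunk = {chunk_bytes / 2**20:6.1f} MiB, "
          f"block = {block_bytes / 2**10:4.0f} KiB")
```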

## Reductions with on-disk operands

So far we have been running reductions on in-memory operands, but ironArray can seamlessly work with on-disk arrays too.  Let's save the previous array:

In [34]:
%%time
ia.save(new_precip, "precip-3m-optimal.iarr")

CPU times: user 862 µs, sys: 219 ms, total: 220 ms
Wall time: 220 ms


and let's re-open it in 'lazy' mode and do the reduction from disk:

In [35]:
new_precip_opt = ia.open("precip-3m-optimal.iarr")
new_precip_opt


In [33]:
%%time
reduc1 = ia.mean(new_precip_opt, axis=(3, 2, 0)).data


So, although this on-disk time is considerably slower than the in-memory one, it is still not bad.  This is interesting because it means that out-of-core operations on modern SSDs (our case here) consume far less memory while still keeping reasonable times.
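The general idea behind an out-of-core reduction can be sketched with plain NumPy: read one slab at a time from disk, accumulate partial sums, and divide at the end.  This is only an illustration under made-up names and sizes (`demo.npy`, a small random array), not how ironArray implements it:

```python
import os
import tempfile
import numpy as np

# Write a small (month, time, lat, lon) array to disk
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.npy")
shape = (3, 8, 5, 6)
rng = np.random.default_rng(1)
data = rng.random(shape).astype(np.float32)
np.save(path, data)

# Memory-map the file so that slabs are read from disk on demand
on_disk = np.load(path, mmap_mode="r")

# Accumulate per-time-step sums one monthly slab at a time
acc = np.zeros(shape[1], dtype=np.float64)
for slab in on_disk:  # each slab has shape (time, lat, lon)
    acc += slab.sum(axis=(1, 2), dtype=np.float64)

# Divide by the number of values that went into each time step
mean_ooc = acc / (shape[0] * shape[2] * shape[3])

# Reference result computed fully in memory
mean_ref = data.mean(axis=(0, 2, 3), dtype=np.float64)
```

Only one slab needs to be resident in memory at any time, which is where the memory savings of out-of-core processing come from.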

TODO: add a comparison with xarray or dask, for out-of-core reductions