# Getting started

*ironArray* for Python is a package that implements a multi-dimensional, compressed data container and a computational engine optimized for operating with large arrays.

In this tutorial we will cover creating and manipulating *ironArray* arrays.


## Array creation

In this first section, we will see how to create the main object of the library, the *ironArray* array, and how to set its different properties. Let's start by creating a simple array:

In [1]:
import numpy as np
import iarray as ia

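# Bundle the shape and the data type of the array
dtshape = ia.DTShape([5, 5], np.float64)
# Fill the array with evenly spaced values from -1 to 1
arr = ia.linspace(dtshape, -1, 1)
print(repr(arr))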

<IArray (5, 5) np.float64>


Voilà, we have created our first ironArray array in `arr`.

To create the array, we have specified its shape and data type within a `DTShape` dataclass.  Then we have called the `linspace` constructor, passing the `start` and `stop` values.  In general, ironArray uses the same function names as NumPy, so you can always check the excellent NumPy docs for more information.

Right now, ironArray arrays only support two data types, `double` and `float`.  This reflects how the library is geared towards handling numerical data, as many of ironArray's internal optimizations rely on floating point datasets.  In addition, thanks to the `fp_mantissa_bits` property, you can use any precision between 1 and 64 bits for your floats, and the compression engine will take care of getting rid of the unnecessary storage.
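
To get a feeling for why a reduced mantissa helps compression, here is a minimal sketch that uses only the Python standard library. It is not ironArray's implementation: the `truncate_mantissa` helper is hypothetical, `zlib` is just a stand-in codec, and the sketch assumes that reducing precision amounts to zeroing the low bits of the 52 explicit mantissa bits of a double.

```python
import struct
import zlib


def truncate_mantissa(value: float, keep_bits: int) -> float:
    """Zero the lowest (52 - keep_bits) mantissa bits of an IEEE 754 double.

    Hypothetical helper for illustration only; not part of ironArray.
    """
    if not 0 <= keep_bits <= 52:
        raise ValueError("keep_bits must be between 0 and 52")
    drop = 52 - keep_bits
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    mask = ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF  # keeps sign, exponent, high mantissa
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]


# 10000 evenly spaced doubles between -1 and 1
values = [-1 + 2 * i / 9999 for i in range(10000)]
trimmed = [truncate_mantissa(v, 30) for v in values]

packed_full = struct.pack(f"<{len(values)}d", *values)
packed_trimmed = struct.pack(f"<{len(trimmed)}d", *trimmed)

size_full = len(zlib.compress(packed_full))
size_trimmed = len(zlib.compress(packed_trimmed))
print(f"full precision:   {size_full} bytes compressed")
print(f"30 mantissa bits: {size_trimmed} bytes compressed")
```

Both buffers have the same uncompressed size, but the trimmed one ends each value in a run of zero bits, which a general-purpose codec can encode much more cheaply.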

But let's go back to our example and inspect the data of the array by converting it into a NumPy array:

In [2]:
ia.iarray2numpy(arr)

array([[-1.        , -0.91666667, -0.83333333, -0.75      , -0.66666667],
       [-0.58333333, -0.5       , -0.41666667, -0.33333333, -0.25      ],
       [-0.16666667, -0.08333333,  0.        ,  0.08333333,  0.16666667],
       [ 0.25      ,  0.33333333,  0.41666667,  0.5       ,  0.58333333],
       [ 0.66666667,  0.75      ,  0.83333333,  0.91666667,  1.        ]])
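
As a cross-check, the same values can be reproduced with plain NumPy. The sketch below assumes (this tutorial does not state it explicitly) that the 25 values are laid out row by row, which is what the printed output suggests.

```python
import numpy as np

# 25 evenly spaced values from -1 to 1 (endpoint included), arranged row by row
ref = np.linspace(-1, 1, 25).reshape(5, 5)

# Distance between two consecutive values: (stop - start) / (n - 1)
step = ref[0, 1] - ref[0, 0]

print(ref)
print(f"step: {step:.8f}")
```

The step is 2/24, so the second element is -1 + 2/24, shown as -0.91666667 in the output above.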

### Creation Properties

Besides the shape and data type, we can set more properties in the array.  For example, let's create it persistently:

In [3]:
pers_arr = ia.linspace(dtshape, -1, 1, urlpath="myarr.iarr")

In [4]:
%%bash
ls -l myarr.iarr

-rw-r--r-- 1 faltet faltet 606 Jan 20 11:02 myarr.iarr


and we can open it from disk too:

In [5]:
arr2 = ia.open("myarr.iarr")
print(arr2.data)

[[-1.         -0.91666667 -0.83333333 -0.75       -0.66666667]
 [-0.58333333 -0.5        -0.41666667 -0.33333333 -0.25      ]
 [-0.16666667 -0.08333333  0.          0.08333333  0.16666667]
 [ 0.25        0.33333333  0.41666667  0.5         0.58333333]
 [ 0.66666667  0.75        0.83333333  0.91666667  1.        ]]


The `urlpath` parameter can be used along with many others in the `Storage` object:

In [6]:
store = ia.Storage()
print(store)

Storage(chunkshape=None, blockshape=None, urlpath=None, enforce_frame=False, plainbuffer=False)


For clarity, we can also group them in a `Storage` instance. For example, the next `Storage` object indicates the shape of both the chunks and the blocks:

```
ia.Storage(chunkshape=(3000, 1000), blockshape=(100, 100))
```

Let's use this `Storage` property to create a slightly larger array, more appropriate for ironArray usage:

In [7]:
dtshape = ia.DTShape((10000, 7000), np.float64)
storage = ia.Storage(chunkshape=(3000, 1000), blockshape=(100, 100), urlpath="large_arr.iarr")
arr = ia.linspace(dtshape, -1, 1, storage=storage)

In [8]:
%%bash
ls -lh large_arr.iarr

-rw-r--r-- 1 faltet faltet 103M Jan 20 11:02 large_arr.iarr


In this case, we have created an array of about 530 MB but, thanks to the integrated compression, it takes roughly 5x less storage space.  Contrary to ordinary chunked and compressed data container libraries that support just 1 level of data partitioning (HDF5, Zarr...), ironArray allows for 2 levels: chunks and blocks.  This has an important impact on performance on modern computer architectures.  But let's not get ahead of ourselves; more on that later.
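
The numbers above can be checked with a little arithmetic. The following sketch uses only the standard library; it counts how many chunks are needed to cover the array and how many blocks fit in one chunk. How ironArray actually handles chunks that stick out past the array edge is not covered here, so the counts are simply the partitions needed along each dimension. The `reported_file_mib` value is the size printed by `ls -lh` above.

```python
import math

shape = (10000, 7000)
chunkshape = (3000, 1000)
blockshape = (100, 100)
itemsize = 8  # bytes per np.float64 element

# Uncompressed size of the array
nbytes = math.prod(shape) * itemsize
nbytes_mib = nbytes / 2**20

# Partitions needed along each dimension (rounding up at the edges)
chunks_per_dim = [math.ceil(s / c) for s, c in zip(shape, chunkshape)]
blocks_per_dim = [math.ceil(c / b) for c, b in zip(chunkshape, blockshape)]
nchunks = math.prod(chunks_per_dim)
nblocks_per_chunk = math.prod(blocks_per_dim)

reported_file_mib = 103  # size printed by `ls -lh large_arr.iarr`
ratio = nbytes_mib / reported_file_mib

print(f"uncompressed size: {nbytes_mib:.0f} MiB")
print(f"chunks: {chunks_per_dim} -> {nchunks} in total")
print(f"blocks per chunk: {blocks_per_dim} -> {nblocks_per_chunk} in total")
print(f"compression ratio: about {ratio:.1f}x")
```

With these shapes, each chunk holds 3000 x 1000 doubles (about 23 MiB uncompressed), while each block holds 100 x 100 doubles (about 78 KiB).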

### More Properties

When creating arrays, ironArray lets you specify many other parameters, such as compression-related ones:

In [9]:
dtshape = ia.DTShape((10000, 7000), np.float64)
storage = ia.Storage(chunkshape=(3000, 1000), blockshape=(100, 100), urlpath="large_arr2.iarr")
arr = ia.linspace(dtshape, -1, 1, storage=storage, clevel=1, codec=ia.Codecs.ZSTD, fp_mantissa_bits=30)

In [10]:
%%bash
ls -lh large_arr2.iarr

-rw-r--r-- 1 faltet faltet 31M Jan 20 11:02 large_arr2.iarr


So, in this case the space required to store the array is much smaller than before (31M versus 103M), for two main reasons:

1) We are using a better codec for compression (ZSTD).

2) We are requiring just 30 significant bits in the mantissa, improving the compression ratio.

You can see the complete set of supported arguments, along with their defaults, by printing an instance of `ia.Config`:

In [11]:
cfg = ia.Config()
print(cfg)

Config(codec=<Codecs.LZ4: 1>, clevel=5, filters=[<Filters.SHUFFLE: 1>], fp_mantissa_bits=0, use_dict=False, nthreads=28, eval_method=<Eval.AUTO: 1>, seed=1, random_gen=<RandomGen.MERSENNE_TWISTER: 0>, storage=Storage(chunkshape=None, blockshape=None, urlpath=None, enforce_frame=False, plainbuffer=False), chunkshape=None, blockshape=None, urlpath=None, enforce_frame=False, plainbuffer=False)


Of course, specifying all the compression and storage related arguments for every call in ironArray can get boring pretty quickly.  To avoid this, you can either modify the global configuration defaults or override them temporarily with a context.

### Global Configuration

If you are going to use the same configuration parameters in your script, it might be a good idea to change the global parameters:

In [12]:
ia.set_config(codec=ia.Codecs.ZSTD, clevel=1)

Config(codec=<Codecs.ZSTD: 5>, clevel=1, filters=[<Filters.SHUFFLE: 1>], fp_mantissa_bits=0, use_dict=False, nthreads=28, eval_method=<Eval.AUTO: 1>, seed=1, random_gen=<RandomGen.MERSENNE_TWISTER: 0>, storage=Storage(chunkshape=None, blockshape=None, urlpath=None, enforce_frame=False, plainbuffer=False), chunkshape=None, blockshape=None, urlpath=None, enforce_frame=False, plainbuffer=False)

After doing that, the defaults for the global configuration have changed:

In [13]:
cfg = ia.Config()
print(cfg)

Config(codec=<Codecs.ZSTD: 5>, clevel=1, filters=[<Filters.SHUFFLE: 1>], fp_mantissa_bits=0, use_dict=False, nthreads=28, eval_method=<Eval.AUTO: 1>, seed=1, random_gen=<RandomGen.MERSENNE_TWISTER: 0>, storage=Storage(chunkshape=None, blockshape=None, urlpath=None, enforce_frame=False, plainbuffer=False), chunkshape=None, blockshape=None, urlpath=None, enforce_frame=False, plainbuffer=False)


and from then on, these will be the defaults for *all* the functions that are called in your script.

### Contextual Configuration

If you don't want to mess with your global configuration, the `ia.config` context comes in handy:

In [14]:
dtshape = ia.DTShape([1000, 1000])
with ia.config(clevel=9, codec=ia.Codecs.LZ4) as cfg:
    a1 = ia.linspace(dtshape, -1, 0, cfg=cfg)
a2 = ia.linspace(dtshape, -1, 0)
print(f"a1 cratio: {a1.cratio}")
print(f"a2 cratio: {a2.cratio}")

a1 cratio: 8.781161022342813
a2 cratio: 15.030573209604679


In this case, `a1` and `a2` have different compression ratios because they are using different properties.  `a1` uses the LZ4 codec with compression level 9, whereas `a2` uses ZSTD with compression level 1, which are the global defaults that we set in the previous section.
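
The interplay between global defaults and a temporary context can be sketched in plain Python. This is a generic illustration of the pattern, not ironArray's implementation: `Settings`, `set_defaults`, `get_defaults` and `override` are hypothetical names made up for this example.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for a configuration object (not ia.Config)."""
    codec: str = "LZ4"
    clevel: int = 5


_defaults = Settings()  # module-level global defaults


def set_defaults(**changes) -> Settings:
    """Permanently change the global defaults."""
    global _defaults
    _defaults = replace(_defaults, **changes)
    return _defaults


def get_defaults() -> Settings:
    """Return the defaults currently in effect."""
    return _defaults


@contextmanager
def override(**changes):
    """Temporarily change the defaults; restore them on exit, even after errors."""
    global _defaults
    saved = _defaults
    _defaults = replace(saved, **changes)
    try:
        yield _defaults
    finally:
        _defaults = saved


set_defaults(codec="ZSTD", clevel=1)            # global change
with override(codec="LZ4", clevel=9) as cfg:    # only valid inside the block
    inside = get_defaults()
outside = get_defaults()

print(f"inside:  {inside}")
print(f"outside: {outside}")
```

The `try`/`finally` in `override` is what makes the contextual route safe: the previous defaults come back no matter how the block exits.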

That's basically all for this tutorial.  Remember that you can create arbitrarily large arrays either in-memory or on-disk, and that you can choose from a large variety of parameters to tailor arrays to your own needs.  Also, make sure to use the global and contextual configurations to avoid cluttering your array creation calls with parameters.