# Creating arrays

*ironArray* for Python is a package that implements a multi-dimensional, compressed data container and an optimized computational engine to manage large arrays.

In this tutorial we will cover creating a simple ironArray array.  We will instantiate a simple array, then set properties on the array object.  We will also see how to set default properties by changing global and contextual configuration settings.


## Creating an array

Let's start by creating a simple array:

In [5]:
import numpy as np
import iarray as ia

shape = [5, 5]
dtype = np.float64

arr = ia.linspace(shape, -1, 1, dtype=dtype)
print(repr(arr))

<IArray (5, 5) np.float64>


Voilà, the object `arr` contains our first ironArray array.

To create an array, we first have to define its shape.  The array is then instantiated by the `linspace` constructor, where you specify the `start` and `stop` values.  Functions in ironArray are written to map closely to NumPy functions;  you can consult the [NumPy documentation](https://numpy.org/doc/) for more information on the functions and their parameters.

The ironArray library is designed to operate on floating point numerical data.  Consequently, the arrays currently support two data types: `np.float64` (double precision) and `np.float32` (single precision).

Let's convert the `arr` object into a NumPy array and inspect the data:

In [6]:
ia.iarray2numpy(arr)

array([[-1.        , -0.91666667, -0.83333333, -0.75      , -0.66666667],
       [-0.58333333, -0.5       , -0.41666667, -0.33333333, -0.25      ],
       [-0.16666667, -0.08333333,  0.        ,  0.08333333,  0.16666667],
       [ 0.25      ,  0.33333333,  0.41666667,  0.5       ,  0.58333333],
       [ 0.66666667,  0.75      ,  0.83333333,  0.91666667,  1.        ]])
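
Since `linspace` follows its NumPy namesake, the values above can be cross-checked with plain NumPy.  The following is a minimal sketch rather than part of the ironArray API: `np.linspace` takes a number of points instead of a shape, so we ask for 5 * 5 = 25 points and reshape them.  The names `ref` and `step` are only illustrative.

```python
import numpy as np

# np.linspace takes the number of points, not a shape, so we
# generate 25 evenly spaced values and reshape them to (5, 5).
ref = np.linspace(-1, 1, 25, dtype=np.float64).reshape(5, 5)

# Spacing between consecutive values: (stop - start) / (num - 1) = 2 / 24.
step = ref[0, 1] - ref[0, 0]

print(ref)
print(f"step: {step:.8f}")
```

The printed matrix should match the one returned by `ia.iarray2numpy(arr)`, which is a quick way to confirm that both constructors include the `stop` value as the last element.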

### Properties

Besides the shape and data type, we can set more properties on the array.  For example, let's make it persistent:

In [7]:
pers_arr = ia.linspace(shape, -1, 1, dtype=dtype, urlpath="myarr.iarr")

In [8]:
%%bash
ls -l myarr.iarr

-rw-r--r-- 1 aleix aleix 633 Apr 21 08:36 myarr.iarr


Next, we read the persistent object back from disk.  We use `ia.open()` instead of `ia.load()` so that the data is read lazily, as needed (a topic covered in a later tutorial):

In [9]:
arr2 = ia.open("myarr.iarr")
print(arr2.data)

[[-1.         -0.91666667 -0.83333333 -0.75       -0.66666667]
 [-0.58333333 -0.5        -0.41666667 -0.33333333 -0.25      ]
 [-0.16666667 -0.08333333  0.          0.08333333  0.16666667]
 [ 0.25        0.33333333  0.41666667  0.5         0.58333333]
 [ 0.66666667  0.75        0.83333333  0.91666667  1.        ]]


#### Store

The `Store` class is used to tune the storage for your arrays.  The `urlpath` property is just one of many properties that can be set in a `Store` object.  See the [Store documentation](../reference/autofiles/config/iarray.Store.html) for more details on how ironArray storage can be optimized to improve performance and decrease array storage size.

In [10]:
store = ia.Store()
print(store)

Store(chunks=None, blocks=None, urlpath=None, enforce_frame=False, plainbuffer=False)


We can also set multiple properties in a single `Store` instance. For example, this `Store` object has properties for the shape of the chunks and the blocks:

```
ia.Store(chunks=(3000, 1000), blocks=(100, 100))
```

The following example shows how to create a `Store` object and set its properties, then add it to a larger ironArray array object:

In [15]:
store = ia.Store(chunks=(3000, 1000), blocks=(100, 100), urlpath="large_arr.iarr")
arr = ia.linspace((10000, 7000), -1, 1, dtype=np.float64, store=store)

In [16]:
%%bash
ls -lh large_arr.iarr

-rw-r--r-- 1 aleix aleix 91M Apr 21 08:38 large_arr.iarr


We have just created an array containing about 560MB of data (10000 × 7000 values of 8 bytes each).  Thanks to integrated compression, the serialized array on disk takes only about 91MB, roughly 6 times smaller.  In contrast to ordinary chunked and compressed data container libraries that support just a single level of data partitioning (such as HDF5 and Zarr), ironArray allows for two levels: chunks and blocks.  As we'll see later, two levels offer more flexibility and options for tuning performance on modern CPU architectures.
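
To make the two partitioning levels more concrete, here is a small back-of-the-envelope calculation for the shapes used above.  It is only a sketch in plain Python: it assumes that partial chunks at the array edges count as whole chunks, which may not reflect how ironArray lays out data internally.

```python
import math

shape = (10000, 7000)
chunks = (3000, 1000)
blocks = (100, 100)
itemsize = 8  # bytes per float64 value

uncompressed_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize
block_bytes = math.prod(blocks) * itemsize

# Assumption: a partial chunk at the edge of the array counts as one chunk.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
blocks_per_chunk = math.prod(c // b for c, b in zip(chunks, blocks))

print(f"uncompressed array: {uncompressed_bytes / 1e6:.0f} MB")
print(f"chunks: {n_chunks} of up to {chunk_bytes / 1e6:.0f} MB each")
print(f"blocks per chunk: {blocks_per_chunk} of {block_bytes / 1e3:.0f} kB each")
```

With these shapes a chunk is tens of megabytes while a block is tens of kilobytes, which hints at why the second level is useful: a block is small enough to fit comfortably in a typical CPU cache.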

#### More Properties

You may set many other properties when creating an ironArray array.  Here we set some compression properties:

In [19]:
store = ia.Store(chunks=(3000, 1000), blocks=(100, 100), urlpath="large_arr2.iarr")
arr = ia.linspace((10000, 7000), -1, 1, dtype=np.float64, store=store, clevel=1, codec=ia.Codecs.ZSTD, fp_mantissa_bits=30)

In [20]:
%%bash
ls -lh large_arr2.iarr

-rw-r--r-- 1 aleix aleix 69M Apr 21 08:39 large_arr2.iarr


As you can see, we created an array that holds the same amount of data as before.  But now the serialized data occupies only 69MB of disk space, down from 91MB.  We changed the compression codec and mantissa bits properties to shrink the storage size:

1) `codec=ia.Codecs.ZSTD`:  ZSTD generally achieves higher compression ratios than faster codecs such as LZ4, at the cost of some speed.

2) `fp_mantissa_bits=30`:  We keep just 30 significant bits in the mantissa, which improves the compression ratio at the cost of some precision. You can set `fp_mantissa_bits` to any precision between 1 and 24 bits (float32) or 53 bits (float64); the compression engine will compress the data to fit the specified precision.
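
The effect of reducing the mantissa can be illustrated outside ironArray.  The sketch below is not how ironArray implements `fp_mantissa_bits`; it uses a hypothetical helper, `truncate_mantissa`, that simply zeroes the low-order mantissa bits of float64 values, and it uses `zlib` from the standard library as a stand-in for the real codecs.  It also assumes that the bit count refers to the 52 explicitly stored mantissa bits.

```python
import zlib
import numpy as np

def truncate_mantissa(values, keep_bits):
    """Zero the low-order mantissa bits of float64 values (hypothetical helper).

    Assumption: keep_bits counts explicitly stored mantissa bits (52 in float64).
    """
    drop = 52 - keep_bits
    mask = np.uint64((0xFFFFFFFFFFFFFFFF << drop) & 0xFFFFFFFFFFFFFFFF)
    bits = np.ascontiguousarray(values, dtype=np.float64).view(np.uint64)
    return (bits & mask).view(np.float64)

data = np.linspace(-1, 1, 100_000)
reduced = truncate_mantissa(data, 30)

# zlib is only a stand-in codec to show the trend.
full_size = len(zlib.compress(data.tobytes(), 1))
reduced_size = len(zlib.compress(reduced.tobytes(), 1))
max_abs_err = float(np.max(np.abs(reduced - data)))

print(f"compressed size, full precision: {full_size} bytes")
print(f"compressed size, 30 mantissa bits: {reduced_size} bytes")
print(f"largest absolute error: {max_abs_err:.3e}")
```

The zeroed bits turn into long, repetitive runs that any codec can squeeze well, while the error introduced stays below 2^-30 for values in [-1, 1].  The reduction is lossy, so it only makes sense when the data does not need full precision.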

You can see the complete set of supported properties and their defaults by examining an instance of `ia.Config`:

In [21]:
cfg = ia.Config()
print(cfg)

Config(codec=<Codecs.ZSTD: 5>, clevel=1, favor=<Favors.BALANCE: 0>, filters=[<Filters.BITSHUFFLE: 2>], fp_mantissa_bits=0, use_dict=False, nthreads=21, eval_method=<Eval.AUTO: 1>, seed=1, random_gen=<RandomGen.MERSENNE_TWISTER: 0>, btune=True, dtype=<class 'numpy.float64'>, store=Store(chunks=None, blocks=None, urlpath=None, enforce_frame=False, plainbuffer=False), chunks=None, blocks=None, urlpath=None, enforce_frame=False, plainbuffer=False)


### Configuration

Setting the same compression and storage-related properties every time you create an ironArray array object can become tedious, especially if you are dealing with known datasets with stable characteristics.  You have the option to set default properties in either the global configuration or within a context.

#### Global configuration

If you will always use the same configuration parameters in your script, it might be a good idea to set default global properties as part of your script initialization:

In [22]:
ia.set_config(codec=ia.Codecs.ZSTD, clevel=1, btune=False)

Config(codec=<Codecs.ZSTD: 5>, clevel=1, favor=<Favors.BALANCE: 0>, filters=[<Filters.BITSHUFFLE: 2>], fp_mantissa_bits=0, use_dict=False, nthreads=21, eval_method=<Eval.AUTO: 1>, seed=1, random_gen=<RandomGen.MERSENNE_TWISTER: 0>, btune=False, dtype=<class 'numpy.float64'>, store=Store(chunks=None, blocks=None, urlpath=None, enforce_frame=False, plainbuffer=False), chunks=None, blocks=None, urlpath=None, enforce_frame=False, plainbuffer=False)

You can verify which defaults are now in effect:  the compression codec is ZSTD, the compression level is 1, and `btune` is disabled.

In [23]:
cfg = ia.Config()
print(cfg)

Config(codec=<Codecs.ZSTD: 5>, clevel=1, favor=<Favors.BALANCE: 0>, filters=[<Filters.BITSHUFFLE: 2>], fp_mantissa_bits=0, use_dict=False, nthreads=21, eval_method=<Eval.AUTO: 1>, seed=1, random_gen=<RandomGen.MERSENNE_TWISTER: 0>, btune=False, dtype=<class 'numpy.float64'>, store=Store(chunks=None, blocks=None, urlpath=None, enforce_frame=False, plainbuffer=False), chunks=None, blocks=None, urlpath=None, enforce_frame=False, plainbuffer=False)


These will be the defaults for *all* the ironArray functions that are called in your script.

#### Contextual Configuration

Sometimes you want different configuration profiles for different kinds of arrays.  In this case, you can use `ia.config` as a context manager, so that custom settings apply only to the arrays created inside the `with` block.  This is an example of *contextual configuration*:

In [31]:
shape = [1000, 1000]
with ia.config(clevel=9, codec=ia.Codecs.LZ4):
    a1 = ia.linspace(shape, -1, 0)
a2 = ia.linspace(shape, -1, 0)
print(f"a1 cratio: {a1.cratio:.4f}")
print(f"a2 cratio: {a2.cratio:0.4f}")

a1 cratio: 5.7174
a2 cratio: 14.8501


In this case, `a1` and `a2` have different compression ratios, as they have different compression levels and compression codecs set as default properties on their array configurations.  `a1` is using the LZ4 codec with compression level 9, whereas `a2` is using ZSTD and compression level 1, the global defaults that we set in the previous example.
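
The interplay between global and contextual defaults can be sketched in a few lines of plain Python.  This is not ironArray's implementation; `_defaults`, `set_defaults` and `override` are hypothetical names that mimic the roles of the global configuration, `ia.set_config` and `ia.config` respectively.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a library's global defaults.
_defaults = {"codec": "ZSTD", "clevel": 1}

def set_defaults(**overrides):
    """Change the defaults for the rest of the script (global configuration)."""
    _defaults.update(overrides)
    return dict(_defaults)

@contextmanager
def override(**overrides):
    """Temporarily change the defaults inside a `with` block."""
    saved = dict(_defaults)
    _defaults.update(overrides)
    try:
        yield dict(_defaults)
    finally:
        # Restore the previous defaults even if the block raises.
        _defaults.clear()
        _defaults.update(saved)

with override(codec="LZ4", clevel=9) as inside:
    pass  # arrays created here would see the overridden defaults

after = dict(_defaults)
print(inside)  # {'codec': 'LZ4', 'clevel': 9}
print(after)   # {'codec': 'ZSTD', 'clevel': 1}
```

The key design point is the `try`/`finally` pair: the overrides are scoped to the `with` block, and the global defaults come back automatically once the block ends.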

### Conclusion

Now you can create ironArray arrays with specific properties.  You can create arbitrarily large arrays either in memory or on disk, and you can tailor arrays to your own needs using ironArray configuration properties.  Use the advanced global and contextual configurations to set often-used configuration profiles for your arrays.