# Creating arrays

*ironArray* for Python is a package that implements a multi-dimensional, compressed data container and an optimized computational engine to manage large arrays.

In this tutorial we will create a simple ironArray array and then set properties on the array object. We will also see how to set default properties by changing global and contextual configuration settings.


## Creating an array

Let's start by creating a simple array whose elements are inside the [-1, 1] interval:

```python
import numpy as np
import iarray as ia

shape = [5, 5]
dtype = np.float64

arr = ia.linspace(shape, -1, 1, dtype=dtype)
print(arr)
```

```
<IArray (5, 5) np.float64>
```


Voilà, the object `arr` contains our first ironArray array.

To create an array, we first define its shape. The array is then instantiated by the `linspace` constructor, where we specify the `start` and `stop` values. Functions in ironArray are written to map closely to their NumPy counterparts; you can consult the [NumPy documentation](https://numpy.org/doc/) for more information on the functions and their parameters.

The ironArray library is designed to operate on floating-point numerical data. Consequently, arrays currently support two data types: `double` (`np.float64`) and `float` (`np.float32`).
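In NumPy terms, `float` and `double` correspond to `np.float32` and `np.float64`. The snippet below uses only NumPy (no ironArray calls) to print the per-element size and the significand precision of each type; those significand widths come up again in the discussion of `fp_mantissa_bits` further down.

```python
import numpy as np

# C `float` -> np.float32, C `double` -> np.float64
sizes = {}
for dt in (np.float32, np.float64):
    info = np.finfo(dt)
    name = np.dtype(dt).name
    itemsize = np.dtype(dt).itemsize  # bytes per element
    significand = info.nmant + 1      # nmant excludes the hidden bit, so add it back
    sizes[name] = (itemsize, significand)
    print(name, itemsize, "bytes,", significand, "significand bits")
```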

Let's convert the `arr` object into a NumPy array and inspect the data:

```python
ia.iarray2numpy(arr)
```

```
array([[-1.        , -0.91666667, -0.83333333, -0.75      , -0.66666667],
       [-0.58333333, -0.5       , -0.41666667, -0.33333333, -0.25      ],
       [-0.16666667, -0.08333333,  0.        ,  0.08333333,  0.16666667],
       [ 0.25      ,  0.33333333,  0.41666667,  0.5       ,  0.58333333],
       [ 0.66666667,  0.75      ,  0.83333333,  0.91666667,  1.        ]])
```
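Because ironArray mirrors NumPy, these values can be cross-checked with NumPy alone. Judging from the output above, `ia.linspace(shape, start, stop)` behaves like 25 evenly spaced points over `[-1, 1]` laid out row by row; the sketch below reproduces that assumption with `np.linspace` and `reshape`, and does not require ironArray.

```python
import numpy as np

shape = (5, 5)
# 25 evenly spaced values from -1 to 1 (both endpoints included), filled in row-major order.
expected = np.linspace(-1, 1, num=int(np.prod(shape))).reshape(shape)
print(expected)
```

With ironArray installed, `np.allclose(ia.iarray2numpy(arr), expected)` should then hold.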

### Properties

Besides the shape and data type, we can set more properties on the array.  For example, let's make it persistent:

```python
pers_arr = ia.linspace(shape, -1, 1, dtype=dtype, urlpath="myarr.iarr", mode="w")
```

```bash
ls -l myarr.iarr
```

```
-rw-r--r--  1 martaiborra  staff  900 Sep 14 11:10 myarr.iarr
```


Next, we'll read the persistent object back from disk. We use `ia.open()` instead of `ia.load()` so that the data is read lazily, as needed (a topic covered in a later tutorial):

```python
arr2 = ia.open("myarr.iarr")
print(arr2.data)
```

```
[[-1.         -0.91666667 -0.83333333 -0.75       -0.66666667]
 [-0.58333333 -0.5        -0.41666667 -0.33333333 -0.25      ]
 [-0.16666667 -0.08333333  0.          0.08333333  0.16666667]
 [ 0.25        0.33333333  0.41666667  0.5         0.58333333]
 [ 0.66666667  0.75        0.83333333  0.91666667  1.        ]]
```
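ironArray's own lazy reading is covered in a later tutorial. As a rough analogy only, NumPy draws the same eager-versus-lazy distinction for its `.npy` files: `np.load` reads the whole file into memory, while `np.load(..., mmap_mode="r")` memory-maps it so data is paged in when accessed. This sketch uses NumPy's mechanism, not ironArray's, and the file name and temporary directory are arbitrary choices.

```python
import os
import tempfile

import numpy as np

data = np.linspace(-1, 1, 25).reshape(5, 5)
path = os.path.join(tempfile.mkdtemp(), "myarr.npy")  # arbitrary illustrative path
np.save(path, data)

eager = np.load(path)                # reads the whole file into memory right away
lazy = np.load(path, mmap_mode="r")  # maps the file; bytes are read on access

print(type(eager).__name__, type(lazy).__name__)
print(lazy[2, 2])
```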


#### Store

The `Store` class is used to tune the storage for your arrays.  The `urlpath` property is just one of many properties that can be set in a `Store` object.  See the [Store documentation](../reference/autofiles/config/iarray.Store.html) for more details on how ironArray storage can be optimized to improve performance and decrease array storage size.

```python
store = ia.Store()
print(store)
```

```
Store(chunks=None, blocks=None, urlpath=None, mode=b'r', contiguous=False, plainbuffer=False)
```


We can also set multiple properties in a single `Store` instance. For example, this `Store` object has properties for the shape of the chunks and the blocks:

```
ia.Store(chunks=(3000, 1000), blocks=(100, 100))
```
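To get a feel for what the two partitioning levels mean, the plain-Python arithmetic below counts the chunks and blocks for the 10000 × 7000 float64 array created in the next example with `chunks=(3000, 1000)` and `blocks=(100, 100)`. It assumes, as is usual for chunked formats, that the chunk grid covers the whole array, so an axis whose length is not an exact multiple of the chunk length gets one extra, partially filled chunk.

```python
import math

shape = (10000, 7000)
chunks = (3000, 1000)
blocks = (100, 100)
itemsize = 8  # bytes per float64 element

# Chunks along each axis; ceil accounts for partially filled edge chunks.
chunk_grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
# Blocks along each axis inside a single chunk.
block_grid = [math.ceil(c / b) for c, b in zip(chunks, blocks)]

n_chunks = math.prod(chunk_grid)
blocks_per_chunk = math.prod(block_grid)
chunk_bytes = math.prod(chunks) * itemsize  # uncompressed bytes per chunk
block_bytes = math.prod(blocks) * itemsize  # uncompressed bytes per block

print(chunk_grid, n_chunks)          # [4, 7] 28
print(block_grid, blocks_per_chunk)  # [30, 10] 300
print(chunk_bytes, block_bytes)      # 24000000 80000
```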

The following example creates a `Store` object with these chunk and block shapes plus a `urlpath`, then passes it to the constructor of a larger array:

```python
store = ia.Store(chunks=(3000, 1000), blocks=(100, 100), urlpath="large_arr.iarr", mode="w")
arr = ia.linspace((10000, 7000), -1, 1, dtype=np.float64, store=store)
```

```bash
ls -lh large_arr.iarr
```

```
-rw-r--r--  1 martaiborra  staff   144M Sep 14 11:10 large_arr.iarr
```


We have just created an array holding about 560 MB of data (10000 × 7000 float64 values at 8 bytes each). Thanks to integrated compression, the serialized array takes only around 150 MB on disk (reported as 144M by `ls -lh`), roughly 3.7 times smaller. In contrast to ordinary chunked and compressed data container libraries such as HDF5 and Zarr, which support just a single level of data partitioning, ironArray allows for two levels: chunks and blocks. As we'll see later, two levels offer more flexibility and more options for tuning performance on modern CPU architectures.
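The size figures above can be checked with a few lines of arithmetic. The only inputs are the array shape, the 8-byte float64 element size and the `ls -lh` listing; note that `ls -lh` reports binary units and rounds, so "144M" is taken here as approximately 144 MiB and the ratio is approximate.

```python
shape = (10000, 7000)
itemsize = 8  # bytes per float64 element

uncompressed = shape[0] * shape[1] * itemsize  # bytes before compression
MiB = 2**20
on_disk = 144 * MiB  # "144M" from `ls -lh` (binary units, rounded)

ratio = uncompressed / on_disk
print(f"uncompressed: {uncompressed / 1e6:.0f} MB, "
      f"on disk: {on_disk / 1e6:.0f} MB, ratio: {ratio:.1f}x")
# uncompressed: 560 MB, on disk: 151 MB, ratio: 3.7x
```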

#### More Properties

You may set many other properties when creating an ironArray array.  Here we set some compression properties:

```python
store = ia.Store(chunks=(3000, 1000), blocks=(100, 100), urlpath="large_arr2.iarr", mode="w")
arr = ia.linspace((10000, 7000), -1, 1, dtype=np.float64, store=store, clevel=5, codec=ia.Codecs.ZSTD, fp_mantissa_bits=30)
```

```bash
ls -lh large_arr2.iarr
```

```
-rw-r--r--  1 martaiborra  staff    71M Sep 14 11:10 large_arr2.iarr
```


As you can see, we created an array that holds the same 560 MB of data as before, but the serialized data now takes only around 74 MB on disk (reported as 71M by `ls -lh`), about half the size of the previous file. We changed the compression codec and mantissa bits properties to shrink the storage size:

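1) `codec=ia.Codecs.ZSTD`: ZSTD generally achieves higher compression ratios than faster codecs such as LZ4 (the default `codec` in the `Config` output below), typically at some cost in speed.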

2) `fp_mantissa_bits=30`: The IEEE Standard for Floating-Point Arithmetic (IEEE 754) sets the number of significand bits to 24 for float32 and 53 for float64 (including the hidden bit). By keeping just 30 bits in the mantissa (or significand) instead of the usual 53 bits for float64, we set the remaining 23 bits to zero, which improves the compression ratio. Because those bits are discarded, this makes the compression lossy: each value retains only the requested precision. You can set `fp_mantissa_bits` to any precision between 1 and 24 bits (float32) or 1 and 53 bits (float64).
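The effect of zeroing low-order significand bits can be reproduced outside ironArray. The sketch below is an illustration under stated assumptions, not ironArray's implementation: `truncate_mantissa` is a hypothetical helper that clears the low bits of each float64 with a bit mask, and the standard-library `zlib` codec stands in for ZSTD. It checks that the truncated data compresses better and that the error stays within what 30 significant bits allow.

```python
import zlib

import numpy as np


def truncate_mantissa(a: np.ndarray, keep_bits: int) -> np.ndarray:
    """Hypothetical helper: keep the `keep_bits` most significant significand bits
    of each float64 value (counting the hidden bit, so 53 means full precision)
    and zero the rest. Magnitudes are truncated toward zero."""
    if a.dtype != np.float64 or not 1 <= keep_bits <= 53:
        raise ValueError("expects float64 data and 1 <= keep_bits <= 53")
    drop = 53 - keep_bits  # low-order stored mantissa bits to clear
    mask = np.uint64(~((1 << drop) - 1) & 0xFFFF_FFFF_FFFF_FFFF)
    bits = np.ascontiguousarray(a).view(np.uint64)
    return (bits & mask).view(np.float64)


data = np.linspace(-1, 1, 1_000_000)
trunc = truncate_mantissa(data, 30)  # 53 - 30 = 23 bits set to zero

raw = zlib.compress(data.tobytes(), 6)
small = zlib.compress(trunc.tobytes(), 6)
err = float(np.max(np.abs(data - trunc)))

print(f"zlib size, full precision: {len(raw)} bytes")
print(f"zlib size, 30-bit significand: {len(small)} bytes")
print(f"max absolute error: {err:.2e}")
```

With 30 significant bits kept, each value's relative error is below 2^-29, so for data in [-1, 1] the absolute error is tiny; the price of the smaller output is precision that cannot be recovered.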

You can see the complete set of supported properties and their defaults by examining an instance of `ia.Config`:

```python
cfg = ia.Config()
print(cfg)
```

```
Config(codec=<Codecs.LZ4: 1>, clevel=9, favor=<Favors.BALANCE: 0>, filters=[<Filters.SHUFFLE: 1>], fp_mantissa_bits=0, use_dict=False, nthreads=3, eval_method=<Eval.AUTO: 1>, seed=1, random_gen=<RandomGen.MERSENNE_TWISTER: 0>, btune=True, dtype=<class 'numpy.float64'>, store=Store(chunks=None, blocks=None, urlpath=None, mode=b'r', contiguous=False, plainbuffer=False), chunks=None, blocks=None, urlpath=None, mode=b'r', contiguous=False, plainbuffer=False)
```


## Conclusion

You can create arbitrarily large arrays either in memory or on disk, and you can tailor them to your own needs using ironArray configuration properties. To become comfortable with ironArray's rich set of properties, read the dedicated [Configuring ironArray](configuration.ipynb) tutorial.