In [None]:
# NumPy is used as `np` throughout this tutorial
import numpy as np

# Tutorial on how to use `MadArray` objects

A `MadArray` is a NumPy array with missing elements. It is built from three types of parameters:

* **data** as an array of entries, either *int*, *float* or *complex*;
* a **mask** indicating the missing entries;
* **options** to define the behaviour of the object.

A basic initialisation requires only a data matrix. Without a mask, all elements are considered non-missing.

In [None]:
from madarrays import MadArray

# initialisation without mask
data = np.random.rand(4, 6)

A = MadArray(data)
print(A)

## Masking

The masking of data differs according to the type of entries:

* if the entries are *int* or *float*, the masking is done exclusively by passing a boolean array with the same shape as the data as argument `mask`, each entry indicating whether the corresponding entry in the data array is missing;
* if the entries are *complex*, the masking can be done as above, or by passing boolean arrays with the same shape as the data as arguments `mask_amplitude` and `mask_phase`, each entry indicating respectively whether the magnitude and the phase of the corresponding entry are missing.

In [None]:
# initialisation with a mask
mask = np.random.random(data.shape) < 0.5

Am = MadArray(data, mask)
print(mask)
print(Am)
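
To make the magnitude/phase distinction for complex entries more concrete, here is a minimal NumPy-only sketch. It does not call `madarrays`; the names `missing_magnitude` and `missing_phase` are hypothetical, and it assumes that `True` marks a missing quantity. It only shows how two such masks split the entries into fully known, partially known and fully missing ones.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (2, 3)

# Complex data built from a magnitude and a phase
magnitude = rng.random(shape)
phase = rng.uniform(-np.pi, np.pi, shape)
data_c = magnitude * np.exp(1j * phase)

# Hypothetical masks (assumption: True marks a missing quantity)
missing_magnitude = rng.random(shape) < 0.5
missing_phase = rng.random(shape) < 0.5

# An entry is fully known only if both its magnitude and its phase are known
fully_known = ~missing_magnitude & ~missing_phase
# An entry is fully missing if both quantities are missing
fully_missing = missing_magnitude & missing_phase
# Otherwise exactly one of the two quantities is available
partially_known = ~(fully_known | fully_missing)

print(fully_known)
print(partially_known)
print(fully_missing)
```

Each entry falls into exactly one of the three categories, which is what a pair of magnitude and phase masks can express and a single `mask` cannot.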

A *MadArray* can also be defined from another *MadArray*, for example to copy the object:

In [None]:
Am2 = MadArray(Am)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am2), repr(Am2)))

A different mask can also be used:

In [None]:
mask2 = np.random.random(data.shape) < 0.9
Am3 = MadArray(Am, mask2)
print('{} - {}'.format(str(Am), repr(Am)))
print('{} - {}'.format(str(Am3), repr(Am3)))

## Properties

A *MadArray* has attributes that give information about the masking.

In [None]:
# mask of non-missing elements
print(Am.known_mask)

In [None]:
# mask of missing elements
print(Am.unknown_mask)

In [None]:
print('Is masked: {}'.format(Am.is_masked))
print('Ratio missing data: {}'.format(Am.ratio_missing_data))
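
The relation between these attributes can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library's implementation: it assumes that `True` marks a missing entry, that the known mask is the complement of the unknown mask, and that the ratio is the fraction of missing entries. The variable names are hypothetical.

```python
import numpy as np

# Hypothetical mask of missing entries (assumption: True marks a missing entry)
missing = np.array([[True, False, False],
                    [False, True, False]])

known = ~missing                 # complement: True marks an observed entry
is_masked = bool(missing.any())  # at least one entry is missing
ratio_missing = missing.mean()   # fraction of missing entries

print(known)
print('Is masked: {}'.format(is_masked))
print('Ratio missing data: {}'.format(ratio_missing))
```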

## Indexing

There are two different and incompatible ways to index a *MadArray*. By default (`masked_indexing=False`), indexing works as for an *ndarray*: both the data matrix and the mask are indexed, and a *MadArray* with the shape defined by the indices is returned:

In [None]:
print(A[0:3, 1:3])
print(Am[0:3, 1:3])

With the other way (`masked_indexing=True`), a *MadArray* with the original shape is returned, in which the non-indexed entries are considered masked:

In [None]:
Am4 = MadArray(data, mask, masked_indexing=True)
print(Am4[0:3, 1:3])

The latter approach is suited to use with *scikit-learn* procedures.
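
The difference between the two indexing modes can be mimicked with plain NumPy arrays. The sketch below is only an illustration of the behaviour described above, not the `madarrays` implementation; `values`, `missing` and `index` are hypothetical names, and `True` is assumed to mark a missing entry.

```python
import numpy as np

values = np.arange(24, dtype=float).reshape(4, 6)
missing = np.zeros(values.shape, dtype=bool)
missing[1, 2] = True  # one missing entry, for illustration

index = (slice(0, 3), slice(1, 3))

# Default behaviour: data and mask are both sliced, so the shape shrinks
sub_values, sub_missing = values[index], missing[index]
print(sub_values.shape)

# Masked indexing: the shape is kept, and every entry outside the index
# is treated as missing
kept_missing = np.ones(values.shape, dtype=bool)
kept_missing[index] = missing[index]
print(kept_missing.shape)
print(kept_missing)
```

Keeping the shape fixed means that every subset of the data is an array of the same dimensions, which is convenient when a procedure expects inputs of a constant shape.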

## Numerical operations

NumPy functions can be applied to a *MadArray*, but they do **not** take the mask into account:

In [None]:
print(np.mean(A))
print(np.mean(Am))
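
If a statistic restricted to the known entries is needed, it has to be computed explicitly. Here is a minimal sketch with plain NumPy arrays, assuming that `True` marks a missing entry; `values` and `missing` are hypothetical names standing in for the data and the unknown mask of a *MadArray*.

```python
import numpy as np

values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
missing = np.array([[False, True, False],
                    [False, False, True]])

# Mean over all entries, ignoring the mask
mean_all = np.mean(values)

# Mean over the known entries only, selected by boolean indexing
mean_known = values[~missing].mean()

# Same computation with a NumPy masked array (True also marks masked entries)
mean_ma = np.ma.masked_array(values, mask=missing).mean()

print(mean_all)
print(mean_known)
print(mean_ma)
```

The two mask-aware means agree with each other and differ from the plain mean as soon as the missing entries are not centred on it.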