# Tutorial Part 3: Array Functions
An array function is simply a function that takes one or more arrays as input, performs a calculation on the input and returns a new array with the result. Examples are functions that add values from two arrays together or functions that calculate the mean of the values in an array.

Isopy arrays support a large number of numpy array functions, and universal functions, natively through the mechanisms provided by numpy. In addition, isopy comes with its own set of array functions that supplement or enhance those provided by numpy.

**Table of Contents**

* [Implementation](#Implementation)
    * [Input Values](#Input-values)
    * [Optional Arguments](#Optional-arguments)
* [Array Functions](#Array-Functions)
    * [Isopy](#Isopy)
    * [arrayfunc](#arrayfunc)
    * [Numpy](#Numpy)
    * [Scipy](#Scipy)
* [Examples](#Examples)

In [1]:
import isopy
import numpy as np
from scipy import stats

## Implementation
This section briefly discusses the isopy implementation of array functions for different types of input.

### Input values
The simplest case is a function that takes only one input and performs an operation on every value in the input. This works just as you would expect for isopy arrays. For example, the numpy function ``log`` calculates the natural logarithm of every value in an array.

In [6]:
a = isopy.array(ru = [1, 11], pd=[2, 12], cd = [3,13])
np.log(a)

(row) , Ru     , Pd      , Cd     
0     , 0      , 0.69315 , 1.0986 
1     , 2.3979 , 2.4849  , 2.5649 

For functions that take an axis argument, ``axis=0`` will perform the operation on each column, ``axis=1`` will perform the operation on each row and ``axis=None`` will perform the operation on the entire array. If the axis argument is not given, the operation is performed on each column by default**\***.

In [48]:
np.sum(a) # Same as np.sum(a, axis=0)

(row) , Ru , Pd , Cd 
None  , 12 , 14 , 16 

In [22]:
np.sum(a, axis=None) # Sums all the values in the array

42.0

In [23]:
np.sum(a, axis = 1) # Sums the values in each row

array([ 6., 36.])

**\*** This is also true for numpy functions, even though their default value of ``axis`` is typically ``None``. If you want to perform the operation on the entire array you have to explicitly pass ``axis=None`` when calling the function.
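For comparison, here is a minimal sketch of how plain numpy treats ``axis`` on an ordinary 2-D array holding the same values (one row per measurement, one column standing in for each of Ru, Pd and Cd). No isopy is involved and the variable names are illustrative only.

```python
import numpy as np

# Same values as the isopy array above: one row per measurement,
# one column per key (Ru, Pd, Cd).
values = np.array([[1.0, 2.0, 3.0],
                   [11.0, 12.0, 13.0]])

total = np.sum(values)               # plain numpy default is axis=None: one scalar
per_column = np.sum(values, axis=0)  # one value per column
per_row = np.sum(values, axis=1)     # one value per row

print(total)       # 42.0
print(per_column)  # [12. 14. 16.]
print(per_row)     # [ 6. 36.]
```

The per-column result is what an isopy array returns when ``axis`` is omitted, whereas plain numpy collapses the whole array to a single number.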

---
If the input includes two or more isopy arrays then the returned array will contain all the columns found in all arrays. The operation is then performed on each column in turn. By default ``np.nan`` will be used to represent missing columns in arrays. In most instances this means that the result for that column also becomes ``np.nan``.

In [26]:
a1 = isopy.array(ru = [1, 11], pd=[2, 12], cd = [3,13])
a2 = isopy.array(ru = [1, 11], rh = [1.5, 11.5], pd=[2, 12])
a1 + a2 # same as np.add(a1, a2)

(row) , Ru , Pd , Cd  , Rh  
0     , 2  , 4  , nan , nan 
1     , 22 , 24 , nan , nan 
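The column handling described above can be sketched with plain dictionaries of numpy arrays. This is only an illustration of the idea, not how isopy implements it; ``add_columns`` and its ``default`` parameter are hypothetical names.

```python
import numpy as np

def add_columns(left, right, default=np.nan):
    """Add two column mappings, filling columns missing from either side.

    Hypothetical helper for illustration; not part of isopy.
    """
    # Keep the column order of the first input, then append new columns.
    keys = list(left) + [key for key in right if key not in left]
    n_rows = len(next(iter(left.values())))
    filler = np.full(n_rows, default, dtype=float)
    return {key: left.get(key, filler) + right.get(key, filler) for key in keys}

a1 = {"ru": np.array([1.0, 11.0]), "pd": np.array([2.0, 12.0]), "cd": np.array([3.0, 13.0])}
a2 = {"ru": np.array([1.0, 11.0]), "rh": np.array([1.5, 11.5]), "pd": np.array([2.0, 12.0])}

result = add_columns(a1, a2)
for key, column in result.items():
    print(key, column)
```

Because any arithmetic involving ``nan`` yields ``nan``, the ``cd`` and ``rh`` columns, which are each missing from one input, come out as ``nan`` in every row.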

Generally, the number of rows in the different arrays must be the same or 1. If one array has a size of 1 then that value is used for every row in the larger arrays.

In [27]:
a3 = isopy.array(ru = 100, pd=200, cd = 300)
a1 + a3

(row) , Ru  , Pd  , Cd  
0     , 101 , 202 , 303 
1     , 111 , 212 , 313 

**Note** It is possible to change the default value used for missing columns using the [arrayfunc](#arrayfunc) function or in this case the [isopy ``add()``](#Isopy) function.

---
If the input is a mixture of isopy arrays and scalar values then the scalar values are used to compute the result of every column in the array. Generally, the scalar value(s) must therefore have a size of 1 or be of the same size as the number of rows in the array.

In [18]:
a = isopy.array(ru = [1, 11], pd=[2, 12], cd = [3,13])
a + 1 # 1 is added to each column

(row) , Ru , Pd , Cd 
0     , 2  , 3  , 4  
1     , 12 , 13 , 14 

In [19]:
a + [1, 10] # 1 is added to the first row, 10 is added to the second

(row) , Ru , Pd , Cd 
0     , 2  , 3  , 4  
1     , 21 , 22 , 23 
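The size rule follows from ordinary numpy broadcasting applied to one column at a time. The sketch below uses a plain dictionary of numpy arrays as a stand-in for the isopy array; whether isopy reports a size mismatch with the same exception is not confirmed here.

```python
import numpy as np

columns = {"ru": np.array([1, 11]), "pd": np.array([2, 12]), "cd": np.array([3, 13])}

# A scalar is broadcast to every row of every column.
plus_one = {key: column + 1 for key, column in columns.items()}

# A sequence with one value per row is applied row by row to each column.
plus_rows = {key: column + np.array([1, 10]) for key, column in columns.items()}

# A sequence whose length is neither 1 nor the number of rows cannot be broadcast.
try:
    columns["ru"] + np.array([1, 10, 100])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(plus_one["ru"], plus_rows["ru"], mismatch_raised)
```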

---
Dictionaries behave like an isopy array when used in combination with isopy arrays. However, only columns in the isopy array will be included in the output array. This makes dictionaries useful for storing reference values and standard data. Most of the reference values included with isopy are dictionaries for this reason.

The array function implementation will automatically convert a python dictionary to an ``IsopyDict``, so the keys in the dictionary do not have to be key strings.

In [31]:
a = isopy.array(ru = [1, 11], pd=[2, 12], cd = [3,13])
d = dict(ru = 100, rh=150, pd=200, ag=250, cd=300)
a + d

(row) , Ru  , Pd  , Cd  
0     , 101 , 202 , 303 
1     , 111 , 212 , 313 
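Why the ``rh`` and ``ag`` entries leave no trace in the output can be seen in a small sketch that uses a plain dictionary of numpy arrays in place of the isopy array. It illustrates the lookup direction only and skips the key-string conversion that isopy performs.

```python
import numpy as np

columns = {"ru": np.array([1, 11]), "pd": np.array([2, 12]), "cd": np.array([3, 13])}
reference = dict(ru=100, rh=150, pd=200, ag=250, cd=300)

# Iterate over the array's columns only; 'rh' and 'ag' are never visited,
# so they do not appear in the result.
result = {key: column + reference[key] for key, column in columns.items()}

for key, column in result.items():
    print(key, column)
```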

If the dictionary is a ``ScalarDict`` and a ratio key string is not present in the dictionary, the value is calculated automatically, provided that both the numerator and denominator key strings are present.

In [35]:
a = isopy.array(ru = [1, 11], pd=[2, 12], cd = [3,13]).ratio('pd')
d = isopy.ScalarDict(ru = 100, rh=150, pd=200, ag=250, cd=300)
a + d

(row) , Ru/Pd  , Cd/Pd  
0     , 1      , 3      
1     , 1.4167 , 2.5833 
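The ratio fallback can be sketched with an ordinary dictionary. ``lookup`` is a hypothetical helper written for this illustration and says nothing about how ``ScalarDict`` is implemented internally.

```python
def lookup(values, key):
    """Return values[key], deriving 'numerator/denominator' keys if needed.

    Hypothetical helper for illustration; not the isopy implementation.
    """
    if key in values:
        return values[key]
    if "/" in key:
        numerator, denominator = key.split("/", 1)
        if numerator in values and denominator in values:
            return values[numerator] / values[denominator]
    raise KeyError(key)

reference = dict(ru=100, rh=150, pd=200, ag=250, cd=300)

ru_pd = lookup(reference, "ru/pd")  # derived as 100 / 200
cd_pd = lookup(reference, "cd/pd")  # derived as 300 / 200
print(ru_pd, cd_pd)  # 0.5 1.5
```

Adding 0.5 to the Ru/Pd column (0.5 and 0.9167) and 1.5 to the Cd/Pd column (1.5 and 1.0833) reproduces the values in the output above.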

### Optional arguments
Many array functions have optional arguments and most of these are compatible with the isopy implementation of array functions **\***. Optional arguments are defined as arguments that have a default value. Optional arguments that are isopy arrays pass only the value of the column that is being operated on. All other optional arguments are passed to every column operation. Isopy arrays that are given as optional arguments function like a dictionary and therefore do not alter the columns of the result.

For example, the optional argument ``where`` is found in many array functions and restricts the calculation to certain indexes. Passing a simple boolean array means this array will be applied to every column.

In [4]:
a = isopy.array(ru = [1, 11], pd=[2, 12], cd = [3,13])
np.sum(a, where = [True, False]) #Only sums values in the first row

(row) , Ru , Pd , Cd 
None  , 1  , 2  , 3  

Passing a boolean isopy array means we can specify ``where`` separately for each column. Columns in the ``where`` array that are not in the input will not affect the output.

In [5]:
w = isopy.array(ru = [True, False], pd=[False, True], cd = [True,True], te = [True, True], dtype = bool)
np.sum(a, where=w) #Te column is not included in the result as it is an optional argument

(row) , Ru , Pd , Cd 
None  , 1  , 12 , 16 
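A rough sketch of the per-column behaviour, using plain dictionaries of numpy arrays as stand-ins for the isopy arrays. It assumes a numpy version in which ``np.sum`` accepts the ``where`` keyword (1.17 or later); the names ``columns`` and ``masks`` are illustrative.

```python
import numpy as np

columns = {"ru": np.array([1, 11]), "pd": np.array([2, 12]), "cd": np.array([3, 13])}
masks = {
    "ru": np.array([True, False]),
    "pd": np.array([False, True]),
    "cd": np.array([True, True]),
    "te": np.array([True, True]),  # no matching data column, so never used
}

# Each column is summed with its own mask. The loop runs over the data
# columns, so the extra 'te' mask does not add a column to the result.
sums = {key: np.sum(column, where=masks[key]) for key, column in columns.items()}
print(sums)
```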

**\*** Optional arguments known not to work, or to have undefined behavior, with the isopy array function implementation are ``axes``, ``subok``, ``signature``, ``extobj``, ``order`` and ``casting``.

## Array Functions
### Isopy
There are two categories of isopy array functions. There are the general array functions that are included under the ``isopy`` name space and the specialized functions from the isopy toolbox that are found under the ``isopy.tb`` name space. The general array functions are introduced below and a more detailed description can be found [here](https://isopy.readthedocs.io/en/latest/refpages/array_functions.html). The toolbox functions are described [here](https://isopy.readthedocs.io/en/latest/refpages/toolbox.html).

Isopy arrays support the ``+``, ``-``, ``*``, ``/`` and ``**`` operators. ``np.nan`` is used to represent the value(s) of absent columns.

In [8]:
a1 = isopy.array(ru = [1, 11], pd = [2, 12], cd = [3,13])
a2 = isopy.array(ru = 1, rh = 1.5, pd = 2, ag = 2.5, cd = 3)
a1 + a2

(row) , Ru , Pd , Cd , Rh  , Ag  
0     , 2  , 4  , 6  , nan , nan 
1     , 12 , 14 , 16 , nan , nan 

In [9]:
a1 / a2

(row) , Ru , Pd , Cd     , Rh  , Ag  
0     , 1  , 1  , 1      , nan , nan 
1     , 11 , 6  , 4.3333 , nan , nan 

You can also call the functions directly which allows you to specify the default value(s) used for missing columns.

In [11]:
isopy.add(a1, a2, default_value=0)

(row) , Ru , Pd , Cd , Rh  , Ag  
0     , 2  , 4  , 6  , 1.5 , 2.5 
1     , 12 , 14 , 16 , 1.5 , 2.5 

You can also specify the keys on which the operation should be performed. You can even specify columns that do not occur in any of the inputs.

In [15]:
isopy.divide(a1, a2, keys=['pd', 'ag', 'cd', 'te']) #The result will contain only the columns specified

(row) , Pd , Ag  , Cd     , Te  
0     , 1  , nan , 1      , nan 
1     , 6  , nan , 4.3333 , nan 

---
You can join together multiple arrays using the ``concatenate`` function. By default the rows are appended.

In [16]:
a1 = isopy.array(ru = [1, 11], pd = [2, 12], cd = [3,13])
a2 = isopy.array(ru = 1, rh = 1.5, pd = 2, ag = 2.5, cd = 3)
isopy.concatenate(a1, a2)

(row) , Ru , Pd , Cd , Rh  , Ag  
0     , 1  , 2  , 3  , nan , nan 
1     , 11 , 12 , 13 , nan , nan 
2     , 1  , 2  , 3  , 1.5 , 2.5 

You can specify the value(s) used for missing columns.

In [17]:
isopy.concatenate(a1, a2, default_value=0)

(row) , Ru , Pd , Cd , Rh  , Ag  
0     , 1  , 2  , 3  , 0   , 0   
1     , 11 , 12 , 13 , 0   , 0   
2     , 1  , 2  , 3  , 1.5 , 2.5 

If you wish to append columns to an array, pass ``axis=1``.

In [20]:
a3 = isopy.array(rh=[1.5, 11.5], ag=[2.5, 12.5])
isopy.concatenate(a1, a3, axis=1)

(row) , Ru , Pd , Cd , Rh   , Ag   
0     , 1  , 2  , 3  , 1.5  , 2.5  
1     , 11 , 12 , 13 , 11.5 , 12.5 
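The row-wise case can be sketched with dictionaries of numpy arrays. ``concat_rows`` and its ``default`` parameter are hypothetical names for this illustration and do not reflect the actual isopy implementation.

```python
import numpy as np

def concat_rows(*tables, default=np.nan):
    """Stack column mappings row-wise, filling absent columns with `default`.

    Hypothetical helper for illustration; not part of isopy.
    """
    # Collect every key once, in order of first appearance.
    keys = []
    for table in tables:
        for key in table:
            if key not in keys:
                keys.append(key)

    result = {}
    for key in keys:
        parts = []
        for table in tables:
            n_rows = len(next(iter(table.values())))
            parts.append(table.get(key, np.full(n_rows, default, dtype=float)))
        result[key] = np.concatenate(parts)
    return result

a1 = {"ru": np.array([1.0, 11.0]), "pd": np.array([2.0, 12.0]), "cd": np.array([3.0, 13.0])}
a2 = {"ru": np.array([1.0]), "rh": np.array([1.5]), "pd": np.array([2.0]),
      "ag": np.array([2.5]), "cd": np.array([3.0])}

stacked = concat_rows(a1, a2)
for key, column in stacked.items():
    print(key, column)
```

In this dictionary picture, appending columns (the ``axis=1`` case) amounts to merging the mappings, which only makes sense when all inputs have the same number of rows.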

---
Isopy comes with array functions for calculating the standard deviation, standard error and the median absolute deviation. All these functions assume 1 degree of freedom.

In [11]:
a = isopy.random(100, [(0, 0.1), (1, 1), (10, 10)], ['ru', 'pd', 'cd'])
isopy.sd(a) # Standard deviation

(row) , Ru      , Pd    , Cd     
None  , 0.10624 , 1.022 , 9.3383 

In [6]:
isopy.se(a) # Standard error

(row) , Ru       , Pd      , Cd     
None  , 0.010171 , 0.10453 , 0.9686 

In [7]:
isopy.mad(a) # Median absolute deviation

(row) , Ru       , Pd      , Cd     
None  , 0.093511 , 0.98411 , 10.553 

These functions work just like numpy functions and will thus work on any array-like input, not just isopy arrays.

In [8]:
isopy.sd([i for i in range(100)])

29.011491975882016
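The value above can be reproduced with plain numpy by setting ``ddof=1``, which is what "1 degree of freedom" corresponds to for the standard deviation. The standard error line assumes the usual definition, the standard deviation divided by the square root of the number of values; that isopy computes it exactly this way is an assumption.

```python
import numpy as np

values = np.arange(100)

# Sample standard deviation: ddof=1 divides by (n - 1) instead of n.
sd = np.std(values, ddof=1)

# Standard error of the mean, assuming the usual definition sd / sqrt(n).
se = sd / np.sqrt(values.size)

print(sd)
print(se)
```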

Versions that ignore, rather than propagate, ``nan`` values are named ``nansd()``, ``nanse()`` and ``nanmad()``. 

You can specify the multiplier of the returned value using the ``level`` argument. Values below 1 are interpreted as a confidence level given as a fraction, e.g. ``0.95`` for 95 %.

In [12]:
isopy.sd(a, level=2) # 2 standard deviations or ~ 95 % confidence level

(row) , Ru      , Pd    , Cd     
None  , 0.21248 , 2.044 , 18.677 

In [13]:
isopy.sd(a, level=0.95) # 95 % confidence level or ~ 1.96 standard deviations

(row) , Ru      , Pd     , Cd     
None  , 0.20823 , 2.0031 , 18.303 
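The ratio between the two outputs above (0.20823 / 0.10624 ≈ 1.96) is consistent with a two-sided interval on a normal distribution. The sketch below makes that conversion explicit using only the Python standard library. Both helper names are hypothetical, and the assumption that isopy uses a normal distribution is inferred from the outputs shown, not from its source.

```python
from statistics import NormalDist

def level_to_multiplier(level):
    """Translate a `level` value into a standard-deviation multiplier.

    Hypothetical helper; assumes a two-sided interval on a normal distribution.
    """
    if level < 1:
        # e.g. 0.95 -> the 97.5th percentile of the standard normal
        return NormalDist().inv_cdf(0.5 + level / 2)
    return float(level)

def multiplier_to_coverage(multiplier):
    """Fraction of a normal distribution within +/- multiplier standard deviations."""
    return 2 * NormalDist().cdf(multiplier) - 1

print(round(level_to_multiplier(0.95), 4))  # 1.96
print(round(multiplier_to_coverage(2), 4))  # 0.9545
```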

Versions with predefined levels of 2, 3, 4, 5 and 0.95 exist for each of these functions. For ``sd`` these are called ``sd2``, ``sd3``, ``sd4``, ``sd5``, and ``sd95``. The naming scheme is the same for the other functions.

In [32]:
isopy.sd2(a)

(row) , Ru      , Pd     , Cd     
None  , 0.21837 , 2.0544 , 21.229 

---
``keymax`` and ``keymin`` can be used to find the column with the maximum or minimum value in an array. By default this is based on the median value of each column, but you can pass a different function for evaluating the columns as the second argument.

In [37]:
a = isopy.array(ru = [1, 11, 111], pd = [2, 12, 22], cd = [3, 13, 23])
isopy.keymax(a) # Based on the median value of each column

ElementKeyString('Cd')

In [36]:
isopy.keymax(a, np.mean) # Based on the mean value of each column

ElementKeyString('Ru')
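The selection logic can be sketched in a few lines with a dictionary of numpy arrays. ``key_of_max`` is a hypothetical helper for this illustration; it returns a plain string rather than an isopy key string.

```python
import numpy as np

def key_of_max(columns, evaluate=np.median):
    """Return the key whose column gives the largest value of `evaluate`.

    Hypothetical helper for illustration; not the isopy implementation.
    """
    return max(columns, key=lambda key: evaluate(columns[key]))

columns = {
    "ru": np.array([1, 11, 111]),
    "pd": np.array([2, 12, 22]),
    "cd": np.array([3, 13, 23]),
}

print(key_of_max(columns))           # cd  (medians are 11, 12, 13)
print(key_of_max(columns, np.mean))  # ru  (means are 41, 12, 13)
```

The single large value in the ``ru`` column pulls its mean above the others but leaves its median unchanged, which is why the two calls disagree.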

### arrayfunc
The ``arrayfunc`` function allows you to run array functions that are not natively supported by isopy on isopy arrays. For example, we can use this function to call the ``scipy.stats`` function ``sem`` to calculate the standard error of an isopy array.

In [9]:
a = isopy.random(100, [(0, 0.1), (1, 1), (10, 10)], ['ru', 'pd', 'cd'])
isopy.arrayfunc(stats.sem, a)

(row) , Ru        , Pd      , Cd      
None  , 0.0086625 , 0.10031 , 0.99548 

You can specify the default value for missing columns and the keys on which the function should be called.

In [10]:
a1 = isopy.array(ru = [1, 11], pd = [2, 12], cd = [3,13])
a2 = isopy.array(ru = 1, rh = 1.5, pd = 2, ag = 2.5, cd = 3)
isopy.arrayfunc(np.add, a1, a2, default_value = 0)

(row) , Ru , Pd , Cd , Rh  , Ag  
0     , 2  , 4  , 6  , 1.5 , 2.5 
1     , 12 , 14 , 16 , 1.5 , 2.5 

In [11]:
a = isopy.random(100, [(0, 0.1), (1, 1), (10, 10)], ['ru', 'pd', 'cd'])
isopy.arrayfunc(stats.sem, a, keys=['pd', 'ag', 'cd'])

(row) , Pd      , Ag  , Cd     
None  , 0.10156 , nan , 0.8771 
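Conceptually, this kind of wrapper calls the given function once per column and collects the results by key. The sketch below shows that idea with a dictionary of numpy arrays. ``apply_per_column`` is a hypothetical stand-in, not the isopy implementation, and ``standard_error`` is a numpy-only substitute for ``stats.sem`` with its default of ``ddof=1``.

```python
import numpy as np

def apply_per_column(func, columns, keys=None, default=np.nan):
    """Call `func` on each selected column and collect the results by key.

    Hypothetical stand-in for illustration; not the isopy implementation.
    """
    if keys is None:
        keys = list(columns)
    n_rows = len(next(iter(columns.values())))
    filler = np.full(n_rows, default, dtype=float)
    return {key: func(columns.get(key, filler)) for key in keys}

def standard_error(values):
    # Sample standard deviation (ddof=1) divided by sqrt(n).
    return np.std(values, ddof=1) / np.sqrt(len(values))

columns = {"pd": np.array([1.0, 2.0, 3.0, 4.0]),
           "cd": np.array([10.0, 20.0, 30.0, 40.0])}

result = apply_per_column(standard_error, columns, keys=["pd", "ag", "cd"])
print(result)
```

The requested ``ag`` key has no data, so the function is evaluated on a column of default values and returns ``nan``, mirroring the ``Ag`` column in the output above.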

### Numpy
Isopy arrays support a range of numpy array functions. You can use ``allowed_numpy_functions`` to return a list of the allowed functions.

In [3]:
from IPython.display import Markdown
Markdown(isopy.allowed_numpy_functions('markdown')) # Render the list with hyperlinks

[concatenate](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html), [sin](https://numpy.org/doc/stable/reference/generated/numpy.sin.html), [cos](https://numpy.org/doc/stable/reference/generated/numpy.cos.html), [tan](https://numpy.org/doc/stable/reference/generated/numpy.tan.html), [arcsin](https://numpy.org/doc/stable/reference/generated/numpy.arcsin.html), [arccos](https://numpy.org/doc/stable/reference/generated/numpy.arccos.html), [arctan](https://numpy.org/doc/stable/reference/generated/numpy.arctan.html), [degrees](https://numpy.org/doc/stable/reference/generated/numpy.degrees.html), [isnan](https://numpy.org/doc/stable/reference/generated/numpy.isnan.html), [radians](https://numpy.org/doc/stable/reference/generated/numpy.radians.html), [deg2rad](https://numpy.org/doc/stable/reference/generated/numpy.deg2rad.html), [rad2deg](https://numpy.org/doc/stable/reference/generated/numpy.rad2deg.html), [sinh](https://numpy.org/doc/stable/reference/generated/numpy.sinh.html), [cosh](https://numpy.org/doc/stable/reference/generated/numpy.cosh.html), [tanh](https://numpy.org/doc/stable/reference/generated/numpy.tanh.html), [arcsinh](https://numpy.org/doc/stable/reference/generated/numpy.arcsinh.html), [arccosh](https://numpy.org/doc/stable/reference/generated/numpy.arccosh.html), [arctanh](https://numpy.org/doc/stable/reference/generated/numpy.arctanh.html), [rint](https://numpy.org/doc/stable/reference/generated/numpy.rint.html), [floor](https://numpy.org/doc/stable/reference/generated/numpy.floor.html), [ceil](https://numpy.org/doc/stable/reference/generated/numpy.ceil.html), [trunc](https://numpy.org/doc/stable/reference/generated/numpy.trunc.html), [exp](https://numpy.org/doc/stable/reference/generated/numpy.exp.html), [expm1](https://numpy.org/doc/stable/reference/generated/numpy.expm1.html), [exp2](https://numpy.org/doc/stable/reference/generated/numpy.exp2.html), [log](https://numpy.org/doc/stable/reference/generated/numpy.log.html), 
[log10](https://numpy.org/doc/stable/reference/generated/numpy.log10.html), [log2](https://numpy.org/doc/stable/reference/generated/numpy.log2.html), [log1p](https://numpy.org/doc/stable/reference/generated/numpy.log1p.html), [reciprocal](https://numpy.org/doc/stable/reference/generated/numpy.reciprocal.html), [positive](https://numpy.org/doc/stable/reference/generated/numpy.positive.html), [negative](https://numpy.org/doc/stable/reference/generated/numpy.negative.html), [sqrt](https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html), [cbrt](https://numpy.org/doc/stable/reference/generated/numpy.cbrt.html), [square](https://numpy.org/doc/stable/reference/generated/numpy.square.html), [fabs](https://numpy.org/doc/stable/reference/generated/numpy.fabs.html), [sign](https://numpy.org/doc/stable/reference/generated/numpy.sign.html), [absolute](https://numpy.org/doc/stable/reference/generated/numpy.absolute.html), [prod](https://numpy.org/doc/stable/reference/generated/numpy.prod.html), [sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html), [nanprod](https://numpy.org/doc/stable/reference/generated/numpy.nanprod.html), [nansum](https://numpy.org/doc/stable/reference/generated/numpy.nansum.html), [cumprod](https://numpy.org/doc/stable/reference/generated/numpy.cumprod.html), [cumsum](https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html), [nancumprod](https://numpy.org/doc/stable/reference/generated/numpy.nancumprod.html), [nancumsum](https://numpy.org/doc/stable/reference/generated/numpy.nancumsum.html), [amin](https://numpy.org/doc/stable/reference/generated/numpy.amin.html), [amax](https://numpy.org/doc/stable/reference/generated/numpy.amax.html), [nanmin](https://numpy.org/doc/stable/reference/generated/numpy.nanmin.html), [nanmax](https://numpy.org/doc/stable/reference/generated/numpy.nanmax.html), [ptp](https://numpy.org/doc/stable/reference/generated/numpy.ptp.html), 
[median](https://numpy.org/doc/stable/reference/generated/numpy.median.html), [average](https://numpy.org/doc/stable/reference/generated/numpy.average.html), [mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html), [std](https://numpy.org/doc/stable/reference/generated/numpy.std.html), [var](https://numpy.org/doc/stable/reference/generated/numpy.var.html), [nanmedian](https://numpy.org/doc/stable/reference/generated/numpy.nanmedian.html), [nanmean](https://numpy.org/doc/stable/reference/generated/numpy.nanmean.html), [nanstd](https://numpy.org/doc/stable/reference/generated/numpy.nanstd.html), [nanvar](https://numpy.org/doc/stable/reference/generated/numpy.nanvar.html), [add](https://numpy.org/doc/stable/reference/generated/numpy.add.html), [subtract](https://numpy.org/doc/stable/reference/generated/numpy.subtract.html), [true_divide](https://numpy.org/doc/stable/reference/generated/numpy.true_divide.html), [multiply](https://numpy.org/doc/stable/reference/generated/numpy.multiply.html), [power](https://numpy.org/doc/stable/reference/generated/numpy.power.html)

Attempting to use a function not included in this list will give a warning and in most instances raise an exception. It is still possible to use functions that are not supported by isopy by using the [``arrayfunc`` function](#arrayfunc).

### Scipy
Scipy functions are unfortunately not supported by isopy arrays at the moment. To run scipy functions on isopy arrays use the [``arrayfunc`` function](#arrayfunc).