# Tutorial Part 2: Arrays, Dictionaries and Reference Values

**Table of Contents**

* [Isopy Arrays](#Isopy-Arrays)
    * [Creating Arrays](#Creating-Arrays)
    * [Array Attributes](#Array-Attributes)
    * [Array Methods](#Array-Methods)
    * [Array Functions](#Array-Functions)
* [Isopy Dictionaries](#Isopy-Dictionaries)
* [Reference Values](#Reference-Values)

In [1]:
import isopy
import numpy as np
import pyperclip # library for interacting with the clipboard

## Isopy Arrays
An isopy array can be seen as a table of data with a certain number of rows and columns. Each column has a key in the form of an isopy key string. These arrays allow you to easily manipulate data while keeping track of what the values represent. Technically, isopy arrays are a custom view of a structured numpy array, which means that they inherit much of the functionality of a numpy array.
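To make the "structured numpy array" part concrete, here is a minimal sketch that uses plain numpy only and makes no isopy calls. The field names and values are illustrative, and how isopy wraps such an array internally is not shown.

```python
import numpy as np

# A structured array stores one named field per column.
# Field names and values are illustrative only.
table = np.array(
    [(10.0, 20.0, 30.0), (11.0, 21.0, 31.0)],
    dtype=[("Ru", "f8"), ("Pd", "f8"), ("Cd", "f8")],
)

column_names = table.dtype.names  # names of the fields, in order
pd_column = table["Pd"]           # one column, as a regular numpy array
first_row = table[0]              # one row, holding all three fields
```

Indexing by field name returns a column and indexing by integer returns a row, which is the table-like behaviour described above.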

### Creating Arrays
You can create arrays directly using the different array flavours, e.g. ``IsotopeArray``, or by using the ``array`` and ``asarray`` functions. Isopy arrays can be created from a range of different data, described below.

---
When the input is a list/tuple or a numpy array, we have to pass along the keys for each column in the input.

In [50]:
data = [10, 20, 30] # Produces a 0-dimensional array
isopy.array(data, ['ru', 'pd', 'cd']) 

(row) , Ru , Pd , Cd 
None  , 10 , 20 , 30 

In [44]:
data = np.array([[10, 20, 30], [11, 21, 31]]) #Produces a 1-dimensional array
isopy.array(data, ['ru', 'pd', 'cd']) 

(row) , Ru , Pd , Cd 
0     , 10 , 20 , 30 
1     , 11 , 21 , 31 

**Note** The data type of each column defaults to ``numpy.float64`` for values that do not have a numpy dtype. If a value cannot be represented as a float the default data type inferred by numpy will be used. In the examples above the first array created will use the default data type while the second array will inherit the data type of the input, ``np.int32`` in this case.

---
Using the ``dtype`` keyword you can specify the data type for columns in the array. Any data type accepted by numpy is valid. You can either pass a single data type or a tuple of data types. The latter will use the first data type in the tuple that is valid for all the data in the column.

In [20]:
isopy.array([10, 20, 30], ['ru', 'pd', 'cd']).dtype # f8 stands for np.float64

dtype([('Ru', '<f8'), ('Pd', '<f8'), ('Cd', '<f8')])

In [21]:
isopy.array([10, 20, 30], ['ru', 'pd', 'cd'], dtype=np.int32).dtype # i4 stands for np.int32

dtype([('Ru', '<i4'), ('Pd', '<i4'), ('Cd', '<i4')])

In [30]:
isopy.array(['ten', 20, 30], ['ru', 'pd', 'cd'], dtype=(np.int32, str)).dtype # U stands for unicode string and the next number is the maximum length

dtype([('Ru', '<U3'), ('Pd', '<i4'), ('Cd', '<i4')])
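The "first valid data type" rule can be sketched in plain numpy as below. The helper ``first_valid_dtype`` is hypothetical and written only for illustration; it is not part of isopy and may differ from how isopy performs the conversion.

```python
import numpy as np

def first_valid_dtype(values, candidates):
    """Return values converted with the first candidate dtype that works.

    Hypothetical helper for illustration; not part of isopy.
    """
    for candidate in candidates:
        try:
            return np.array(values, dtype=candidate)
        except (ValueError, TypeError):
            continue  # this dtype cannot represent the values, try the next
    raise ValueError("none of the candidate dtypes fit the values")

text_column = first_valid_dtype(["ten"], (np.int32, str))  # int32 fails, str is used
int_column = first_valid_dtype([20], (np.int32, str))      # int32 works straight away
```

This mirrors the example above, where only the column containing ``'ten'`` ends up with a string data type.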

To specify different data types for different columns, pass a list containing one data type per column.

In [24]:
isopy.array([10, 20, 30], ['ru', 'pd', 'cd'], dtype=[str, np.int32, np.float64]).dtype

dtype([('Ru', '<U2'), ('Pd', '<i4'), ('Cd', '<f8')])

---
Using the ``ndim`` keyword you can specify the number of dimensions of the array. Specify ``ndim`` as ``-1`` to return a 0-dimensional array if possible and a 1-dimensional array otherwise. The row number of 0-dimensional arrays appears as ``"None"`` in the ``repr()`` output of an array. You can also check the dimensionality of an array using the ``ndim`` attribute.

In [25]:
isopy.array([10, 20, 30], ['ru', 'pd', 'cd'], ndim=0) # Make 0-dimensional

(row) , Ru , Pd , Cd 
None  , 10 , 20 , 30 

In [26]:
isopy.array([10, 20, 30], ['ru', 'pd', 'cd'], ndim=1) # Make 1-dimensional

(row) , Ru , Pd , Cd 
0     , 10 , 20 , 30 

In [28]:
isopy.array([10, 20, 30], ['ru', 'pd', 'cd'], ndim=-1) # Make 0-dimensional if possible, otherwise 1-dimensional

(row) , Ru , Pd , Cd 
None  , 10 , 20 , 30 

In [27]:
isopy.array([[10, 20, 30], [11, 21, 31]], ['ru', 'pd', 'cd'], ndim = -1) 

(row) , Ru , Pd , Cd 
0     , 10 , 20 , 30 
1     , 11 , 21 , 31 

---
If the input is a dictionary or a structured numpy array the name of each column will be automatically inferred from the first argument.

In [46]:
data = dict(ru = [10, 11], pd= [20, 21], cd = [30, 31])
isopy.array(data)

(row) , Ru , Pd , Cd 
0     , 10 , 20 , 30 
1     , 11 , 21 , 31 

In [47]:
data = np.array([(10, 20, 30), (11, 21, 31)], dtype = [('ru', float), ('pd', float), ('cd', float)])
isopy.array(data)

(row) , Ru , Pd , Cd 
0     , 10 , 20 , 30 
1     , 11 , 21 , 31 

You can override the inferred column keys by passing keys during creation.

In [49]:
data = dict(ru = [10, 11], pd= [20, 21], cd = [30, 31])
isopy.array(data, ['101ru', '105pd', '111cd'])

(row) , 101Ru , 105Pd , 111Cd 
0     , 10    , 20    , 30    
1     , 11    , 21    , 31    

---
You can also create arrays from existing isopy arrays.

In [57]:
a = isopy.array([10, 20, 30], ['ru', 'pd', 'cd']) 
isopy.array(a)

(row) , Ru , Pd , Cd 
None  , 10 , 20 , 30 

The key difference between ``array`` and ``asarray`` is that if the first argument is an isopy array and no other arguments are given, ``asarray`` will return a reference to that array while ``array`` will return a copy.

In [2]:
a = isopy.array([10, 20, 30], ['ru', 'pd', 'cd']) 
isopy.array(a) is a, isopy.asarray(a) is a

(False, True)
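numpy's own ``np.array`` and ``np.asarray`` follow the same convention. The sketch below uses plain numpy only and shows why the distinction matters: changes made through a reference are visible in the original, while changes to a copy are not.

```python
import numpy as np

original = np.array([10.0, 20.0, 30.0])

copied = np.array(original)    # np.array copies by default
same = np.asarray(original)    # np.asarray returns the input when no conversion is needed

copied[0] = -1.0               # changing the copy leaves the original untouched
same[1] = -2.0                 # changing the reference changes the original
```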

#### Filled Arrays
You can create an array of uninitialised values, zeros or ones using the ``empty``, ``zeros`` and ``ones`` functions.

In [17]:
isopy.empty(None, ['ru', 'pd', 'cd']) #None, or -1, creates a 0-dimensional array

(row) , Ru , Pd          , Cd 
None  , 0  , 8.2859e-312 , -0 

In [19]:
isopy.zeros(1, ['ru', 'pd', 'cd'])

(row) , Ru , Pd , Cd 
0     , 0  , 0  , 0  

In [65]:
isopy.ones(2, ['ru', 'pd', 'cd'])

(row) , Ru , Pd , Cd 
0     , 1  , 1  , 1  
1     , 1  , 1  , 1  

To create an array filled with a specific value use the ``full`` function. The second argument is the fill value. This can either be a single value used for all rows in the column or a sequence of values of the same length as the number of rows.

In [6]:
isopy.full(2, np.nan, ['ru', 'pd', 'cd'])

(row) , Ru  , Pd  , Cd  
0     , nan , nan , nan 
1     , nan , nan , nan 

In [5]:
isopy.full(2, [1,2], ['ru', 'pd', 'cd'])

(row) , Ru , Pd , Cd 
0     , 1  , 1  , 1  
1     , 2  , 2  , 2  

---
If no keys are given and none can be inferred, a normal numpy array is returned.

In [3]:
isopy.ones(5) # same as np.ones(5)

array([1., 1., 1., 1., 1.])

---
#### Random Arrays
To create an array of random values use the ``random`` function. The second argument is either a single argument or a tuple of arguments that will be passed to the random generator. By default this function draws values from a normal distribution. The following example draws values from a normal distribution with a center of 1 and standard deviation of 0.1

In [12]:
isopy.random(10, (1, 0.1), ['ru', 'pd', 'cd'])

(row) , Ru      , Pd      , Cd      
0     , 0.96959 , 1.0281  , 0.79312 
1     , 0.95192 , 1.0449  , 1.0856  
2     , 1.0982  , 0.97543 , 0.94024 
3     , 1.0573  , 0.9973  , 0.8874  
4     , 0.94976 , 1.019   , 0.78627 
5     , 1.099   , 0.99962 , 1.0155  
6     , 1.1471  , 0.86524 , 1.0227  
7     , 1.0914  , 1.0253  , 0.93167 
8     , 1.0465  , 0.90448 , 0.94372 
9     , 0.91533 , 0.95526 , 0.99472 

You can specify different distributions for different columns by passing a list as the second argument.

In [13]:
isopy.random(10, [(1, 0.1), (0,1), (10, 1)], ['ru', 'pd', 'cd'])

(row) , Ru      , Pd       , Cd     
0     , 1.0187  , 2.4206   , 11.091 
1     , 1.1119  , -1.0512  , 9.2026 
2     , 0.92309 , 0.26226  , 10.231 
3     , 1.0747  , 0.31492  , 10.051 
4     , 0.89444 , 1.4324   , 10.175 
5     , 0.87109 , 0.68955  , 9.2957 
6     , 1.0984  , -0.70649 , 10.48  
7     , 1.2338  , 0.40023  , 10.142 
8     , 1.0538  , 1.0639   , 10.11  
9     , 0.9434  , -0.50517 , 10.73  

For examples of how to change the type of distribution that the values are drawn from, have a look at the reference documentation here **(Link missing)**
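The per-column draw shown above can be approximated with numpy's random generator. This is only a sketch of the idea, assuming a normal distribution for every column; it does not call isopy and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the draw is repeatable

# (center, standard deviation) for each column, taken from the example above.
parameters = {"Ru": (1, 0.1), "Pd": (0, 1), "Cd": (10, 1)}

nrows = 10
columns = {
    key: rng.normal(loc=center, scale=spread, size=nrows)
    for key, (center, spread) in parameters.items()
}
```

Each entry in ``columns`` holds ten values drawn from that column's own distribution.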

---
If no keys are given and none can be inferred, a normal numpy array is returned.

In [4]:
isopy.random(10)

array([-0.66929881, -0.66053013,  0.93081883, -1.07150801, -0.23177742,
       -0.93340998, -0.58119726, -0.45745434, -0.30148965,  1.02422904])

### Array Attributes
Since isopy arrays are a custom implementation of numpy arrays, they have all the attributes you would find in numpy arrays, e.g. ``.size``, ``.ndim``, ``.shape`` and ``.dtype``.

In [35]:
a = isopy.array(dict(ru = [10, 11], pd= [20, 21], cd = [30, 31]))
a.size, a.ndim, a.shape, a.dtype

(2, 1, (2,), dtype([('Ru', '<f8'), ('Pd', '<f8'), ('Cd', '<f8')]))

In addition to the numpy attributes, ``.nrows`` and ``.ncols`` are also available for isopy arrays. These return the number of rows and the number of columns in the array respectively.

In [36]:
a.nrows, a.ncols

(2, 3)

**Note** ``.size`` will return ``1`` for both 0-dimensional arrays and 1-dimensional arrays with 1 row. ``.nrows``, on the other hand, will return ``-1`` for 0-dimensional arrays.

In [5]:
a = isopy.array(dict(ru = 10, pd= 20, cd = 30))
a.size, a.nrows

(1, -1)
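The ambiguity of ``.size`` comes from numpy itself and can be seen with plain structured arrays. The sketch below uses numpy only; ``.nrows`` is an isopy attribute and has no numpy counterpart.

```python
import numpy as np

dtype = [("Ru", "f8"), ("Pd", "f8"), ("Cd", "f8")]

zero_dim = np.array((10.0, 20.0, 30.0), dtype=dtype)   # a single record, no row axis
one_row = np.array([(10.0, 20.0, 30.0)], dtype=dtype)  # one row along a row axis

# size is 1 in both cases, so ndim or shape is needed to tell them apart.
summary = [(a.size, a.ndim, a.shape) for a in (zero_dim, one_row)]
```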

The column keys are available through the ``.keys`` attribute.

In [38]:
a.keys # a.keys() also works fine

ElementKeyList('Ru', 'Pd', 'Cd')

### Array Methods
While isopy arrays also contain all the methods found in numpy arrays, many of these are not relevant to isopy arrays and may therefore not work as expected, if at all. See the reference documentation for a list of all methods that have been implemented for isopy arrays. Any methods not listed there should be used with **caution** as the behavior is undefined.

---
Isopy arrays have a number of methods that mimic those found in dictionaries. In addition to the ``.keys`` attribute, that can be used as a method, arrays also have ``values()``, ``items()`` and ``get()`` methods.

In [6]:
a = isopy.array(dict(ru = [10, 11], pd= [20, 21], cd = [30, 31]))
a.values()

(array([10., 11.]), array([20., 21.]), array([30., 31.]))

In [7]:
a.items()

((ElementKeyString('Ru'), array([10., 11.])),
 (ElementKeyString('Pd'), array([20., 21.])),
 (ElementKeyString('Cd'), array([30., 31.])))

**Note** ``values()`` and ``items()`` both return a tuple (unlike dictionaries, where they return iterators).

In [45]:
a.get('ru')

array([10., 11.])

If a column with the specified key is not present in the array, a default value is returned with the same shape as a column in the array.

In [46]:
a.get('ag') # If not specified the default value is np.nan

array([nan, nan])

In [48]:
a.get('ag', 40) # Second argument is the default value

array([40, 40])

In [50]:
a.get('ag', [40, 41]) # A sequence the same shape as a valid column is also accepted

array([40, 41])
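The default-value behaviour shown above can be sketched with a plain dictionary of numpy columns. The helper ``get_column`` is hypothetical and is not part of isopy; it only illustrates how a default is expanded to the shape of a column.

```python
import numpy as np

def get_column(columns, key, default=np.nan):
    """Return columns[key], or the default expanded to the column shape.

    Hypothetical helper for illustration; not part of isopy.
    """
    if key in columns:
        return columns[key]
    shape = next(iter(columns.values())).shape  # shape of an existing column
    return np.broadcast_to(np.asarray(default), shape).copy()

columns = {"Ru": np.array([10.0, 11.0]), "Pd": np.array([20.0, 21.0])}

missing_nan = get_column(columns, "Ag")                 # nan in every row
missing_scalar = get_column(columns, "Ag", 40)          # 40 in every row
missing_sequence = get_column(columns, "Ag", [40, 41])  # one value per row
```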

---
The ``copy()`` method can be used to return a copy of the array.

In [3]:
a = isopy.array(dict(ru101 = [10, 11], pd105= [20, 21], cd111 = [30, 31]))
b = a.copy()
b is a, a == b

(False, True)

You can copy only those columns that meet certain criteria by passing filter keywords. See the ``filter()`` method of the different key lists for the available filter keywords.

In [4]:
a = isopy.array(dict(ru101 = [10, 11], pd105= [20, 21], cd111 = [30, 31]))
a.copy(mass_number_gt = 104) # Return only the columns that have a mass number greater than 104

(row) , 105Pd , 111Cd 
0     , 20    , 30    
1     , 21    , 31    

---
You can create a ratio from data within an array using the ``ratio()`` method.

In [4]:
c = a.ratio('105Pd'); c

(row) , 101Ru/105Pd , 111Cd/105Pd 
0     , 0.5         , 1.5         
1     , 0.52381     , 1.4762      

Ratio arrays have a ``deratio()`` method for flattening a ratio array. This requires that all column keys in the array have a common denominator.

In [5]:
c.deratio()

(row) , 101Ru   , 111Cd  , 105Pd 
0     , 0.5     , 1.5    , 1     
1     , 0.52381 , 1.4762 , 1     

In [7]:
c.deratio([20, 21]) #You can specify the value(s) for the denominator

(row) , 101Ru , 111Cd , 105Pd 
0     , 10    , 30    , 20    
1     , 11    , 31    , 21    
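The numbers in the ratio and deratio examples above follow from simple column arithmetic. The sketch below reproduces that arithmetic with plain numpy and does not call isopy; the variable names are illustrative.

```python
import numpy as np

ru101 = np.array([10.0, 11.0])
pd105 = np.array([20.0, 21.0])
cd111 = np.array([30.0, 31.0])

# Ratio: divide every other column by the denominator column.
ru_over_pd = ru101 / pd105  # approximately [0.5, 0.52381]
cd_over_pd = cd111 / pd105  # approximately [1.5, 1.4762]

# Deratio with explicit denominator values: multiply the ratios back.
denominator = np.array([20.0, 21.0])
restored_ru = ru_over_pd * denominator
restored_cd = cd_over_pd * denominator
```

With a denominator of 1 the numerator columns simply keep their ratio values, which matches the first ``deratio()`` output above.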

---
The ``normalise()`` method allows you to normalise the data in the array to a certain value. Calling the method without any arguments will normalise all the values so that the sum of each row is ``1``.

In [10]:
a = isopy.array(dict(ru = [10, 11], pd= [20, 21], cd = [30, 31]))
a.normalise()

(row) , Ru      , Pd      , Cd      
0     , 0.16667 , 0.33333 , 0.5     
1     , 0.1746  , 0.33333 , 0.49206 

The optional arguments are 1) the value you wish to normalise to and 2) the key(s) of the columns that the normalisation should be based on.

In [11]:
a.normalise(100, 'pd') # Normalise values in column Pd to 100.

(row) , Ru     , Pd  , Cd     
0     , 50     , 100 , 150    
1     , 52.381 , 100 , 147.62 

In [19]:
a.normalise([100, 1000], ['ru', 'cd']) # The sum of the specified keys will be equal to the values given

(row) , Ru    , Pd  , Cd    
0     , 25    , 50  , 75    
1     , 261.9 , 500 , 738.1 

In [16]:
a.normalise([100, 1000], ['ru', 'cd']).normalise([20, 21], 'pd')

(row) , Ru , Pd , Cd 
0     , 10 , 20 , 30 
1     , 11 , 21 , 31 
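The outputs above are consistent with scaling each row by ``value / sum(selected columns)``. The sketch below reproduces that arithmetic with plain numpy. ``normalise_rows`` is a hypothetical helper written for this illustration and is not isopy's implementation.

```python
import numpy as np

def normalise_rows(columns, value=1.0, keys=None):
    """Scale each row so the sum of the chosen columns equals value.

    Hypothetical helper for illustration; not part of isopy.
    """
    keys = list(columns) if keys is None else list(keys)
    row_sum = sum(columns[key] for key in keys)           # per-row sum of chosen columns
    factor = np.asarray(value, dtype=float) / row_sum     # one scale factor per row
    return {key: column * factor for key, column in columns.items()}

columns = {
    "Ru": np.array([10.0, 11.0]),
    "Pd": np.array([20.0, 21.0]),
    "Cd": np.array([30.0, 31.0]),
}

unit_sum = normalise_rows(columns)                            # each row sums to 1
pd_to_100 = normalise_rows(columns, 100, ["Pd"])              # Pd becomes 100
ru_cd_sum = normalise_rows(columns, [100, 1000], ["Ru", "Cd"])  # Ru + Cd per row
```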

---
You can convert an isopy array into a text string using the ``to_text()`` method.

In [25]:
a = isopy.array(dict(ru = [10, 11], pd= [20, 21], cd = [30, 31]))
a.to_text()

'Ru   , Pd   , Cd   \n10.0 , 20.0 , 30.0 \n11.0 , 21.0 , 31.0 '

In [24]:
print(a.to_text()) # Same as print(a)

Ru   , Pd   , Cd   
10.0 , 20.0 , 30.0 
11.0 , 21.0 , 31.0 


There are a number of optional arguments you can specify to change things like the number format and the delimiter of the string.

In [28]:
a.to_text(delimiter = '\t', include_row=True) #includes the row number and uses a tab delimiter

'(row) \tRu   \tPd   \tCd   \n0     \t10.0 \t20.0 \t30.0 \n1     \t11.0 \t21.0 \t31.0 '

The method ``to_clipboard()`` takes the same arguments as ``to_text()`` but copies the string to the clipboard.

In [30]:
a.to_clipboard() # It also returns the copied string

'Ru   , Pd   , Cd   \n10.0 , 20.0 , 30.0 \n11.0 , 21.0 , 31.0 '

In [32]:
pyperclip.paste() # Paste whatever is currently in the clipboard

'Ru   , Pd   , Cd   \n10.0 , 20.0 , 30.0 \n11.0 , 21.0 , 31.0 '

---
There are a number of methods for converting isopy arrays into other python objects.

In [33]:
a = isopy.array(dict(ru = [10, 11], pd= [20, 21], cd = [30, 31]))
a.to_list() # Convert array to a list

[[10.0, 20.0, 30.0], [11.0, 21.0, 31.0]]

In [34]:
a.to_dict() # Converts array into a dictionary

{'Ru': [10.0, 11.0], 'Pd': [20.0, 21.0], 'Cd': [30.0, 31.0]}

In [36]:
a.to_ndarray() # Converts array into a structured numpy array

array([(10., 20., 30.), (11., 21., 31.)],
      dtype=[('Ru', '<f8'), ('Pd', '<f8'), ('Cd', '<f8')])

The methods ``from_csv()``/``to_csv()`` and ``from_xlsx()``/``to_xlsx()`` allow you to import arrays from, and export arrays to, csv and excel files. See [Introduction Part 4: Importing and exporting data]() for more information on importing and exporting arrays.

### Array Functions
The isopy package comes with several custom-made array functions, and isopy arrays support a large number of the numpy array functions. An array function is a function that performs an action on one or more arrays, e.g. adding arrays together or finding the mean of the values in an array. See [Introduction Part 3: Working with arrays]() **LINK MISSING** for a comprehensive explanation of array functions with lots of examples.

A few quick examples are:

In [51]:
a1 = isopy.array(dict(ru = 1, pd= 2, cd = 3))
a2 = isopy.array(dict(ru = 10, pd= 20, ag = 25))
a1 + a2 # Columns not present in all arrays are assigned a value of np.nan

(row) , Ru , Pd , Cd  , Ag  
None  , 11 , 22 , nan , nan 
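The way missing columns turn into ``nan`` in the addition above can be sketched with plain dictionaries. ``add_tables`` is a hypothetical helper for illustration only; it does not call isopy.

```python
import math

def add_tables(left, right):
    """Add two column mappings; a column missing from either side gives nan.

    Hypothetical helper for illustration; not part of isopy.
    """
    # Keep the order of the left keys, then append keys only found on the right.
    keys = list(left) + [key for key in right if key not in left]
    return {
        key: left.get(key, math.nan) + right.get(key, math.nan)
        for key in keys
    }

a1 = {"Ru": 1, "Pd": 2, "Cd": 3}
a2 = {"Ru": 10, "Pd": 20, "Ag": 25}
total = add_tables(a1, a2)
```

Because any sum involving ``nan`` is ``nan``, the Cd and Ag columns, each present in only one input, come out as ``nan``.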

In [44]:
a = isopy.random(100, [(1, 0.1), (0, 1), (10, 2)], ['ru', 'pd', 'cd'])
np.mean(a) #Calculate the mean of each column

(row) , Ru      , Pd       , Cd     
None  , 0.99613 , 0.060085 , 10.023 

In [46]:
np.std(a) # Calculate the standard deviation of each column

(row) , Ru       , Pd      , Cd     
None  , 0.090816 , 0.96784 , 1.9129 

---
One useful feature of the array function implementation for isopy arrays is that array functions can be used in conjunction with dictionaries. Only the dictionary keys that are also found in the array will be used, which makes dictionaries useful for storing, for example, reference values.

In [53]:
a = isopy.array(dict(ru = 10, pd= 20, cd = 30))
d = dict(ru = 1, rh = 2, pd=3, ag=4, cd = 5)
a + d # Only column keys in the array are present in the output

(row) , Ru , Pd , Cd 
None  , 11 , 23 , 35 

**Note** The dictionary keys do not need to be formatted to match the proper key string format.
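The array-plus-dictionary behaviour can be sketched with plain dictionaries. ``add_reference`` is a hypothetical helper, not part of isopy. Matching keys case-insensitively is a simplification of isopy's key formatting, and the ``nan`` fallback for keys missing from the dictionary is an assumption of this sketch.

```python
import math

def add_reference(columns, reference):
    """Add reference values to columns, using only the keys found in columns.

    Hypothetical helper for illustration; not part of isopy.
    """
    # Simplified key matching: compare keys case-insensitively.
    lookup = {key.lower(): value for key, value in reference.items()}
    return {
        key: value + lookup.get(key.lower(), math.nan)  # nan fallback is an assumption
        for key, value in columns.items()
    }

a = {"Ru": 10, "Pd": 20, "Cd": 30}
d = dict(ru=1, rh=2, pd=3, ag=4, cd=5)
result = add_reference(a, d)
```

The extra ``rh`` and ``ag`` entries in the dictionary are ignored because the output only contains the keys of the array.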

## Isopy Dictionaries

Isopy has two special dictionaries. ``IsopyDict`` functions like a normal dictionary with a few enhancements. First, all keys are stored as isopy key strings. Second, the dictionary can be readonly and have a predefined default value. Third, you can create a subset of the dictionary by passing filter keywords to ``get()``.
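The idea behind a filter keyword such as ``mass_number_gt``, used elsewhere in this tutorial, can be sketched with a plain dictionary. Both helpers below are hypothetical and are not part of isopy; they assume keys that are already formatted like ``'105Pd'``.

```python
import re

def mass_number(key):
    """Return the leading digits of an isotope key such as '105Pd'."""
    match = re.match(r"\d+", key)
    if match is None:
        raise ValueError(f"{key!r} does not start with a mass number")
    return int(match.group())

def filter_mass_number_gt(mapping, threshold):
    """Keep only the items whose mass number is greater than threshold."""
    return {
        key: value
        for key, value in mapping.items()
        if mass_number(key) > threshold
    }

d = {"101Ru": 1, "103Rh": 2, "105Pd": 3, "107Ag": 4, "111Cd": 5}
subset = filter_mass_number_gt(d, 104)
```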

In [73]:
d = isopy.IsopyDict(ru101 = 1, rh103 = 2, pd105=3, ag107=4, cd111 = 5, default_value=0) # The first input can also be another dictionary
d

IsopyDict(default_value = 0, readonly = False,
{"101Ru": 1
"103Rh": 2
"105Pd": 3
"107Ag": 4
"111Cd": 5})

In [74]:
d.get('pd'), d.get('ge') # No need to specify a default value since it was defined during creation

(0, 0)

In [71]:
d = isopy.IsopyDict(ru101 = 1, rh103 = 2, pd105=3, ag107=4, cd111 = 5, default_value=0)
d.get(mass_number_gt = 104) # Returns a new dict containing only isotopes with a mass number greater than 104

IsopyDict(default_value = 0, readonly = False,
{"105Pd": 3
"107Ag": 4
"111Cd": 5})

---
``ScalarDict`` works just like an ``IsopyDict`` with two exceptions. First, it can only store scalars, that is, single numerical values. Second, the ``get()`` method can calculate the ratio of two values in the dictionary if a ratio key string is not present in the dictionary.

In [8]:
d = isopy.ScalarDict(ru101 = 1, rh103 = 2, pd105=3, ag107=4, cd111 = 5) # If not specified, the default value is np.nan
d.get('pd105/cd111') # Automatically calculated from the numerator and denominator values

0.6
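The ratio fallback described above can be sketched with a plain dictionary. ``get_scalar`` is a hypothetical helper and not isopy's implementation; it assumes keys that are already formatted like ``'105Pd'`` and ratio keys of the form ``'numerator/denominator'``.

```python
import math

def get_scalar(mapping, key, default=math.nan):
    """Look up key; for a missing ratio key, divide numerator by denominator.

    Hypothetical helper for illustration; not part of isopy.
    """
    if key in mapping:
        return mapping[key]
    if "/" in key:
        numerator, denominator = key.split("/", 1)
        if numerator in mapping and denominator in mapping:
            return mapping[numerator] / mapping[denominator]
    return default  # neither the key nor both ratio parts are present

d = {"101Ru": 1, "103Rh": 2, "105Pd": 3, "107Ag": 4, "111Cd": 5}
ratio = get_scalar(d, "105Pd/111Cd")  # 3 / 5
missing = get_scalar(d, "104Pd")      # falls back to the default, nan
```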

## Reference Values
There are a number of reference values included with isopy under the ``refval`` namespace. You can find the available reference values listed [here](https://isopy.readthedocs.io/en/latest/refpages/reference_values.html) together with a short description. There are currently three categories of reference values, ``mass``, ``element`` and ``isotope``, referring to the flavour of the key strings in the dictionaries.

In [26]:
isopy.refval.element.atomic_number.get('pd') # The atomic number of palladium

46

In [22]:
isopy.refval.isotope.abundance.get('105pd') # The natural abundance of 105Pd relative to all palladium isotopes

0.2233

In [19]:
isopy.refval.element.isotopes.get('pd') # Returns a list of all naturally occurring isotopes of palladium

IsotopeKeyList('102Pd', '104Pd', '105Pd', '106Pd', '108Pd', '110Pd')

---
All reference values are isopy dictionaries, so the ``get()`` method accepts filter keywords.

In [27]:
isopy.refval.isotope.abundance.get(element_symbol='pd')

ScalarDict(default_value = nan, readonly = False,
{"102Pd": 0.0102
"104Pd": 0.1114
"105Pd": 0.2233
"106Pd": 0.2733
"108Pd": 0.2646
"110Pd": 0.1172})

---
Creating an array from reference values is as simple as:

In [18]:
isopy.array(isopy.refval.isotope.abundance.get(element_symbol='pd'))

(row) , 102Pd  , 104Pd  , 105Pd  , 106Pd  , 108Pd  , 110Pd  
None  , 0.0102 , 0.1114 , 0.2233 , 0.2733 , 0.2646 , 0.1172 

Reference values are particularly useful in combination with array functions.

In [16]:
a = isopy.ones(None, isopy.refval.isotope.abundance.get(element_symbol='pd'))
a * isopy.refval.isotope.abundance

(row) , 102Pd  , 104Pd  , 105Pd  , 106Pd  , 108Pd  , 110Pd  
None  , 0.0102 , 0.1114 , 0.2233 , 0.2733 , 0.2646 , 0.1172 
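As a worked example of the multiplication above, the sketch below splits a total amount of palladium into per-isotope amounts using the abundance values printed in this section. It uses a plain dictionary instead of ``isopy.refval``, and the total of 200 is an arbitrary illustrative number.

```python
import math

# Pd abundances as printed in the output above.
abundance = {
    "102Pd": 0.0102, "104Pd": 0.1114, "105Pd": 0.2233,
    "106Pd": 0.2733, "108Pd": 0.2646, "110Pd": 0.1172,
}

total_pd = 200.0  # arbitrary total amount, in any unit

# Per-isotope amount = total amount times the isotope's relative abundance.
amounts = {key: total_pd * fraction for key, fraction in abundance.items()}

abundance_sum = math.fsum(abundance.values())  # fractions should add up to about 1
amount_sum = math.fsum(amounts.values())       # amounts should add up to the total
```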