## Create NDDataset objects

In [1]:
from spectrochempy.api import *


        SpectroChemPy's API
        Version   : 0.1a2.post71
        Copyright : 2014-2017 - LCS (Laboratory for Catalysis and Spectrochempy)
            


Multidimensional arrays are defined in SpectroChemPy using the ``NDDataset`` object.

``NDDataset`` objects mostly behave like numpy's `numpy.ndarray`.

However, unlike a raw numpy ndarray, the presence of optional properties such
as `uncertainty`, `mask`, `units`, `axes`, and axes `labels` makes them
(hopefully) more appropriate for handling spectroscopic information, one of
the major objectives of the SpectroChemPy package.

Additional metadata can also be added to the instances of this class through the
`meta` property.

### Create a ND-Dataset from scratch

In the following example, a minimal 1D dataset is created from a simple list, to which we can add some metadata:

In [2]:
da = NDDataset([1,2,3])
da.title = 'intensity'   
da.description = 'Some experimental measurements'
da.units = 'dimensionless'
da

Id/Name: 4decda46
Author: christian@MacBook-Pro-de-Christian.local
Created: 2017-11-03 14:45:50.785499
Last Modified: 2017-11-03 14:45:50.801072
Description: Some experimental measurements

Title: intensity
Size: 3 (complex)
Units: dimensionless
Values: [ 1 2 3]


Apart from a few additional metadata such as `author`, `created`, ..., there is
not much difference with respect to a conventional `numpy.ndarray`. For example,
one can apply numpy ufuncs directly to a NDDataset or perform basic arithmetic
operations with these objects:

In [3]:
da2 = np.sqrt(da**3)
da2

Id/Name: 4df2f1fe
Author: christian@MacBook-Pro-de-Christian.local
Created: 2017-11-03 14:45:50.825607
Last Modified: 2017-11-03 14:45:50.825806
Description: Some experimental measurements

Title: intensity
Size: 3 (complex)
Units: dimensionless
Values: [ 1.000 2.828 5.196]


In [4]:
da3 = da + da/2.
da3

Id/Name: 4df79e18
Author: christian@MacBook-Pro-de-Christian.local
Created: 2017-11-03 14:45:50.856210
Last Modified: 2017-11-03 14:45:50.856517
Description: Some experimental measurements

Title: intensity
Size: 3 (complex)
Units: dimensionless
Values: [ 1.500 3.000 4.500]
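As a sanity check, the same two operations can be reproduced on a plain `numpy.ndarray`. The sketch below uses NumPy alone (no `NDDataset` is involved), so it only confirms that the values displayed in the outputs above match what NumPy itself computes:

```python
import numpy as np

# Plain-numpy stand-in for the `da` dataset created above
a = np.array([1, 2, 3])

# Same ufunc call as `np.sqrt(da**3)`
a2 = np.sqrt(a**3)

# Same arithmetic as `da + da/2.`
a3 = a + a / 2.

print(np.round(a2, 3))
print(a3)
```

What the `NDDataset` adds on top of this is the metadata (title, units, description) that travels with the result.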


### Create a NDDataset : full example

There are many ways to create ``NDDataset`` objects.

Above we have created a NDDataset from a simple list, but it is generally more
convenient to create it from a `numpy.ndarray`.

Below is an example of a 3D-Dataset created from a ``numpy.ndarray`` to which axes can be added. 

Let's first create the 3 one-dimensional axes, for which we can define labels, units, and masks! 

In [5]:
axe0 = Axis(coords = np.linspace(200., 300., 3),
            labels = ['cold', 'normal', 'hot'],
            mask = None,
            units = "K",
            title = 'temperature')

axe1 = Axis(coords = np.linspace(0., 60., 100),
            labels = None,
            mask = None,
            units = "minutes",
            title = 'time-on-stream')

axe2 = Axis(coords = np.linspace(4000., 1000., 100),
            labels = None,
            mask = None,
            units = "cm^-1",
            title = 'wavenumber')

 ERROR | NameError: name 'Axis' is not defined


Here is the displayed info for axe1 for instance:

In [None]:
axe1

Now we create some 3D data (a ``numpy.ndarray``):

In [None]:
nd_data = np.array([np.array([np.sin(axe2.data * 2. * np.pi / 4000.) * np.exp(-y / 60.)
                              for y in axe1.data]) * float(t)
                    for t in axe0.data])**2
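The nested comprehension above can be hard to read. As a sketch, the same array can be built with NumPy broadcasting; here plain arrays named `temperature`, `time` and `wavenumber` (names chosen for this illustration) stand in for `axe0.data`, `axe1.data` and `axe2.data`:

```python
import numpy as np

# Plain arrays standing in for axe0.data, axe1.data and axe2.data
temperature = np.linspace(200., 300., 3)
time = np.linspace(0., 60., 100)
wavenumber = np.linspace(4000., 1000., 100)

# Nested-comprehension form, as in the cell above
nested = np.array([np.array([np.sin(wavenumber * 2. * np.pi / 4000.) * np.exp(-y / 60.)
                             for y in time]) * float(t)
                   for t in temperature])**2

# Equivalent broadcasting form: each axis gets its own dimension
broadcast = (temperature[:, None, None]
             * np.exp(-time[None, :, None] / 60.)
             * np.sin(wavenumber[None, None, :] * 2. * np.pi / 4000.))**2

print(nested.shape, np.allclose(nested, broadcast))
```

Both forms give an array whose three dimensions follow the order temperature, time, wavenumber, which is the order in which the axes are later passed to the constructor.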

The dataset is now created with these data and axes. All the needed information is passed as parameters to the 
NDDataset constructor. 

In [None]:
mydataset = NDDataset(nd_data,
               axes = [axe0, axe1, axe2],
               title='Absorbance',
               units='absorbance'
              )

mydataset.description = """Dataset example created for this tutorial. 
It's a 3-D dataset (with dimensionless intensity)"""

mydataset.author = 'Tintin and Milou'

We can get some information about this object:

In [None]:
mydataset

### Copying existing NDDataset

Copying an existing dataset is as simple as:

In [None]:
da_copy = da.copy()

or alternatively:

In [None]:
da_copy = da[:]

Finally, it is also possible to initialize a dataset using an existing one:

In [None]:
dc = NDDataset(da3, title='Absorbance')
dc
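For comparison, here is how the same two idioms behave on a plain `numpy.ndarray`. This is an aside about NumPy only: it does not say how `NDDataset` implements `copy()` or `[:]`, which is not verified here.

```python
import numpy as np

original = np.array([1., 2., 3.])
independent = original.copy()   # new buffer, unaffected by later changes
view = original[:]              # plain numpy slicing returns a view on the same buffer

# Modify the original after both objects were created
original[0] = 10.

print(independent, view)
print(np.shares_memory(original, view), np.shares_memory(original, independent))
```

If independence from the original matters, `copy()` is the unambiguous choice.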

#### See also

Any numpy creation function can be used to set up the initial dataset array:
       [numpy array creation routines](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#routines-array-creation)



### Importing from external dataset

A NDDataset can also be created by importing external data:

In [None]:
import os
source = NDDataset.read_omnic(os.path.join(data, 'irdata', 'NH4Y-activation.SPG'))
source

## Slicing a NDDataset

A NDDataset can be sliced like a conventional numpy array...

*e.g.,*:

1. by index, using a slice such as [3], [0:10], [:, 3:4], [..., 5:10], ...

2. by values, using a slice such as [3000.0:3500.0], [..., 300.0], ...

3. by labels, using a slice such as ['monday':'friday'], ...

In [None]:
new = mydataset[..., 0]
new

or using the axes labels:

In [None]:
new = mydataset['hot']

Single-element dimensions are kept but can also be squeezed easily:

In [None]:
new = new.squeeze()
new

Be sure to use the correct type for slicing.

Floats are used for slicing by values:

In [None]:
correct = mydataset[...,2000.]

In [None]:
outside_limits = mydataset[...,10000.]

<div class='alert alert-info'>**NOTE:**

If one uses an integer value (2000), then the slicing is made **by index, not by value**, and in the following particular case an `IndexError` is raised because index 2000 does not exist (the size along axis -1 is only 100, so the index varies between 0 and 99!).

</div>

When slicing by index, an error is generated if the index is out of limits:

In [None]:
try:
    fail = mydataset[...,2000]
except IndexError as e:
    log.error(e)
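The difference between the two key types can be sketched with NumPy alone. The helper `resolve_key` below is hypothetical and is not part of SpectroChemPy; in particular, picking the nearest coordinate for a float key is an assumption made for this illustration.

```python
import numpy as np

def resolve_key(coords, key):
    """Hypothetical helper: turn one slicing key into an integer position."""
    if isinstance(key, (int, np.integer)):
        # Integers are treated as positions, so they must fit the axis size
        if not -len(coords) <= key < len(coords):
            raise IndexError(f"index {key} is out of bounds for size {len(coords)}")
        return int(key)
    # Floats are treated as coordinate values: pick the nearest coordinate
    # (assumption of this sketch, not a documented library rule)
    return int(np.argmin(np.abs(coords - key)))

wavenumber = np.linspace(4000., 1000., 100)  # same coordinates as axe2

by_value = resolve_key(wavenumber, 2000.)    # float: looked up among the coordinates
try:
    resolve_key(wavenumber, 2000)            # integer: read as a position
    by_index_failed = False
except IndexError:
    by_index_failed = True

print(by_value, wavenumber[by_value], by_index_failed)
```

The same number therefore selects very different things depending on whether it is written `2000.` or `2000`.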

One can mix slicing methods for different dimensions:

In [None]:
new = mydataset['normal':'hot', 0, 4000.0:2000.]
new


## Loading of experimental data


### NMR Data

Now, let's load an NMR dataset (in the Bruker format).

The builtin **data** variable contains a path to our *test* data:

In [None]:
# let's check that this directory exists and display its actual content:
import os
if os.path.exists(data):
    print(list_data)

In [None]:
path = os.path.join(data, 'nmrdata','bruker', 'tests', 'nmr','bruker_1d')

# load the data in a new dataset
ndd = NDDataset()
ndd.read_bruker_nmr(path, expno=1, remove_digital_filter=True)
ndd

In [None]:
# view it...
figure()
ndd.plot()
show()  # in notebooks this is not required, as figures are shown automatically

In [None]:
path = os.path.join(data, 'nmrdata','bruker', 'tests', 'nmr','bruker_2d')

# load the data directly (no need to create the dataset first)
ndd2 = NDDataset.read_bruker_nmr(path, expno=1, remove_digital_filter=True)

# view it...
ndd2.x.to('s')
ndd2.y.to('ms')

figure()
fig2 = ndd2.plot() 

### IR data

In [None]:
source = NDDataset.read_omnic(os.path.join(data, 'irdata', 'NH4Y-activation.SPG'))
source

In [None]:
source = read_omnic(NDDataset(), os.path.join(data, 'irdata', 'NH4Y-activation.SPG'))

In [None]:
figure()
fig = source.plot(kind='stack')


## Transposition

A dataset can be transposed:

In [None]:
newT = new.T
newT

## Units


SpectroChemPy can do calculations with units: it uses [pint](https://pint.readthedocs.io) to define and perform operations on data with units.

### Create quantities

* To create a quantity, use, for instance, one of the following expressions:

In [None]:
Quantity('10.0 cm^-1')

In [None]:
Quantity(1.0, 'cm^-1/hour')

In [None]:
Quantity(10.0, ur.cm/ur.km)

or, perhaps more simply,

In [None]:
10.0 * ur.meter/ur.gram/ur.volt

`ur` stands for **unit registry**, which handles many types of units
(and conversions between them).

### Do arithmetic with units

In [None]:
a = 900 * ur.km
b = 4.5 * ur.hours
a/b

Such calculations can also be written as a string expression:

In [None]:
Quantity("900 km / (8 hours)")

### Convert between units

In [None]:
c = a/b
c.to('cm/s')

We can make the conversion *in place* using *ito* instead of *to*:

In [None]:
c.ito('m/s')
c
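The conversions above can be checked by hand. The following sketch uses no unit library at all: the conversion factors are written out explicitly, and the variable names are chosen for this illustration only.

```python
# Conversion factors written out by hand (no unit library involved)
CM_PER_KM = 100_000.0
M_PER_KM = 1_000.0
S_PER_HOUR = 3_600.0

distance_km = 900.0   # same value as `a`
duration_h = 4.5      # same value as `b`

# a / b, in km/hour
speed_km_per_h = distance_km / duration_h

# Equivalent of c.to('cm/s') and c.ito('m/s')
speed_cm_per_s = speed_km_per_h * CM_PER_KM / S_PER_HOUR
speed_m_per_s = speed_km_per_h * M_PER_KM / S_PER_HOUR

print(speed_km_per_h, speed_cm_per_s, speed_m_per_s)
```

A unit registry does this bookkeeping automatically, which is what makes expressions such as `c.to('cm/s')` safe to write.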

### Do math operations with consistent units

In [None]:
x = 10 * ur.radians
np.sin(x)

Consistency of the units is checked!

In [None]:
x = 10 * ur.meters
np.sqrt(x)

but this is wrong, because `np.cos` expects an angle or a dimensionless argument:

In [None]:
x = 10 * ur.meters
try:
    np.cos(x)
except DimensionalityError as e:
    log.error(e)

Units can be set for NDDataset data and/or Axes

In [None]:
ds = NDDataset([1., 2., 3.], units='g/cm^3', title='concentration')
ds

In [None]:
ds.to('kg/m^3')

## Uncertainties

Spectrochempy can do calculations with uncertainties (and units).

A quantity with an `uncertainty` is called a **Measurement**.

Use one of the following expressions to create such a `Measurement`:

In [None]:
#Measurement(10.0, .2, 'cm')    TO FINISH (format doesn't work)

In [None]:
# Quantity(10.0, 'cm').plus_minus(.2)   TO FINISH

## Numpy universal functions (ufunc's)

A numpy universal function (or `numpy.ufunc` for short) is a function that
operates on `numpy.ndarray` in an element-by-element fashion. It's
vectorized and so rather fast.

As SpectroChemPy NDDatasets imitate the behaviour of numpy objects, many numpy
ufuncs can be applied directly.

For example, if you need all the elements of a NDDataset to be replaced by
their square roots, you can use the `numpy.sqrt` function:

In [None]:
da = NDDataset([1., 2., 3.])
da_sqrt = np.sqrt(da)
da_sqrt
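What "element-by-element" means can be seen on a plain `numpy.ndarray`. This sketch (NumPy and the standard library only, no `NDDataset`) compares the vectorized call with an explicit Python loop over the same values:

```python
import math
import numpy as np

values = np.array([1., 2., 3.])

# Vectorized: one call processes the whole array
vectorized = np.sqrt(values)

# Explicit loop: the same operation applied to each element in turn
looped = np.array([math.sqrt(v) for v in values])

print(np.allclose(vectorized, looped))
```

The two results agree; the ufunc simply avoids the Python-level loop.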

### Ufuncs with NDDataset with units

When a NDDataset has units, some restrictions apply to the use of ufuncs:

Some functions accept only dimensionless quantities. This is the case, for
example, of the exponential and logarithmic functions `exp` and `log`.

In [None]:
np.log10(da)

In [None]:
da.units = ur.cm

try:
    np.log10(da)
except DimensionalityError as e:
    log.error(e)
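The rule behind this check can be sketched without any unit library. In the toy model below, units are represented as a dictionary of dimension exponents; both helpers (`sqrt_dims`, `log10_checked`) are hypothetical and only illustrate the idea, they are not how pint or SpectroChemPy implement it.

```python
import math

def sqrt_dims(dims):
    """Square root is defined for any unit: each exponent is halved."""
    return {name: power / 2 for name, power in dims.items()}

def log10_checked(value, dims):
    """log10 only makes sense when no dimension is left."""
    if any(power != 0 for power in dims.values()):
        raise ValueError(f"cannot take log10 of a quantity with dimensions {dims}")
    return math.log10(value)

length = {"length": 1}   # e.g. a value in centimetres
dimensionless = {}       # no dimension at all

half_length = sqrt_dims(length)                 # allowed for any unit
log_ok = log10_checked(100.0, dimensionless)    # allowed: dimensionless
try:
    log10_checked(100.0, length)                # rejected: has a length dimension
    log_failed = False
except ValueError:
    log_failed = True

print(half_length, log_ok, log_failed)
```

This is why `np.sqrt` works on a value in meters while `np.log10` on a value in centimetres raises an error.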

## Complex or hypercomplex NDDatasets


NDDataset objects with complex data are handled differently than in
`numpy.ndarray`.

Complex data are instead stored by interlacing the real and imaginary parts.
This allows the definition of data that can be complex along several axes and,
*e.g.,* allows 2D-hypercomplex arrays that can be transposed (useful for NMR data).
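To make the idea of interlacing concrete, here is a minimal sketch with NumPy only. It assumes the real and imaginary parts alternate along the last axis; the actual internal layout used by `NDDataset` is not verified here.

```python
import numpy as np

z = np.array([[1. + 2.j, 2. + 0.j],
              [1.3 + 2.j, 2. + 0.5j]])

# Real-valued array twice as long along the last axis:
# even positions hold real parts, odd positions hold imaginary parts
interlaced = np.empty((z.shape[0], 2 * z.shape[1]))
interlaced[:, 0::2] = z.real
interlaced[:, 1::2] = z.imag

# The complex values can be recovered from the interlaced storage
restored = interlaced[:, 0::2] + 1j * interlaced[:, 1::2]

print(interlaced)
print(np.array_equal(restored, z))
```

Because the storage is purely real, the same trick can be applied along a second axis, which is what a hypercomplex 2D dataset needs.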

In [None]:
da = NDDataset([[1.+2.j, 2.+0j], [1.3+2.j, 2.+0.5j],
                [1.+4.2j, 2.+3j], [5.+4.2j, 2.+3j]])
da

If the dataset is also complex in the first dimension (columns), then we
should have (note the shape description!):