# Data Structures and I/O

In [None]:
# This cell sets up the environment when executed in Google Colab.
try:
    import google.colab
    !curl -s https://raw.githubusercontent.com/ibs-lab/cedalion/colab_setup/scripts/colab_setup.py -o colab_setup.py
    # Select branch with --branch "branch name" (default is "dev")
    %run colab_setup.py
except ImportError:
    pass

In [None]:
import numpy as np
import pandas as pd
import xarray as xr
import tempfile
from pathlib import Path

import cedalion
import cedalion.io
import cedalion.datasets
import cedalion.nirs
import cedalion.xrutils as xrutils

pd.set_option('display.max_rows', 10)
xr.set_options(display_expand_data=False);

In [None]:
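# Helper function: convert raw amplitudes to concentration changes.
def calc_concentratoin(rec):
    # raw amplitudes -> optical density
    od = cedalion.nirs.int2od(rec["amp"])
    # differential pathlength factor, one value per wavelength
    dpf = xr.DataArray([6, 6], dims="wavelength", coords={"wavelength" : od.wavelength})
    return cedalion.nirs.od2conc(od, rec.geo3d, dpf)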

## Reading SNIRF Files

SNIRF files can be loaded with the `cedalion.io.read_snirf` function, which returns a list of `cedalion.dataclasses.Recording` objects.

In [None]:
path_to_snirf_file = cedalion.datasets.get_fingertapping_snirf_path()

recordings = cedalion.io.read_snirf(path_to_snirf_file)

display(path_to_snirf_file)
display(recordings)
display(len(recordings))

## Accessing example datasets

Example datasets are accessible through functions in `cedalion.datasets`. These take care of downloading, caching and updating the data files. Many of them also load the data directly.

In [None]:
rec = cedalion.datasets.get_fingertapping()
display(rec)

## Recording containers

The class `cedalion.dataclasses.Recording` is Cedalion's **main data container** to carry related data objects through the program.
It can store time series, masks, auxiliary time series, probe, head model and stimulus information as well as metadata about the recording.
It has the following fields:



| field      | description                                                | 
|------------|------------------------------------------------------------|
| timeseries | a dictionary of timeseries objects                         |  
| masks      | a dictionary of masks that flag time points as good or bad | 
| geo3d      | 3D probe geometry                                          | 
| geo2d      | 2D probe geometry                                          | 
| stim       | dataframe with stimulus information                        |
| aux_ts     | dictionary of auxiliary time series objects                |
| aux_obj    | dictionary for any other auxiliary objects                 |
| head_model | voxel image, cortex and scalp surfaces                     |
| meta_data  | dictionary for meta data                                   |

* the container closely follows the layout of a SNIRF file
* a `Recording` mainly maps to a `nirs` group
* timeseries objects map to `data` elements


### Dictionaries in `Recording`

- dictionaries are key-value stores
- they maintain the order in which values are added, which facilitates workflows
- the user differentiates time series by name
- names can be chosen freely, but there are a few **canonical names** used by `read_snirf` and expected by `write_snirf`:

| data type                        | canonical name |
|----------------------------------|----------------|
| unprocessed raw                  | "amp"          |
| processed raw                    | "amp"          |
| processed dOD                    | "od"           |
| processed concentrations         | "conc"         |
| processed central moments        | "moments"      |
| processed blood flow index       | "bfi"          |
| processed HRF dOD                | "hrf_od"       |
| processed HRF central moments    | "hrf_moments"  |
| processed HRF concentrations     | "hrf_conc"     |
| processed HRF blood flow index   | "hrf_bfi"      |
| processed absorption coefficient | "mua"          |
| processed scattering coefficient | "musp"         |
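
The ordering behaviour described above can be illustrated with a plain Python mapping. This is only a sketch: `OrderedDict` is used to make the ordering explicit, and the placeholder strings stand in for real time series objects. It does not show how `Recording` is implemented.

```python
from collections import OrderedDict

# Stand-in for a dictionary of time series; the string values are
# placeholders for the actual arrays.
timeseries = OrderedDict()
timeseries["amp"] = "raw amplitudes"
timeseries["od"] = "optical densities"
timeseries["conc"] = "concentrations"

# Iteration follows insertion order, so the keys mirror the processing steps.
names = list(timeseries.keys())
print(names)

# The most recently added entry is the latest processing result.
last_name = next(reversed(timeseries))
print(last_name)
```

Because the order is preserved, a workflow can refer to "the last time series added" without tracking its name separately.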
  
  



### Inspecting a Recording container

In [None]:
display(rec.timeseries.keys())
display(type(rec.timeseries["amp"]))

In [None]:
rec.meta_data

Shortcut for accessing time series:

In [None]:
rec["amp"] is rec.timeseries["amp"]
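
How such a shortcut can work is sketched below with a hypothetical `MiniRecording` class. It is not Cedalion's implementation, only an illustration of forwarding item access to the `timeseries` dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecording:
    """Hypothetical, stripped-down stand-in for a recording container."""

    timeseries: dict = field(default_factory=dict)
    masks: dict = field(default_factory=dict)
    meta_data: dict = field(default_factory=dict)

    def __getitem__(self, name):
        # mini[name] forwards to mini.timeseries[name]
        return self.timeseries[name]

    def __setitem__(self, name, value):
        # mini[name] = value stores a time series under that name
        self.timeseries[name] = value


mini = MiniRecording()
mini["amp"] = [1.0, 2.0, 3.0]  # placeholder data

# Both access paths return the very same object.
same_object = mini["amp"] is mini.timeseries["amp"]
print(same_object)
```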

## Time Series

<center>
<img src="../img/recording/ndarray.png">
</center>

- multivariate time series are stored in `xarray.DataArray` objects
- if it has dimensions 'channel' and 'time' we call it an `NDTimeSeries`
- named dimensions
- coordinates
- physical units
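
As a minimal sketch of these properties, the following builds a small synthetic array with named dimensions and coordinates. The channel labels, wavelengths and sampling rate are made up for illustration, and the unit is stored as a plain attribute here instead of a real unit-aware quantity.

```python
import numpy as np
import xarray as xr

# Synthetic data: 2 channels x 2 wavelengths x 5 time points.
values = np.arange(20, dtype=float).reshape(2, 2, 5)

ts = xr.DataArray(
    values,
    dims=["channel", "wavelength", "time"],
    coords={
        "channel": ["S1D1", "S1D2"],
        "wavelength": [760.0, 850.0],
        "time": np.arange(5) / 10.0,  # seconds, assuming 10 Hz sampling
    },
    attrs={"units": "V"},  # placeholder; not a unit-aware quantity
)

print(ts.dims)
print(ts.sizes["time"])
```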



In [None]:
rec["amp"]

In [None]:
rec["conc"] = calc_concentratoin(rec)
display(rec["conc"])

## Probe Geometry - geo3d

- labeled points are stored in a 2D array
- if it has a 'label' dimension and 'label' and 'type' coordinates we call it a `LabeledPointCloud`
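
A point cloud of this shape can be sketched with plain xarray. The positions, the dimension name `pos` and the string values of `type` are assumptions made for this example; the actual coordinate system name and type values in a loaded recording may differ.

```python
import xarray as xr

# Three hypothetical optodes with 3D positions.
points = xr.DataArray(
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]],
    dims=["label", "pos"],
    coords={
        "label": ["S1", "D1", "D2"],
        "type": ("label", ["source", "detector", "detector"]),
    },
)

# Use the 'type' coordinate to select only the detectors.
is_detector = (points.coords["type"] == "detector").values
detectors = points.isel(label=is_detector)
detector_labels = list(detectors.label.values)
print(detector_labels)
```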

In [None]:
rec.geo3d

## Xarray functionality

Specify axis by name:

In [None]:
amp = rec["amp"]

amp.mean("time") 

Get the second channel, formed by S1 and D2:

In [None]:
amp[1, :, :]             # position-based indexing
amp.loc["S1D2", :, :]    # label-based indexing
amp.sel(channel="S1D2")  # label-based indexing by dimension name
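
That the three indexing styles select the same data can be checked on a small synthetic array. The channel labels below are made up so that "S1D2" sits at position 1; no recording is needed to run this sketch.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(12.0).reshape(3, 2, 2),
    dims=["channel", "wavelength", "time"],
    coords={"channel": ["S1D1", "S1D2", "S1D3"]},
)

by_position = arr[1, :, :]          # second entry along the first dimension
by_loc = arr.loc["S1D2", :, :]      # label lookup, dimensions given by position
by_sel = arr.sel(channel="S1D2")    # label lookup, dimension given by name

all_equal = by_position.equals(by_loc) and by_loc.equals(by_sel)
print(all_equal)

# Reductions also take the dimension name instead of an axis number.
mean_over_time = arr.mean("time")
print(mean_over_time.dims)
```

`sel` is the most robust of the three because it does not depend on the order of the dimensions.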

Joins between two arrays:

In [None]:
rec.geo3d.loc[amp.source]

In [None]:
distances = xrutils.norm(rec.geo3d.loc[amp.source] - rec.geo3d.loc[amp.detector], "digitized")
display(distances)
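
The join and the distance computation can be reproduced with plain xarray and NumPy. In this sketch the positions, labels and the dimension name `pos` are invented, and the norm is written out explicitly instead of using `xrutils.norm`.

```python
import numpy as np
import xarray as xr

# Hypothetical optode positions in an arbitrary coordinate system "pos".
geo = xr.DataArray(
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]],
    dims=["label", "pos"],
    coords={"label": ["S1", "D1", "D2"]},
)

# Per-channel source and detector labels, as a time series would carry them.
channels = ["S1D1", "S1D2"]
source = xr.DataArray(["S1", "S1"], dims="channel", coords={"channel": channels})
detector = xr.DataArray(["D1", "D2"], dims="channel", coords={"channel": channels})

# Indexing with a labeled array looks up one position per channel.
src_pos = geo.loc[source]
det_pos = geo.loc[detector]

# Euclidean norm over the coordinate dimension.
distances = np.sqrt(((src_pos - det_pos) ** 2).sum("pos"))
print(distances.values)
```

The result has a single 'channel' dimension, so it lines up with the time series it was derived from.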

Physical units:

In [None]:
rec.masks["distance_mask"] = distances > 1.5 * cedalion.units.cm
display(rec.masks["distance_mask"])

Additional functionality through accessors:

In [None]:
distances.pint.to("mm")

## Writing SNIRF files

- pass a `Recording` object to `cedalion.io.write_snirf`
- caveat: many `Recording` fields have counterparts in SNIRF files, but not all.

In [None]:
with tempfile.TemporaryDirectory() as tmpdir:
    output_path = Path(tmpdir).joinpath("test.snirf")

    cedalion.io.write_snirf(output_path, rec)