# Data Structures and I/O

In [1]:
import numpy as np
import pandas as pd
import xarray as xr
import tempfile
from pathlib import Path

import cedalion
import cedalion.io
import cedalion.datasets
import cedalion.nirs
import cedalion.xrutils as xrutils

pd.set_option('display.max_rows', 10)
xr.set_options(display_expand_data=False);

In [2]:
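# helper function: convert raw amplitudes to optical density and then to
# concentrations, using a differential pathlength factor of 6 for both wavelengths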
def calc_concentratoin(rec):
    od = cedalion.nirs.int2od(rec["amp"])
    dpf = xr.DataArray([6, 6], dims="wavelength", coords={"wavelength" : od.wavelength})
    return cedalion.nirs.od2conc(od, rec.geo3d, dpf)

## Reading SNIRF Files

SNIRF files can be loaded with the `cedalion.io.read_snirf` function. It returns a list of `cedalion.dataclasses.Recording` objects.

In [3]:
path_to_snirf_file = cedalion.datasets.get_fingertapping_snirf_path()

recordings = cedalion.io.read_snirf(path_to_snirf_file)

display(path_to_snirf_file)
display(recordings)
display(len(recordings))

PosixPath('/home/eike/.cache/cedalion/fingertapping.zip.unzip/fingertapping/sub-01_task-tapping_nirs.snirf')

[<Recording |  timeseries: ['amp'],  masks: [],  stim: ['1.0', '15.0', '2.0', '3.0'],  aux_ts: [],  aux_obj: []>]

1

## Accessing example datasets

Example datasets are accessible through functions in `cedalion.datasets`. These take care of downloading, caching and updating the data files. Many of them also load the data and return it directly.

In [4]:
rec = cedalion.datasets.get_fingertapping()
display(rec)

<Recording |  timeseries: ['amp'],  masks: [],  stim: ['1.0', '15.0', '2.0', '3.0'],  aux_ts: [],  aux_obj: []>

## Recording containers

The class `cedalion.dataclasses.Recording` is Cedalion's **main data container** to carry related data objects through the program.
It can store time series, masks, auxiliary time series, probe, head model and stimulus information as well as metadata about the recording.
It has the following properties:



| field      | description                                                |
|------------|------------------------------------------------------------|
| timeseries | a dictionary of time series objects                        |
| masks      | a dictionary of masks that flag time points as good or bad |
| geo3d      | 3D probe geometry                                          |
| geo2d      | 2D probe geometry                                          |
| stim       | dataframe with stimulus information                        |
| aux_ts     | dictionary of auxiliary time series objects                |
| aux_obj    | dictionary for any other auxiliary objects                 |
| head_model | voxel image, cortex and scalp surfaces                     |
| meta_data  | dictionary for metadata                                    |

* the container's layout is very similar to the layout of a SNIRF file
* a `Recording` maps mainly to a nirs group
* time series objects map to data elements


### Dictionaries in `Recording`

- dictionaries are key-value stores
- they maintain the order in which values are added, which facilitates workflows
- the user differentiates time series by name
- names can be chosen freely, but there are a few **canonical names** used by `read_snirf` and expected by `write_snirf`:

| data type                        | canonical name |
|----------------------------------|----------------|
| unprocessed raw                  | "amp"          |
| processed raw                    | "amp"          |
| processed dOD                    | "od"           |
| processed concentrations         | "conc"         |
| processed central moments        | "moments"      |
| processed blood flow index       | "bfi"          |
| processed HRF dOD                | "hrf_od"       |
| processed HRF central moments    | "hrf_moments"  |
| processed HRF concentrations     | "hrf_conc"     |
| processed HRF blood flow index   | "hrf_bfi"      |
| processed absorption coefficient | "mua"          |
| processed scattering coefficient | "musp"         |
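
The ordering behaviour of these dictionaries can be illustrated with a plain `OrderedDict`. This is only a minimal sketch using the standard library, not Cedalion's implementation: the string values stand in for real time series objects, and `latest_name` is a hypothetical helper introduced for the example.

```python
from collections import OrderedDict

# Stand-in for Recording.timeseries; strings replace the real DataArrays.
timeseries = OrderedDict()
timeseries["amp"] = "raw amplitudes"
timeseries["od"] = "optical density"
timeseries["conc"] = "concentrations"


def latest_name(store):
    """Return the key that was added most recently (hypothetical helper)."""
    return next(reversed(store))


# Keys come back in insertion order, so the processing history is preserved.
print(list(timeseries.keys()))
print(latest_name(timeseries))
```

Because insertion order is kept, a workflow that appends one result per processing step can always find its most recent time series without tracking names separately.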
  
  



### Inspecting a Recording container

In [5]:
display(rec.timeseries.keys())
display(type(rec.timeseries["amp"]))

odict_keys(['amp'])

xarray.core.dataarray.DataArray

In [6]:
rec.meta_data

OrderedDict([('SubjectID', 'P1'),
             ('MeasurementDate', '2020-01-01'),
             ('MeasurementTime', '13:16:16Z'),
             ('LengthUnit', 'm'),
             ('TimeUnit', 's'),
             ('FrequencyUnit', 'Hz'),
             ('DateOfBirth', '1986-01-01'),
             ('MNE_coordFrame', 4),
             ('sex', '1')])

Shortcut for accessing time series:

In [7]:
rec["amp"] is rec.timeseries["amp"]

True

## Time Series

<center>
<img src="../img/recording/ndarray.png">
</center>

- multivariate time series are stored in `xarray.DataArray` objects
- if an array has the dimensions 'channel' and 'time' we call it an `NDTimeSeries`
- named dimensions
- coordinates
- physical units
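
As a rough illustration of named dimensions and coordinates, the sketch below builds a small array with the same dimension layout as the `amp` time series in this notebook. The values, wavelengths and variable names are made up for the example, and physical units are left out because attaching them requires the `pint` integration that Cedalion sets up.

```python
import numpy as np
import xarray as xr

# Synthetic data: 2 channels x 2 wavelengths x 5 time points.
values = np.arange(20, dtype=float).reshape(2, 2, 5)

ts = xr.DataArray(
    values,
    dims=["channel", "wavelength", "time"],
    coords={
        "channel": ["S1D1", "S1D2"],
        "wavelength": [760.0, 850.0],
        "time": np.arange(5) * 0.1,
        # Extra coordinates attached to the channel dimension.
        "source": ("channel", ["S1", "S1"]),
        "detector": ("channel", ["D1", "D2"]),
    },
)

# Dimensions are addressed by name, not by axis number.
time_mean = ts.mean("time")

# Label-based selection of one channel.
s1d2 = ts.sel(channel="S1D2")

print(time_mean.dims)
print(s1d2.shape)
```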



In [8]:
rec["amp"]

0,1
Magnitude,[[[0.0913686 0.0909875 0.0910225 ... 0.0941083 0.0940129 0.0944882] [0.1856806 0.186377 0.1836514 ... 0.1856486 0.1850836 0.1842172]] [[0.227516 0.2297024 0.2261366 ... 0.2264519 0.2271665 0.226713] [0.6354927 0.637668 0.6298023 ... 0.6072068 0.6087293 0.6091066]] [[0.1064704 0.1066212 0.1053444 ... 0.121114 0.1205022 0.1205441] [0.2755033 0.2761615 0.2727006 ... 0.2911952 0.2900544 0.2909847]] ... [[0.2027881 0.1996586 0.2004866 ... 0.2318743 0.2311941 0.2330808] [0.4666358 0.4554404 0.4561614 ... 0.4809749 0.4812827 0.4862896]] [[0.4885007 0.4802285 0.4818338 ... 0.6109142 0.6108118 0.613845] [0.8457658 0.825988 0.8259648 ... 0.975894 0.9756599 0.9826459]] [[0.6304559 0.6284427 0.6287045 ... 0.6810626 0.6809573 0.6818709] [1.2285622 1.2205907 1.2190002 ... 1.2729124 1.2727222 1.2755645]]]
Units,volt


In [9]:
rec["conc"] = calc_concentratoin(rec)
display(rec["conc"])

0,1
Magnitude,[[[0.13358239209978304 0.003830283928282647 0.3485663033081772 ... 0.44265804481800175 0.5026875842869235 0.6632956216585129] [-0.7839218209250456 -0.7640062849016895 -0.6372321468890523 ... 0.23076741243710222 0.20512673319586014 0.16981311715887412] [0.11212379127914074 0.0731324774375613 0.23388667595717802 ... 0.1596030122051564 0.19659631898706498 0.12887590142000407] ... [-0.08996858531937893 0.29760277238751065 0.3035425193755599 ... 0.5709685540656639 0.5273568764191445 0.37714069302428915] [0.9042366397918427 1.3400782049340045 1.3813304247520295 ... -0.27197257740205305 -0.2674864774443464 -0.4012836512581511] [-0.18579734906426823 0.4272187017064305 0.6069404693046493 ... -0.3055763013235321 -0.2958772890477925 -0.4927704425869438]] [[0.4302890462699156 0.5291122886883187 0.39408017947069485 ... -0.03812950019946504 -0.048780953537845895 -0.16954954368367178] [0.19718416808530376 0.07475108502790183 0.21473067649191507 ... -0.13048136047941516 -0.15862865728380518 -0.12125825687699171] [1.1069744947570999 1.1055519748193772 1.1824229590870727 ... -0.38395393707435815 -0.34006765489373475 -0.31841423445257] ... [1.1310069656643422 1.1631091585413527 1.1132916692449675 ... -0.6597509742064699 -0.6094788888477872 -0.6460548286777947] [1.8250852563797992 1.8988537984724563 1.8366005072863651 ... -0.855162631152548 -0.8545173888400667 -0.8731183537487607] [3.6718816282381956 3.6383236051532477 3.5444730339882793 ... -1.0728951396816504 -1.0669728023777476 -1.0756417750550116]]]
Units,micromolar


## Probe Geometry - geo3D

- labeled points are stored in a 2D array (one row per point, one column per spatial axis)
- if an array has a 'label' dimension and 'label' and 'type' coordinates we call it a `LabeledPointCloud`
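
A point cloud of this shape can be sketched with plain xarray. The positions below are invented, the second dimension is named `digitized` only because that name appears in the distance calculation later in this notebook, and the `type` coordinate uses plain strings as an assumption for the example (Cedalion may represent point types differently).

```python
import numpy as np
import xarray as xr

# Three labeled points with x, y, z positions in meters (made-up values).
points = xr.DataArray(
    np.array([[0.0, 0.0, 0.0], [0.03, 0.0, 0.0], [0.0, 0.04, 0.0]]),
    dims=["label", "digitized"],
    coords={
        "label": ["S1", "D1", "D2"],
        "type": ("label", ["source", "detector", "detector"]),
    },
)

# Look up a single point by its label.
d2 = points.sel(label="D2")

# Keep only the detectors by filtering on the 'type' coordinate.
detectors = points.where(points["type"] == "detector", drop=True)

print(d2.values)
print(detectors["label"].values)
```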

In [10]:
rec.geo3d

0,1
Magnitude,[[-0.041613204679326624 0.026799775287857947 0.1299043936308115] [-0.06476686499872276 0.05814256998996063 0.0908425773727145] [-0.07120554551675068 -0.012874272652217859 0.10787860947691345] [-0.0859043654400404 0.018971698468891116 0.06509762433137256] [0.03694171596700852 0.02748380530252158 0.13022129709104263] [0.06065133742692848 0.05882414589197514 0.09117717995727878] [0.06712771392323756 -0.012199231886346213 0.1085725493643022] [0.08188685574250908 0.020427932162352107 0.06571325110115192] [-0.037619588707178915 0.06322851630256272 0.11572802770110814] [-0.04134445059646741 -0.011779611291995052 0.13495002938154654] [-0.07242424650162711 0.02347293206381116 0.10322218957482163] [-0.07912592748234686 0.05140929117919257 0.057370046083468226] [0.03352717285472944 0.06359968341212022 0.11583881331702946] [0.03686639505686032 -0.011397164907962862 0.13536724076864515] [0.06791592703520163 0.02468254467119271 0.10366605207860985] [0.075310088095807 0.05226884499005337 0.05787698428594235] [-0.03773895423262196 0.034082658086024245 0.1294919790818403] [-0.061454307897075164 0.06443800208211416 0.09061004226260877] [-0.07282878975853647 -0.00527870527992114 0.10743054838539287] [-0.08439610638498087 0.02706123378098264 0.06559510739262155] [0.040013338219712126 0.020439745814301982 0.13063767506528579] [0.06428020193514211 0.05162125732852231 0.09133632943784001] [0.06521393141744246 -0.019260368037897515 0.10880928230870081] [0.08272091030272573 0.012990608473329186 0.06658402323335233] [-0.0824899918305801 3.5272652784690273e-09 -8.985265795291575e-10] [6.534060185275914e-12 0.11404663614484922 -8.956669156345853e-09] [0.08248999697928468 3.893090638057428e-09 4.766247813092761e-10] [-0.04018770669918394 0.044642295725887106 0.12357659157001165] [-0.04174110787598461 0.007685839199884737 0.13437743644514044] [-0.05885642692737942 0.026136335712672674 0.11745327806321545] [0.03851939726517181 0.03078283979366837 0.1281798987708399] 
[-0.052808259274512416 0.06188780045911764 0.10403189889709587] [-0.06922421433143165 0.04108974212744533 0.0972095427514579] [-0.07351067792317667 0.05556043944468993 0.07592438053279707] [0.031418413207589764 0.05609701242391968 0.12112072355182572] [-0.05798909959739463 -0.013176608236818512 0.12293491786782985] [-0.07419549041066105 0.00549319711704184 0.10669158774707252] [0.023828589288117673 0.003996293896988853 0.14081960191362605] [-0.08184195746940305 0.022098763709400317 0.08422485645365857] [-0.08309090979477994 0.03520830182778492 0.0610323268526085] [0.026425335781627597 0.043375220063173334 0.12939415639453739] [0.035781770725957916 0.04568049698512951 0.12354633505671649] [0.037556277679295876 0.008001190852073497 0.134555406719549] [0.054080633753515184 0.026705026363902817 0.11818464121484251] [0.06801076312164317 0.02284818084680601 0.10445872040876931] [0.0481712114118228 0.062034828924799924 0.10518292837437133] [0.06461419274656584 0.042329086589563227 0.0977770136200937] [0.06967032405092426 0.05604290375898291 0.07590918892077606] [0.07526945345643418 0.037578919664336574 0.08007431341875021] [0.05288202968424921 -0.012139614237985343 0.12430811820847815] [0.06938013615060833 0.0069016618704050535 0.10806432085401242] [0.07932239267332403 -0.00044800545870984573 0.08903670686337778] [0.07756024143390347 0.022515587935674892 0.08578432016555051] [0.07912303758068968 0.036496511669642975 0.06257633565753129] [0.08225821050778642 0.01751423306824721 0.06619028345490524]]
Units,meter


## Xarray functionality

Specify axis by name:

In [12]:
amp = rec["amp"]

amp.mean("time") 

0,1
Magnitude,[[0.09513744118937993 0.1901567040061965] [0.22563993078445715 0.6122525196393992] [0.11773369356684882 0.28961069878221957] [0.07213032784543226 0.11612604311717371] [0.6012405407332502 1.1734634188261113] [0.2898949243297904 0.7158700540169542] [0.17382142664056113 0.32430761493179566] [0.17511486315245925 0.3141774412367141] [0.06067031535780369 0.15475601518998236] [0.23063375960669563 0.5546144922027626] [0.16268459982787556 0.3035099493265631] [0.31171069947071733 0.7708742284048365] [0.10345713121476828 0.22655799399715998] [0.3220696665777357 0.4690957563965747] [0.14294614785059598 0.4006938615000645] [0.13915339294289772 0.2840855372520332] [0.7337741573948966 1.510695531089978] [0.2581577363655923 0.28385182482464827] [0.6067782309393692 1.222022708167305] [0.3771531387323035 0.7178758075261414] [0.08943971095572097 0.1569932692370584] [0.2234768945909893 0.24446904954602178] [0.07899666477473213 0.1552060079134214] [0.24383550556822584 0.41887414996772665] [0.17879039660484528 0.2290249103403761] [0.2230945552992814 0.4849904454537631] [0.5704392675115108 0.9370558970738844] [0.668139127987435 1.2588788966521796]]
Units,volt


Get the second channel, which is formed by S1 and D2. All three expressions select the same data:

In [13]:
amp[1, :, :] # position-based indexing
amp.loc["S1D2", :, :] # label-based indexing
amp.sel(channel="S1D2") # label-based indexing

0,1
Magnitude,[[0.227516 0.2297024 0.2261366 ... 0.2264519 0.2271665 0.226713] [0.6354927 0.637668 0.6298023 ... 0.6072068 0.6087293 0.6091066]]
Units,volt


Joins between two arrays:

In [14]:
rec.geo3d.loc[amp.source]

0,1
Magnitude,[[-0.041613204679326624 0.026799775287857947 0.1299043936308115] [-0.041613204679326624 0.026799775287857947 0.1299043936308115] [-0.041613204679326624 0.026799775287857947 0.1299043936308115] [-0.041613204679326624 0.026799775287857947 0.1299043936308115] [-0.06476686499872276 0.05814256998996063 0.0908425773727145] [-0.06476686499872276 0.05814256998996063 0.0908425773727145] [-0.06476686499872276 0.05814256998996063 0.0908425773727145] [-0.06476686499872276 0.05814256998996063 0.0908425773727145] [-0.07120554551675068 -0.012874272652217859 0.10787860947691345] [-0.07120554551675068 -0.012874272652217859 0.10787860947691345] [-0.07120554551675068 -0.012874272652217859 0.10787860947691345] [-0.0859043654400404 0.018971698468891116 0.06509762433137256] [-0.0859043654400404 0.018971698468891116 0.06509762433137256] [-0.0859043654400404 0.018971698468891116 0.06509762433137256] [0.03694171596700852 0.02748380530252158 0.13022129709104263] [0.03694171596700852 0.02748380530252158 0.13022129709104263] [0.03694171596700852 0.02748380530252158 0.13022129709104263] [0.03694171596700852 0.02748380530252158 0.13022129709104263] [0.06065133742692848 0.05882414589197514 0.09117717995727878] [0.06065133742692848 0.05882414589197514 0.09117717995727878] [0.06065133742692848 0.05882414589197514 0.09117717995727878] [0.06065133742692848 0.05882414589197514 0.09117717995727878] [0.06712771392323756 -0.012199231886346213 0.1085725493643022] [0.06712771392323756 -0.012199231886346213 0.1085725493643022] [0.06712771392323756 -0.012199231886346213 0.1085725493643022] [0.08188685574250908 0.020427932162352107 0.06571325110115192] [0.08188685574250908 0.020427932162352107 0.06571325110115192] [0.08188685574250908 0.020427932162352107 0.06571325110115192]]
Units,meter


In [15]:
distances = xrutils.norm(rec.geo3d.loc[amp.source] - rec.geo3d.loc[amp.detector], "digitized")
display(distances)

0,1
Magnitude,[0.0392934026450024 0.038908864513936804 0.04089410956725208 0.008259557139206735 0.037176969994756656 0.03760151703968715 0.03703957437473776 0.007117553818273828 0.040320528877789995 0.03666451648479846 0.0077799951360732755 0.04068730988246594 0.03402734352786281 0.008243962417367905 0.039023987441095136 0.0392201001283926 0.04089535242655265 0.007695908520396754 0.0369692064063704 0.03707283537792224 0.03696963582233907 0.008066944268659296 0.04042704810426235 0.03721505659762734 0.0073197165052477095 0.040665755600745715 0.03344405298373886 0.0075344326110820415]
Units,meter
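
The join and the distance calculation can be reproduced on toy data with plain xarray. This is a sketch under stated assumptions: the positions and labels are invented, units are omitted, and the norm is written out with `np.sqrt` rather than `xrutils.norm` so that the example does not depend on Cedalion.

```python
import numpy as np
import xarray as xr

points = xr.DataArray(
    np.array([[0.0, 0.0, 0.0], [0.03, 0.0, 0.0], [0.0, 0.04, 0.0]]),
    dims=["label", "digitized"],
    coords={"label": ["S1", "D1", "D2"]},
)

# Per-channel source and detector labels, as a time series would carry them.
channels = ["S1D1", "S1D2"]
source = xr.DataArray(["S1", "S1"], dims="channel", coords={"channel": channels})
detector = xr.DataArray(["D1", "D2"], dims="channel", coords={"channel": channels})

# Indexing with a DataArray performs the join: one point per channel.
src_pos = points.sel(label=source)
det_pos = points.sel(label=detector)

# Euclidean norm along the spatial dimension.
diff = src_pos - det_pos
dist = np.sqrt((diff**2).sum("digitized"))

print(dist.dims)
print(dist.values)
```

The result has one entry per channel because the indexer arrays carry the `channel` dimension, which replaces `label` in the selected positions.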


Physical units:

In [16]:
rec.masks["distance_mask"] = distances > 1.5 * cedalion.units.cm
display(rec.masks["distance_mask"])

Additional functionality through accessors:

In [17]:
distances.pint.to("mm")

0,1
Magnitude,[39.293402645002395 38.908864513936805 40.89410956725207 8.259557139206736 37.176969994756654 37.60151703968715 37.03957437473776 7.117553818273827 40.32052887779 36.66451648479846 7.779995136073276 40.68730988246594 34.02734352786281 8.243962417367905 39.02398744109514 39.2201001283926 40.89535242655265 7.695908520396754 36.9692064063704 37.07283537792224 36.96963582233907 8.066944268659295 40.42704810426235 37.21505659762734 7.319716505247709 40.66575560074571 33.444052983738864 7.534432611082042]
Units,millimeter


## Writing SNIRF files

- pass a `Recording` object to `cedalion.io.write_snirf`
- caveat: many `Recording` fields have counterparts in SNIRF files, but not all

In [18]:
with tempfile.TemporaryDirectory() as tmpdir:
    output_path = Path(tmpdir).joinpath("test.snirf")

    cedalion.io.write_snirf(output_path, rec)