# Time-series analysis with XArray and Zarr

Here, we'll introduce two more tools and use them for a more realistic analysis of neuroscience data. Both integrate with the tools we've seen so far.

The first is Zarr. This is primarily a file format that stores array data (e.g., time-series) in chunks, so that the data can be loaded efficiently into a distributed cluster.

The second is XArray. This is a Python library that supports the management of complex datasets, such as multi-channel time-series data from neuroscience experiments.

Let's start by importing XArray and using it to read some data from a GCS bucket:

In [1]:
import gcsfs
import xarray as xr

This time we're pointing to another bucket that is publicly available and contains some data that Chris Holdgraf collected (as described [here](https://www.nature.com/articles/ncomms13654)).

In [2]:
fs = gcsfs.GCSFileSystem('holdgraf-ecog')

XArray knows how to read data stored in the Zarr format into an XArray `Dataset` object. To identify the GCS location, we give it a `GCSMap` object that points to the file-system, with `check` and `create` both set to `False` (this is the read-only mode).

In [3]:
gcsmap = gcsfs.mapping.GCSMap('holdgraf-ecog/sub-01-zarr', 
                              gcs=fs, 
                              check=False, 
                              create=False)

Once this map is provided to XArray, it creates the `Dataset`:

In [4]:
data = xr.open_zarr(gcsmap)

In [5]:
channels = list(data.keys())
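
To make the cell above less opaque, here is a minimal sketch of how a `Dataset` behaves like a dictionary of named arrays. The toy dataset, its variable names (`ch0`, `ch1`), and the 5-second duration are all made up for illustration; the real data presumably holds one variable per recording channel.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: two channels, 5 seconds sampled at 1,000 Hz
fs = 1000
t = np.arange(5 * fs) / fs
rng = np.random.default_rng(0)

toy = xr.Dataset(
    {
        "ch0": ("time", rng.standard_normal(t.size)),
        "ch1": ("time", rng.standard_normal(t.size)),
    },
    coords={"time": t},
)

# Iterating over the keys yields the data variable names (not the coordinates)
toy_channels = list(toy.keys())

# Each variable is a DataArray; its first dimension here is time
n_samples = toy[toy_channels[0]].shape[0]
```

Selecting a variable by name returns a labeled array, which is how the per-channel time-series are pulled out later on.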

Let's start a dask kubernetes cluster. I've packaged all that code into a module that we can call:

In [6]:
import tools

In [7]:
cluster, client = tools.init_cluster(n_workers=10)

In [8]:
from nitime import algorithms as tsa
from nitime import utils as tsu

Coherence for real-valued data is symmetric, so we calculate it for only half the spectrum (the other half is identical).
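
The symmetry can be checked on a small synthetic signal. This is only a sketch using NumPy's FFT on random data (not the ECoG recordings): for real input, the power at frequency bin `k` equals the power at bin `N - k`, so the one-sided transform carries all the information.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)  # any real-valued signal

# Two-sided spectrum and its power
full = np.fft.fft(x)
power = np.abs(full) ** 2

# Excluding the DC bin, the power spectrum reads the same in both directions
is_symmetric = np.allclose(power[1:], power[1:][::-1])

# The one-sided transform keeps only the non-redundant half (N // 2 + 1 bins)
half = np.fft.rfft(x)
```

Because coherence is built from these spectra, it inherits the same redundancy, which is why the functions further down are called with `sides='onesided'`.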

The data was filtered between 0.1 and 1,000 Hz, and we use 1,000 Hz as the sampling frequency:

In [10]:
Fs = 1000

In [11]:
N = data[channels[0]].shape[0]

In [14]:
NW = 4
bandwidth = NW * (2 * Fs) / N
tapers, eigs = tsa.dpss_windows(N, NW, 2 * NW - 1, interp_from=N//10)
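
To get a feel for the numbers in this cell, here is the same arithmetic on a hypothetical recording length. The helper name `mt_params` and the value of 100,000 samples are assumptions for illustration only; the formula is copied from the cell above.

```python
def mt_params(n_samples, fs, nw):
    """Return (bandwidth in Hz, number of tapers) using the formulas above."""
    bandwidth = nw * (2 * fs) / n_samples
    n_tapers = 2 * nw - 1
    return bandwidth, n_tapers

# Hypothetical example: 100 seconds of data sampled at 1,000 Hz, with NW = 4
example_bw, example_k = mt_params(n_samples=100_000, fs=1000, nw=4)
```

With these made-up values the bandwidth comes out to 0.08 Hz with 7 tapers. A longer recording narrows the bandwidth for a fixed `NW`, while the number of tapers depends only on `NW`.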


In [24]:
import numpy as np
import dask.array as da

In [16]:
ch1 = channels[0]
ch2 = channels[1]

In [None]:
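def mt_coherence(data, ch1, ch2, tapers, eigs):
    # First draft: only the tapered spectra of the two channels.
    # The complete version is defined further down.
    d1 = data[ch1].chunk(data[ch1].shape[-1])
    d2 = data[ch2].chunk(data[ch2].shape[-1])
    sx = tsa.tapered_spectra(d1.values, tapers)
    sy = tsa.tapered_spectra(d2.values, tapers)
    return sx, sy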

In [58]:
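def get_spectra(data, ch, tapers, eigs):
    # Load the channel as a single chunk so the whole series is tapered at once
    dd = data[ch].chunk(data[ch].shape[0])
    # One tapered spectrum per DPSS taper
    ss = tsa.tapered_spectra(dd.values, tapers)
    # Adaptive weights and degrees of freedom for combining the tapers
    ww, df = tsu.adaptive_weights(ss, eigs)
    return ss, ww, df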

In [60]:
def mt_coherence(data, ch1, ch2, tapers, eigs):
    sx, wx, dfx = get_spectra(data, ch1, tapers, eigs)
    sy, wy, dfy = get_spectra(data, ch2, tapers, eigs)
    sxy = tsa.mtm_cross_spectrum(sx, sy, (wx, wy),
                                 sides='onesided')
    sxx = tsa.mtm_cross_spectrum(sx, sx, (wx, wx),
                                 sides='onesided')
    syy = tsa.mtm_cross_spectrum(sy, sy, (wy, wy),
                                 sides='onesided')
    coh = np.abs(sxy) ** 2 / (sxx * syy)
    
    # XXX Calculate jack-knife estimates of 95% confidence intervals
    
    return coh
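
To see what the coherence formula in `mt_coherence` produces, here is a simplified, NumPy-only sketch on synthetic data. It is not the nitime implementation: it swaps in sine tapers instead of DPSS tapers and weights all tapers equally instead of using adaptive weights. The function names (`sine_tapers`, `simple_mt_coherence`) and the signal parameters (a shared 40 Hz sinusoid plus independent noise) are made up for illustration.

```python
import numpy as np

def sine_tapers(n_samples, n_tapers):
    """Orthonormal sine tapers, used here as a stand-in for DPSS tapers."""
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))

def simple_mt_coherence(x, y, tapers):
    """Magnitude-squared coherence, averaging uniformly across tapers."""
    sx = np.fft.rfft(tapers * x, axis=-1)  # shape: (n_tapers, n_freqs)
    sy = np.fft.rfft(tapers * y, axis=-1)
    sxy = np.mean(sx * np.conj(sy), axis=0)   # cross-spectrum
    sxx = np.mean(np.abs(sx) ** 2, axis=0)    # auto-spectrum of x
    syy = np.mean(np.abs(sy) ** 2, axis=0)    # auto-spectrum of y
    return np.abs(sxy) ** 2 / (sxx * syy)

# Two noisy signals that share a 40 Hz component
rng = np.random.default_rng(0)
fs_demo = 1000
n_demo = 2000
t = np.arange(n_demo) / fs_demo
shared = np.sin(2 * np.pi * 40 * t)
x = shared + 0.5 * rng.standard_normal(n_demo)
y = shared + 0.5 * rng.standard_normal(n_demo)

coh_demo = simple_mt_coherence(x, y, sine_tapers(n_demo, 7))
freqs_demo = np.fft.rfftfreq(n_demo, d=1 / fs_demo)
idx_40 = np.argmin(np.abs(freqs_demo - 40))
```

The coherence is bounded between 0 and 1. In this toy case it should be close to 1 at the shared 40 Hz component (`coh_demo[idx_40]`) and low at frequencies where the two signals contain only independent noise.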

In [None]:
coh_test = mt_coherence(data, channels[0], channels[1], tapers, eigs)