# Data source basics

In ecogdata, load methods for recording files return a `Bunch` object, which is simply an unstructured data container analogous to the Matlab "struct". The electrode signals are found in the 2-dimensional "array timeseries" named `data`, which is an `ElectrodeDataSource`. `ElectrodeDataSource` defines several generic attributes, operations, and access patterns for electrode data, but it cannot be used directly. Instead, two flavors of sources exist:

* `PlainArraySource` -- this is a data source whose signal data is loaded as a numpy ndarray
* `MappedSource` -- this source provides array-like access to signal data mapped from a file

## Common features for `ElectrodeDataSource`

All sources implement a common set of attributes and access patterns. Let's look at a simple source to illustrate.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ecogdata.datasource import PlainArraySource, MappedSource
from ecogdata.expconfig import OVERRIDE, load_params
from ecogdata.filt.time import filter_array, notch_all

# a PlainArraySource uses an array in memory
array = np.arange(6 * 15).reshape(6, 15)
print(array)

In [None]:
%matplotlib inline

### Array attributes
The source object exposes a few attributes of the underlying data buffer, and exposes the buffer itself as `data_buffer`.

In [None]:
source = PlainArraySource(array)
print('Length:', len(source), 'Shape:', source.shape, 'Dims:', source.ndim, 'dtype:', source.dtype)
print('Buffer type:', type(source.data_buffer))

### Reduction methods
Sources implement a few **array reduction** methods:

In [None]:
print('min:', source.min(axis=1), 'overall:', source.min())
print('max:', source.max(axis=1), 'overall:', source.max())
print('mean:', source.mean(axis=1), 'overall:', source.mean())

The following reductions are implemented, and generally respect the `axis`, `out`, `dtype`, and `keepdims` arguments documented for [ndarray methods](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.sum.html#numpy.ndarray.sum):

* .min
* .max
* .sum
* .mean
* .std (standard deviation)
* .var (variance)

For example:

In [None]:
out = np.zeros(source.shape[1], 'd')
source.sum(axis=0, out=out)
print(out)
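The `keepdims` and `dtype` arguments are easiest to see on a plain ndarray. The sketch below uses numpy directly as a stand-in for the source's buffer; it assumes the source methods follow the same semantics, which the text above only promises "generally".

```python
import numpy as np

# Plain ndarray standing in for the source's data buffer (same values as above)
array = np.arange(6 * 15).reshape(6, 15)

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts against the original (channels, time) array
channel_means = array.mean(axis=1, keepdims=True)
print('Shape with keepdims:', channel_means.shape)

# Subtracting the per-channel mean centers every row on zero
centered = array - channel_means
print('Row means after centering:', centered.mean(axis=1))

# dtype selects the accumulator and output type of the reduction
total = array.sum(axis=0, dtype=np.float32)
print('Sum dtype:', total.dtype)
```

Keeping the reduced axis is mostly useful for this kind of per-channel normalization, where the reduced result is combined with the full array again.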

### Arbitrary array access
Sources have **array-like access** for read-out and write-in, if the source is writeable, which is always true for `PlainArraySource`.

In [None]:
print(source[2:4, 4:8])
source[2, 6:8] = -1
print('After write:')
print(source[2:4, 4:8])

### Iterator access
In various cases (especially when working with large, mapped data sources) it is convenient to access small bits of the signal data at a time. Data sources have iterator access that works over channels or over time.

* `ElectrodeDataSource.iter_blocks`: yields sequential blocks in time, possibly with overlap (*note that the final block will not necessarily have the specified block length*)
* `ElectrodeDataSource.iter_channels`: yields all time points for sequential channels
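To make the idea of overlapping time blocks concrete, here is a minimal pure-Python sketch of one plausible way to generate block boundaries. The helper `block_slices` is hypothetical and is not ecogdata's implementation; in particular, how the library treats the final partial block may differ.

```python
def block_slices(n_samples, block_length, overlap=0):
    """Yield time slices of up to block_length samples.

    Each block starts (block_length - overlap) samples after the
    previous one, so consecutive blocks share `overlap` samples.
    """
    if not 0 <= overlap < block_length:
        raise ValueError('overlap must satisfy 0 <= overlap < block_length')
    step = block_length - overlap
    start = 0
    while start < n_samples:
        # The final block is truncated at the end of the data
        yield slice(start, min(start + block_length, n_samples))
        start += step


# 15 time points, as in the example array above
slices = list(block_slices(15, block_length=6, overlap=2))
for sl in slices:
    print(sl.start, sl.stop)
```

Under this scheme, the last block is shorter than `block_length` whenever the step size does not evenly tile the signal.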

#### iter_blocks

In [None]:
help(source.iter_blocks)

In [None]:
# Forward iteration with a 2-point rewind
for n, b in enumerate(source.iter_blocks(block_length=6, overlap=2)):
    print('Block {}'.format(n + 1), b)

In [None]:
# Reverse iteration with a 2-point rewind
for n, b in enumerate(source.iter_blocks(block_length=6, overlap=2, reverse=True)):
    print('Block {}'.format(n + 1), b)

Iterators can also be built and manipulated outside of loops.

In [None]:
itr = source.iter_blocks(block_length=3, overlap=0)
print('Number of blocks:', len(itr))
next_block = next(itr)
print('First block:', next_block)

This iterator returns both the data blocks and also the slice used to pull the data, which is convenient if transformed data should be written to a different array or source.

In [None]:
itr = source.iter_blocks(block_length=3, overlap=0, return_slice=True)
# Create a duplicate source
source2 = PlainArraySource(np.zeros(source.shape, source.dtype))
# Get data and slice for n=2 (third) block
data, slicer = itr.block(2)
source2[slicer] = data ** 2
source2[:]
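The same pattern extends to a full pass over the data: transform each block and write it into the matching location of an output array. The sketch below uses plain numpy arrays and builds the index by hand; the `(channels, time)` tuple of slices is an assumption about what the returned slice looks like, not a statement of the library's return type.

```python
import numpy as np

data = np.arange(6 * 15).reshape(6, 15)
squared = np.zeros_like(data)

block_length = 3
for start in range(0, data.shape[1], block_length):
    # Index all channels and one block of time points
    # (assumed to be analogous to the slice returned by the iterator)
    slicer = (slice(None), slice(start, start + block_length))
    squared[slicer] = data[slicer] ** 2

print(np.array_equal(squared, data ** 2))
```

Because each write uses the slice that produced the block, the output stays aligned with the input even if the final block is shorter than the rest.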

#### iter_channels

Channel iteration yields all signal samples for a group of channels at a time. If `use_max_memory` is specified, then the number of channels returned per group is limited to prevent memory blow-outs. If nothing is specified, then a default of 16 channels at a time is returned.

In [None]:
help(source.iter_channels)

Example: memory mode

Temporarily set the memory limit to the size of about 40 samples of the source's integer dtype (`40 * source.dtype.itemsize` bytes). Each channel holds 15 samples, so only two channels at a time fit inside this memory limit.
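The "two channels at a time" figure can be checked with a little arithmetic. This sketch assumes the limit is converted to a channel count by integer division of the byte budget by the bytes per channel; how ecogdata actually computes the group size is not shown here, and the `itemsize` value is an assumed placeholder.

```python
n_samples = 15                 # samples per channel in the example source
itemsize = 8                   # bytes per sample; assumed value (e.g. int64)
memory_limit = 40 * itemsize   # byte budget for about 40 samples

bytes_per_channel = n_samples * itemsize
# At least one channel must be returned, even under a tiny budget
channels_per_block = max(1, memory_limit // bytes_per_channel)
print('Channels per block:', channels_per_block)
```

The result does not depend on the assumed `itemsize`, since it cancels in the division: 40 samples cover two full channels of 15 samples, with 10 samples of headroom left over.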

In [None]:
OVERRIDE['memory_limit'] = 40 * source.dtype.itemsize
print('Over-ride bytes limit:', load_params().memory_limit)
for block in source.iter_channels(use_max_memory=True):
    print('block size:', block.shape)

In [None]:
del OVERRIDE['memory_limit']
print('Normal bytes limit without over-ride:', load_params().memory_limit)

### Array filtering

These filtering methods are defined for data sources:

* `filter_array`: general IIR and FIR filtering
* `notch_filter`: filters to suppress line noise (make a "notch" in the power spectrum)
* `batch_change_rate`: anti-alias (low-pass) filtering and sample rate change for all channels; copies the result to a new source

#### `filter_array`
The `filter_array` method takes the arguments accepted by `ecogdata.filt.time.filter_array`.

In [None]:
help(filter_array)

In [None]:
source = PlainArraySource(np.random.randn(10, 200))
f_source = source.filter_array(inplace=False, design_kwargs=dict(hi=0.1, Fs=1, ord=1))
plt.figure()
lines_a = plt.plot(np.c_[source[0], source[1]], color='k', lw=2)
lines_b = plt.plot(np.c_[f_source[0], f_source[1]], color='r', lw=2)
plt.legend([lines_a[0], lines_b[0]], ('Random', 'Lowpass (20% bandwidth)'))

#### `notch_filter`

Notch filter arguments are passed on to `ecogdata.filt.time.proc.notch_all`.

In [None]:
help(notch_all)

In [None]:
# add a cosine with jittered phase and amplitude across channels
amps = np.random.rand(source.shape[0]) + 9
phases = np.random.rand(source.shape[0]) * 2 * np.pi
line_source = PlainArraySource(source[:].copy())
# Fake sampling rate of 500 Hz
line_noise = amps[:, np.newaxis] * np.cos(2 * np.pi * 60 * np.arange(source.shape[1]) / 500. + phases[:, np.newaxis])
line_source[:] = source[:] + line_noise
f_source = line_source.notch_filter(500.0, inplace=False, lines=60, nzo=4, nwid=3)
plt.figure()
lines_a = plt.plot(np.c_[source[0], source[1]], color='k', lw=1)
lines_b = plt.plot(np.c_[line_source[0], line_source[1]], color='b', lw=1)
lines_c = plt.plot(np.c_[f_source[0], f_source[1]], color='r', lw=1)
plt.legend([lines_a[0], lines_b[0], lines_c[0]], ('Random', 'High amp cosine', 'Notch filter'))
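Besides plotting the time series, a quick numerical check is to look for the line frequency in the power spectrum. The sketch below is self-contained and uses only numpy on a synthetic channel built the same way as above (500 Hz sampling, 60 Hz cosine, 200 samples); the variable names are illustrative and it does not call ecogdata.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 500.0                      # fake sampling rate, as above
n = 200
t = np.arange(n) / fs

# White noise plus a high-amplitude 60 Hz cosine
noisy = rng.standard_normal(n) + 9 * np.cos(2 * np.pi * 60 * t + 0.3)

# One-sided spectrum: frequency resolution is fs / n = 2.5 Hz
freqs = np.fft.rfftfreq(n, d=1 / fs)
power = np.abs(np.fft.rfft(noisy)) ** 2
peak_freq = freqs[np.argmax(power)]
print('Peak frequency (Hz):', peak_freq)
```

Applying the same spectrum computation to a row of `line_source` and to the corresponding row of `f_source` should show the 60 Hz peak before notch filtering and a suppressed bin afterwards.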