# Medium Dataset - Multi-Channel Timeseries with Pandas and Downsampling


---

## Overview

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold"> Visit the Index Page </p>
    This workflow example is part of a set of related workflows. If you haven't already, visit the <a href="/index.html">index</a> page for an introduction and guidance on choosing the appropriate workflow.
</div>

The intended use case for this workflow is to browse and annotate multi-channel timeseries data from an [electrophysiological](https://en.wikipedia.org/wiki/Electrophysiology) recording session. Compared to the other notebooks in this set of workflows, this one focuses on 'medium-sized' datasets, which we loosely define as having more than 100k samples while still fitting comfortably into available RAM.

Medium-sized datasets can start to slow down a browser and may require strategies like downsampling: a processing step that sends only a subsample of the data from memory to the browser for visualization. If there are many timeseries that share a common time index, we can often reduce the extra computation by applying a single index-based slice to all of them at once.
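As a minimal sketch of the shared-index idea, assume a DataFrame whose columns are channels and whose index is a common time axis in seconds. The demo data and the `strided_view` helper below are hypothetical. One label-based slice plus one stride selects the same rows for every channel at once. This plain stride is simpler than the `minmax-lttb` algorithm used later in this notebook.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical demo data: 64 channels sharing one time index sampled at 160 Hz
fs = 160.0
n_samples = 20_000
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(
    rng.standard_normal((n_samples, 64)),
    index=pd.Index(np.arange(n_samples) / fs, name="time"),
    columns=[f"ch{i}" for i in range(64)],
)


def strided_view(frame: pd.DataFrame, t_start: float, t_end: float, max_points: int = 1000) -> pd.DataFrame:
    """Slice the shared time index once, then keep every k-th row for all channels."""
    window = frame.loc[t_start:t_end]  # one label-based slice applies to every column
    step = max(1, math.ceil(len(window) / max_points))  # stride that leaves at most max_points rows
    return window.iloc[::step]


view = strided_view(df_demo, 10.0, 40.0)
print(view.shape)
```

Because the channels share one index, the cost of choosing which rows to keep is paid once rather than once per channel.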



## Prerequisites and Resources

| Topic | Type | Notes |
| --- | --- | --- |
| [Intro and Guidance](./index.ipynb) | Prerequisite | Background |
| [Time Range Annotation](./time_range_annotation.ipynb) | Next Step | Display and edit time ranges |
| [Smaller Dataset Workflow](./small_multi-chan-ts.ipynb) | Alternative | Use Pandas and downsample |
| [Larger Dataset Workflow](./large_multi-chan-ts.ipynb) | Alternative | Use dynamic data chunking |

---

## Imports and Configuration

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import zscore
import wget
from pathlib import Path

import mne

import colorcet as cc
import holoviews as hv
from holoviews.plotting.links import RangeToolLink
from holoviews.operation.datashader import rasterize
from holoviews.operation.downsample import downsample1d
import panel as pn

pn.extension()
hv.extension('bokeh')
np.random.seed(0)

## Download the data

Let's get some data! The following code downloads a dataset (2.6 MB) from a specified URL into a designated directory. It performs these steps:

1. Sets the URL for the dataset.
2. Identifies the directory to store the downloaded file.
3. Ensures the directory exists, creating it if necessary.
4. Constructs the file path by combining the directory and dataset's filename.
5. Checks if the file already exists to avoid redundant downloads.
6. Downloads and saves the file if it's not already present.

In [None]:
data_url = 'https://physionet.org/files/eegmmidb/1.0.0/S001/S001R04.edf'
output_directory = Path('./data')

output_directory.mkdir(parents=True, exist_ok=True)
data_path = output_directory / Path(data_url).name
if not data_path.exists():
    data_path = wget.download(data_url, out=str(data_path))
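If the third-party `wget` package isn't available, the same steps can be sketched with the standard library's `urllib.request.urlretrieve`. The `download_if_missing` name is a hypothetical helper for illustration, not part of any library.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url: str, output_directory: Path) -> Path:
    """Download `url` into `output_directory` unless a file with that name already exists."""
    output_directory.mkdir(parents=True, exist_ok=True)  # ensure the directory exists
    target = output_directory / Path(url).name  # directory + the filename taken from the URL
    if not target.exists():  # skip redundant downloads
        urlretrieve(url, target)  # download and save the file
    return target
```

For this notebook, a call would look like `download_if_missing('https://physionet.org/files/eegmmidb/1.0.0/S001/S001R04.edf', Path('./data'))`.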

## Read the data

Next, let's load the data into an MNE Raw object:

In [None]:
raw = mne.io.read_raw_edf(data_path, preload=True)

Let's take a look at some general information for this data.

In [None]:
print('num samples in dataset:', len(raw.times) * len(raw.ch_names))
raw

Here is the output from the previous code:

```
num samples in dataset: 1280000

General
Measurement date	August 12, 2009 16:15:00 GMT
Experimenter	Unknown
Participant	X
Channels
Digitized points	Not available
Good channels	64 EEG
Bad channels	None
EOG channels	Not available
ECG channels	Not available
Data
Sampling frequency	160.00 Hz
Highpass	0.00 Hz
Lowpass	80.00 Hz
```

So we have 64 channels of filtered 'EEG' data, sampled at 160Hz for about 2 minutes, and over a million data samples in total.
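The duration follows directly from the numbers in the summary above. Here is the arithmetic, using only values printed in that output:

```python
# Figures taken from the MNE summary above
n_total_samples = 1_280_000  # printed "num samples in dataset"
n_channels = 64              # "Good channels: 64 EEG"
sfreq_hz = 160.0             # "Sampling frequency: 160.00 Hz"

samples_per_channel = n_total_samples // n_channels
duration_s = samples_per_channel / sfreq_hz
print(samples_per_channel, duration_s)  # 20000 125.0
```

That is 20,000 samples per channel, or 125 seconds of recording.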

Let's preview the channel names, types, units, and signal ranges. MNE's `describe` method can return a Pandas DataFrame, from which we `sample` a few rows.

In [None]:
raw.describe(data_frame=True).sample(5)

## Pre-processing


### Averaging

We'll first remove some of the large noise artifacts that affect all channels by applying an average reference. The idea is to compute the average across channels at every time point, giving a single average time series, and then subtract that average from each channel's raw EEG signal.

In [None]:
raw.set_eeg_reference("average")
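Conceptually, average referencing is a per-time-point subtraction. The sketch below applies it to a plain NumPy array of shape (n_channels, n_times), the layout returned by MNE's `raw.get_data()`. The synthetic data and the `average_reference` helper are assumptions for illustration; in this notebook, `set_eeg_reference` does the real work.

```python
import numpy as np


def average_reference(data: np.ndarray) -> np.ndarray:
    """Subtract the across-channel mean at each time point from every channel.

    `data` is assumed to have shape (n_channels, n_times).
    """
    common_signal = data.mean(axis=0, keepdims=True)  # shape (1, n_times): average over channels
    return data - common_signal  # broadcasting subtracts it from every channel


rng = np.random.default_rng(0)
shared_noise = 50.0 * rng.standard_normal(1000)  # hypothetical artifact common to all channels
signals = rng.standard_normal((64, 1000)) + shared_noise
rereferenced = average_reference(signals)
```

After re-referencing, the channels sum to zero at every time point, and the large shared component is gone.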

### Clean Channel Names

The output of the `describe` method shows that the channel names correspond to commonly used standardized electrode locations (e.g. 'Cz'), but they contain some unnecessary periods, so let's strip those off.

In [None]:
raw.rename_channels(lambda s: s.strip("."));
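`str.strip(".")` removes periods only from the start and end of each name. The padded names below are hypothetical examples in the style of this recording's channels.

```python
# Hypothetical raw names padded with periods
raw_names = ["Fc5.", "Cz..", "T9..", "Iz.."]
clean_names = [name.strip(".") for name in raw_names]
print(clean_names)  # ['Fc5', 'Cz', 'T9', 'Iz']

# Interior periods are left untouched
print("a.b..".strip("."))  # a.b
```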

## *Optional*: Get Channel Locations

This step is optional, but let's add locations to the channels. MNE can assign channel locations based on standardized channel names, so we can apply a commonly used arrangement (or 'montage') of electrodes, the 10-05 system (`standard_1005`), to this data. Read more about making and setting the montage [here](https://mne.tools/stable/auto_tutorials/intro/40_sensor_locations.html#sphx-glr-auto-tutorials-intro-40-sensor-locations-py).

In [None]:
montage = mne.channels.make_standard_montage("standard_1005")
raw.set_montage(montage, match_case=False)

We can see that the 'digitized points' (locations) are now added to the raw data.

Now let's plot the channels ('sensors') on a top-down view of a head using MNE's [`plot_sensors`](https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.plot_sensors). Note that we adjust the origin and radius of the head sphere so that all the sensors fall inside the head outline.

In [None]:
sphere = (0, 0.015, 0, 0.099)  # (x, y, z, radius) in meters; y origin and radius adjusted by hand
raw.plot_sensors(show_names=True, sphere=sphere);

## Prepare the data for plotting

We'll use an MNE method, `to_data_frame`, to create a Pandas DataFrame. By default, MNE converts EEG data from volts to microvolts (µV) during this operation.

In [None]:
df = raw.to_data_frame()  # time in seconds; time_format='datetime' raised a timezone error with RangeToolLink
df.set_index('time', inplace=True)
df.head()
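To make the unit conversion concrete, here is a rough stand-in for the kind of table `to_data_frame` produces in this notebook. It assumes an (n_channels, n_times) array in volts, like the one `raw.get_data()` returns; `volts_to_microvolt_frame` is a hypothetical helper, not MNE's implementation.

```python
import numpy as np
import pandas as pd


def volts_to_microvolt_frame(data_v: np.ndarray, times_s: np.ndarray, ch_names: list) -> pd.DataFrame:
    """Build a time-indexed DataFrame in µV from a (n_channels, n_times) array in volts."""
    frame = pd.DataFrame(data_v.T * 1e6, columns=ch_names)  # one column per channel, V -> µV
    frame.insert(0, "time", times_s)  # time in seconds as the first column
    return frame.set_index("time")  # same index layout as `df` above


demo = volts_to_microvolt_frame(
    np.array([[1e-6, 2e-6], [-3e-6, 0.0]]),  # two hypothetical channels, two samples each
    np.array([0.0, 1 / 160]),
    ["A", "B"],
)
```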

## Interactive plot

At the time of writing, Pandas has no simple built-in way to track units, so we use HoloViews instead: we create dimensions annotated with a unit and then refer to these dimensions when plotting. Read more about annotating data with HoloViews [here](https://holoviews.org/user_guide/Annotating_Data.html).

In [None]:
amplitude_dim = hv.Dimension("amplitude", unit="µV")
time_dim = hv.Dimension("time", unit="s") # matches the index name in the df

Now we will loop over the columns (channels) in the DataFrame, creating a HoloViews `Curve` element from each. Since each column has a different name, we use the `redim` method to map each channel name onto the common `amplitude_dim`. We set each Curve's label to the original channel name so it still appears in the hover tooltip.

We will use HoloViews `.opts` to set the plotting options per Curve element. A couple of important options are `hover_tooltips` and `subcoordinate_y`.

The `hover_tooltips` argument is new in HoloViews 1.19.0. It lets us specify which data dimensions show up in the tooltip when hovering over a data point, and we can also include the element's `group` and `label` through the special `$group` and `$label` fields. Read more about `hover_tooltips` and related arguments [here](https://holoviews.org/user_guide/Plotting_with_Bokeh.html).

The `subcoordinate_y` argument was introduced in HoloViews 1.18.0. Setting it to True automatically distributes overlaid elements along the y-axis, each with its own distinct y-axis subcoordinate system. Read more about `subcoordinate_y` [here](https://holoviews.org/user_guide/Customizing_Plots.html#subcoordinate-y-axis).


In [None]:

curves = {}
for channel_name in df.columns:
    curve = (
        hv.Curve(
            df, kdims=[time_dim], vdims=[channel_name], group="EEG", label=channel_name
        )
        # map the channel-specific column name onto the shared amplitude dimension
        .redim(**{channel_name: amplitude_dim})
        .opts(
            subcoordinate_y=True,  # give each curve its own y sub-axis
            subcoordinate_scale=2,
            color="black",
            line_width=1,
            tools=["hover"],
            hover_tooltips=[
                ("type", "$group"),
                ("channel", "$label"),
                "time",
                "amplitude",
            ],
        )
    )
    curves[channel_name] = curve


Using a HoloViews `Overlay` container, we can now overlay all the curves on the same plot.

In [None]:

curves_overlay = hv.Overlay(curves, kdims="channel").opts(
    ylabel="channel",
    show_legend=False,
    padding=0,
    aspect=1.5,
    responsive=True,
    shared_axes=False,
    framewise=False,
    min_height=100,
)

Since there are 64 channels and over a million data samples, we'll make use of downsampling before trying to send all that data to the browser. We can use `downsample1d` imported from HoloViews. Starting in HoloViews version 1.19.0, integration with the `tsdownsample` library introduces enhanced downsampling algorithms. Read more about downsampling [here](https://holoviews.org/user_guide/Large_Data.html).
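To illustrate why min/max-based downsampling preserves features that a plain stride can miss, here is a simplified sketch of the MinMax idea: split the samples into equal-width bins and keep the positions of each bin's minimum and maximum. This is only the MinMax preselection step under our own assumptions, not the full `minmax-lttb` algorithm and not `tsdownsample`'s implementation; `minmax_indices` and the spiky test signal are hypothetical.

```python
import numpy as np


def minmax_indices(y: np.ndarray, n_out: int) -> np.ndarray:
    """Return sorted indices of the min and max sample in each of n_out // 2 equal-width bins."""
    n_bins = max(1, n_out // 2)
    edges = np.linspace(0, len(y), n_bins + 1).astype(int)  # bin boundaries in sample positions
    keep = []
    for start, stop in zip(edges[:-1], edges[1:]):
        if stop <= start:
            continue  # skip empty bins (happens when there are more bins than samples)
        segment = y[start:stop]
        keep.append(start + int(np.argmin(segment)))  # position of this bin's minimum
        keep.append(start + int(np.argmax(segment)))  # position of this bin's maximum
    return np.unique(np.asarray(keep, dtype=int))  # sorted, with duplicates removed


# Hypothetical signal: a slow sine wave with one sharp spike
y = np.sin(np.linspace(0, 20 * np.pi, 20_000))
y[12_345] = 5.0
idx = minmax_indices(y, 1000)
print(len(idx), y[idx].max())  # the spike survives downsampling
```

Because every bin contributes its extremes, the overall peaks and troughs always survive, which matters for spotting artifacts in EEG traces.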

In [None]:
curves_overlay = downsample1d(curves_overlay, algorithm='minmax-lttb')
curves_overlay

Now that we've created the main plot, let's add a secondary plot that holds a linked minimap. The minimap controls the visible range of the main plot, and because it is a Datashader-rasterized rendering of all the data, it keeps a zoomed-out overview in view while you navigate the main plot.

In [None]:
channels = df.columns
time = df.index.values

y_positions = range(len(channels))  # one row per channel
yticks = [(i, ich) for i, ich in enumerate(channels)]
z_data = zscore(df, axis=0).T  # z-score each channel, then transpose to (n_channels, n_times)
minimap = rasterize(hv.Image((time, y_positions, z_data), ["Time", "Channel"], "amplitude"))
minimap = minimap.opts(
    cmap="RdBu_r",
    colorbar=False,
    xlabel='',
    alpha=0.5,
    yticks=[yticks[0], yticks[-1]],  # label only the first and last channel
    toolbar='disable',
    height=120,
    responsive=True,
    default_tools=[],
)
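The minimap z-scores each channel so that channels with very different amplitudes share one color scale. As a quick check of what `scipy.stats.zscore` computes along `axis=0`, here is the same operation written by hand on hypothetical two-channel data:

```python
import numpy as np
from scipy.stats import zscore

rng = np.random.default_rng(0)
# Hypothetical two-channel data with very different amplitudes, shape (n_times, n_channels)
data = np.column_stack([5.0 * rng.standard_normal(1000), 200.0 * rng.standard_normal(1000)])

# Per-column z-score: subtract each column's mean, divide by its population std (ddof=0)
manual = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.allclose(manual, zscore(data, axis=0)))  # True: matches scipy's default ddof=0
```

After this scaling, both channels have zero mean and unit spread, so neither dominates the colormap.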


With the minimap created, we can now go ahead and link the minimap to the main plot using a HoloViews `RangeToolLink`. We'll also constrain the initial x-range view to a third of the duration.

In [None]:
# Link minimap widget to curves overlay plot
RangeToolLink(
    minimap,
    curves_overlay,
    axes=["x", "y"],
    boundsx=(0, time[len(time) // 3]),  # initial x-range: roughly the first third of the recording
)
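For this recording, the initial bound works out as follows. This sketch assumes, from the summary earlier, 20,000 samples per channel at 160 Hz with times starting at 0 s:

```python
import numpy as np

# Assumed from the summary above: 20000 samples per channel at 160 Hz, times starting at 0 s
sfreq_hz = 160.0
time = np.arange(20_000) / sfreq_hz

initial_right_edge = time[len(time) // 3]  # same expression as in boundsx
print(initial_right_edge, time[-1])
```

So the main plot opens on roughly the first 42 seconds of a recording that is about 125 seconds long.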

Finally, we'll lay out the main plot and the minimap in a single column. The next section uses HoloViz Panel to serve this layout as a standalone app from the command line.

In [None]:
app = (curves_overlay + minimap).cols(1)
app

## *Optional:* Standalone App

Using HoloViz Panel, we can also mark this application as servable, so it can be viewed in a browser window outside of a Jupyter Notebook, for example by running `panel serve` on this notebook from the command line.

In [None]:
template = pn.template.FastListTemplate(
    title = "Medium Multi-Channel Timeseries App",
    main = pn.Column(app, min_height=500)
).servable()