# Medium - Multi-Channel Timeseries with Downsampling


---

## Overview

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold"> Visit the Intro Page </p>
    Explore related workflows in this series. For a guided introduction and help with selecting the most suitable workflow, please visit the <a href="index.ipynb">Introduction and Selection Guide</a> page.
</div>

This workflow is tailored for processing and analyzing medium-sized multi-channel timeseries data derived from [electrophysiological](https://en.wikipedia.org/wiki/Electrophysiology) recordings.

### What Defines a 'Medium-Sized' Dataset?

A medium-sized dataset typically includes more than 100,000 samples (data points) but still fits within the available RAM without exhausting system resources. However, rendering or analyzing every sample directly in the browser can still strain performance. To address this challenge, we will employ downsampling.
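
As a rough back-of-the-envelope check of whether a recording falls into this category, the in-memory size can be estimated from the number of channels, the samples per channel, and the bytes per value. The numbers below (64 channels, 20,000 samples per channel, 8-byte floats) are illustrative assumptions, not measurements of a specific file.

```python
# Illustrative assumptions, not measured values
n_channels = 64
n_samples_per_channel = 20_000
bytes_per_value = 8  # float64

total_values = n_channels * n_samples_per_channel
size_mb = total_values * bytes_per_value / 1e6

print(f"{total_values:,} values, about {size_mb:.1f} MB as float64")
```

A dataset of this size sits comfortably in RAM, yet it has far more points than a browser plot needs to draw at once.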

### Why Downsample?

Downsampling reduces the size of a dataset by keeping only a subset of its data points, chosen according to the downsampling algorithm employed. Here, we'll make use of a downsampling algorithm called [Largest Triangle Three Buckets (LTTB)](https://skemman.is/handle/1946/15343). LTTB drops data points that do not contribute significantly to the visible shape, reducing the amount of data sent to the browser while preserving the appearance of the signal (particularly the envelope, i.e. the highest and lowest values in a region). This keeps data handling and visualization efficient without a significant loss of visual information.
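
To make the idea concrete, here is a minimal NumPy sketch of LTTB. It is a simplified illustration written for this explanation, not the implementation that HoloViews uses; the function name `lttb` and the test signal are assumptions made for the example.

```python
import numpy as np

def lttb(x, y, n_out):
    """Return indices kept by a simplified Largest Triangle Three Buckets pass."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n_out >= n or n_out < 3:
        return np.arange(n)  # nothing to downsample

    # Split the interior points into n_out - 2 buckets; first and last points are always kept
    edges = np.linspace(1, n - 1, n_out - 1).astype(int)
    selected = [0]
    for i in range(n_out - 2):
        start, stop = edges[i], edges[i + 1]
        # Third triangle corner: average of the next bucket, or the last point for the final bucket
        if i + 2 < len(edges):
            nxt = slice(edges[i + 1], edges[i + 2])
            avg_x, avg_y = x[nxt].mean(), y[nxt].mean()
        else:
            avg_x, avg_y = x[-1], y[-1]
        # First triangle corner: the point selected from the previous bucket
        ax, ay = x[selected[-1]], y[selected[-1]]
        # Twice the triangle area for every candidate point in the current bucket
        area = np.abs(
            (ax - avg_x) * (y[start:stop] - ay)
            - (ax - x[start:stop]) * (avg_y - ay)
        )
        selected.append(start + int(np.argmax(area)))
    selected.append(n - 1)
    return np.array(selected)

# Hypothetical test signal: a sine wave with one sharp spike
x = np.linspace(0, 10, 1000)
y = np.sin(x)
y[500] = 5.0
idx = lttb(x, y, n_out=50)
print(len(x), "->", len(idx), "points; spike kept:", 500 in idx)
```

Because each bucket keeps the point that forms the largest triangle with its neighbors, isolated peaks such as the spike above tend to survive the reduction.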

Downsampling is particularly beneficial when dealing with numerous timeseries sharing a common time index, as it allows for a consolidated slicing operation across all series, significantly reducing the computational load and enhancing responsiveness for interactive visualization. We'll make use of a [Pandas](https://pandas.pydata.org/docs/index.html) index to represent the time index across all timeseries.

### Introduction to MNE (MNE-Python)

[MNE (MNE-Python)](https://mne.tools/stable/index.html) is an open-source Python library designed specifically for analyzing data like EEG and MEG. In this workflow, since we are using a demo EEG dataset, we use MNE for loading, preprocessing, and conversion to Pandas. However, the data visualization section generalizes to dataset types beyond the scope of MNE, as long as you can get your data into a Pandas DataFrame with a time index and channel columns.
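
For data that does not come from MNE, the target layout could be built directly, as in the sketch below. The sampling rate, channel names, and random values are placeholders chosen for illustration; only the layout (a shared time index with one column per channel) matters.

```python
import numpy as np
import pandas as pd

# Hypothetical recording: 4 channels sampled at 160 Hz for 2 seconds
sfreq = 160
n_samples = 2 * sfreq
channel_names = ["Ch1", "Ch2", "Ch3", "Ch4"]  # placeholder names

rng = np.random.default_rng(0)
data = rng.normal(size=(n_samples, len(channel_names)))

# One shared time index (in seconds) for every channel column
time = pd.Index(np.arange(n_samples) / sfreq, name="time")
df_demo = pd.DataFrame(data, index=time, columns=channel_names)

# A single label-based slice on the index applies to all channels at once
first_half_second = df_demo.loc[0.0:0.5]
print(first_half_second.shape)
```

The single `.loc` slice at the end is the consolidated slicing operation that a shared time index makes possible.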


## Prerequisites and Resources

| Topic | Type | Notes |
| --- | --- | --- |
| [Introduction and Selection Guide](./index.ipynb) | Prerequisite | Read the foundational concepts and workflow selection assistance. |
| [Time Range Annotation](./time_range_annotation.ipynb) | Suggested Next Step | Learn to display and edit time ranges in data. |
| [Handling Smaller Datasets](./small_multi-chan-ts.ipynb) | Alternative Workflow | Use NumPy for flexibility with smaller datasets. |
| [Handling Larger Datasets](./large_multi-chan-ts.ipynb) | Alternative Workflow | Discover techniques for dynamic data chunking in larger datasets. |

---

## Imports and Configuration

The following code block imports and sets up the necessary libraries and tools, ensuring that the environment is prepared for data handling and visualization:

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import zscore
import wget
from pathlib import Path
import mne
import colorcet as cc
import holoviews as hv
from holoviews.plotting.links import RangeToolLink
from holoviews.operation.datashader import rasterize
from holoviews.operation.downsample import downsample1d
from bokeh.models import HoverTool
import panel as pn

# Extensions for visualization
pn.extension()
hv.extension('bokeh')

np.random.seed(0)

## Data Acquisition


Let's get some data! This section walks through obtaining an EEG dataset (2.6 MB). If the file doesn't already exist, it will be downloaded into a new 'data' folder in the same directory as this notebook:

In [None]:
data_url = 'https://physionet.org/files/eegmmidb/1.0.0/S001/S001R04.edf'
output_directory = Path('./data')

output_directory.mkdir(parents=True, exist_ok=True)
data_path = output_directory / Path(data_url).name
if not data_path.exists():
    data_path = wget.download(data_url, out=str(data_path))

## Loading and Inspecting the Data

Once the data is acquired, the next crucial step is to load it into an analysis-friendly format and inspect its basic characteristics:

In [None]:
raw = mne.io.read_raw_edf(data_path, preload=True)
print('num samples in dataset:', len(raw.times) * len(raw.ch_names))
raw # Could also use `raw.info`

This step confirms the successful loading of the data and provides an initial understanding of its structure, such as the number of channels and samples.

Now, let's preview the channel names, types, units, and signal ranges. This `describe` method is from MNE, and we can have it return a Pandas DataFrame, from which we can `sample` some rows.

In [None]:
raw.describe(data_frame=True).sample(5)

## Pre-processing the Data


### Noise Reduction via Averaging

Significant noise reduction is often achieved by employing an average reference, which involves calculating the mean signal across all channels at each time point and subtracting it from the individual channel signals:

In [None]:
raw.set_eeg_reference("average")
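
The arithmetic behind an average reference can be sketched with plain NumPy on a toy array. This is a conceptual illustration under the assumption of a `(n_channels, n_times)` array with arbitrary values, not MNE's implementation, which may handle additional details.

```python
import numpy as np

# Toy data with shape (n_channels, n_times); values are arbitrary
signals = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [5.0, 5.0, 5.0],
])

# Mean across channels at each time point
channel_mean = signals.mean(axis=0, keepdims=True)

# Subtract that mean from every channel
rereferenced = signals - channel_mean

# After re-referencing, the channels sum to (approximately) zero at every time point
print(rereferenced.sum(axis=0))
```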

### Standardizing Channel Names

From the output of the `describe` method, it looks like the channels are from commonly used standardized locations (e.g. 'Cz'), but contain some unnecessary periods, so let's clean those up to ensure smoother processing and analysis.

In [None]:
raw.rename_channels(lambda s: s.strip("."));
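
To see what the renaming function does in isolation, the same `str.strip` call can be applied to a few example names. The names below are illustrative examples of period-padded labels, not read from the file.

```python
# Illustrative channel names padded with trailing periods
original_names = ["Fc5.", "Cz..", "Fp1.", "Iz.."]

# Same operation as the lambda passed to `rename_channels`
cleaned_names = [name.strip(".") for name in original_names]
print(cleaned_names)
```

Note that `strip(".")` only removes periods at the start and end of a name, so a period inside a name would be left untouched.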

### Optional: Enhancing Channel Metadata

Visualizing physical locations of EEG channels enhances interpretative analysis. MNE has functionality to assign locations of the channels based on their standardized channel names, so we can go ahead and assign a commonly used arrangement (or 'montage') of electrodes ('10-05') to this data. Read more about making and setting the montage [here](https://mne.tools/stable/auto_tutorials/intro/40_sensor_locations.html#sphx-glr-auto-tutorials-intro-40-sensor-locations-py).

In [None]:
montage = mne.channels.make_standard_montage("standard_1005")
raw.set_montage(montage, match_case=False)

We can see that the 'digitized points' (locations) are now added to the raw data.

Now let's plot the channels using MNE [`plot_sensors`](https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.plot_sensors) on a top-down view of a head. Note, we'll tweak the reference point so that all the points are contained within the depiction of the head.

In [None]:
sphere=(0, 0.015, 0, 0.099) # manually adjust the y origin coordinate and radius
raw.plot_sensors(show_names=True, sphere=sphere);

## Data Visualization

### Preparing Data for Visualization

We'll use an MNE method, `to_data_frame`, to create a Pandas DataFrame. By default, MNE will convert EEG data from Volts to microVolts (µV) during this operation.

TODO: file issue about rangetool not working with datetime (timezone error)

In [23]:
df = raw.to_data_frame() # TODO: add time_format='datetime'
df.set_index('time', inplace=True) 
df.head()

| time | Fc5 | Fc3 | Fc1 | Fcz | Fc2 | Fc4 | Fc6 | C5 | C3 | C1 | ... | P8 | Po7 | Po3 | Poz | Po4 | Po8 | O1 | Oz | O2 | Iz |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0 | 8.703125 | 15.703125 | 50.703125 | 52.703125 | 43.703125 | 39.703125 | -2.296875 | -0.296875 | 17.703125 | 31.703125 | ... | -7.296875 | 5.703125 | -21.296875 | -31.296875 | -52.296875 | -25.296875 | -19.296875 | -34.296875 | -25.296875 | -25.296875 |
| 0.00625 | 46.0625 | 34.0625 | 59.0625 | 56.0625 | 43.0625 | 36.0625 | 3.0625 | 22.0625 | 31.0625 | 33.0625 | ... | 8.0625 | 18.0625 | -9.9375 | -6.9375 | -25.9375 | 6.0625 | 37.0625 | 16.0625 | 27.0625 | 24.0625 |
| 0.0125 | -13.65625 | -14.65625 | 4.34375 | -1.65625 | 0.34375 | 8.34375 | -3.65625 | -24.65625 | -7.65625 | -1.65625 | ... | 46.34375 | 41.34375 | 13.34375 | 28.34375 | 15.34375 | 45.34375 | 43.34375 | 21.34375 | 34.34375 | 36.34375 |
| 0.01875 | -3.015625 | -4.015625 | 12.984375 | -2.015625 | 2.984375 | 7.984375 | -5.015625 | 0.984375 | 9.984375 | 8.984375 | ... | 23.984375 | 2.984375 | -15.015625 | -1.015625 | -5.015625 | 21.984375 | 18.984375 | 0.984375 | 28.984375 | 19.984375 |
| 0.025 | 20.65625 | 10.65625 | 32.65625 | 12.65625 | 11.65625 | 2.65625 | -17.34375 | 13.65625 | 15.65625 | 12.65625 | ... | 21.65625 | 10.65625 | -4.34375 | 11.65625 | 2.65625 | 28.65625 | 5.65625 | -4.34375 | 31.65625 | 20.65625 |


### Creating the Main Plot

As of the time of writing, there's no easy way to track units with Pandas, so we can use a modular HoloViews approach to create and annotate dimensions with a unit, and then refer to these dimensions when plotting. Read more about annotating data with HoloViews [here](https://holoviews.org/user_guide/Annotating_Data.html).

In [24]:
amplitude_dim = hv.Dimension("amplitude", unit="µV")
time_dim = hv.Dimension("time", unit="s") # match the index name in the df

Now we will loop over the columns (channels) in the dataframe, creating a HoloViews `Curve` element from each. Since each column in the df has a different name, we will use the `redim` method to map from the channel name to the common `amplitude_dim`. We'll set the Curve label to be the original channel name so we can still see this info in the hover tooltip.

In configuring these curves, we apply the `.opts` method from HoloViews to fine-tune the visualization properties of each curve. Two significant settings are `hover_tooltips` and `subcoordinate_y`. The `hover_tooltips` option, introduced in HoloViews version 1.19.0, lets us customize the content of the tooltip that appears when hovering over data points, including the 'group' and 'label' of each element. You can explore further details on configuring `hover_tooltips` [here](https://holoviews.org/user_guide/Plotting_with_Bokeh.html).

The `subcoordinate_y` option, available since HoloViews 1.18.0, is well suited to time-aligned plots whose channels differ in amplitude. When enabled, it places each curve on its own segment of the y-axis within a single composite plot, so the channels stay visually separated while remaining directly comparable along the shared time axis. Read more about `subcoordinate_y` [here](https://holoviews.org/user_guide/Customizing_Plots.html#subcoordinate-y-axis).

In [None]:
curves = {}
for channel_name, channel_data in df.items():
    
    curve = hv.Curve(df, kdims=[time_dim], vdims=[channel_name], group="EEG", label=channel_name)

    curve = curve.redim(**{channel_name: amplitude_dim})

    curve = curve.opts(
        subcoordinate_y=True,
        subcoordinate_scale=2,
        color="black",
        line_width=1,
        tools=["hover"],
        hover_tooltips=[
            ("type", "$group"),
            ("channel", "$label"),
            ("time"),  # TODO: '@time{%H:%M:%S.%3N}'),
            ("amplitude"),
        ],
        # TODO: hover_formatters = {'time': 'datetime'},
    )
    curves[channel_name] = curve

Using a HoloViews `Overlay` container, we can now overlay all the curves on the same plot.

In [None]:

curves_overlay = hv.Overlay(curves, kdims="channel").opts(
    ylabel="channel",
    show_legend=False,
    padding=0,
    aspect=1.5,
    responsive=True,
    shared_axes=False,
    framewise=False,
    min_height=100,
)

Since there are 64 channels and over a million data samples, we'll make use of downsampling before trying to send all that data to the browser. We can use `downsample1d` imported from HoloViews. Starting in HoloViews version 1.19.0, integration with the `tsdownsample` library introduces enhanced downsampling algorithms. Read more about downsampling [here](https://holoviews.org/user_guide/Large_Data.html).

In [None]:
curves_overlay = downsample1d(curves_overlay, algorithm='minmax-lttb')
curves_overlay
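
The `'minmax-lttb'` option combines a min-max preselection step with LTTB. The sketch below illustrates only the min-max idea in plain NumPy: keep the lowest and highest value in each chunk of the signal. The function name `minmax_downsample` and the random test signal are assumptions made for this illustration; it is not the `tsdownsample` implementation.

```python
import numpy as np

def minmax_downsample(y, n_buckets):
    """Keep the index of the min and max value in each of n_buckets roughly equal chunks."""
    y = np.asarray(y, dtype=float)
    edges = np.linspace(0, len(y), n_buckets + 1).astype(int)
    keep = []
    for start, stop in zip(edges[:-1], edges[1:]):
        if stop > start:  # skip empty chunks
            chunk = y[start:stop]
            keep.extend([start + int(chunk.argmin()), start + int(chunk.argmax())])
    return np.unique(keep)  # sorted, duplicates removed

# Hypothetical test signal
rng = np.random.default_rng(0)
signal = rng.normal(size=10_000)
kept = minmax_downsample(signal, n_buckets=100)
print(len(signal), "->", len(kept), "points")
```

Since every chunk contributes its extremes, the global minimum and maximum of the signal are always among the retained points, which is what preserves the envelope.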

### Creating the Minimap Plot and Range-Link

To assist in navigating the dataset, we integrate a minimap widget. This secondary plot provides a condensed overview of the entire dataset, allowing users to quickly select and zoom into areas of interest in the main plot while keeping the zoomed-out view available for context.

We will use Datashader to rasterize the minimap image, so that the browser receives an aggregated view of the entire dataset rather than every sample. Read more about Datashader rasterization via HoloViews [here](https://holoviews.org/user_guide/Large_Data.html).

In [None]:
channels = df.columns
time = df.index.values

y_positions = range(len(channels))
yticks = [(i, ich) for i, ich in enumerate(channels)]
z_data = zscore(df, axis=0).T
minimap = rasterize(hv.Image((time, y_positions, z_data), ["Time", "Channel"], "amplitude"))
minimap = minimap.opts(
    cmap="RdBu_r",
    colorbar=False,
    xlabel='',
    alpha=0.5,
    yticks=[yticks[0], yticks[-1]],
    toolbar='disable',
    height=120,
    responsive=True,
    default_tools=[],
    cnorm='eq_hist'
    )
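
For readers unfamiliar with `zscore`, the per-channel standardization used for `z_data` can be sketched in plain NumPy. The toy DataFrame is an assumption made for illustration, and the sketch assumes the population standard deviation (`ddof=0`), which is also the default of `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd

# Toy DataFrame: rows are time points, columns are channels (values are arbitrary)
toy = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 60.0]})

# Standardize each column: subtract its mean, divide by its standard deviation (ddof=0)
values = toy.to_numpy()
z = (values - values.mean(axis=0)) / values.std(axis=0)

# Transpose so that rows correspond to channels and columns to time points
z_image = z.T
print(z_image.shape)
```

Standardizing each channel separately puts channels with very different amplitudes on a common scale, so no single channel dominates the color mapping of the minimap.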

The connection between the main plot and the minimap is facilitated by a `RangeToolLink`, enhancing user interaction by synchronizing the visible range of the main plot with selections made on the minimap. Optionally, we'll also constrain the initially displayed x-range view to a third of the duration.

In [None]:
RangeToolLink(minimap, curves_overlay, axes=["x", "y"],
              boundsx=(0, time[len(time)//3]) # limit the initial x-range of the minimap
             )

### Display the Application

Finally, we'll lay out the main plot and the minimap in a single column. In the optional section that follows, HoloViz Panel is used to make the application servable from the command line.

In [None]:
app = (curves_overlay + minimap).cols(1)
app

## *Optional:* Standalone App
This layout, combined with the capabilities of HoloViz Panel, allows for the deployment of this complex visualization as a standalone, template-styled, interactive web application (outside of a Jupyter Notebook). Read more about Panel [here](https://panel.holoviz.org/).

In [None]:
template = pn.template.FastListTemplate(
    title = "Medium Multi-Channel Timeseries App",
    main = pn.Column(app, min_height=500)
).servable()