# Demonstration of Spike Train Analysis Algorithm Replication

Wesley Borden

## Introduction

The second objective of my MS project is to "Implement and experiment with 3-5 algorithms for electrophysiology-based identification of connectome subgraphs with International Brain Lab small animal data". Here, I demonstrate a set of Python functions I have developed to replicate these algorithms, and I show simple visualizations of the identified networks.

## Setup

### Imports

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from brainbox.io.one import SpikeSortingLoader
from iblutil.util import Bunch
from one.alf.io import AlfBunch
from one.api import OneAlyx, ONE  # Docs: https://int-brain-lab.github.io/ONE/

from cc.cc import cross_correlate
# from gu.utils import *
# from te.te import *

### API

Connect to the IBL public database (OpenAlyx) through the ONE API, as demonstrated in `.../data-demo/nsp_data_demo_jwb.ipynb`.

In [None]:
one_alyx: OneAlyx = ONE(
    cache_dir="/Users/wesley/GitHub/BYU/ms-proj/tmp/one-cache",  # any directory where temporary files can be synced
    base_url="https://openalyx.internationalbrainlab.org",  # base url for the API
    password="international",  # public-access password
    silent=True,  # don't print progress, etc.
)  # type: ignore  # most 'type: ignore' comments are needed because IBL's libraries are less strictly typed

In [None]:
data_tag = "2024_Q2_IBL_et_al_BWM_iblsort"  # tag for the most recent data release
all_sessions: list = one_alyx.search(  # list of sessions
    tag=data_tag, query_type="remote"
)  # type: ignore
n_sessions = len(all_sessions)
print(f"Session count: {n_sessions}")
print(f"Session example: {all_sessions[0]}")

all_insertions: list = one_alyx.search_insertions(  # list of insertions
    tag=data_tag, query_type="remote"
)  # type: ignore
n_insertions = len(all_insertions)
print(f"Insertion count: {n_insertions}")
print(f"Insertion example: {all_insertions[0]}")

In [None]:
# Use the same insertion as the other IBL demos for consistency
pid_i = 534
pid: str = str(all_insertions[pid_i])
pid_details: tuple[str, str] = one_alyx.pid2eid(pid)
eid, p_name = pid_details

print(f"Probe ID: {pid}")
print(f"Probe Name: {p_name}")
print(f"Experiment ID: {eid}")

### Load Spike-Sorted Data

As demonstrated in `.../data-demo/nsp_data_demo_jwb.ipynb`

In [None]:
spike_loader = SpikeSortingLoader(pid=pid, one=one_alyx)

In [None]:
spike_sorting_data: tuple[AlfBunch, AlfBunch, Bunch] = spike_loader.load_spike_sorting()  # type: ignore
spikes, clusters, channels = spike_sorting_data

In [None]:
spikes_df = spikes.to_df()
spikes_df

In [None]:
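clusters_wrangled: dict = {}
for k, v in clusters.items():
    if v.ndim == 1:
        # One value per cluster: keep as a single column
        clusters_wrangled[k] = v
    elif isinstance(v, pd.DataFrame):
        # Table of per-cluster values (e.g. quality metrics): one column per table column
        for k_sub in v.columns:
            clusters_wrangled[k_sub] = v[k_sub].to_numpy()
    elif v.ndim == 2:
        # 2-D array: one column per trailing index, named "<key>_<index>"
        for idx in range(v.shape[1]):
            clusters_wrangled[f"{k}_{idx}"] = v[:, idx]
    else:
        raise ValueError(f"Unexpected {v.ndim}-D cluster attribute: {k}")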


clusters_df: pd.DataFrame = pd.DataFrame(clusters_wrangled)
clusters_df

In [None]:
channels_df = AlfBunch(channels).to_df()
channels_df

In [None]:
merged_clusters: AlfBunch = spike_loader.merge_clusters(spikes, clusters, channels)  # type: ignore

In [None]:
merged_clusters_df = merged_clusters.to_df()
merged_clusters_df

### Timeframe

As demonstrated in `.../data-demo/nsp_data_demo_jwb.ipynb`

In [None]:
start_time = 150  # seconds since beginning the electrophysiology recording
end_time = 152  # seconds since beginning the electrophysiology recording

In [None]:
# Keep spikes in the half-open window [start_time, end_time) so every spike maps to a valid bin
spikes_df_timeframe = spikes_df[
    (spikes_df["times"] >= start_time) & (spikes_df["times"] < end_time)
]
spikes_df_timeframe

### Determine Bin Size

In `.../data-demo/nsp_data_demo_jwb.ipynb`, we used high-resolution bins that slowed down processing. For spike train analysis, bin size can be tuned as a hyperparameter. Here we use 10 ms bins, which aligns with a prior study (Moore et al., 1970, Statistical Signs of Synaptic Interaction in Neurons, https://doi.org/10.1016/S0006-3495(70)86341-X).
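To make the bin-size choice concrete, the sketch below bins a handful of made-up spike times (not IBL data) at 10 ms, using the same floor-based rule as the wrangling code that follows. The helper name `bin_spike_times` is hypothetical. It also illustrates a trade-off of coarser bins: a 0/1 spike matrix records at most one spike per bin, so two spikes from the same cluster less than 10 ms apart collapse into a single 1.

```python
import numpy as np


def bin_spike_times(times, start_time, end_time, bins_per_s):
    """Count spikes per bin for spike times in the half-open window [start_time, end_time)."""
    times = np.asarray(times, dtype=float)
    # Drop spikes outside the window so every bin index is in range(n_bins)
    times = times[(times >= start_time) & (times < end_time)]
    n_bins = int(round((end_time - start_time) * bins_per_s))
    # Floor-based binning: bin k covers start_time + [k, k + 1) / bins_per_s seconds
    bin_idx = np.floor((times - start_time) * bins_per_s).astype(int)
    return np.bincount(bin_idx, minlength=n_bins)


# Hypothetical spike times (seconds) for a single cluster
spike_times = [150.001, 150.004, 150.050, 151.999, 152.0]

counts = bin_spike_times(spike_times, start_time=150, end_time=152, bins_per_s=100)
binary = (counts > 0).astype(int)  # what a 0/1 spike matrix keeps

print(len(counts))           # 200 bins of 10 ms
print(counts[0], binary[0])  # two spikes share bin 0: count 2, but binary 1
```

The spike at exactly 152.0 s falls outside the half-open window and is dropped, which keeps every bin index below `n_bins`.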

In [None]:
bins_per_s = 100  # 100 bins per second, i.e. each bin is 10 ms

### Wrangle to Clusters-by-Time Matrix

Adapted from `.../data-demo/nsp_data_demo_jwb.ipynb`

In [None]:
cluster_channel_map = (
    merged_clusters_df[["cluster_id", "channels"]]
    .copy()
    .sort_values(by="channels", ascending=True)
    .reset_index(drop=True)
    .reset_index(drop=False)
    .rename(inplace=False, columns={"index": "cluster_channel_id"})
)
cluster_channel_map

In [None]:
spikes_df_timeframe = spikes_df_timeframe.merge(
    cluster_channel_map, left_on="clusters", right_on="cluster_id", how="left"
)
spikes_df_timeframe["time_bin"] = np.floor(
    (spikes_df_timeframe["times"] - start_time) * bins_per_s
).astype(int)  # 10 ms bins (bins_per_s = 100)
spikes_df_timeframe

In [None]:
n_bins = int((end_time - start_time) * bins_per_s)
clusters_spikes_matrix = np.zeros((cluster_channel_map.shape[0], n_bins))  # type: ignore
# Flip row order against the full cluster map (not only the clusters that spike in this
# window), so row r always holds cluster_channel_id = n_clusters - 1 - r
clusters_spikes_matrix[
    (
        cluster_channel_map.shape[0] - 1
        - spikes_df_timeframe["cluster_channel_id"].values
    ),
    spikes_df_timeframe["time_bin"].values,
] = 1  # type: ignore  # 1 marks at least one spike in the bin
clusters_spikes_matrix

### Visualize Spike Trains

As demonstrated in `.../data-demo/nsp_data_demo_jwb.ipynb`

In [None]:
fig, axs = plt.subplots(figsize=(10, 8))

axs.scatter(
    spikes_df_timeframe["times"].values,  # type: ignore
    spikes_df_timeframe["cluster_channel_id"].values,  # type: ignore
    s=1,
    alpha=0.5,
    c="#000000",
    marker="s",
)

axs.set_title("Putative Neural Spikes")
axs.set_xlabel("Time (s)")
axs.set_ylabel("Putative Neuron")

### Notes

Everything up to this point has been copied or adapted from `.../data-demo/nsp_data_demo_jwb.ipynb`. Next, we use the binned spike trains to identify a biological neural network: a partial connectome made up of putative functional connections between neurons.

## Cross Correlation

Cross-correlation computes a sliding dot product of two vectors that represent parallel spike trains, producing one value per time lag. If the two spike trains are significantly correlated, some of these values stand out as outliers from the rest of the distribution. This test is implemented in `cross_correlate`, which returns a category as follows:

|Category | Meaning |
|---|---|
|  1| relationship |
|  0| no relationship |
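The `cross_correlate` function comes from the author's `cc` package and its internals are not shown here, so the following is only a minimal sketch of the idea described above, under our own assumptions. The helper names `cross_correlogram` and `has_relationship`, the lag window `max_lag` (in bins), and the rule that an "outlier" is any lag whose z-score exceeds `z_thresh` are all illustrative choices, not the author's method or the significance test used by Moore et al.

```python
import numpy as np


def cross_correlogram(a, b, max_lag):
    """Sliding dot product of two binned spike trains for lags -max_lag..max_lag.

    Entry k is sum_t a[t] * b[t + lag] with lag = lags[k], so a peak at a
    positive lag means b tends to fire after a.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    lags = np.arange(-max_lag, max_lag + 1)
    ccg = np.empty(len(lags))
    for k, lag in enumerate(lags):
        if lag >= 0:
            ccg[k] = np.dot(a[: n - lag], b[lag:])
        else:
            ccg[k] = np.dot(a[-lag:], b[: n + lag])
    return lags, ccg


def has_relationship(a, b, max_lag=10, z_thresh=3.0):
    """Return 1 if any lag's value is an outlier (z-score above z_thresh), else 0."""
    _, ccg = cross_correlogram(a, b, max_lag)
    sd = ccg.std()
    if sd == 0:
        return 0  # flat correlogram (e.g. an empty train): nothing stands out
    z = (ccg - ccg.mean()) / sd
    return int(np.any(z > z_thresh))


# Hypothetical data: a random train, and a copy delayed by 2 bins (20 ms at 10 ms bins)
rng = np.random.default_rng(0)
a = (rng.random(2000) < 0.05).astype(int)
b = np.roll(a, 2)

lags, ccg = cross_correlogram(a, b, max_lag=10)
print(lags[np.argmax(ccg)])    # peak at lag +2
print(has_relationship(a, b))  # 1
```

Comparing a train with itself gives a sharp peak at lag 0, which matches the first check in the next cell returning a relationship.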

In [None]:
print(f"Comparing a spike train to itself returns {cross_correlate(clusters_spikes_matrix[0], clusters_spikes_matrix[0])}")
print(f"Comparing a spike train to a distant spike train returns {cross_correlate(clusters_spikes_matrix[0], clusters_spikes_matrix[-1])}")

In [None]:
sample_limit = 10  # stop after reporting this many connected pairs
sample_count = 0
# Rows are ordered by channel, so compare each spike train only with the next 99 rows (nearby channels)
for i in range(len(clusters_spikes_matrix)):
    for j in range(i + 1, min(i + 100, len(clusters_spikes_matrix))):
        if cross_correlate(clusters_spikes_matrix[i], clusters_spikes_matrix[j]):
            sample_count += 1
            print(f"Spike train {i} is functionally connected to spike train {j}")
            if sample_count >= sample_limit:
                break
    if sample_count >= sample_limit:
        break
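The loop above prints only the first few connected pairs. To get the simple network visualization mentioned in the introduction, the same pairwise results can be collected into an adjacency matrix. The sketch below uses a hypothetical helper, `build_adjacency`, that accepts any test returning 1 or 0 (such as `cross_correlate`) and, like the loop, only tests pairs less than `max_offset` rows apart. Because this snippet cannot import the author's `cc` package, it runs on a tiny made-up matrix with a toy stand-in test (`toy_test`, also hypothetical) that treats identical, non-empty trains as connected; in the notebook we would pass `clusters_spikes_matrix` and `cross_correlate` instead.

```python
import numpy as np


def build_adjacency(spike_matrix, test, max_offset=100):
    """Symmetric 0/1 adjacency matrix from a pairwise test on the rows of spike_matrix.

    Only pairs (i, j) with 0 < j - i < max_offset are tested, mirroring the loop above.
    """
    n = spike_matrix.shape[0]
    adjacency = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, min(i + max_offset, n)):
            if test(spike_matrix[i], spike_matrix[j]):
                # The test is undirected, so record the edge in both directions
                adjacency[i, j] = adjacency[j, i] = 1
    return adjacency


def toy_test(a, b):
    """Hypothetical stand-in for cross_correlate: 1 if the trains are identical and non-empty."""
    return int(np.array_equal(a, b) and a.any())


toy_matrix = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
adjacency = build_adjacency(toy_matrix, toy_test)
print(adjacency)  # only trains 0 and 1 are linked
```

The resulting matrix can then be drawn with Matplotlib, for example `axs.imshow(adjacency, cmap="Greys")`, giving one row and one column per spike train.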

        
