# Data Explorer 

<a href="https://colab.research.google.com/github/neurologic/Neurophysiology-Lab/blob/main/modules/eod-stim/Data-Explorer_eod-stim.ipynb" target="_blank" rel="noopener noreferrer"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"/></a>   

<a id="toc"></a>
# Table of Contents

- [Introduction](#intro)
- [Setup](#setup)
- [Part I. Random or Non-Random?](#one)
- [Part II. What's in a trial?](#two)
- [Part III. Is it *real*?](#three)
- [Part IV. Can you hear me now?](#four)
- [Part V. ](#five)

<a id="intro"></a>
# Introduction: Experimental Design for Analysis

As you found last week (<a href='https://neurologic.github.io/Neurophysiology-Lab/modules/eod/eod_landing.html' target="_blank" rel="noopener noreferrer">Electric Organ Discharge</a>), weakly electric fish vary their EOD rate over time. Is this variation random or is there non-random structure in it? If there is non-random structure, do the fish change their EOD rate in response to something in the environment or just spontaneously? How can you determine if a stimulus evokes a response? This notebook provides a tutorial on ways to approach this kind of analysis and how it depends on experimental design. We will not inclusively cover all possible approaches, but rather focus on basic principles of trial-based experimental design and the estimation of results under <i>null</i> hypotheses. After you complete your work, think about other questions that you are interested in and what kinds of experimental design considerations you would need to implement to analyze the data. 

<a id="setup"></a>
# Setup
[toc](#toc)

Import and define functions

In [2]:
#@title {display-mode: "form" }

#@markdown Run this code cell to import packages and define functions 
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import ndimage
from scipy.signal import hilbert,medfilt,resample, find_peaks, unit_impulse
import seaborn as sns
from datetime import datetime,timezone,timedelta
pal = sns.color_palette(n_colors=15)
pal = pal.as_hex()
import matplotlib.pyplot as plt
import random

from pathlib import Path

from ipywidgets import widgets, interact
%config InlineBackend.figure_format = 'retina'
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/course-content/master/nma.mplstyle")

print('Task completed at ' + str(datetime.now(timezone(-timedelta(hours=5)))))

Task completed at 2022-08-18 19:50:04.563815-05:00


Mount Google Drive

In [None]:
#@title {display-mode: "form" }

#@markdown Run this cell to mount your Google Drive.

from google.colab import drive
drive.mount('/content/drive')

print('Task completed at ' + str(datetime.now(timezone(-timedelta(hours=5)))))

Import data digitized with *Nidaq USB6211* and recorded using *Bonsai-rx* as a *.bin* file

If you would like sample this Data Explorer, but do not have data, you can download an example from [here](https://drive.google.com/file/d/10cxBdfnEwRv77-dwcReqHyjYv-uLODe4/view?usp=sharing) and then upload the file to Google Colab (or access the file through Drive after uploading it to your Drive). If you are using this example file, the samplerate was 50000 on two channels (each channel was a set of bipolar electrodes perpendicular to each other with a fisn in the middle). 

In [3]:
#@title {display-mode: "form" }

#@markdown Specify the file path 
#@markdown to your recorded data on Drive (find the filepath in the colab file manager:

# filepath = "full filepath goes here"  #@param 
filepath = '/Users/kperks/Downloads/eod-50k2022-08-17T14_32_36.bin'

#@markdown Specify the sampling rate and number of channels recorded.

sampling_rate = 50000 #@param
number_channels = 2 #@param

downsample = False #@param
newfs = 10000 #@param

#@markdown After you have filled out all form fields, 
#@markdown run this code cell to load the data. 

filepath = Path(filepath)

# No need to edit below this line
#################################
data = np.fromfile(Path(filepath), dtype = np.float64)
data = data.reshape(-1,number_channels)
data_dur = np.shape(data)[0]/sampling_rate
print('duration of recording was %0.2f seconds' %data_dur)

fs = sampling_rate
if downsample:
    # newfs = 10000 #downsample emg data
    chunksize = int(sampling_rate/newfs)
    data = data[0::chunksize,:]
    fs = int(np.shape(data)[0]/data_dur)

time = np.linspace(0,data_dur,np.shape(data)[0])

print('Data upload completed at ' + str(datetime.now(timezone(-timedelta(hours=5)))))

duration of recording was 63.00 seconds
Data upload completed at 2022-08-18 19:50:25.551454-05:00


In [4]:
#@title {display-mode: "form"}

#@markdown Run this code cell to plot imported data. <br> 
#@markdown Use the range slider to scroll through the data in time.
#@markdown Be patient with the range refresh... the more data you are plotting the slower it will be. 

slider = widgets.FloatRangeSlider(
    min=0,
    max=data_dur,
    value=(0,1),
    step= 1,
    readout=True,
    continuous_update=False,
    description='Time Range (s)')
slider.layout.width = '600px'

# a function that will modify the xaxis range
def update_plot(x):
    fig, ax = plt.subplots(figsize=(10,5),num=1); #specify figure number so that it does not keep creating new ones
    starti = int(x[0]*fs)
    stopi = int(x[1]*fs)
    ax.plot(time[starti:stopi], data[starti:stopi,:])

w = interact(update_plot, x=slider);

interactive(children=(FloatRangeSlider(value=(0.0, 1.0), continuous_update=False, description='Time Range (s)'…

For a more extensive ***RAW*** Data Explorer than the one provided in the above figure, use the [DataExplorer.py](https://raw.githubusercontent.com/neurologic/Neurophysiology-Lab/main/howto/Data-Explorer.py) application found in the [howto section](https://neurologic.github.io/Neurophysiology-Lab/howto/Dash-Data-Explorer.html) of the course website.

<a id="one"></a>
# Part I. Event Detection

We will use "peak detection" for detecting both the stimulu pulses and the eods.

Python has built-in algorithms for detecting "peaks" in a signal. However, it will detect *all* peaks. Therefore, the function takes in arguments that specify parameters for minimum height that can count as a peak and a minimum acceptible interval between independent peaks. 

## EOD times
First, we will subtract the median of the signal, take the absolute value of the signal, and sum across all channels (if you recorded more than one). With this single combined signal, we will detect peaks. 

In [5]:
#@title {display-mode: "form"}

#@markdown Run this code cell to plot the combined signal for peak detection. 
#@markdown Use the plot to determine an appropriate detection threshold.

y = data - np.median(data)
y = np.sum(np.abs(y),1)

slider = widgets.FloatRangeSlider(
    min=0,
    max=data_dur,
    value=(0,1),
    step= 1,
    readout=False,
    continuous_update=False,
    description='Time Range (s)')
slider.layout.width = '600px'

# a function that will modify the xaxis range
def update_plot(x):
    fig, ax = plt.subplots(figsize=(10,5),num=1); #specify figure number so that it does not keep creating new ones
    starti = int(x[0]*fs)
    stopi = int(x[1]*fs)
    ax.plot(time[starti:stopi], y[starti:stopi])

w = interact(update_plot, x=slider);

interactive(children=(FloatRangeSlider(value=(0.0, 1.0), continuous_update=False, description='Time Range (s)'…

In [6]:
#@title {display-mode: "form"}

#@markdown Fill in this form with the detection threshold. 

detection_threshold = None #@param
detection_threshold = 0.02 #@param
#@markdown Then run the code cell to detect peaks (events)

y = data - np.median(data)
y = np.sum(np.abs(y),1)

d = 0.0003*fs #minimum time allowed between distinct events
r = find_peaks(y,height=detection_threshold,distance=d)

eod_times = r[0]/fs

In [7]:
#@title {display-mode: "form"}

#@markdown Run this code cell to plot the signal on each trial 
#@markdown overlaid with a scatter of EOD times detected using your threshold. 
    
slider = widgets.FloatRangeSlider(
    min=0,
    max=data_dur,
    value=(0,1),
    step= 1,
    readout=False,
    continuous_update=False,
    description='Time Range (s)')
slider.layout.width = '600px'

# a function that will modify the xaxis range
def update_plot(x):
    fig, ax = plt.subplots(figsize=(10,5),num=1); #specify figure number so that it does not keep creating new ones
    starti = int(x[0]*fs)
    stopi = int(x[1]*fs)
    ax.plot(time[starti:stopi], data[starti:stopi,:])
    ax.scatter(eod_times[(eod_times>x[0]) & (eod_times<x[1])],
               [np.median(data)] * len(eod_times[(eod_times>x[0]) & (eod_times<x[1])]),
              zorder=3,color='black',s=50)

w = interact(update_plot, x=slider);

interactive(children=(FloatRangeSlider(value=(0.0, 1.0), continuous_update=False, description='Time Range (s)'…

Once you know the times of each peak (each event), we can look at the waveforms of those events. To do this, we plot the peak of the signal at the event time and some duration before and after that peak. 

> Note: If you do not think you are detecting enough of the events or if you think you are detecting too much noise, modify your detection threshold and go through the detection steps in Part I again.

## Stimulus Times

In [None]:
# first, detect peaks on stimulus channel

# then, take dirivative of stimulus channel in the 500ms(?) before the peak and the argmax index will be the stimulus onset time.



<a id="one"></a>
# Part I. Random or Non-Random?

[toc](#toc)

We know that the EOD rate is variable (you measured this variability last week (<a href='https://neurologic.github.io/Neurophysiology-Lab/modules/eod/Data-Explorer_eod.html' target="_blank" rel="noopener noreferrer">Electric Organ Discharge</a>). Variability can be random or non-random. Are events distributed randomly in time or is there some <i>structure</i> to how events are generated?

Let's compare the time series of EOD pulses to a ***Poisson model*** with the same average rate. A Poisson Process is a model for a series of discrete events where the average time between events is known, but the exact timing of events is random. 

In [1]:
from scipy.stats import poisson

In [8]:
# get average eod rate 
average_rate = len(eod_times)/((eod_times[-1]-eod_times[0]))

In [9]:
average_rate

5.567771068525262

In [10]:
# use average to get poisson based on average
r = poisson.rvs(average_rate, size=len(eod_times))

In [31]:
1/poisson.rvs(average_rate, size=1)

array([0.5])

In [11]:
sim = [0]
for t in eod_times:
    sim.append(sim[-1]+poisson.rvs(average_rate, size=1))

array([ 6,  5,  6,  3,  8,  7,  6,  9,  1,  4,  7,  4,  5,  4,  3,  6,  7,
        3,  5,  7,  4,  9,  8,  3,  4,  7,  2,  8,  5,  3,  5,  4,  6,  2,
        9,  5,  4,  6,  7,  7,  2,  5,  4,  7,  3,  5,  2,  5,  7,  3,  7,
        4,  9,  5,  6,  6,  5,  2,  3,  5,  4,  6,  5,  4,  3,  5,  3,  8,
        6,  7,  2,  6,  1,  5, 10,  2,  1,  4,  7,  5,  3,  2,  6,  6,  6,
        7,  4,  3,  4,  5,  4,  9,  9,  6,  8,  3,  6,  4,  7,  4, 10,  6,
        6,  8,  6,  4,  3,  7,  6,  4,  6,  3,  9,  3,  6,  2,  4,  8,  3,
        5,  7,  5,  3,  5,  5,  4,  3,  8,  6,  4,  2,  6,  6,  9,  6,  7,
        4,  9,  4,  3,  3,  5,  6,  9,  3,  3,  6,  6,  7,  5, 10,  7,  3,
        4,  7,  7,  9,  5,  4,  6,  5,  5,  4,  4,  6,  3,  2,  4,  4,  8,
        6,  3,  1,  6,  4,  4, 10,  1,  3,  2,  8,  2,  7,  2, 10,  3,  5,
        4,  6,  8,  3,  2,  7,  8,  9,  2,  3,  6,  6,  4,  3,  2,  8,  6,
        5,  4,  4,  8,  7,  5,  7,  9, 10,  3,  7,  6,  5,  7,  7,  6,  5,
        6,  5,  9, 10,  8

<a id="two"></a>
# Part II. What's in a trial?

[toc](#toc)

A trial is generally defined as a portion of data (or behavior... or whatever...) that is defined by occuring at or starting at a particular moment in time and having a defined duration. A set of trials is often referred to as a *bout* if they happen contiguously. 

Practically, how do we define a trial? We need some temporal marker of the thing that we are analyzing a response to. In this case, is the fish responding to the EOD pulse of other fish? In that case, trials would be defined by the EOD times of other fish. Is the fish responding to the lights in the room turning on? In that case, trials would be defined by a transition in light switch state. Is the fish responding to our voice? ... we would need a microphone ...etc.

Trials don't have to be externally-imposed or experimentally-controlled. If we could track the position of the fish over time, then we could ask if the fish responded to being in a specific location in the tank. 

In this experiment, you presented an experimentally controlled stimulus to the fish (the dipole electric field). You recorded a copy of that stimulus command on channel 0 of your ADC. Therefore, we can use that channel to define trial start times. 

First, we need to determine when the stimulus occurred

Then, we need to determine the amplitude of the stimulus each time it was presented. 

We will end up with a table of of times corresponding to trials (rows) across stimulus amplitude (columns).

In [None]:
...

<a id="three"></a>
# Part III. Is it <i>real</i>?

[toc](#toc)

How can you tell if the response to a stimulus is 'real' or if the fish's EOD pattern resembled a response just by chance? We need to compare the trial-averaged response (and distribution of trial-averaged responses) to a <i>null</i> response distribution (this is the <i>null hypothesis</i>). 

One way that we can do this is by randomly marking "trials" for each stimulus condition distributed randomly throughout the recording time period. 

For simplicity, let's average all trials together for this first analysis. 

In [None]:
...

<a id="four"></a>
# Part IV. Can you hear me now?

[toc](#toc)

How strong does a stimulus need to be for the fish to detect it and respond to it? When a stimulus can be varied along a single parameter, we can test the response/detection threshold of an animal to that stimulus. The response/detection threshold is defined as the value of the stimulus parameter (for example amplitude or frequency) that the animal can detect or that it responds to. 

In [None]:
...

To determine if a difference in response across stimulus amplitude is "real" we can do a "trial shuffle." Shuffling trials means that we use the SIU-determined stimulus times to segment the data, but randomly shuffle the identity of the stimulus amplitude on each trial. 

In [None]:
...

<hr> 
Written by Dr. Krista Perks for courses taught at Wesleyan University.

<a id="setup"></a>

<a id="one"></a>

<a id="two"></a>

<a id="three"></a>

<a id="four"></a>