# Basics of spike train analysis

This is the first part of the course. 

**You will learn how to:** 
- Load spiking data into NumPy arrays and plot spike rasters.
- Calculate firing rates from spike trains.
- Calculate and interpret: interspike interval histograms, auto- and cross-correlograms.

Let's first import the packages we are going to use, and set up some plotting parameters.

In [None]:
%matplotlib inline
%config InlineBackend.rc={'figure.figsize': (12, 6), 'font.size': 14 }
import matplotlib.pyplot as plt
import numpy as np
from os import listdir
from os.path import isfile, join

## 1 - Loading and plotting spike trains

A spike train of a single neuron can be defined as a set of spike times. 

We will begin by learning how to load the spike times of a single neuron into a NumPy array. We will then visualize the spike train.

### Loading and plotting a single spike train

Let's first examine the spike times file. NumPy offers a very easy way to load such a file in memory.

In [None]:
example_spikes_path = 'example_spikes.txt' #path of spike train in working folder
example_spike_times = np.loadtxt(example_spikes_path) #load using the function

The file above contains the spike train of a single neuron in seconds.

In [None]:
print(example_spike_times)
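To get a feeling for what `np.loadtxt()` does, here is a minimal sketch that parses text from an in-memory string instead of a file. The spike times are made up, and the one-value-per-line layout is an assumption for illustration; the actual file may be laid out differently.

```python
import io
import numpy as np

# Hypothetical file content: one spike time (in seconds) per line
fake_file = io.StringIO("0.012\n0.250\n0.731\n1.404\n")

spike_times = np.loadtxt(fake_file)  # parses the text into a 1-D float array

print(spike_times.shape)  # (4,)
print(spike_times.dtype)  # float64
```

The result is an ordinary NumPy array, so indexing, `np.max()` and boolean masks all work on it.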

**Exercise:** Can you print the shape and the maximum element of the ```example_spike_times``` array?

*Hint:* ```np.shape()``` and ```np.max()``` may be handy!

In [None]:
### START CODE HERE ###
print(np.shape(example_spike_times))
print(np.max(example_spike_times))
### END CODE HERE ###

Can you explain what your answers mean?

**Exercise:** Since you will often have to load spike trains, you should really understand how to do it. This exercise also tests your basic understanding of passing arguments to functions. Try to fill in `load_spike_train()`:

In [None]:
def load_spike_train(spike_train_path):
    """
    Parameters
    ----------
    spike_train_path : string
        File path to spike train text file
    
    Returns
    -------
    spike_times : numpy.ndarray
        Spike times
    """
    
    ### START CODE HERE ###
    spike_times = np.loadtxt(spike_train_path)
    ### END CODE HERE ###
    
    return spike_times

In [None]:
print(load_spike_train('data_spike_trains/18_SP_C203.txt'))

Expected output:
```python
[  0.5766   2.8239   4.5523 ... 481.387  482.4371 482.4677]
```

A useful function for plotting spike trains as a **raster plot** is ```plt.eventplot()```. Let's try plotting our ```example_spike_times```:

In [None]:
plt.eventplot(example_spike_times);

The plot looks very dense, but we can focus on a specific part to see the individual spikes. We can do that with ```plt.xlim()```!

In [None]:
plt.eventplot(example_spike_times)
plt.xlim([0,10]);

Have you seen this plot before? Can you understand it?

We can also use more Matplotlib functions to make our plot prettier and more informative. Let's try that below:

In [None]:
plt.eventplot(example_spike_times,colors='black',linewidths=0.5)

#add lims-labels
plt.xlim([0,10])
plt.xlabel('Time (s)')
plt.title('Raster plot');

### Loading and plotting multiple spike trains

It is also useful to load multiple spike trains in our workspace. A way to do that is by calling ```np.loadtxt()``` for each file separately, and putting them in a list...

In [None]:
print('spikes1')
spikes1=np.loadtxt('data_spike_trains/18_SP_C101.txt')
print(spikes1)

print('spikes2')
spikes2=np.loadtxt('data_spike_trains/18_SP_C203.txt')
print(spikes2)

print('spikes3')
spikes3=np.loadtxt('data_spike_trains/18_SP_C603.txt')
print(spikes3)

print('spikeslist')
spikeslist=[spikes1, spikes2, spikes3]
print(spikeslist)

However, this gets tedious very quickly. Code that repeats itself as in the example above often indicates that there is a better way to solve the problem.

An idea is to save the paths of the relevant spike trains in a list, and access them from there!

In [None]:
path_list = ['data_spike_trains/18_SP_C101.txt', 'data_spike_trains/18_SP_C203.txt', 'data_spike_trains/18_SP_C603.txt']

spikes = np.loadtxt(path_list[2])
print(spikes)

spike_list=[np.loadtxt(path_list[0]), np.loadtxt(path_list[1]), np.loadtxt(path_list[2])]

**Exercise:** Instead of having to do ```np.loadtxt()``` all the time, we can save our multiple spike trains in memory using another list! Given a list of paths, as the one above, return a list of spike trains! Think of what should happen if the input is just a single path!

*Hint:* Make use of `np.loadtxt()` that we used above!

In [None]:
def load_spike_trains_to_list(list_of_paths):
    """
    Parameters
    ----------
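    list_of_paths : list of strings, or a single string
        File path(s) to the spike train text files
    
    Returns
    -------
    list_of_spikes : list of numpy.ndarrays
        Loaded spike trains
    """
    
    ### START CODE HERE ###
    if isinstance(list_of_paths, str):
        list_of_paths = [list_of_paths]  # wrap a single path so the loop below still works
    list_of_spikes = [np.loadtxt(spath) for spath in list_of_paths]
    ### END CODE HERE ###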
    
    return list_of_spikes

In [None]:
mypath = 'data_spike_trains/'
onlyfiles = [join(mypath, f) for f in listdir(mypath) if isfile(join(mypath, f))]
print(load_spike_trains_to_list(onlyfiles)[2])

Expected output:
```python
[3.710000e-02 5.223000e-01 5.517000e-01 ... 4.816942e+02 4.818235e+02
 4.832769e+02]
```

How does one get such a list of paths efficiently? There are different options:
- Put all the paths in a text file, and load it as a list
- Read the paths directly into a list
- ...
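As a sketch of one more option, the standard-library `pathlib` module can collect files by pattern. Note that `os.listdir()` returns names in arbitrary order, so sorting the result helps if you rely on positions in the list. The folder and file names below are hypothetical and created on the fly, only so that the example is self-contained.

```python
import tempfile
from pathlib import Path

# Build a throwaway folder with three empty files (hypothetical names)
tmpdir = Path(tempfile.mkdtemp())
for name in ['b.txt', 'a.txt', 'notes.md']:
    (tmpdir / name).touch()

# Collect only the .txt files; sorting makes the order reproducible
txt_paths = sorted(str(p) for p in tmpdir.glob('*.txt'))

print([Path(p).name for p in txt_paths])  # ['a.txt', 'b.txt']
```

Filtering by extension also skips unrelated files that happen to live in the same folder.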


When checking our result, we used the second option. Let's look closely at how it works:

In [None]:
mypath = 'data_spike_trains/' # path where all the data files are

filenames = listdir(mypath) # this function returns a list of filenames in mypath!

# we need the full path for each element (like folder/folder/file)
onlyfiles = []
for fname in filenames:
    onlyfiles.append(mypath+fname)

all_spike_trains = load_spike_trains_to_list(onlyfiles) #best place to use our function!

Now we have all our spike trains loaded in memory, in the list ```all_spike_trains```! We can now use eventplot to plot spike trains on top of one another. 

In [None]:
plt.eventplot(all_spike_trains, colors='k', linelengths=0.8, linewidths=0.5)

plt.xlim([0,32])
plt.title('Raster plot')
plt.xlabel('Time (s)')
plt.ylabel('Neuron ID');

This took quite some time. How much time exactly? The package ```time``` can help us find out. 

In [None]:
import time

Let's now use it by running the exact same code snippet. The function ```time.time()``` returns the current time in seconds.

In [None]:
startTime = time.time() # start counter 

plt.eventplot(all_spike_trains, colors='k', linelengths=0.8, linewidths=0.5)
plt.xlim([0, 32]) 

endTime = time.time() # end counter 

print('Plotting took ' + str(endTime-startTime) + ' seconds')
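For timing short snippets, `time.perf_counter()` is a possible alternative to `time.time()`: it is a high-resolution clock intended for measuring durations. A minimal sketch, where a dummy workload stands in for the plotting call:

```python
import time

start = time.perf_counter()  # high-resolution clock for measuring durations

# Dummy workload standing in for the plotting call
total = sum(range(1_000_000))

elapsed = time.perf_counter() - start
print(f'Workload took {elapsed:.4f} seconds')
```

The absolute value of `perf_counter()` is meaningless on its own; only the difference between two calls is useful.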

We wanted to focus on the first 32 s. Instead of plotting the entire length of all spike trains, we can generate a new list of shorter spike trains, and plot only that.

In [None]:
startTime = time.time() # start counter 

trains_to_plot=[]
for ii in range(0, len(all_spike_trains)):
    long_train=all_spike_trains[ii]
    trains_to_plot.append(long_train[long_train < 32])
#trains_to_plot = [all_spike_trains[ii][all_spike_trains[ii] < 32] for ii in range(0, len(all_spike_trains))]

plt.eventplot(trains_to_plot, colors='k', linelengths=0.8,linewidths=0.5)
plt.xlim([0, 32])

endTime = time.time() # end counter 

print('Plotting with short spike trains took ' + str(endTime - startTime) + ' seconds')

Can you predict what would happen to the second plot if you change the xlims to [0, 64]?

**Exercise:** Let's test some basics again! Can you plot the first 32 s of five spike trains from `all_spike_trains`? Specifically, we want the 1st, 12th, 13th, 14th and 19th neurons in the list...

*Hint:* To match the raster above, focus on looking only at the first 32 seconds. You can use either of the two methods we showed!

In [None]:
### START CODE HERE ###
Idx = [0, 11,12,13,18]
newList = [all_spike_trains[ii][all_spike_trains[ii] < 32] for ii in Idx]
plt.eventplot(newList, linelengths=0.8, colors='k',linewidths=0.5);
### END CODE HERE ###

## 2 - Firing rate estimation by binning spike trains

Spike trains almost never contain the same number of events. This fact makes their manipulation for plotting and analysis harder.
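Binning, introduced in the next subsection, is one way around this. The sketch below uses two made-up spike trains of different lengths: after counting spikes in shared time bins, both become vectors of the same length that can be stacked into a single array.

```python
import numpy as np

# Two hypothetical spike trains (seconds) with different numbers of spikes
train_a = np.array([0.05, 0.31, 0.32, 0.90])
train_b = np.array([0.15, 0.80])

edges = np.linspace(0.0, 1.0, 5)  # 4 bins of 0.25 s each

counts_a, _ = np.histogram(train_a, edges)
counts_b, _ = np.histogram(train_b, edges)

# Same length, so the two can be stacked into one (neurons x bins) array
counts = np.vstack((counts_a, counts_b))
print(counts.shape)  # (2, 4)
print(counts)
# [[1 2 0 1]
#  [1 0 0 1]]
```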

### Binning spike trains

Let's first create our bins...

In [None]:
Tmin = 0 # min time in seconds
Tmax = 16 # max time in seconds
dt = 0.01 # time bin in seconds

binedges = np.arange(Tmin, Tmax+dt, dt)  # Include last right edge (+dt)
bincenters = binedges[:-1] + dt/2 # get the centers of the bins

In [None]:
print(binedges[0:10])
print(bincenters[0:10])

...and use them to bin spikes with ```np.histogram()```:

In [None]:
frate,_ = np.histogram(example_spike_times, binedges) # do binning
print(frate[0:150])

frate = frate / dt # transform counts to rates
print(frate[0:150])

In [None]:
# plot results
plt.plot(bincenters, frate)
plt.xlim([0, Tmax])
plt.xlabel('Time (s)')
plt.ylabel('Firing rate (Hz)');
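The bin edges above were built with `np.arange()` and a non-integer step. NumPy's documentation notes that in this case the length of the output may be affected by floating-point rounding, and suggests `np.linspace()`, where the number of points is given explicitly. Here is a sketch of that alternative, using made-up spike times:

```python
import numpy as np

Tmin, Tmax, dt = 0.0, 16.0, 0.01

n_bins = int(round((Tmax - Tmin) / dt))      # 1600 bins
edges = np.linspace(Tmin, Tmax, n_bins + 1)  # exactly n_bins + 1 edges
centers = (edges[:-1] + edges[1:]) / 2

# Hypothetical spike times in seconds
spikes = np.array([0.004, 0.013, 0.017, 2.5, 15.999])

counts, _ = np.histogram(spikes, edges)
rate = counts / dt  # spikes per bin -> spikes per second (Hz)

print(edges.size, centers.size)  # 1601 1600
print(counts.sum())              # 5
```

Two spikes fall into the bin between 10 and 20 ms, so the rate in that bin is 2 / 0.01 s = 200 Hz.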

**Exercise:** Let's put what we have learned into the following function:

In [None]:
def calculate_firing_rate(spike_times, dt, Tmin, Tmax):
    """
    Parameters
    ----------
    spike_times: numpy.ndarray
        Spike train
        
    dt: float
        Bin size in seconds
        
    Tmin: float
        Starting bin edge in seconds
        
    Tmax: float
        Ending bin edge in seconds
    
    Returns
    -------
    frate: numpy.ndarray
        Firing rate in Hz

    bincenters: numpy.ndarray
        Bin centers in seconds
    """
    ### START CODE HERE ###
    binedges = np.arange(Tmin, Tmax+dt, dt)
    bincenters = binedges[:-1] + dt/2 # get the centers of the bins
    frate, _ = np.histogram(spike_times, binedges) # do binning
    frate = frate / dt # transform counts to rates
    ### END CODE HERE ###
    
    return frate, bincenters

In [None]:
#set parameters
tshowmin = 0
tshowmax = 10
dt1 = 0.01
dt2 = 0.05

#use function
frate1, centers1 = calculate_firing_rate(example_spike_times, dt1, tshowmin, tshowmax)
frate2, centers2 = calculate_firing_rate(example_spike_times, dt2, tshowmin, tshowmax)

#plot results
plt.plot(centers1, frate1)
plt.plot(centers2, frate2)
plt.xlim([0, tshowmax])

#add legend
plt.legend(('Bin size ' + str(dt1*1e3) +' ms', 'Bin size ' + str(dt2*1e3) +' ms'));

If your code works, you should see the firing rate estimated with two different bin sizes. Can you understand this? You can also play around with the parameters in the cell above in order to understand binning.
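One way to think about the bin size: a single spike in a bin of width `dt` contributes `1/dt` Hz. Small bins therefore give tall, sparse peaks, while large bins give lower, smoother values, and the average rate stays the same. A minimal sketch with made-up spike times:

```python
import numpy as np

# Hypothetical spike times in seconds, all within [0, 1)
spikes = np.array([0.023, 0.112, 0.134, 0.481, 0.527, 0.933])

results = {}
for n_bins in (100, 10):  # 10 ms bins, then 100 ms bins
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    dt = 1.0 / n_bins
    counts, _ = np.histogram(spikes, edges)
    rate = counts / dt
    # The peak rate depends on the bin size, the mean rate does not
    results[n_bins] = (rate.max(), rate.mean())
    print(n_bins, 'bins: peak', rate.max(), 'Hz, mean', rate.mean(), 'Hz')
```

With 10 ms bins every spike sits alone in its bin (peak of about 100 Hz); with 100 ms bins two spikes share a bin (peak of about 20 Hz). The mean is about 6 Hz in both cases, because there are 6 spikes in 1 s.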

## 3 - Spike train statistics

By looking at statistical properties of spike trains, we can extract useful information.

### Inter-spike interval histogram

Below, you see the inter-spike interval histogram of a single neuron:

In [None]:
#get all isis
isis_array = np.diff(example_spike_times)

#plot histogram
plt.hist(isis_array * 1e3, np.linspace(0, 30, 100))

#add lims & labels
plt.xlim([0, 30])
plt.xlabel('ISI duration (ms)')
plt.ylabel('# ISIs')
plt.title('Inter-spike interval (ISI) histogram');

What does the plot mean? How long is the absolute refractory period of a neuron?
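As a reminder of what goes into the histogram, here is a minimal sketch of the ISI computation on a short, made-up spike train. Assuming the spikes really come from a single neuron, the shortest observed ISI gives a rough upper bound on the absolute refractory period.

```python
import numpy as np

# Hypothetical spike times in seconds (sorted in time)
spikes = np.array([0.010, 0.013, 0.050, 0.052, 0.120])

# Differences between consecutive spikes, converted to ms
isis_ms = np.diff(spikes) * 1e3  # approximately [3, 37, 2, 68]

shortest_isi_ms = isis_ms.min()  # approximately 2 ms
print(np.round(isis_ms, 3))
print(np.round(shortest_isi_ms, 3))
```

An array with N spikes always yields N - 1 intervals.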

Let's try to quantify the number of ISIs below a specific interval.

**Exercise:** Estimate the % of intervals that are below ```time_interval``` (in ms) by filling in `percent_intervals(spike_times, time_interval)`

In [None]:
def percent_intervals(spike_times, time_interval):
    """
    Parameters
    ----------
    spike_times : numpy.ndarray
        Spike train

    time_interval: float
        Time interval in ms
    
    Returns
    -------
    pint : float
        Fraction of ISIs below time_interval (multiply by 100 for a percentage)
    """
    
    ### START CODE HERE ### (approx. 2 lines)
    spdiffs = np.diff(spike_times) * 1e3
    pint = np.sum(spdiffs < time_interval) / spdiffs.size
    ### END CODE HERE ###
    
    return pint

In [None]:
print('ISIs below 2 ms are', 100 * percent_intervals(example_spike_times,2), '%')

**Expected output:**
```python
ISIs below 2 ms are 1.150278293135436 %
```

### Autocorrelogram

One way to construct an autocorrelogram is to bin the spike times first. Since we are looking at individual spikes, we need a time resolution below 1 ms. We will also restrict the analysis to time lags that are relevant for spiking (tens of milliseconds).

In [None]:
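t_res = 0.0008  # time resolution (bin size) in seconds
Tlag = 0.04     # maximum time lag in seconds
Tmax = 600      # upper limit of the bin edges in seconds

Nlag = int(np.round(Tlag / t_res))  # number of lag bins, as an integer
bins = np.arange(0, Tmax, t_res)
binnedspikes, binbin = np.histogram(example_spike_times, bins)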

Because NumPy's built-in functions are not well suited for this purpose, we will import an external library, `pycorrelate`.

In [None]:
import pycorrelate as pyc

Now we can use the function ```pyc.ucorrelate()``` to calculate the autocorrelogram. Let's plot the output.

In [None]:
ac = pyc.ucorrelate(binnedspikes, binnedspikes, Nlag)
plt.plot(ac);

What does the peak at 0 mean? In order to understand the auto-correlogram better, it would be nice to remove the peak and plot it in a symmetric fashion!

In [None]:
peakremoved = ac[1:]
ytoplot = np.hstack((np.flip(peakremoved), 0, peakremoved))
ytoplot = ytoplot / ac[0] # normalize by the zero-lag value

xtoplot = np.arange(1, Nlag) * t_res * 1e3 # positive lags in ms
xtoplot = np.hstack((-np.flip(xtoplot), 0, xtoplot))

plt.plot(xtoplot, ytoplot)

# Setup plot appearance
plt.title('Autocorrelogram')
plt.xlabel('Time lag (ms)')
plt.ylabel('Correlation (norm.)')
plt.xlim((-Tlag * 1e3, Tlag * 1e3));
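To see where the zero-lag peak comes from, here is a sketch that computes a one-sided autocorrelogram with plain NumPy on a tiny, made-up binned train. With at most one spike per bin, the value at lag 0 is simply the total number of spikes, since every spike coincides with itself. `np.correlate()` is used only for illustration; it is not the function used in the cells above.

```python
import numpy as np

# Hypothetical binned spike train (0 or 1 spike per bin)
binned = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])

# Full autocorrelation; element number (size - 1) corresponds to lag 0
full = np.correlate(binned, binned, mode='full')
zero_lag_index = binned.size - 1
ac_demo = full[zero_lag_index:]  # keep lags 0, 1, 2, ...

print(ac_demo[0], binned.sum())  # 4 4
print(ac_demo[:4])               # [4 0 1 2]
```

The value at lag 3 is 2 because two pairs of spikes are exactly three bins apart (bins 0 and 3, bins 5 and 8).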

In order to get the autocorrelogram in a nice format, we had to call multiple functions. Often, it helps to define a new function that can do the calculations for us.

**Exercise:** Make the autocorrelogram calculation into a function! 

*Bonus:* think of how to add an extra option in the function that allows it to plot the result as well...

In [None]:
def calculate_auto_correlogram(spike_train, time_resolution, time_lag):
    """
    Parameters
    ----------
    spike_train : numpy.ndarray
        Array of spike times in seconds

    time_resolution : float
        Resolution in seconds

    time_lag : float
        Number of seconds of time lag
    
    Returns
    -------
    yvals: numpy.ndarray
        Normalized autocorrelogram values (zero-lag bin set to 0)

    xvals: numpy.ndarray
        Time lags in seconds
    """
    ### START CODE HERE ###
    Nlag = int(np.round(time_lag / time_resolution))  # number of lag bins
    bins = np.arange(0, np.max(spike_train), time_resolution)
    binnedspikes, binbin = np.histogram(spike_train, bins)
    ac = pyc.ucorrelate(binnedspikes, binnedspikes, Nlag)
    # mirror the positive lags, blank the zero-lag peak, normalize by it
    yvals = np.hstack((np.flip(ac[1:]), 0, ac[1:])) / ac[0]
    xvals = np.arange(1, Nlag) * time_resolution
    xvals = np.hstack((-np.flip(xvals), 0, xvals))
    ### END CODE HERE ###

    return yvals, xvals

In [None]:
py, px = calculate_auto_correlogram(all_spike_trains[2], 1e-3, 40e-3)
print(py)

**Expected output:**
```python
[0.05465071 0.06013122 0.0547279  0.05681204 0.06283288 0.05735237
 0.05943651 0.05974527 0.0650714  0.06229255 0.06275569 0.06530297
                        # ...
 0.05943651 0.05735237 0.06283288 0.05681204 0.0547279  0.06013122
 0.05465071]
```

Let's plot the result of your function:

In [None]:
plt.plot(px, py)
plt.xlim((-40e-3, 40e-3));
plt.title('Autocorrelogram')
plt.xlabel('Time lag (s)')
plt.ylabel('Correlation (norm.)');

### Crosscorrelogram

We can also correlate the spike trains of two different neurons. The resulting function is called the crosscorrelogram. By examining the crosscorrelogram, we can infer whether two neurons tend to fire together, or whether one tends to fire at a consistent delay after the other.
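As a toy illustration, suppose a second neuron fires a fixed number of bins after the first one. The sketch below uses made-up binned trains and `np.correlate()` (an assumption for this example, not the function used in this notebook) to show that the correlation across lags then peaks at that delay.

```python
import numpy as np

# Hypothetical binned train of neuron 1 (0 or 1 spike per bin)
binned1 = np.array([0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])

# Neuron 2 fires `delay` bins after neuron 1
delay = 2
binned2 = np.roll(binned1, delay)

# Correlate at all lags; element number (n - 1) of the output is lag 0
n = binned1.size
full = np.correlate(binned2, binned1, mode='full')
lags = np.arange(-(n - 1), n)

peak_lag = lags[np.argmax(full)]
print(peak_lag)  # 2
```

A positive peak lag here means that neuron 2 tends to fire after neuron 1; a peak at a negative lag would mean the opposite order.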

We will now fill in the following function, which calculates the crosscorrelogram between `spike_train_1` and `spike_train_2`. It should be similar to the autocorrelogram function above.

In [None]:
def calculate_cross_correlogram(spike_train_1, spike_train_2, time_resolution, time_lag):
    """
    Parameters
    ----------
    spike_train_1 : numpy.ndarray
        Array of spike times in seconds
    
    spike_train_2 : numpy.ndarray
        Array of spike times in seconds
    
    time_resolution : float
        Resolution in seconds

    time_lag : float
        Number of seconds of time lag
    
    Returns
    -------
    yvals: numpy.ndarray
        Crosscorrelogram values (coincidence counts per lag)

    xvals: numpy.ndarray
        Time lags in seconds
    """
    
    Nlag = int(np.round(time_lag / time_resolution))  # number of lag bins
    bins = np.arange(0, np.max((np.max(spike_train_1), np.max(spike_train_2))), time_resolution)
    binnedspikes1, bb = np.histogram(spike_train_1, bins)
    binnedspikes2, bb = np.histogram(spike_train_2, bins)
    ac1 = pyc.ucorrelate(binnedspikes1, binnedspikes2, Nlag)  # lags >= 0
    ac2 = pyc.ucorrelate(binnedspikes2, binnedspikes1, Nlag)  # mirrored to give lags < 0
    yvals = np.hstack((np.flip(ac2[1:]), ac1))

    xvals = np.arange(1, Nlag) * time_resolution
    xvals = np.hstack((-np.flip(xvals), 0, xvals))

    return yvals, xvals

If our function works, we should be able to plot the crosscorrelogram of the 1st and 9th neuron.

In [None]:
neuronIdx = [0, 8]

spktrain1 = all_spike_trains[neuronIdx[0]]
spktrain2 = all_spike_trains[neuronIdx[1]]

py, px = calculate_cross_correlogram(spktrain1, spktrain2, 1e-3, 80e-3)

plt.plot(px, py)

plt.xlim((-80e-3, 80e-3));
plt.ylim([0, np.max(py)]);
plt.title('Crosscorrelogram')
plt.xlabel('Time lag (s)')
plt.ylabel('Coincidences (count)');

Can we understand the cross-correlogram? Let's examine the spike trains...

In [None]:
newList = [all_spike_trains[ii][all_spike_trains[ii] < 16] for ii in neuronIdx]

plt.eventplot(newList, linelengths=0.8,colors='black',linewidths=0.5);

**Exercise:** Using the function from above, calculate and plot the cross-correlogram between the 3rd and 12th spike trains of ```all_spike_trains```. How do you interpret what you see?

In [None]:
### START CODE HERE ### (approx. 2 lines)
py, px = calculate_cross_correlogram(all_spike_trains[2], all_spike_trains[11], 1e-3, 100e-3)
plt.plot(px, py);
### END CODE HERE ###

Well done!