# Multielectrode Data



We're going to load the multielectrode array data from a CSV file and work with them in pandas.

## About the data

The data used here are from a study published by Snyder, Morais, Willis, and Smith (2015). The goal of the study was to relate neural activity across different scales, from single units (single neurons) to whole-brain networks. To do this, the researchers recorded simultaneously from a 96-electrode array implanted in the brain of rhesus macaque monkeys and from EEG electrodes on the scalp. Here we will work only with the invasive 96-electrode recordings, not the EEG.

Of interest is spike count correlation, a measure of neural functional connectivity. Functional connectivity is of widespread interest in neuroscience, and essentially refers to correlations in activity between different brain areas. If different brain areas (or individual neurons) show correlated activity, it is likely that they work together in some way. This is particularly true if their functional connectivity changes as a result of experimental manipulations. For example, if two areas (or neurons) show stronger correlation during a particular task than during a control condition, then we might infer that their functional connectivity is related to their involvement in the task. Indeed, in their introduction Snyder and colleagues note that spike count correlations are structured and modulated by both perceptual and cognitive manipulations.
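To make "spike count correlation" concrete, here is a minimal sketch: count the spikes each of two neurons fires on every trial, then compute the Pearson correlation of those counts across trials. The counts below are made-up numbers for illustration, not values from the Snyder et al. data, and using `np.corrcoef` (Pearson's r) is an assumption about how the measure is computed, not necessarily the exact procedure used in the study.

```python
import numpy as np

def spike_count_correlation(counts_a, counts_b):
    """Pearson correlation between two neurons' spike counts across trials.

    counts_a, counts_b: 1-D sequences with one spike count per trial,
    in the same trial order for both neurons.
    """
    counts_a = np.asarray(counts_a, dtype=float)
    counts_b = np.asarray(counts_b, dtype=float)
    if counts_a.shape != counts_b.shape:
        raise ValueError("both neurons need one count per trial")
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(a, b)
    return np.corrcoef(counts_a, counts_b)[0, 1]

# Hypothetical spike counts for two neurons on six trials
neuron_1 = [12, 15, 9, 20, 14, 11]
neuron_2 = [10, 14, 8, 18, 13, 9]
r_sc = spike_count_correlation(neuron_1, neuron_2)
print(f"spike count correlation: {r_sc:.2f}")
```

A value near +1 means the two neurons tend to fire more (or less) on the same trials; a value near 0 means their trial-to-trial fluctuations are unrelated.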

## Recording
The data were recorded from a Utah intracortical multielectrode array (Maynard, Nordhausen, & Normann, 1997) implanted in area V4 of the visual cortex.
The Utah array is a type of microelectrode array consisting of a 10 × 10 grid of silicon microelectrodes. Its base rests on the cortical surface of a living animal while the electrodes penetrate into the cortex, covering an area of approximately 16 mm². Each electrode is ~1 mm long, and the tip of each electrode records electrical voltage. In general, the electrodes do not penetrate individual neurons (and if they do, they likely destroy those cells), so each electrode records signals from a small population of surrounding neurons. The strength of a signal recorded by an electrode drops off with the inverse of the distance between the signal's source and the electrode (i.e., 1/distance). This has two important implications:

- The electrodes are spaced far enough apart that each one produces a measurement distinct from those of the other electrodes, because signal strength drops off rapidly with distance.
- However, because the electrodes are still relatively close together, and neurons are located between electrodes, the spiking activity of any single neuron will typically be detected by more than one electrode.
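As a rough worked example of the 1/distance fall-off: suppose, hypothetically, that a neuron sits 50 µm from one electrode and 350 µm from its neighbor. Under the simple 1/distance model, the nearer electrode sees a signal 7 times stronger, yet the farther one still picks up a nonzero signal, which is why the same neuron's spikes can show up on more than one channel. The distances and the arbitrary reference distance are illustrative assumptions, not properties of this data set.

```python
def relative_amplitude(distance_um, reference_um=1.0):
    """Signal strength under a simple 1/distance fall-off model.

    Returns the amplitude relative to a source at `reference_um`
    (an arbitrary, illustrative reference distance).
    """
    if distance_um <= 0:
        raise ValueError("distance must be positive")
    return reference_um / distance_um

# Hypothetical neuron located between two neighboring electrodes
near = relative_amplitude(50.0)    # electrode 50 micrometers away
far = relative_amplitude(350.0)    # electrode 350 micrometers away
print(f"near/far amplitude ratio: {near / far:.1f}")
```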

Because we are interested in the behavior of neurons, not in anything inherent about the electrodes used for measurement, it is common practice to apply spike sorting to multiunit data. Spike sorting is a computational process that takes microelectrode array data as input and returns the spike times of individual neurons as output. It is applied to the data after recording, to create a separate spike train for each neuron.

Spike sorting is an ill-posed inverse problem: many different combinations of neurons could have produced the same recorded signals, so the results of spike sorting depend on the algorithm used and may not be entirely accurate. Nonetheless, spike sorting algorithms are by now well established and reasonably trustworthy. In the present data, spike sorting has already been performed, so the data are treated as coming from individual neurons.
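To give a feel for what spike sorting involves, here is a heavily simplified toy sketch for a single channel: detect spikes as upward threshold crossings, measure each spike's peak amplitude, then split the spikes into two putative units with a one-dimensional k-means. Everything here is an assumption for illustration (the synthetic noise-free trace, the threshold, the window length, and the choice of exactly two units). Real spike sorters use full waveform shapes, multiple channels, and far more robust methods, and this is not the algorithm that was applied to the present data.

```python
import numpy as np

def detect_spikes(voltage, threshold):
    """Indices where the trace crosses upward through `threshold`."""
    above = voltage > threshold
    # A crossing is a sample above threshold whose previous sample was not
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1

def peak_amplitudes(voltage, onsets, window=5):
    """Maximum voltage in a short window starting at each spike onset."""
    return np.array([voltage[i:i + window].max() for i in onsets])

def kmeans_1d(values, n_iter=20):
    """Split 1-D values into two clusters; returns a label (0 or 1) per value."""
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each value to its nearest center, then move each center to its cluster mean
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels

# Synthetic trace: flat baseline with small spikes (height 5) and large ones (height 10)
voltage = np.zeros(200)
small_onsets = [20, 80, 140]
large_onsets = [50, 110, 170]
voltage[small_onsets] = 5.0
voltage[large_onsets] = 10.0

onsets = detect_spikes(voltage, threshold=3.0)
amps = peak_amplitudes(voltage, onsets)
labels = kmeans_1d(amps)
for unit in (0, 1):
    print(f"unit {unit}: spike samples {onsets[labels == unit].tolist()}")
```

Even in this toy case, the result depends on choices such as the threshold and the number of units, which is one reason different sorting algorithms can disagree on real data.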



~~~python
# Data retrieval: download and unzip the data set if it isn't already present
import os
import zipfile

import requests

fname = "data.zip"
url = "https://osf.io/kftxc/download"

if not os.path.isdir('data'):
    if not os.path.isfile(fname):
        try:
            r = requests.get(url)
        except requests.ConnectionError:
            print("!!! Failed to download data !!!")
        else:
            if r.status_code != requests.codes.ok:
                print("!!! Failed to download data !!!")
            else:
                with open(fname, "wb") as fid:
                    fid.write(r.content)
    # Extract only if the zip file exists (i.e., the download succeeded)
    if os.path.isfile(fname):
        with zipfile.ZipFile(fname) as zf:
            zf.extractall()
~~~

## Import packages

~~~python
import numpy as np
import pandas as pd
~~~

## Set known experiment parameters

~~~python
# Trial timing in seconds, relative to stimulus (grating) onset
trial_start_time = -0.150
grating_on_time  = 0.0
grating_off_time = 2.0
trial_end_time   = 2.5
~~~
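With these boundaries, each spike time can be assigned to a trial epoch. One way to do this (a sketch, not a step from the original analysis) is `pd.cut`, using the four times above as bin edges. The example spike times are made up, and calling the last window "post-stimulus" is our own label. With `right=False` each bin includes its start but not its end, so a spike exactly at `trial_end_time` would get no epoch (NaN).

```python
import pandas as pd

# Trial timing in seconds relative to stimulus onset (same values as above)
trial_start_time = -0.150
grating_on_time = 0.0
grating_off_time = 2.0
trial_end_time = 2.5

edges = [trial_start_time, grating_on_time, grating_off_time, trial_end_time]
epoch_names = ['fixation', 'stimulus', 'post-stimulus']  # hypothetical labels

# Duration of each epoch, in seconds
durations = pd.Series([b - a for a, b in zip(edges[:-1], edges[1:])], index=epoch_names)
print(durations)

# Made-up spike times; right=False makes each bin [start, end)
spike_times = pd.Series([-0.1, 0.0, 0.5, 1.99, 2.0, 2.3])
epochs = pd.cut(spike_times, bins=edges, labels=epoch_names, right=False)
print(pd.DataFrame({'time': spike_times, 'epoch': epochs}))
```

Knowing each epoch's duration is what lets spike counts be compared across epochs of different lengths (for example, as spikes per second).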

## Import the data


~~~python
df = pd.read_csv('data/multielectrode_data.csv')
~~~

### Exploring the data 


These data are again in **long format**: they are sparse, with one row for each spike. Let's look at the head of the data to get oriented:
~~~python
df.head()
~~~

The columns are:
- `channel`: which electrode the data came from
- `time`: spike time in seconds, relative to stimulus onset (negative values are spikes that occurred during the fixation period before stimulus onset)
- `orientation`: orientation of the grating stimulus, in degrees (0 or 90)
- `trial`: trial number (1150 trials for each orientation)
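Because the data are in long format with one row per spike, the number of spikes a channel fired on a given trial is simply the number of rows for that channel/orientation/trial combination. The sketch below shows this with `groupby(...).size()` on a tiny made-up DataFrame with the same four columns (the values are invented, not taken from the real file). Note that a trial on which a channel fired no spikes has no rows at all, so it is missing from the counts rather than appearing as zero.

```python
import pandas as pd

# Tiny, made-up example in the same long format: one row per spike
toy = pd.DataFrame({
    'channel':     [5, 5, 5, 5, 7],
    'time':        [-0.05, 0.12, 0.80, 0.33, 1.50],
    'orientation': [0, 0, 0, 90, 0],
    'trial':       [1, 1, 2, 1, 1],
})

# Count rows (= spikes) for each channel/orientation/trial combination
spike_counts = toy.groupby(['channel', 'orientation', 'trial']).size()
print(spike_counts)
```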

We can see how many rows there are in the DataFrame (as well as the number of columns, but we could already see that in this case):
~~~python
df.shape
~~~

### Electrodes
Let's see how many electrodes we have data from, and what their labels are. We save both as variables, which will come in handy later when looping over the channels.

~~~python
num_chan = len(df['channel'].unique())
print('Number of electrodes (channels): ' + str(num_chan))

channels = sorted(df['channel'].unique())  # use the sorted() function so the channels are listed sequentially
print('Channel labels: ' + str(channels))
~~~

This is a bit weird — we're told this is a 96 electrode array, but there are only 20 electrodes?!  

This is because the full data set is huge, with over 2 million rows. The amount of memory that this requires makes doing anything with the data quite slow. So we've provided data for a subset of channels for the purposes of this tutorial.

### Orientations
What about orientations?

~~~python
orientations = sorted(df['orientation'].unique())
num_ortns = len(orientations)
print('Found ' + str(num_ortns) + ' orientations, which are: ' + str(orientations))
~~~

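Dividing the number of rows by the number of channels and the number of orientations gives the average number of spikes recorded per channel for each orientation:
```python
# Average number of spikes (rows) per channel per orientation
df.shape[0] / num_chan / num_ortns
```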
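That division gives only an average over all channel/orientation combinations. To see the actual number of spikes for each combination, one option is `pd.crosstab`, which builds a table of counts with one row per channel and one column per orientation. It is shown here on a small made-up DataFrame so that it runs without the downloaded file.

```python
import pandas as pd

# Made-up spikes: one row per spike, as in the real data
toy = pd.DataFrame({
    'channel':     [5, 5, 5, 7, 7, 9],
    'orientation': [0, 0, 90, 0, 90, 90],
})

# Rows = channels, columns = orientations, cells = number of spikes (0 if none)
counts = pd.crosstab(toy['channel'], toy['orientation'])
print(counts)

# Averaging the cells recovers rows / channels / orientations for this toy data
print(counts.values.mean())
```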

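Next, let's select the spike times from every trial of one orientation (0°) for a single channel (channel 42), storing each trial's spike times as a separate array in a list: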
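```python
# Select spikes from channel 42 on trials with the 0-degree orientation
mask = (df.loc[:, 'orientation'] == 0.0) & (df.loc[:, 'channel'] == 42.0)
new_df = df[mask]

# Collect the spike times from each trial into a list of arrays (one array per trial).
# Because there is one row per spike, trials on which this channel fired no spikes
# do not appear here at all.
trials = new_df['trial'].unique()
neural_data = []
for trial in trials:
    trial_data = new_df[new_df['trial'] == trial]['time'].values
    neural_data.append(trial_data)
len(neural_data)  # number of trials with at least one spike
```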
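The same list can be built with `groupby`, which collects each trial's spike times in a single pass instead of re-filtering the DataFrame once per trial. The sketch below uses made-up data for one channel and orientation; it also counts the spikes falling in the stimulus window (0 to 2 s, from the experiment parameters) and converts them to firing rates. Two differences from the loop above: `groupby` orders trials by trial number, whereas `unique()` keeps them in order of first appearance; and `reindex` is used so that a trial whose spikes all fall outside the window gets a count of 0 instead of disappearing.

```python
import pandas as pd

grating_on_time, grating_off_time = 0.0, 2.0  # stimulus window, in seconds

# Made-up spikes for one channel and one orientation (one row per spike)
new_df = pd.DataFrame({
    'trial': [1, 1, 1, 2, 2, 3],
    'time':  [-0.10, 0.50, 1.20, 0.30, 2.20, -0.05],
})

# One array of spike times per trial, in trial-number order
neural_data = [group['time'].values for _, group in new_df.groupby('trial')]

# Spikes per trial during the stimulus [on, off), converted to spikes per second
in_window = (new_df['time'] >= grating_on_time) & (new_df['time'] < grating_off_time)
counts = (new_df[in_window].groupby('trial').size()
          .reindex(sorted(new_df['trial'].unique()), fill_value=0))
rates = counts / (grating_off_time - grating_on_time)
print(rates)
```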

## Quick digression into matplotlib

<img src='https://matplotlib.org/stable/_static/logo2_compressed.svg' alt='Matplotlib' width=225>

[Matplotlib](https://matplotlib.org/) is, effectively, the core plotting and data visualization package in Python. Many other packages rely on Matplotlib for data visualization; for example, pandas' built-in plotting methods use it by default. Matplotlib is not the only visualization package in Python, by any means. There are many others, including [seaborn](https://seaborn.pydata.org), [Altair](https://altair-viz.github.io), [ggpy](http://yhat.github.io/ggpy/), [Bokeh](https://docs.bokeh.org/en/latest/index.html), and [plot.ly](https://plot.ly). Some of these are built on top of Matplotlib and simplify the syntax for creating specific, complex types of graphics relative to what's required in Matplotlib itself (these are called **wrappers** for Matplotlib). Others are entirely independent. Regardless, Matplotlib is the most widely used and flexible package for data visualization in Python, so it's valuable to learn it first and then build out your skills from there.

Matplotlib is also a very mature Python package, having been first released in 2003 and continuously updated since then. It has a strong development community, a detailed website with extensive documentation and many examples, and copious third-party documentation in the form of blog posts, books, and more, much of which is freely available.


## Importing Matplotlib

We have previously covered how to import a Python package using the `import` command. We also covered how to import a package with an alias, using the syntax `import [package] as [alias]`.

For Matplotlib, we will do this again, but we add an extra detail: Matplotlib, like many Python packages, is organized into a number of "modules" (essentially subsets of functions). The one that you will typically want to import for plotting is called `pyplot`. So we use the syntax below:

~~~python
import matplotlib.pyplot as plt
~~~

## Generating a Plot
Now we can draw a simple line plot using `matplotlib.pyplot`'s `plot()` function. We first create two NumPy arrays: `x`, containing 100 evenly spaced values from -π to π, and `y`, containing the sine of each of those values:

~~~python
x = np.linspace(-np.pi,np.pi,100)
y = np.sin(x)
plt.plot(x,y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine curve');
~~~
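The `plt.*` functions above always act on the "current" figure and axes. Matplotlib also offers an object-oriented style, in which you create the figure and axes explicitly and call methods on the axes object; many people prefer it once a figure has more than one panel. Here is the same sine plot in that style (methods such as `set_xlabel` are the axes-level counterparts of `plt.xlabel`).

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()   # create a figure with a single set of axes
ax.plot(x, y)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Sine curve');
```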

* Specifying axes limits
```python 
x = np.linspace(-np.pi,np.pi,100)
y = np.sin(x)
plt.plot(x,y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine curve')
plt.xlim([0,3])
plt.ylim([-0.5,1.5]);
```

* Setting ticks on axes
```python 
x = np.linspace(-np.pi,np.pi,100)
y = np.sin(x)
plt.plot(x,y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine curve')
plt.xticks([-2,0,2])
plt.yticks([-1,0,1]);
```
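`plt.xticks` also accepts a second argument giving the text to display at each tick position. For a sine curve, labeling the x ticks in multiples of π is often clearer; the positions and labels below are just one example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)
plt.figure()  # start a new figure
plt.plot(x, np.sin(x))
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine curve')
# First argument: tick positions; second argument: the labels shown at those positions
plt.xticks([-np.pi, -np.pi / 2, 0, np.pi / 2, np.pi],
           [r'$-\pi$', r'$-\pi/2$', '0', r'$\pi/2$', r'$\pi$'])
plt.yticks([-1, 0, 1]);
```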

* Controlling the color, line width, and line style of plots <br>
Built-in single-letter color codes: <br>
b: blue <br>
g: green <br>
r: red <br>
c: cyan <br>
m: magenta <br>
y: yellow <br>
k: black <br>
w: white <br>
You can also specify a color as an RGB tuple `(r, g, b)`, with each value between 0 and 1, or as a hex color code string such as `'#ff0000'`.
```python
linestyles = ['-', "--", ":"]

x = np.linspace(-np.pi,np.pi,100)

plt.plot(x, np.sin(x), c = (1,0,0), ls = linestyles[1], linewidth=3.0)
plt.plot(x, np.cos(x), c = (0,1,0), ls = linestyles[2], linewidth=1.0)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Basic plotting')
plt.xticks([-2,0,2])
plt.yticks([-1,0,1]);
```
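As a shorthand, `plt.plot()` also accepts a format string as its third positional argument, combining a color code from the list above with a line style: `'r--'` is a red dashed line and `'g:'` a green dotted one. The sketch below draws the two curves above this way; line widths still have to be given as a keyword. Note that `'g'` is a darker green than the pure `(0, 1, 0)` used in the previous example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)

plt.figure()  # start a new figure
# 'r--' = red dashed line; 'g:' = green dotted line
plt.plot(x, np.sin(x), 'r--', linewidth=3.0)
plt.plot(x, np.cos(x), 'g:', linewidth=1.0)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Basic plotting');
```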

_Adding figure legends_ <br>
Add a `label` keyword argument to `plt.plot()`, e.g., `plt.plot(x, y, label="Plot A")` <br>
Call `plt.legend()`. An optional yet important parameter of this function is `loc`; example values include `"upper right"`, `"lower center"`, and `"center"`. <br>
For other parameters, see: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html <br>
Optionally, the font size of the legend can be changed using the `prop` keyword, e.g., `plt.legend(prop={'size': 10})`

```python 
linestyles = ['-', "--", ":"]

x = np.linspace(-np.pi,np.pi,100)
y = np.sin(x)
plt.plot(x, y, c = (1,0,0), ls = linestyles[1], label = "sin(x)")
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc="upper left", prop={'size': 20})
plt.title('Sine curve')
plt.xticks([-2,0,2])
plt.yticks([-1,0,1]);
```

## Back to neural data!

```python
def plot_raster(neural_data, start_time=trial_start_time, end_time=trial_end_time):
    '''
    Takes a list of spike-time arrays (one array per trial, with times in seconds
    relative to stimulus onset) and draws a spike raster plot with one row per trial,
    showing the window from start_time to end_time.
    '''

    num_trials = len(neural_data)  # number of trials (rows in the raster)
    my_colors = ['C{}'.format(i) for i in range(num_trials)]  # one color per trial (C0, C1, ...), cycling through the default palette

    plt.eventplot(neural_data, colors=my_colors)  # draw one row of spike ticks per trial

    plt.xlim([start_time, end_time])  # show the full trial window

    # Label our figure
    plt.title('Spike raster plot')
    plt.ylabel('Trial #')
    plt.xlabel('Time (s)')
    plt.show()

# Use our new function
plot_raster(neural_data)
```
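One optional refinement (a sketch, not part of the original function) is to shade the period when the grating was on, using the `grating_on_time` and `grating_off_time` parameters from the start of the notebook, so that responses to the stimulus stand out. Here it is drawn with the object-oriented interface on a few made-up trials; with the real data you would pass `neural_data` from the channel-selection step instead of `toy_trials`.

```python
import matplotlib.pyplot as plt

# Experiment parameters (same values as defined earlier), in seconds
trial_start_time, grating_on_time = -0.150, 0.0
grating_off_time, trial_end_time = 2.0, 2.5

# Made-up spike times for three trials
toy_trials = [[-0.1, 0.2, 0.9, 1.7], [0.1, 0.4, 2.3], [-0.05, 1.1, 1.95]]

fig, ax = plt.subplots()
ax.eventplot(toy_trials, colors='k')  # one row of spike ticks per trial
# Shade the stimulus period behind the spikes
ax.axvspan(grating_on_time, grating_off_time, color='C1', alpha=0.2, zorder=0)
ax.set_xlim(trial_start_time, trial_end_time)
ax.set_title('Spike raster plot')
ax.set_xlabel('Time (s)')
ax.set_ylabel('Trial #')
plt.show()
```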

The dataset and some parts of this notebook are adapted from Wallisch, *Neural Data Science*.