# Accessing today's data

We will be working with one recording from Frank Lanfranchi's work. Even though this particular recording session lasted only one hour, it comprises more than 75 GB of raw data, and even more preprocessed intermediate results. Fortunately, you don't need to download the entire thing. Instead, you can access it through Google Drive. Please visit:

https://drive.google.com/drive/folders/13kbcg5sIaoP7D8tltuypPDlaiFL00MqL

Create a shortcut to our data inside your own Google Drive by clicking on "datasai-daw" (near the top of the window) and then on "Add Shortcut to Drive".

Once you have the data on your Drive, it is straightforward to access it in Colab:

In [1]:
from google.colab import drive
drive.mount('/content/drive')
!ls /content/drive/MyDrive/datasai-daw

Your browser will ask for your permission to allow Colab to access your Drive.

All of today's data lives inside:

In [2]:
root = "/content/drive/MyDrive/datasai-daw/data/2021-07-20_11-59-01"

We will begin our exploration today by attaching the raw data to memory using a library written by Frank and me. We first need to install this library on your Colab instance:

In [1]:
!pip install ephysio



Then import the relevant part of the library into your actual notebook workspace:

In [4]:
from ephysio import openEphysIO

After those preliminaries, we can access the recording in our notebook:

In [5]:
oe = openEphysIO.Loader(root)

This recording comprises one "spike stream", i.e., raw electrode voltage data from all the electrodes:

In [None]:
oe.spikestreams()

and one "nidaq stream", which contains lots of metadata, including timestamps of events that occurred during the recording, such as when stimuli were presented:

In [None]:
oe.nidaqstreams()

For the moment, we will take a look only at the raw ephys data.

Unfortunately, it is important to keep in mind that the day's data gathering made up one "experiment" consisting of multiple recordings, only one of which I uploaded to Google Drive:

In [None]:
oe.experiments()

In [None]:
oe.recordings(expt=1)

Now we can load the data into memory:

In [None]:
dat = oe.data(oe.spikestream(), expt=1, rec=9)
fs_hz = oe.samplingrate(oe.spikestream())

(Actually, we cannot load the data into memory, as they amount to 75 GB, but Python conveniently *pretends* the data are loaded, allowing us to access any part of them as if they were all in memory.)
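How the library achieves this is not described here. One common mechanism for this kind of "pretend loading" is memory mapping, which NumPy offers as `np.memmap`. The sketch below illustrates that general idea on a tiny made-up file; the file name, shape, and data type are all assumptions for illustration and say nothing about the actual file layout of our recording.

```python
import os
import tempfile

import numpy as np

# Write a small stand-in "recording" to disk: 1000 time points x 4 channels.
n_samples, n_channels = 1000, 4
path = os.path.join(tempfile.mkdtemp(), "toy_recording.dat")  # hypothetical file
toy = np.arange(n_samples * n_channels, dtype=np.int16).reshape(n_samples, n_channels)
toy.tofile(path)

# Map the file rather than reading it: no sample values are loaded at this point.
mapped = np.memmap(path, dtype=np.int16, mode="r", shape=(n_samples, n_channels))

# Indexing pulls only the requested part from disk into an ordinary array.
snippet = np.array(mapped[100:110, 2])
print(mapped.shape, mapped.dtype, snippet[:3])
```

The mapped object reports a shape and data type like a regular array, which is why it can be indexed as if everything were in memory.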

### Mini-exercise: The shape of the data

How many electrodes do we have? How many time points? How many seconds?

What is the "contained" data type of these data?

In [None]:
# Insert your code here


### Cautionary exercise: The limits of computer memory

Let's say you want to know the baseline voltage of one of the electrodes, or its RMS noise level. It is tempting to write:

     baseline = np.mean(dat[:, c])
     noise = np.std(dat[:, c])

after setting *c* to the electrode number you care about. But that won't work, because to calculate that mean, Python would have to read the entire recording into memory. (It is not smart enough to work in chunks, and in any case, even reading through 75 GB of data on a Google Drive takes a long time.)

Much better to calculate the baseline and noise on a small snippet, or a few small snippets.
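A minimal sketch of the snippet approach, run on synthetic noise rather than on our recording. The sampling rate, channel count, and the helper `snippet_stats` are all made up for illustration; with the real data you would pass `dat` and `fs_hz` instead of the stand-ins.

```python
import numpy as np

fs_hz = 30_000   # hypothetical sampling rate, samples per second
n_channels = 8   # hypothetical channel count
rng = np.random.default_rng(0)

# Stand-in for `dat`: 2 s of noise in raw DAQ units, time points x channels,
# with a different made-up offset on each channel.
offsets = np.linspace(-50, 50, n_channels)
toy = (offsets + rng.normal(0, 10, size=(2 * fs_hz, n_channels))).astype(np.int16)

def snippet_stats(data, t0_s, dur_s, fs_hz):
    """Return per-channel mean and standard deviation of one short snippet."""
    i0 = int(t0_s * fs_hz)
    i1 = i0 + int(dur_s * fs_hz)
    # Only this slice is copied and converted to floating point.
    snip = np.asarray(data[i0:i1, :], dtype=np.float64)
    return snip.mean(axis=0), snip.std(axis=0)

baseline, noise = snippet_stats(toy, t0_s=0.0, dur_s=0.5, fs_hz=fs_hz)
print(baseline.round(1))
print(noise.round(1))
```

Calling the helper again with a different `t0_s` gives the statistics for a later snippet, so several snippets can be compared without ever touching the full recording.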

Do all of the electrodes have a similar baseline voltage? (I.e., is the spread between baselines across electrodes greater than the noise level of individual electrodes?) How much does the noise vary between electrodes? Is the baseline in the first second of data different from the baseline 1 minute later?

In [None]:
# Insert your code here


What are the units of your answers? Remember that we worked on completely raw data from the DAQ. If you want an answer in volts, use the function `oe.bitvolts`. Educate yourself about it by running

In [None]:
oe.bitvolts?

Would it be a good idea to convert your raw data to microvolts, e.g., with

    dat_uV = dat * oe.bitvolts()

Why (not)?
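One way to explore this question on a small scale is to check what the multiplication does to the data type and memory footprint of a small array. In the sketch below, `scale` is a made-up number standing in for whatever `oe.bitvolts()` returns, and `raw` is a small stand-in for `dat`.

```python
import numpy as np

scale = 0.195  # hypothetical scale factor; the real value comes from oe.bitvolts()
raw = np.zeros((1000, 4), dtype=np.int16)  # small stand-in for `dat`

converted = raw * scale  # multiplying by a float creates a new floating-point array
print(raw.dtype, raw.nbytes)              # int16 8000
print(converted.dtype, converted.nbytes)  # float64 32000

# Converting only the snippet you need keeps the cost small.
snippet = raw[:100, 0] * scale
print(snippet.dtype, snippet.nbytes)      # float64 800
```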

### Quick aside on documentation

Some code libraries are very well documented online. Others less so. When frustrated, you can ask about all the functions provided by a Python class by way of its `__dict__` attribute, for instance:

In [None]:
openEphysIO.Loader.__dict__.keys()

The result is not very easy to read, but at least you get a list of keywords you can then type into a search engine or query in your notebook:

In [None]:
openEphysIO.Loader.nidaqevents?

By convention, functions that start with "_" are not meant for external use. Avoid them unless you really know what you are doing.
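A slightly more readable listing can be had by filtering out the underscore names and printing each docstring. The sketch below does this for a made-up class, `ToyLoader`, which is purely hypothetical; the same loop should work on any class, including `openEphysIO.Loader`.

```python
class ToyLoader:
    """Hypothetical stand-in for a loader class."""

    def data(self):
        """Return the recorded data."""

    def samplingrate(self):
        """Return the sampling rate in Hz."""

    def _parse_header(self):
        """Internal helper, not meant for external use."""

# vars(cls) is the same mapping as cls.__dict__.
public = sorted(
    name for name, member in vars(ToyLoader).items()
    if callable(member) and not name.startswith("_")
)
print(public)  # ['data', 'samplingrate']

for name in public:
    print(name, "-", getattr(ToyLoader, name).__doc__)
```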

## Visualizing raw traces

Several libraries exist for visualizing electrophysiology traces. I had a hard time getting any to work in Colab. (Let me know if you fare better!) So I wrote a tiny little visualizer myself using Dash and Plotly. It's so basic that it may even be a usable example to see how this sort of thing is done.

In [None]:
!pip install dash
from ephysio import vizly

In [None]:
app = vizly.Vizly(dat, fs_hz)
app.run()

### Exercise: Can you see the action potentials?

Scroll around the recording a little bit to get a feel for the kind of signals these electrodes pick up.

## Stimulus artifacts

Extracellular recordings like these are very prone to electrical artifacts. Generally, a lot of care must be taken to avoid interference from nearby electronics, and even the lights in the room. One form of interference that is very hard to prevent completely is crosstalk from probes that are used to electrically stimulate the same brain that you are recording from.

The recording we are working with today also contains such artifacts. Electrical stimuli were applied at:

In [None]:
tstim_s = oe.nidaqevents(oe.spikestream(), rec=9)[2][:,0] / fs_hz

(Sorry for the obscure code; the `[2]` is because the times of stimuli are recorded as TTL pulses on BNC connector #2; the `[:,0]` is because we only care about the start of each pulse.)
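To make the indexing less obscure, here is the same chain of steps on made-up data. The layout assumed below (a mapping from channel number to an array with one row per pulse, holding start and end positions in samples) is a guess that is merely consistent with the indexing above, not a documented description of what `nidaqevents` returns. The sampling rate and pulse positions are invented as well.

```python
import numpy as np

fs_hz = 30_000  # hypothetical sampling rate, samples per second

# Hypothetical event structure: channel -> rows of (start, end) in samples.
events = {
    1: np.array([[1_000, 1_300]]),
    2: np.array([[30_000, 30_030], [90_000, 90_030], [150_000, 150_030]]),
}

pulses = events[2]         # all pulses seen on channel 2
starts = pulses[:, 0]      # first column: start of each pulse, in samples
tstim_s = starts / fs_hz   # convert sample positions to seconds
print(tstim_s)  # [1. 3. 5.]
```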

Let's tell Vizly about our stimulus times:

In [None]:
app = vizly.Vizly(dat, fs_hz, tstim_s)
app.run()

### Exercise: Artifacts and neuronal responses

Do you see the artifacts? Do any neurons respond to the electrical stimuli?

### Artifact removal

Because the electrical artifacts are so large compared to the spikes, you cannot expect good results from spike sorting without first taking care of the artifacts. This is a very technical subject, so we're not going into it here. We use an algorithm called SALPA (Wagenaar et al., 2002) that (1) zeroes out parts of traces that are polluted with unrecoverable artifacts and (2) reconstructs signals polluted by lower-amplitude, lower-frequency artifacts.

Running SALPA on Colab is a little bit cumbersome, so we'll use precomputed results today:

In [None]:
dat_pp = oe.data(oe.spikestream(), rec=9, stage="salpa")

(I use the suffix `_pp` to label this variable as preprocessed data.)

In [None]:
app1 = vizly.Vizly(dat_pp, fs_hz, tstim_s)
app1.run()

### Exercise: Does this look OK?

Do you think SALPA did a good job suppressing the artifacts? Do you think a smarter algorithm could reconstruct more of the data, so we could see responses that occur sooner after the stimulus? Create plots overlaying original and cleaned versions of a couple of electrode traces around a few stimuli.
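As a possible starting point, the sketch below cuts matching windows around one stimulus out of two arrays, so that the two traces can be drawn on the same axes with a plotting library of your choice. Everything in it is synthetic: `raw` and `cleaned` are stand-ins for `dat` and `dat_pp`, the artifact is a crude made-up step, and `stim_window` is a hypothetical helper, not part of any library.

```python
import numpy as np

fs_hz = 30_000  # hypothetical sampling rate, samples per second
rng = np.random.default_rng(1)

raw = rng.normal(0, 10, size=(3 * fs_hz, 2))  # stand-in for `dat`
cleaned = raw.copy()                          # stand-in for `dat_pp`
tstim_s = np.array([1.0, 2.0])                # made-up stimulus times

# Add a crude fake artifact (1 ms long) to the "raw" stand-in only.
for t in tstim_s:
    i = int(round(t * fs_hz))
    raw[i:i + 30, :] += 1000

def stim_window(data, t_s, chan, fs_hz, pre_s=0.005, post_s=0.020):
    """Return time (ms, relative to the stimulus) and one channel's trace."""
    i = int(round(t_s * fs_hz))
    i0 = max(i - int(round(pre_s * fs_hz)), 0)
    i1 = min(i + int(round(post_s * fs_hz)), data.shape[0])
    t_ms = (np.arange(i0, i1) - i) / fs_hz * 1000
    return t_ms, np.asarray(data[i0:i1, chan])

t_ms, y_raw = stim_window(raw, tstim_s[0], chan=0, fs_hz=fs_hz)
_, y_clean = stim_window(cleaned, tstim_s[0], chan=0, fs_hz=fs_hz)
print(len(t_ms), t_ms[0], y_raw.max() > y_clean.max())
```

Because only a few hundred samples are read per window, this approach stays cheap even when `data` is the full recording.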

In [None]:
# Insert your code here
