## 2 Time-Varying Data

The goal of this assignment is for you to learn how to use Python to manipulate and visualize time-varying data. As in the previous assignment, you'll be considering an experiment in which recordings of bird songs were presented to an animal while we recorded from neurons in the auditory system.

![experiment diagram](images/experiment_diagram.png "Auditory Neurophysiology Experiment")

This web page is a Python notebook. It's an interactive document that lets you mix text and computer code, organized into `cells`. A cell can contain either text or code. This is a text cell. You can edit the contents of this cell by double-clicking, then save by typing `Ctrl-Enter`.

**Your Assignment:** Some of the cells are left blank or have placeholder text (in *italics*). *Your assignment is to complete those cells*. When you're finished, download the notebook as an 'ipynb' file (see the `File/Download as` menu) and submit it to Collab for grading. If you need to work on the assignment outside the lab, you can run the notebook on [mybinder](https://mybinder.org/v2/gh/melizalab/comp-neurosci.git/master).

The cell below this is a code cell, as indicated by `In [ ]:` off to the left. When you type `Ctrl-Enter` in a code cell, the contents are passed to the Python interpreter. Each line is evaluated in turn. If the last line generates some output, this will be shown below the code cell. To run the notebook, you need to type `Ctrl-Enter` in each cell (including the one below):

In [None]:
# These lines of code import some external libraries we'll use in this exercise:
%matplotlib notebook
import numpy as np
import IPython
import matplotlib.pyplot as plt
import matplotlib as mpl
from IPython.display import display
import ipywidgets as widgets
from tools import signal
mpl.rcParams['image.origin'] = 'lower'
mpl.rcParams['image.aspect'] = 'auto'
mpl.rcParams['image.cmap'] = 'jet'
mpl.rcParams['figure.figsize'] = (10.0, 4.0)

Take a look at the menus and buttons at the top of the page. You can cut and paste cells with the `Edit` menu to move them around. You can execute one or more cells using the `Cell` menu. The Python interpreter is called a `kernel`. Under the `Kernel` menu, you can restart or interrupt the interpreter if it gets stuck. The `Help` menu explains more about using the notebook and has links to documentation for some of the main scientific programming libraries.

This exercise assumes that you're familiar with basic Python syntax. If you do not have any experience with Python, you should have completed the introductory modules at [DataCamp](https://www.datacamp.com/courses/intro-to-python-for-data-science) as a homework assignment. You may also want to refer to some of the following [Software Carpentry](https://software-carpentry.org/) modules if topics come up that were not covered in the DataCamp course:

- [Loops](http://swcarpentry.github.io/python-novice-gapminder/12-for-loops/)
- [Conditionals](http://swcarpentry.github.io/python-novice-gapminder/17-conditionals/index.html)
- [Plotting](http://swcarpentry.github.io/python-novice-gapminder/09-plotting/index.html)


---
### Working with Time Series and Point Process Data

In neurophysiology, much of the data we'll be working with represents some process that changes in time. There are two fundamental ways of representing time-varying data in a computer:

- A **time series** is a quantitative physical property of a system measured over a time interval. In digital computers, time series data are always sampled at discrete moments in time, usually at fixed intervals. The *sampling rate* of the data is the number of times per second the underlying process was measured. Examples of time series include sound waveforms and recordings of extracellular voltage.

- A **point process** is a series of times when an event took place. An example of a point process is the set of times when a neuron produced an action potential (spike).

Both point processes and time series are represented in Python in [numpy](http://www.numpy.org/) `arrays`; however, the meaning of the values is different. For a time series, the array holds the sequence of **measurements**. For a point process, the array holds the sequence of **event times**.
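
To make the difference concrete, here is a minimal sketch using made-up numbers (the sampling rate, measurements, and event times below are invented for illustration and are not from the experiment). Note that the time series needs a sampling rate to recover the time of each sample, whereas the point process array already contains times:

```python
import numpy as np

# Time series: evenly spaced measurements plus a single sampling rate.
demo_rate = 1000                                        # samples per second (made up)
demo_measurements = np.array([0.0, 0.3, 0.5, 0.3, 0.0])  # arbitrary units
# The time of each sample is implied by its position in the array.
demo_sample_times = np.arange(len(demo_measurements)) / demo_rate

# Point process: the array holds the event times themselves (in seconds).
demo_event_times = np.array([0.0012, 0.0031, 0.0034])

print(demo_sample_times)
print(demo_event_times)
```

The two arrays look alike to Python, so it's up to you to keep track of which kind of data each one holds.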

---
## Working with Acoustic Stimuli

Sound consists of waves of pressure moving through the air or another medium. When these waves press against the eardrum, the bones (or bone, if you're a bird) in the middle ear transmit the wave to the fluid in the cochlea. Movements in this fluid are detected by hair cells, which transduce the sound into neural signals. Sound can be recorded with a microphone, which transduces pressure changes into electrical voltage changes. These in turn can be sampled by a digitizer and turned into an array of numbers. This means that a sound recording is a **time series**.

Sound recordings can be stored in a variety of file formats. One of the most common is the `wave` format, which is what we'll be using here. The stimulus files used in this experiment have been placed in the `data/stimuli` directory. We can use the `os` module in Python to get a list of the contents of this directory:<a id="stim_list"/>

In [None]:
import os
os.listdir('data/stimuli')

As an aside, `motifs.csv` is not a sound file. We're going to use it in a later activity. Let's listen to `A8.wav`. If you prefer, you can download the file using [this link](stimuli/A8.wav) and open it in a sound-processing program like [Audacity](http://www.audacityteam.org/).

In [None]:
stim = "A8"
stimfile = os.path.join("data/stimuli", stim) + ".wav"
IPython.display.Audio(stimfile) 

Now let's look at the contents of the file. I've provided you with a simple widget for selecting a stimulus file and loading it into memory. An important quantity to note is the **sampling rate**, which indicates how frequently the sound pressure wave was measured during the recording. The units are samples per second, or *Hz*.

In [None]:
sampling_rate = None
osc = None

def f(stimulus):
    import ewave
    global sampling_rate, osc
    stimfile = os.path.join("data", "stimuli", stimulus)
    try:
        wavfile = ewave.open(stimfile)
        sampling_rate = wavfile.sampling_rate
        osc = wavfile.read(memmap=False)
        print("The sampling rate of '%s' is %d samples/second." % (stimulus, sampling_rate))
    except ewave.Error:
        print("Select a WAV file.")

widgets.interact(f, stimulus=os.listdir('data/stimuli'));

The widget has read the contents of the wave file into the variable `osc`. Select `A8.wav`, and let's see what the value is:

In [None]:
osc

Note that `...` means Python is omitting the samples in the middle, so we can't tell from this output how long the array is. To get the number of samples in the array, we use the built-in `len` function:

In [None]:
print("The stimulus is %d samples long." % len(osc))

**Question**: The cell below contains incomplete code. To receive credit for this question, edit the code so it produces the correct answer without raising any errors. You'll need to do some basic math to convert the number of samples into seconds (hint: the sampling rate is in units of samples / second).

In [None]:
## write code here to calculate the variable used in the statement below
stim_length = 
print("The stimulus is %3.2f seconds long." % stim_length)

### Visualizing time series

We can't learn much about the sound just by looking at the numbers in the array. We need to plot it using the matplotlib library. We've imported matplotlib's `pyplot` module under the alias `plt`, and we'll use the most basic plotting function, `plot`. Remember that you can get a list of functions in `plt` by typing `plt.` and then hitting the `Tab` key. You can also get help about a specific function by typing `Shift-Tab` inside the parentheses of the function. Try putting your cursor on `osc` in the cell below and typing `Shift-Tab`. Click the up arrow to get the full documentation string.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))
axes.plot(osc);

The plot this command generates by default is a line plot. Because there are so many data points in the array, what you see is a waveform that primarily represents the overall amplitude of the sound. The y-axis indicates the values in the array, which have arbitrary units. We only provided a single array to the `plot` function, so the x-axis indicates the index of each sample. To plot the signal as a function of time, we'll need to create a second array that holds the time at each sample:

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))
t = np.arange(0.0, len(osc)) / sampling_rate
axes.plot(t, osc)
axes.set_xlabel("Time (s)");

Note: in the code cell above, we are accessing the numpy function `arange` through the alias `np`, which we created in the first cell of the notebook.

**Question**: In the cell below, plot samples 1000-2000 of the waveform, as a function of time in *milliseconds*. Hint: you'll need to convert the time array to milliseconds and then use array *slicing* to select the samples you want to plot.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))
# fix these lines
t_ms = ???
axes.plot(???, ???)

#### Spectrograms

Another way of visualizing a time series is as a spectrogram. The spectrogram is based on a mathematical operation called the [Fourier transform](https://en.wikipedia.org/wiki/Fourier_transform), which converts a signal from the *time domain* to the *frequency domain*. That is, instead of representing the sound pressure as a function of time, we will show how much power the signal has at each frequency.

In a spectrogram, the original signal is divided up into short, overlapping windows. The signal in each window is converted to a frequency representation using the Fourier transform, and then the windows are "stacked up" to produce a two-dimensional image, with time on the x-axis and frequency on the y-axis.

![spectrogram calculation](images/spectrogram_calculation.png)
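
The steps in the diagram can be sketched in a few lines of numpy. This is only an illustration under some assumptions: `simple_spectrogram` is a hypothetical function written for this example, it uses a Hann window, and it is not necessarily how the `signal.specgram` helper in `tools` works. The test signal is a synthetic 1 kHz tone, not one of the stimulus files.

```python
import numpy as np

def simple_spectrogram(x, sampling_rate, nfft=256, shift=128):
    """Toy spectrogram: power in overlapping, Hann-windowed segments of x."""
    window = np.hanning(nfft)
    # starting sample of each analysis window
    starts = np.arange(0, len(x) - nfft + 1, shift)
    # Fourier transform of each windowed segment, converted to power
    columns = [np.abs(np.fft.rfft(window * x[s:s + nfft])) ** 2 for s in starts]
    # "stack up" the windows: rows are frequencies, columns are times
    power = np.stack(columns, axis=1)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / sampling_rate)   # Hz
    times = (starts + nfft / 2) / sampling_rate            # s, center of each window
    return power, freqs, times

# synthetic test signal: a 1 kHz tone sampled at 8 kHz for 1 s
demo_fs = 8000
demo_t = np.arange(demo_fs) / demo_fs
demo_tone = np.sin(2 * np.pi * 1000 * demo_t)

P_demo, f_demo, t_demo = simple_spectrogram(demo_tone, demo_fs)
# frequency with the most power, averaged over all windows
peak_freq = f_demo[P_demo.mean(axis=1).argmax()]
print(P_demo.shape, peak_freq)
```

For a pure tone, almost all of the power ends up in the row of the image closest to the tone's frequency, which shows up as a single horizontal line.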

You can read a spectrogram a little bit like a musical score. The intensity of the image indicates how much power there is in the signal at a particular time and frequency. There are many different color schemes, but this is largely a matter of preference.

Let's calculate the spectrogram of the signal we just loaded. Although matplotlib comes with a spectrogram function, we're going to use a helper function that hides some of the details.

In [None]:
# this parameter controls the analysis window size
NFFT = 256
# this parameter controls the overlap
shift = 128
# this parameter controls the intensity scaling
compress = 1

P, freq, bins = signal.specgram(osc.astype('d'), sampling_rate, NFFT, shift, compress)

To visualize the spectrogram, which is 2-dimensional, we'll need to use the `imshow` function.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))
axes.imshow(P, extent=(bins[0], bins[-1], freq[0], freq[-1]));

This looks a lot more interesting, doesn't it? Try listening to the sound while following along the spectrogram from left to right. If you chose a different song to analyze above, then edit the cell below with the correct stimulus name.

In [None]:
IPython.display.Audio('data/stimuli/A8.wav') 

There are a number of parameters that can be tweaked to make this look as good as possible. In the cell below, adjust the values for NFFT, shift, and compress and see how the spectrogram compares to the one above. If you want, you can zoom into a smaller interval by using the interactive controls.

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))

nfft_w = widgets.IntSlider(value=256, min=8, max=1024, step=1, description="NFFT:", continuous_update=False)
shift_w = widgets.IntSlider(value=128, min=8, max=1024, step=1, description="shift:", continuous_update=False)
# FloatLogSlider takes the actual value (not the exponent), so start at compress = 1
compress_w = widgets.FloatLogSlider(value=1, base=10, min=-2, max=5, step=0.2,
                                    description="compress:", continuous_update=False)
display(nfft_w, shift_w, compress_w)

p = ax.imshow(P, extent=(bins[0], bins[-1], freq[0], freq[-1]))

def update(nfft, shift, compress):
    # recompute the spectrogram with the slider values and redraw the image
    P, freq, bins = signal.specgram(osc.astype('d'), sampling_rate, nfft, shift, compress)
    p.set_data(P)
    p.set_extent((bins[0], bins[-1], freq[0], freq[-1]))
    p.autoscale()  # rescale the colors to the new range of values

widgets.interactive_output(update, {"nfft": nfft_w, "shift": shift_w, "compress": compress_w})

**Question**: Zoom in on a segment of about 1 s and adjust each of the parameters until you can see clear vertical, horizontal, and curving lines. What values seem to give the best resolution?

In [None]:
NFFT =
shift = 
compress = 

**Question**: Describe how changing `NFFT`, `shift`, and `compress` changes the appearance of the plot. How are frequency and time resolution related to the window size (NFFT)?

*Double click this text and enter your response*

---
## Working with Spike Data

The basic structure of an auditory neurophysiology experiment is that we play a stimulus to the animal while recording from one or more neurons (or *units*) in the brain. The neurons are usually presented with a range of different stimuli to probe how selective they are or determine what parts of the stimulus excite them the best. Last week, you learned how to analyze these recordings to identify single units and the times that they spiked. Because we're only interested in the times the cells spike, this means we're going to be working with **point process** data.

Let's look at some data for a neuron in the songbird auditory system. We'll be using a format for spike time data called [pprox](https://meliza.org/spec:2/pprox/), which is an extension of the widely adopted JSON format. A large collection of responses from about 240 neurons in the starling auditory cortex is stored under `data/spikes`. This directory is too large to list, so we'll just look at one file. The `tools` module contains some functions for working with `pprox` data.

In [None]:
from tools import pprox
unit = "st49_2_1_5"
resp = pprox.load(unit)
# select only the responses to A8
resp_A8 = pprox.select_stimulus(resp, stim)

As discussed above, point process data is also stored in arrays. However, rather than holding values that represent physical measurements, the values indicate the times when an event occurred. Rather than indicate the absolute time when an event occurred, the values in the data we just loaded represent the event times relative to the start of the stimulus. Each trial is then stored in a Python container called a `list`. Lists are like arrays in how they use indexing, but are more flexible in what they can store. Let's look at the first trial:

In [None]:
resp_A8[0]

Note that some of the times are negative; these correspond to spikes that occurred during a silent interval before the stimulus was played. These are sometimes referred to as "spontaneous" spikes, in contrast to the spikes that are driven by the stimulus.
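
As a small sketch of this layout, the example below builds a list of made-up trials (the spike times are invented for illustration, not taken from `resp_A8`) and uses a boolean comparison to separate the spikes that came before stimulus onset from the ones that came after:

```python
import numpy as np

# made-up spike times (s) relative to stimulus onset; trials can differ in length
demo_trials = [
    np.array([-0.8, -0.1, 0.05, 0.4, 1.2]),
    np.array([-0.5, 0.3, 0.9]),
    np.array([0.2, 0.25, 0.7, 1.1]),
]

first = demo_trials[0]
spontaneous = first[first < 0]     # spikes during the silent interval
after_onset = first[first >= 0]    # spikes at or after stimulus onset
print(spontaneous)
print(after_onset)
```

The expression `first < 0` produces an array of `True`/`False` values, and indexing with it keeps only the entries where the comparison is `True`.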

**Question**: Complete the code cell so that the printed statements are correct and do not produce any errors. You will need to use a [for loop](http://swcarpentry.github.io/python-novice-gapminder/12-for-loops/) (or a list comprehension, if you know what you're doing) to produce an aggregate count of spikes across trials.

In [None]:
# write code here to calculate the variables used in the statements below
n_trials = ??              # hint: what type is the resp_A8 variable?
n_events_trial_2 = ??      # hint: how do you select a specific element from resp_A8?
avg_event_count = ??       # hint: you'll need to write a for loop to count events in each trial
print("The number of trials is %d" % n_trials)
print("The number of spikes in trial 2 is %d" % n_events_trial_2)
print("The average number of spikes in all the trials is %3.2f" % avg_event_count)

#### Visualizing point process data

If we try to plot one of the trials using the same command we used for time series data, the plot doesn't make a lot of sense:

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))
ax.plot(resp_A8[0]);

A better representation of point process data is a **raster** plot, where the position on the x-axis represents the time when the event took place. We can make a simple raster plot with the `vlines` command. Take a look at the help for `vlines` (put the cursor inside the parentheses after `vlines` in the cell below and type `Shift-Tab`) to see if you can understand why the command produces this plot.

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))
ax.vlines(resp_A8[0], 0, 1);

It's clear that there are some periods when the neuron is spiking very rapidly and some periods when it is not spiking at all. How do we know this isn't random, spontaneous activity? Let's see what the cell did on subsequent presentations of the same stimulus. We can use a `for` loop to produce a raster for each trial. Do you understand why the variable `i` is incremented in each loop?

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))
i = 0
for trial in resp_A8:
    ax.vlines(trial, i, i + 0.5)
    i = i + 1

**Question**: Describe differences and similarities in how the cell responds during the silent period (t < 0) and during the period when the stimulus is being presented. What do you think is happening during the gap between 2.2 and 4.5 seconds? 

*Double click this text and enter your response*

#### Plotting multiple processes on the same time scale

To get a better sense of how the cell is responding to the stimulus, we'd like to see the stimulus and the response next to each other on the same scale. Matplotlib can accommodate this using *subplots*. Subplots divide the figure into multiple frames. The following code brings together several concepts from above. As before, use the notebook's help and self-documentation features to understand how this next cell works.

In [None]:
# subplots for the spectrogram and raster. This will use the values you picked above
fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(9, 4), sharex=True)
P, freq, bins = signal.specgram(osc.astype('d'), sampling_rate, NFFT, shift, compress)
ax[0].imshow(P, extent=(bins[0], bins[-1], freq[0], freq[-1]))
# plot the raster in the second subplot
for i, trial in enumerate(resp_A8):
    ax[1].vlines(trial, i, i + 0.5)
# adjust the limits of the x axis so that the spikes align with the spectrogram
ax[1].set_xlim(bins[0], bins[-1]);

**Question**: Compare the stimulus and the response. Are there any features in the stimulus that seem to be correlated with the periods where the cell is firing intensely? Are those features present in the periods when the cell is not firing? In your own words, what kinds of sounds do you think are driving the cell? How could you test this hypothesis?

*Double click this text and enter your response*

### Response histograms

The raster plot is useful for comparing activity across trials, but what if we want to know about the *average* behavior of the neuron? One way of representing this is as a **peri-stimulus time histogram** or **PSTH**. This is a plot that shows the average number of spikes that occurred in a series of time bins. We'll pick bins that span the range of spike times.

In [None]:
fig, ax = plt.subplots(nrows=3, ncols=1, figsize=(9, 6), sharex=True)
P, F, T = signal.specgram(osc.astype('d'), sampling_rate, NFFT, shift, compress)
ax[0].imshow(P, extent=(T[0], T[-1], F[0], F[-1]))
# create a subplot for the raster
for i, trial in enumerate(resp_A8):
    ax[1].vlines(trial, i, i + 0.5)

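# adjust binsize here:
binsize = 0.025  # s
# calculate the time of the first and last spike (skipping any trials with no spikes)
start = min(spikes[0] for spikes in resp_A8 if len(spikes) > 0)
stop  = max(spikes[-1] for spikes in resp_A8 if len(spikes) > 0)
# calculate the histogram; the edges extend one bin past the last spike so it gets counted
counts, edges = np.histogram(np.concatenate(resp_A8), bins=np.arange(start, stop + binsize, binsize))
# convert the pooled counts to an average firing rate (spikes/s per trial)
rate = counts / (len(resp_A8) * binsize)
# plot the rate at the center of each bin
ax[2].plot(edges[:-1] + binsize / 2, rate)
ax[2].set_ylabel("Rate (spikes/s)");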
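
If the histogram step is hard to follow, here is a toy version with made-up spike times from two trials (the names prefixed with `toy_` are invented for this sketch). It pools the spikes across trials, counts them in three bins, and converts the counts to an average firing rate, which is one common way of scaling a PSTH:

```python
import numpy as np

# made-up spike times (s) from two trials
toy_trials = [
    np.array([0.01, 0.03, 0.12]),
    np.array([0.02, 0.11, 0.13]),
]
toy_binsize = 0.05                          # s
toy_edges = toy_binsize * np.arange(4)      # edges at 0, 0.05, 0.10, 0.15 s

# pool the spikes from all trials and count how many fall in each bin
toy_counts, _ = np.histogram(np.concatenate(toy_trials), bins=toy_edges)
# average rate: divide by the number of trials and the bin width
toy_rate = toy_counts / (len(toy_trials) * toy_binsize)
print(toy_counts)
print(toy_rate)
```

Three spikes fall in the first bin, none in the second, and three in the third, so the rate is high, then zero, then high again.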

**Question**: Adjust the bin size by editing the line that starts with `binsize =` in the code block above. How does changing this affect the appearance of the histogram? What bin size appears to give the most useful summary of the average behavior?

*Double-click this text and enter your response*

---
### Putting it together

In this final section, your task is to plot the response of the `st49_2_1_5` neuron to each stimulus it was presented with. As in the plot above, each response needs to be plotted under the stimulus and synchronized to it. This may sound simple, but there are a few little hiccups. To get full credit, you may not copy and paste your code however many times it takes to go through all the stimuli. You will need to use for loops, functions, or both. In addition to the resources linked above, you can consult the Software Carpentry chapters on [Looping through Data Sets](http://swcarpentry.github.io/python-novice-gapminder/13-looping-data-sets/) and [Writing Functions](http://swcarpentry.github.io/python-novice-gapminder/14-writing-functions/).

I've provided you with some code to get started, but there are a number of places where you'll have to fix it in order to get things to work.

In [None]:
# get the list of stimuli to plot. Hint: look at the documentation for pprox.get_stimuli
stims = ???
count = ???
print("Total number of stimuli: %d" % count)

# note: ax is a 2D array with row, column indices
fig, ax = plt.subplots(nrows=2, ncols=count, figsize=(10, 4), sharex='col', sharey='row')
fig.subplots_adjust(left=0.05, right=0.95, hspace=0.01, wspace=0.01)
# iterate through the stimuli
for i, stim in enumerate(stims):
    osc, sampling_rate = signal.load_stimulus(stim)
    ### TODO: calculate spectrogram and plot it in top row (index 0)
    ???
    ax[??,??].set_title(stim)
    spikes = pprox.select_stimulus(resp, stim)
    ### TODO: loop through the trials and plot the spikes as a raster
    ???
    # adjust the limits of the x axis so that the spikes align with the spectrogram
    ax[??,??].set_xlim(bins[0], bins[-1]);