# 1 Time-Varying Data

The goals of this exercise are to:

- learn what kinds of data we deal with in neuroscience
- learn how these data can be represented and manipulated in digital computers
- get introduced to Python notebooks

A notebook is an interactive document that lets you mix text and computer code, organized into `cells`. A cell can contain either text or code. This is a text cell. You can edit the contents of this cell by double-clicking, then save by typing `Ctrl-Enter`.

## Your Assignment 

- Turn in a written response to each of the questions (e.g. **Q1**) below. 
- If working in teams, assign the least experienced programmer to operate the notebook
- Assign the most experienced programmer to write the responses
- Only one submission per team, please


## Introduction to the Python Notebook

The cell below this is a code cell, as indicated by `In [ ]:` off to the left. When you type `Ctrl-Enter` in a code cell, the contents are passed to the Python interpreter. Each line is executed in turn. If the last line evaluates to something, this will be shown below the code cell. To run the notebook, you need to type `Ctrl-Enter` in each cell (including the one below):

In [None]:
8 + 9 * 2
(3 + 5) * 2

Why did the interpreter only print the value of the last line? 

Some important things to understand about computer programs:

- Think of each statement as a command to a stupid but extremely fast robot. 
- Each time you type `Ctrl-Enter` in a cell, the robot will do what each line in the cell says. 
- If the last statement is an expression that has some value, the interpreter prints that out.

### Python is interactive

It's common to work with an interpreter in this *interactive* mode, which is also called a *read-evaluate-print-loop* (REPL).

You can evaluate the same cell over and over, which is equivalent to telling the robot to do the same thing as it did before. However, if you edit the contents of the cell, you'll get results that correspond to your new statements and expressions.

### Variables

An important task the interpreter can do is to store and retrieve data from memory. A **variable** is essentially a named container where information can be stored. In Python, storing a value in a variable (**assignment**) is done with the equals sign (`=`). The value on the right side is assigned to the variable on the left. Variables are created the first time you use them, and you don't have to tell the interpreter what the type of the data is (more on this later).

In [None]:
age = 42
name = "Arthur"

Note that in contrast to the cell above, this code does not generate an output. 

To retrieve the value of a variable, you just use the name of the variable. For example, the expression below will evaluate to the value currently stored in `age`.

In [None]:
age

#### Displaying variables

You can also use the `print` function to display the values of variables. To use `print`, simply put the variables and expressions you want to display between parentheses, separated by commas, as below:

In [None]:
print(name, "is", age, "years old.")

#### Altering variables

It's important to keep in mind that when you use a variable in an expression, the interpreter looks up the **current value**. 

Storing a new value in a variable doesn't affect prior statements. 

Look at the following cell and try to determine (before running the cell) what the value of `my_sum` will be.

In [None]:
initial = 42
my_sum = initial + 29
initial = 0

print("my sum is", my_sum)

### Controlling the interpreter

As you execute code cells, the interpreter's **state** will change as values are stored in variables. All the cells in a notebook live in the same workspace or `kernel`. That means if you change the value of `age` and then go back and execute the code two cells above, you'll get a different output.

There's nothing to keep you from running cells out of order, and this can be very useful when you're working on solving a problem. However, you can easily get your kernel into a state where a lot of variables are not what you think they are. When this happens, usually the best strategy is to restart the kernel. To do this, go to the `Kernel` menu, select `Restart and Clear Output`. Then to re-run the code, go to the `Cell` menu and select `Run All Above`. Try this now.

While you're looking up there, browse through the other menus and buttons at the top of the page. You can copy, cut, and paste cells with the `Edit` menu to move them around, or insert blank cells in the notebook. This can be very useful when you're experimenting but don't want to mess up some of the code I've provided you. 

Finally, don't ignore the `Help` menu - it explains more about using the notebook and has links to documentation for many of the scientific programming libraries we'll be using.

### Data types

What kinds of data can you put in a variable in Python? The core data types are:

- strings of characters (`strings`)
- numbers (`integers` and `floats`)

Strings are used to store text. You'll often use strings as labels; for example, to indicate what stimulus was played or the identifier for a neuron. Python strings support Unicode (which means they can contain Greek or other non-Roman letters), and are constructed by using single or double quotes. The following two expressions are the same.

In [None]:
'mötörheαd'
"mötörheαd"

### Numerical data types

There are two main kinds of numerical data types: integers and floats. Integers can be positive or negative, but they never have a fractional component. They're frequently used as counters, indexes, and categorical labels. Floating point numbers represent real quantities, and can be entered as decimals or using scientific notation:

In [None]:
an_integer = -1
a_float = 1.0
another_float = 1.22e6
print(an_integer, a_float, another_float)

As an aside, computers have limited precision representing real numbers, and you sometimes will see weird behavior like in the cell below. For the most part, you can ignore this [representation error](https://docs.python.org/2/tutorial/floatingpoint.html) issue, as none of the quantities you're working with are known to 10+ decimal places. However, it's a good reason to use integers instead of floats whenever you know that there's not supposed to be a fractional component.

In [None]:
0.1 + 0.2
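
The sketch below shows how this representation error affects comparisons, and one common way to handle it. It uses `math.isclose` from the Python standard library; the variable names are made up for illustration.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 have no exact binary representation
exact_match = (total == 0.3)

# Comparing within a tolerance is usually what you want for floats
close_match = math.isclose(total, 0.3)

# Integers are stored exactly, so whole-number arithmetic avoids the problem
count_match = (1 + 2 == 3)

print(exact_match, close_match, count_match)
```

This is why counters and indexes are better kept as integers.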

## Time-varying Data

In neurophysiology, much of the data we'll be working with represents some process that changes in time. There are two fundamental ways of representing time-varying data in a computer:

A **time series** is a quantitative physical property of a system measured over a time interval. 

- In digital computers, time series data are always sampled at discrete moments in time.
- The *sampling rate* of the data is the number of times per second the underlying process was measured. 
- Examples of time series include sound waveforms and recordings of extracellular voltage.

A **point process** is a series of times when an event took place. 
  
- An example of a point process is the set of times when a neuron produced an action potential (spike).

### Arrays

Both time series and point process data involve many data points, one for each sampling interval (for time series) or for each event (for point processes). It would be very inefficient to have to give each data point a name; moreover, there's an inherent **order** to the data points that we want to preserve. Fortunately, Python can also store ordered collections of values in a data type called an **array**.

We'll start by considering one-dimensional arrays, which are simply a series of values that all have the same type (e.g., integer or floating point). Point processes and time series can both be stored in arrays; however, the meaning of the values is different. For a time series, the array holds the sequence of **measurements**. For a point process, the array holds the sequence of **event times**.
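
To make the distinction concrete, here is a minimal sketch using NumPy arrays. All of the values are invented for illustration and do not come from the course data.

```python
import numpy as np

# Hypothetical example values, not real recordings
example_rate = 1000.0                                           # samples per second (Hz)
example_voltage = np.array([-65.0, -64.8, -20.0, 30.0, -70.0])  # time series: one measurement per sample (mV)
example_spike_times = np.array([0.012, 0.105, 0.230])           # point process: one entry per event (s)

# For a time series, the time of each sample is implied by its position in the array
example_sample_times = np.arange(len(example_voltage)) / example_rate

# For a point process, the values themselves are the times
first_spike = example_spike_times[0]

print(example_sample_times)
print(first_spike)
```

Note that the time series array has one entry per sample regardless of what the neuron did, while the point process array has one entry per event.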

But enough theory! Let's look at some data. Start by running the cell below, which imports some external libraries. You don't need to worry about the contents yet.

In [None]:
%matplotlib inline
import sys
from pathlib import Path
import numpy as np
import scipy as sp
import IPython
import matplotlib.pyplot as plt
import matplotlib as mpl
from IPython.display import display
import ipywidgets as widgets
sys.path.insert(0,"/standard/psyc5270-cdm8j/comp-neurosci")
from comp_neurosci_uva import signal, data, graphics

---
## Time Series: Intracellular Voltage

One method of monitoring neural activity is by placing an electrode inside a neuron and measuring the voltage relative to a grounding electrode in the bath (below, top right trace):

<img src="https://meliza.org/public/courseware/comp-neurosci/images/recording_dayan.png" alt="recording methods" style="width: 400px;"/>

This generates a **time series** of voltage measurements.

Let's look at an example of a time series. Run the cell below.

In [None]:
V, Fs = data.load_timeseries(Path("zf", "intracellular", "920061fe_2_4.wav"))
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))
t = np.arange(0.0, len(V)) / Fs
axes.plot(t, V)
axes.set_xlabel("Time (s)")
axes.set_ylabel("Voltage (mV)")
axes.set_xlim(0, 5.0);

The plot shows the recorded voltage as a function of time. The neuron was stimulated with positive and negative current pulses. The positive pulse evoked a series of action potentials, which are the sharp spikes between 0.2 and 2.2 s. 

**Q1**

1. How many spikes did the neuron produce? 
2. Zoom in on one of the spikes by adjusting the x-axis limits (hint: edit the last line of the code cell). Describe what this looks like up close.
3. Zoom in on the negative-going response a bit after 3 s. Describe the shape of the curve.

### Sampling rate and bit depth

The voltage trace in the plot above is a **sampled** time-series. Although the underlying voltage dynamics are continuous in time, the computer that made the recording only stored the voltage at discrete time intervals. The time between those intervals is called the **sampling period**, and the inverse of this value is the **sampling rate**.

Similarly, although voltage is a continuous quantity, the computer has to store the voltage as a discrete value. Digital computers use binary memory, which means that numbers are represented as a series of *bits*, or binary digits. The number of bits for each sample determines the **resolution** or **bit depth** of the sampling. The more bits, the more different values can be stored.
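
As a rough sketch of how these quantities relate, the arithmetic can be written out in Python. The sampling rate, bit depth, duration, and voltage range below are assumed values chosen for illustration, not properties of the recording above.

```python
# Assumed values for illustration only
example_rate = 40000           # samples per second (Hz)
example_bit_depth = 16         # bits per sample
example_duration = 60.0        # s
example_range = 200.0          # mV spanned by a hypothetical recording system

sampling_period = 1.0 / example_rate                 # s between samples
n_levels = 2 ** example_bit_depth                    # number of distinct values per sample
resolution = example_range / n_levels                # smallest voltage step (mV)
n_samples = int(example_duration * example_rate)
n_bytes = n_samples * example_bit_depth // 8         # uncompressed size

print(f"sampling period: {sampling_period * 1e6:.1f} microseconds")
print(f"levels: {n_levels}, resolution: {resolution:.5f} mV")
print(f"storage: {n_bytes / 1e6:.1f} MB")
```

Doubling either the sampling rate or the bit depth doubles the storage required.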

#### Nyquist-Shannon theorem

Clearly, increasing the sampling rate will increase the amount of hard drive space a recording will use. If a signal is sampled at 40 kHz, it will use twice as much space as the same signal sampled at 20 kHz. So what's the right sampling rate to use?

According to the [Nyquist-Shannon theorem](https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem), in order for a sampled time series to represent a signal that's changing at $x$ Hz, the signal has to be sampled at a rate of more than $2x$ Hz. Let's see why this is by looking at one of the action potentials:

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 4))

sampling_rate = 20000

Vspk = np.r_[V[10900:11300], V[14630:15000], V[18510:19300]]
tspk = np.arange(0.0, len(Vspk)) / Fs
ax.plot(tspk, Vspk)
ax.set_xlabel("Time (s)")
ax.set_ylabel("Voltage (mV)")
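# integer step between retained samples; at least 1 so that rates above Fs don't give a zero step
downsample = max(1, int(Fs) // int(sampling_rate))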
V_ds = Vspk[::downsample]
t_ds = tspk[::downsample]
p = ax.plot(t_ds, V_ds, 'r.-')

This plot demonstrates how to change an analysis parameter and replot using the same code cell. You can edit the `sampling_rate` variable to change the sampling rate and see how sampling rate affects how well a sampled time series (in red) is able to represent changes in the "true" data (in blue). Each red dot indicates a sample. For illustration purposes, I've placed three spikes close to each other, cutting out some of the intervening time. 
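
The Nyquist-Shannon limit can also be seen in a simpler setting with a pure tone. The sketch below is separate from the recording: it samples a sine wave at two rates and finds the strongest frequency in each sampled version with NumPy's FFT. The function name and the chosen frequencies are assumptions for illustration. When the sampling rate is too low, the tone shows up at the wrong frequency, an effect called aliasing.

```python
import numpy as np

def dominant_frequency(tone_freq, rate, duration=1.0):
    """Sample a sine wave and return the frequency of the largest FFT peak."""
    n = int(duration * rate)
    times = np.arange(n) / rate
    samples = np.sin(2 * np.pi * tone_freq * times)
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)
    return freqs[np.argmax(power)]

tone = 100.0  # Hz
well_sampled = dominant_frequency(tone, rate=1000.0)   # rate is well above 2 * tone
under_sampled = dominant_frequency(tone, rate=120.0)   # rate is below 2 * tone

print(well_sampled, under_sampled)
```

The well-sampled version recovers the 100 Hz tone, while the under-sampled version reports a much lower frequency.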

Try setting `sampling_rate` to a few different values and then answer the following questions. Your answers can be qualitative.

**Q2**

1. What's the lowest sampling rate at which you can resolve the time of the peak of all three spikes to within 1 ms? How does this value compare to the duration of the peak?
2. What's the lowest sampling rate at which you can resolve the time of the onset of all three spikes to within 1 ms?
3. How high does the sampling rate need to be to accurately follow the tail of the last spike? How does this value compare to the duration of the downslope?

---
## Time Series: Extracellular Voltage

Another method of monitoring neural activity is by placing an electrode right next to a neuron (below, middle right trace). This is called "extracellular recording". As we'll see below, you can still detect action potentials, but not the subthreshold activity of the neuron.

<img src="https://meliza.org/public/courseware/comp-neurosci/images/recording_dayan.png" alt="recording methods" style="width: 400px;"/>

One major advantage of extracellular recording is that you can monitor activity in the brain while the animal is still alive and behaving. This allows us to more directly examine how sensory activity and brain activity are related. For example, we can present an auditory stimulus to the animal and record the evoked response:

![experiment diagram](https://meliza.org/public/courseware/comp-neurosci/images/experiment_diagram.png "Auditory Neurophysiology Experiment")


Let's see what an extracellular response looks like:

In [None]:
stim_name = "A8"
stim, sampling_rate = data.load_timeseries(Path("starling", "stimuli", stim_name + ".wav"))
spec, freq, bins = signal.specgram(stim, sampling_rate)
r, t = data.load_raw_responses(Path("starling", "extracellular"), unit="st11_1_2", stimname=stim_name)
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(10, 4), sharex=True)
axes[0].imshow(spec, extent=(bins[0], bins[-1], freq[0], freq[-1]), origin='lower', aspect='auto', cmap='jet')
axes[0].set_ylabel("Frequency (kHz)")
axes[0].set_title("Stimulus spectrogram")
axes[1].plot(t, r[0])
axes[1].set_xlabel("Time (s)")
axes[1].set_title("Response")
axes[1].set_xlim(-2, 12);

The top plot is a **spectrogram** of the stimulus. Spectrograms show the power in the sound as a function of time and frequency. More intense colors indicate more power. We'll discuss spectrograms in more detail later, but for now, all you need to know is that you can read the plot like a musical score.

The bottom plot shows the neural recording. The recording begins before the stimulus starts and continues after it ends. These parts of the response are called the **background** or **spontaneous activity**. The part of the response that's aligned with the stimulus is called the **evoked response**.

If you edit the last line of the cell to change the x-axis limits, you can see that the limits of the two plots are linked. This is because we set `sharex=True` in the call to `plt.subplots`. `plt.subplots` is an important function to understand, because it's how you set up a figure with multiple subplots. Remember that you can inspect the documentation for a function by placing the cursor just inside the left parenthesis and hitting `Shift-Tab`.

**Q3**

1. Most of the recording is low-amplitude **noise**. Zoom in on a section of the response around 0 s by editing the x-axis limits. Describe what this looks like up close.
2. This neuron responded with brief bursts of **action potentials**, which caused high-amplitude **spikes** in the signal around 4 s. Zoom in on these and describe what the spikes look like.
3. Zoom back out a little and see if you can spot some patterns in the spectrogram that preceded the spiking. Do these data suggest that the neuron is responding to a particular kind of sound?

---
There are many sources of variability in the brain, so not every spike in a given trial is necessarily caused by the stimulus. Thus, each stimulus is usually presented 5-20 times to get an average that represents the part of the response that's driven by the stimulus. Let's look at a few responses to the same stimulus:

In [None]:
fig, axes = plt.subplots(nrows=4, ncols=1, figsize=(10, 6), sharex=True)
axes[0].imshow(spec, extent=(bins[0], bins[-1], freq[0], freq[-1]), origin='lower', aspect='auto', cmap='jet')
axes[0].set_ylabel("Frequency (kHz)")
axes[1].plot(t, r[0])
axes[2].plot(t, r[1])
axes[3].plot(t, r[2])
axes[3].set_xlabel("Time (s)")
axes[3].set_xlim(-2, 12)

**Q4:** Zoom in on the spike bursts and compare the responses across the trials. What are the differences? What stays the same?

### Spike sorting

A key first step in analyzing the results of an experiment is to *sort spikes*. This is a process that takes the raw neural recordings, which are densely sampled time series, and extracts the times when a spike occurred. To be able to say that a set of spikes represents a single neuron, we need to make sure that the waveforms are distinct from the noise. There are many different methods of doing *spike sorting*, but the major steps usually consist of:

1. filter the recording to emphasize fast transients (spikes)
2. identify potential spikes and extract their waveforms
3. cluster similar waveforms together and exclude noise and artifacts

![spike sorting diagram](https://meliza.org/public/courseware/comp-neurosci/images/spike_sorting_diagram.png)


Let's start with spike detection. What we need to do is determine when the signal crosses above a threshold. Setting the threshold is usually done manually. The following code snippet will illustrate how this is done.

In [None]:
fig = plt.figure(figsize=(10,4))
ax1 = plt.subplot2grid((1,4),(0,0),colspan=3)
ax1.set_title("Response")
ax2 = plt.subplot2grid((1,4),(0,3))
ax2.set_title("Spikes")
fig.subplots_adjust(wspace=0.3)

# you can change unit name and stimulus to try with other data if you've fetched the data repository
r, t = data.load_raw_responses(Path("starling", "extracellular"), unit="st11_1_2", stimname=stim_name)

threshold = 6000
graphics.spike_detector(ax1, ax2, r[0], t, threshold)

Adjust the threshold by editing the `threshold` variable (if interactive plotting is working, you can do this directly by moving the `Threshold` slider). Any events that cross this line will be detected. A subset of the detected events will be plotted in the right panel as you adjust the threshold, and you can see the detected events marked with a red dot.
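
For reference, the core of threshold detection can be sketched in a few lines of NumPy. This is not the implementation behind `graphics.spike_detector`; it is a minimal illustration on a synthetic trace, and every name and value in it is made up.

```python
import numpy as np

# Synthetic "recording": Gaussian noise plus three large, brief deflections
rng = np.random.default_rng(0)
example_rate = 20000.0                        # Hz (assumed)
trace = rng.normal(0.0, 1.0, size=2000)
for i in [300, 900, 1500]:
    trace[i:i + 5] += 20.0                    # each fake "spike" lasts 5 samples

example_threshold = 10.0

# A crossing is a sample above threshold whose predecessor was at or below it,
# so each spike is counted once even though it stays above threshold for several samples
above = trace > example_threshold
crossings = np.nonzero(above[1:] & ~above[:-1])[0] + 1
crossing_times = crossings / example_rate

print(crossings)
print(crossing_times)
```

Converting the crossing indices to times is the step that turns a time series into a point process.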

**Q5:** Try adjusting the threshold so that all of the detected events have similar waveforms. How many spikes were evoked by the stimulus? What happens if you set the threshold too low?

---
## Point-Process: Spike Times

Typically, with extracellular data we're only interested in the **times** when the neuron spiked, not the shape of the waveform. There's a (somewhat tedious) analysis called *spike-sorting* that extracts spike times from extracellular recordings. It begins with setting a threshold (as above) but also includes some additional steps to isolate real spikes from noise and from the signals of other neurons. At the end of this analysis, we have a **point process** representation of the neural activity, consisting only of the times when the neuron spiked.

Let's see what these data look like. Run the code snippet below:

In [None]:
from comp_neurosci_uva import pprox
unit = "st11_1_2_1"
stim_name = "A8"

stim, sampling_rate = data.load_timeseries(Path("starling", "stimuli", stim_name + ".wav"))
spec, freq, bins = signal.specgram(stim, sampling_rate)

resp = data.load_pprox(Path("starling", "pprox", unit))
resp_A8 = pprox.select_stimulus(resp, stim_name)
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 6), sharex=True)
axes[0].imshow(spec, extent=(bins[0], bins[-1], freq[0], freq[-1]), origin='lower', aspect='auto', cmap='jet')
axes[0].set_ylabel("Frequency (kHz)")
axes[0].set_title("Stimulus spectrogram")
axes[1].plot(t, r[0])
axes[1].set_title("Raw response - Trial 1")
graphics.plot_raster(axes[2], resp_A8)
axes[2].set_xlabel("Time (s)")
axes[2].set_title("Raster")
axes[2].set_xlim(-2, 12)

Zoom in on some of the bursts and verify that the raster "ticks" are aligned with the spikes in the extracellular waveform.

**Q7:** How does this plot compare to what you saw in the raw recordings? How are the spikes represented in the raster plot? What's the advantage of visualizing the response this way?

### Response histograms

The raster plot is useful for comparing activity across trials, but what if we want to know about the *average* behavior of the neuron? One way of representing this is as a **peri-stimulus time histogram** or **PSTH**. This is a plot that shows the average number of spikes that occurred in a series of time bins. We'll pick bins that span the range of spike times.
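
Here is a minimal sketch of the computation on made-up spike times, using `np.histogram`. The trial data, bin width, and variable names are assumptions for illustration. Dividing the pooled counts by the number of trials and by the bin width expresses the histogram in spikes per second.

```python
import numpy as np

# Hypothetical spike times (s) from three trials of the same stimulus
example_trials = [
    np.array([0.11, 0.52, 0.54]),
    np.array([0.10, 0.53]),
    np.array([0.12, 0.51, 0.55, 0.90]),
]
example_binsize = 0.25                                                   # s
example_edges = np.arange(0.0, 1.0 + example_binsize, example_binsize)   # edges spanning 0 to 1 s

# Pool spikes from all trials and count how many fall in each bin
counts, _ = np.histogram(np.concatenate(example_trials), bins=example_edges)

# Average count per trial, then divide by the bin width to get spikes per second
mean_rate = counts / len(example_trials) / example_binsize

print(counts)
print(mean_rate)
```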

In [None]:
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 6), sharex=True)
axes[0].imshow(spec, extent=(bins[0], bins[-1], freq[0], freq[-1]), origin='lower', aspect='auto', cmap='jet')
axes[0].set_ylabel("Frequency (kHz)")
axes[0].set_title("Stimulus spectrogram")
graphics.plot_raster(axes[1], resp_A8)
axes[1].set_title("Raster");
axes[2].set_xlabel("Time (s)")

binsize = 0.025  # s
# calculate the time of the first and last spike
start = min(spikes[0] for spikes in resp_A8)
stop  = max(spikes[-1] for spikes in resp_A8)
# calculate the histogram
rate, edges  = np.histogram(np.concatenate(resp_A8), bins=np.arange(start, stop, binsize))
# plot
p = axes[2].step(edges[:-1], rate / rate.max())
axes[2].set_xlim(-2, 12)


**Q8**: Adjust the `binsize` variable. How does changing this affect the appearance of the histogram? What bin size appears to give the most useful summary of the average behavior? What are the tradeoffs between large and small bin sizes?

---
## Spike Train Statistics

As we've seen from previous examples, much of the data generated by the brain consists of spikes (i.e., action potentials). We all know that neurons spike when they are excited, but what does this mean quantitatively?

In this part of the lesson, we'll dive into a fairly simple but foundational model that attempts to quantify spiking as a function of an underlying **rate**.

Our goals are to:

- understand homogeneous and inhomogeneous Poisson process models
- be able to estimate the latent rate variable in Poisson models

Follow along in the notebook during the lecture, and then work on the cells marked **Q** with help from your instructor. Submit the completed notebook to Collab.

### More point process math

Recall that a point process is an (ordered) sequence of event times:

$$X = \{t_0, t_1, \ldots, t_{N-1}\}$$

We can represent this as a function by making each spike a [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function):

$$\rho(t) = \sum_{i=0}^{N-1} \delta(t - t_i)$$

![spike_train](https://meliza.org/public/courseware/comp-neurosci/images/l5_spike_train.png "spike train delta function")

Because the area under each delta function is 1, this allows us to count spikes or calculate any continuous function of a spike train through integration.

For example, the **rate** is defined as the number of spikes $N$ that occurred in some interval divided by the duration of the interval, $T$:

$$R = \frac{N}{T} = \frac{1}{T} \int_{0}^{T} d\tau\; \rho(\tau)$$

![spike_train_rate](https://meliza.org/public/courseware/comp-neurosci/images/l5_rate_integral.png "spike train rate integral")
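
In code, integrating $\rho(t)$ over an interval amounts to counting the spike times that fall inside it. A minimal sketch, with hypothetical spike times and a helper function named only for this example:

```python
import numpy as np

# Hypothetical spike times in seconds
example_spikes = np.array([0.3, 1.1, 1.9, 2.4, 3.8, 4.6])

def count_spikes(spike_times, start, stop):
    """Integrate rho(t) from start to stop: each delta function contributes 1."""
    return int(np.sum((spike_times >= start) & (spike_times < stop)))

interval = 5.0                                            # duration T (s)
n_spikes = count_spikes(example_spikes, 0.0, interval)    # N
mean_rate = n_spikes / interval                           # R = N / T

print(f"N = {n_spikes}, R = {mean_rate} Hz")
```

The same helper can count spikes in any sub-interval, for example `count_spikes(example_spikes, 1.0, 2.0)`.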

### Spiking as a random variable

Neural responses are **stochastic**. Even under "identical" conditions, spike trains will vary from trial to trial.

In other words, the response $\rho(t)$ is a **random variable**. This may be a bit challenging to wrap your head around at first, if you're mostly used to thinking of random variables as having discrete values, like in a coin flip, or scalar values, like measurements of some physical quantity. It's the same basic idea though. Instead of flipping a coin and getting a 0 (heads) or a 1 (tails), you get a sequence of spike times.

There are two ways to think about this outcome space mathematically. One way is to represent it as a space of functions $\rho: \mathbb{R} \rightarrow \mathbb{R}$. Alternatively, we can represent a set of $N$ spike times as a point in an $N$-dimensional space, $(t_1,t_2,\ldots,t_N) \in \mathbb{R}^N$. Let's start with the latter.

The probability of a sequence of $N$ spikes $X = \{t_1,\ldots,t_{N}\}$ is the joint probability density of all the individual spikes: 

$$p(t_1, \ldots, t_{N})$$

This is a messy and large space, so one really big simplification we can make is to assume that all the spikes are independent of each other. This means that the probability distribution for $t_1$ doesn't depend on when any of the other spikes occurred. Equivalently, $p(t_1,t_2) = p(t_1)p(t_2)$. If we apply this to the whole spike train, then its joint distribution is simply the product of the distributions for each spike:

$$p(t_1, \ldots, t_{N}) = \prod_{i=1}^{N}p(t_i)$$

This distribution has a special name. When each spike is independent of every other spike, we have a **Poisson process**.

### Homogeneous Poisson Processes

If $t_i$ is independent of all the other spikes, what does it depend on?

In the simplest case, $p(t_i)$ is a constant: the probability of observing a spike at any given time is a single number, which corresponds to the **rate** of spiking, $R$.

If the rate is constant, the Poisson process is **homogeneous**. In this case, in an interval $(t_i, t_i + \Delta)$, we would expect to observe $\lambda = R\Delta$ events. The distribution of the number of events we actually observe, $n$, is given by the Poisson distribution:

$$p(n|\lambda) = \frac{\lambda^n}{n!}\exp(-\lambda)$$

The parameter $\lambda$ is often called the **intensity** of the process.
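
The formula can be evaluated directly with the Python standard library. This is only a sketch to connect the equation to code; `poisson_pmf` is a helper written for this example, and the rate and interval are assumed values.

```python
import math

def poisson_pmf(n, lam):
    """Probability of observing n events when lam events are expected."""
    return lam ** n / math.factorial(n) * math.exp(-lam)

example_R = 4.0          # Hz (assumed)
example_interval = 0.5   # s  (assumed)
example_lam = example_R * example_interval   # expected number of events

probs = [poisson_pmf(n, example_lam) for n in range(50)]

print(f"p(0) = {probs[0]:.4f}, p(2) = {probs[2]:.4f}")
print(f"sum over n = 0..49: {sum(probs):.6f}")
```

Because $n$ counts events, the probabilities over all non-negative integers add up to 1.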

Let's explore some properties of the Poisson distribution in Python. Here's a graph of the distribution from Wikipedia:

![poisson_distro](https://upload.wikimedia.org/wikipedia/commons/1/16/Poisson_pmf.svg)

In [None]:
# import the poisson distro
from comp_neurosci_uva import dists

In [None]:
# first, let's define our *support*: the values over which we want to evaluate p(n):
supp = np.arange(0, 20)

# next, we *instantiate* the distribution object with our parameter lambda
dist = dists.poisson(1.0)

# you can get the probability of any value in the distribution with .pmf. Note that we have to use
# pmf (probability mass function) rather than pdf.
print("p(5|lambda=1) =", dist.pmf(5))

# we can also evaluate the distribution over a vector of numbers
prob = dist.pmf(supp)

# and plot the distribution with plt.plot
plt.plot(supp, prob, lw=3);

**Q9** What happens if you evaluate `dist.pmf` for a negative or non-integer number? Given the definition of the Poisson distribution, why is this the case?

**Q10** Recall that $\lambda = R\Delta$. If $R = 1$ Hz and you steadily reduce $\Delta$ from 1.0 s to 1.0 microseconds, how does the probability of observing one spike in that interval change? To answer this question, you can write a *for loop* to evaluate and plot $p(n|\lambda)$ for different values of $\Delta$ in the list below. Does the result make sense? What is the probability of a spike occurring at some *exact* time?

In [None]:
Delta = [1e0, 5e-1, 1e-1, 5e-2, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4, 5e-5, 1e-5, 5e-6, 1e-6]

In addition to plotting the probability distribution, Python can generate random samples (i.e., **draw**) from the Poisson distribution, too.

In [None]:
# note: 'lambda' is a reserved keyword in python, so we use 'lam' instead
lam = 1.0
dist = dists.poisson(lam)
# rvs stands for random variates (i.e., random values)
n = dist.rvs(1000)

You can then calculate summary statistics on those samples:

In [None]:
print(f"The sample mean when lambda= {lam} is {np.mean(n)}")
print(f"The sample standard deviation when lambda= {lam} is {np.std(n)}")

**Q11**: Determine the following identities through induction:

- Mean: $\mu =$
- Standard deviation: $\sigma =$
- Fano factor $\sigma^2/\mu =$

### From Poisson distribution to Poisson process

How can we generate a series of spike times from the Poisson distribution? The trick is to divide your response interval up into a set of smaller intervals (or **bins**) such that the probability of observing more than one spike in a single bin is very small, then draw from $p(n|\lambda)$ for each bin.

In [None]:
np.random.seed(1)
T     = 100     # s
rate  = 4.0     # Hz
Delta = 0.005   # s
dist   = dists.poisson(rate * Delta)
hom_spikes = dist.rvs(int(T / Delta))
bins   = np.arange(0, T, Delta)

fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 3))
axes.plot(bins, hom_spikes)

The plot of the `hom_spikes` array should only show `0` and `1` values. If it doesn't, try adjusting the `Delta` variable in the code cell above. In which direction does it need to change to fix the problem? Why?

### Spike arrays and spike times

You can think of the `hom_spikes` array as a time series representation of the point process. 

To get the actual spike times, we need to find the bins where there is a spike and then look up the times in the `bins` array. This allows us to generate a *raster plot*.

In [None]:
hom_spike_i = np.nonzero(hom_spikes)[0]
hom_spike_t = bins[hom_spike_i]
# here's one way to plot a raster of spike times
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(6, 3))
axes.plot(hom_spike_t, np.zeros_like(hom_spike_t), "|")

### More spike train statistics

**Q12** Here's a slightly harder question about the properties of Poisson processes. Calculate the interspike *intervals* from `hom_spike_t` (hint: look at the documentation for `np.diff`), then plot a histogram. What function does this look like? Calculate the sample mean and variance. How do these relate to the rate of the process? What is the coefficient of variation of the interspike interval distribution (CV = $\sigma / \mu$)?
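
As a hint on the mechanics only, here is how `np.diff` and the summary statistics behave on a small hand-written array. The times below are invented and are not drawn from a Poisson process, so the numbers say nothing about the answer.

```python
import numpy as np

# Hypothetical event times (s), written by hand
toy_times = np.array([0.10, 0.35, 0.50, 0.90, 1.00])

# np.diff returns the difference between consecutive elements,
# so the result has one fewer element than the input
toy_intervals = np.diff(toy_times)

toy_mean = toy_intervals.mean()
toy_std = toy_intervals.std()
toy_cv = toy_std / toy_mean      # coefficient of variation: sigma / mu

print(toy_intervals)
print(toy_mean, toy_std, toy_cv)
```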

### Inhomogeneous Poisson Processes

It's also possible for the rate of a Poisson process to vary in time; that is, for $\lambda$ to be a function of $t$.

$$p(n|\lambda(t)) = \frac{\lambda(t)^n}{n!}\exp(-\lambda(t))$$

As before, we need to discretize time and determine the probability that there is a spike in some interval $(t, t + \Delta)$; the only difference is that some intervals are more likely to have spikes than others.

We could simulate an inhomogeneous Poisson process in much the same way as we did above, but we need to vary $\lambda$ in each bin.
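
A minimal sketch of that idea, using NumPy's own random generator rather than the course's `dists` module: `Generator.poisson` accepts an array of expected counts, so each bin can have its own $\lambda$. The step-shaped rate profile and all variable names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sim_T = 10.0          # s
sim_Delta = 0.001     # s
sim_bins = np.arange(0.0, sim_T, sim_Delta)

# Hypothetical rate profile: 2 Hz for the first half, 20 Hz for the second
first_half = sim_bins < sim_T / 2
sim_rate = np.where(first_half, 2.0, 20.0)

# One lambda per bin, then one Poisson draw per bin
sim_lam = sim_rate * sim_Delta
sim_counts = rng.poisson(sim_lam)

n_first = sim_counts[first_half].sum()
n_second = sim_counts[~first_half].sum()

print(n_first, n_second)
```

The second half of the simulated interval should contain many more spikes than the first, reflecting the higher rate.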

Let's look at a simple example where the intensity linearly ramps up from zero and then back down.

In [None]:
np.random.seed(1)
T     = 100     # s
Delta = 0.001   # s
N     = int(T / Delta)
bins  = np.arange(0, T, Delta)
# rate is now a function of time
inh_rate  = np.concatenate([np.linspace(0.0, 4.0, N//2),
                            np.linspace(4.0, 0.0, N//2)])

# generate N values from a uniform distribution
rand = dists.uniform().rvs(N)
# this is an alternative method of simulating spiking based on the Bernoulli distribution
# compare lambda = rate * Delta to each random value; if lambda is greater, the bin gets a spike
lam  = inh_rate * Delta
inh_spikes = lam > rand
inh_spike_i = np.nonzero(inh_spikes)[0]
inh_spike_t = bins[inh_spike_i]
# here's one way to plot a raster of spike times
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(9, 3))
p_rate = axes.plot(bins, inh_rate)
axes.plot(inh_spike_t, np.zeros_like(inh_spike_t), "k|")
axes.legend(p_rate, ["true rate"])
axes.set_ylabel("Rate (Hz)")
axes.set_xlabel("Time (s)")

**Q13** Calculate the interspike intervals for this spike train (`inh_spike_t`). What are the mean, standard deviation, and CV? Is the relationship with the rate the same as you discovered for the homogeneous process?