# 2: Point Process Statistics and Linear Time-Invariant Systems

This week our lab will focus on dealing with point process data in Python and on modeling firing rates as
inhomogeneous Poisson processes.

Work on the cells marked **Q**. Save the completed notebook as an HTML file and submit on Canvas, one submission per group.

In [None]:
# load matplotlib inline mode
%matplotlib inline

# import some useful libraries
import sys
from pathlib import Path
import numpy as np                # numerical analysis linear algebra
import matplotlib.pyplot as plt   # plotting
sys.path.insert(0,"/standard/psyc5270-cdm8j/comp-neurosci")
from comp_neurosci_uva import signal, data, pprox, dists

## Point processes

When we record from neurons, we observe the times when they spike. We can represent these data as an ordered sequence of times in some interval from 0 to $T$:

$$\{0 \leq t_1 < t_2 < \ldots < t_N \leq T\}$$

In contrast to time series, there is not a fixed relationship between the number of events and the duration of the analysis interval.

When we store point processes in Python or another programming language, we typically use an array, but the elements of the array are event times, not measurements.

Because point processes vary in the number of events, multi-channel or multi-trial point-processes are represented by **lists of arrays**, not by 2D arrays.
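
Here is a minimal sketch of why a list of arrays is the natural container. The spike times are made up for illustration, and the variable names (`example_trials`, `pooled`) are not used anywhere else in the notebook.

```python
import numpy as np

# Three hypothetical trials with different numbers of spikes (times in s).
example_trials = [
    np.array([0.012, 0.250, 0.731]),
    np.array([0.105]),
    np.array([0.040, 0.300, 0.512, 0.900]),
]

# Each trial keeps its own length, which a rectangular 2D array cannot do.
spike_counts = [trial.size for trial in example_trials]
print(spike_counts)

# Pooling the spikes from all trials is a single concatenation.
pooled = np.sort(np.concatenate(example_trials))
print(pooled.size)
```

Padding the shorter trials to force the data into a 2D array is possible, but then every analysis has to remember to skip the padding.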

Let's look at an example. The code in the cell below will load spike times recorded from a single neuron in the starling auditory pallium in response to several conspecific songs. Each song is denoted by a short code. We will look at the responses to `A8`.

In [None]:
resp = data.load_pprox(Path("starling", "pprox", "st11_1_2_1"))
resp_A8 = pprox.select_stimulus(resp, "A8")
resp_A8

The variable `resp_A8` refers to a Python **list**. Lists are like arrays, but they can store heterogeneous data types. The syntax for accessing elements and slices is the same.

### I/O for Point Processes

There is no standard format for point-process data. Because point process data tend to be smaller than time series, text formats are more common than binary. A very simple text format is to put each trial (or channel) on a separate line and separate the events on each line with a space. Take a look at `data/io-examples/st11_1_2_1_A8.txt` for an example.

The [PySpike](http://mariomulansky.github.io/PySpike/) library has a function for loading data from such files, but we're going to write our own so that we can learn a bit about basic I/O in Python and looping.

In [None]:
# create a list where we will store our trials
trials = []
# open the file for reading; the with statement closes it when the block ends
with open(data.data_path / "io-examples" / "st11_1_2_1_A8.txt", mode="r") as fp:
    # loop through the lines of the file with a for statement
    for line in fp:
        # read the line into an array
        arr = np.fromstring(line, sep=" ")
        # append the array to our list
        trials.append(arr)

If you learned how to program in Java or C or another low-level programming language, take a moment to appreciate how simple this task is in Python.
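
Going the other direction is just as short. The sketch below writes a few made-up trials in the one-trial-per-line format and reads them back. The file name `demo_spikes.txt` and the six-decimal precision are arbitrary choices for illustration, not part of the course data.

```python
import tempfile
from pathlib import Path
import numpy as np

# Hypothetical spike times (s) for three trials; the last trial has no spikes.
demo_trials = [np.array([0.1, 0.5, 0.9]), np.array([0.25]), np.array([])]

with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = Path(tmpdir) / "demo_spikes.txt"
    # one trial per line, events separated by a space
    with open(demo_path, mode="w") as fp:
        for trial in demo_trials:
            fp.write(" ".join(f"{spike:.6f}" for spike in trial) + "\n")
    # read the file back, converting each line to an array of floats
    with open(demo_path, mode="r") as fp:
        loaded = [np.array(line.split(), dtype=float) for line in fp]

print([trial.size for trial in loaded])
```

Note that an empty trial is stored as an empty line, so the number of lines still equals the number of trials.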

### Point process metadata

As with time series data, it's important to keep track of metadata. Here are some important metadata that need to be associated with point process files:

- type of event (e.g., spike, behavioral action, stimulus start/stop)
- number of channels
- unit scaling (e.g., milliseconds or seconds?)
- start time
- other experimental variables

## Spike Train Statistics

Recall from last week's lab that we can also represent a spike train as an ordered sequence of spike times:

$$X = \{t_1, t_2, \ldots, t_N\},$$

and that this is a random variable with the joint probability distribution

$$p(t_1, \ldots, t_{N}).$$

If we assume that each spike is independent of every other spike, then we have a **Poisson process**,

$$p(t_1, \ldots, t_{N}) = \prod_{i=1}^{N}p(t_i).$$

This simplifies things a lot because now we just have to figure out what $p(t_i)$ is.

If $p(t_i)$ is constant, the Poisson process is **homogeneous**. In this case, in an interval $(t_i, t_i + \Delta)$, we would expect to observe $\lambda = R\Delta$ events, where $R$ is the firing rate. The distribution of the number of events we actually observe, $n$, is given by the Poisson distribution:

$$p(n|\lambda) = \frac{\lambda^n}{n!}\exp(-\lambda).$$

If $p(t_i)$ depends on time, then the Poisson process is **inhomogeneous** and $\lambda$ is a function of $t$:

$$p(n|\lambda(t)) = \frac{\lambda(t)^n}{n!}\exp(-\lambda(t)).$$
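
To make the Poisson formula concrete, here is a small sketch that evaluates it directly. The rate and window (`demo_rate`, `demo_window`) are assumed values chosen for illustration, and `poisson_pmf` is a helper defined here, not a function from the course library.

```python
import math
import numpy as np

demo_rate = 20.0                      # assumed firing rate R (Hz)
demo_window = 0.1                     # assumed counting window Delta (s)
demo_lam = demo_rate * demo_window    # expected count, lambda = R * Delta

def poisson_pmf(n, lam):
    """Probability of observing n events when lam events are expected."""
    return lam**n / math.factorial(n) * math.exp(-lam)

# evaluate the distribution for counts 0 through 19
pmf = np.array([poisson_pmf(n, demo_lam) for n in range(20)])
mean_count = np.sum(np.arange(20) * pmf)
print(pmf[:4])
print(mean_count)
```

The probabilities sum to (very nearly) 1 and the mean of the distribution comes out equal to $\lambda$, which is why $\lambda$ is interpreted as the expected number of events.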

### Estimating spike rates

Let's think about how we can estimate $\lambda$ for Poisson processes.

If we assume that the process is homogeneous over each trial, then we have a simple observational model where the number of spikes is a random sample from the Poisson distribution:

$$p(y_i|\lambda) = \frac{\lambda^{y_i}}{y_i!}\exp(-\lambda).$$

Given a set of trials, we can estimate $\lambda$ from the sample mean of the spike count:

$$\hat{\lambda} = \frac{1}{N}\sum_{i=1}^{N} y_i,$$

where $N$ is the number of trials. Dividing $\hat{\lambda}$ by the duration of the observation interval, $T$, converts the expected count into a rate.
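
As a quick sanity check, the sketch below simulates spike counts from a homogeneous Poisson process and estimates $\lambda$ from the sample mean of the counts. All of the numbers (`true_lam`, `n_demo_trials`, `demo_duration`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_lam = 5.0          # assumed expected spike count per trial
n_demo_trials = 2000    # assumed number of trials
demo_duration = 2.0     # assumed trial duration (s)

counts = rng.poisson(true_lam, size=n_demo_trials)   # one count y_i per trial
lam_hat = counts.mean()                              # sample mean of the counts
rate_hat = lam_hat / demo_duration                   # convert count to spikes/s
print(lam_hat, rate_hat)
```

Try reducing `n_demo_trials` to 10 and rerunning with different seeds to see how much the estimate varies when there are only a few trials.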

The problem is a lot trickier if the process is inhomogeneous, because now we're trying to estimate a continuous function of time, $\lambda(t)$.

In this setting, people usually talk about rate ($r$) rather than intensity ($\lambda$), so we'll use $r(t)$ from here on.

The issue we confront is that $r(t)$ is a continuous function. We can discretize it into small intervals of $(t, t + \Delta)$ and count the number of spikes in each interval, but as we make $\Delta$ smaller to get higher temporal resolution, we reach the point at which each bin has either one or zero spikes, which doesn't tell us much about the rate. We can address this problem by averaging across multiple trials. If we use $\langle \rangle$ to denote averaging across trials, this looks like:

$$r(t) = \frac{1}{\Delta} \int_t^{t+\Delta} d\tau\; \langle \rho(\tau) \rangle$$

You hopefully can see that as $\Delta$ gets smaller, the number of trials you need to average to get a smooth function gets larger. So part of our problem is to determine what $\Delta$ should be. More practically, at what time scale do we think the rate is changing?

There are a number of different ways of approximating $r(t)$. We'll look at a couple.

### Spike time histogram

For historical reasons, this is also called a peri-stimulus spike time histogram (PSTH), even when there isn't a stimulus.

The simplest way of approximating the rate is to divide the interval up into a fixed number of bins of duration $\Delta$ and count how many spikes occurred in each bin. The rate is simply the number of spikes divided by $\Delta$.

To illustrate, let's generate some trials from an inhomogeneous process that ramps up and then down in rate.

In [None]:
np.random.seed(1)
trials= 10
T     = 100     # s
Delta = 0.001   # s
N     = int(T / Delta)
bins  = np.arange(0, T, Delta)
# rate is now a function of time
inh_rate  = np.concatenate([np.linspace(0.0, 4.0, N//2),
                            np.linspace(4.0, 0.0, N//2)])

inh_spikes_v = []
inh_spikes_t = []
# generate 10 trials
for trial in range(trials):
    # generate N values from a uniform distribution
    rand = dists.uniform().rvs(N)
    # this is an alternative method of simulating spiking based on the Bernoulli distribution
    # compare each value to lambda = rate * Delta; if lambda is greater, then the bin gets a spike
    lam  = inh_rate * Delta
    spike_v = lam > rand
    spike_t = bins[spike_v]
    inh_spikes_v.append(spike_v)
    inh_spikes_t.append(spike_t)
inh_spikes_v = np.column_stack(inh_spikes_v)

Take a look at the code above and make sure you understand what it's doing. In particular, note that there
are two equivalent representations of the spiking data. 

- `inh_spikes_t` contains arrays of spike times. This is akin to our $X = \{t_1,\ldots,t_N\}$ formalism.
- `inh_spikes_v` is a two-dimensional array of 0's and 1's, with the 1's corresponding to times when there were spikes. This is like a time series, a discrete version of $\rho(t) = \sum_{i=1}^N \delta(t - t_i)$. The rows correspond to time and the columns to trials.
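
If you ever need to move between the two representations, the conversion is short. This is a minimal sketch with made-up spike times and an assumed 1 ms grid; the names (`demo_grid`, `demo_vec`, and so on) are only for this example.

```python
import numpy as np

demo_dt = 0.001                                   # bin width (s), assumed
demo_grid = np.arange(0, 1000) * demo_dt          # left edge of each bin (1 s total)
demo_times = np.array([0.0105, 0.2503, 0.7311])   # hypothetical spike times (s)

# times -> binary vector: find the bin that contains each spike
demo_idx = np.floor(demo_times / demo_dt).astype(int)
demo_vec = np.zeros(demo_grid.size, dtype=bool)
demo_vec[demo_idx] = True

# binary vector -> times: read off the left edges of the occupied bins
recovered = demo_grid[demo_vec]
print(recovered)
```

The round trip only recovers the spike times to within one bin width, and two spikes that fall in the same bin collapse into a single 1, so the two representations are equivalent only when the bins are small relative to the interspike intervals.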

Each representation lends itself best to a different kind of plot. The spike times are best plotted as a **raster** in which ticks
indicate the times when the cell spikes.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(9, 3))

for trial in range(trials):
    spike_t = inh_spikes_t[trial]
    axes.plot(spike_t, np.zeros_like(spike_t) + trial, "k|")

axes.set_ylabel("Trial")
axes.set_xlabel("Time (s)");

How could we plot the spike vectors? One option is a **heat map**, where the color indicates the presence or absence of a spike.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(9, 3))
axes.imshow(inh_spikes_v.T, interpolation="none", aspect="auto", vmin=0, vmax=1, origin="lower", extent=(0, T, 0, trials))

axes.set_ylabel("Trial")
axes.set_xlabel("Time (s)");

This doesn't look very helpful! The problem is that the time step is really small, so there are 100,000 points in each trial.
When we try to plot this, the resolution of the image is not high enough to represent every spike, so some get left out.

We can plot each vector separately, but what it shows is not very useful. You can sort of tell where the spikes are denser, but not in a way
that's at all quantitative.

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 3))
ax.plot(bins, inh_spikes_v[:, 0])

Let's try generating a histogram with a coarser bin size. For now we'll just use the first trial.

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 3))
ax.plot(bins, inh_rate)
bin_size = 2.0
bin_count = int(T / bin_size)

spike_t = inh_spikes_t[0]
ax.plot(spike_t, np.zeros_like(spike_t), "k|")
r_est, edges  = np.histogram(spike_t, bins=np.arange(0, T + bin_size, bin_size))
p = ax.step(edges[1:], r_est / bin_size)

ax.set_ylabel("Rate (Hz)")
ax.set_xlabel("Time (s)");

The main problem with histograms is that setting the bin size is largely subjective. Try adjusting the `bin_size` variable and see what gives you the best tradeoff between variability and temporal resolution.

There is still active development of new methods for adaptively setting bin sizes in timing histograms.

### Smoothing

Another problem with the PSTH is that the count in each bin depends a lot on where the edges of the bins are.

One solution to this problem is to use a **sliding window**. The simplest window is simply a square function with a defined width and a total area equal to 1.0.

For example, here's a 2 s window.

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(3, 3))
window_size = 2.0
w_t = np.arange(-10.0, 10.0, Delta)
w = np.zeros_like(w_t)
w[(-window_size/2 < w_t) & (w_t <window_size/2)] = 1. / window_size
ax.plot(w_t, w)
ax.set_xlabel("Time (s)");

### Convolution

We can express the sliding window operation as a sum of copies of the window function, one centered on each spike time.

$$r(t) \approx \sum_{i=1}^{N} w(t - t_i)$$

This is equivalent to doing an integral over the response function:

$$r(t) \approx \int_{-\infty}^{\infty} d\tau\; w(\tau) \rho(t - \tau)$$

This integral is also called a linear **convolution** or filter, and we'll be seeing a lot of them.

Numpy has a function that can calculate this convolution:

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 3))
spike_t = inh_spikes_t[0]
spike_v = inh_spikes_v[:,0]
ax.plot(spike_t, np.zeros_like(spike_t), "k|")
r_est = np.convolve(spike_v, w, mode='same')
ax.plot(bins, inh_rate)
ax.plot(bins, r_est)
ax.set_ylabel("Rate (Hz)")
ax.set_xlabel("Time (s)")

You can use any function as a window as long as it decays to zero away from $\tau = 0$ and its integral is 1.0.
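
The unit-integral requirement is easy to check numerically. The sketch below rebuilds a 2 s boxcar and a Gaussian with a 2 s standard deviation on an assumed 1 ms grid and approximates each integral with a Riemann sum; the names `box`, `gauss`, and `win_t` are local to this example.

```python
import numpy as np

demo_dt = 0.001                                # sampling interval (s), assumed
win_t = np.arange(-10.0, 10.0, demo_dt)

# 2 s boxcar with height 1 / width, so that its area is 1
box = np.where(np.abs(win_t) < 1.0, 0.5, 0.0)
# Gaussian with a standard deviation of 2 s
gauss = np.exp(-win_t**2 / (2 * 2.0**2)) / (np.sqrt(2 * np.pi) * 2.0)

# approximate each integral with a Riemann sum
box_area = box.sum() * demo_dt
gauss_area = gauss.sum() * demo_dt
print(box_area, gauss_area)
```

Because each window integrates to 1, every spike contributes exactly one spike's worth of area to the smoothed estimate, which keeps the result in units of spikes per second.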

A popular choice is a Gaussian, which smooths the estimate by downweighting points further away from $\tau = 0$.

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(3, 3))
sigma = 2.0
w_t = np.arange(-5 * sigma, 5 * sigma, Delta)
w = 1 / np.sqrt(2 * np.pi) / sigma * np.exp(-w_t**2 / 2 / sigma**2)
ax.plot(w_t, w)
ax.set_xlabel("Time (s)")

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 3))

ax.plot(bins, inh_rate)
ax.plot(spike_t, np.zeros_like(spike_t), "k|")

r_est = np.convolve(spike_v, w, mode='same')
p = ax.plot(bins, r_est)

ax.set_ylabel("Rate (Hz)")
ax.set_xlabel("Time (s)")

Try adjusting the `sigma` variable in the cell that creates the Gaussian window, then run both cells. What
value gives a plot that you like the look of?

### Averaging trials

Hopefully, these exercises have illustrated the fundamental tradeoff between variance and temporal resolution. As you increase the bin (or window) size ($\Delta$), the estimated rate becomes less variable, but the temporal resolution decreases. Thus, smoothing can interfere with detecting rapid changes in the underlying rate function.

As noted above, one solution to this problem is to average across trials. In essence, this gives you multiple independent estimates of the rate at any given instant, thereby reducing the amount of smoothing you need.

To apply smoothing to multiple trials, we can either average across trials first and then smooth, or smooth each trial and then average.
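
Because convolution is a linear operation, the two orders give the same trial-averaged estimate. Here is a small check on made-up data; the array sizes, spike probability, and window width are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical data: 5000 time bins (rows) by 8 trials (columns) of 0/1 spikes
demo_spikes = (rng.uniform(size=(5000, 8)) < 0.02).astype(float)
demo_win = np.ones(201) / 201          # boxcar window whose samples sum to 1

# smooth each trial, then average the smoothed traces
smooth_then_avg = np.mean(
    [np.convolve(demo_spikes[:, k], demo_win, mode="same") for k in range(8)],
    axis=0,
)
# average across trials, then smooth once
avg_then_smooth = np.convolve(demo_spikes.mean(axis=1), demo_win, mode="same")

print(np.allclose(smooth_then_avg, avg_then_smooth))
```

Averaging first needs only one convolution, so it is cheaper. Smoothing first is still useful when you want to look at the variability of the individual trials.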

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(9, 5))

# smooth first
for trial in range(trials):
    r_est = np.convolve(inh_spikes_v[:,trial], w, mode='same')
    axes[0].plot(bins, r_est)
axes[0].set_ylabel("Rate (Hz)")

# average first
spikes_v = inh_spikes_v.mean(1)
r_est = np.convolve(spikes_v, w, mode='same')
axes[1].plot(bins, inh_rate)
axes[1].plot(bins, r_est)
axes[1].set_ylabel("Rate (Hz)")
axes[1].set_xlabel("Time (s)")

If we bin the spikes and average across trials, we get an estimate of the rate in each bin.
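
One way to compute a trial-averaged PSTH from lists of spike times is sketched below. To keep the example self-contained it simulates a homogeneous process (a Poisson-distributed count with uniformly distributed times); the duration, bin size, rate, and number of trials are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)
psth_duration = 10.0      # s, assumed
psth_bin = 0.5            # s, assumed
true_rate = 8.0           # Hz, homogeneous for simplicity
n_psth_trials = 50

# simulate each trial: Poisson spike count, then uniformly distributed times
demo_trials = [
    np.sort(rng.uniform(0, psth_duration, rng.poisson(true_rate * psth_duration)))
    for _ in range(n_psth_trials)
]

edges = np.arange(0, psth_duration + psth_bin, psth_bin)
# pool spikes from all trials, count per bin, then normalize
pooled_counts, _ = np.histogram(np.concatenate(demo_trials), bins=edges)
psth = pooled_counts / (n_psth_trials * psth_bin)    # spikes/s
print(psth.round(1))
```

Dividing by both the number of trials and the bin width is what turns pooled counts into a rate in Hz.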

### Exercise 1

Consider an inhomogeneous Poisson process with a time-varying rate specified by:

$$r(t) = r_\mathrm{max}[\sin(2\pi\omega t) + 1]$$

Use $\omega$ = 2.1 Hz, $r_\mathrm{max} = 50$ Hz, and a response interval from 0 to 2 s.

**1.1** Plot $r(t)$ with a time step of 1 ms (0.001 s).

**1.2** Generate 20 independently simulated spike trains and plot them as rasters. There is code in previous notebooks you can use to make the raster plot.

**1.3** Using a bin size of 10 ms, calculate the PSTHs averaged from the first 10 trials and the last 10 trials.

**1.4** Now simulate 1000 trials and calculate a PSTH from the first and second half. How do these PSTHs relate to $r(t)$? Do more trials give you a more precise estimate of the rate?

**1.5** Calculate a PSTH *for each trial* of the 1000-trial simulation. This will yield a 1000 x 200 array, with trials along one axis and time along the other. For each time bin, calculate and plot the mean, variance, and Fano factor. Each of these will be a 200-element array.

**1.6** Repeat the analysis in the previous question but with a bin size of 150 ms. What happens to the Fano factor? Why is it important to choose a bin size such that the rate is not changing much within each bin?

## Linear time-invariant systems

Consider a dynamical system with input $x(t)$ and output $y(t)$.

![dynamical system](https://meliza.org/public/courseware/comp-neurosci/images/l6_dynamical_system.png)

The system is **linear** if it obeys the laws of superposition and scaling. That is, if

\begin{align}
a(t) & \rightarrow b(t) \\
c(t) & \rightarrow d(t)
\end{align}

Then the following must be true:

\begin{align}
\alpha a(t) & \rightarrow \alpha b(t) \\
\alpha a(t) + \beta c(t) & \rightarrow \alpha b(t) + \beta d(t)
\end{align}

Furthermore, for the system to be **time-invariant**, shifting the input in time will lead to the same output, shifted in time by an equal amount:

\begin{align}
a(t + \Delta) & \rightarrow b(t + \Delta)
\end{align}
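
These properties can be tested numerically. The sketch below uses a hypothetical system (`lti_system`, a causal convolution with an exponential kernel) as a stand-in for an LTI system and checks superposition, scaling, and time invariance on random inputs. The time shift here is a delay of `shift` samples, with zeros padded at the start.

```python
import numpy as np

rng = np.random.default_rng(3)
kernel = np.exp(-np.arange(50) / 10.0)        # hypothetical impulse response

def lti_system(x):
    """Example LTI system: causal convolution of x with `kernel`."""
    return np.convolve(x, kernel)[: x.size]

sig_a = rng.normal(size=500)
sig_c = rng.normal(size=500)
scale_a, scale_c = 2.0, -0.5

# superposition and scaling
lhs = lti_system(scale_a * sig_a + scale_c * sig_c)
rhs = scale_a * lti_system(sig_a) + scale_c * lti_system(sig_c)
is_linear = np.allclose(lhs, rhs)

# time invariance: delay the input by `shift` samples (padding with zeros)
shift = 40
delayed_in = np.concatenate([np.zeros(shift), sig_a[:-shift]])
is_time_invariant = np.allclose(lti_system(delayed_in)[shift:],
                                lti_system(sig_a)[:-shift])
print(is_linear, is_time_invariant)
```

A system that squares its input, for example, would fail the first check, and a system whose kernel changes over time would fail the second.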

### Impulse Response Functions

How can we characterize the transformation that occurs between the input and output of an LTI system?

It turns out, we only need to know one thing: the system's **impulse response function**. This is the response you would obtain to an infinitesimally brief pulse with unit area, $\delta(t) \rightarrow h(t)$.

For example, here's the impulse response function of a tuning fork. When you hit it, it starts oscillating at 100 rad/s (about 16 Hz) and then decays (rather quickly, for illustration).

In [None]:
t = np.linspace(0.0, 1.0, 1000)
h = np.sin(100 * t) * np.exp(- 5 * t)
plt.plot(t, h)
plt.xlabel("Time (s)");

Once we know $h(t)$, we can compute the output $y(t)$ to **any** input $x(t)$. Why? Because any signal can be represented as a sum of time-shifted unit impulses of varying amplitude:

$$x(t) \equiv \sum_i \delta(t - \tau_i) x(\tau_i)$$

Recall that $\delta(t)$ is equal to $\infty$ at $t=0$, is zero everywhere else, and that the area under $\delta(t)$ is equal to 1. $\delta(t - \tau_i)$ simply shifts the impulse to $\tau_i$, and $x(\tau_i)$ scales it by the value of $x(t)$ at $t = \tau_i$. 

Although this equation seems trivial, it's the basis of **discretization** and an important operation called **convolution**.

In [None]:
# NB: the python keyword `lambda` allows us to define simple functions inline
f = lambda t: np.sin(2 * np.pi * t)
plt.plot(t, f(t))
tau = np.linspace(0.0, 1.0, 10)
plt.vlines(tau, 0, f(tau))
plt.plot(tau, f(tau), 'ko')
plt.xlabel("Time (s)");

## Convolution

So how do we predict the response of our system from $h(t)$?

Because the system is time-invariant, we know the response to an impulse shifted by $\tau_i$ is just $h(t)$ shifted by the same amount:

\begin{align}
\delta(t - \tau_i) & \rightarrow h(t - \tau_i)
\end{align}

If this impulse is scaled by $x(\tau_i)$, then the output is scaled equally:

\begin{align}
x(\tau_i) \delta(t - \tau_i) & \rightarrow x(\tau_i) h(t - \tau_i)
\end{align}

Finally, because of superposition, if we add together many scaled, time-shifted impulses, the output is simply the sum of the scaled, time-shifted responses:

\begin{align}
\sum_i x(\tau_i) \delta(t - \tau_i) & \rightarrow \sum_i x(\tau_i) h(t - \tau_i)
\end{align}

So,

$$y(t) \equiv \sum_i h(t - \tau_i) x(\tau_i)$$

Here's an illustration for the sine function in the previous cell:

In [None]:
fig, axes = plt.subplots(nrows=3, ncols=1, sharex=True, figsize=(9, 6))
tot = np.zeros(t.size + h.size)
axes[0].vlines(tau, 0, f(tau))
axes[0].plot(tau, f(tau), 'ko')
axes[0].set_title(r"$x(\tau_i)$")
# this loop is implementing the summation in the convolution equation
for tau_i in tau:
    hf = h * f(tau_i)
    axes[1].plot(t + tau_i, hf)
    idx = t.searchsorted(tau_i)
    tot[idx:idx+hf.size] += hf
axes[2].plot(np.linspace(0, 2.0, tot.size), tot)
axes[1].set_title(r"$h(t - \tau_i)x(\tau_i)$")
axes[2].set_title(r"$\sum h(t - \tau_i)x(\tau_i)$")
axes[2].set_xlabel("Time (s)");

Notice how constructive and destructive interference produces a rather complex pattern.

### Convolution (II)

As the spacing between samples becomes infinitesimally small, the convolution sum becomes an integral:

$$y(t) = \int_{-\infty}^\infty h(t - \tau) x(\tau) d\tau$$

The shorthand for convolution is $*$ : $y(t) = (h * x)(t)$

Convolution is commutative:

\begin{align}
(h * x)(t) & = (x * h)(t) \\
\sum_i h(t - \tau_i) x(\tau_i) & = \sum_i h(\tau_i) x(t - \tau_i)
\end{align}
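
Commutativity is easy to confirm with `np.convolve`. In this sketch the kernel `demo_h` and input `demo_x` are made up, and the last few lines write out the sum for a single output sample to show that the function is computing the same thing as the formula.

```python
import numpy as np

rng = np.random.default_rng(4)
demo_h = np.array([0.5, 0.3, 0.2])       # hypothetical short kernel
demo_x = rng.normal(size=12)             # hypothetical input

y_hx = np.convolve(demo_h, demo_x)       # (h * x)
y_xh = np.convolve(demo_x, demo_h)       # (x * h)

# the same value written out as the explicit sum over i, for one output sample
n = 5
y_manual = sum(demo_h[i] * demo_x[n - i] for i in range(demo_h.size))
print(np.allclose(y_hx, y_xh), np.isclose(y_manual, y_hx[n]))
```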

Convolution is not limited to time: it can be carried out over any domain, such as space or frequency.

One way of interpreting the convolution sum is that it tells us that the output is computed by taking a *weighted sum of the present and past input values*. We can see this by writing out the sum:

$$\sum_i h(\tau_i) x(t - \tau_i) = h(0)x(t) + h(1)x(t - 1) + \cdots$$

The system is **causal** if $h(\tau)$ is nonzero only for $\tau \geq 0$, so that the output never depends on future input values.
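
Here is a small demonstration of what causality means in practice. The kernel `causal_h` is a made-up exponential that is only defined for $\tau \geq 0$. If we change the input from sample 100 onward, the output before sample 100 should not change at all.

```python
import numpy as np

rng = np.random.default_rng(5)
causal_h = np.exp(-np.arange(30) / 5.0)    # hypothetical kernel, tau >= 0 only
inp = rng.normal(size=200)

out = np.convolve(inp, causal_h)[: inp.size]

# change the input from sample 100 onward and filter again
inp_changed = inp.copy()
inp_changed[100:] += 10.0
out_changed = np.convolve(inp_changed, causal_h)[: inp.size]

past_unchanged = np.allclose(out[:100], out_changed[:100])
future_changed = not np.allclose(out[100:], out_changed[100:])
print(past_unchanged, future_changed)
```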

In most physical systems, the impulse response decays away with time, so there is a point where we can consider $h(\tau)$ to be essentially zero and truncate the function.


Here are some visual illustrations of convolution from [Wikipedia](https://en.wikipedia.org/wiki/Convolution). Essentially you are taking one of the functions, flipping it in time, sliding it past the other function, and computing the area where the two functions overlap. Convolution is also called **filtering**.

![convolution_animation](https://meliza.org/public/courseware/comp-neurosci/images/l6_convolution_box.gif)

![convolution_diagram](https://meliza.org/public/courseware/comp-neurosci/images/l6_convolution_static.png)

## LTI neuron models

We now have our first model of how sensory neurons respond to stimuli.

In essence, we are representing the neuron as a linear filter that computes a weighted sum of the stimulus as it varies in time.

Although very simple, LTI models (and linear filters) can generate surprisingly complex behavior. The exercises in this notebook will help you explore some of this complexity.

### Exercise 2

We will investigate the properties of two different LTI models. Their impulse response functions are:

\begin{align}
h_1(t) & = 
\frac{t}{\tau_1^2} e^{(-t/\tau_1)} \\
h_2(t) & = 
\frac{t}{\tau_1^2} e^{(-t/\tau_1)} - \frac{t}{\tau_2^2} e^{(-t/\tau_2)}
\end{align}

Both filters are causal, so $h_1(t) = h_2(t) = 0$ for all $t < 0$.

Functions with the general form of $t \exp (-t)$ are called **alpha** functions. To get you started, I've defined a Python *function* that will generate alpha kernels for you:

In [None]:
def alpha(tau, duration, dt):
    """An alpha function kernel with time constant tau, scaled to 
    
    tau: the time constant of the kernel (in the same units as duration and dt)
    duration: the duration of the support for the kernel
    dt: the sampling interval of the kernel
    
    Returns a tuple (h(t), t)
    """
    t = np.arange(0, duration, dt)
    k = t / tau**2 * np.exp(-t / tau)
    return (k, t)

Like mathematical functions, Python functions take one or more **arguments**. The result of applying the function to those arguments is the **return value**. Functions can only return a single value, but we can easily get around this by returning a **tuple**. In Python, a tuple is a kind of list that can't be modified. You can assign the elements of the tuple to separate variables when you call a function by using **unpacking** (also called destructuring), as illustrated in the code cell below:

In [None]:
h1, t = alpha(50, 1000, 1)
plt.plot(t, h1);

**2.1** Let $\tau_1 = 50$ ms and $\tau_2 = 100$ ms. Plot $h_1(t)$ and $h_2(t)$ for $0 < t < 1000$ ms, with a time step of 1 ms. Use the cell above as a model for your code. Note that $h_2(t)$ is the *difference* between $h_1(t)$ and another alpha function.

**2.2** Consider three input signals:

\begin{align}
s_1(t) & = \sin(2 \pi \omega_1 t) \\
s_2(t) & = \sin(2 \pi \omega_2 t) \\
s_3(t) & = \mathrm{sign}\; [s_1(t)]
\end{align}

Let $\omega_1 = 0.3$ Hz and $\omega_2 = 3$ Hz. For $s_3$, `sign` means that the value is 1.0 if $s_1(t) > 0$ and -1.0 if $s_1(t) \leq 0$.

Generate and plot 10 s of data for each signal, using a time step of 1 ms. Keep your time units consistent!

Hint: Use `plt.subplots` to generate a nice grid of plots (see above for examples)

Another hint: Don't reinvent the wheel! See if there's a numpy function that will compute `sign` for you.

**2.3A.** Convolve $s_1$ with the $h_1$ impulse response function and plot the results. Do this by writing your own loop. See the model above for an example of how to do this.

**2.3B.** Now use `np.convolve` to compute the convolution of $s_1$ and $h_1$. Do you get the same result as when you did it manually? If not, you might need to play with the `mode` argument to `np.convolve`.

**2.3C.** Now do the convolution for each combination of signal ($s_1$, $s_2$, and $s_3$) and kernel ($h_1$ and $h_2$) and plot the results.

Hint: Use `plt.subplots` to generate a grid of 6 panels.

**2.3D.** What do you notice about the response amplitudes? What differences do you see between the outputs of the two filters?

**2.4.** Now let's consider a white noise input:

$$s_4(t) \sim N(0,1)$$

$N$ means that each sample is drawn from a normal distribution with mean 0 and standard deviation 1.0. (Hint: use `np.random.randn`)

Compute and plot $(h_1 * s_4)$ and $(h_2 * s_4)$, with $s_4(t)$ evaluated over a 10 s interval with resolution of 1 ms.

**2.5.** It's a bit hard to compare the results of the convolution in the time domain, so let's see what the spectrum looks like.

We'll discuss spectral analysis in some detail later in the course, but for now I've provided the code you need. Just change the variable names to match what you used in the previous question.

In [None]:
from scipy import signal
# s4 should be your white noise signal
freq, S4 = signal.welch(s4, nperseg=10000, fs=1000)
plt.plot(freq, S4, 'k:', label=r"$s_4$")
# the variable h1s4 should contain the convolution of s_4 with h_1
freq, H1S4 = signal.welch(h1s4, nperseg=10000, fs=1000)
plt.plot(freq, H1S4, label=r"$h_1 * s_4$")
# h2s4 = h_2 * s_4
freq, H2S4 = signal.welch(h2s4, nperseg=10000, fs=1000)
plt.plot(freq, H2S4, label=r"$h_2 * s_4$")
plt.xlim(0, 20)
plt.legend()
plt.xlabel("Frequency (Hz)");

**2.5A** What do you notice about the spectra? Do they seem noisy? 

**2.5B** Use a 1000-s input signal to get a better estimate of the spectrum.

You can copy the code in the cell above to calculate and plot the spectra, but you'll need to write the code for generating the longer signals.

**2.5C.** Now that we have a nice plot, describe the following:

- the shape of the spectrum for the input signal
- the shape of the spectra for the two convolutions

### Correlation

Is it possible to estimate the impulse response function from the output of an LTI system when the input is *not* an impulse?

Yes! The operation we need is closely related to convolution and is called **correlation** or **cross-correlation**:

$$(a \star b)(t) = \sum_i a(t + \tau_i) b(\tau_i)$$

Notice how similar the definition is to that of convolution. However, there is a critical sign difference: in convolution the "sliding" function is inverted in time, in correlation it is not.
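
The time reversal can be made explicit in code. For real-valued arrays, correlating with a signal gives the same result as convolving with that signal flipped in time. The two random arrays below (`sig_p`, `sig_q`) are made up for this check.

```python
import numpy as np

rng = np.random.default_rng(6)
sig_p = rng.normal(size=50)
sig_q = rng.normal(size=20)

# correlation slides sig_q past sig_p without flipping it
corr = np.correlate(sig_p, sig_q, mode="full")
# convolution flips its second argument, so flipping it first undoes that
conv_flipped = np.convolve(sig_p, sig_q[::-1], mode="full")
print(np.allclose(corr, conv_flipped))
```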

<img src="https://gracula.psyc.virginia.edu/public/courseware/comp-neurosci/images/l6_correlation_static.png" alt="correlation_diagram" style="width: 300px;"/>


**2.6** Compute and plot the correlation between $s_4(t)$ and $(s_4 * h_1)(t)$, and between $s_4(t)$ and $(s_4 * h_2)(t)$.

Use `np.correlate` with the argument `mode="same"`. Note that although *we* know $h_1$ and $h_2$ are causal, the correlation function does not. The output of `np.correlate` therefore contains both the causal ($t \geq 0$) and acausal ($t < 0$) components.

Try using 10 seconds of data first and then 100.

How do the outputs of the correlation compare to the original $h_1$ and $h_2$ kernels?

What happens if you reverse the order of the arguments to `correlate`?