# SYDE 556/750 --- Assignment 1
**Student ID: 20823934**

*Note:* Please include your numerical student ID only, do *not* include your name.

*Note:* Refer to the [PDF](https://github.com/celiasmith/syde556-f22/raw/master/assignments/assignment_01/syde556_assignment_01.pdf) for the full instructions (including some hints); this notebook contains abbreviated instructions only. Cells you need to fill out are marked with a "writing hand" symbol. Of course, you can add new cells in between the instructions, but please leave the instructions intact to facilitate marking.

In [None]:
# Import numpy and matplotlib -- you shouldn't need any other libraries
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize # For question 2.1b)

# Fix the numpy random seed for reproducible results
np.random.seed(18945)

# Some formatting options
%config InlineBackend.figure_formats = ['svg']

# 1. Representation of Scalars

## 1.1 Basic encoding and decoding

**a) Computing gain and bias.** In general, for a neuron model $a = G[J]$ (and assuming that the inverse $J = G^{-1}[a]$ exists), solve the following system of equations to compute the gain $\alpha$, and the bias $J^\mathrm{bias}$ given a maximum rate $a^\mathrm{max}$ and an $x$-intercept $\xi$.

$$a^\mathrm{max} = G[\alpha + J^\mathrm{bias}] \,, \quad\quad 0 = G[\alpha \xi + J^\mathrm{bias}] \,.$$

✍ \<YOUR SOLUTION HERE\>

$$a^\mathrm{max} = G[\alpha + J^\mathrm{bias}] \,, \quad\quad 0 = G[\alpha \xi + J^\mathrm{bias}] \implies$$

Take the inverse of both sides of the equations:

$$G^{-1}[a^\mathrm{max}] = \alpha + J^\mathrm{bias} \,, \quad\quad G^{-1}[0] = \alpha \xi + J^\mathrm{bias} \implies$$

Rearrange both equations so that $J^\mathrm{bias}$ is alone on the left. The two right-hand sides must then be equal, which eliminates $J^\mathrm{bias}$:

$$J^\mathrm{bias} = G^{-1}[a^\mathrm{max}] - \alpha \,, \quad\quad J^\mathrm{bias} = G^{-1}[0] - \alpha \xi \implies$$

$$G^{-1}[a^\mathrm{max}] - \alpha = G^{-1}[0] - \alpha \xi \implies$$

Collect the $G^{-1}$ terms on the left and the $\alpha$ terms on the right:

$$G^{-1}[a^\mathrm{max}] - G^{-1}[0] = \alpha - \alpha \xi = \alpha (1 - \xi) \implies$$

Now isolate $\alpha$ by dividing both sides by $(1 - \xi)$:

$$\therefore \alpha = \frac{G^{-1}[a^\mathrm{max}] - G^{-1}[0]}{1 - \xi}$$

Substitute this $\alpha$ into $J^\mathrm{bias} = G^{-1}[a^\mathrm{max}] - \alpha$ to find $J^\mathrm{bias}$:

$$J^\mathrm{bias} = G^{-1}[a^\mathrm{max}] - \frac{G^{-1}[a^\mathrm{max}] - G^{-1}[0]}{1 - \xi} \implies$$

$$J^\mathrm{bias} = \frac{G^{-1}[a^\mathrm{max}](1 - \xi) - G^{-1}[a^\mathrm{max}] + G^{-1}[0]}{1 - \xi} \implies$$

$$J^\mathrm{bias} = \frac{G^{-1}[a^\mathrm{max}] - \xi G^{-1}[a^\mathrm{max}] - G^{-1}[a^\mathrm{max}] + G^{-1}[0]}{1 - \xi} \implies$$

$$J^\mathrm{bias} = \frac{-\xi G^{-1}[a^\mathrm{max}] + G^{-1}[0]}{1 - \xi} \implies$$

Multiply the numerator and denominator by $-1$:

$$\therefore J^\mathrm{bias} = \frac{\xi G^{-1}[a^\mathrm{max}] - G^{-1}[0]}{\xi - 1}$$
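
As a quick sanity check, substituting both results back into the inverted system should recover the two original constraints. The verification below uses only the expressions derived above and assumes $\xi \neq 1$:

$$\alpha + J^\mathrm{bias} = \frac{G^{-1}[a^\mathrm{max}] - G^{-1}[0]}{1 - \xi} - \frac{\xi G^{-1}[a^\mathrm{max}] - G^{-1}[0]}{1 - \xi} = \frac{(1 - \xi)\, G^{-1}[a^\mathrm{max}]}{1 - \xi} = G^{-1}[a^\mathrm{max}]$$

$$\alpha \xi + J^\mathrm{bias} = \frac{\xi G^{-1}[a^\mathrm{max}] - \xi G^{-1}[0]}{1 - \xi} - \frac{\xi G^{-1}[a^\mathrm{max}] - G^{-1}[0]}{1 - \xi} = \frac{(1 - \xi)\, G^{-1}[0]}{1 - \xi} = G^{-1}[0]$$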


Now, simplify these equations for the specific case $G[J] = \max(J, 0)$.

✍ \<YOUR SOLUTION HERE\>

For the ReLU function $G[J] = \max(J, 0)$, the inverse exists only on the active region $J > 0$, where $G^{-1}[a] = a$. At $a = 0$ the inverse is not unique (every $J \le 0$ gives zero activity), so we write $J_{th}$ for the threshold current and use it in place of $G^{-1}[0]$:

$$\alpha = \frac{\max(a^\mathrm{max}, 0) - J_{th}}{1 - \xi}$$

$$J^\mathrm{bias} = \frac{\xi \max(a^\mathrm{max}, 0) - J_{th}}{\xi - 1}$$

For the ReLU the threshold is $J_{th} = 0$, the largest current that still produces zero activity. Because maximum rates are positive, $\max(a^\mathrm{max}, 0) = a^\mathrm{max}$, which gives:

$$\alpha = \frac{a^\mathrm{max}}{1 - \xi}$$

$$J^\mathrm{bias} = \frac{\xi a^\mathrm{max}}{\xi - 1} = -\alpha \xi$$
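
The closed-form ReLU result can be checked numerically. The sketch below is only an illustration: the helper name `relu_gain_bias` and the example values are hypothetical, and it assumes $a^\mathrm{max} > 0$ and $\xi < 1$. It confirms that the neuron fires at $a^\mathrm{max}$ for $x = 1$ and is silent at $x = \xi$.

```python
import numpy as np

def relu_gain_bias(a_max, xi):
    """Gain and bias for G[J] = max(J, 0), assuming a_max > 0 and xi < 1."""
    a_max = np.asarray(a_max, dtype=float)
    xi = np.asarray(xi, dtype=float)
    alpha = a_max / (1.0 - xi)          # gain
    j_bias = -xi * a_max / (1.0 - xi)   # bias, equal to -alpha * xi
    return alpha, j_bias

# Hypothetical example values (not the assignment's random samples)
a_max_demo = np.array([100.0, 150.0, 200.0])
xi_demo = np.array([-0.5, 0.0, 0.9])
alpha_demo, j_bias_demo = relu_gain_bias(a_max_demo, xi_demo)

# Evaluate the ReLU neuron (encoder +1) at x = 1 and at x = xi
rate_at_one = np.maximum(alpha_demo * 1.0 + j_bias_demo, 0.0)
rate_at_xi = np.maximum(alpha_demo * xi_demo + j_bias_demo, 0.0)

print("rate at x = 1: ", rate_at_one)
print("rate at x = xi:", rate_at_xi)
```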

**b) Neuron tuning curves.** Plot the neuron tuning curves $a_i(x)$ for 16 randomly generated neurons following the intercept and maximum rate distributions described above.

In [None]:
# ✍ <YOUR SOLUTION HERE>

""" Sampling the random variables """

# Fix the seed so the sampled neurons are reproducible
np.random.seed(30)

# Sample the x-axis at a resolution of 0.05
x = np.linspace(-1, 1, 41)

# Sample firing rates a^max uniformly between 100Hz and 200Hz at x=1
a_max = np.random.uniform(low=100, high=200, size=16)

# Sample x-intercepts \xi uniformly between -0.95 and 0.95
xi = np.random.uniform(low=-0.95, high=0.95, size=16)

# Randomly set encoder e to either +1 or -1 for each neuron
e = np.random.choice([-1, 1], size=16)

""" Computing alpha and J^bias """

alpha = (np.maximum(a_max, 0)) / (1 - xi)
J_bias = (xi * np.maximum(a_max, 0)) / (xi - 1)

""" Calculating the tuning curves """

A = []
for i in range(16):
    a = [np.maximum(e[i] * alpha[i] * x_n + J_bias[i], 0) for x_n in x]
    A.append(a)
A = np.array(A)

""" Plotting the curves """

def plot_A(x, A):
    for i in range(A.shape[0]):
        plt.plot(x, A[i])

    plt.title(f"{A.shape[0]} Tuning Curves")
    plt.xlabel("Represented Value x")
    plt.ylabel("Firing Rate (Hz)")
    plt.show()

plot_A(x, A)

**c) Computing identity decoders.** Compute the optimal identity decoder $\vec d$ for those 16 neurons (as shown in class). Report the value of the individual decoder coefficients. Compute $d$ using the matrix notation mentioned in the course notes. Do not apply any regularization. $A$ is the matrix of activities (the same data used to generate the plot in 1.1b).

In [None]:
# ✍ <YOUR SOLUTION HERE>

# Decoding via matrix notation
D = A @ x @ np.linalg.inv(A @ A.T)
D

# Decoding using the Python code from lecture notes
# D = np.linalg.lstsq(A.T, x.T, rcond=None)[0].T
# D
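
The two routes in the cell above (the explicit matrix form and the commented-out `np.linalg.lstsq` call) should agree whenever $A A^\top$ is invertible. The following is a minimal sketch of that equivalence on a small, hypothetical four-neuron ReLU population; all `*_demo` names and values are illustrative and are not the assignment's data. It uses `np.linalg.solve` instead of an explicit inverse, which is a swapped-in but numerically safer way to evaluate the same expression.

```python
import numpy as np

# Hypothetical four-neuron ReLU population (illustrative values only)
x_demo = np.linspace(-1, 1, 41)
a_max_demo = np.array([100.0, 150.0, 120.0, 180.0])
xi_demo = np.array([-0.5, 0.3, -0.2, 0.6])
e_demo = np.array([1.0, 1.0, -1.0, -1.0])

alpha_demo = a_max_demo / (1.0 - xi_demo)
j_bias_demo = -xi_demo * a_max_demo / (1.0 - xi_demo)

# Activity matrix: one row per neuron, one column per sample of x
A_demo = np.maximum(
    (e_demo * alpha_demo)[:, None] * x_demo[None, :] + j_bias_demo[:, None],
    0.0,
)

# Route 1: normal equations, d = (A A^T)^{-1} A x
d_normal = np.linalg.solve(A_demo @ A_demo.T, A_demo @ x_demo)

# Route 2: least-squares solve of A^T d = x
d_lstsq = np.linalg.lstsq(A_demo.T, x_demo, rcond=None)[0]

print("normal equations:", d_normal)
print("lstsq:           ", d_lstsq)
```

The least-squares route tends to be the more robust choice when neurons have nearly identical tuning curves, because it avoids forming and inverting $A A^\top$ explicitly.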

**d) Evaluating decoding errors.** Compute and plot $\hat{x}=\sum_i d_i a_i(x)$. Overlay on the plot the line $y=x$. Make a separate plot of $x-\hat{x}$ to see what the error looks like. Report the Root Mean Squared Error (RMSE) value.

In [None]:
# ✍ <YOUR SOLUTION HERE>

def plot_decoder_error(x, D, A):
    # Compute the decoded value
    x_hat = D @ A

    # Report the RMSE
    rmse = np.sqrt(np.mean((x_hat - x)**2))
    print(f"RMSE: {rmse}")

    # Make two plots side by side
    fig, axs = plt.subplots(1, 2, figsize=(9, 4))

    """ Compare x_hat to x """

    axs[0].plot(x, x, label="Ideal")
    axs[0].plot(x, x_hat, label="Decoded")
    axs[0].legend()
    axs[0].set_title("Ideal and Decoded Value")
    axs[0].set_xlabel("Represented Value x")
    axs[0].set_ylabel("Decoded Value x_hat")

    """ Plot the error """

    axs[1].plot(x, x - x_hat)
    axs[1].set_title(f"Error (RMSE = {round(rmse, 3)})")
    axs[1].set_xlabel("Represented Value x")
    axs[1].set_ylabel("Decoder Error")

    plt.show()

plot_decoder_error(x, D, A)

**e) Decoding under noise.** Now try decoding under noise. Add random normally distributed noise to $a$ and decode again. The noise is a random variable with mean $\mu=0$ and standard deviation of $\sigma=0.2 \max(A)$ (where $\max(A)$ is the maximum firing rate of all the neurons). Resample this variable for every different $x$ value for every different neuron. Create all the same plots as in part d). Report the RMSE.

In [None]:
# ✍ <YOUR SOLUTION HERE>

""" Adding noise to A """

# Fix the seed so the noise samples are reproducible
np.random.seed(100)

# Generating the noise
mu = 0
sigma = 0.2 * np.max(A)
noise = np.random.normal(mu, sigma, A.shape)

# Add the noise but don't allow negative firing rates
A_noisy = np.maximum(A + noise, 0)
plot_A(x, A_noisy)

""" Decode using the noisy tuning curves """

# Plot the error
plot_decoder_error(x, D, A_noisy)

**f) Accounting for decoder noise.** Recompute the decoder $\vec d$ taking noise into account (i.e., apply the appropriate regularization, as shown in class). Show how these decoders behave when decoding both with and without noise added to $a$ by making the same plots as in d) and e). Report the RMSE for all cases.

In [None]:
# ✍ <YOUR SOLUTION HERE>

# Calculate the parameters
n = A_noisy.shape[0]  # Number of tuning curves a
N = A_noisy.shape[1]  # Number of samples x

# Decoding + regularization via matrix notation
D_reg = A @ x @ np.linalg.inv(A @ A.T + N * np.square(sigma) * np.eye(n))

# Decoding + regularization using the Python code from lecture notes
# D_reg = np.linalg.lstsq(A @ A.T + N * np.square(sigma) * np.eye(n), A @ x.T, rcond=None)[0].T

print("Regularized D, noisy A")
plot_decoder_error(x, D_reg, A_noisy)

print("Regularized D, noise-less A")
plot_decoder_error(x, D_reg, A)
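
For reference, the regularized decoder computed in the cell above can be written in matrix form. Here $A$ is taken to be the $n \times N$ activity matrix ($n$ neurons, $N$ samples of $x$) and $I$ the $n \times n$ identity:

$$\vec d = \left(A A^\top + N \sigma^2 I\right)^{-1} A \vec x$$

This is the minimizer of the mean squared decoding error plus a penalty on the decoder magnitude:

$$\frac{1}{N} \sum_{k=1}^{N} \Big(x_k - \sum_{i=1}^{n} d_i a_i(x_k)\Big)^2 + \sigma^2 \sum_{i=1}^{n} d_i^2$$

Because $A A^\top + N \sigma^2 I$ is symmetric, multiplying the 1-D array `A @ x` by the inverse from the right, as the code does, gives the same vector as multiplying from the left.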

**g) Interpretation.** Show a 2x2 table of the four RMSE values reported in parts d), e), and f). This should show the effects of adding noise and whether the decoders $d$ are computed taking noise into account. Write a few sentences commenting on what the table shows, i.e., what the effect of adding noise to the activities is with respect to the measured error and why accounting for noise when computing the decoders increases/decreases/does not change the measured RMSE.

✍ \<YOUR SOLUTION HERE\>

Table 1 - RMSE for decoders with and without noise

| Decoder           | Noise-free A | Noisy A |
|-------------------|--------------|---------|
| **D**             | 0.001        | 0.313   |
| **Regularized D** | 0.040        | 0.187   |

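Adding noise to the activities A increases the RMSE for both decoders: from 0.001 to 0.313 for the unregularized decoder D (more than two orders of magnitude) and from 0.040 to 0.187 for the regularized decoder (roughly a factor of five). The unregularized D is very accurate on the noise-free A but performs poorly on noisy A, because nothing constrains the size of its coefficients and large coefficients amplify noise. The regularized D lowers the error on noisy A by about 40% (0.313 to 0.187) at the cost of a higher error on the noise-free A (0.001 to 0.040): regularization penalizes large decoder coefficients, trading a little static distortion for much lower sensitivity to noise. Overall, regularization is worthwhile here because the error it removes under noise (about 0.13) is roughly three times the error it adds without noise (about 0.04).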
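
One way to see why regularization helps: for independent zero-mean noise of standard deviation $\sigma$ on each activity, the noise contributes a variance of $\sigma^2 \sum_i d_i^2$ to $\hat{x}$ (ignoring the clipping of negative rates), so smaller decoders pass less noise through. The sketch below compares this term for both decoders on a hypothetical ReLU population. The `*_demo` names, the seed, and the sampled neurons are illustrative assumptions, so the printed values will not match Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, not the notebook's
n_neurons, n_samples = 16, 41
x_demo = np.linspace(-1, 1, n_samples)

# Random ReLU population drawn from the same ranges as in 1.1b
a_max_demo = rng.uniform(100, 200, n_neurons)
xi_demo = rng.uniform(-0.95, 0.95, n_neurons)
e_demo = rng.choice([-1.0, 1.0], n_neurons)
alpha_demo = a_max_demo / (1 - xi_demo)
j_bias_demo = -xi_demo * a_max_demo / (1 - xi_demo)
A_demo = np.maximum(
    (e_demo * alpha_demo)[:, None] * x_demo + j_bias_demo[:, None], 0
)

sigma_demo = 0.2 * A_demo.max()

# Unregularized decoder (least squares) and regularized decoder
d_plain = np.linalg.lstsq(A_demo.T, x_demo, rcond=None)[0]
gram = A_demo @ A_demo.T + n_samples * sigma_demo**2 * np.eye(n_neurons)
d_reg = np.linalg.solve(gram, A_demo @ x_demo)

# Noise variance passed through each decoder: sigma^2 * sum(d_i^2)
noise_var_plain = sigma_demo**2 * np.sum(d_plain**2)
noise_var_reg = sigma_demo**2 * np.sum(d_reg**2)

print(f"noise variance, unregularized: {noise_var_plain:.4g}")
print(f"noise variance, regularized:   {noise_var_reg:.4g}")
```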

## 1.2 Exploring sources of error

**a) Exploring error due to distortion and noise.** Plot the error due to distortion $E_\mathrm{dist}$ and the error due to noise $E_\mathrm{noise}$ as a function of $n$, the number of neurons. Generate two different loglog plots (one for each type of error) with $n$ values of at least $[4, 8, 16, 32, 64, 128, 256, 512]$. For each $n$ value, do at least $5$ runs and average the results. For each run, different $\alpha$, $J^\mathrm{bias}$, and $e$ values should be generated for each neuron. Compute $d$ taking noise into account, with $\sigma = 0.1 \max(A)$. Show visually that the errors are proportional to $1/n$ or $1/n^2$.

In [None]:
# ✍ <YOUR SOLUTION HERE>

# Calculate E_dist and E_noise for n neurons (1 run)
def run(n=16, N=41):
    # Sample new random variables
    x = np.linspace(-1, 1, N)                               # Sample the x-axis
    a_max = np.random.uniform(low=100, high=200, size=n)    # Sample firing rates
    xi = np.random.uniform(low=-0.95, high=0.95, size=n)    # Sample x-intercepts
    e = np.random.choice([-1, 1], size=n)                   # Set encoder directions

    # Compute alpha and J^bias
    alpha = (np.maximum(a_max, 0)) / (1 - xi)
    J_bias = (xi * np.maximum(a_max, 0)) / (xi - 1)

    # Calculate tuning curves for each sample
    A = []
    for i in range(n):
        a = [np.maximum(e[i] * alpha[i] * x_i + J_bias[i], 0) for x_i in x]
        A.append(a)
    A = np.array(A)

    # Noise level used for the regularization and for the noise error term
    sigma = 0.1 * np.max(A)

    # Compute D taking noise into account
    D = A @ x @ np.linalg.inv(A @ A.T + N * np.square(sigma) * np.eye(n))

    # Calculate the errors. E_dist is measured on the noise-free activities,
    # so that the effect of noise is counted only once (in E_noise)
    x_hat = D @ A
    E_dist = 0.5 * np.sum((x - x_hat)**2)
    E_noise = 0.5 * sigma**2 * np.sum(D**2)

    return E_dist, E_noise

# Try different values of n
ns = [2 ** i for i in range(2, 10)]
avg_E_dist = []
avg_E_noise = []
for n in ns:
    # Average the errors over 5 runs
    E_dists = []
    E_noises = []
    for i in range(5):
        E_dist, E_noise = run(n=n)
        E_dists.append(E_dist)
        E_noises.append(E_noise)
    avg_E_dist.append(np.mean(E_dists))
    avg_E_noise.append(np.mean(E_noises))

# Make two plots side by side
fig, axs = plt.subplots(1, 2, figsize=(9, 4))

# Use the full list of population sizes on the x-axis (not the loop variable n)
ns_arr = np.array(ns, dtype=float)

# Plot E_dist against a 1/n^2 reference line, scaled to the first data point
n_inv2 = avg_E_dist[0] * (ns_arr[0] / ns_arr)**2
axs[0].loglog(ns_arr, n_inv2, "--", label="1/n^2")
axs[0].loglog(ns_arr, avg_E_dist, label="Neurons")
axs[0].legend()
axs[0].set_title("Error due to static distortion E_dist")
axs[0].set_xlabel("Number of neurons n")
axs[0].set_ylabel("Square error")

# Plot E_noise against a 1/n reference line, scaled to the first data point
n_inv = avg_E_noise[0] * (ns_arr[0] / ns_arr)
axs[1].loglog(ns_arr, n_inv, "--", label="1/n")
axs[1].loglog(ns_arr, avg_E_noise, label="Neurons")
axs[1].legend()
axs[1].set_title("Error due to noise E_noise")
axs[1].set_xlabel("Number of neurons n")
axs[1].set_ylabel("Square error")

plt.show()
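
Besides comparing curves by eye, the scaling can be estimated by fitting a straight line in log-log space: a slope near $-1$ suggests an error proportional to $1/n$, and a slope near $-2$ suggests $1/n^2$. The sketch below shows the idea on synthetic data. The helper `loglog_slope` and the `*_demo` arrays are hypothetical, and the synthetic errors are exact power laws rather than results from this notebook.

```python
import numpy as np

def loglog_slope(ns, errors):
    """Slope of a least-squares line fitted to log(errors) vs. log(ns)."""
    slope, _intercept = np.polyfit(np.log(ns), np.log(errors), 1)
    return slope

# Synthetic power-law data (illustrative only)
ns_demo = np.array([4, 8, 16, 32, 64, 128, 256, 512], dtype=float)
errors_inv = 3.0 / ns_demo          # proportional to 1/n
errors_inv_sq = 0.5 / ns_demo**2    # proportional to 1/n^2

slope_inv = loglog_slope(ns_demo, errors_inv)
slope_inv_sq = loglog_slope(ns_demo, errors_inv_sq)

print(f"slope for 1/n data:   {slope_inv:.3f}")
print(f"slope for 1/n^2 data: {slope_inv_sq:.3f}")
```

The same helper could be applied to the averaged results from the cell above, for example `loglog_slope(ns, avg_E_dist)` and `loglog_slope(ns, avg_E_noise)`.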

**b) Adapting the noise level.** Repeat part a) with $\sigma = 0.01 \max(A)$.

In [None]:
# ✍ <YOUR SOLUTION HERE>

**c) Interpretation.** What does the difference between the graphs in a) and b) tell us about the sources of error in neural populations?

✍ \<YOUR SOLUTION HERE\>

## 1.3 Leaky Integrate-and-Fire neurons

**a) Computing gain and bias.** As in the second part of 1.1a), given a maximum firing rate $a^\mathrm{max}$ and an $x$-intercept $\xi$, write down the equations for computing $\alpha$ and $J^\mathrm{bias}$ for this specific neuron model.

✍ \<YOUR SOLUTION HERE\>

**b) Neuron tuning curves.** Generate the same plot as in 1.1b). Use $\tau_\mathrm{ref}=2 \mathrm{ms}$ and $\tau_{RC}=20 \mathrm{ms}$. Use the same distribution of $x$-intercepts and maximum firing rates as in 1.1.

In [None]:
# ✍ <YOUR SOLUTION HERE>

**c) Impact of noise.** Generate the same four plots as in 1.1f) (adding/not adding noise to $A$, accounting/not accounting for noise when computing $\vec d$), and report the RMSE both with and without noise.

In [None]:
# ✍ <YOUR SOLUTION HERE>

# 2. Representation of Vectors

## 2.1 Vector tuning curves

**a) Plotting 2D tuning curves.** Plot the tuning curve of an LIF neuron whose 2D preferred direction vector is at an angle of $\theta=-\pi/4$, has an $x$-intercept at the origin $(0,0)$, and has a maximum firing rate of $100 \mathrm{Hz}$.

In [None]:
# ✍ <YOUR SOLUTION HERE>

**b) Plotting the 2D tuning curve along the unit circle.** Plot the tuning curve for the same neuron as in a), but only considering the points around the unit circle, i.e., sample the activation for different angles $\theta$. Fit a curve of the form $c_1 \cos(c_2\theta+c_3)+c_4$ to the tuning curve and plot it as well.

In [None]:
# ✍ <YOUR SOLUTION HERE>

**c) Discussion.** What makes a cosine a good choice for the curve fit in 2.1b? Why does it differ from the ideal curve?

✍ \<YOUR SOLUTION HERE\>

## 2.2 Vector representation

**a) Choosing encoding vectors.** Generate a set of $100$ random unit vectors uniformly distributed around the unit circle. These will be the encoders $\vec e$ for $100$ neurons. Plot these vectors with a quiver or line plot (i.e., not just points, but lines/arrows to the points).

In [None]:
# ✍ <YOUR SOLUTION HERE>

**b) Computing the identity decoder.** Use LIF neurons with the same properties as in question 1.3. When computing the decoders, take into account noise with $\sigma = 0.2\max(A)$. Plot the decoders in the same way you plotted the encoders.

In [None]:
# ✍ <YOUR SOLUTION HERE>

**c) Discussion.** How do these decoding vectors compare to the encoding vectors?

✍ \<YOUR SOLUTION HERE\>

**d) Testing the decoder.** Generate 20 random $\vec x$ values throughout the unit circle (i.e., with different directions and radii). For each $\vec x$ value, determine the neural activity $a_i$ for each of the 100 neurons. Now decode these values (i.e., compute $\hat{x} = D \vec a$) using the decoders from part b). Plot the original and decoded values on the same graph in different colours, and compute the RMSE.

In [None]:
# ✍ <YOUR SOLUTION HERE>

**e) Using encoders as decoders.** Repeat part d) but use the *encoders* as decoders. This is what Georgopoulos used in his original approach to decoding information from populations of neurons. Plot the decoded values and compute the RMSE. In addition, recompute the RMSE in both cases, but ignore the magnitude of the decoded vectors by normalizing before computing the RMSE.

In [None]:
# ✍ <YOUR SOLUTION HERE>

**f) Discussion.** When computing the RMSE on the normalized vectors, using the encoders as decoders should result in a larger, yet still surprisingly small error. Thinking about random unit vectors in high dimensional spaces, why is this the case? What are the relative merits of these two approaches to decoding?

✍ \<YOUR SOLUTION HERE\>