# Final Exam (18 pts)

Consider the following task from [Steinmetz et al. (2019)](https://www.nature.com/articles/s41586-019-1787-x): For a repeated series of trials, a mouse is trained to rotate a wheel to indicate whether it perceives a Gabor pattern to the left or right. Spike rates from many cortical neurons are recorded on each trial. The goal is to build a model that can predict the mouse's choice based on the spiking of its cortical neurons.

![](images/gabor.png)

The data:

* `choices`: the mouse's choice (0 or 1) of whether the Gabor stimulus is on the left or right, on each of 276 trials
* `spikes`: normalized spike rates of 691 neurons across the cortex, recorded with Neuropixels probes on each trial

In [None]:
import numpy as np

# Data are in the same GitHub folder as this exam notebook.
# These files are also exactly the same as those used for the lecture examples.
spikes = np.load('mouse_cortical_spiking.npy')
choices = np.load('mouse_left_right_choices.npy')

# If you have any issues loading the .npy files above,
# you can try loading the data as text files instead.
# spikes = np.loadtxt('mouse_cortical_spiking.txt')
# choices = np.loadtxt('mouse_left_right_choices.txt')

spikes.shape, choices.shape

---
1. (3 pts) Split the data into training and testing sets. Make sure to shuffle the data when splitting. Place 20% of the data in the test set.
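A minimal sketch of a shuffled 80/20 split, shown on synthetic stand-in arrays rather than the exam data. The names `X_demo` and `y_demo`, their sizes, and the `random_state` value are illustrative assumptions. It also assumes trials are along the first axis (rows), which is the layout `train_test_split` expects.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 50 trials x 8 "neurons" with binary choice labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 8))
y_demo = rng.integers(0, 2, size=50)

# shuffle=True (also the default) randomizes trial order before splitting;
# test_size=0.2 places 20% of the trials in the test set.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=True, random_state=0
)

print(X_train_demo.shape, X_test_demo.shape)  # (40, 8) (10, 8)
```

Passing `stratify=y_demo` as well is an optional extra that keeps the left/right proportions similar in the training and test sets.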

In [None]:
from sklearn.model_selection import train_test_split

...

---
2. (3 pts) Use stratified 10-fold cross-validation to select the amount of L2 regularization for a logistic regression model that predicts the mouse's choice from its neural activity. Use accuracy as the scoring metric for ranking models, and use only your training dataset for this hyperparameter tuning.
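The tuning pattern can be sketched on synthetic data from `sklearn.datasets.make_classification`; the dataset sizes, the coarser 13-point grid, and the `random_state` values are assumptions for illustration only. `StratifiedKFold(n_splits=10)` keeps the class balance roughly equal across folds, and `GridSearchCV` with `scoring='accuracy'` refits the best `C` on the full training set because `refit=True` is its default.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical training data standing in for the exam's training split.
X_train_demo, y_train_demo = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# 10 stratified folds; shuffling before splitting is optional but common.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# C is the INVERSE of regularization strength: smaller C = stronger L2 penalty.
param_grid = {'C': np.logspace(-3, 3, 13)}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),  # L2 is LogisticRegression's default penalty
    param_grid,
    scoring='accuracy',
    cv=cv,
)
search.fit(X_train_demo, y_train_demo)

# Refit on all training data with the best C (refit=True by default).
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```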

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold

model = LogisticRegression(penalty='l2', max_iter=1000)

# C is the inverse of regularization strength: smaller C means a stronger L2 penalty
param_grid = {'C': np.logspace(-3, 3, 60)}

# search for the best L2 regularization hyperparameter C

...

# model with tuned L2 regularization hyperparameter C
model = ...
model

---
3. (3 pts) Evaluate the selected model's accuracy on the withheld test dataset.
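Test accuracy is the fraction of held-out trials whose choice is predicted correctly. The small sketch below uses hypothetical label arrays to check `accuracy_score` against that definition.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true choices and model predictions for 8 held-out trials.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

acc = accuracy_score(y_true, y_pred)
manual = np.mean(y_true == y_pred)  # fraction of matching entries
print(acc, manual)  # 0.75 0.75
```

For a fitted scikit-learn classifier, `model.score(X_test, y_test)` returns the same mean accuracy.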

In [None]:
from sklearn.metrics import accuracy_score

...

---
4. (3 pts) Find the indices of the 10 neurons that contribute the most and the 10 neurons that contribute the least to the model's predictions. *Hint:* The contribution of each neuron is the magnitude (absolute value) of its associated weight in the model, regardless of whether the weight is positive or negative. *Hint:* See `np.argsort`.
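Because a weight's contribution is its magnitude regardless of sign, the ranking can be done on `np.abs(weights)`. `np.argsort` sorts in ascending order, so the first indices correspond to the smallest magnitudes and the last indices (reversed) to the largest. The toy weight vector and the choice of `k = 3` below are hypothetical, picked only to keep the example readable.

```python
import numpy as np

# Hypothetical model weights (positive and negative) for 12 neurons.
weights_demo = np.array([0.5, -2.0, 0.1, 1.5, -0.05, 0.0,
                         -0.8, 3.0, 0.3, -1.2, 0.02, 0.9])

order = np.argsort(np.abs(weights_demo))  # indices sorted by ascending |weight|

k = 3  # the exam asks for 10; 3 keeps the toy example small
bottom_k = order[:k]      # smallest |weight|, smallest first
top_k = order[::-1][:k]   # largest |weight|, largest first

print(top_k.tolist())     # [7, 1, 3]
print(bottom_k.tolist())  # [5, 10, 4]
```

Sorting on `weights_demo` directly instead of its absolute value would rank the most negative weights as the least important, which is not what the hint intends.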

In [None]:
# this reduces the model weights to a one-dimensional array
weights = np.squeeze(model.coef_)

weights.shape

In [None]:
...

top10 = ...
bot10 = ...

print(f"Indices of neurons contributing most: {top10}")
print(f"Indices of neurons contributing least: {bot10}")

---
5. (3 pts) Generate two new logistic regression models for predicting the mouse's choice: one using the neural activity of ONLY the 10 neurons contributing the most, and one using ONLY the 10 neurons contributing the least to the prior model's predictions, as identified above. For each of these models, retune the L2 regularization penalty using the same strategy as in question #2 (make sure you use only your training dataset). Then evaluate the accuracy of both models on the withheld test dataset and compare it to the accuracy of the model trained on all 691 neurons.
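Restricting a model to a subset of neurons amounts to indexing the feature columns, and the same index array must be applied to both the training and test sets. The sketch below runs on synthetic data and assumes trials are rows and neurons are columns; `fit_subset_model` is a hypothetical helper, and the dataset sizes, column index arrays, grid, and `random_state` values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Hypothetical stand-in data: 250 trials x 30 "neurons".
X, y = make_classification(n_samples=250, n_features=30, n_informative=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=1
)

def fit_subset_model(cols):
    """Hypothetical helper: retune C with stratified 10-fold CV on the training
    set using only the given neuron columns, then return test accuracy."""
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),  # L2 is the default penalty
        {'C': np.logspace(-3, 3, 13)},
        scoring='accuracy',
        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    )
    search.fit(X_train[:, cols], y_train)
    # Index the test set with the SAME columns before scoring.
    return search.best_estimator_.score(X_test[:, cols], y_test)

# Hypothetical index arrays standing in for top10 / bot10.
cols_a = np.arange(10)
cols_b = np.arange(20, 30)

acc_a = fit_subset_model(cols_a)
acc_b = fit_subset_model(cols_b)
print(acc_a, acc_b)
```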

---
6. (3 pts) **I hope that you got a lot out of this course, and that it has provided you with a basic set of tools and understanding that will be a starting point enabling you to dive into the analysis of many different types of data.** Please tell me at least one thing you liked about the course and also one aspect that needs improving. Suggestions welcome!