# Dimensionality Reduction

When working with neuroscientific data, an essential first step before modeling or analysis is **preprocessing**. This includes preparing the data in a way that allows us to extract meaningful patterns while reducing noise and complexity.

One major aspect of preprocessing is **dimensionality reduction**. Neural data is often high-dimensional and complex. Dimensionality reduction techniques help simplify this data by projecting it into a lower-dimensional space, making it easier to visualize, analyze, and interpret.

There are many techniques available, each with different assumptions and use cases. Choosing the right one depends on your **research question**, your **objectives**, and the **nature of your data**.

In this tutorial, we will introduce a few widely used dimensionality reduction methods. These include both linear and non-linear approaches:

### Techniques we will cover

**Linear method:**
- Principal Component Analysis (PCA)

**Non-linear methods:**
- t-SNE (t-distributed Stochastic Neighbor Embedding)
- UMAP (Uniform Manifold Approximation and Projection)

**Non-linear, learning-based method:**
- Variational Autoencoders (VAEs)

### What you will learn

- The basics of Principal Component Analysis (PCA)
- How to apply PCA for dimensionality reduction
- How to reduce dimensionality using t-SNE and UMAP
- How to implement a simple Variational Autoencoder (VAE)
- How to apply these techniques to real neural data recorded from the prefrontal cortex of a monkey during a working memory task
- How to decode the memorized cue from the monkey's brain activity using reduced data

A short presentation will give you an overview of the topic. If you're interested in diving deeper, explore the following resources:

### Further resources

**Detailed tutorial**  
Neuromatch Academy:  
[https://compneuro.neuromatch.io/tutorials/W1D4_DimensionalityReduction/student/W1D4_Intro.html](https://compneuro.neuromatch.io/tutorials/W1D4_DimensionalityReduction/student/W1D4_Intro.html)

**Mathematical foundations of PCA**  
Covariance matrices and eigenvalues (YouTube):  
[https://youtube.com/watch?v=-f6T9--oM0E](https://youtube.com/watch?v=-f6T9--oM0E)

**Intuition for PCA**  
[https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c](https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c)

**Intuition for t-SNE**  
[https://towardsdatascience.com/t-sne-machine-learning-algorithm-a-great-tool-for-dimensionality-reduction-in-python-ec01552f1a1e](https://towardsdatascience.com/t-sne-machine-learning-algorithm-a-great-tool-for-dimensionality-reduction-in-python-ec01552f1a1e)

**UMAP overview**  
[https://towardsdatascience.com/umap-dimensionality-reduction-an-incredibly-robust-machine-learning-algorithm-b5acb01de568](https://towardsdatascience.com/umap-dimensionality-reduction-an-incredibly-robust-machine-learning-algorithm-b5acb01de568)

**Understanding VAEs**  
[https://towardsdatascience.com/variational-bayes-4abdd9eb5c12](https://towardsdatascience.com/variational-bayes-4abdd9eb5c12)


## Principal Component Analysis (PCA)

We begin with one of the most commonly used dimensionality reduction techniques: Principal Component Analysis (PCA). PCA is a linear method that finds the directions (principal components) along which the variance of the data is maximized.

PCA is useful because it simplifies the data by projecting it onto a smaller number of dimensions while preserving as much variance as possible. This often makes it easier to visualize, analyze, and interpret complex neural data.
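To make the idea of "directions of maximal variance" concrete, here is a minimal sketch of PCA computed by hand with NumPy. It runs on synthetic data, not on the monkey dataset, and the variable names (`latent`, `mixing`, `X_proj`) are chosen only for illustration. It shows one standard way to obtain principal components, via the eigendecomposition of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a (n_trials, n_neurons) matrix: 200 trials, 5 "neurons",
# with most of the variance generated by two hidden directions.
latent = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# 1. Center each column (neuron) at zero mean.
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix between columns (shape: 5 x 5).
cov = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition. eigh is meant for symmetric matrices and returns
#    eigenvalues in ascending order, so we re-sort them in descending order.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project the centered data onto the first two principal components.
X_proj = X_centered @ eigvecs[:, :2]

# Fraction of the total variance captured by each component.
explained = eigvals / eigvals.sum()
print("Explained variance ratio:", np.round(explained, 3))
print("Projected shape:", X_proj.shape)
```

The eigenvectors are the principal components, and each eigenvalue equals the variance of the data along the corresponding component.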

### Loading and Inspecting the Data

We will use neural data recorded from the prefrontal cortex of a monkey performing a working memory task.

The task structure is as follows: a cue is presented on the screen for 0.5 seconds, followed by a 3-second delay period, which is relatively long compared to many primate neurophysiology studies. The task uses eight stimuli representing angular locations (0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°).

Only trials in which the monkeys performed correctly were included in the analysis.

The data is organized in a dictionary, with neural activity arranged in a matrix where each row corresponds to a trial and each column corresponds to a neuron. Each value represents the neuron's average activity during a defined time window within the trial.
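As an illustration of this layout, the sketch below builds a trials-by-neurons matrix from synthetic activity by averaging over a time window. Everything here is an assumption made for the example: the array shape, the 50 ms bin width, the 0.5 s to 3.5 s window, and the dictionary keys `"activity"` and `"cue_angle"` are hypothetical and may not match the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical binned activity with shape (n_trials, n_neurons, n_time_bins).
n_trials, n_neurons, n_bins = 40, 12, 70
bin_width_ms = 50  # assumed bin width
activity = rng.poisson(lam=5.0, size=(n_trials, n_neurons, n_bins)).astype(float)

# Start time of each bin in milliseconds, relative to an assumed cue onset at t = 0.
times_ms = np.arange(n_bins) * bin_width_ms

# Select an assumed analysis window (here 500 ms to 3500 ms after cue onset).
window = (times_ms >= 500) & (times_ms < 3500)

# Average over time bins inside the window: one value per trial and neuron.
X = activity[:, :, window].mean(axis=2)

# Hypothetical dictionary layout; the keys of the real dataset may differ.
dataset = {
    "activity": X,  # rows: trials, columns: neurons
    "cue_angle": rng.choice(np.arange(0, 360, 45), size=n_trials),
}
print(dataset["activity"].shape)
```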

Make sure the data we are working with is in the correct dictionary: ...

For detailed information on the dataset and experimental design, refer to the original paper:  
[Murray et al., 2017, PNAS](https://raw.githubusercontent.com/wimmerlab/MBC_data_analysis/main/A7_DimensionalityReduction/Murray_PNAS_2017.pdf)

The main goal of this analysis is to test whether the population of prefrontal cortex neurons maintains a stable, low-dimensional representation of the cue during the delay period.
We will reproduce some of the key figures from the Murray et al. 2017 paper, with some simplifications. - ...



In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load the preprocessed dataset
data = np.load('monkey_prefrontal_data.npy')  # shape: (n_trials, n_neurons)
labels = np.load('monkey_cue_labels.npy')     # categorical labels: cue location, etc.

print(f"Data shape: {data.shape}")

### Applying PCA
We will reduce the data dimensionality by projecting it onto the first two principal components. This will allow us to visualize the trials in a two-dimensional space.

In [None]:
# Apply PCA. This single call performs all the steps of PCA at once.
# The implementation is easy, but think carefully about why you are applying PCA,
# given your data and research objectives, and about how many components to choose.
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data)

# Plot the trials in the space of the first two principal components
plt.figure(figsize=(8, 6))
scatter = plt.scatter(data_pca[:, 0], data_pca[:, 1], c=labels, cmap='tab10', alpha=0.8)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.title('Neural data projected onto first two principal components')
plt.colorbar(scatter, label='Cue condition')
plt.grid(True)
plt.show()

What does this plot show? Try playing around with the number of components. What does changing it do, and why?
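One way to explore the effect of the number of components is to look at how much variance each component explains. The sketch below uses synthetic data as a stand-in for the recordings, and the 90% threshold is an arbitrary choice for illustration, not a recommendation. It relies on scikit-learn's `explained_variance_ratio_` attribute of a fitted `PCA` object.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Synthetic stand-in for the (n_trials, n_neurons) matrix:
# 3 hidden factors drive 20 "neurons", plus independent noise.
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.5 * rng.normal(size=(300, 20))

# Fit once with all components; explained_variance_ratio_ holds each
# component's share of the total variance, in descending order.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches 90% (arbitrary threshold).
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)
print("Cumulative variance (first 5):", np.round(cumulative[:5], 3))
print("Components needed for 90% variance:", n_keep)

# Changing n_components only adds or drops trailing components:
# the first two columns agree whether we keep 2 or 5 components.
X_2 = PCA(n_components=2).fit_transform(X)
X_5 = PCA(n_components=5).fit_transform(X)
print("Leading components unchanged:", np.allclose(np.abs(X_2), np.abs(X_5[:, :2]), atol=1e-6))
```

Plotting `cumulative` against the component index (a scree-style plot) is a common way to choose how many components to keep.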