# This week: PCA! 

Last week, we ran into a fundamental difference between encoding models and pRF-style fits: with encoding models, you often end up with a great many beta weights for many potentially correlated model features. This week, we will discuss one approach to interpreting that many beta weights: dimensionality reduction with PCA.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import sys # library for changing system-level settings
import os # library for navigating the operating system, particularly useful for file paths
import json # library to load and save dictionary-like files 
            # NOTE: good for transfer of struct arrays / dicts from matlab->python or vice versa
import h5py # For loading example fMRI data from saved files

# pycortex
import cortex as cx
# PCA function
from sklearn.decomposition import PCA
# Z-score function
from scipy.stats import zscore
# Local utility functions
sys.path.append(os.path.abspath('..'))
import utils

%matplotlib inline

PCA finds a set of dimensions (basis vectors) onto which data can be projected. So, first, let's play with the linear algebra concept of *projection*.

# Projection
First: Projection is multiplication. You should not be afraid of the term. It's another concept with a simple 2D analogy that you have to kind of stretch your brain to think about in multiple dimensions. Here, we will demonstrate projection of one vector onto another in 2D. 

First, we'll define two vectors, a and b.

In [None]:
# two 2D vectors
a = np.array([2, 7])
b = np.array([3, 4])
# Plot them!
plt.plot([0, a[0]], [0, a[1]])
plt.plot([0, b[0]], [0, b[1]])
# Label them!
plt.text(a[0], a[1], 'a', fontsize=16)
plt.text(b[0], b[1], 'b', fontsize=16)
# Equal axes make all the following plots nicer
plt.axis('equal');

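One preliminary concept: how long is `a`? And how long is `b`?

In 2D, this is easy - back to Pythagoras. For a right triangle with side lengths $x_1$ and $x_2$ (here, the two elements of a vector) and hypotenuse $L$:

### $x_1^2 + x_2^2 = L^2$

...so:

### $L = \sqrt{x_1^2 + x_2^2}$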

This holds in many dimensions, too - so the length of a vector with $n$ elements is always:

### $ \displaystyle{L = \sqrt{ \sum_{i=1}^{n} x_i^2}}$

and sure enough, there is a numpy function for this: `np.linalg.norm` computes the length of a vector, or the *vector norm*

In [None]:
length_a = np.linalg.norm(a)
length_b = np.linalg.norm(b)
print(length_a)
print(length_b)
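
To check that the length formula really does hold beyond 2D, here is a minimal sketch that compares the formula, written out by hand, with `np.linalg.norm`. The 5-dimensional vector `x` is made up for this illustration.

```python
import numpy as np

# A hypothetical 5-dimensional vector, chosen only for illustration
x = np.array([1.0, 2.0, 2.0, 4.0, 0.0])

# Length from the formula: square each element, sum, take the square root
length_manual = np.sqrt(np.sum(x ** 2))
# Length from numpy's built-in vector norm
length_numpy = np.linalg.norm(x)

print(length_manual, length_numpy)
```

The two numbers should agree, whatever the number of dimensions.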

In [None]:
# compute projection of b onto a:
proj_b_to_a = np.sum(b * a) / length_a
print(proj_b_to_a)

Another way to do this is to do a matrix multiplication between the two. For that to work, each has to be a 2D array (here, a matrix with a single column).

For this simple demo, we will stick with the variables above, but the matrix form of this (`<array>.T.dot(<array>)`) will be useful when we want to do projections of more vectors in more dimensions.

In [None]:
# Reshape a and b to have a (one-unit long) second dimension, so we can transpose them
a_2d = a.reshape(2, 1)
b_2d = b.reshape(2, 1)
# Matrix multiply & divide by norm of a
proj_b_to_a_fancy = a_2d.T.dot(b_2d) / length_a
# Same answer as above...
print(proj_b_to_a_fancy)
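
The matrix form pays off once there is more than one vector to project. Below is a small sketch, assuming three made-up 2D vectors stacked as the rows of a hypothetical array called `vectors`: a single matrix-vector product projects all of them onto `a` at once.

```python
import numpy as np

a = np.array([2, 7])
# Three hypothetical 2D vectors, stacked as the rows of a matrix
vectors = np.array([[3, 4],
                    [1, 0],
                    [-7, 2]])

# One matrix-vector product gives the dot product of every row with a;
# dividing by the length of a turns each one into a projection onto a
projections = vectors.dot(a) / np.linalg.norm(a)

# The same thing, one vector at a time, as in the cells above
projections_loop = np.array([np.sum(v * a) / np.linalg.norm(a) for v in vectors])

print(projections)
print(np.allclose(projections, projections_loop))
```

Note that the last vector is perpendicular to `a`, so its projection onto `a` is zero.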

In [None]:
# Plot the vectors again
plt.plot([0, a[0]], [0, a[1]])
plt.plot([0, b[0]], [0, b[1]])
# Label them!
plt.text(a[0], a[1], 'a', fontsize=16)
plt.text(b[0], b[1], 'b', fontsize=16)
# And plot the point at which one projects onto the other
frac_of_a = proj_b_to_a / length_a
c1 = a[0]*frac_of_a
c2 = a[1]*frac_of_a
plt.plot([0, c1], [0, c2], 'r.--')
plt.plot([c1, b[0]], [c2, b[1]], 'r.--')
plt.axis('equal');

You can think of the projection as the "shadow" that b casts on a (when the sun is perpendicular to a)
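
One way to check the "shadow" picture numerically: if we subtract the shadow from b, whatever is left over should be perpendicular to a. The sketch below uses the same `a` and `b` as above; the names `shadow` and `residual` are ours, introduced just for this illustration.

```python
import numpy as np

a = np.array([2, 7])
b = np.array([3, 4])

# Scalar projection: the length of the shadow of b along a
scalar_proj = np.dot(a, b) / np.linalg.norm(a)
# Unit vector pointing in the direction of a
a_unit = a / np.linalg.norm(a)
# The shadow itself: a vector along a with length scalar_proj
shadow = scalar_proj * a_unit
# What is left of b after removing the shadow
residual = b - shadow

# The residual is perpendicular to a, so its dot product with a is ~0
print(shadow)
print(np.dot(residual, a))
```

Here `shadow` is the same point as `(c1, c2)` in the plotting cell, and `residual` is the dashed red line from that point to the tip of b.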

> Vary the locations of a and b, and see what happens to the projection. As an exercise in plotting, see what happens when you remove the plt.axis('equal') from the cell below

In [None]:
# CHANGE a and b!
a = np.array([2, 7])
b = np.array([3, 4])

# Plot the vectors
plt.plot([0, a[0]], [0, a[1]])
plt.plot([0, b[0]], [0, b[1]])
# Label the vectors
plt.text(a[0], a[1], 'a', fontsize=16)
plt.text(b[0], b[1], 'b', fontsize=16)
# Compute projection (re-compute the length of a, since a may have changed)
length_a = np.linalg.norm(a)
proj_b_to_a = np.sum(b * a) / length_a
# And plot the point at which one projects onto the other
frac_of_a = proj_b_to_a / length_a
c1 = a[0]*frac_of_a
c2 = a[1]*frac_of_a
plt.plot([0, c1], [0, c2], 'r.--')
plt.plot([c1, b[0]], [c2, b[1]], 'r.--')
plt.axis('equal');

> Is the projection of b onto a the same as the distance from b to a? Is it the same as the angle between b and a? 

# Data creation 
This cell is here in case you're curious how to create arrays with a particular covariance structure.

In [None]:
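if False:
    # This has already been done, don't overwrite saved data
    n_voxels = 335
    n_features = 44
    # SVD of random data gives random orthonormal bases (u, vt) and singular values (s)
    u, s, vt = np.linalg.svd(np.random.randn(n_voxels, n_features))

    # Gaussian fall-off used to shrink the later singular values,
    # so that a few dimensions carry most of the variance
    n_show = 10
    x = np.linspace(0, 1, n_features)
    w = np.exp(-x**2 / (2 * 0.05**2))
    plt.plot(x[:n_show], w[:n_show], '.-')

    # Re-assemble data from the re-weighted singular values
    m, n = n_voxels, n_features
    Sd = np.diag(s * w)
    Sd = np.pad(Sd, [(0, m - n), (0, 0)], mode='constant')  # pad to voxels x features
    # (vt.T is also orthonormal, so it works as a random basis here)
    y = u.dot(Sd).dot(vt.T)

    plt.figure()  # new figure, so the image does not land on the line plot
    plt.imshow(y, aspect='auto')
    plt.xlabel("Features")
    plt.ylabel("Voxels")

    np.save('pca_data.npy', y)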

# Doing PCA
... with `sklearn` again! The implementation of this is super simple:

In [None]:
# Load data
y = np.load('pca_data.npy')

In [None]:
# Create a PCA fitting object from the PCA object in sklearn (imported above)
pca_fake = PCA(whiten=True)
# Fit the PCA algorithm to the data!
pca_fake.fit(y);
# (technically, fitting PCA to y amounts to finding the eigenvectors of the covariance matrix of y)

In [None]:
# Components quantify covariance of FEATURES across VOXELS.
print(pca_fake.components_.shape)
# Each ROW of this is a component!

In [None]:
# To see how much variance each component explains, make a scree plot using the explained_variance_ratio_ field
# (as with all objects in sklearn, properties of the fit object with "_" at the end are estimated quantities)
plt.plot(pca_fake.explained_variance_ratio_[:10], 's-')
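
A scree plot is often read together with the *cumulative* explained variance. The sketch below shows one way to do that. The `ratios` array is made up and only stands in for `pca_fake.explained_variance_ratio_`, and the 90% threshold is an arbitrary choice, not a rule.

```python
import numpy as np

# Hypothetical explained variance ratios, standing in for
# pca_fake.explained_variance_ratio_ (these numbers are made up)
ratios = np.array([0.50, 0.25, 0.10, 0.08, 0.04, 0.03])

# Running total of variance explained by the first 1, 2, 3, ... components
cumulative = np.cumsum(ratios)

# Smallest number of components whose running total reaches the threshold
threshold = 0.90  # arbitrary choice for this illustration
n_needed = int(np.searchsorted(cumulative, threshold) + 1)

print(cumulative)
print(n_needed)
```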

To show what PCA is doing, let's have a look at the covariance matrix

In [None]:
# Compute the covariance of y across voxels (so the end result is features x features)
y_demean = y - y.mean(0) # subtract off the mean of each column (axis=0 -> average over rows, i.e. over voxels)
ycov = y_demean.T.dot(y_demean) # dot product
ycov /= (len(y)-1) # normalize by number of rows (voxels) - 1

In [None]:
# Show the covariance of features across voxels
plt.imshow(ycov)
plt.colorbar()
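
To make the link between PCA and the covariance matrix concrete, here is a minimal numpy-only sketch on made-up random data (not the saved `pca_data.npy`). It compares two routes: an eigendecomposition of the covariance matrix, and an SVD of the demeaned data. The variances along each direction should match what `explained_variance_` reports, and the directions should match the rows of `components_` up to sign, though we have not checked that against sklearn here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples (rows) x 5 correlated features (columns)
data = rng.standard_normal((200, 5)).dot(rng.standard_normal((5, 5)))

# Covariance across samples, computed the same way as for y above
demeaned = data - data.mean(0)
cov = demeaned.T.dot(demeaned) / (len(data) - 1)

# Route 1: eigendecomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort largest -> smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the demeaned data
u, s, vt = np.linalg.svd(demeaned, full_matrices=False)
variances = s ** 2 / (len(data) - 1)

# Variances match; directions match up to an arbitrary sign flip
print(np.allclose(eigvals, variances))
print(np.allclose(np.abs(eigvecs.T), np.abs(vt), atol=1e-6))
```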

Next is an example of how to reconstruct the features x features covariance matrix using only the first few principal components (one through four).

> Reconstruct the covariance matrix with more principal components! See how close the result ends up looking to the real covariance matrix above.

In [None]:
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
for nc, ax in enumerate(axs, 1):
    pca_cov = (pca_fake.explained_variance_[:nc, np.newaxis] * pca_fake.components_[:nc]).T.dot(pca_fake.components_[:nc])
    im = ax.imshow(pca_cov, vmin=-0.25, vmax=0.35)
    ax.set_title('Reconstructed with\n%d components'%nc)
#fig.colorbar(im)

In [None]:
# Compute how much of the covariance matrix the last reconstruction above (4 components) accounts for
# (this is approximate...)
1 - np.var(ycov - pca_cov) / np.var(ycov)
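
The sketch below shows the same reconstruction idea on made-up random data, using only numpy, so that the error can be tracked as components are added. The helper `reconstruct_cov` is hypothetical (written for this illustration), and the error measure is the Frobenius norm of the difference, which is one reasonable choice among several.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 300 samples (rows) x 6 correlated features (columns)
data = rng.standard_normal((300, 6)).dot(rng.standard_normal((6, 6)))
demeaned = data - data.mean(0)
cov = demeaned.T.dot(demeaned) / (len(data) - 1)

# Principal directions (rows of vt) and the variance along each one
_, s, vt = np.linalg.svd(demeaned, full_matrices=False)
variances = s ** 2 / (len(data) - 1)

def reconstruct_cov(n_components):
    """Rebuild the covariance matrix from the first n_components PCs."""
    v = vt[:n_components]
    return (variances[:n_components, np.newaxis] * v).T.dot(v)

# Size of the reconstruction error for 1, 2, ..., 6 components
errors = [np.linalg.norm(cov - reconstruct_cov(k)) for k in range(1, 7)]
print(np.round(errors, 4))
```

The error shrinks with every added component, and with all six components the reconstruction matches the covariance matrix up to rounding error.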

# With real fMRI data

In [None]:
# Load Y variables (fMRI data for 1260 estimation images, 126 validation images)
# (dataset[()] reads the whole array; the older .value attribute was removed in h5py 3.0)
with h5py.File('/unrshare/LESCROARTSHARE/IntroToEncodingModels/s01_color_natims_data.hdf', 'r') as hf:
    Y_est = hf['est'][()]
    Y_val = hf['val'][()]
    mask = hf['mask'][()]

# Load X variable (Semantic category features)
with h5py.File('/unrshare/LESCROARTSHARE/IntroToEncodingModels/color_natims_features_19cat.hdf', 'r') as hf:
    X_est = hf['est'][()]
    X_val = hf['val'][()]

print(Y_est.shape)
print(X_est.shape)
print(Y_val.shape)
print(X_val.shape)
print(mask.shape)

In [None]:
# Compute regression to estimate weights
B = utils.ols(X_est, Y_est)
print(B.shape)
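
`utils.ols` lives in the local `utils` module, which is not shown here. As a rough idea of what such a function might do, here is a minimal sketch of ordinary least squares with `np.linalg.lstsq`. The name `ols_sketch` and the demo data are hypothetical; the actual `utils.ols` may differ (for instance, in how it handles an intercept).

```python
import numpy as np

def ols_sketch(X, Y):
    """Least-squares weights B such that X.dot(B) approximates Y.

    X: (n_samples, n_features), Y: (n_samples, n_voxels)
    Returns B: (n_features, n_voxels)
    """
    B, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    return B

# Made-up demo data: 100 samples, 4 features, 3 "voxels"
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 4))
B_true = rng.standard_normal((4, 3))
Y_demo = X_demo.dot(B_true)  # noiseless, so the weights are recoverable

B_demo = ols_sketch(X_demo, Y_demo)
print(B_demo.shape)
print(np.allclose(B_demo, B_true))
```

Note the shape of the result: one row per feature and one column per voxel, which is the layout the later cells assume for `B`.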

In [None]:
# Estimate predictions
Y_hat = X_val.dot(B)
# Compute prediction accuracy (correlation btw Y_val and Y_hat)
r = utils.column_corr(Y_val, Y_hat)
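
`utils.column_corr` is also a local helper that is not shown. A column-wise Pearson correlation could be sketched as below; `column_corr_sketch` and the demo arrays are hypothetical, and the real function may be implemented differently. With this formulation, a column with zero variance would produce a division by zero and hence a NaN.

```python
import numpy as np

def column_corr_sketch(A, B):
    """Pearson correlation between matching columns of A and B."""
    # z-score each column (population standard deviation, ddof=0)
    zA = (A - A.mean(0)) / A.std(0)
    zB = (B - B.mean(0)) / B.std(0)
    # the mean of the product of z-scores is the correlation coefficient
    return (zA * zB).mean(0)

# Made-up demo data: 50 samples, 3 columns
rng = np.random.default_rng(0)
actual = rng.standard_normal((50, 3))
predicted = actual.copy()
predicted[:, 1] *= -1                       # perfectly anti-correlated column
predicted[:, 2] = rng.standard_normal(50)   # unrelated column

r_demo = column_corr_sketch(actual, predicted)
print(r_demo)
```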

In [None]:
# Get rid of NaNs (histograms don't like nans)
r_nonans = r[~np.isnan(r)] # (this creates a logical index (~np.isnan(r)) and indexes r with it)
plt.hist(r_nonans, bins=100)
plt.xlabel("Prediction accuracy (r)")
plt.ylabel("Voxels (count)")
_ = plt.annotate('Check out\nthis tail!', (0.5, 200))

### Where did we predict well? 
i.e., where are those voxels in the tail of the distribution for which we have decent predictions?

In [None]:
# Create volume for correlation coefficient, visualize across the brain!
subject = 's01'
transform = 'color_natims'
Vr = cx.Volume(r, subject, transform, mask=mask, cmap='inferno', vmin=0, vmax=0.8)
fig = cx.quickflat.make_figure(Vr, with_curvature=True, with_sulci=True)

Yay, we can predict semantic-y / visual category-y areas! Let's do PCA across only those.

## Voxel selection
For PCA, we want to perform some voxel selection - let's pick only the voxels that we can predict better than r = 0.2

> DO IT.

In [None]:
# Answer

In [None]:
# Answer
# select only voxels with prediction accuracy greater than 0.2
good_voxels = r > 0.2 # Create a logical index for voxel dimension
B_forpca = B[:, good_voxels] # apply it to the voxel dimension of B
print(B_forpca.shape) # End up with far fewer voxels!

## Fit PCA for voxels

In [None]:
# Create PCA object
voxel_pca = PCA(n_components=3)
# Transpose Beta weights so that voxels are rows (samples) and features are columns
# (PCA treats the first dimension of the array as samples - here, we want that to be voxels,
# in order to find common patterns of feature weights across voxels)
voxel_pca.fit(B_forpca.T)

## Visualize the PC across semantic categories
NOTE: we didn't do this in class, but this is useful!

In [None]:
# Load the names for each semantic feature
with open('/unrshare/LESCROARTSHARE/IntroToEncodingModels/color_natims_features_19cat.json') as fid:
    sem_feat_names = json.load(fid)

In [None]:
# Plot the first PC, sorted by weight (most negative -> most positive)
fig, ax = plt.subplots(figsize=(12,3))
# Select the first PC
pc1 = voxel_pca.components_[0]
# Get an index for which values in the PC are smallest -> largest
pc1_idx = np.argsort(pc1)
# Plot the PC, sorted by this index
plt.plot(pc1[pc1_idx])
# Set the ticks to the feature names, sorted by the same index
plt.xticks(np.arange(len(pc1)))
ax.set_xticklabels(np.array(sem_feat_names)[pc1_idx], rotation=90, fontsize=16);
# Add zero line, for reference
plt.axhline(linestyle='--', color='k', lw=0.75)

## Project all voxel weights onto the first PC

In [None]:
voxel_pca.components_.shape

In [None]:
B.shape

In [None]:
# Get rid of NaNs (results of dividing by zero for some voxels, which mess things up)
B_nonan = np.nan_to_num(B)
Bt = voxel_pca.transform(B_nonan.T)
# NOTE: this is equivalent to doing this:
B_demean = B_nonan - voxel_pca.mean_[:, np.newaxis] # extra step: remove each feature's mean weight (as estimated during the PCA fit)
Bt_alt = voxel_pca.components_.dot(B_demean)
print(np.allclose(Bt.T, Bt_alt))

In [None]:
# Show projections of all voxels onto first PC
Vpc1 = cx.Volume(Bt[:,0], subject, transform, cmap='BuWtRd', vmin=-0.4, vmax=0.4)
fig = cx.quickflat.make_figure(Vpc1, with_curvature=True)

In [None]:
# pycortex advertisement (slightly fancier plot w/ voxels w/ poor predictions alpha-d out)
Vpc1a = cx.Volume2D(Bt[:,0], r, subject, transform, cmap='BuBkRd_alpha_2D', vmin=-0.4, vmax=0.4,
                   vmin2=0.2, vmax2=0.8)
fig = cx.quickflat.make_figure(Vpc1a, with_curvature=True, with_colorbar=False)

This is approximately Figure 8 (for S2) from Naselaris et al 2012; it likely differs slightly due to different voxel selection (and colormap choice).