# Discussion Week 5

In this discussion we review eigenvalues and the SVD. 

You can use the Shared Computing Cluster (SCC) or Google Colab to run this notebook.

The general instructions for running on the SCC are available under General Resources on [Piazza](https://piazza.com/bu/fall2025/ds722/resources).

## Problem: Computing Eigenvalues and Eigenvectors

Compute by hand the eigenvalues and eigenvectors of the matrix:

$$
A = 
\begin{bmatrix}
0 & -1 \\
1 & 0 \\
\end{bmatrix}
$$

Verify your calculations numerically using the `numpy.linalg.eig` function. The documentation for this function can be found [here](https://numpy.org/doc/stable/reference/generated/numpy.linalg.eig.html).

Recall the step-by-step instructions for computing the eigenvalues and eigenvectors of a $2\times 2$ matrix

$$
A = 
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}.
$$

### Step 1: Compute the Characteristic Polynomial

Find eigenvalues $\lambda $ by solving the characteristic equation:

$$
\det(A - \lambda I) = 0.
$$

This becomes:

$$
\begin{vmatrix}
a - \lambda & b \\
c & d - \lambda
\end{vmatrix}
= (a - \lambda)(d - \lambda) - bc = 0.
$$

Simplify to get a quadratic equation:

$$
\lambda^2 - (a + d)\lambda + (ad - bc) = 0.
$$

### Step 2: Solve for Eigenvalues

Solve the quadratic equation for $\lambda$:

$$
\lambda = \frac{(a + d) \pm \sqrt{(a + d)^2 - 4(ad - bc)}}{2}.
$$

These values are your **eigenvalues**.
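
As a quick check of Steps 1 and 2, the quadratic formula can be coded directly and compared against NumPy. This is a minimal sketch: the helper name `eigenvalues_2x2` and the matrix `M` are illustrative assumptions (deliberately not the matrix from the exercise), and `cmath.sqrt` is used so that a negative discriminant yields complex eigenvalues instead of an error.

```python
import cmath
import numpy as np

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the quadratic formula (hypothetical helper)."""
    trace = a + d
    det = a * d - b * c
    # cmath.sqrt also handles a negative discriminant (complex eigenvalues)
    root = cmath.sqrt(trace**2 - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

# Illustrative matrix, not the one from the exercise
M = np.array([[2.0, 1.0], [1.0, 2.0]])
lam_plus, lam_minus = eigenvalues_2x2(*M.ravel())

print(lam_plus, lam_minus)    # values from the formula
print(np.linalg.eigvals(M))   # values from NumPy (order may differ)
```

The trace $a + d$ and determinant $ad - bc$ are exactly the coefficients of the quadratic in Step 1, which is why the helper only needs those two quantities.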


### Step 3: Find Eigenvectors

For each eigenvalue $\lambda$, solve:

$$
(A - \lambda I)v= 0.
$$

This produces a system of linear equations. Any non-zero vector $v$ that solves this system is an eigenvector corresponding to $\lambda$.
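
Step 3 can be sketched numerically as well. The example below assumes the same illustrative matrix `M` and one of its eigenvalues, `lam = 3`; neither comes from the exercise. It reads an eigenvector off the first row of $(M - \lambda I)v = 0$ and checks it against `numpy.linalg.eig`, which returns unit-length eigenvectors as columns.

```python
import numpy as np

# Illustrative matrix and one of its eigenvalues (not the exercise matrix)
M = np.array([[2.0, 1.0], [1.0, 2.0]])
(a, b), (c, d) = M
lam = 3.0

# The first row of (M - lam*I) v = 0 reads (a - lam) * v1 + b * v2 = 0,
# which v = (b, lam - a) satisfies. This choice is valid here because b != 0.
v = np.array([b, lam - a])
v_unit = v / np.linalg.norm(v)

# Compare with NumPy: pick the eigenvector whose eigenvalue is closest to lam
eigvals, eigvecs = np.linalg.eig(M)
k = np.argmin(np.abs(eigvals - lam))
w = eigvecs[:, k]

print(np.allclose(M @ v, lam * v))       # v satisfies M v = lam v
print(np.isclose(abs(v_unit @ w), 1.0))  # same direction up to sign
```

Eigenvectors are only defined up to a non-zero scalar, so a hand-computed vector and the NumPy result may differ in scale and sign while still being correct.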

In [None]:
#TODO Eigenvalues and Eigenvectors

## Problem: PCA vs SVD on the Iris Dataset

In this exercise, you'll perform Principal Component Analysis (PCA) on the classic Iris dataset and compare the results with standard SVD.

### Step 1: Load and Preprocess the Data

- Load the Iris dataset using `sklearn.datasets.load_iris`.
- Extract the feature matrix $A \in \mathbb{R}^{150 \times 4}$.
- Center the data by subtracting the mean of each feature.
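
The centering step can be sketched with NumPy broadcasting. The array `X` below is a random stand-in with the same shape as the Iris feature matrix; it is an assumption for illustration, not the actual dataset.

```python
import numpy as np

# Synthetic stand-in with the same shape as the Iris features (not the real data)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(150, 4))

# X.mean(axis=0) has shape (4,); broadcasting subtracts it from every row
X_centered = X - X.mean(axis=0)

print(X_centered.shape)
print(np.allclose(X_centered.mean(axis=0), 0.0))  # each feature now has mean ~0
```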

### Step 2: Perform PCA

In theory, PCA amounts to computing the eigenvalues and eigenvectors of the covariance matrix of the data. The eigenvectors are the principal components (directions), and each eigenvalue is the variance captured along the corresponding principal component.

- Perform PCA using `sklearn.decomposition.PCA` from scikit-learn with `n_components=2`. The documentation for PCA can be found [here](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html).
- The PCA scores are the coordinates of your data in the basis of the principal components. Print the first 10 PCA scores.
- Print the explained variance using the `explained_variance_` attribute of the PCA object.
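
The covariance-based description of PCA given at the start of this step can be sketched without scikit-learn. This is not how `sklearn.decomposition.PCA` is implemented internally; it is a minimal illustration on synthetic stand-in data (the array `X` is an assumption, not the Iris data) using `numpy.cov` and `numpy.linalg.eigh`.

```python
import numpy as np

# Synthetic stand-in data with correlated features (not the Iris data)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

# Sample covariance matrix of the features (4 x 4, normalized by n - 1)
C = np.cov(Xc, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]          # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

components = eigvecs[:, :2]        # top two principal directions
scores = Xc @ components           # coordinates in the principal-component basis
explained_variance = eigvals[:2]

print(scores[:3])
print(explained_variance)
print(np.allclose(scores.var(axis=0, ddof=1), explained_variance))
```

The last line checks that the sample variance of each score column equals the corresponding eigenvalue, which is what "variance captured by each principal component" means.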

### Step 3: Perform SVD

- Apply `numpy.linalg.svd` directly to the centered data matrix $A$.
- Use the first two columns of $V$ (right singular vectors) to project the data, i.e., compute the projection as $AV_{[:, :2]}$. Note that `numpy.linalg.svd` returns $V^T$ rather than $V$, so the columns of $V$ are the rows of the returned array. Print the first 10 entries of the projected data. Compare the SVD-based projection with the PCA-based projection.
- Compute the explained variance from the singular values. This is done by computing $\sigma_{i}^{2}$ and dividing by $n-1$, where $n$ is the number of samples (here $n = 150$). The output should be an array of length 4. Print the first two entries of the explained variance using the singular values returned by the SVD.
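
The SVD steps above can be sketched as follows, again on synthetic stand-in data rather than the Iris matrix (the array `X` is an assumption). The sketch also checks two identities that are useful for the questions later in this problem: $AV = U\Sigma$, and $\sigma_i^2/(n-1)$ equals the $i$-th eigenvalue of the covariance matrix.

```python
import numpy as np

# Synthetic stand-in data (not the Iris data)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]                     # number of samples

# numpy returns V transposed (Vh); singular values come sorted in descending order
U, s, Vh = np.linalg.svd(Xc, full_matrices=False)
V = Vh.T

projection = Xc @ V[:, :2]          # project onto the first two right singular vectors
explained_variance = s**2 / (n - 1)

# A V = U Sigma, so the projection equals the scaled left singular vectors
print(np.allclose(projection, U[:, :2] * s[:2]))

# sigma_i^2 / (n - 1) matches the covariance eigenvalues (sorted descending)
cov_eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
print(np.allclose(explained_variance, cov_eigvals))
```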

### Step 4: Visualization

- Plot the PCA and SVD projections side by side.
- Color the points by their species labels.
- Be aware that the figures may differ by a sign flip along one or both axes (a reflection, or a 180° rotation when both axes flip). This is expected, since the sign of each principal component or singular vector is arbitrary.
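
If the two projections differ only by sign flips, they can be aligned before comparing or plotting. The helper `align_signs` below is a hypothetical name introduced for illustration, and `P` and `Q` are synthetic arrays standing in for the PCA and SVD projections.

```python
import numpy as np

def align_signs(reference, other):
    """Flip columns of `other` so each has a non-negative inner product with `reference`."""
    signs = np.sign(np.sum(reference * other, axis=0))
    signs[signs == 0] = 1.0         # leave orthogonal columns unchanged
    return other * signs

# Synthetic stand-ins: Q is P with its second axis reflected
rng = np.random.default_rng(0)
P = rng.normal(size=(150, 2))
Q = P * np.array([1.0, -1.0])

print(np.allclose(P, Q))                  # the raw projections disagree
print(np.allclose(P, align_signs(P, Q)))  # they agree after sign alignment
```

Aligning signs is optional for plotting, but it makes a numerical comparison with `numpy.allclose` meaningful.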


### Test your Understanding

1. Do the PCA scores and the SVD projections look similar?
1. What is the relationship between the PCA scores and the matrices $A$ and $V$ from the SVD?
1. What do the singular values represent in the context of PCA?
1. How are the eigenvalues and eigenvectors of the covariance matrix related to the SVD of the centered data matrix?


In [None]:
#TODO Load and preprocess the data

In [None]:
#TODO Perform PCA

In [None]:
#TODO SVD Projection

In [None]:
#TODO visualization