1. Suppose that we express the first feature in millimetres instead. What will be the corresponding sample covariance matrix, Q_{mm, g}?

The corresponding matrix is

Q_{mm,g} = ((8000,440),(440,80))

The off-diagonal elements hold the covariance between the two variables, here millimetres and grams, so they change by a factor of 10 when we switch from centimetres to millimetres. On the diagonal, the bottom-right entry is the variance of the gram variable, which does not change at all since the length unit is not part of it. The top-left entry is the variance of the length variable, which changes by a factor of 100 (10 * 10) because "both" variables in that "pair" are scaled by 10.

https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/how-to/covariance/interpret-the-results/
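The scaling argument above can also be written as a matrix product: if a diagonal matrix converts (cm, g) into (mm, g), the covariance matrix transforms as D Q D^T. The following is a minimal sketch of that check; the name `D` is introduced here for illustration only and is not part of the exercise.

```python
import numpy as np

# Sample covariance matrix with features in (centimetres, grams)
Q_cm_g = np.array([[80.0, 44.0], [44.0, 80.0]])

# Unit conversion: 1 cm = 10 mm, grams are left unchanged
D = np.diag([10.0, 1.0])

# Covariance of the rescaled features: Cov(D x) = D Cov(x) D^T
Q_mm_g = D @ Q_cm_g @ D.T
print(Q_mm_g)
```

The variance of the rescaled feature picks up the factor twice (10 * 10), each covariance with it picks it up once, and the variance of the untouched feature stays as it is.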

In [1]:
import numpy as np

In [2]:
Q_cm_g = np.array([[80, 44],[44,80]])
Q_mm_g = np.array([[8000, 440],[440,80]])

2. Largest eigenvalues and corresponding eigenvector

In [15]:
# 2.1 with respect to Q_{cm, g}:

# np.linalg.eig returns the eigenvectors as the COLUMNS of v_cm_g,
# so the eigenvector for eigenvalue i is v_cm_g[:, i] (not v_cm_g[i])
w_cm_g, v_cm_g = np.linalg.eig(Q_cm_g)
print(f"eigenvalues of Q_cm_g..: {w_cm_g}")
print(f"eigenvectors of Q_cm_g.: {v_cm_g}")
print(f"largest eigenvalue lambda_1: {w_cm_g[np.argmax(w_cm_g)]}, eigenvector w_1: {v_cm_g[:, np.argmax(w_cm_g)]}")
print(f"eigenvalue lambda_2: {w_cm_g[np.argmin(w_cm_g)]}, eigenvector w_2: {v_cm_g[:, np.argmin(w_cm_g)]}")

eigenvalues of Q_cm_g..: [124.  36.]
eigenvectors of Q_cm_g.: [[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
largest eigenvalue lambda_1: 124.0, eigenvector w_1: [0.70710678 0.70710678]
eigenvalue lambda_2: 36.0, eigenvector w_2: [-0.70710678  0.70710678]


In [16]:
# 2.2 with respect to Q_{mm, g}:

# np.linalg.eig returns the eigenvectors as the COLUMNS of v_mm_g,
# so the eigenvector for eigenvalue i is v_mm_g[:, i] (not v_mm_g[i])
w_mm_g, v_mm_g = np.linalg.eig(Q_mm_g)
print(f"eigenvalues of Q_mm_g..: {w_mm_g}")
print(f"eigenvectors of Q_mm_g.: {v_mm_g}")
print(f"largest eigenvalue nu_1: {w_mm_g[np.argmax(w_mm_g)]}, eigenvector v_1: {v_mm_g[:, np.argmax(w_mm_g)]}")
print(f"eigenvalue nu_2: {w_mm_g[np.argmin(w_mm_g)]}, corresponding eigenvector v_2: {v_mm_g[:, np.argmin(w_mm_g)]}")

eigenvalues of Q_mm_g..: [8024.36946078   55.63053922]
eigenvectors of Q_mm_g.: [[ 0.99846976 -0.05530039]
 [ 0.05530039  0.99846976]]
largest eigenvalue nu_1: 8024.369460780464, eigenvector v_1: [0.99846976 0.05530039]
eigenvalue nu_2: 55.630539219536686, corresponding eigenvector v_2: [-0.05530039  0.99846976]


3. What is the matrix of weights, W?

In [17]:
# 3.1 for the case when the first feature is expressed in centimetres:

# -> W is the matrix of weights whose columns are the eigenvectors of Q.
# np.linalg.eig already returns them as columns (here in decreasing eigenvalue order),
# so no transpose is needed:
W_cm_g = v_cm_g
W_cm_g


array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])

In [18]:
# 3.2 for the case when the first feature is expressed in millimetres:

# -> W is the matrix of weights whose columns are the eigenvectors of Q.
# np.linalg.eig already returns them as columns (here in decreasing eigenvalue order),
# so no transpose is needed:
W_mm_g = v_mm_g
W_mm_g

array([[ 0.99846976, -0.05530039],
       [ 0.05530039,  0.99846976]])

In [22]:
# W is an orthogonal matrix so the multiplication with its transpose must yield the identity matrix
np.dot(W_mm_g.T, W_mm_g)

array([[1.0000000e+00, 2.9911237e-18],
       [2.9911237e-18, 1.0000000e+00]])
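Beyond orthogonality, W should also diagonalise Q, that is W^T Q W should be the diagonal matrix of eigenvalues (the covariance matrix of the principal components). The sketch below checks both properties; it uses `np.linalg.eigh` with an explicit descending sort instead of the notebook's `np.linalg.eig`, which is an assumption made here because `eigh` is intended for symmetric matrices and the ordering is then stated explicitly rather than relied upon.

```python
import numpy as np

Q_mm_g = np.array([[8000.0, 440.0], [440.0, 80.0]])

# eigh returns eigenvalues in ascending order; reorder to descending
eigvals, eigvecs = np.linalg.eigh(Q_mm_g)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
W = eigvecs[:, order]  # columns are the eigenvectors, largest eigenvalue first

# W is orthogonal: W^T W = I (compare with a tolerance, not exactly)
print(np.allclose(W.T @ W, np.eye(2)))

# W diagonalises Q: W^T Q W = diag(eigenvalues)
print(np.allclose(W.T @ Q_mm_g @ W, np.diag(eigvals)))
```

Comparing with `np.allclose` avoids having to read rounding noise such as 2.99e-18 as zero by eye.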

In [27]:
# 4. the weight given to the first column of X' if the first feature is expressed in centimetres:
# the first principal component in Z is the first column vector [[z11],[z21]], which is calculated as
# z11 = x11*w11 + x12*w21
# z21 = x21*w11 + x22*w21
# where X'=[[x11, x12],[x21, x22]] and W=[[w11, w12],[w21, w22]]
# so the weight given to the first column of X' (x11 and x21) is w11 only
W_cm_g[0, 0]

0.7071067811865475

In [28]:
# 4. likewise, the weight given to the first column of X' if the first feature is expressed in millimetres:
# (see above)
# so the weight given to the first column of X' (x11 and x21) is w11 only, thus
W_mm_g[0, 0]

0.9984697628555456

5. Comment on how the change of units affects the first principal component in this particular case and the percentage of total variation accounted for by it.

As we notice above, the weight on the first feature is much higher in the millimetre case (0.998) than in the centimetre case (0.707). In centimetres the first principal component weights both features equally and accounts for 124/160 = 77.5% of the total variation; in millimetres it is almost entirely the first feature and accounts for 8024.37/8080 ≈ 99.3%. (Notice, however, that the first principal component is the product of the entire X' (all rows) with the first weight vector (first column of W), not just w11.)
We can see that a change of units can have a significant impact on the result of PCA and on the interpretations that are drawn from it.
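The percentage of total variation accounted for by a principal component is its eigenvalue divided by the sum of all eigenvalues (which equals the trace of Q). A minimal sketch of that calculation for both unit choices follows; the helper name `explained_variance_ratio` is made up for this illustration.

```python
import numpy as np

Q_cm_g = np.array([[80.0, 44.0], [44.0, 80.0]])
Q_mm_g = np.array([[8000.0, 440.0], [440.0, 80.0]])


def explained_variance_ratio(Q):
    """Share of total variance per principal component, largest first."""
    eigvals = np.linalg.eigvalsh(Q)[::-1]  # eigvalsh sorts ascending; reverse it
    return eigvals / eigvals.sum()


ratio_cm = explained_variance_ratio(Q_cm_g)
ratio_mm = explained_variance_ratio(Q_mm_g)
print(f"cm, g: {ratio_cm}")
print(f"mm, g: {ratio_mm}")
```

With centimetres the first component explains 77.5% of the variation; with millimetres it explains about 99.3%, almost all of which is simply the variance of the rescaled first feature.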

In [33]:
# 6. What would be the sample correlation matrix corresponding to both Q_{cm,g} and Q_{mm,g}?

# calculate the sample correlation matrix given by
# Cor[train] = (Q^(diag))^-1/2 Q (Q^(diag))^-1/2
def sample_corr_matrix(Q):
    Q_diag = np.zeros(Q.shape)
    np.fill_diagonal(Q_diag, np.diagonal(Q))
    Q_diag_sqrt = np.sqrt(Q_diag)
    Q_diag_sqrt_inv = np.linalg.inv(Q_diag_sqrt)
    sample_corr_matrix = np.dot(np.dot(Q_diag_sqrt_inv, Q), Q_diag_sqrt_inv)
    return sample_corr_matrix

Corr_Q_cm_g = sample_corr_matrix(Q_cm_g)
print(f"Corr_Q_cm_g: {Corr_Q_cm_g}")
Corr_Q_mm_g = sample_corr_matrix(Q_mm_g)
print(f"Corr_Q_mm_g: {Corr_Q_mm_g}")
Corr_Q = Corr_Q_mm_g

Corr_Q_cm_g: [[1.   0.55]
 [0.55 1.  ]]
Corr_Q_mm_g: [[1.   0.55]
 [0.55 1.  ]]


We observe that the sample correlation matrices corresponding to the two sample covariance matrices are the same.

In [34]:
# 7. What are the largest eigenvalue of the sample correlation matrix, eta_(1), and the corresponding eigenvector, u_(1),
# and the other eigenvalue, eta_(2), and the corresponding eigenvector, u_(2)?

# the eigenvectors are the COLUMNS of v_corr, so index them with v_corr[:, i]
w_corr, v_corr = np.linalg.eig(Corr_Q)
print(f"eigenvalues of Corr_Q..: {w_corr}")
print(f"eigenvectors of Corr_Q.: {v_corr}")
print(f"largest eigenvalue eta_1: {w_corr[np.argmax(w_corr)]}, eigenvector u_1: {v_corr[:, np.argmax(w_corr)]}")
print(f"eigenvalue eta_2: {w_corr[np.argmin(w_corr)]}, corresponding eigenvector u_2: {v_corr[:, np.argmin(w_corr)]}")

eigenvalues of Corr_Q..: [1.55 0.45]
eigenvectors of Corr_Q.: [[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
largest eigenvalue eta_1: 1.5499999999999998, eigenvector u_1: [0.70710678 0.70710678]
eigenvalue eta_2: 0.44999999999999996, corresponding eigenvector u_2: [-0.70710678  0.70710678]


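We notice that the eigenvectors u_1 and u_2 are practically the same (up to rounding differences) as those of the sample covariance matrix Q_{cm, g}, whereas the eigenvalues are different. Since both variances in Q_{cm, g} equal 80, Q_{cm, g} is 80 times the correlation matrix, so the eigenvalues are scaled by 1/80 (124/80 = 1.55 and 36/80 = 0.45).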

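8. Performing PCA on a correlation matrix can be thought of as performing PCA on a covariance matrix with appropriately changed units. What is this corresponding change of units?

Each feature is expressed in units of its own sample standard deviation, i.e. each column of the centred data matrix X' is divided by its standard deviation so that every feature has unit variance. In this particular example both variances in Q_{cm, g} equal 80, so PCA on the covariance matrix of X' with values in centimetres already gives the same eigenvectors and the same proportions of variation as PCA on the correlation matrix; starting from millimetres, this amounts to changing millimetres back to centimetres.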
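To illustrate the link between a change of units and the correlation matrix, the sketch below rescales each feature by its own sample standard deviation and checks that the covariance matrix of the rescaled data equals the correlation matrix of the original data. The data are synthetic: they are drawn at random with Q_{mm, g} as the population covariance purely for illustration and are not the data set of the exercise.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the exercise's data)
rng = np.random.default_rng(0)
Q_mm_g = np.array([[8000.0, 440.0], [440.0, 80.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Q_mm_g, size=500)

# Centre, then express each feature in units of its sample standard deviation
X_centred = X - X.mean(axis=0)
X_std = X_centred / X_centred.std(axis=0, ddof=1)

# Covariance of the standardised data vs. correlation of the original data
cov_of_standardised = np.cov(X_std, rowvar=False)
corr_of_original = np.corrcoef(X, rowvar=False)
print(np.allclose(cov_of_standardised, corr_of_original))
```

PCA on `cov_of_standardised` is therefore the same computation as PCA on the correlation matrix, whatever units the original columns were recorded in.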

9. List two advantages of performing PCA on a correlation matrix as opposed to a covariance matrix.

- as we have seen, the result does not depend on the units, so it is more robust to poorly chosen units (and to a lack of appropriate data preprocessing)
- overall, it can be a more convenient way of doing PCA, as less care is needed in rescaling the features before the analysis

10. List two advantages of performing PCA on a covariance matrix as opposed to a correlation matrix.

- it stays closer to the original data: the correlation matrix means one more transformation of the data, so to speak, which can for example introduce rounding differences
- it gives more flexibility with respect to performing PCA, as one might want to differentiate between features measured in different units
