<div>Carthage University<br>
INSAT<br>
Department of Mathematics & Computer Sciences</div>
<hr>
<div style="text-align:center"><h1>Principal Component Analysis (PCA)</h1></div>
<hr>

# Exercise 2

Consider the following data matrix:

|   |   |   |
|---|---|---|
| 1 | 1 | 3 |
|-1 |-1 | 3 |
| 1 | 1 |-3 |
|-1 |-1 |-3 |
| 1 |-1 | 0 |
|-1 | 1 | 0 |

1. Determine the standardized data matrix Z.
2. Deduce the correlation matrix RX.
3. Determine the spectrum of RX.
4. Deduce the principal components matrix CX.
5. Decide how many principal components should be retained. Justify your decision.
6. State whether the result of the PCA could have been predicted beforehand.
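
For reference, the quantities asked for in questions 1 to 4 can be written as follows. This is a sketch under one stated assumption: the standard deviation uses the population (1/n) convention, which is the convention `StandardScaler` follows. Here n is the number of observations, $\bar{x}_j$ and $s_j$ are the mean and standard deviation of column j, $\Lambda$ is the diagonal matrix of eigenvalues of RX, and V holds the corresponding unit eigenvectors as columns (these symbols are introduced here for illustration).

$$
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
s_j^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)^2, \qquad
R_X = \frac{1}{n} Z^\top Z, \qquad
R_X = V \Lambda V^\top, \qquad
C_X = Z V
$$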

In [2]:
# import libraries
import numpy as np                                # arrays
import matplotlib.pyplot as plt                   # plotting
from numpy.linalg import eigvals                  # eigenvalues only
from numpy.linalg import eigh                     # eigenvalues and eigenvectors of a symmetric matrix
from sklearn.preprocessing import StandardScaler  # standardization (zero mean, unit variance)


In [4]:
data = np.genfromtxt(r'Practice Sheets\Practice #3\ex2.csv')  # raw string keeps the backslashes literal
print(data)

[[ 1.  1.  3.]
 [-1. -1.  3.]
 [ 1.  1. -3.]
 [-1. -1. -3.]
 [ 1. -1.  0.]
 [-1.  1.  0.]]


In [5]:
scaler = StandardScaler()  # avoid shadowing the built-in name `object`
scale = scaler.fit_transform(data)
print(scale)

[[ 1.          1.          1.22474487]
 [-1.         -1.          1.22474487]
 [ 1.          1.         -1.22474487]
 [-1.         -1.         -1.22474487]
 [ 1.         -1.          0.        ]
 [-1.          1.          0.        ]]
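
The standardization above can also be reproduced by hand, which makes the convention explicit. The following is a minimal sketch, assuming the data matrix from the exercise statement is typed in directly (the names `X`, `Z`, `mean` and `std` are illustrative): it subtracts each column mean and divides by the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

# Data matrix from the exercise statement (6 observations, 3 variables).
X = np.array([
    [ 1,  1,  3],
    [-1, -1,  3],
    [ 1,  1, -3],
    [-1, -1, -3],
    [ 1, -1,  0],
    [-1,  1,  0],
], dtype=float)

mean = X.mean(axis=0)   # column means
std = X.std(axis=0)     # population standard deviation (ddof=0)
Z = (X - mean) / std    # standardized data matrix

print(Z)
```

The third column has variance 36/6 = 6, so its entries are divided by √6, giving the ±1.2247 values seen in the output above.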


In [7]:
RX = np.corrcoef(scale, rowvar=False)
RX

array([[1.        , 0.33333333, 0.        ],
       [0.33333333, 1.        , 0.        ],
       [0.        , 0.        , 1.        ]])
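
Because Z is already standardized, the correlation matrix can be deduced from it directly as ZᵀZ / n, which is the sense of "deduce" in question 2. A small sketch of that identity follows, assuming the population (1/n) convention and rebuilding Z from the exercise data so the block runs on its own (variable names are illustrative).

```python
import numpy as np

# Data matrix from the exercise statement.
X = np.array([
    [ 1,  1,  3],
    [-1, -1,  3],
    [ 1,  1, -3],
    [-1, -1, -3],
    [ 1, -1,  0],
    [-1,  1,  0],
], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized data (ddof=0)
n = Z.shape[0]

R = Z.T @ Z / n                           # correlation matrix from Z
print(R)

# Cross-check against NumPy's own correlation routine.
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
```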

In [8]:
spectrum = eigvals(RX)
print(spectrum)

[1.33333333 0.66666667 1.        ]


In [9]:
eigenvalues, eigenvectors = eigh(RX)

In [10]:
print("eigenvalues", eigenvalues)

eigenvalues [0.66666667 1.         1.33333333]


In [11]:
print("eigenvectors", eigenvectors)

eigenvectors [[-0.70710678  0.          0.70710678]
 [ 0.70710678  0.          0.70710678]
 [ 0.          1.          0.        ]]
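
`eigh` returns the eigenvalues in ascending order, with the matching eigenvectors as columns, whereas PCA results are usually reported from the largest eigenvalue down. One possible way to reorder the pairs is sketched below; the correlation matrix is typed in from the output above and the `_desc` names are illustrative.

```python
import numpy as np

# Correlation matrix RX, as printed above.
R = np.array([
    [1.0, 1/3, 0.0],
    [1/3, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

eigenvalues, eigenvectors = np.linalg.eigh(R)  # ascending eigenvalues

order = np.argsort(eigenvalues)[::-1]          # indices for descending order
eigenvalues_desc = eigenvalues[order]
eigenvectors_desc = eigenvectors[:, order]     # reorder the columns to match

print(eigenvalues_desc)
print(eigenvectors_desc)
```

Reordering the columns together with the eigenvalues keeps each eigenvector paired with its own eigenvalue.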


In [12]:
# project the standardized data onto the eigenvectors
# (columns follow eigh's ascending eigenvalue order)
CX = scale @ eigenvectors
print(CX)

[[ 0.          1.22474487  1.41421356]
 [ 0.          1.22474487 -1.41421356]
 [ 0.         -1.22474487  1.41421356]
 [ 0.         -1.22474487 -1.41421356]
 [-1.41421356  0.          0.        ]
 [ 1.41421356  0.          0.        ]]
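
A useful sanity check on the principal components matrix is that its columns are uncorrelated and that the variance of each column equals the corresponding eigenvalue. The sketch below verifies this on the exercise data, assuming the population (1/n) convention; the names `V`, `C` and `cov_C` are illustrative.

```python
import numpy as np

# Data matrix from the exercise statement.
X = np.array([
    [ 1,  1,  3],
    [-1, -1,  3],
    [ 1,  1, -3],
    [-1, -1, -3],
    [ 1, -1,  0],
    [-1,  1,  0],
], dtype=float)

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized data
n = Z.shape[0]
R = Z.T @ Z / n                           # correlation matrix

eigenvalues, V = np.linalg.eigh(R)        # ascending eigenvalues, eigenvectors in columns
C = Z @ V                                 # principal components

cov_C = C.T @ C / n                       # covariance matrix of the components
print(np.round(cov_C, 6))
print(np.allclose(cov_C, np.diag(eigenvalues)))
```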


In [13]:
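# eigvals returns the eigenvalues in no particular order, so sort them
# in descending order before accumulating the explained variance
sorted_spectrum = np.sort(spectrum)[::-1]
cumulative_variance = np.cumsum(sorted_spectrum) / np.sum(sorted_spectrum)
print("Cumulative variance:", cumulative_variance)
n_components = np.argmax(cumulative_variance >= 0.95) + 1
print("Number of principal components to retain:", n_components)

Cumulative variance: [0.44444444 0.77777778 1.        ]
Number of principal components to retain: 3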
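
The retention rule used here (keep the smallest number of components whose cumulative explained variance reaches a threshold) can be wrapped in a small helper. This is a sketch only: `n_components_for` is a hypothetical name, not a library function, and the threshold is a modelling choice rather than a fixed rule.

```python
import numpy as np

def n_components_for(eigenvalues, threshold):
    """Return (k, cumulative), where k is the smallest number of components
    whose cumulative share of the variance is at least `threshold`."""
    ordered = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(ordered) / ordered.sum()
    # First index whose cumulative share reaches the threshold,
    # capped at the total number of components.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(ordered))
    return k, cumulative

# Eigenvalues of RX for this exercise, in the unsorted order eigvals returned them.
spectrum = [4/3, 2/3, 1.0]

k95, cumulative = n_components_for(spectrum, 0.95)
k75, _ = n_components_for(spectrum, 0.75)
print(cumulative, k95, k75)
```

With these eigenvalues, a 95% threshold keeps all three components, while a 75% threshold would keep two.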


# Exercise 3

The table below records the results of six cognitive tests administered to 15 ten-year-old
children. Each test is scored out of 5. The tests are: CUB (cubes), PUZ (puzzles), CAL
(mental arithmetic), MEM (memory of digits), COM (comprehension) and VOC (vocabulary).

|     | CUB | PUZ | CAL | MEM | COM | VOC |
|-----|-----|-----|-----|-----|-----|-----|
| I1  |  5  |  5  |  4  |  0  |  1  |  1  |
| I2  |  4  |  3  |  3  |  2  |  2  |  1  |
| I3  |  2  |  1  |  2  |  3  |  2  |  2  |
| I4  |  5  |  3  |  5  |  3  |  4  |  3  |
| I5  |  4  |  4  |  3  |  2  |  3  |  2  |
| I6  |  2  |  0  |  1  |  3  |  1  |  1  |
| I7  |  3  |  3  |  4  |  2  |  4  |  4  |
| I8  |  1  |  2  |  1  |  4  |  3  |  3  |
| I9  |  0  |  1  |  0  |  3  |  1  |  0  |
| I10 |  2  |  0  |  1  |  3  |  1  |  0  |
| I11 |  1  |  2  |  1  |  1  |  0  |  1  |
| I12 |  4  |  2  |  4  |  2  |  1  |  2  |
| I13 |  3  |  2  |  3  |  3  |  2  |  3  |
| I14 |  1  |  0  |  0  |  3  |  2  |  2  |
| I15 |  2  |  1  |  1  |  2  |  3  |  2  |
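
The scores can also be entered directly as an array, which is convenient when the CSV file is not at hand. The sketch below copies the table above (the names `scores`, `centered` and `R` are illustrative). The array printed by the loading cell that follows appears to be the column-centred version of this table (its first row is 2.4, 3.0667, 1.8, ...); centring does not change the correlation matrix.

```python
import numpy as np

# Scores copied from the table above (rows I1..I15, columns CUB, PUZ, CAL, MEM, COM, VOC).
scores = np.array([
    [5, 5, 4, 0, 1, 1],
    [4, 3, 3, 2, 2, 1],
    [2, 1, 2, 3, 2, 2],
    [5, 3, 5, 3, 4, 3],
    [4, 4, 3, 2, 3, 2],
    [2, 0, 1, 3, 1, 1],
    [3, 3, 4, 2, 4, 4],
    [1, 2, 1, 4, 3, 3],
    [0, 1, 0, 3, 1, 0],
    [2, 0, 1, 3, 1, 0],
    [1, 2, 1, 1, 0, 1],
    [4, 2, 4, 2, 1, 2],
    [3, 2, 3, 3, 2, 3],
    [1, 0, 0, 3, 2, 2],
    [2, 1, 1, 2, 3, 2],
], dtype=float)

centered = scores - scores.mean(axis=0)   # subtract each column mean
R = np.corrcoef(scores, rowvar=False)     # correlation matrix of the six tests

print(centered[0])
print(np.round(R, 4))
```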


In [15]:
data = np.genfromtxt(r'Practice Sheets\Practice #3\ex3.csv')  # raw string keeps the backslashes literal
data

array([[ 2.4       ,  3.06666667,  1.8       , -2.4       , -1.        ,
        -0.8       ],
       [ 1.4       ,  1.06666667,  0.8       , -0.4       ,  0.        ,
        -0.8       ],
       [-0.6       , -0.93333333, -0.2       ,  0.6       ,  0.        ,
         0.2       ],
       [ 2.4       ,  1.06666667,  2.8       ,  0.6       ,  2.        ,
         1.2       ],
       [ 1.4       ,  2.06666667,  0.8       , -0.4       ,  1.        ,
         0.2       ],
       [-0.6       , -1.93333333, -1.2       ,  0.6       , -1.        ,
        -0.8       ],
       [ 0.4       ,  1.06666667,  1.8       , -0.4       ,  2.        ,
         2.2       ],
       [-1.6       ,  0.06666667, -1.2       ,  1.6       ,  1.        ,
         1.2       ],
       [-2.6       , -0.93333333, -2.2       ,  0.6       , -1.        ,
        -1.8       ],
       [-0.6       , -1.93333333, -1.2       ,  0.6       , -1.        ,
        -1.8       ],
       [-1.6       ,  0.06666667, -1.2       , -1.

In [16]:
scaler = StandardScaler()  # avoid shadowing the built-in name `object`
scale = scaler.fit_transform(data)
scale

array([[ 1.60356745,  2.13549639,  1.15549333, -2.52050415, -0.8660254 ,
        -0.72231512],
       [ 0.93541435,  0.74278135,  0.51355259, -0.42008403,  0.        ,
        -0.72231512],
       [-0.40089186, -0.64993368, -0.12838815,  0.63012604,  0.        ,
         0.18057878],
       [ 1.60356745,  0.74278135,  1.79743407,  0.63012604,  1.73205081,
         1.08347268],
       [ 0.93541435,  1.43913887,  0.51355259, -0.42008403,  0.8660254 ,
         0.18057878],
       [-0.40089186, -1.3462912 , -0.77032889,  0.63012604, -0.8660254 ,
        -0.72231512],
       [ 0.26726124,  0.74278135,  1.15549333, -0.42008403,  1.73205081,
         1.98636658],
       [-1.06904497,  0.04642383, -0.77032889,  1.6803361 ,  0.8660254 ,
         1.08347268],
       [-1.73719807, -0.64993368, -1.41226963,  0.63012604, -0.8660254 ,
        -1.62520902],
       [-0.40089186, -1.3462912 , -0.77032889,  0.63012604, -0.8660254 ,
        -1.62520902],
       [-1.06904497,  0.04642383, -0.77032889, -1.

In [17]:
RX = np.corrcoef(scale, rowvar=False)
RX

array([[ 1.        ,  0.73203021,  0.92073688, -0.44908871,  0.3086067 ,
         0.27348302],
       [ 0.73203021,  1.        ,  0.75099404, -0.61431021,  0.28142954,
         0.28502742],
       [ 0.92073688,  0.75099404,  1.        , -0.3685477 ,  0.40768712,
         0.48686768],
       [-0.44908871, -0.61431021, -0.3685477 ,  1.        ,  0.30316953,
         0.20228869],
       [ 0.3086067 ,  0.28142954,  0.40768712,  0.30316953,  1.        ,
         0.78192905],
       [ 0.27348302,  0.28502742,  0.48686768,  0.20228869,  0.78192905,
         1.        ]])

In [18]:
spectrum = eigvals(RX)
spectrum

array([3.25811921, 1.83716258, 0.44298514, 0.04003833, 0.16794194,
       0.2537528 ])

In [19]:
eigenvalues, eigenvectors = eigh(RX)
print("eigenvalues", eigenvalues)

eigenvalues [0.04003833 0.16794194 0.2537528  0.44298514 1.83716258 3.25811921]


In [20]:
print("eigenvectors",eigenvectors)

eigenvectors [[ 0.60798795 -0.26950543 -0.02587016 -0.53660379  0.14888433 -0.49692861]
 [ 0.11071088  0.6300059  -0.45221096  0.33370991  0.2126972  -0.47931046]
 [-0.70123699  0.11584291  0.28326753 -0.37314053  0.02880432 -0.52396858]
 [ 0.0859089   0.51456529 -0.19530814 -0.54088231 -0.57998147  0.24650511]
 [-0.19112448 -0.48388171 -0.54218997  0.1726045  -0.56272899 -0.29816727]
 [ 0.28714375  0.13461376  0.61844988  0.3729802  -0.52794057 -0.31482039]]


In [21]:
# project the standardized data onto the eigenvectors
# (columns follow eigh's ascending eigenvalue order)
CX = scale @ eigenvectors
CX

array([[ 0.14267434,  0.07191725, -0.16475613,  0.36540155,  3.05676612,
        -2.56156368],
       [ 0.04733533, -0.0380454 , -0.57929029, -0.48789388,  0.93702898,
        -0.96609577],
       [-0.11967613,  0.032257  ,  0.25652026, -0.22733412, -0.66242001,
         0.67648541],
       [-0.16903506, -0.1240099 , -0.26031949, -0.92105278, -1.46363838,
        -2.79689126],
       [ 0.21817136,  0.10315204, -0.80534646,  0.23072895,  0.12113036,
        -1.84233706],
       [ 0.15964043, -0.18330472,  0.30073546, -0.70642284,  0.13498795,
         1.88908347],
       [-0.3623026 , -0.25709211,  0.3559192 ,  0.94034989, -1.54865603,
        -2.33961725],
       [ 0.18530585,  0.81956268, -0.33920621,  0.52131625, -2.20538921,
         0.72750744],
       [-0.38483104,  0.41964018, -0.71983016,  0.1457981 ,  0.54232985,
         2.839967  ],
       [-0.09961991, -0.30484666, -0.25765916, -1.04318439,  0.61166227,
         2.17333288],
       [-0.10732604,  0.21243543,  0.56799912,  1.

In [22]:
# eigvals returns the eigenvalues in no particular order, so sort them
# in descending order before accumulating the explained variance
sorted_spectrum = np.sort(spectrum)[::-1]
cumulative_variance = np.cumsum(sorted_spectrum) / np.sum(sorted_spectrum)
print("Cumulative variance:", np.round(cumulative_variance, 4))
n_components = np.argmax(cumulative_variance >= 0.75) + 1
print("Number of principal components to retain:", n_components)

Cumulative variance: [0.543  0.8492 0.923  0.9653 0.9933 1.    ]
Number of principal components to retain: 2
