# Exercise 1: SVD & PCA

$$
\mathbf{X}\in\mathbb{R}^{1000\times30}~\text{with}~\sigma_j = \begin{cases} 2^{8-j}, & j=1,\ldots,20 \\ 0, & j>20\end{cases}
$$

In [1]:
import math
import numpy as np
import sympy

np.set_printoptions(precision=4, suppress=True)

In [2]:
sigma = []

for j in range(1, 20+1):
    sigma.append(2 ** (8 - j))

sigma = np.concatenate((sigma, np.zeros(10)))
print(sigma)

[128.      64.      32.      16.       8.       4.       2.       1.
   0.5      0.25     0.125    0.0625   0.0312   0.0156   0.0078   0.0039
   0.002    0.001    0.0005   0.0002   0.       0.       0.       0.
   0.       0.       0.       0.       0.       0.    ]
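The exercise only prescribes the singular values, so every quantity computed below depends on the $\sigma_j$ alone. To check the results numerically, a minimal sketch can build one concrete matrix with exactly these singular values as $\mathbf{X} = \mathbf{U}\,\operatorname{diag}(\sigma)\,\mathbf{V}^\top$ with random orthonormal factors. The helper name `matrix_with_singular_values`, the QR-based construction and the seed are illustrative assumptions, not part of the exercise.

```python
import numpy as np


def matrix_with_singular_values(sigma, m=1000, seed=0):
    """Hypothetical helper: return X = U diag(sigma) V^T (m x k) with random orthonormal U, V."""
    rng = np.random.default_rng(seed)
    k = len(sigma)
    U, _ = np.linalg.qr(rng.standard_normal((m, k)))  # m x k, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((k, k)))  # k x k, orthogonal
    return U @ np.diag(sigma) @ V.T


# sigma_j = 2^(8-j) for j = 1..20, then ten zeros (float base avoids integer negative powers)
sigma = np.concatenate((2.0 ** (8 - np.arange(1, 21)), np.zeros(10)))
X = matrix_with_singular_values(sigma)

s = np.linalg.svd(X, compute_uv=False)    # singular values in descending order
print(X.shape)                            # (1000, 30)
print(np.allclose(s, sigma, atol=1e-10))  # True: the SVD recovers the prescribed sigma_j
```

Any other choice of orthonormal $\mathbf{U}$, $\mathbf{V}$ gives the same norms, ranks and variances, because they are all functions of the singular values.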


## 2-norm and Frobenius norm

In [3]:
X_2 = np.max(sigma)
print(f'X_2 = {X_2}')

X_2 = 128.0


In [4]:
X_F = sum((2 ** (8 - j))**2 for j in range(1, 20+1))
print(f'X_F = sqrt({X_F:.4f})', end=' ')
X_F = np.sqrt(X_F)

# alternatively, via the geometric series sum_{j=1}^{20} 4^(8-j) = 4^7 (1 - 4^-20) / (1 - 1/4)
# X_F = np.sqrt(2**16 * 1/4 * ((1/4)**20 - 1) / (1/4 - 1))

print(f'= {X_F:.4f}')

X_F = sqrt(21845.3333) = 147.8017
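Both norms depend only on the singular values: $\|\mathbf{X}\|_2 = \sigma_1$ and $\|\mathbf{X}\|_F = \sqrt{\sum_j \sigma_j^2}$. A quick sanity check compares these values with `np.linalg.norm`, assuming a random matrix built with the prescribed $\sigma_j$ (the QR-based construction and seed 0 are an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.concatenate((2.0 ** (8 - np.arange(1, 21)), np.zeros(10)))
# Illustrative X = U diag(sigma) V^T with random orthonormal U (1000 x 30) and V (30 x 30)
U, _ = np.linalg.qr(rng.standard_normal((1000, 30)))
V, _ = np.linalg.qr(rng.standard_normal((30, 30)))
X = U @ np.diag(sigma) @ V.T

norm_2 = np.linalg.norm(X, 2)        # largest singular value
norm_F = np.linalg.norm(X, 'fro')    # square root of the sum of squared entries
print(f'{norm_2:.4f}, {norm_F:.4f}')  # 128.0000, 147.8017
```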


## Effective condition number $\kappa_2(\mathbf{X})$

$$
\kappa_2(\mathbf{X}) = \frac{\max_j \sigma_j}{\min_{j:\,\sigma_j\ne0} \sigma_j} = \frac{\sigma_1}{\sigma_{20}} = \frac{2^{7}}{2^{-12}} = 2^{19}
$$

In [5]:
sigma_non_zero = sigma[np.nonzero(sigma)]
cond2_X = np.max(sigma_non_zero) / np.min(sigma_non_zero)
print(f'cond2(X) = {cond2_X}')

cond2(X) = 524288.0
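`np.linalg.cond` divides by the smallest *computed* singular value, which for this rank-deficient $\mathbf{X}$ is rounding noise, so it does not return the effective condition number. A sketch that first discards singular values below a tolerance (here the same rule `np.linalg.matrix_rank` uses by default, $\sigma_1 \max(m,n)\,\texttt{eps}$; the random construction of $\mathbf{X}$ is again an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.concatenate((2.0 ** (8 - np.arange(1, 21)), np.zeros(10)))
# Illustrative X with the prescribed singular values (random orthonormal factors)
U, _ = np.linalg.qr(rng.standard_normal((1000, 30)))
V, _ = np.linalg.qr(rng.standard_normal((30, 30)))
X = U @ np.diag(sigma) @ V.T

s = np.linalg.svd(X, compute_uv=False)
tol = s[0] * max(X.shape) * np.finfo(X.dtype).eps  # smaller values are treated as zero
s_nonzero = s[s > tol]
kappa_eff = s_nonzero[0] / s_nonzero[-1]           # sigma_1 / sigma_20

print(f'{kappa_eff:.1f}')  # approximately 2**19 = 524288
print(np.linalg.cond(X))   # sigma_1 / (noise-level sigma_30): many orders of magnitude larger
```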


## Exact rank

In [6]:
sigma_non_zero.shape[0]

20
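On a concrete floating-point matrix, the exact rank shows up as the number of singular values that `np.linalg.matrix_rank` keeps under its default double-precision tolerance $\sigma_1 \max(m,n)\,\texttt{eps}_{64} \approx 2.8\cdot10^{-11}$. The randomly generated $\mathbf{X}$ is an assumption used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.concatenate((2.0 ** (8 - np.arange(1, 21)), np.zeros(10)))
U, _ = np.linalg.qr(rng.standard_normal((1000, 30)))
V, _ = np.linalg.qr(rng.standard_normal((30, 30)))
X = U @ np.diag(sigma) @ V.T  # illustrative X with the prescribed singular values

rank_exact = np.linalg.matrix_rank(X)  # default tol: S.max() * max(M, N) * eps(float64)
print(rank_exact)                      # 20: sigma_20 = 2**-12 is far above ~2.8e-11
```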

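## Numerical rank

In single precision ($\texttt{eps} = 2^{-23} \approx 1.19209 \cdot 10^{-7}$), the numerical rank $\operatorname{rank}_\varepsilon$ is the number of singular values above the tolerance $\tau = \sigma_1 \cdot \max(m,n) \cdot \texttt{eps}$, here with $\max(m,n) = 1000$.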

In [7]:
eps = 1.19209 * 10**(-7)
val = sigma[0] * 1000 * eps
round(val, 4)

0.0153

In [8]:
# σ_j = 2^(8-j) > val  <=>  j < 8 - log2(val); the largest such integer j is the numerical rank
rank_eps = math.floor(8 - math.log2(val))
print(f'rank_ε = {rank_eps}')

rank_ε = 14
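The same count can be obtained by passing the single-precision tolerance explicitly to `np.linalg.matrix_rank`. This sketch assumes an illustrative random $\mathbf{X}$ with the prescribed $\sigma_j$; the singular values are still computed in double precision, only the cutoff $\tau = \sigma_1 \cdot 1000 \cdot \texttt{eps}_{32}$ mimics single precision:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.concatenate((2.0 ** (8 - np.arange(1, 21)), np.zeros(10)))
U, _ = np.linalg.qr(rng.standard_normal((1000, 30)))
V, _ = np.linalg.qr(rng.standard_normal((30, 30)))
X = U @ np.diag(sigma) @ V.T  # illustrative X with the prescribed singular values

eps32 = np.finfo(np.float32).eps                 # 2**-23, about 1.19209e-07
tau = sigma[0] * max(X.shape) * eps32            # about 0.0153
rank_single = np.linalg.matrix_rank(X, tol=tau)  # counts computed singular values > tau
print(rank_single)                               # 14: sigma_14 = 2**-6 > tau > sigma_15 = 2**-7
```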


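## Relative 2-norm error of the best rank-5 approximation

*Compute the relative error in the matrix 2-norm, $\|\mathbf{X} - \mathbf{X}_r\|_2 / \|\mathbf{X}\|_2$, of a best rank-5 approximation.*

By the Eckart-Young theorem, the best rank-$r$ approximation $\mathbf{X}_r$ (the truncated SVD) satisfies $\|\mathbf{X} - \mathbf{X}_r\|_2 = \sigma_{r+1}$, so

$$
\frac{\|\mathbf{X} - \mathbf{X}_r\|_2}{\|\mathbf{X}\|_2} = \frac{\sigma_{r+1}}{\sigma_1}
\quad\Longrightarrow\quad
\frac{\|\mathbf{X} - \mathbf{X}_5\|_2}{\|\mathbf{X}\|_2} = \frac{\sigma_6}{\sigma_1} = \frac{2^{2}}{2^{7}} = \frac{1}{32} = 0.03125
$$

The following cell determines the smallest rank $r$ whose relative error $\sigma_{r+1}/\sigma_1$ falls below $10^{-1}$.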

In [9]:
tol = 10**(-1)

r = 0
sigma_max = np.max(sigma)
# sigma_non_zero[i] = σ_{i+1}, so val is the relative error of the rank-i approximation
for i in range(len(sigma_non_zero)):
    val = sigma_non_zero[i] / sigma_max
    if val < tol:
        print(f'i = {i:2}:\t{val:.4f} < {tol:.4f}')
        r = i
        break
    else:
        print(f'i = {i:2}:\t{val:.4f} ≥ {tol:.4f}')

print(f'==> r = {r}')

i =  0:	1.0000 ≥ 0.1000
i =  1:	0.5000 ≥ 0.1000
i =  2:	0.2500 ≥ 0.1000
i =  3:	0.1250 ≥ 0.1000
i =  4:	0.0625 < 0.1000
==> r = 4
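To confirm the Eckart-Young result numerically, a sketch can form the best rank-5 approximation by truncating the SVD and measure its relative 2-norm error directly. The random $\mathbf{X}$ with the prescribed $\sigma_j$ is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.concatenate((2.0 ** (8 - np.arange(1, 21)), np.zeros(10)))
U0, _ = np.linalg.qr(rng.standard_normal((1000, 30)))
V0, _ = np.linalg.qr(rng.standard_normal((30, 30)))
X = U0 @ np.diag(sigma) @ V0.T  # illustrative X with the prescribed singular values

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 5
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]  # best rank-r approximation (truncated SVD)

rel_err = np.linalg.norm(X - X_r, 2) / np.linalg.norm(X, 2)
print(f'{rel_err:.5f}')  # 0.03125 = sigma_6 / sigma_1
```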


## Total variance

Assumption: $\mathbf{X}$ contains centered data (every column has mean zero). Then

$$
T = \frac{1}{n-1} \|\mathbf{X}\|_F^2 = \frac{1}{n-1} \sum_{j} \sigma_j^2
$$

In [10]:
n = 1000
T = 1 / (n - 1) * X_F**2
print(f'T = {T:.4f}')

T = 21.8672
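The formula relies on centered data. A sketch that builds a centered $\mathbf{X}$ with the prescribed singular values and compares $T$ with the trace of the sample covariance matrix returned by `np.cov`; taking $\mathbf{U}$ from a QR factorization of a column-centered random matrix (so its columns also have zero mean) is an assumption made only for this illustration:

```python
import numpy as np

n, k = 1000, 30
rng = np.random.default_rng(0)
sigma = np.concatenate((2.0 ** (8 - np.arange(1, 21)), np.zeros(10)))

A = rng.standard_normal((n, k))
A -= A.mean(axis=0)              # center the columns
U, _ = np.linalg.qr(A)           # columns of U lie in col(A), so they are centered too
V, _ = np.linalg.qr(rng.standard_normal((k, k)))
X = U @ np.diag(sigma) @ V.T     # centered X with the prescribed singular values

T_cov = np.trace(np.cov(X, rowvar=False))  # sum of the sample variances, divisor n - 1
T_svd = np.sum(sigma**2) / (n - 1)         # ||X||_F^2 / (n - 1)
print(f'{T_cov:.4f} {T_svd:.4f}')          # 21.8672 21.8672
```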


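## Principal components for 80% of the variance

*How many principal components capture 80% of the total variance?*

The $j$-th principal component carries the variance $\sigma_j^2/(n-1)$, so the factor $1/(n-1)$ cancels and we need the smallest $q \le \ell$ ($\ell = 20$ nonzero singular values) with

$$
\frac{\sum_{j=1}^{q} \sigma_j^2}{\sum_{j=1}^{\ell} \sigma_j^2} > \operatorname{Tol} = 0.8
$$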

In [11]:
tol = 0.8

j = 0
l = len(sigma_non_zero)

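# constant --> compute only once (sum over all l nonzero singular values, indices 0..l-1)
denominator = sum(sigma[i]**2 for i in range(l))

for q in range(1, l + 1):
    # variance share of the first q components (indices 0..q-1)
    numerator = sum(sigma[i]**2 for i in range(q))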
    val = numerator / denominator
    if val > tol:
        print(f'q = {q}:\t{val:.4f} > {tol:.4f}')
        j = q
        break
    else:
        print(f'q = {q}:\t{val:.4f} ≤ {tol:.4f}')

print(f'==> j = {j}')

q = 1:	0.7500 ≤ 0.8000
q = 2:	0.9375 > 0.8000
==> j = 2
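The loop can also be written with cumulative sums of the explained-variance ratios $\sigma_j^2 / \sum_i \sigma_i^2$, the usual way the proportion of variance per principal component is tabulated. A vectorized sketch of the same criterion:

```python
import numpy as np

sigma = np.concatenate((2.0 ** (8 - np.arange(1, 21)), np.zeros(10)))

explained = sigma**2 / np.sum(sigma**2)   # variance share of each principal component
cumulative = np.cumsum(explained)         # share captured by the first q components
q = int(np.argmax(cumulative > 0.8)) + 1  # first q whose cumulative share exceeds 80%
print(q)                                  # 2
```

Note that `np.argmax` returns 0 when no entry is `True`; that cannot happen here because the cumulative share reaches 1.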
