# Homework set 2

Please **submit this Jupyter notebook through Canvas** no later than **Thursday November 14**. **Submit the notebook file with your answers (as .ipynb file) and a pdf printout. The pdf version can be used by the teachers to provide feedback. On canvas there are hints about creating a nice pdf version.**

Before you hand in, please make sure the notebook runs, by running "Restart kernel and run all cells..." from the Kernel menu.

Homework is in **groups of two**, and you are expected to hand in original work. Work that is copied from another group will not be accepted.

# Exercise 0
Write down the names + student ID of the people in your group.

## Authors

- Tycho Stam  (13303147)
- Henry Zwart (15393879)

Run the following cell to import NumPy, Matplotlib and some other functions.

In [None]:
import numpy as np
import scipy.linalg as la
import matplotlib.pyplot as plt

---

# Very short introduction to Matplotlib

`matplotlib` is a useful package for visualizing data using Python. Run the first cell below to plot $\sqrt{x}, x, x^2, x^3$ for $x \in [1, 10]$.

In [None]:
x = np.linspace(1, 10, 10)  # 10 points evenly between 1 and 10.
print(x)
plt.plot(x, x**0.5, label=r"$x^{1/2}$")
plt.plot(x, x**1, label=r"$x$")
plt.plot(x, x**2, label=r"$x^2$")
plt.plot(x, x**3, label=r"$x^3$")
plt.legend()
plt.show()

When visualizing functions where $y$ has many different orders of magnitude, a logarithmic scale is useful:

In [None]:
x = np.linspace(1, 10, 10)
plt.semilogy(x, x**0.5, label=r"$x^{1/2}$")
plt.semilogy(x, x**1, label=r"$x$")
plt.semilogy(x, x**2, label=r"$x^2$")
plt.semilogy(x, x**3, label=r"$x^3$")
plt.legend()
plt.show()

When also the $x$-axis contains many orders of magnitude, a log-log plot is most useful:

In [None]:
x = np.logspace(1, 10, 10, base=10)  # 10 points, evenly spaced on a log scale, from 10^1 to 10^10.
print(x)

plt.plot(x, x**0.5, label=r"$x^{1/2}$")
plt.plot(x, x**1, label=r"$x$")
plt.plot(x, x**2, label=r"$x^2$")
plt.plot(x, x**3, label=r"$x^3$")
plt.legend()
plt.show()

plt.loglog(x, x**0.5, label=r"$x^{1/2}$")
plt.loglog(x, x**1, label=r"$x$")
plt.loglog(x, x**2, label=r"$x^2$")
plt.loglog(x, x**3, label=r"$x^3$")
plt.legend()
plt.show()

## Plots of arbitrary curves in the $(x,y)$ plane

So far, in all our plots, $y$ was a function of $x$. But this is not the only possibility. One can draw arbitrary curves in the $(x,y)$ plane. Two examples follow.

In [None]:
# plot a triangle
x = [0, 3, 0, 0]
y = [0, 0, 4, 0]
plt.plot(x, y)
# set aspect ratio to one
plt.gca().set_aspect("equal")

In [None]:
# we plot three quarters of a circle from (0,-1) going counter clockwise to (-1,0)
t = np.linspace(-0.5 * np.pi, np.pi, 271)
x = np.cos(t)
y = np.sin(t)
plt.plot(x, y)
# set the aspect ratio to 1 so that it truly looks like a circle
plt.gca().set_aspect("equal")

---

# Exercise 1

Show that the Gram–Schmidt orthogonalization of an $m \times m$ matrix requires approximately $m^3$ multiplications and $m^3$ additions.


\begin{align}
&\textbf{for } k = 1 \textbf{ to } m \\
&\quad q_k = a_k \\
&\quad \textbf{for } j = 1 \textbf{ to } k - 1 \\
&\quad \quad r_{jk} = q_j^T a_k \\
&\quad \quad q_k = q_k - r_{jk} q_j \\
&\quad \textbf{end} \\
&\quad r_{kk} = \| q_k \|_2 \\
&\quad \textbf{if } r_{kk} = 0 \textbf{ then stop} \\
&\quad q_k = q_k / r_{kk} \\
&\textbf{end}
\end{align}

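Let $T_2(k)$ be the cost of the inner loop over $j$ (lines 4 and 5 of the algorithm, executed $k-1$ times), $T_1(k)$ the cost of one pass of the outer loop for a given $k$ (lines 2 up to 9), and $T = \sum_{k=1}^{m} T_1(k)$ the total cost. All vectors have length $m$.

## Multiplications

Line 4 (an inner product) and line 5 (a scalar times a vector) each take $m$ multiplications. Line 7 takes $m$ multiplications for the squares, and line 9 takes $m$ divisions, which we count as multiplications.

\begin{align}
T_2(k) &= (k-1) \cdot 2m \\
T_1(k) &= T_2(k) + m + m \quad \text{(from line 7 and 9)} \\
T &= \sum_{k=1}^{m} \left( 2m(k-1) + 2m \right) \\
&= 2m \cdot \frac{m(m-1)}{2} + 2m^2 \\
&= m^3 + m^2 \approx m^3
\end{align}

## Additions

Line 4 takes $m-1$ additions and line 5 takes $m$ subtractions, so roughly $2m$ per inner iteration. Line 7 takes $m-1 \approx m$ additions.

\begin{align}
T_2(k) &\approx (k-1) \cdot 2m \\
T_1(k) &\approx T_2(k) + m \quad \text{(from line 7)} \\
T &\approx \sum_{k=1}^{m} \left( 2m(k-1) + m \right) \\
&= 2m \cdot \frac{m(m-1)}{2} + m^2 \\
&= m^3
\end{align}

In both cases the leading term is $m^3$, because the inner loop runs $k-1$ times rather than $m$ times, and $\sum_{k=1}^{m}(k-1) = m(m-1)/2$.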
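
As a numerical check of the leading term, the sketch below runs classical Gram–Schmidt as written in the algorithm above while tallying scalar operations. The function name `gram_schmidt_counts`, the test size `m = 50` and the random test matrix are illustrative assumptions, and divisions are counted as multiplications and subtractions as additions.

```python
import numpy as np


def gram_schmidt_counts(A):
    """Classical Gram-Schmidt on the columns of a square matrix A.

    Returns Q, R and the number of scalar multiplications (divisions
    included) and additions (subtractions included). The square root
    in the norm is not counted. Hypothetical helper for illustration.
    """
    m = A.shape[0]
    Q = np.zeros((m, m))
    R = np.zeros((m, m))
    mults = adds = 0
    for k in range(m):
        q = A[:, k].astype(float)        # copy of column a_k
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]  # m multiplications, m - 1 additions
            q -= R[j, k] * Q[:, j]       # m multiplications, m subtractions
            mults += 2 * m
            adds += 2 * m - 1
        R[k, k] = np.sqrt(q @ q)         # m multiplications, m - 1 additions
        mults += m
        adds += m - 1
        if R[k, k] == 0:
            break
        Q[:, k] = q / R[k, k]            # m divisions
        mults += m
    return Q, R, mults, adds


m = 50
rng = np.random.default_rng(0)           # fixed seed, only for reproducibility
A_test = rng.standard_normal((m, m))
Q, R, mults, adds = gram_schmidt_counts(A_test)
print(f"multiplications: {mults}, m^3 = {m**3}")
print(f"additions:       {adds}, m^3 = {m**3}")
```

Under these counting conventions the tallies are $m^3 + m^2$ multiplications and $m^3 - m(m+1)/2$ additions, so both approach $m^3$ as $m$ grows.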

-----
# Exercise 2
We want to reconstruct a function $s(t)$ (also called the signal in this exercise), $t \in [0,1]$, from data given by
$$d(t) = \int_0^t s(r) \, dr + \text{noise}.$$
We assume the data is given at $n$ equally spaced time points $t_j = j h$, $h = \frac{1}{n}$, $j=1,2, \ldots, n$. The data is therefore a vector $d = [d_1, \ldots, d_n]$, where $d_j$ denotes the value at $t_j$.
The signal $s$ is to be reconstructed at time points 
$t_{j-1/2} = (j-1/2)h$ for $j = 1,2, \ldots, n$. It is described by a vector $s = [s_1, \ldots, s_n]$ with $s_j$ the value at $t_{j-1/2}$.
Numerical integration is described in Chapter 8 of the book by Heath. Using the composite midpoint rule, the vectors $s$ and $d$ are related by
$$d = A \cdot s + \text{noise}$$
where
$$A = \begin{bmatrix} 
h & 0 & 0 & \ldots & 0 \\
h & h & 0 & \ldots & 0 \\
h & h & h & \ddots &  \vdots \\
\vdots & \vdots & \ddots & \ddots & 0 \\
h & h & \ldots & h  & h 
\end{bmatrix}.$$
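
To see why this matrix represents the composite midpoint rule, note that row $j$ of $A s$ equals $h (s_1 + \ldots + s_j)$, a running sum of midpoint values times the step size. The sketch below checks this on a small example; the size `n = 5` and the test vector are arbitrary choices made only for illustration.

```python
import numpy as np

n = 5
h = 1 / n
A = h * np.tril(np.ones((n, n)))   # lower-triangular matrix with all entries h

s = np.arange(1.0, n + 1)          # arbitrary test vector (illustrative only)
d_matrix = A @ s                   # matrix form of the midpoint rule
d_cumsum = h * np.cumsum(s)        # running sum h * (s_1 + ... + s_j)

print(np.allclose(d_matrix, d_cumsum))
```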


## (a)
As a test signal we take
$$s_{\rm true}(t) = \left\{
\begin{array}{ll} 
1 & \text{if $0.05 \le |t-1/2|<0.15$}\\
0.7 & \text{if $|t-1/2|<0.05$}\\
0 & \text{otherwise} \end{array} 
\right. .$$ 
Generate data $d_0$ without noise and data $d_\epsilon$ with noise, where the noise is normally distributed, with mean zero and standard deviation $\epsilon = 0.005$.
Take for example $n=100$. Plot the data.

In [None]:
n = 100
h = 1 / n
H = np.full((n, n), h)
A = np.tril(H)


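def s_true(t):
    """Test signal: 1 for 0.05 <= |t - 1/2| < 0.15, 0.7 for |t - 1/2| < 0.05, else 0."""
    like_t = np.zeros_like(t)
    like_t[np.abs(t - 0.5) < 0.15] = 1  # strict inequality, as in the definition
    like_t[np.abs(t - 0.5) < 0.05] = 0.7

    return like_t


t = np.arange(1, n + 1, dtype=float) * h  # data points t_j = j h
s = s_true(t - h / 2)  # signal sampled at the midpoints t_{j-1/2} = (j - 1/2) h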

In [None]:
d_0 = A @ s  # noise-free data

fig, axis = plt.subplots(2)
axis[0].plot(t, s)
axis[1].plot(t, d_0, color="red")
axis[1].set_xlabel("t")
axis[0].set_ylabel(r"$s(t)$")
axis[1].set_ylabel(r"$d_0(t)$")
plt.show()

In [None]:
epsilon = 0.005
noise = np.random.normal(scale=epsilon, size=n)  # mean 0, standard deviation epsilon
d_epsilon = A @ s + noise

fig, axis = plt.subplots(2)
axis[0].plot(t, s)
axis[1].plot(t, d_epsilon, color="red")
axis[1].set_xlabel("t")
axis[0].set_ylabel(r"$s(t)$")
axis[1].set_ylabel(r"$d_\epsilon(t)$")
plt.show()

## (b)
Try to determine $s$ from $d_0$ by inverting the matrix $A$, ignoring the noise term.
Do the same with $d_\epsilon$ instead of $d_0$. Plot the results. What do you observe about the errors in the inversion?

You may use a library routine for matrix inversion/solving a linear system.

In [None]:
s_prime = la.inv(A) @ d_0
fig, axis = plt.subplots(2, 2, figsize=(10, 6))
axis[0, 0].plot(t, s)
axis[1, 0].plot(t, s_prime, color="red")
axis[0, 0].set_ylabel(r"$s(t)$")
axis[0, 1].set_ylabel(r"$\hat{s}_0(t) - s(t)$")
axis[1, 0].set_ylabel(r"$\hat{s}_0(t) = A^{-1}d_0(t)$")

axis[0, 1].plot(t, s_prime - s)
axis[0, 1].axvline(x=0.35, color="k", linestyle="-")
axis[0, 1].axvline(x=0.45, color="k", linestyle="-")
axis[0, 1].axvline(x=0.55, color="k", linestyle="-")
axis[0, 1].axvline(x=0.65, color="k", linestyle="-")


fig.tight_layout()
plt.show()

In [None]:
s_prime = la.inv(A) @ d_epsilon

fig, axis = plt.subplots(2, 2, figsize=(10, 6))
axis[0, 0].plot(t, s)
axis[1, 0].plot(t, s_prime, color="red")
axis[0, 0].set_ylabel(r"$s(t)$")
axis[0, 1].set_ylabel(r"$\hat{s}_\epsilon(t) - s(t)$")
axis[1, 0].set_ylabel(r"$\hat{s}_\epsilon(t) = A^{-1}d_\epsilon(t)$")

axis[0, 1].plot(t, s_prime - s)


fig.tight_layout()
plt.show()

In [None]:
print(f"Condition number: {la.norm(A, np.inf) * la.norm(la.inv(A), np.inf):.2f}")

For $d_0$, the absolute error grows as $t$ increases. The errors are particularly pronounced around regions where the function changes sharply. However, the black vertical lines on the error plot (at $t = 0.35, 0.45, 0.55, 0.65$, where $s$ jumps) illustrate that these spikes in the error do not correspond exactly to the sharp changes.

Because of the noise in $d_{\epsilon}$, which we ignore when inverting, the reconstruction does not resemble the original signal. The condition number of $A$ in the $\infty$-norm is
$$
\|A\|_{\infty} \cdot \|A^{-1}\|_{\infty} = 1 \cdot 200 = 200 \quad \text{for } n=100.
$$
The data satisfy $d_{\epsilon} \sim \mathcal{N}(As, \epsilon^2 I)$ with standard deviation $\epsilon = 0.005$. Since $\|A^{-1}\|_{\infty} = 200$, an absolute error in the data can be amplified by a factor of up to $200$ in the reconstruction.
A perturbation of one standard deviation ($\pm 0.005$) can therefore grow to $\pm 1$. This explains the noisy final estimate, which lies between $-1$ and $2$.
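
The amplification can be made concrete by writing out the inverse. The sketch below checks numerically that $A^{-1}$ is a backward difference divided by $h$, so that each reconstructed value is $(d_j - d_{j-1})/h$. The seed and the sample size used for the noise statistic are arbitrary assumptions for illustration, not part of the exercise.

```python
import numpy as np

n = 100
h = 1 / n
A = h * np.tril(np.ones((n, n)))

# Claim being checked: A^{-1} = (I - S) / h, with S the subdiagonal shift matrix.
D = (np.eye(n) - np.eye(n, k=-1)) / h
print(np.allclose(np.linalg.inv(A), D))

# Largest absolute row sum of A^{-1}: rows 2..n contain 1/h and -1/h.
print(np.linalg.norm(D, np.inf), 2 / h)

# Differencing independent noise of standard deviation epsilon and dividing
# by h gives a standard deviation of about sqrt(2) * epsilon / h.
rng = np.random.default_rng(0)      # fixed seed, only for reproducibility
epsilon = 0.005
noise = rng.normal(scale=epsilon, size=100_000)
amplified = np.diff(noise) / h
print(amplified.std(), np.sqrt(2) * epsilon / h)
```

The worst-case factor $2/h = 200$ is reached only when consecutive noise values have opposite signs; a typical entry of the reconstruction has a noise standard deviation of about $\sqrt{2}\,\epsilon/h \approx 0.7$, which is still of the same order as the signal itself.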

## (c)
One way to address the issue just observed is by truncated SVD regularization. Suppose $A = U \Sigma V^T$ is the singular value decomposition of $A$. 

Express the inverse $A^{-1}$ in terms of $U, V$ and $\Sigma$, or in terms of $U$, $V$ and the singular values.

Let $k$ be some integer less than or equal to $n$. Denote by $B_k$ the matrix that is obtained from $A$ by setting the smallest $n-k$ singular values to zero (and keeping $U$, $V$ and the first $k$ singular values the same).

In truncated SVD regularization, an estimate for $s$ is obtained by applying the pseudoinverse $B_k^{+}$ to the data (instead of the true inverse $A^{-1}$) (see section 3.6 of Heath). Try truncated SVD regularization for various values of $k$. Show that for certain values of $k$ the result obtained by truncated SVD regularization is a "better" approximation of the true signal than the result obtained by the true inverse $A^{-1}$.
Note that "better" can mean different things: it can mean "visually better" or "quantitatively better" in some norm to be specified. Try to be precise in what you write down.

What happens if you choose $k$ too small?

You may use library routines to compute the SVD.

----
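In the code below, $k$ counts the number of smallest singular values that are set to zero, so $n-k$ singular values are kept. This is the reverse of the convention in the exercise text, where $k$ singular values are kept.

When this $k$ is too small we include the low-magnitude singular values, whose reciprocals amplify the noise in the data. As a result we get a noisy estimate of the signal.
In particular, when $k=0$ (or nearby), the output is similar to that of $A^{-1}$. Conversely, when too few singular values are kept (the exercise's $k$ too small), components needed to represent the signal are discarded as well, and the error grows again.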
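
Since $U$ and $V$ are orthogonal, $A^{-1} = V \Sigma^{-1} U^T = \sum_{i=1}^{n} \sigma_i^{-1} v_i u_i^T$, and the pseudoinverse of the truncation is $B_k^{+} = \sum_{i=1}^{k} \sigma_i^{-1} v_i u_i^T$. A minimal sketch of applying $B_k^{+}$ without forming it explicitly is given below. It uses the exercise's convention (the $k$ largest singular values are kept); the helper name `tsvd_solve`, the constant test signal and the seed are assumptions made for illustration.

```python
import numpy as np


def tsvd_solve(A, d, k):
    """Apply the pseudoinverse of the rank-k truncation of A to d.

    Here k is the number of LARGEST singular values that are kept,
    as in the exercise text. Hypothetical helper, not from the notebook.
    """
    U, sigma, Vt = np.linalg.svd(A)        # sigma is sorted in descending order
    coeffs = (U[:, :k].T @ d) / sigma[:k]  # u_i^T d / sigma_i for i = 1..k
    return Vt[:k].T @ coeffs               # sum over i of coeffs_i * v_i


n = 100
h = 1 / n
A = h * np.tril(np.ones((n, n)))
rng = np.random.default_rng(0)             # fixed seed, only for reproducibility
d = A @ np.ones(n) + rng.normal(scale=0.005, size=n)

s_full = tsvd_solve(A, d, n)               # k = n reproduces A^{-1} d
s_trunc = tsvd_solve(A, d, 20)             # keep only the 20 largest singular values

print(np.allclose(s_full, np.linalg.solve(A, d)))
print(np.linalg.norm(s_trunc) <= np.linalg.norm(s_full))
```

Because the columns of $V$ are orthonormal, dropping terms from the sum can only shrink the 2-norm of the estimate, which is one way to see that truncation damps the noise-dominated components.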

In [None]:
U, Sigma, VT = la.svd(A)
np.allclose(A, U @ np.diag(Sigma) @ VT)  # check that A = U Sigma V^T

In [None]:
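U, Sigma, VT = la.svd(A)  # singular values in Sigma are sorted in descending order
Sigma_p = Sigma.copy()
nonzero = ~np.isclose(Sigma, 0)
Sigma_p[nonzero] = 1 / Sigma_p[nonzero]  # reciprocals of the nonzero singular values
A_range = [0, 15, 45, 80, 99]  # values of k to try

fig, axis = plt.subplots(1, 2, figsize=(10, 6))

# Reference: reconstruction with the true inverse (does not depend on k)
s_prime = la.inv(A) @ d_epsilon
axis[1].plot(t, s_prime, label=r"$A^{-1} d_\epsilon$", color="red")

for k in A_range:
    # Here k is the number of smallest singular values set to zero (n - k are kept)
    Sigma_k_p = np.diag(np.where(np.arange(n) < n - k, Sigma_p, 0))
    B_k_p = VT.T @ Sigma_k_p @ U.T

    s_est = B_k_p @ d_epsilon
    axis[0].plot(t, s_est, label=f"k={k}")

    print(f"k={k}: 2-norm={la.norm(s_est - s, ord=2):.2f}")

axis[0].legend(loc="upper right")
axis[1].legend(loc="upper right")
plt.tight_layout()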

In [None]:
k_range = np.arange(n)
error_values = np.empty_like(k_range, dtype=np.float64)


for k in k_range:
    # The k smallest singular values are set to zero, n - k are kept
    Sigma_k_p = np.diag(np.where(np.arange(n) < n - k, Sigma_p, 0))
    B_k_p = VT.T @ Sigma_k_p @ U.T

    s_est = B_k_p @ d_epsilon
    error_values[k] = la.norm(s_est - s, ord=2)

fig, ax = plt.subplots()
ax.plot(k_range, error_values)
ax.set_ylim(0, None)
ax.set_xlim(0, 99)
ax.set_xlabel(r"$k$ (number of discarded singular values)")
ax.set_ylabel(r"$||s - \hat{s}||_2$")
fig.tight_layout()
plt.show()

## (d)

Make a plot of the singular values of $A$. Explain your findings in (b) and (c). Part of your explanation should refer to the plot.

In [None]:
fig, ax = plt.subplots()
sigma_index = np.arange(1, n + 1)
ax.scatter(sigma_index, Sigma, s=5)
ax.set_ylim(0, None)
ax.set_xlim(0, n + 1)  # show all n singular values
ax.set_xlabel(r"index $i$")
ax.set_ylabel(r"$\sigma_i$")

print(f"Condition number: {Sigma.max() / Sigma.min():.1f}")

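As seen in the figure above, the singular values $\sigma_i$ decrease quickly as the index $i$ increases, eventually flattening out near zero as $i \rightarrow n$. Inverting $A$ divides by these values, so the components belonging to the smallest singular values amplify the noise the most, which explains the noisy reconstruction in (b).
A similar pattern is seen in the error plot for (c), where the optimal $k$ (the number of discarded singular values) is around $80$.
When $k$ is larger than $80$, more of the large-magnitude singular values are excluded as well, leaving fewer components with which to reconstruct $s$ from $d_{\epsilon}$.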

Furthermore, the difference in magnitude between the largest and smallest singular values supports our earlier discussion on the condition number, which in the 2-norm is the ratio of these values: $\sigma_{\max}/\sigma_{\min} = 127.9$. This is of a similar magnitude to $200$, the condition number computed in the $\infty$-norm as $\|A\|_\infty \|A^{-1}\|_\infty$.
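
The two condition numbers can be compared directly. The sketch below rebuilds $A$ for $n = 100$ so that it is self-contained, and uses `np.linalg.cond` for the $\infty$-norm value; the variable names are chosen only for illustration.

```python
import numpy as np

n = 100
h = 1 / n
A = h * np.tril(np.ones((n, n)))

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending
cond_2 = sigma[0] / sigma[-1]               # 2-norm condition number
cond_inf = np.linalg.cond(A, np.inf)        # ||A||_inf * ||A^{-1}||_inf

print(f"cond_2   = {cond_2:.1f}")
print(f"cond_inf = {cond_inf:.1f}")
```

The two values differ because they are measured in different norms, but they are of the same order of magnitude and both indicate that relative errors in the data can be amplified by a factor of order $10^2$.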