Exercise 11: Low-rank approximations in the Ising model
====================================
<img src="ising.png" style="max-width:30%; float:right; padding-left:30pt">

Let us return to our study of the [Ising model]. As a reminder: the Ising model is an $L \times L$
(in our case `40 x 40`) square grid of Ising spins, $\sigma_{i,j}$, each of which is either $+1$ or $-1$
at any given moment.  The potential energy $U$ encoded in each configuration is given by:
$$
   U = -\sum_{i,j} \big[ \sigma_{i,j} \sigma_{i,j+1} + \sigma_{i,j} \sigma_{i+1,j} \big],
$$
which competes with the kinetic energy due to the temperature $T$, which randomly flips spins.
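
As a minimal sketch, the energy of a single configuration can be evaluated with shifted copies of the grid. The formula above does not state the boundary conditions, so periodic boundaries are an assumption here, and `ising_energy` is a hypothetical helper name, not part of the exercise.

```python
import numpy as np

def ising_energy(sigma):
    """Energy U of one spin configuration (assumes periodic boundaries)."""
    # np.roll(sigma, -1, axis=1)[i, j] == sigma[i, (j+1) % L]: right neighbour
    right = np.roll(sigma, -1, axis=1)
    # np.roll(sigma, -1, axis=0)[i, j] == sigma[(i+1) % L, j]: lower neighbour
    down = np.roll(sigma, -1, axis=0)
    return -np.sum(sigma * right + sigma * down)

# Two toy configurations on a 4 x 4 grid
aligned = np.ones((4, 4), dtype=int)                    # all spins up
checker = np.indices((4, 4)).sum(axis=0) % 2 * 2 - 1    # alternating +1 / -1

print(ising_energy(aligned))   # -32
print(ising_energy(checker))   # 32
```

The fully aligned grid has the lowest energy (every one of the 32 bonds contributes $-1$), while the checkerboard has the highest.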

The dataset is almost the same as for the last exercise, but I have split it by temperature: there
are $N_T = 10$ temperatures, $T = 0.25, 0.5, \ldots, 3.75$, stored in the array `temp`.
For each temperature, there are $N = 16000$ observations, so the `spins` tensor is now
of shape $N_T \times N \times L \times L$ with $L = 40$.

The idea of this exercise is to explore different low-rank approximations to this tensor.

[Ising model]: https://en.wikipedia.org/wiki/Ising_model


In [None]:
import numpy as np
import os
import matplotlib.pyplot as pl

In [None]:
# Load the dataset from a binary file
with np.load("../shared/ising.npz") as _datafile:
    spins = _datafile["spins"].reshape(10, 16_000, 40, 40)
    temp = _datafile["temp"][::16_000]

In [None]:
spins.shape

In [None]:
temp

In [None]:
temp.shape

Part 1. Compressing the design matrix for low T
--------------------------------
First, let us analyze the singular value decomposition of the "design matrices" $X$
for each temperature.

For this, make a new tensor `X`, which contains all 10 temperatures.  It should "flatten" out the
grid of spins into a single dimension (in other words, the spins are the features in our design
matrix).

Also, restrict yourself to the first 1,000 observations in each temperature (otherwise the SVD will
become too expensive).
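
The slicing and flattening can be sketched on a small stand-in tensor. The names `toy_spins`, `toy_X` and `n_obs` and all sizes below are made up for illustration; the real `spins` array is much larger.

```python
import numpy as np

# Toy stand-in for `spins`: 3 temperatures, 50 observations, 4 x 4 grid
rng = np.random.default_rng(0)
toy_spins = rng.choice([-1, 1], size=(3, 50, 4, 4))

n_obs = 20  # keep only the first observations of each temperature
# -1 lets NumPy infer the flattened feature dimension (4 * 4 = 16)
toy_X = toy_spins[:, :n_obs].reshape(toy_spins.shape[0], n_obs, -1)

print(toy_X.shape)  # (3, 20, 16)
# Feature index i*4 + j corresponds to grid site (i, j) (row-major order)
assert toy_X[1, 5, 2 * 4 + 3] == toy_spins[1, 5, 2, 3]
```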

In [None]:
#X = ???
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
X.shape

In [None]:
assert X.shape == (10, 1000, 1600)


Perform a **thin SVD** of the design matrix for $T = 1.75$.
Plot the singular values on a logarithmic scale.

Reminder: thin SVDs can be computed by passing `full_matrices=False`.
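
A minimal sketch of what the thin SVD returns, using a small random matrix `A` as a stand-in (not the Ising data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 10))   # toy matrix with 6 rows and 10 columns

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (6, 6) (6,) (6, 10)

# Singular values are returned in descending order
assert np.all(np.diff(s) <= 0)
# The thin factors reproduce A: A = U diag(s) V^T
assert np.allclose((U * s) @ Vt, A)
```

With `full_matrices=True`, `Vt` would instead have shape `(10, 10)`; the extra rows are not needed to reconstruct `A`. An array of singular values such as `s` can be passed to `pl.semilogy` to obtain a logarithmic y-axis.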

In [None]:
# SVD
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Plot
# YOUR CODE HERE
raise NotImplementedError()

Let us now do a truncated SVD approximation: 
$$ 
\hat X_K = \sum_{k=0}^{K-1} s_k \vec u_k \vec v_k^T
$$
with $K = 1$ (rank-1 approximation). Compute this approximation and store it in `X1`.

Also, compute the relative error of this approximation
$$
    \epsilon_1 = \frac{\Vert X - \hat X_1\Vert}{\Vert X\Vert}
$$
and print it.  Compute this quantity **only** from the singular values.
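
The following sketch illustrates both steps on a small random matrix `A` with $K = 2$. The text above does not say which norm is meant; the Frobenius norm is assumed here, for which the squared error of a truncated SVD equals the sum of the squared discarded singular values. (For the spectral norm, the relative error would instead be $s_K / s_0$.)

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))    # toy matrix, not the Ising data
U, s, Vt = np.linalg.svd(A, full_matrices=False)

K = 2
# Sum of the first K terms s_k u_k v_k^T, written as a matrix product
A_K = (U[:, :K] * s[:K]) @ Vt[:K]

# Relative error from the discarded singular values (Frobenius norm assumed)
eps_from_s = np.sqrt(np.sum(s[K:] ** 2) / np.sum(s ** 2))
# Cross-check against the explicit definition (norm defaults to Frobenius)
eps_direct = np.linalg.norm(A - A_K) / np.linalg.norm(A)

print(eps_from_s, eps_direct)
assert np.isclose(eps_from_s, eps_direct)
assert np.linalg.matrix_rank(A_K) == K
```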

In [None]:
# X1 = ???
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert X1.shape == (1000, 1600)
assert np.allclose(X1 / X1[:1], X1[:,:1] / X1[0,0]), "not rank-1"
assert np.allclose(X1 / X1[:,:1], X1[:1] / X1[0,0]), "not rank-1"

Let us now analyze this approximation visually. Make a figure with three false color plots
as subplots, showing the following:

 1. the actual design matrix $X$
 2. the rank-1 approximation $\hat X_1$
 3. the difference: $\tfrac12 (X - \hat X_1)$ (the one-half is there so that the result is again in the range $[-1,1]$)

Add titles and a colorbar.

Hint: I find that plotting this with the `'binary'` colormap gives the most appealing picture, but you can use any colormap.
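
One possible layout for such a figure is sketched below. The arrays `data` and `model` are synthetic placeholders, and the figure size, axis labels and titles are arbitrary choices rather than requirements of the exercise.

```python
import numpy as np
import matplotlib.pyplot as pl

rng = np.random.default_rng(3)
row_sign = rng.choice([-1.0, 1.0], size=(30, 1))
model = row_sign * np.ones((30, 40))      # placeholder for a model
flips = rng.random((30, 40)) < 0.1        # flip about 10% of the entries
data = np.where(flips, -model, model)     # placeholder for the data

panels = [data, model, 0.5 * (data - model)]
titles = ["data", "model", "difference / 2"]

fig, axes = pl.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, panel, title in zip(axes, panels, titles):
    # Fixed vmin/vmax so that all three panels share one color scale
    im = ax.imshow(panel, cmap="binary", vmin=-1, vmax=1, aspect="auto")
    ax.set_title(title)
    ax.set_xlabel("feature")
axes[0].set_ylabel("observation")
# A single colorbar attached to all three panels
fig.colorbar(im, ax=axes, label="value")
```

Because all panels use the same `vmin` and `vmax`, one colorbar is valid for all of them.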

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Let us analyze these plots:

 1. Observe that the data $X$ mostly consists of "stripes".  Translate these back to our field
    of Ising spins: what do these stripes correspond to?  How is this related to
    the chosen temperature? (Hint: think about the last exercise)
    
 2. Now let's talk about the model $\hat X_1$. How do the dominant left ($\vec u_0$) and right ($\vec v_0$) 
    singular vectors relate to the spin configurations?
    
 3. Using the previous points, explain why the model has such a good performance in this case.
    What is "missing"?

YOUR ANSWER HERE

Part 2. Compressing the design matrix for higher T
----------------------------------------
Let us redo our analysis for $T = 2.5$.

Again perform a singular value decomposition of the design matrix $X$ for $T = 2.5$.
(You may want to use different variables for the result of the SVD).

Make a plot with two lines, one for the previous case ($T = 1.75$), one for this case.
In both cases plot the **normalized** singular values $s_k/s_0$ on a log scale.

Observe the very different behaviour.  Also, print the relative error
of a rank-1 approximation.
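
To get a feeling for how differently singular values can decay, here is a sketch on two synthetic matrices. These are toy stand-ins, not the Ising data: the "ordered" matrix has rows that are almost uniformly $+1$ or $-1$, the "disordered" one consists of independent random signs. The Frobenius norm is again assumed for the relative error.

```python
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_feat = 200, 100

# "Ordered" toy data: each row is all +1 or all -1, with 2% of entries flipped
row_sign = rng.choice([-1.0, 1.0], size=(n_obs, 1))
flips = rng.random((n_obs, n_feat)) < 0.02
ordered = np.where(flips, -row_sign, row_sign)
# "Disordered" toy data: independent random signs
disordered = rng.choice([-1.0, 1.0], size=(n_obs, n_feat))

eps1 = {}
for name, A in [("ordered", ordered), ("disordered", disordered)]:
    s = np.linalg.svd(A, compute_uv=False)   # singular values only
    eps1[name] = np.sqrt(np.sum(s[1:] ** 2) / np.sum(s ** 2))
    print(f"{name:>10}: s_1/s_0 = {s[1] / s[0]:.3f}, eps_1 = {eps1[name]:.3f}")
```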

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Since a rank-1 approximation is so poor, let us construct a **rank-10**
approximation $\hat X_{10}$ to this higher temperature case.
Store this approximation in `Xhat`.

Afterwards, repeat the false color plot with the three panels of data, model
and difference for this case.

In [None]:
# Xhat = ???
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Let us analyze the data:

 1. First, compare the data to the low-temperature case above. What changed?

 2. Think of the data $X$ (first panel) as the sum of model plus difference (other panels).
    Which qualitative features of the data are captured by the model, and what mainly is "left over"?
    
 3. Observe that the singular values of $X$ decay very differently in the case of
    the ordered phase and the disordered phase.  Discuss the implications of this
    for "compressing" the data.  Use it to resolve the following apparent contradiction: 
    **"Randomness is information"**.
    

YOUR ANSWER HERE

Part 3. Principal Component Analysis
------------------------------------------------

Finally, let us perform a Principal Component Analysis (PCA).
For this, we are going to use sklearn again, specifically the [PCA]
class.

Use the design matrix for $T = 1.75$ and perform a PCA with two
components.

[PCA]: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

In [None]:
import sklearn.decomposition

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Reduce the design matrix $X$ to the principal space $\tilde X = X W$
and store it in `Xred`.
Make a 2D scatter plot, where each observation is again a point:
the first coordinate is its first principal component and
the second coordinate is its second principal component.

**Hint**: The matrix $W^T$ is stored as `components_` in the PCA class. Note the transpose!
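
The relation between $W$, $W^T$ and the reduced coordinates can be sketched with NumPy alone on toy data. This is an illustration under stated assumptions, not the sklearn implementation: the features are centred before the SVD (sklearn's `PCA` also subtracts the feature means, stored in `mean_`), and the sign of each component is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((100, 6)) @ rng.standard_normal((6, 6))  # toy data

A_centered = A - A.mean(axis=0)           # PCA works on centred features
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)

K = 2
Wt = Vt[:K]              # shape (K, n_features), laid out like `components_`
W = Wt.T                 # shape (n_features, K)
A_red = A_centered @ W   # shape (n_samples, K): principal-space coordinates

print(A_red.shape)       # (100, 2)
# The projected coordinates equal s_k * u_k for the leading K components
assert np.allclose(A_red, U[:, :K] * s[:K])
```

Each column of `A_red` can then serve as one axis of a scatter plot.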

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Speculate on what the first principal component may mainly correspond to.

YOUR ANSWER HERE