In [1]:
using LinearAlgebra, Random, NBInclude, IterativeSolvers, BlockArrays

In [2]:
@nbinclude("helper_functions.ipynb")

In [3]:
Random.seed!(0);

## Vanilla MatrixIRLS for matrix completion (PCA)

_**Note:** Vanilla = The weighted least squares step is calculated directly._

### Sources
 - Preprint paper by Christian Kümmerle & Claudio Verdun: https://arxiv.org/pdf/0912.3599.pdf
 - GitHub repo of the preprint paper: https://github.com/ckuemmerle/MatrixIRLS

### Algorithm
 - **Input:** Sampling operator $\Phi$, observations $\mathbf{y} \in \mathbb{C}^m$, rank estimate $\tilde{r}$, iteration number $N$.
 - Initialize $k=0, \epsilon_0 = \infty, W^{(0)} = Id.$
 - **for $k=1$ to $N$ do**
    1. **Solve weighted least squares:** Use a *conjugate gradient method* to solve $$\mathbf{X}^{(k)} = \arg\min_{\mathbf{X}} \langle \mathbf{X}, W^{(k-1)}(\mathbf{X}) \rangle \text{ subject to } \Phi(\mathbf{X}) = \mathbf{y}.$$
    2. **Update smoothing:** Compute the $(\tilde{r}+1)$-th singular value of $\mathbf{X}^{(k)}$ to update $$\epsilon_k = \min\left(\epsilon_{k-1}, \sigma_{\tilde{r}+1}(\mathbf{X}^{(k)})\right).$$
    3. **Update weight operator:** For $r_k := \left\vert\{i \in [d] : \sigma_i(\mathbf{X}^{(k)}) > \epsilon_k\}\right\vert$, compute the first $r_k$ singular values $\sigma_i^{(k)} := \sigma_i(\mathbf{X}^{(k)})$ and the matrices $\mathbf{U}^{(k)} \in \mathbb{R}^{d_1 \times r_k}$ and $\mathbf{V}^{(k)} \in \mathbb{R}^{d_2 \times r_k}$ of the leading $r_k$ left/right singular vectors of $\mathbf{X}^{(k)}$ to update $W^{(k)}$: $$W^{(k)}(\mathbf{Z}) = \mathbf{U}^{(k)} \left[ \mathbf{H}_k \circ (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})\right]\mathbf{V}^{(k)*},$$ where $\circ$ denotes the entrywise product of two matrices, and $\mathbf{H}_k \in \mathbb{R}^{d_1 \times d_2}$ is the matrix defined as $$(\mathbf{H}_k)_{ij} := \left(\max(\sigma_i^{(k)}, \epsilon_k)\max(\sigma_j^{(k)}, \epsilon_k)\right)^{-1} \quad \forall i \in [d_1] \text{ and } \forall j \in [d_2].$$
 - **end**
 - **Output**: $\mathbf{X}^{(k)}$
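
To make Steps 2 and 3 concrete, here is a minimal sketch on a toy iterate. The matrix values, the rank estimate and the variable names are illustrative assumptions, not taken from the paper or the repository.

```julia
using LinearAlgebra

# Toy iterate (hypothetical values) with singular values 5, 2 and 0.1
X = Matrix(Diagonal([5.0, 2.0, 0.1]))
r̃ = 2                       # rank estimate
ϵ_prev = Inf                 # ϵ₀ = ∞

σ = svdvals(X)               # singular values in decreasing order
ϵ = min(ϵ_prev, σ[r̃ + 1])   # Step 2: smoothing parameter becomes σ₃
rₖ = count(>(ϵ), σ)          # number of singular values strictly above ϵ

# Step 3: entries of the weight matrix H
d₁, d₂ = size(X)
H = [1 / (max(σ[i], ϵ) * max(σ[j], ϵ)) for i in 1:d₁, j in 1:d₂]

@show ϵ rₖ
@show H[1, 1] H[3, 3]
```

Entries of `H` that pair two large singular values are small, while entries that pair two directions at or below the smoothing level are large, so the weighted objective penalizes energy outside the leading singular subspaces.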

### Transformation of operator W

We seek a matrix $\mathbf{\tilde{W}}^{(k)} \in \mathbb{C}^{d_1 d_2 \times d_1 d_2}$ such that $\left[\mathbf{W}^{(k)}(\mathbf{Z})\right]_{vec} = \mathbf{\tilde{W}}^{(k)} \mathbf{Z}_{vec}$, where $(\cdot)_{vec}$ is the vectorization operator. To obtain it, we use the "vec-trick": $$(\mathbf{AXB})_{vec} = (\mathbf{B}^T \otimes \mathbf{A}) \cdot (\mathbf{X})_{vec}$$

Applying it to our problem:
$$
\begin{align}
    \left[\mathbf{W}^{(k)}(\mathbf{Z})\right]_{vec} &= \left[\mathbf{U}^{(k)} \left[ \mathbf{H}_k \circ (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})\right]\mathbf{V}^{(k)*}\right]_{vec} \\
    &= (\mathbf{\bar{V}}^{(k)} \otimes \mathbf{U}^{(k)}) \left[ \mathbf{H}_k \circ (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})\right]_{vec} \\
    &= (\mathbf{\bar{V}}^{(k)} \otimes \mathbf{U}^{(k)}) diag\left((\mathbf{H}_k)_{vec}\right) (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})_{vec} \\
    &= (\mathbf{\bar{V}}^{(k)} \otimes \mathbf{U}^{(k)}) diag\left((\mathbf{H}_k)_{vec}\right) (\mathbf{V}^{(k)T} \otimes \mathbf{U}^{(k)*}) \mathbf{Z}_{vec} \\
    \mathbf{\tilde{W}} &= (\mathbf{\bar{V}}^{(k)} \otimes \mathbf{U}^{(k)}) diag\left((\mathbf{H}_k)_{vec}\right) (\mathbf{V}^{(k)T} \otimes \mathbf{U}^{(k)*})
\end{align}
$$

_**Notation:** $\otimes$ denotes the Kronecker product, the $diag$ operator creates a diagonal matrix from a vector, $(\mathbf{X})_{vec}$ is the vectorization operator formed by stacking the columns of $\mathbf{X}$ into a single column vector, and $\mathbf{\bar{V}}$ is the (entrywise) complex conjugate of the matrix $\mathbf{V}$._
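
The identity and the resulting formula for $\mathbf{\tilde{W}}$ can be checked numerically. The sketch below uses small random matrices and assumes full (square, unitary) factors `U` and `V`, as returned by a full SVD; all sizes and names are illustrative.

```julia
using LinearAlgebra

d₁, d₂ = 4, 3
A = randn(ComplexF64, d₁, d₁)
X = randn(ComplexF64, d₁, d₂)
B = randn(ComplexF64, d₂, d₂)

# vec-trick: vec(A*X*B) == (Bᵀ ⊗ A) * vec(X)
ok_vec = vec(A * X * B) ≈ kron(transpose(B), A) * vec(X)

# Random unitary factors and a positive weight matrix H (stand-ins for Uᵏ, Vᵏ, Hᵏ)
U = Matrix(qr(randn(ComplexF64, d₁, d₁)).Q)
V = Matrix(qr(randn(ComplexF64, d₂, d₂)).Q)
H = rand(d₁, d₂) .+ 0.1
Z = randn(ComplexF64, d₁, d₂)

# Operator form versus its matrix representation
W_op = U * (H .* (U' * Z * V)) * V'
W̃ = kron(conj(V), U) * Diagonal(vec(H)) * kron(transpose(V), U')
ok_W = vec(W_op) ≈ W̃ * vec(Z)

@show ok_vec ok_W
```

Both checks should print `true` up to floating-point tolerance. Note that `W̃` is a dense $d_1 d_2 \times d_1 d_2$ matrix, so this explicit form is only practical for small problems.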

### Solution for weighted least squares

Source of description below: [Linearly Constrained Least Squares (LLS)](https://lls.readthedocs.io/en/latest/math.html)

**Linearly constrained least squares** (or LCLS) problems have the general form:
$$\text{minimize } \Vert \mathbf{Ax} - \mathbf{b} \Vert_2^2 \text{ subject to } \mathbf{Cx} = \mathbf{d},$$
where the unknown variable $\mathbf{x}$ is a vector of size $n$. The values for $\mathbf{A}$, $\mathbf{b}$, $\mathbf{C}$, and $\mathbf{d}$ are given and have sizes $m\times n$, $m$, $p\times n$, and $p$, respectively. There is a unique solution to the LCLS problem if and only if there is a unique solution to the following system of linear equations in the variable $\mathbf{x}$ and a new variable $\mathbf{z}$:
$$\begin{bmatrix} 2\mathbf{A}^*\mathbf{A} & \mathbf{C}^* \\ \mathbf{C} & \mathbf{0} \end{bmatrix}
  \begin{bmatrix} \mathbf{x} \\ \mathbf{z} \end{bmatrix} =
  \begin{bmatrix} 2\mathbf{A}^*\mathbf{b} \\ \mathbf{d} \end{bmatrix};$$
i.e., the matrix on the left is invertible. This occurs when the matrix $\mathbf{C}$ has linearly independent rows, and the matrix $\begin{bmatrix} \mathbf{A}\\ \mathbf{C}\end{bmatrix}$ has linearly independent columns.

In our case, $\mathbf{A} = \mathbf{\tilde{W}}^{1/2}$, $\mathbf{b} = \mathbf{0}$, $\mathbf{C} = \Phi$, and $\mathbf{d} = \mathbf{y}$, because $\langle \mathbf{X}, W(\mathbf{X}) \rangle = \mathbf{x}^* \mathbf{\tilde{W}} \mathbf{x} = \Vert \mathbf{\tilde{W}}^{1/2} \mathbf{x} \Vert_2^2$ for $\mathbf{x} = \mathbf{X}_{vec}$; therefore, $$\min_{\mathbf{x}} \Vert \mathbf{\tilde{W}}^{1/2} \mathbf{x} \Vert_2^2 \text{ s.t. } \Phi \mathbf{x} = \mathbf{y}$$ can be solved as $$\begin{bmatrix} 2 \mathbf{\tilde{W}} & \Phi^*\\ \Phi & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \mathbf{z} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{y} \end{bmatrix}.$$
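
As a sanity check of this block system, the following sketch solves a tiny instance with a direct solver and compares it against the closed-form minimizer $\mathbf{x} = \mathbf{\tilde{W}}^{-1}\Phi^*(\Phi\mathbf{\tilde{W}}^{-1}\Phi^*)^{-1}\mathbf{y}$, which follows from eliminating $\mathbf{z}$. The matrix `W` here is a random Hermitian positive definite stand-in and the sampled indices are arbitrary assumptions.

```julia
using LinearAlgebra

n, p = 6, 3
M = randn(ComplexF64, n, n)
W = Matrix(Hermitian(M' * M + I))               # stand-in for W̃ (Hermitian positive definite)
Φ = Matrix{ComplexF64}(I, n, n)[[1, 3, 5], :]   # samples entries 1, 3 and 5
Φadj = Matrix(Φ')
y = randn(ComplexF64, p)

# Assemble and solve the block system [2W Φ*; Φ 0] [x; z] = [0; y]
K = [2W Φadj; Φ zeros(ComplexF64, p, p)]
rhs = [zeros(ComplexF64, n); y]
xz = K \ rhs
x = xz[1:n]

# Closed-form minimizer for comparison
x_closed = W \ (Φadj * ((Φ * (W \ Φadj)) \ y))

@show norm(Φ * x - y)          # constraint residual
@show norm(x - x_closed)       # agreement with the closed form
@show ishermitian(K) isposdef(K)
```

The block matrix `K` is Hermitian but not positive definite (its lower-right block is zero), which is worth keeping in mind when choosing an iterative solver for it.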

### Technical details

In [34]:
function vanilla_MatrixIRLS_for_PCA(
        Xᴳᵀ::AbstractArray,                     # ground truth for MSE evaluation
        y::AbstractArray,                       # under-sampled data
        Φ::AbstractArray;                       # sampling matrix
        img_size::NTuple = size(Xᴳᵀ),           # size of output matrix
        r̃::Int = 0,                             # rank estimate of solution
        maxIter::Union{Int, Nothing} = nothing, # number of CG iteration steps
        N::Int = 1000,                          # number of iterations
        verbose::Bool = false)                  # print rank and loss value in each iteration
    
    # Initialize variables
    dType = eltype(y)
    d₁, d₂ = img_size
    r̃ == 0 && (r̃ = rank(Xᴳᵀ))
    maxIter = isnothing(maxIter) ? r̃*(r̃+d₁+d₂) : maxIter
    ϵₖ = Inf
    Xᵏ = reshape(Φ' * y, d₁, d₂) # Initial guess: fill missing values with zeros
    σ, k = 0, 0                  # Declared here so they stay in scope after the loop
    
    verbose && (table = DebugTableModule.DebugTable(
        ("k", () -> k, 3), ("rank(Xᵏ)", () -> rank(Xᵏ, atol=1e-3), 3),
        ("‖Xᴳᵀ - Xᵏ‖₂", () -> opnorm(Xᴳᵀ - Xᵏ, 2), 3), ("σ₁", () -> σ[1]),
        ("σᵣ₊₁", () -> σ[r̃+1]), ("ϵₖ", () -> ϵₖ)))
    
    while k < N && ϵₖ > 1e-3
        
        # Full SVD of Xᵏ: all singular values and all left/right singular vectors
        F = svd(Xᵏ)
        Uᵏ, σ, Vᵏ = F.U, F.S, F.V
        
        # Print some info
        verbose && printRow(table)
        
        # Step 2.
        ϵₖ = min(ϵₖ, σ[r̃+1])
        
        # Step 3.
        Hᵏ = [1 / (max(σ[i], ϵₖ) * max(σ[j], ϵₖ))  for i in 1:d₁, j in 1:d₂]
        W̃ᵏ = kron(conj(Vᵏ), Uᵏ) * Diagonal(vec(Hᵏ)) * kron(transpose(Vᵏ), Uᵏ')
        
        # Step 1.
        A = PseudoBlockArray{dType}(undef, [size(W̃ᵏ,1), size(Φ, 1)], [size(W̃ᵏ,1), size(Φ, 1)])
            A[Block(1,1)] = 2W̃ᵏ
            A[Block(2,1)] = Φ
            A[Block(1,2)] = Φ'
            A[Block(2,2)] .= 0
        b = PseudoBlockArray{dType}(undef, [size(W̃ᵏ,1), size(Φ, 1)])
            b[Block(1)] .= 0
            b[Block(2)] = vec(y)
        xz = PseudoBlockArray{dType}(undef, [size(W̃ᵏ,1), size(Φ, 1)])
            xz[Block(1)] = vec(Xᵏ)
            xz[Block(2)] .= 0
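        # Caveat: A is Hermitian but indefinite, and CG is only guaranteed to converge for
        # positive definite systems; minres! from IterativeSolvers targets this case.
        cg!(xz, A, b, maxiter = maxIter)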
        Xᵏ = reshape(xz[Block(1)], d₁, d₂)
        
        k += 1
    end
    
    # Print some info
    verbose && printRow(table, last = true)
    
    return Xᵏ
end

vanilla_MatrixIRLS_for_PCA (generic function with 1 method)

# Numerical Experiments

### General parameters

In [5]:
d₁, d₂ = 50, 50     # Matrix dimensions
r = 7               # Desired rank
dType = ComplexF64; # Type of matrix elements

### Generate Model

#### Sampling Mask ($\Phi$)

_**Requirement for the sampling mask:** It must have at least $r$ non-zero entries in each row and each column._

In [6]:
df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(1.05 * df, d₁ * d₂))
Φᴹ = generateΦ(d₁, d₂, r, m)
Φ = HadamardProd_to_MatrixMult(Φᴹ)
@show r
println("minimum number of non-zero entries in each column: ", Int(minimum(sum(Φᴹ, dims=1))))
println("minimum number of non-zero entries in each row: ", Int(minimum(sum(Φᴹ, dims=2))))

r = 7
minimum number of non-zero entries in each column: 9
minimum number of non-zero entries in each row: 9
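
The helper `HadamardProd_to_MatrixMult` lives in `helper_functions.ipynb` and is not shown here. Judging from how `Φ` is used (`y = Φ * vec(L₀)` and `Φ' * y` as a zero-filled guess), it presumably turns the 0/1 mask into a row-selection matrix. A minimal sketch of that idea, with the hypothetical name `mask_to_selection_matrix`, could look like this:

```julia
using LinearAlgebra

# Hypothetical stand-in for HadamardProd_to_MatrixMult (an assumption, not the
# notebook's helper): one identity row per observed entry, in column-major order.
function mask_to_selection_matrix(mask::AbstractMatrix)
    idx = findall(!iszero, vec(mask))
    return Matrix{Float64}(I, length(mask), length(mask))[idx, :]
end

mask = [1 0 1;
        0 1 1]
S = mask_to_selection_matrix(mask)
X = [10.0 20.0 30.0;
     40.0 50.0 60.0]

y = S * vec(X)                                # → [10.0, 50.0, 30.0, 60.0]
X_zero_filled = reshape(S' * y, size(mask))   # → [10.0 0.0 30.0; 0.0 50.0 60.0]

@show size(S) y
```

With this construction, `S * vec(X)` keeps only the observed entries and `S' * y` scatters them back into a zero-filled matrix, which matches the Hadamard product `mask .* X`.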


### Generate Data

Create a random rank-$r$ matrix $L_0 \in \mathbb{C}^{d_1 \times d_2}$ such that $L_0 = U_0 V_0^*$, where $U_0 \in \mathbb{C}^{d_1 \times r}$ and $V_0 \in \mathbb{C}^{d_2 \times r}$, and then sub-sample this low-rank matrix.

In [7]:
L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
@show size(L₀)
@show rank(L₀)

y = Φ * vec(L₀)
@show size(y);

size(L₀) = (50, 50)
rank(L₀) = 7
size(y) = (683,)
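
The helper `generateLowRankComponent_Christian` is also defined in `helper_functions.ipynb`. As a rough sketch of the construction described above, assuming i.i.d. complex Gaussian factors (the actual helper may scale or normalize differently), together with the arithmetic behind the 683 samples:

```julia
using LinearAlgebra

d1, d2, r_demo = 50, 50, 7

# Assumed construction: Gaussian factors U₀ (d1×r) and V₀ (d2×r), L = U₀ V₀*
U_demo = randn(ComplexF64, d1, r_demo)
V_demo = randn(ComplexF64, d2, r_demo)
L_demo = U_demo * V_demo'

# Degrees of freedom of a rank-r matrix and the oversampled measurement count
df_demo = r_demo * (d1 + d2 - r_demo)                  # 7 * 93 = 651
m_demo = floor(Int, min(1.05 * df_demo, d1 * d2))      # floor(683.55) = 683

@show size(L_demo) rank(L_demo) df_demo m_demo
```

The `_demo` suffixes are only there to avoid overwriting the notebook's own variables. The measurement count exceeds the degrees of freedom by just 5%, so the experiment sits close to the information-theoretic limit.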


### Running The Reconstruction

In [35]:
@time vanilla_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      51.429 │   23.947 │   13.997 │      Inf │
│   1 │       50 │      45.493 │   34.012 │   10.687 │   13.997 │
│   2 │       50 │      39.913 │   46.980 │    8.003 │   10.687 │
│   3 │       50 │      35.877 │   56.017 │    5.916 │    8.003 │
│   4 │       50 │      32.801 │   60.756 │    4.561 │    5.916 │
│   5 │       50 │      30.720 │   63.535 │    3.884 │    4.561 │
│   6 │       50 │      29.202 │   65.422 │    3.326 │    3.884 │
│   7 │       50 │      27.600 │   66.714 │    2.663 │    3.326 │
│   8 │       50 │      25.752 │   67.623 │    1.930 │    2.663 │
│   9 │       50 │      23.815 │   68.335 │    1.461 │    1.930 │
│  10 │       50 │      22.146 │   68.809 │    1.126 │    1.461 │
│  11 │       50 │      21.085 │   69.085 │    0.892 │    1.126 │
│  12 │   

In [38]:
F = svd(Xᵏ)
Uᵏ, σ, Vᵏ = F.U[1:r, :], F.S, F.V[:, 1:r]
I - Uᵏ*Uᵏ'

7×7 Array{Complex{Float64},2}:
  2.22045e-16-0.0im          …  -4.51028e-16+1.30104e-16im
  2.17655e-16-2.29225e-16im     -1.38778e-17-1.9082e-16im
  2.17425e-16+4.5242e-16im      -3.46945e-17+1.94289e-16im
 -4.03631e-18-1.80852e-16im        1.249e-16-1.38778e-17im
  6.41848e-17+5.55112e-17im      5.55112e-17+1.73472e-18im
  1.00614e-16+1.33574e-16im  …   2.35055e-16+1.52656e-16im
 -4.51028e-16-1.30104e-16im      4.44089e-16-0.0im

In [37]:
rank(Xᵏ, atol = 1e-3)

50

In [39]:
(I - Vᵏ*Vᵏ')

50×50 Array{Complex{Float64},2}:
    0.967078-0.0im         …    0.0117538+0.0119564im
 -0.00379408+0.0231714im      0.000680182-0.0146773im
   0.0615862+0.0111868im        0.0606428-0.0203278im
  0.00557669-0.00142833im      -0.0232873-0.0107233im
  0.00797296-0.0169562im       -0.0193126-0.0227838im
  0.00434892+0.0291389im   …    0.0177355+0.0734454im
 -0.00493565+0.00169272im         0.01768-0.00730127im
  -0.0153717-0.0179205im       0.00548028+0.0371015im
  -0.0216707+0.0230143im       -0.0258822+0.00638702im
   0.0101892+0.00428649im      0.00311701+0.0158587im
   0.0450396+0.00847354im  …   -0.0320322-0.00422681im
  -0.0139845-0.0130443im       -0.0144006+0.0122325im
  -0.0237575+0.011446im       -0.00777417-0.00300377im
            ⋮              ⋱  
  -0.0475235-0.00331277im       0.0224203+0.00893941im
  -0.0422149-0.040164im        0.00981512+0.0288209im
   0.0185604-0.0124237im   …   -0.0154805-0.0307538im
  0.00201165+0.0113448im       0.00852849-0.0165531im
   -0.017185+