In [1]:
using LinearAlgebra, Random, NBInclude, IterativeSolvers, BlockArrays

In [2]:
@nbinclude("helper_functions.ipynb")

## Vanilla MatrixIRLS for matrix completion (PCA)

_**Note:** "Vanilla" means that the weighted least squares step is solved directly, as one dense linear system._

### Sources
 - Preprint paper by Christian Kümmerle & Claudio Verdun: https://arxiv.org/pdf/0912.3599.pdf
 - GitHub repo of the preprint paper: https://github.com/ckuemmerle/MatrixIRLS

### Algorithm
 - **Input:** Sampling operator $\Phi$, observations $\mathbf{y} \in \mathbb{C}^m$, rank estimate $\tilde{r}$, iteration number $N$.
 - Initialize $k=0, \epsilon_0 = \infty, W^{(0)} = Id.$
 - **for $k=1$ to $N$ do**
    1. **Solve weighted least squares:** Use a *conjugate gradient method* to solve $$\mathbf{X}^{(k)} = \arg\min_{\mathbf{X}} \langle \mathbf{X}, W^{(k-1)}(\mathbf{X}) \rangle \text{ subject to } \Phi(\mathbf{X}) = \mathbf{y}.$$
    2. **Update smoothing:** Compute the $(\tilde{r}+1)$-th singular value of $\mathbf{X}^{(k)}$ to update $$\epsilon_k = \min\left(\epsilon_{k-1}, \sigma_{\tilde{r}+1}(\mathbf{X}^{(k)})\right).$$
    3. **Update weight operator:** For $r_k := \left\vert\{i \in [d] : \sigma_i(\mathbf{X}^{(k)}) > \epsilon_k\}\right\vert$, compute the first $r_k$ singular values $\sigma_i^{(k)} := \sigma_i(\mathbf{X}^{(k)})$ and the matrices $\mathbf{U}^{(k)} \in \mathbb{R}^{d_1 \times r_k}$ and $\mathbf{V}^{(k)} \in \mathbb{R}^{d_2 \times r_k}$ containing the leading $r_k$ left/right singular vectors of $\mathbf{X}^{(k)}$ to update $W^{(k)}$: $$W^{(k)}(\mathbf{Z}) = \mathbf{U}^{(k)} \left[ \mathbf{H}_k \circ (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})\right]\mathbf{V}^{(k)*},$$ where $\circ$ denotes the entrywise product of two matrices, and $\mathbf{H}_k \in \mathbb{R}^{d_1 \times d_2}$ is the matrix defined as $$(\mathbf{H}_k)_{ij} := \left(\max(\sigma_i^{(k)}, \epsilon_k)\max(\sigma_j^{(k)}, \epsilon_k)\right)^{-1} \quad \forall i \in [d_1] \text{ and } \forall j \in [d_2].$$
 - **end**
 - **Output**: $\mathbf{X}^{(k)}$
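
Steps 2 and 3 of the loop can be sketched in a few lines. The helper name `smoothing_and_weights` is hypothetical and not part of the notebook. The sketch assumes a thin SVD, so $\mathbf{H}$ has one row and one column per singular value; this matches the $d_1 \times d_2$ shape only when $d_1 = d_2$, as in the experiments below.

```julia
using LinearAlgebra

# Hypothetical helper (not from the notebook): one smoothing update (step 2)
# followed by the construction of the weight matrix H (step 3).
function smoothing_and_weights(X::AbstractMatrix, ϵ_prev::Real, r̃::Int)
    F = svd(X)                      # thin SVD: X = U * Diagonal(σ) * V'
    σ = F.S
    ϵ = min(ϵ_prev, σ[r̃ + 1])       # step 2: ϵ can only decrease
    rₖ = count(>(ϵ), σ)             # number of singular values above ϵ
    s = max.(σ, ϵ)                  # smoothed singular values
    H = 1 ./ (s * s')               # H[i, j] = 1 / (max(σᵢ, ϵ) * max(σⱼ, ϵ))
    return ϵ, rₖ, H, F.U, F.V
end

# Tiny example with known singular values 3, 2 and 0.5
X = [3.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 0.5]
ϵ, rₖ, H, U, V = smoothing_and_weights(X, Inf, 2)
@show ϵ rₖ
```

With rank estimate 2, the third singular value becomes the new $\epsilon$, so two singular values lie strictly above it.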

### Transformation of operator W

Get matrix $\mathbf{\tilde{W}} \in \mathbb{C}^{d_1 d_2 \times d_1 d_2}$ such that $\left[\mathbf{W}^{(k)}(\mathbf{Z})\right]_{vec} = \mathbf{\tilde{W}}^{(k)} \mathbf{Z}_{vec}$, where $(\cdot)_{vec}$ is the vectorization operator. To do so, we need the "vec-trick": $$(\mathbf{AXB})_{vec} = (\mathbf{B}^T \otimes \mathbf{A}) \cdot (\mathbf{X})_{vec}$$
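
As a quick sanity check, the vec-trick can be verified numerically. The matrices below are arbitrary illustrative values, not data from the notebook.

```julia
using LinearAlgebra

# Arbitrary small matrices with compatible sizes: A is 2×2, X is 2×3, B is 3×2
A = [1.0 2.0; 3.0 4.0]
X = [0.0 1.0 2.0; 1.0 0.0 -1.0]
B = [1.0 0.0; 2.0 1.0; 0.0 3.0]

lhs = vec(A * X * B)                   # vectorize the product (column-major)
rhs = kron(transpose(B), A) * vec(X)   # Kronecker form of the same map
@show lhs ≈ rhs
```

Note that the identity uses the plain transpose of $\mathbf{B}$, not the conjugate transpose, which is why $\mathbf{\bar{V}}$ appears in the derivation that follows.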

Applying it to our problem:
$$
\begin{align}
    \left[\mathbf{W}^{(k)}(\mathbf{Z})\right]_{vec} &= \left[\mathbf{U}^{(k)} \left[ \mathbf{H}_k \circ (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})\right]\mathbf{V}^{(k)*}\right]_{vec} \\
    &= (\mathbf{\bar{V}}^{(k)} \otimes \mathbf{U}^{(k)}) \left[ \mathbf{H}_k \circ (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})\right]_{vec} \\
    &= (\mathbf{\bar{V}}^{(k)} \otimes \mathbf{U}^{(k)}) diag\left((\mathbf{H}_k)_{vec}\right) (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})_{vec} \\
    &= (\mathbf{\bar{V}}^{(k)} \otimes \mathbf{U}^{(k)}) diag\left((\mathbf{H}_k)_{vec}\right) (\mathbf{V}^{(k)T} \otimes \mathbf{U}^{(k)*}) \mathbf{Z}_{vec} \\
    \mathbf{\tilde{W}} &= (\mathbf{\bar{V}}^{(k)} \otimes \mathbf{U}^{(k)}) diag\left((\mathbf{H}_k)_{vec}\right) (\mathbf{V}^{(k)T} \otimes \mathbf{U}^{(k)*})
\end{align}
$$

_**Notation:** $\otimes$ denotes the Kronecker product, the $diag$ operator creates a diagonal matrix from a vector, $(\mathbf{X})_{vec}$ is the vectorization operator formed by stacking the columns of $\mathbf{X}$ into a single column vector, and $\mathbf{\bar{V}}$ is the entrywise complex conjugate of the matrix $\mathbf{V}$._
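
The derivation can be checked numerically on a small complex example. This is only a sketch: the sizes, the random test matrices and the choice of $\epsilon$ are arbitrary assumptions made for illustration.

```julia
using LinearAlgebra

d = 4
X = randn(ComplexF64, d, d)    # stands in for the current iterate
Z = randn(ComplexF64, d, d)    # arbitrary matrix the operator is applied to

F = svd(X)
U, σ, V = F.U, F.S, F.V
ϵ = σ[3]                       # arbitrary smoothing level for the test
s = max.(σ, ϵ)
H = 1 ./ (s * s')

# Operator form: W(Z) = U [H ∘ (U* Z V)] V*
W_Z = U * (H .* (U' * Z * V)) * V'

# Matrix form obtained from the vec-trick
W̃ = kron(conj(V), U) * Diagonal(vec(H)) * kron(transpose(V), U')

@show vec(W_Z) ≈ W̃ * vec(Z)
@show W̃ ≈ W̃'                  # the weight matrix is Hermitian
```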

### Solution for weighted least squares

Source of description below: [Linearly Constrained Least Squares (LLS)](https://lls.readthedocs.io/en/latest/math.html)

**Linearly constrained least squares** (or LCLS) problems have the general form:
$\text{minimize } \Vert \mathbf{Ax} - \mathbf{b} \Vert_2^2 \text{ subject to } \mathbf{Cx} = \mathbf{d},$
where the unknown variable $\mathbf{x}$ is a vector of size $n$. The values for $\mathbf{A}$, $\mathbf{b}$, $\mathbf{C}$, and $\mathbf{d}$ are given and have sizes $m\times n$, $m$, $p\times n$, and $p$, respectively. There is a unique solution to the LCLS problem if and only if there is a unique solution to the following system of linear equations in the variable $\mathbf{x}$ and a new variable $\mathbf{z}$:
$$\begin{bmatrix} 2\mathbf{A}^*\mathbf{A} & \mathbf{C}^* \\ \mathbf{C} & \mathbf{0} \end{bmatrix}
  \begin{bmatrix} \mathbf{x} \\ \mathbf{z} \end{bmatrix} =
  \begin{bmatrix} 2\mathbf{A}^*\mathbf{b} \\ \mathbf{d} \end{bmatrix};$$
i.e., the matrix on the left is invertible. This occurs when the matrix $\mathbf{C}$ has independent rows, and the matrix $\begin{bmatrix} \mathbf{A}\\ \mathbf{C}\end{bmatrix}$ has independent columns.

In our case, $\mathbf{x} = \mathbf{X}_{vec}$, $\mathbf{A} = \mathbf{\tilde{W}}^{1/2}$, $\mathbf{b} = \mathbf{0}$, $\mathbf{C} = \Phi$, and $\mathbf{d} = \mathbf{y}$; therefore, $$\min_{\mathbf{x}} \Vert \mathbf{\tilde{W}}^{1/2} \mathbf{x} \Vert_2^2 \text{ s.t. } \Phi \mathbf{x} = \mathbf{y}$$ can be solved as $$\begin{bmatrix} 2 \mathbf{\tilde{W}} & \Phi^*\\ \Phi & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \mathbf{z} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{y} \end{bmatrix}.$$
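
A minimal sketch of this block system on a toy problem is given below. It assumes a random symmetric positive definite matrix in place of $\mathbf{\tilde{W}}$ and a hand-picked row-selection matrix in place of $\Phi$; plain dense arrays are used instead of `BlockArrays`. The result is compared with the closed form $\mathbf{x} = \mathbf{W}^{-1}\Phi^*(\Phi\mathbf{W}^{-1}\Phi^*)^{-1}\mathbf{y}$, which follows from eliminating $\mathbf{z}$.

```julia
using LinearAlgebra

n, p = 6, 3
B = randn(n, n)
W = B' * B + I                          # assumed stand-in for W̃: symmetric positive definite
Φ = Matrix(1.0I, n, n)[[1, 3, 6], :]    # assumed sampling: observe entries 1, 3 and 6
y = [1.0, -2.0, 0.5]

# Assemble and solve the KKT system [2W Φ*; Φ 0] [x; z] = [0; y]
K = [2W Φ'; Φ zeros(p, p)]
xz = K \ [zeros(n); y]
x = xz[1:n]

# Closed-form solution for comparison
x_closed = W \ (Φ' * ((Φ * (W \ Matrix(Φ'))) \ y))

@show Φ * x ≈ y
@show x ≈ x_closed
```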

### Technical details

In [36]:
function vanilla_MatrixIRLS_for_PCA(
        Xᴳᵀ::AbstractArray,                     # ground truth for MSE evaluation
        y::AbstractArray,                       # under-sampled data
        Φ::AbstractArray;                       # sampling matrix
        img_size::NTuple = size(Xᴳᵀ),           # size of output matrix
        r̃::Int = 0,                             # rank estimate of solution
        maxIter::Union{Int, Nothing} = nothing, # number of CG iteration steps
        N::Int = 1000,                          # number of iterations
        verbose::Bool = false)                  # print rank and loss value in each iteration
    
    # Initialize variables
    dType = eltype(y)
    d₁, d₂ = img_size
    r̃ == 0 && (r̃ = rank(Xᴳᵀ))
    maxIter = maxIter isa Nothing ? r̃*(r̃+d₁+d₂) : maxIter
    ϵₖ = Inf
    Xᵏ = reshape(Φ' * y, d₁, d₂) # Initial guess: fill missing values with zeros
    σ, k = 0, 0                  # Declared here so the debug-table closures can read them
    same_ϵ_count = 0             # Stop criterion: ϵ doesn't change over 10 iterations
    
    verbose && (table = DebugTableModule.DebugTable(
        ("k", () -> k, 3), ("rank(Xᵏ)", () -> rank(Xᵏ, atol=1e-3), 3),
        ("‖Xᴳᵀ - Xᵏ‖₂", () -> opnorm(Xᴳᵀ - Xᵏ, 2), 3), ("σ₁", () -> σ[1]),
        ("σᵣ₊₁", () -> σ[r̃+1]), ("ϵₖ", () -> ϵₖ)))
    
    while k <= N && same_ϵ_count < 10
        
        # Find leading rₖ left/right singular vectors of Xᵏ and calculate all singular values
        F = svd(Xᵏ)
        Uᵏ, σ, Vᵏ = F.U, F.S, F.V
        
        # Print some info
        verbose && printRow(table)
        
        # Step 2: update the smoothing parameter ϵₖ (same_ϵ_count counts iterations where ϵₖ stays unchanged)
        same_ϵ_count = ϵₖ < σ[r̃+1] ? same_ϵ_count + 1 : 0
        ϵₖ = min(ϵₖ, σ[r̃+1])
        
        # Step 3: build the weight matrix W̃ᵏ from the vec-trick formula
        Hᵏ = [1 / (max(σ[i], ϵₖ) * max(σ[j], ϵₖ))  for i in 1:d₁, j in 1:d₂]
        W̃ᵏ = kron(conj(Vᵏ), Uᵏ) * Diagonal(vec(Hᵏ)) * kron(transpose(Vᵏ), Uᵏ')
        
        # Step 1: solve the weighted least squares problem through its KKT block system
        A = PseudoBlockArray{dType}(undef, [size(W̃ᵏ,1), size(Φ, 1)], [size(W̃ᵏ,1), size(Φ, 1)])
            A[Block(1,1)] = 2W̃ᵏ
            A[Block(2,1)] = Φ
            A[Block(1,2)] = Φ'
            A[Block(2,2)] .= 0
        b = PseudoBlockArray{dType}(undef, [size(W̃ᵏ,1), size(Φ, 1)])
            b[Block(1)] .= 0
            b[Block(2)] = vec(y)
        xz = PseudoBlockArray{dType}(undef, [size(W̃ᵏ,1), size(Φ, 1)])
            xz[Block(1)] = vec(Xᵏ)
            xz[Block(2)] .= 0
        #cg!(xz, A, b, tol=1e-14, maxiter = maxIter)
        xz .= Array(A) \ Array(b) # it is faster and more accurate than conjugate gradient
        Xᵏ = reshape(xz[Block(1)], d₁, d₂)
        
        k += 1
    end
    
    # Print some info
    verbose && printRow(table, last = true)
    
    return Xᵏ
end

vanilla_MatrixIRLS_for_PCA (generic function with 1 method)

# Numerical Experiments

In [37]:
Random.seed!(0);

## 1. Easy Problem in Real Domain

### General parameters

In [38]:
d₁, d₂ = 50, 50    # Matrix dimensions
r = 7              # Desired rank
dType = Float64    # Type of matrix elements
ρ = 1.5;           # 1 -> sampling at theoretical minimum

### Generate Model

#### Sampling Mask ($\Phi$)

_**Requirement on the sampling mask:** It must have at least $r$ non-zero entries in each row and each column._
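
This requirement is easy to check programmatically. The function name `mask_is_admissible` is hypothetical and not part of `helper_functions.ipynb`; it is only a sketch of the check.

```julia
# Hypothetical helper: true if every row and every column of the mask
# contains at least r non-zero entries.
function mask_is_admissible(mask::AbstractMatrix, r::Integer)
    col_counts = vec(sum(!iszero, mask, dims = 1))   # non-zeros per column
    row_counts = vec(sum(!iszero, mask, dims = 2))   # non-zeros per row
    return minimum(col_counts) >= r && minimum(row_counts) >= r
end

# Every row and column of this 3×3 mask has exactly two non-zero entries
mask = [1 1 0; 0 1 1; 1 0 1]
@show mask_is_admissible(mask, 2)
@show mask_is_admissible(mask, 3)
```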

In [39]:
df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(ρ * df, d₁ * d₂))
Φᴹ = generateΦ(d₁, d₂, r, m)
Φ = HadamardProd_to_MatrixMult(Φᴹ)
@show r
println("minimum number of non-zero entries in each column: ", Int(minimum(sum(Φᴹ, dims=1))))
println("minimum number of non-zero entries in each row: ", Int(minimum(sum(Φᴹ, dims=2))))

r = 7
minimum number of non-zero entries in each column: 14
minimum number of non-zero entries in each row: 12
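
The helper `HadamardProd_to_MatrixMult` lives in `helper_functions.ipynb` and is not shown here. The sketch below is an assumption about what such a conversion could look like, not the helper's actual code: it turns a binary mask into a row-selection matrix, so that `Φ * vec(X)` returns the observed entries and `Φ' * y` zero-fills the missing ones. The name `mask_to_selection_matrix` is hypothetical.

```julia
using LinearAlgebra

# Hypothetical stand-in for the notebook's helper: keep one row of the
# identity matrix for every observed (non-zero) mask entry.
function mask_to_selection_matrix(mask::AbstractMatrix)
    idx = findall(!iszero, vec(mask))      # column-major positions of observed entries
    n = length(mask)
    return Matrix{Float64}(I, n, n)[idx, :]
end

mask = [1 0; 0 1; 1 1]
X = [10.0 40.0; 20.0 50.0; 30.0 60.0]

Φ = mask_to_selection_matrix(mask)
y = Φ * vec(X)                             # observed entries only
X_zero_filled = reshape(Φ' * y, size(X))   # same as mask .* X
@show y
@show X_zero_filled == mask .* X
```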


### Generate Data

Create a random rank-$r$ matrix $L_0 \in \mathbb{C}^{d_1 \times d_2}$ such that $L_0 = U_0 V_0^*$, where $U_0 \in \mathbb{C}^{d_1 \times r}$ and $V_0 \in \mathbb{C}^{d_2 \times r}$, and then sub-sample this low-rank matrix.
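
A minimal sketch of such a generator is shown below. It assumes i.i.d. standard normal factors; the notebook's own `generateLowRankComponent_Christian` may scale or normalize differently. The name `random_low_rank` is hypothetical.

```julia
using LinearAlgebra, Random

# Hypothetical generator: product of two Gaussian factors, which has rank r
# with probability one when r <= min(d₁, d₂).
function random_low_rank(d₁::Int, d₂::Int, r::Int, T::Type = Float64;
                         rng = Random.default_rng())
    U₀ = randn(rng, T, d₁, r)
    V₀ = randn(rng, T, d₂, r)
    return U₀ * V₀'            # L₀ = U₀ V₀*
end

L = random_low_rank(50, 50, 7, ComplexF64)
@show size(L)
@show rank(L)
```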

In [40]:
L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
@show size(L₀)
@show rank(L₀)

y = Φ * vec(L₀)
@show size(y);

size(L₀) = (50, 50)
rank(L₀) = 7
size(y) = (976,)


### Running The Reconstruction

In [41]:
@time vanilla_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      46.502 │   30.111 │   16.838 │      Inf │
│   1 │       50 │      38.650 │   41.684 │   11.616 │   16.838 │
│   2 │       50 │      32.409 │   54.038 │    8.091 │   11.616 │
│   3 │       50 │      25.793 │   61.424 │    5.352 │    8.091 │
│   4 │       50 │      19.867 │   64.471 │    3.085 │    5.352 │
│   5 │       50 │      14.475 │   66.417 │    2.079 │    3.085 │
│   6 │       50 │       9.959 │   67.864 │    1.386 │    2.079 │
│   7 │       50 │       6.408 │   68.811 │    0.772 │    1.386 │
│   8 │       50 │       3.502 │   69.354 │    0.350 │    0.772 │
│   9 │       49 │       1.377 │   69.619 │    0.126 │    0.350 │
│  10 │       43 │       0.278 │   69.704 │    0.023 │    0.126 │
│  11 │        8 │       0.013 │   69.716 │ 1.01e-03 │    0.023 │
│  12 │   

## 2. Easy Problem in Complex Domain

In [42]:
d₁, d₂ = 50, 50    # Matrix dimensions
r = 7              # Desired rank
dType = ComplexF64 # Type of matrix elements
ρ = 1.5            # 1 -> sampling at theoretical minimum

df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(ρ * df, d₁ * d₂))
Φ = HadamardProd_to_MatrixMult(generateΦ(d₁, d₂, r, m))

L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
y = Φ * vec(L₀);

### Running The Reconstruction

In [43]:
@time vanilla_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      46.188 │   32.810 │   18.019 │      Inf │
│   1 │       50 │      37.158 │   47.708 │   12.549 │   18.019 │
│   2 │       50 │      30.921 │   60.684 │    7.951 │   12.549 │
│   3 │       50 │      23.812 │   66.908 │    4.668 │    7.951 │
│   4 │       50 │      16.888 │   69.502 │    2.655 │    4.668 │
│   5 │       50 │      10.643 │   70.679 │    1.353 │    2.655 │
│   6 │       50 │       5.385 │   71.452 │    0.587 │    1.353 │
│   7 │       50 │       1.832 │   72.024 │    0.195 │    0.587 │
│   8 │       48 │       0.296 │   72.291 │    0.033 │    0.195 │
│   9 │        8 │       0.010 │   72.341 │ 1.18e-03 │    0.033 │
│  10 │        7 │    1.42e-05 │   72.342 │ 1.61e-06 │ 1.18e-03 │
│  11 │        7 │    3.01e-11 │   72.342 │ 3.19e-12 │ 1.61e-06 │
│  12 │   

## 3. Difficult Problem in Real Domain

In [44]:
d₁, d₂ = 50, 50    # Matrix dimensions
r = 7              # Desired rank
dType = Float64    # Type of matrix elements
ρ = 1.05           # 1 -> sampling at theoretical minimum

df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(ρ * df, d₁ * d₂))
Φ = HadamardProd_to_MatrixMult(generateΦ(d₁, d₂, r, m))

L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
y = Φ * vec(L₀);

### Running The Reconstruction

In [45]:
@time vanilla_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      51.489 │   25.825 │   15.854 │      Inf │
│   1 │       50 │      47.972 │   34.523 │   12.902 │   15.854 │
│   2 │       50 │      44.630 │   45.709 │   10.284 │   12.902 │
│   3 │       50 │      42.897 │   56.132 │    7.809 │   10.284 │
│   4 │       50 │      41.301 │   62.999 │    5.729 │    7.809 │
│   5 │       50 │      39.890 │   66.197 │    4.732 │    5.729 │
│   6 │       50 │      39.043 │   67.341 │    4.172 │    4.732 │
│   7 │       50 │      38.358 │   67.938 │    3.663 │    4.172 │
│   8 │       50 │      37.685 │   68.349 │    3.123 │    3.663 │
│   9 │       50 │      36.998 │   68.545 │    2.688 │    3.123 │
│  10 │       50 │      36.314 │   68.482 │    2.503 │    2.688 │
│  11 │       50 │      35.653 │   68.254 │    2.306 │    2.503 │
│  12 │   

## 4. Difficult Problem in Complex Domain

In [46]:
d₁, d₂ = 50, 50    # Matrix dimensions
r = 7              # Desired rank
dType = ComplexF64 # Type of matrix elements
ρ = 1.05           # 1 -> sampling at theoretical minimum

df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(ρ * df, d₁ * d₂))
Φ = HadamardProd_to_MatrixMult(generateΦ(d₁, d₂, r, m))

L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
y = Φ * vec(L₀);

### Running The Reconstruction

In [47]:
@time vanilla_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      55.976 │   23.837 │   13.988 │      Inf │
│   1 │       50 │      48.466 │   34.466 │   11.432 │   13.988 │
│   2 │       50 │      40.172 │   47.620 │    8.923 │   11.432 │
│   3 │       50 │      37.308 │   57.347 │    6.451 │    8.923 │
│   4 │       50 │      36.682 │   62.492 │    4.793 │    6.451 │
│   5 │       50 │      35.665 │   65.079 │    3.807 │    4.793 │
│   6 │       50 │      34.095 │   66.537 │    3.018 │    3.807 │
│   7 │       50 │      32.217 │   67.601 │    2.362 │    3.018 │
│   8 │       50 │      30.352 │   68.511 │    1.900 │    2.362 │
│   9 │       50 │      28.761 │   69.299 │    1.611 │    1.900 │
│  10 │       50 │      27.551 │   70.010 │    1.377 │    1.611 │
│  11 │       50 │      26.637 │   70.692 │    1.167 │    1.377 │
│  12 │   