In [1]:
using LinearAlgebra, Random, NBInclude, IterativeSolvers, FunctionOperators, Printf, Plots,
    LaTeXStrings, BlockArrays, Crayons

In [2]:
@nbinclude("helper_functions.ipynb")

In [3]:
Random.seed!(123);

### Vanilla MatrixIRLS for matrix completion (PCA) with p = 0

https://mediatum.ub.tum.de/doc/1521436/1521436.pdf

_**Note:** "Vanilla" means that the conjugate gradient step (2.90) is solved directly, without reducing the problem to a lower-dimensional projection space via the Sherman-Morrison-Woodbury formula._

#### Solution for the minimization problem in the conjugate gradient step

Source of description below: [Linearly Constrained Least Squares (LLS)](https://lls.readthedocs.io/en/latest/math.html)

LLS solves **linearly constrained least squares** (or LCLS) problems, which have the form:
$\text{minimize } \Vert Ax - b \Vert_2^2 \text{ subject to } Cx = d,$
where the unknown variable $x$ is a vector of size $n$. The values for $A$, $b$, $C$, and $d$ are given and have sizes $m\times n$, $m$, $p\times n$, and $p$, respectively. LLS finds a value for $x$ that satisfies the linear equality constraints $Cx = d$ and minimizes the objective, the sum of the squares of the entries of $Ax - b$.

There is a unique solution to the LCLS problem if and only if there is a unique solution to the following system of linear equations in the variable $x$ and a new variable $z$:
$$\begin{bmatrix} 2A^TA & C^T \\ C & 0 \end{bmatrix}
  \begin{bmatrix} x \\ z \end{bmatrix} =
  \begin{bmatrix} 2A^Tb \\ d \end{bmatrix};$$
i.e., the matrix on the left is invertible. This occurs when the matrix $C$ has independent rows, and the matrix $\begin{bmatrix} A\\ C\end{bmatrix}$ has independent columns.
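
As a minimal sketch of the block system above, the following solves a tiny LCLS problem with a dense direct solve. The function name `solve_lcls` and the example data are assumptions made for illustration; they are not part of LLS or of this notebook.

```julia
using LinearAlgebra

# Illustrative helper (hypothetical name): solve
#   minimize ‖Ax - b‖₂²  subject to  Cx = d
# by assembling and solving the block (KKT) system directly.
function solve_lcls(A, b, C, d)
    n, p = size(A, 2), size(C, 1)
    G = 2 * (A' * A)                 # upper-left block 2AᵀA
    K = [G C'; C zeros(p, p)]        # block matrix of the linear system
    rhs = [2 * (A' * b); d]          # right-hand side [2Aᵀb; d]
    xz = K \ rhs                     # direct solve; needs K to be invertible
    return xz[1:n], xz[n+1:end]      # minimizer x and multiplier z
end

# Toy data: C has independent rows and [A; C] has independent columns
A = [1.0 0.0; 0.0 1.0; 1.0 1.0]
b = [1.0, 2.0, 3.0]
C = [1.0 1.0]
d = [1.0]

x, z = solve_lcls(A, b, C, d)
@show x
@show z
@show C * x ≈ d
```

For this toy problem the constraint $x_1 + x_2 = 1$ reduces the objective to $2x_1^2 + 6$, so the minimizer is $x = (0, 1)$, which the solve should reproduce up to rounding.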

In our case, $A = W^{1/2}$, $b = 0$, $C = \Phi$, and $d = y$; therefore,
$$\min_x \Vert W^{1/2} x \Vert_2^2 \text{ s.t. } \Phi x = y$$ can be solved as $$\begin{bmatrix} 2 W & \Phi^*\\ \Phi & 0 \end{bmatrix} \begin{bmatrix} x \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}.$$ This system has a unique solution when the matrix $\Phi$ has independent rows (this is always true for sampling matrices), and the matrix $\begin{bmatrix} W^{1/2} \\ \Phi \end{bmatrix}$ has independent columns (not true in general).
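
As a sketch of where this block system comes from, assume real-valued data and a symmetric positive semidefinite $W$, so that $\Vert W^{1/2} x \Vert_2^2 = x^T W x$. The Lagrangian $\mathcal{L}$ and the multiplier $z$ are notation introduced here for illustration. Setting both gradients to zero gives the two block rows:

$$\begin{aligned}
\mathcal{L}(x, z) &= x^T W x + z^T (\Phi x - y), \\
\nabla_x \mathcal{L} &= 2 W x + \Phi^T z = 0, \\
\nabla_z \mathcal{L} &= \Phi x - y = 0.
\end{aligned}$$

In the real case $\Phi^* = \Phi^T$, so these are exactly the first and second block rows of the system with right-hand side $\begin{bmatrix} 0 \\ y \end{bmatrix}$.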

In [4]:
function vanilla_MatrixIRLS_for_PCA(
        Xᴳᵀ::AbstractArray,                     # ground truth for MSE evaluation
        y::AbstractArray,                       # under-sampled data
        Φ::AbstractArray;                       # sampling matrix
        img_size::NTuple = size(Xᴳᵀ),           # size of output matrix
        r̃::Int = 0,                             # rank estimate of solution
        maxIter::Union{Int, Nothing} = nothing, # max. number of CG steps per outer iteration
        N::Int = 10,                            # number of outer (IRLS) iterations
        verbose::Bool = false)                  # print rank and loss value in each iteration
    
    # Initialize variables
    dType = eltype(y)
    d₁, d₂ = img_size
    r̃ == 0 && (r̃ = rank(Xᴳᵀ))
    maxIter = maxIter isa Nothing ? d₁*d₂+size(y,1) : maxIter #r̃*(r̃+d₁+d₂)
    ϵᵏ = Inf
    Xᵏ = reshape(Φ' * y, d₁, d₂)
    σ, iter = 0, 2 # I just want to make them available outside of the loop
    
    table = DebugTableModule.DebugTable(
        ("k", nothing, 3), ("rank(Xᵏ)", () -> rank(Xᵏ), 3),
        ("‖Xᴳᵀ - Xᵏ‖₂", () -> opnorm(Xᴳᵀ - Xᵏ, 2), 3), ("σ₁", () -> σ[1]),
        ("σᵣ₊₁", () -> σ[r̃+1]), ("ϵᵏ", () -> ϵᵏ))
    
    for k in 1:N
"""
    2. Find best rank-(r̃ + 1) approximation of Xᵏ to obtain
        𝒯ᵣ(Xᵏ) = Uᵏ * diag(σᵢᵏ)ᵢ₌₁ʳ * Vᵏ' and σᵣ₊₁ᵏ 
"""
        F = svd(Xᵏ)
        Uᵏ, σ, Vᵏ = F.U[:, 1:r̃+1], F.S, F.V[:, 1:r̃+1]
        
"""     update smoothing:                                 (2.91) """
        ϵᵏ = min(ϵᵏ, σ[r̃+1])
        
"""
    3. Update Wᵏ as in (2.57), using parameters ϵ = ϵᵏ and p in (2.58) and (2.59), and the
        information Uᵏ , Vᵏ and σ₁ᵏ, ..., σᵣ₊₁ᵏ from item 2.

        (Lines below are based on Remark 2.3.2, the special case for p = 0)
"""
        Hᵏ = [1 / (max(σ[i], ϵᵏ) * max(σ[j], ϵᵏ))  for i in 1:r̃+1, j in 1:r̃+1]
        # Wᵏ = FunctionOperator{dType}(name = "Wᵏ", inDims = (d₁, d₂), outDims = (d₁, d₂),
        #    forw = Z -> Uᵏ * (Hᵏ .* (Uᵏ' * Z * Vᵏ)) * Vᵏ')
        Dᵏ = Diagonal(vec(Hᵏ))
        Wᵏ = kron(Vᵏ, Uᵏ) * Dᵏ * kron(Vᵏ, Uᵏ)'
        
"""
    1. Use a conjugate gradient method to solve linearly constrained quadratic program
         Xᵏ = arg minₓ ⟨X,Wᵏ⁻¹(X)⟩ s.t. Φ(X) = y         (2.90)
"""
        if k == 1
            checkingMatrix = PseudoBlockArray{dType}(undef,
                [size(Wᵏ,1), size(Φ, 1)], [size(Wᵏ,1)])
            checkingMatrix[Block(1,1)] = Wᵏ
            checkingMatrix[Block(2,1)] = Φ
            if rank(Array(checkingMatrix)) != size(Wᵏ,2) # We require full column-rank
                println(crayon"#ff0000",
                    "Warning! The [Wᵏ; Φ] matrix doesn't have full column-rank " * 
                    "(rank = $(rank(Array(checkingMatrix))) instead of $(size(Wᵏ,2)))!",
                    crayon"#000000")
            end
        end
        
        # Print some info
        verbose && DebugTableModule.printRow(table, "k" => k-1)
        
        A = PseudoBlockArray{dType}(undef, [size(Wᵏ,1), size(Φ, 1)], [size(Wᵏ,1), size(Φ, 1)])
            A[Block(1,1)] = 2Wᵏ
            A[Block(2,1)] = Φ
            A[Block(1,2)] = Φ'
            A[Block(2,2)] .= 0
        b = PseudoBlockArray{dType}(undef, [size(Wᵏ,1), size(Φ, 1)])
            b[Block(1)] .= 0
            b[Block(2)] = vec(y)
        xz = PseudoBlockArray{dType}(undef, [size(Wᵏ,1), size(Φ, 1)])
            xz[Block(1)] = vec(Xᵏ)
            xz[Block(2)] .= 0
        cg!(xz, A, b, maxiter = maxIter)
        Xᵏ = reshape(xz[Block(1)], d₁, d₂)
        #println(crayon"#00aa00", "\tmax(z) = ",
        #    @sprintf("%.5E", maximum(abs, xz[Block(2)])), crayon"#000000")
    end
    
    # Print some info
    iter = N
    verbose && DebugTableModule.printRow(table, "k" => N, last = true)
    
    Xᵏ
end

vanilla_MatrixIRLS_for_PCA (generic function with 1 method)
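
The function above stores Wᵏ as a dense matrix built from Kronecker products, while the commented-out `FunctionOperator` line describes the same weighting in operator form, `Z -> U * (H .* (U' * Z * V)) * V'`. The sketch below checks how the two forms relate through the identity $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$. All sizes and random matrices are assumptions chosen for illustration. For complex factors the identity puts a conjugate on `V`; for real factors `kron(V, U)` and `kron(conj(V), U)` coincide.

```julia
using LinearAlgebra, Random

Random.seed!(0)
d₁, d₂, s = 6, 5, 3                  # illustrative sizes (assumed)
U = randn(ComplexF64, d₁, s)
V = randn(ComplexF64, d₂, s)
H = rand(s, s)                       # entrywise weights, same role as Hᵏ
Z = randn(ComplexF64, d₁, d₂)

# Operator form of the weighting applied to the matrix Z
op = U * (H .* (U' * Z * V)) * V'

# Dense form acting on vec(Z); note the conjugate on V
K = kron(conj(V), U)
W_dense = K * Diagonal(vec(H)) * K'

# Same construction without the conjugate, for comparison
K_plain = kron(V, U)
W_plain = K_plain * Diagonal(vec(H)) * K_plain'

matches_conj  = vec(op) ≈ W_dense * vec(Z)
matches_plain = vec(op) ≈ W_plain * vec(Z)
@show matches_conj
@show matches_plain
```

With complex random data, only the conjugated construction should agree with the operator form, which is worth keeping in mind when `dType` is complex.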

# Numerical Experiments

### General parameters

In [5]:
# Matrix dimensions
d₁, d₂ = 20, 20
n = min(d₁, d₂)
# Rank and number of non-zero elements in sparse component
r, k = 5, 0
# Type of matrix elements
dType = ComplexF64;

## Understanding and Enhancing Data Recovery Algorithms

*From Noise-Blind Sparse Recovery to Reweighted Methods for Low-Rank Matrix Optimization*

*by Christian Kümmerle*

https://mediatum.ub.tum.de/doc/1521436/1521436.pdf

### Generate Data

#### Gaussian Low Rank Matrix
Corresponding Matlab function: https://github.com/ckuemmerle/hm_irls/blob/master/sample_X0_lowrank.m

In [6]:
L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
@show size(L₀)
@show rank(L₀);

size(L₀) = (20, 20)
rank(L₀) = 5


#### Sampling Mask ($\Phi$)
Corresponding Matlab function: https://github.com/ckuemmerle/hm_irls/blob/master/sample_phi_MatrixCompletion.m

_**Note:** Christian's Matlab function and my Julia function differ in how they satisfy the requirement of having at least $r$ non-zero entries in each row and each column._

In [7]:
df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(1.05 * df, d₁ * d₂))
Φᴹ = generateΦ(d₁, d₂, r, m)
Φ = maskToMatrix(Φᴹ)
#Φ = FunctionOperator{dType}(name = "Φ", inDims = (d₁, d₂), outDims = (d₁, d₂),
#    forw = (b,x) -> b .= Φᴹ .* x, backw = (b,x) -> b .= x)
println("minimum number of non-zero entries in each column: ", Int(minimum(sum(Φᴹ, dims=1))))
println("minimum number of non-zero entries in each row: ", Int(minimum(sum(Φᴹ, dims=2))))

minimum number of non-zero entries in each column: 7
minimum number of non-zero entries in each row: 5


#### Subsampling The Ground Truth Matrix

In [8]:
y = Φ * vec(L₀)
@show size(y)
#@show rank(y);

size(y) = (183,)
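
The measurement count reported above can be reproduced by hand from the degrees-of-freedom formula. The sketch below also computes a simple upper bound on the rank of the stacked matrix [Wᵏ; Φ]. It assumes that Wᵏ is the dense matrix built from the rank-(r̃ + 1) factors with r̃ = r, and that Φ has m rows, as `size(y)` suggests. The variable name `rank_bound` is introduced here for illustration.

```julia
d₁, d₂, r = 20, 20, 5

# Degrees of freedom of a rank-r matrix of size d₁ × d₂
df = r * (d₁ + d₂ - r)                    # 175

# Number of samples: 5% oversampling, capped at the number of entries
m = floor(Int, min(1.05 * df, d₁ * d₂))   # 183

# rank(Wᵏ) ≤ (r + 1)² because kron(Vᵏ, Uᵏ) has (r + 1)² columns, and
# rank(Φ) ≤ m, so the stacked matrix has rank at most their sum
rank_bound = (r + 1)^2 + m                # 219

println((df, m, rank_bound, d₁ * d₂))
```

Under these assumptions the bound of 219 is well below the 400 columns, so the full column-rank condition for a unique solution of the block system cannot hold in this setting.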


### Running The Reconstruction

In [9]:
@time vanilla_MatrixIRLS_for_PCA(L₀, y, Φ, N = 100, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵᵏ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       20 │      43.079 │   33.641 │   13.223 │   13.223 │
│   1 │       20 │      80.429 │   69.245 │   38.241 │   13.223 │
│   2 │       20 │      66.892 │   60.785 │   28.204 │   13.223 │
│   3 │       20 │      57.237 │   52.834 │   25.777 │   13.223 │
│   4 │       20 │      66.310 │   52.125 │   26.378 │   13.223 │
│   5 │       20 │      58.920 │   49.377 │   23.177 │   13.223 │
│   6 │       20 │      58.559 │   48.025 │   21.423 │   13.223 │
│   7 │       20 │      60.558 │   48.611 │   24.072 │   13.223 │
│   8 │       20 │      57.530 │   44.743 │   20.550 │   13.223 │
│   9 │       20 │      55.568 │   47.369 │   20.904 │   13.223 │
│  10 │       20 │      55.161 │   46.687 │   20.407 │   13.223 │
│  11 │       20 │      52.541 │   49.443 │   20.209 │   13.223 │
│  12 │   

# Robust Principal Component Analysis?
*by Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright*  
https://arxiv.org/pdf/0912.3599.pdf

#### 4.1 Exact recovery from varying fractions of error

We first verify the correct recovery phenomenon of Theorem 1.1 on randomly generated problems. We consider square matrices of varying dimension $n = 500, \ldots, 3000$. We generate a rank-$r$ matrix $L_0$ as a product $L_0 = XY^*$ where $X$ and $Y$ are $n \times r$ matrices with entries independently sampled from a $\mathcal{N}(0,1/n)$ distribution. $S_0$ is generated by choosing a support set $\Omega$ of size $k$ uniformly at random, and setting $S_0 = \mathcal{P}_\Omega E$, where $E$ is a matrix with independent Bernoulli $\pm 1$ entries. Table 1 (top) reports the results with $r = \operatorname{rank}(L_0) = 0.05 \times n$ and $k = \Vert S_0 \Vert_0 = 0.05 \times n^2$. Table 1 (bottom) reports the results for a more challenging scenario, $\operatorname{rank}(L_0) = 0.05 \times n$ and $k = 0.10 \times n^2$. In all cases, we set $\lambda = 1/\sqrt{n}$. Notice that in all cases, solving the convex PCP gives a result $(L, S)$ with the correct rank and sparsity. Moreover, the relative error $\frac{\Vert L - L_0 \Vert_F}{\Vert L_0 \Vert_F}$ is small, less than $10^{-5}$ in all examples considered.
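
The data model described in this paragraph can be sketched as follows. The function names `sample_lowrank` and `sample_sparse` are hypothetical stand-ins, not the notebook's `generateLowRankComponent_Candes` helper, and the sketch uses real-valued entries and a small $n$ for simplicity.

```julia
using LinearAlgebra, Random

# Hypothetical helper: L₀ = X * Y' with i.i.d. N(0, 1/n) entries in X and Y
function sample_lowrank(n::Int, r::Int)
    X = randn(n, r) ./ sqrt(n)       # scaling by 1/√n gives variance 1/n
    Y = randn(n, r) ./ sqrt(n)
    return X * Y'
end

# Hypothetical helper: S₀ with k entries equal to ±1 on a uniformly random support
function sample_sparse(n::Int, k::Int)
    S = zeros(n, n)
    Ω = randperm(n^2)[1:k]           # support set Ω of size k (linear indices)
    S[Ω] .= rand([-1.0, 1.0], k)     # independent ±1 signs
    return S
end

Random.seed!(123)
n, r, k = 20, 5, 20                  # small illustrative sizes (assumed)
L₀ = sample_lowrank(n, r)
S₀ = sample_sparse(n, k)

@show size(L₀)
@show rank(L₀)
@show count(!iszero, S₀)
```

Only the low-rank part is relevant for the matrix completion experiments in this notebook; the sparse part is included to mirror the quoted setup.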

<center><img src="table_1.png" /></center>

### Generate Data

_**Note:** In this notebook we deal only with PCA (simple matrix completion); therefore, there is no sparse component in the ground truth matrix._

In [13]:
L₀ = generateLowRankComponent_Candes(n, r, dType)
@show size(L₀)
@show rank(L₀);

size(L₀) = (20, 20)
rank(L₀) = 5


#### Sampling Mask ($\Phi$)

We reuse the sampling mask generated earlier.

#### Subsampling The Ground Truth Matrix

In [16]:
y = Φ * vec(L₀)
@show size(y);

size(y) = (183,)


### Running The Reconstruction

In [17]:
@time vanilla_MatrixIRLS_for_PCA(L₀, y, Φ, N = 80, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵᵏ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       20 │       0.077 │    0.071 │    0.035 │    0.035 │
│   1 │       20 │       0.119 │    0.109 │    0.052 │    0.035 │
│   2 │       20 │       0.095 │    0.103 │    0.050 │    0.035 │
│   3 │       20 │       0.100 │    0.095 │    0.050 │    0.035 │
│   4 │       20 │       0.101 │    0.092 │    0.048 │    0.035 │
│   5 │       20 │       0.102 │    0.094 │    0.047 │    0.035 │
│   6 │       20 │       0.093 │    0.098 │    0.046 │    0.035 │
│   7 │       20 │       0.103 │    0.097 │    0.051 │    0.035 │
│   8 │       20 │       0.099 │    0.095 │    0.047 │    0.035 │
│   9 │       20 │       0.100 │    0.101 │    0.047 │    0.035 │
│  10 │       20 │       0.100 │    0.102 │    0.048 │    0.035 │
│  11 │       20 │       0.100 │    0.100 │    0.049 │    0.035 │
│  12 │   