In [1]:
using LinearAlgebra, Random, NBInclude, IterativeSolvers, FunctionOperators, BenchmarkTools

In [2]:
include("inplace_svd.jl")
@nbinclude("helper_functions.ipynb")
using .DebugTableModule # located in helper_functions.ipynb

## Performant MatrixIRLS for matrix completion

_**Note:** "Performant" means that this version uses the same optimizations as the fancy version and additionally optimizes the implementation with in-place operations._

### Sources
 - Preprint paper by Christian Kümmerle & Claudio Verdun: https://arxiv.org/pdf/0912.3599.pdf
 - GitHub repo of the preprint paper: https://github.com/ckuemmerle/MatrixIRLS

### Algorithm
 - **Input:** Sampling operator $\Phi$, observations $\mathbf{y} \in \mathbb{C}^m$, rank estimate $\tilde{r}$, iteration number $N$.
 - Initialize $k=0, \epsilon_0 = \infty, W^{(0)} = Id.$
 - **for $k=1$ to $N$ do**
    1. **Solve weighted least squares:** Use a *conjugate gradient method* to solve $$\mathbf{X}^{(k)} = \underset{\mathbf{X}}{\arg\min} \langle \mathbf{X}, W^{(k-1)}(\mathbf{X}) \rangle \text{ subject to } \Phi(\mathbf{X}) = \mathbf{y}.$$
    2. **Update smoothing:** Compute the $(\tilde{r}+1)$-th singular value of $\mathbf{X}^{(k)}$ to update $$\epsilon_k = \min\left(\epsilon_{k-1}, \sigma_{\tilde{r}+1}(\mathbf{X}^{(k)})\right).$$
    3. **Update weight operator:** For $r_k := \left\vert\{i \in [d] : \sigma_i(\mathbf{X}^{(k)}) > \epsilon_k\}\right\vert$, compute the first $r_k$ singular values $\sigma_i^{(k)} := \sigma_i(\mathbf{X}^{(k)})$ and matrices $\mathbf{U}^{(k)} \in \mathbb{C}^{d_1 \times r_k}$ and $\mathbf{V}^{(k)} \in \mathbb{C}^{d_2 \times r_k}$ with leading $r_k$ left/right singular vectors of $\mathbf{X}^{(k)}$ to update $W^{(k)}$: $$W^{(k)}(\mathbf{Z}) = \mathbf{U}^{(k)} \left[ \mathbf{H}_k \circ (\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})\right]\mathbf{V}^{(k)*},$$ where $\circ$ denotes the entrywise product of two matrices, and $\mathbf{H}_k \in \mathbb{C}^{d_1 \times d_2}$ is the matrix defined as $$(\mathbf{H}_k)_{ij} := \left(\max(\sigma_i^{(k)}, \epsilon_k)\max(\sigma_j^{(k)}, \epsilon_k)\right)^{-1} : \forall i \in [d_1] \text{ and } \forall j \in [d_2].$$
 - **end**
 - **Output**: $\mathbf{X}^{(k)}$
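
Steps 2 and 3 can be illustrated with a small dense sketch. This is only a readable reference under stated assumptions, not the optimized implementation developed below: it uses a full SVD, pads the singular values with zeros up to $\max(d_1, d_2)$, and the names `update_ϵ` and `dense_weight_operator` are hypothetical.

```julia
using LinearAlgebra

# Step 2: ϵₖ = min(ϵₖ₋₁, σ_{r̃+1}(Xᵏ)); σ holds the singular values in descending order
update_ϵ(ϵ_prev, σ, r̃) = min(ϵ_prev, σ[r̃+1])

# Step 3 (dense, for illustration only): build H from a full SVD and return Z -> W(Z)
function dense_weight_operator(X, ϵ)
    d₁, d₂ = size(X)
    F = svd(X, full = true)                    # F.U is d₁×d₁, F.V is d₂×d₂
    σ = zeros(real(eltype(X)), max(d₁, d₂))    # singular values padded with zeros
    σ[1:length(F.S)] .= F.S
    s = max.(σ, ϵ)
    H = [1 / (s[i] * s[j]) for i in 1:d₁, j in 1:d₂]
    Z -> F.U * (H .* (F.U' * Z * F.V)) * F.V'
end

X = randn(6, 5)
ϵ = update_ϵ(Inf, svdvals(X), 2)   # rank estimate r̃ = 2
W = dense_weight_operator(X, ϵ)
```

Because $\mathbf{H}_k$ is real and entrywise positive, the resulting operator is self-adjoint and positive definite with respect to the Frobenius inner product, which is what makes a conjugate gradient solver applicable in step 1.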

### Technical details

#### Optimization of the computation of the weighted least squares
In order to reduce the computational complexity, the weighted least squares step in the algorithm above is computed in a lower dimensional space.
 - For a given rank $r_k$, we recall that the best rank-$r_k$ approximation of a matrix $\mathbf{X}^{(k)}$ can be written as $$\mathcal{T}_{r_k}(\mathbf{X}^{(k)}) := \underset{\mathbf{Z}:rank(\mathbf{Z}) \leq r_k}{\arg\min} \Vert\mathbf{Z} - \mathbf{X}^{(k)} \Vert = \mathbf{U}^{(k)} \boldsymbol{\Sigma}^{(k)} \mathbf{V}^{(k)*},$$ where $\Vert \cdot \Vert$ can be any unitarily invariant norm.
 - Let now $$T_k := T_{\mathcal{T}_{r_k}(\mathbf{X}^{(k)})}\mathcal{M}_{r_k} :=  \left\{\mathbf{U}^{(k)} \Gamma_1 \mathbf{V}^{(k)*} + \mathbf{U}^{(k)} \Gamma_2 (\mathbf{I} - \mathbf{V}^{(k)} \mathbf{V}^{(k)*}) + (\mathbf{I} - \mathbf{U}^{(k)} \mathbf{U}^{(k)*}) \Gamma_3 \mathbf{V}^{(k)*} : \\ \Gamma_1 \in \mathbb{C}^{r_k \times r_k}, \Gamma_2 \in \mathbb{C}^{r_k \times d_2}, \Gamma_2 \mathbf{V}^{(k)} = \mathbf{0}, \Gamma_3 \in \mathbb{C}^{d_1 \times r_k}, \mathbf{U}^{(k)*}\Gamma_3 = \mathbf{0}\right\}$$ be the tangent space of the manifold of rank-$r_k$ matrices $\mathcal{M}_{r_k}$ of dimension $(d_1 \times d_2)$ at $\mathcal{T}_{r_k}(\mathbf{X}^{(k)})$.
 - For practical purposes, we introduce the vector space $$S_k := \left\{ \gamma = (\gamma_1^T, \gamma_2^T, \gamma_3^T)^T \in \mathbb{C}^{r_k(d_1 + d_2 + r_k)} : \Gamma_1 = (\gamma_1)_{mat} \in \mathbb{C}^{r_k \times r_k}, \Gamma_2 = (\gamma_2)_{mat} \in \mathbb{C}^{r_k \times d_2}, \Gamma_2 \mathbf{V}^{(k)} = \mathbf{0}, \Gamma_3 = (\gamma_3)_{mat} \in \mathbb{C}^{d_1 \times r_k}, \mathbf{U}^{(k)*}\Gamma_3 = \mathbf{0}\right\} \subset \mathbb{C}^{r_k(d_1 + d_2 + r_k)},$$ where $mat$ is the matricization operator of appropriate dimension that stacks column after column according to the desired dimensions.
 - We can now identify a structure in $W^{(k)}$ that enables us to write it more compactly: Let $P_{T_k}: S_k \rightarrow T_k$ be the parametrization operator such that $$P_{T_k}(\gamma) := \mathbf{U}^{(k)} \Gamma_1 \mathbf{V}^{(k)*} + \mathbf{U}^{(k)} \Gamma_2 (\mathbf{I} - \mathbf{V}^{(k)} \mathbf{V}^{(k)*}) + (\mathbf{I} - \mathbf{U}^{(k)} \mathbf{U}^{(k)*}) \Gamma_3 \mathbf{V}^{(k)*} : \gamma \in S_k$$.
 - As we know that $\Gamma_2 \mathbf{V}^{(k)} = \mathbf{0}$, and $\mathbf{U}^{(k)*}\Gamma_3 = \mathbf{0}$, we can simplify the parametrization operator: $$P_{T_k}(\gamma) = \mathbf{U}^{(k)} \Gamma_1 \mathbf{V}^{(k)*} + \mathbf{U}^{(k)} \Gamma_2 + \Gamma_3 \mathbf{V}^{(k)*}$$
 - Its adjoint operator is $$P_{T_k}^*(\mathbf{Z}) = \left((\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)})_{vec}^T, (\mathbf{U}^{(k)*} \mathbf{Z} (\mathbf{I} - \mathbf{V}^{(k)} \mathbf{V}^{(k)*}))_{vec}^T, ((\mathbf{I} - \mathbf{U}^{(k)} \mathbf{U}^{(k)*}) \mathbf{Z} \mathbf{V}^{(k)})_{vec}^T\right)^T : \mathbf{Z} \in \mathbb{C}^{d_1 \times d_2}$$.
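
The relation between the simplified parametrization and its adjoint can be checked numerically. The following is a minimal dense sketch, assuming a random complex test matrix and hypothetical names `P` and `Pᵃ`; it verifies $\langle P_{T_k}(\gamma), \mathbf{Z}\rangle = \langle \gamma, P_{T_k}^*(\mathbf{Z})\rangle$ for a $\gamma$ that satisfies the two constraints of $S_k$.

```julia
using LinearAlgebra

d₁, d₂, r = 6, 5, 2
F = svd(randn(ComplexF64, d₁, d₂))
U, V = F.U[:, 1:r], F.V[:, 1:r]

# Simplified parametrization P(γ) for γ = (Γ₁, Γ₂, Γ₃) ∈ Sₖ
P(Γ₁, Γ₂, Γ₃) = U * Γ₁ * V' + U * Γ₂ + Γ₃ * V'

# Adjoint P*(Z), returned as a tuple of the three (matricized) blocks
Pᵃ(Z) = (U' * Z * V, U' * Z * (I - V * V'), (I - U * U') * Z * V)

# A random element of Sₖ: project so that Γ₂V = 0 and U*Γ₃ = 0
Γ₁ = randn(ComplexF64, r, r)
Γ₂ = randn(ComplexF64, r, d₂) * (I - V * V')
Γ₃ = (I - U * U') * randn(ComplexF64, d₁, r)

Z = randn(ComplexF64, d₁, d₂)
lhs = dot(P(Γ₁, Γ₂, Γ₃), Z)                                    # ⟨P(γ), Z⟩
rhs = sum(dot(Γ, G) for (Γ, G) in zip((Γ₁, Γ₂, Γ₃), Pᵃ(Z)))    # ⟨γ, P*(Z)⟩
```

Note that the identity relies on the constraints: for an arbitrary $\Gamma_2$ or $\Gamma_3$ outside $S_k$, the simplified form of $P_{T_k}$ and the adjoint above no longer match.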

 - Let us divide matrix $\mathbf{H}^{(k)}$ into four blocks $$\mathbf{H}^{(k)} = \begin{bmatrix}\mathbf{H}^{(k)}_{1,1} & \mathbf{H}^{(k)}_{1,2} \\ \mathbf{H}^{(k)}_{2,1} & \epsilon_k^{-2} \mathbf{1}\end{bmatrix},$$ such that $\mathbf{H}^{(k)}_{1,1} \in \mathbb{C}^{r_k \times r_k}$ with $$(\mathbf{H}^{(k)}_{1,1})_{i,j} = \left(\sigma_i^{(k)} \sigma_j^{(k)} \right)^{-1},$$ and define the diagonal matrix $\mathbf{D}^{(k)} \in \mathbb{C}^{r_k \times r_k}$ as $$\mathbf{D}_{i,i}^{(k)} := \left(\sigma_i^{(k)} \epsilon_k \right)^{-1}.$$
 - As all columns of $\mathbf{H}^{(k)}_{1,2}$ / all rows of $\mathbf{H}^{(k)}_{2,1}$ have the same values as the diagonal of $\mathbf{D}^{(k)}$, we can replace the element-wise multiplication by block $\mathbf{H}^{(k)}_{1,2}$ / $\mathbf{H}^{(k)}_{2,1}$ with left / right matrix multiplication by $\mathbf{D}^{(k)}$: $$\begin{equation*}
\begin{split}
W^{(k)}(\mathbf{Z}) &= \mathbf{U}_k \left[\mathbf{H}_k \circ (\mathbf{U}_k^{*} \mathbf{Z} \mathbf{V}_k)\right] \mathbf{V}_k^{*} \\
&=\begin{bmatrix} 
    \mathbf{U}^{(k)} & \mathbf{U}_{\perp}^{(k)}
\end{bmatrix}
\left(\mathbf{H}_k
\circ
\begin{bmatrix}
\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)} &  \mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}_{\perp}^{(k)} \\
\mathbf{U}_{\perp}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)} & \mathbf{U}_{\perp}^{(k)*} \mathbf{Z} \mathbf{V}_{\perp}^{(k)} 
\end{bmatrix}
\right)
\begin{bmatrix} 
    \mathbf{V}^{(k)*} \\ \mathbf{V}_{\perp}^{(k)*}
\end{bmatrix} \\
&= \begin{bmatrix} 
    \mathbf{U}^{(k)} & \mathbf{U}_{\perp}^{(k)}
\end{bmatrix}
\left(
\begin{bmatrix}
\mathbf{H}_{1,1}^{(k)} &  \mathbf{H}_{1,2}^{(k)} \\
\mathbf{H}_{2,1}^{(k)} & \epsilon_k^{-2} \mathbf{1}
\end{bmatrix} 
\circ 
\begin{bmatrix}
\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)} &  \mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}_{\perp}^{(k)} \\
\mathbf{U}_{\perp}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)} & \mathbf{U}_{\perp}^{(k)*} \mathbf{Z} \mathbf{V}_{\perp}^{(k)} 
\end{bmatrix}
\right)
\begin{bmatrix} 
    \mathbf{V}^{(k)*} \\ \mathbf{V}_{\perp}^{(k)*}
\end{bmatrix}  \\
&=
\begin{bmatrix} 
    \mathbf{U}^{(k)} & \mathbf{U}_{\perp}^{(k)}
\end{bmatrix}
\begin{bmatrix}
\mathbf{H}_{1,1}^{(k)} \circ \left(\mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)}\right) &  \mathbf{D}^{(k)} \mathbf{U}^{(k)*} \mathbf{Z} \mathbf{V}_{\perp}^{(k)} \\
\mathbf{U}_{\perp}^{(k)*} \mathbf{Z} \mathbf{V}^{(k)}\mathbf{D}^{(k)}  & \epsilon_k^{-2} \mathbf{U}_{\perp}^{(k)*} \mathbf{Z} \mathbf{V}_{\perp}^{(k)} 
\end{bmatrix}
\begin{bmatrix} 
    \mathbf{V}^{(k)*} \\ \mathbf{V}_{\perp}^{(k)*}
\end{bmatrix},
\end{split}
\end{equation*}$$
 - Finally, we define $\mathbf{D}_{S_k} \in \mathbb{C}^{r_k(d_1 + d_2 + r_k) \times r_k(d_1 + d_2 + r_k)}$ as a diagonal matrix whose diagonal entries are entries of $\mathbf{H}^{(k)}_{1,1}$ or $\mathbf{D}^{(k)}$, ordered like the blocks $\gamma_1, \gamma_2, \gamma_3$ of $\gamma$: $$\mathbf{D}_{S_k} = \begin{bmatrix} diag\left((\mathbf{H}^{(k)}_{1,1})_{vec}\right) & & 0 \\ & \mathbf{I}_{(d_2 \times d_2)} \otimes \mathbf{D}^{(k)} & \\ 0 & & \mathbf{D}^{(k)} \otimes \mathbf{I}_{(d_1 \times d_1)}\end{bmatrix},$$ where $diag$ transforms a vector to a diagonal matrix, $\otimes$ denotes the Kronecker product, and $\mathbf{I}_{(d_1 \times d_1)}, \mathbf{I}_{(d_2 \times d_2)}$ are identity matrices of size $(d_1 \times d_1)$ and $(d_2 \times d_2)$. The second block corresponds to the left multiplication $\mathbf{D}^{(k)} \Gamma_2$ and the third block to the right multiplication $\Gamma_3 \mathbf{D}^{(k)}$.
   - It's easy to see that $diag\left((\mathbf{H}^{(k)}_{1,1})_{vec}\right) (\mathbf{M})_{vec} = \left(\mathbf{H}^{(k)}_{1,1} \circ \mathbf{M}\right)_{vec}$.
   - The Kronecker product is needed to transform diagonal-matrix&ndash;matrix multiplication into diagonal-matrix&ndash;vectorized-matrix multiplication. E.g. $(\mathbf{AB})_{vec} = (\mathbf{I} \otimes \mathbf{A})(\mathbf{B})_{vec}$ when $\mathbf{A}$ is a diagonal matrix, and $(\mathbf{AB})_{vec} = (\mathbf{B}^T \otimes \mathbf{I})(\mathbf{A})_{vec}$ when $\mathbf{B}$ is a diagonal matrix.
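
These vectorization identities are easy to verify on small matrices. The sketch below uses arbitrary toy values for $\mathbf{H}$ and $\mathbf{D}$ (they are illustrative assumptions, not values from the algorithm) and explicit dense Kronecker products, which the actual implementation avoids.

```julia
using LinearAlgebra

r, d₁, d₂ = 2, 4, 3
D = Diagonal([2.0, 5.0])           # toy stand-in for Dᵏ
H = [1.0 2.0; 3.0 4.0]             # toy stand-in for Hᵏ₁₁
M, Γ₂, Γ₃ = randn(r, r), randn(r, d₂), randn(d₁, r)
Iᵈ¹, Iᵈ² = Matrix(1.0I, d₁, d₁), Matrix(1.0I, d₂, d₂)

entrywise = Diagonal(vec(H)) * vec(M)       # equals vec(H .* M)
left      = kron(Iᵈ², Matrix(D)) * vec(Γ₂)  # equals vec(D * Γ₂)
right     = kron(Matrix(D), Iᵈ¹) * vec(Γ₃)  # equals vec(Γ₃ * D)
```

In practice none of these Kronecker products has to be formed: scaling the rows of $\Gamma_2$ and the columns of $\Gamma_3$ by the diagonal of $\mathbf{D}^{(k)}$ gives the same result with far less memory.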

 - Using the definitions above, we can re-formulate $\mathbf{W}^{(k)}$ as $$\mathbf{W}^{(k)} = P_{T_k} \left(\mathbf{D}_{S_k} - \epsilon_k^{-2} \mathbf{I}_{S_k}\right)P_{T_k}^* + \epsilon_k^{-2} \mathbf{I},$$ and we can summarize the optimized implementation of the conjugate gradient step for matrix completion:
   1. Calculate $P^*_{T_k} \Phi^*(\mathbf{y}) \in S_k$
   2. Solve $\left(\frac{\epsilon_k^2 \mathbf{I}_{S_k}}{\mathbf{D}_{S_k}^{-1} - \epsilon_k^2 \mathbf{I}_{S_k}} + P^*_{T_k} \Phi^* \Phi P_{T_k}\right)\gamma_k = P^*_{T_k} \Phi^*(\mathbf{y})$ for $\gamma_k \in S_k$ by conjugate gradient method.
   3. Calculate residual $\mathbf{r}_k := \mathbf{y} - \Phi P_{T_k} \gamma_k \in \mathbb{C}^m$
   4. Calculate $\tilde{\gamma}_k = \left(\frac{\mathbf{D}_{S_k}^{-1}}{\mathbf{D}_{S_k}^{-1} - \epsilon_k^2 \mathbf{I}_{S_k}}\right)\gamma_k - P^*_{T_k} \Phi^*(\mathbf{r}_k) \in S_k$.
   5. Obtain an implicit representation of the new iterate $\mathbf{X}^{(k+1)} \in \mathbb{C}^{d_1 \times d_2}$ such that $\mathbf{X}^{(k+1)} = \Phi^*(\mathbf{r}_k) + P_{T_k}(\tilde{\gamma}_k)$. 
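
The compact form of the weight operator, $P_{T_k} \left(\mathbf{D}_{S_k} - \epsilon_k^{-2} \mathbf{I}_{S_k}\right)P_{T_k}^* + \epsilon_k^{-2} \mathbf{I}$, can be compared against the dense definition on a small random example. This is a sketch under assumptions: the names `W_dense`, `W_compact` and `pad` are hypothetical, the test matrix is real, and $\epsilon$ is set to $\sigma_{r+1}$ so that exactly the first $r$ singular values exceed it.

```julia
using LinearAlgebra

d₁, d₂, r = 6, 5, 2
F = svd(randn(d₁, d₂), full = true)
σ, ϵ = F.S, F.S[r+1]                  # σ₁..σᵣ > ϵ = σᵣ₊₁ (assumed smoothing level)
U, V = F.U[:, 1:r], F.V[:, 1:r]

# Dense reference: W(Z) = U_full [H ∘ (U_full* Z V_full)] V_full*
pad(n) = [i <= length(σ) ? σ[i] : 0.0 for i in 1:n]
H = 1 ./ (max.(pad(d₁), ϵ) * max.(pad(d₂), ϵ)')
W_dense(Z) = F.U * (H .* (F.U' * Z * F.V)) * F.V'

# Compact form: only the leading r singular triplets are needed
H₁₁ = [1 / (σ[i] * σ[j]) for i in 1:r, j in 1:r]
D = Diagonal(1 ./ (σ[1:r] .* ϵ))
function W_compact(Z)
    G₁ = (H₁₁ .- 1 / ϵ^2) .* (U' * Z * V)            # H₁₁ block
    G₂ = (D - I / ϵ^2) * (U' * Z * (I - V * V'))     # left multiplication by D
    G₃ = ((I - U * U') * Z * V) * (D - I / ϵ^2)      # right multiplication by D
    U * G₁ * V' + U * G₂ + G₃ * V' + Z / ϵ^2
end

Z = randn(d₁, d₂)
```

For this example `W_dense(Z)` and `W_compact(Z)` agree up to floating point error, while the compact version never touches the trailing singular vectors.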

In [3]:
function update_H!(H, σ)
    for ind in CartesianIndices(H)
        i, j = ind[1], ind[2]
        H[ind] = 1 / (σ[i] * σ[j])
    end
end

function update_D!(D, σ, ϵᵏ)
    for j in eachindex(D)
        D[j] = 1 / (σ[j] * ϵᵏ)
    end
end

split(γ, r̃, d₁, d₂) = @views begin
    γ₁ = reshape(γ[1:r̃^2], r̃, r̃)
    γ₂ = reshape(γ[r̃^2+1:r̃*(r̃+d₂)], r̃, d₂)
    γ₃ = reshape(γ[r̃*(r̃+d₂)+1:r̃*(r̃+d₁+d₂)], d₁, r̃)
    γ₁, γ₂, γ₃
end

split (generic function with 1 method)
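
The operators below rely on the fact that `split` returns reshaped views rather than copies, so writing into a block updates the underlying vector. A small sketch of this behaviour, using the same slicing under the hypothetical name `split_views` (chosen here only to avoid clashing with `Base.split` in a fresh session):

```julia
# Same slicing as `split` above, under a hypothetical name
split_views(γ, r̃, d₁, d₂) = @views begin
    γ₁ = reshape(γ[1:r̃^2], r̃, r̃)
    γ₂ = reshape(γ[r̃^2+1:r̃*(r̃+d₂)], r̃, d₂)
    γ₃ = reshape(γ[r̃*(r̃+d₂)+1:r̃*(r̃+d₁+d₂)], d₁, r̃)
    γ₁, γ₂, γ₃
end

r̃, d₁, d₂ = 2, 4, 3
γ = zeros(r̃ * (r̃ + d₁ + d₂))          # length 18
γ₁, γ₂, γ₃ = split_views(γ, r̃, d₁, d₂)
γ₂ .= 1.0        # writes through to γ[5:10]
γ₃[1, 1] = 7.0   # writes through to γ[11]
```

This is why the `backw` function of the parametrization operator can fill `γ₁`, `γ₂`, `γ₃` and simply return `γ` without concatenating anything.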

In [4]:
function get_P_operator(Uᵏ, Vᵏ, Vtᵏ, tempᵈ¹ˣᵈ²)
    
    r̃ = size(Uᵏ, 2)
    d₁, d₂ = size(tempᵈ¹ˣᵈ²)
    tempᵈ¹ˣʳ, tempʳˣᵈ² = similar(Uᵏ, d₁, r̃), similar(Uᵏ, r̃, d₂)
    
    I_VV, I_UU = similar(Uᵏ, d₂, d₂), similar(Uᵏ, d₁, d₁)
    Iᵈ¹ˣᵈ¹, Iᵈ²ˣᵈ² = Diagonal(ones(d₁)), Diagonal(ones(d₂))
    
    Pᵏ = FunctionOperator{eltype(Uᵏ)}(name="Pᵏ", inDims = (r̃*(r̃+d₁+d₂),), outDims = (d₁, d₂),
        forw = (b,γ) -> begin
                γ₁, γ₂, γ₃ = split(γ, r̃, d₁, d₂)
                # (Uᵏ * γ₁ + γ₃) * Vᵏ' + Uᵏ * γ₂
                mul!(tempᵈ¹ˣʳ, Uᵏ, γ₁)
                tempᵈ¹ˣʳ .+= γ₃
                mul!(b, tempᵈ¹ˣʳ, Vtᵏ)
                mul!(tempᵈ¹ˣᵈ², Uᵏ, γ₂)
                b .+= tempᵈ¹ˣᵈ²
            end,
        backw = (γ,Φᵃy) -> begin
                γ₁, γ₂, γ₃ = split(γ, r̃, d₁, d₂)
                # γ₁ .= Uᵏ' * Φᵃy * Vᵏ
                mul!(tempᵈ¹ˣʳ, Φᵃy, Vᵏ)
                mul!(γ₁, Uᵏ', tempᵈ¹ˣʳ)
                # γ₃ .= (I - Uᵏ*Uᵏ') * Φᵃy * Vᵏ
                I_UU .= Iᵈ¹ˣᵈ¹ .- mul!(I_UU, Uᵏ, Uᵏ')
                mul!(γ₃, I_UU, tempᵈ¹ˣʳ)
                # γ₂ .= Uᵏ' * Φᵃy * (I - Vᵏ*Vᵏ')
                mul!(tempʳˣᵈ², Uᵏ', Φᵃy)
                I_VV .= Iᵈ²ˣᵈ² .- mul!(I_VV, Vᵏ, Vtᵏ)
                mul!(γ₂, tempʳˣᵈ², I_VV)
                #γ = vcat(vec(γ₁), vec(γ₂), vec(γ₃)) # we don't need this line as γ₁, γ₂, γ₃ are views
                γ
            end)
    
    Pᵏ
end

get_P_operator (generic function with 1 method)

In [5]:
function get_CG_operator_function(PᵃΦᵃΦP, Hᵏ₁₁, Dᵏ, tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾, tempʳˣʳ, d₁, d₂)
    r̃ = size(Hᵏ₁₁, 1)
    tempʳ = @view tempʳˣʳ[1:r̃]
    (Hᵏ₁₁, Dᵏ, ϵₖ) ->
        FunctionOperator{eltype(tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾)}(name = "CG_op", inDims = (r̃*(r̃+d₁+d₂),), outDims = (r̃*(r̃+d₁+d₂),),
            forw = (b, γ) ->  begin
                #=I⁽ᵈ¹ˣᵈ¹⁾, I⁽ᵈ²ˣᵈ²⁾ = Diagonal(ones(d₁)), Diagonal(ones(d₂))
                D_Sₖ = Diagonal( vcat( vec(Hᵏ₁₁), diag(kron(I⁽ᵈ²ˣᵈ²⁾, Diagonal(Dᵏ))), diag(kron(Diagonal(Dᵏ), I⁽ᵈ¹ˣᵈ¹⁾)) ) )
                D_Sₖ⁻¹ = I / D_Sₖ
                b .= (ϵₖ^2 * I / (D_Sₖ⁻¹ - ϵₖ^2 * I)) * γ + PᵃΦᵃΦP * γ=#
                # An efficient implementation for:
                # b .= (ϵₖ^2 * I / (D_Sₖ⁻¹ - ϵₖ^2 * I)) * γ + Pᵏ' * Φ' * Φ * Pᵏ * γ
                mul!(tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾, PᵃΦᵃΦP, γ)
                γ₁, γ₂, γ₃ = split(γ, r̃, d₁, d₂)
                b₁, b₂, b₃ = split(b, r̃, d₁, d₂)
                tempʳˣʳ .= ϵₖ^2 ./ (1 ./ Hᵏ₁₁ .- ϵₖ^2)
                b₁ .= tempʳˣʳ .* γ₁
                tempʳ .= ϵₖ^2 ./ (1 ./ Dᵏ .- ϵₖ^2)
                b₂ .= reshape(tempʳ, :, 1) .* γ₂ # broadcasting
                b₃ .= reshape(tempʳ, 1, :) .* γ₃ # broadcasting
                # b₁, b₂, b₃ are views, so they update b
                b .+= tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾
            end)
end

get_CG_operator_function (generic function with 1 method)
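
The operator built above is later handed to `cg!` from IterativeSolvers. For readers unfamiliar with the method, here is a textbook conjugate gradient loop on a small Hermitian positive definite system. It is a plain illustrative sketch with a hypothetical name (`cg_sketch`), not the IterativeSolvers implementation, and it works on an explicit matrix rather than on a function operator.

```julia
using LinearAlgebra

# Textbook conjugate gradient for a Hermitian positive definite A (illustration only)
function cg_sketch(A, b; tol = 1e-10, maxiter = length(b))
    x = zero(b)
    r = b - A * x                 # residual
    p = copy(r)                   # search direction
    rs = real(dot(r, r))
    for _ in 1:maxiter
        sqrt(rs) <= tol * norm(b) && break
        Ap = A * p
        α = rs / real(dot(p, Ap)) # step length along p
        x .+= α .* p
        r .-= α .* Ap
        rs_new = real(dot(r, r))
        p .= r .+ (rs_new / rs) .* p
        rs = rs_new
    end
    x
end

M = randn(8, 8)
A = M' * M + I                    # symmetric positive definite test matrix
b = randn(8)
x = cg_sketch(A, b; maxiter = 200)
```

The method only needs products `A * p`, which is why it suffices to provide the system matrix as an operator with an in-place `forw` function.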

In [6]:
function get_map_to_γ̃ₖ_function(PᵃΦᵃ, Hᵏ₁₁, Dᵏ, tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾, tempʳˣʳ, d₁, d₂)
    r̃ = size(Hᵏ₁₁, 1)
    tempʳ = @view tempʳˣʳ[1:r̃]
    (γᵏ, rᵏ, ϵₖ) -> begin
        # An efficient implementation for:
        # γ̃ₖ = (D_Sₖ⁻¹ / (D_Sₖ⁻¹ - ϵₖ^2 * I)) * γᵏ - Pᵏ' * Φ' * rᵏ
        mul!(tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾, PᵃΦᵃ, rᵏ)
        γ₁, γ₂, γ₃ = split(γᵏ, r̃, d₁, d₂)
        tempʳˣʳ .= (1 ./ Hᵏ₁₁) ./ (1 ./ Hᵏ₁₁ .- ϵₖ^2)
        γ₁ .= tempʳˣʳ .* γ₁
        tempʳ .= (1 ./ Dᵏ) ./ (1 ./ Dᵏ .- ϵₖ^2)
        γ₂ .*= reshape(tempʳ, :, 1) # broadcasting
        γ₃ .*= reshape(tempʳ, 1, :) # broadcasting
        # γ₁, γ₂, γ₃ are views, so they update γᵏ
        γᵏ .-= tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾
        γᵏ
    end
end

get_map_to_γ̃ₖ_function (generic function with 1 method)

In [7]:
function performant_MatrixIRLS_for_PCA(
        Xᴳᵀ::AbstractArray,                     # ground truth for MSE evaluation
        y::AbstractArray,                       # under-sampled data
        Φ::FunctionOperator;                    # sampling operator
        img_size::NTuple = size(Xᴳᵀ),           # size of output matrix
        r̃::Int = 0,                             # rank estimate of solution
        maxIter::Union{Int, Nothing} = nothing, # number of CG iteration steps
        N::Int = 1000,                          # number of iterations
        verbose::Bool = false)                  # print rank and loss value in each iteration
    
    # Initialize variables
    dType = eltype(y)
    d₁, d₂ = img_size
    r̃ == 0 && (r̃ = rank(Xᴳᵀ))
    maxIter = maxIter isa Nothing ? r̃*(r̃+d₁+d₂) : maxIter
    ϵₖ = Inf
    Xᵏ = Φ' * y      # Initial guess: fill missing values with zeros
    σ, k = 0, 0      # I just want them to be available outside of the loop
    same_ϵ_count = 0 # Stop criterion: ϵ doesn't change over 10 iterations
    cg_tol = eps(real(dType))*4 # tolerance for the conjugate gradient solver (4 is experimental value)
    
    # Preallocate arrays
    F = svd(Xᵏ)
    Uᵏ, σ, Vᵏ, Vtᵏ = F.U[:, 1:r̃], F.S, F.V[:, 1:r̃], F.Vt[1:r̃, :]
    Hᵏ₁₁, Dᵏ = similar(y, real(dType), r̃, r̃), similar(y, real(dType), r̃)
    rᵏ, tempᵈ¹ˣᵈ², tempʳˣʳ = similar(y), similar(y, d₁, d₂), similar(Hᵏ₁₁)
    b, γᵏ, tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾ = [Vector{dType}(undef, r̃*(r̃+d₁+d₂)) for _ in 1:3]
    statevars = IterativeSolvers.CGStateVariables(similar(γᵏ), similar(γᵏ), similar(γᵏ))
    
    # Create operators
    Pᵏ = get_P_operator(Uᵏ, Vᵏ, Vtᵏ, tempᵈ¹ˣᵈ²)
    PᵃΦᵃΦP, ΦP, PᵃΦᵃ = Pᵏ' * Φ' * Φ * Pᵏ, Φ * Pᵏ, Pᵏ' * Φ'
    get_CG_op = get_CG_operator_function(PᵃΦᵃΦP, Hᵏ₁₁, Dᵏ, tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾, tempʳˣʳ, d₁, d₂)
    map_to_γ̃ₖ = get_map_to_γ̃ₖ_function(PᵃΦᵃ, Hᵏ₁₁, Dᵏ, tempʳ⁽ʳ⁺ᵈ¹⁺ᵈ²⁾, tempʳˣʳ, d₁, d₂)
    
    verbose && (table = DebugTable(
        ("k", () -> k, 3), ("rank(Xᵏ)", () -> rank(Xᵏ, atol=1e-3), 3),
        ("‖Xᴳᵀ - Xᵏ‖₂", () -> opnorm(Xᴳᵀ - Xᵏ, 2), 3), ("σ₁", () -> σ[1]),
        ("σᵣ₊₁", () -> σ[r̃+1]), ("ϵₖ", () -> ϵₖ)))
    
    while k <= N
        
        # Find leading rₖ left/right singular vectors of Xᵏ and calculate all singular values
        svd!(tempᵈ¹ˣᵈ² .= Xᵏ, F)
        @views begin Uᵏ .= F.U[:, 1:r̃]; Vᵏ .=  F.V[:, 1:r̃]; Vtᵏ .= F.Vt[1:r̃, :]; end
        
        # Print some info
        verbose && printRow(table)
        
        # Step 2.
        same_ϵ_count = ϵₖ < σ[r̃+1] ? same_ϵ_count + 1 : 0
        ϵₖ = min(ϵₖ, σ[r̃+1])
        
        # Additional exit conditions
        (same_ϵ_count == 10 || ϵₖ < √(eps(real(dType)))) && break
        
        # Step 3.
        update_H!(Hᵏ₁₁, σ)
        update_D!(Dᵏ, σ, ϵₖ)
        # We skip calculating Wᵏ in favour of the optimized implementation
        
        # Pᵏ, PᵃΦᵃΦP, ΦP, PᵃΦᵃ operators are updated automatically: they capture Uᵏ, Vᵏ and Vtᵏ,
        # which are overwritten in place above, so they always use the most recent values.
        # Only CG_op, which depends on Hᵏ₁₁, Dᵏ and ϵₖ, has to be rebuilt.
        CG_op = get_CG_op(Hᵏ₁₁, Dᵏ, ϵₖ)
        
        # Step 1.
        mul!(b, PᵃΦᵃ, y)             # right hand side for CG
        mul!(γᵏ, Pᵏ', Xᵏ)            # initial value for CG
        cg!(γᵏ, CG_op, b, tol=cg_tol, statevars = statevars, maxiter = maxIter)
        rᵏ .= y .- mul!(rᵏ, ΦP,  γᵏ) # rᵏ = y - Φ * Pᵏ * γᵏ
        # Xᵏ = Φ' * rᵏ + Pᵏ * γ̃ₖ
        mul!(Xᵏ, Pᵏ, map_to_γ̃ₖ(γᵏ, rᵏ, ϵₖ))
        Xᵏ .+= mul!(tempᵈ¹ˣᵈ², Φ', rᵏ)
        
        k += 1
    end
    
    # Print some info
    verbose && finishTable(table)
    
    return Xᵏ
end

performant_MatrixIRLS_for_PCA (generic function with 1 method)

# Numerical Experiments

In [8]:
Random.seed!(0);

## 1. Easy Problem in Real Domain

### General parameters

In [9]:
d₁, d₂ = 50, 50    # Matrix dimensions
r = 7              # Desired rank
dType = Float64    # Type of matrix elements
ρ = 1.5;           # 1 -> sampling at theoretical minimum

### Generate Model

#### Sampling Mask ($\Phi$)

_**Requirement for the sampling mask:** It must have at least $r$ non-zero entries in each row and each column._
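
This requirement can be checked directly on any mask. The helper below is a hypothetical illustration (it is not part of `helper_functions.ipynb`, and it says nothing about how `generateΦ` builds its mask).

```julia
# Hypothetical helper: true if every row and every column of `mask` has at least r non-zeros
function satisfies_mask_requirement(mask::AbstractMatrix, r::Integer)
    per_column = sum(!iszero, mask, dims = 1)   # 1×d₂ matrix of counts
    per_row    = sum(!iszero, mask, dims = 2)   # d₁×1 matrix of counts
    minimum(per_column) >= r && minimum(per_row) >= r
end

mask = ones(4, 4)
ok_full = satisfies_mask_requirement(mask, 4)        # true: 4 non-zeros everywhere
mask[1, :] .= 0
ok_empty_row = satisfies_mask_requirement(mask, 1)   # false: the first row is empty
```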

In [10]:
df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(ρ * df, d₁ * d₂))
Φᴹ = generateΦ(d₁, d₂, r, m)
Φ = FunctionOperator{dType}(name = "Φ", inDims = (d₁, d₂), outDims = (d₁, d₂),
    forw = (b,x) -> b .= Φᴹ .* x, backw = (b,x) -> b .= x)
@show r
println("minimum number of non-zero entries in each column: ", Int(minimum(sum(Φᴹ, dims=1))))
println("minimum number of non-zero entries in each row: ", Int(minimum(sum(Φᴹ, dims=2))))

r = 7
minimum number of non-zero entries in each column: 14
minimum number of non-zero entries in each row: 12
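
To make the sampling rate concrete, the arithmetic behind `df` and `m` can be spelled out for the two oversampling factors used in this notebook ($\rho = 1.5$ for the easy problems and $\rho = 1.05$ for the difficult ones). The variable names `m_easy` and `m_hard` are introduced here only for illustration.

```julia
d₁, d₂, r = 50, 50, 7
df = r * (d₁ + d₂ - r)                          # 7 * 93 = 651 degrees of freedom
m_easy = floor(Int, min(1.5 * df, d₁ * d₂))     # ρ = 1.5  → 976 observed entries
m_hard = floor(Int, min(1.05 * df, d₁ * d₂))    # ρ = 1.05 → 683 observed entries
frac_easy = m_easy / (d₁ * d₂)                  # share of observed entries, easy case
frac_hard = m_hard / (d₁ * d₂)                  # share of observed entries, difficult case
```

So the easy setting observes roughly 39% of the 2500 entries, and the difficult one roughly 27%.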


### Generate Data

Create a random rank-$r$ matrix $L_0 \in \mathbb{C}^{d_1 \times d_2}$ such that $L_0 = U_0 V_0^*$, where $U_0 \in \mathbb{C}^{d_1 \times r}$ and $V_0 \in \mathbb{C}^{d_2 \times r}$, and then sub-sample this low-rank matrix.
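
A minimal sketch of such a construction is shown below. The function `random_low_rank` is a hypothetical stand-in with Gaussian factors; the actual `generateLowRankComponent_Christian` helper lives in `helper_functions.ipynb` and may differ, for example in how it scales the factors.

```julia
using LinearAlgebra

# Hypothetical stand-in: L₀ = U₀ * V₀' with Gaussian factors U₀ (d₁×r) and V₀ (d₂×r)
random_low_rank(d₁, d₂, r, T = Float64) = randn(T, d₁, r) * randn(T, d₂, r)'

L = random_low_rank(50, 50, 7, ComplexF64)
numerical_rank = rank(L, rtol = 1e-8)   # generically equal to r for Gaussian factors
```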

In [11]:
L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
@show size(L₀)
@show rank(L₀)

y = Φ * L₀
@show rank(y);

size(L₀) = (50, 50)
rank(L₀) = 7
rank(y) = 50


### Running The Reconstruction

In [12]:
performant_MatrixIRLS_for_PCA(L₀, y, Φ, N = 2) # warm-up run, so that the timing below excludes compilation
@time performant_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      46.502 │   30.111 │   16.838 │      Inf │
│   1 │       50 │      38.650 │   41.684 │   11.616 │   16.838 │
│   2 │       50 │      32.409 │   54.038 │    8.091 │   11.616 │
│   3 │       50 │      25.793 │   61.424 │    5.352 │    8.091 │
│   4 │       50 │      19.867 │   64.471 │    3.085 │    5.352 │
│   5 │       50 │      14.475 │   66.417 │    2.079 │    3.085 │
│   6 │       50 │       9.959 │   67.864 │    1.386 │    2.079 │
│   7 │       50 │       6.408 │   68.811 │    0.772 │    1.386 │
│   8 │       50 │       3.502 │   69.354 │    0.350 │    0.772 │
│   9 │       49 │       1.377 │   69.619 │    0.126 │    0.350 │
│  10 │       43 │       0.278 │   69.704 │    0.023 │    0.126 │
│  11 │        8 │       0.013 │   69.716 │ 1.01e-03 │    0.023 │
│  12 │   

## 2. Easy Problem in Complex Domain

In [13]:
d₁, d₂ = 50, 50    # Matrix dimensions
r = 7              # Desired rank
dType = ComplexF64 # Type of matrix elements
ρ = 1.5            # 1 -> sampling at theoretical minimum

df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(ρ * df, d₁ * d₂))
Φᴹ = generateΦ(d₁, d₂, r, m)
Φ = FunctionOperator{dType}(name = "Φ", inDims = (d₁, d₂), outDims = (d₁, d₂),
    forw = (b,x) -> b .= Φᴹ .* x, backw = (b,x) -> b .= x)

L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
y = Φ * L₀;

### Running The Reconstruction

In [14]:
performant_MatrixIRLS_for_PCA(L₀, y, Φ, N = 2) # warm-up run, so that the timing below excludes compilation
@time performant_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      46.188 │   32.810 │   18.019 │      Inf │
│   1 │       50 │      37.158 │   47.708 │   12.549 │   18.019 │
│   2 │       50 │      30.921 │   60.684 │    7.951 │   12.549 │
│   3 │       50 │      23.812 │   66.908 │    4.668 │    7.951 │
│   4 │       50 │      16.888 │   69.502 │    2.655 │    4.668 │
│   5 │       50 │      10.643 │   70.679 │    1.353 │    2.655 │
│   6 │       50 │       5.385 │   71.452 │    0.587 │    1.353 │
│   7 │       50 │       1.832 │   72.024 │    0.195 │    0.587 │
│   8 │       48 │       0.296 │   72.291 │    0.033 │    0.195 │
│   9 │        8 │       0.010 │   72.341 │ 1.18e-03 │    0.033 │
│  10 │        7 │    1.42e-05 │   72.342 │ 1.61e-06 │ 1.18e-03 │
│  11 │        7 │    3.02e-11 │   72.342 │ 3.19e-12 │ 1.61e-06 │
└─────┴───

## 3. Difficult Problem in Real Domain

In [15]:
d₁, d₂ = 50, 50    # Matrix dimensions
r = 7              # Desired rank
dType = Float64    # Type of matrix elements
ρ = 1.05           # 1 -> sampling at theoretical minimum

df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(ρ * df, d₁ * d₂))
Φᴹ = generateΦ(d₁, d₂, r, m)
Φ = FunctionOperator{dType}(name = "Φ", inDims = (d₁, d₂), outDims = (d₁, d₂),
    forw = (b,x) -> b .= Φᴹ .* x, backw = (b,x) -> b .= x)

L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
y = Φ * L₀;

### Running The Reconstruction

In [16]:
@time performant_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      51.489 │   25.825 │   15.854 │      Inf │
│   1 │       50 │      47.972 │   34.523 │   12.902 │   15.854 │
│   2 │       50 │      44.630 │   45.709 │   10.284 │   12.902 │
│   3 │       50 │      42.897 │   56.132 │    7.809 │   10.284 │
│   4 │       50 │      41.301 │   62.999 │    5.729 │    7.809 │
│   5 │       50 │      39.890 │   66.197 │    4.732 │    5.729 │
│   6 │       50 │      39.043 │   67.341 │    4.172 │    4.732 │
│   7 │       50 │      38.358 │   67.938 │    3.663 │    4.172 │
│   8 │       50 │      37.685 │   68.349 │    3.123 │    3.663 │
│   9 │       50 │      36.998 │   68.545 │    2.688 │    3.123 │
│  10 │       50 │      36.314 │   68.482 │    2.503 │    2.688 │
│  11 │       50 │      35.653 │   68.254 │    2.306 │    2.503 │
│  12 │   

## 4. Difficult Problem in Complex Domain

In [17]:
d₁, d₂ = 50, 50    # Matrix dimensions
r = 7              # Desired rank
dType = ComplexF64 # Type of matrix elements
ρ = 1.05           # 1 -> sampling at theoretical minimum

df = r * (d₁ + d₂ - r) # Number of degrees of freedom of the setting
m = floor(Int, min(ρ * df, d₁ * d₂))
Φᴹ = generateΦ(d₁, d₂, r, m)
Φ = FunctionOperator{dType}(name = "Φ", inDims = (d₁, d₂), outDims = (d₁, d₂),
    forw = (b,x) -> b .= Φᴹ .* x, backw = (b,x) -> b .= x)

L₀ = generateLowRankComponent_Christian(d₁, d₂, r, dType)
y = Φ * L₀;

### Running The Reconstruction

In [18]:
@time performant_MatrixIRLS_for_PCA(L₀, y, Φ, verbose = true);

┌─────┬──────────┬─────────────┬──────────┬──────────┬──────────┐
│  k  │ rank(Xᵏ) │ ‖Xᴳᵀ - Xᵏ‖₂ │    σ₁    │   σᵣ₊₁   │    ϵₖ    │
├─────┼──────────┼─────────────┼──────────┼──────────┼──────────┤
│   0 │       50 │      55.976 │   23.837 │   13.988 │      Inf │
│   1 │       50 │      48.466 │   34.466 │   11.432 │   13.988 │
│   2 │       50 │      40.172 │   47.620 │    8.923 │   11.432 │
│   3 │       50 │      37.308 │   57.347 │    6.451 │    8.923 │
│   4 │       50 │      36.682 │   62.492 │    4.793 │    6.451 │
│   5 │       50 │      35.665 │   65.079 │    3.807 │    4.793 │
│   6 │       50 │      34.095 │   66.537 │    3.018 │    3.807 │
│   7 │       50 │      32.217 │   67.601 │    2.362 │    3.018 │
│   8 │       50 │      30.352 │   68.511 │    1.900 │    2.362 │
│   9 │       50 │      28.761 │   69.299 │    1.611 │    1.900 │
│  10 │       50 │      27.551 │   70.010 │    1.377 │    1.611 │
│  11 │       50 │      26.637 │   70.692 │    1.167 │    1.377 │
│  12 │   