#### Test Pipeline
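This Julia notebook checks the correctness of the C++ implementation of the algorithm by comparing its output with the original pure-Julia implementation on a synthetic instance.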

The notebook proceeds as follows: 
- Generate a synthetic instance
- Run the original implementation of the algorithm (pure Julia)
- Run the C++ implementation of the algorithm (through its R interface, package `msPCA`, called via RCall)

#### Step 1: Data Generation

Generates a ground-truth covariance matrix $S = I + \beta x_1 x_1^\top + \beta x_2 x_2^\top$, where
- $x_1, x_2$ are $k$-sparse unit vectors with non-overlapping supports
- $\beta$ controls the signal-to-noise ratio
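Because the code normalizes $x_1$ and $x_2$ to unit length and their supports do not overlap, they are orthonormal, so the spectrum of $S$ follows directly:

$$
S x_1 = x_1 + \beta x_1 (x_1^\top x_1) + \beta x_2 (x_2^\top x_1) = (1+\beta)\, x_1, \qquad S x_2 = (1+\beta)\, x_2, \qquad S v = v \ \text{ for } v \perp x_1, x_2 .
$$

So $S$ has eigenvalue $1+\beta$ with multiplicity 2 and eigenvalue $1$ with multiplicity $p-2$, and $\beta$ is exactly the gap between the signal and noise eigenvalues. Since each nonzero entry of $x_i$ equals $\pm 1/\sqrt{k}$, the diagonal of $S$ is $1 + \beta/k$ on the supports and $1$ elsewhere; with $\beta = 1$ and $k = 4$ this gives the $1.25$ entries in the printed $S$.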

Then samples $n$ observations from the multivariate normal distribution $\mathcal{N}(0,S)$ and constructs the empirical covariance matrix $\Sigma$ (`Sn` in the code).
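The sampling cell uses `Distributions.MvNormal`. For reference, here is a minimal sketch of the same two operations with only the standard libraries: a Gaussian sample obtained from the Cholesky factor of $S$ (if $S = LL^\top$ and $z \sim \mathcal{N}(0, I)$, then $Lz \sim \mathcal{N}(0, S)$), and the empirical covariance with the $n-1$ denominator, which is also what `cov(X')` returns for a $p \times n$ matrix `X`. The helper names `sample_gaussian` and `sample_cov` are hypothetical, and the sketch does not reproduce the notebook's seeded draws.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper: draw n samples from N(0, S) via the Cholesky factor S = L Lᵀ.
# If z ~ N(0, I), then L z ~ N(0, L Lᵀ) = N(0, S).
function sample_gaussian(S::AbstractMatrix, n::Int; rng = Random.default_rng())
    L = cholesky(Symmetric(S)).L
    return L * randn(rng, size(S, 1), n)   # p × n, one observation per column
end

# Hypothetical helper: sample covariance of the columns of X with the n - 1
# denominator, matching the default of Statistics.cov(X').
function sample_cov(X::AbstractMatrix)
    Xc = X .- mean(X; dims = 2)            # center each coordinate (row)
    return Xc * Xc' / (size(X, 2) - 1)
end

S = [2.0 0.5; 0.5 1.0]
X = sample_gaussian(S, 200_000; rng = MersenneTwister(1))
Σ = sample_cov(X)                          # close to S for large n
```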

In [1]:
using Random, LinearAlgebra
Random.seed!(1234)

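p = 10  # dimension
r = 2   # number of sparse PCs
k = 4   # sparsity (number of nonzeros) of each PC
β = 1   # signal strength

# Two k-sparse ±1 vectors with disjoint supports: x1 on 1:k, x2 on (k+1):2k
x1 = zeros(p); x1[1:k] = sign.(rand(k) .- 0.5)
x2 = zeros(p); x2[(k+1):(k+k)] = sign.(rand(k) .- 0.5)

# Disabled variant with partially overlapping supports
# (requires k_nonoverlapping, k_overlap and k_overlap_half to be defined):
# x1[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap)] = sign.(rand(k_overlap) .- .5)
# x2[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap_half)] = -x1[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap_half)]
# x2[(2*k_nonoverlapping+k_overlap_half+1):(2*k_nonoverlapping+k_overlap)] = x1[(2*k_nonoverlapping+k_overlap_half+1):(2*k_nonoverlapping+k_overlap)]

@assert sum(abs.(x1) .> 0) == k
@assert sum(abs.(x2) .> 0) == k
@assert abs(dot(x1, x2)) ≤ 1e-10   # disjoint supports, hence orthogonal

# Randomly permute the coordinates so the supports are not contiguous blocks
shufflecoords = randperm(p)
x1 = x1[shufflecoords]; x2 = x2[shufflecoords]

x1 /= norm(x1); x2 /= norm(x2)     # unit norm

# S = I + β x1 x1ᵀ + β x2 x2ᵀ; symmetrize to remove floating-point asymmetry
S = β*x1*x1' + β*x2*x2' + Matrix(1.0*I, p, p)
S = (S + S')/2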

10×10 Matrix{Float64}:
  1.25   0.25  0.0  0.0   0.0    0.0    0.0   -0.25  -0.25   0.0
  0.25   1.25  0.0  0.0   0.0    0.0    0.0   -0.25  -0.25   0.0
  0.0    0.0   1.0  0.0   0.0    0.0    0.0    0.0    0.0    0.0
  0.0    0.0   0.0  1.0   0.0    0.0    0.0    0.0    0.0    0.0
  0.0    0.0   0.0  0.0   1.25   0.25  -0.25   0.0    0.0   -0.25
  0.0    0.0   0.0  0.0   0.25   1.25  -0.25   0.0    0.0   -0.25
  0.0    0.0   0.0  0.0  -0.25  -0.25   1.25   0.0    0.0    0.25
 -0.25  -0.25  0.0  0.0   0.0    0.0    0.0    1.25   0.25   0.0
 -0.25  -0.25  0.0  0.0   0.0    0.0    0.0    0.25   1.25   0.0
  0.0    0.0   0.0  0.0  -0.25  -0.25   0.25   0.0    0.0    1.25
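The cell above hard-codes $r = 2$ PCs. Here is a minimal sketch of the same construction for general $r$, using a hypothetical helper `spiked_covariance`: it places $r$ disjoint supports of size $k$ at random coordinates and fills them with random signs scaled to unit norm. It draws the signs with `rand(rng, [-1.0, 1.0], k)` rather than `sign.(rand(k) .- 0.5)`, so it produces the same distribution of instances but not the exact vectors of the seeded cell.

```julia
using LinearAlgebra, Random

# Hypothetical helper: r unit-norm, k-sparse vectors with entries ±1/√k on disjoint
# random supports, and the spiked covariance S = I + β Σ_j x_j x_jᵀ.
function spiked_covariance(p::Int, r::Int, k::Int, β::Real; rng = Random.default_rng())
    @assert r * k ≤ p "r supports of size k must fit in p coordinates"
    perm = randperm(rng, p)                    # random placement of the supports
    X = zeros(p, r)
    for j in 1:r
        idx = perm[(j - 1) * k + 1 : j * k]    # consecutive blocks of perm are disjoint
        X[idx, j] = rand(rng, [-1.0, 1.0], k) ./ sqrt(k)
    end
    S = Symmetric(I + β * X * X')
    return S, X
end

S, X = spiked_covariance(10, 2, 4, 1.0; rng = MersenneTwister(1234))
```

Returning `S` as a `Symmetric` wrapper makes the explicit `(S + S')/2` step unnecessary and lets `eigvals` use the symmetric eigensolver.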

In [14]:
n = 300  # sample size

Random.seed!(1234)

using Distributions
d = MvNormal(zeros(p), S)
X = rand(d, n)  # p × n matrix, one observation per column

Sn = cov(X')    # sample covariance; cov treats rows as observations, hence the transpose

10×10 Matrix{Float64}:
  1.10043     0.261692     0.0609114   …  -0.229016   -0.227339     0.0693287
  0.261692    1.25939      0.00937561     -0.325147   -0.260874    -0.145351
  0.0609114   0.00937561   0.992215        0.0789801   0.0943653   -0.0379832
  0.024551    0.0228289   -0.0464934      -0.0552941  -0.0018719   -0.0609979
  0.0586004  -0.0370394   -0.0558297      -0.0717183  -0.0164141   -0.370927
  0.0690882   0.0219004    0.0192386   …   0.0794647  -0.00801687  -0.182411
 -0.0073494  -0.0526744   -0.0306221      -0.0868641  -0.0757406    0.223864
 -0.229016   -0.325147     0.0789801       1.45057     0.300336    -0.0717103
 -0.227339   -0.260874     0.0943653       0.300336    1.37019      0.0775116
  0.0693287  -0.145351    -0.0379832      -0.0717103   0.0775116    1.39523

#### Step 2: Julia benchmark

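Applies the original Julia implementation (`findmultPCs_deflation`, loaded from `algorithm2.jl`) to the empirical covariance matrix $\Sigma$ (`Sn`), and compares the recovered loadings with the ground-truth vectors $x_1, x_2$.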
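The internals of `findmultPCs_deflation` live in `algorithm2.jl` and are not shown here. To illustrate the general idea of extracting sparse PCs one at a time with deflation, here is a minimal sketch of a different, standard scheme: truncated power iteration for each $k$-sparse PC, followed by projection deflation $\Sigma \leftarrow (I - xx^\top)\Sigma(I - xx^\top)$. It is not the algorithm in `algorithm2.jl`, and all function names are hypothetical. The usage example uses unequal signal strengths so that the leading eigenvector, used as the starting point, is unique.

```julia
using LinearAlgebra

# Keep the k largest-magnitude entries of v, zero the rest, and rescale to unit norm.
function truncate_normalize(v::AbstractVector, k::Int)
    idx = partialsortperm(abs.(v), 1:k; rev = true)
    w = zeros(length(v))
    w[idx] = v[idx]
    return w / norm(w)
end

# Truncated power iteration for one k-sparse leading PC of A,
# started from the truncated leading eigenvector of A.
function truncated_power(A::AbstractMatrix, k::Int; iters::Int = 100)
    x = truncate_normalize(eigvecs(Symmetric(A))[:, end], k)
    for _ in 1:iters
        x = truncate_normalize(A * x, k)
    end
    return x
end

# Extract one sparse PC per entry of ks, deflating A by projection after each one.
function deflation_spca(Σ::AbstractMatrix, ks::Vector{Int})
    A = Matrix{Float64}(Σ)
    X = zeros(size(Σ, 1), length(ks))
    for (j, k) in enumerate(ks)
        x = truncated_power(A, k)
        X[:, j] = x
        P = I - x * x'        # projector onto the orthogonal complement of x
        A = P * A * P
    end
    return X
end

# Usage on a noiseless spiked covariance with two disjoint 4-sparse PCs
p = 10
u1 = zeros(p); u1[1:4] = [0.5, -0.5, 0.5, 0.5]
u2 = zeros(p); u2[5:8] = [0.5, 0.5, -0.5, -0.5]
Σ = I + 2.0 * u1 * u1' + 1.0 * u2 * u2'
X = deflation_spca(Σ, [4, 4])   # columns ≈ ±u1 and ±u2
```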

In [3]:
include("algorithm2.jl")

findmultPCs_deflation (generic function with 1 method)

In [15]:
ofv_best, violation_best, runtime, x_best = findmultPCs_deflation(Sn, r, [k,k]; numIters = 10, verbose = true, violation_tolerance = 1e-4 )

x_best

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern: [4, 4]

  Iteration |      Objective value |   Orthogonality Violation |       Time 
          1 |                0.337 |                  2.00e+00 |      0.000 
          2 |                0.309 |                  7.27e-01 |      0.001 
          3 |                0.333 |                  2.22e-16 |      0.001 
          4 |                0.333 |                  3.33e-16 |      0.001 
          5 |                0.333 |                  3.33e-16 |      0.002 
          6 |                0.333 |                  0.00e+00 |      0.002 
          7 |                0.333 |                  2.22e-16 |      0.003 
          8 |                0.333 |                  4.44e-16 |      0.003 


10×2 Matrix{Float64}:
  0.0        0.363441
  0.0        0.482893
  0.0        0.0
  0.0        0.0
  0.470507   0.0
  0.495916   0.0
 -0.466884   0.0
  0.0       -0.600025
  0.0       -0.524113
 -0.56099    0.0
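The log reports an "Orthogonality Violation" for the current loadings. Its exact definition is in `algorithm2.jl` and is not shown here. A minimal sketch of one natural measure, assumed for illustration only, is the sum of $|x_i^\top x_j|$ over all ordered pairs $i \ne j$. Under this assumed definition, two identical unit vectors score 2 and orthogonal vectors score 0, which are the two extremes that appear in the log.

```julia
using LinearAlgebra

# Assumed measure (illustrative only): sum of |x_iᵀ x_j| over all ordered pairs i ≠ j,
# i.e. the absolute off-diagonal mass of the Gram matrix XᵀX.
function orthogonality_violation(X::AbstractMatrix)
    G = X' * X
    return sum(abs, G - Diagonal(G))
end

v = normalize([1.0, 1.0, 0.0])
orthogonality_violation([v v])                         # identical columns: ≈ 2.0
orthogonality_violation([1.0 0.0; 0.0 1.0; 0.0 0.0])   # orthonormal columns: 0.0
```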

In [12]:
[x1 x2]

10×2 Matrix{Float64}:
 -0.5   0.0
 -0.5   0.0
  0.0   0.0
  0.0   0.0
  0.0   0.5
  0.0   0.5
  0.0  -0.5
  0.5   0.0
  0.5   0.0
  0.0  -0.5

#### Step 3: R/C++ implementation

Applies the C++ implementation, called from R through RCall (`cpp_findmultPCs_deflation` in the `msPCA` package), to the same empirical covariance matrix $\Sigma$ (`Sn`).

In [8]:
using RCall

R"""library(msPCA)"""
# R"""
# library(devtools)
# reload(pkg = "msPCA", quiet = FALSE)"""


RObject{StrSxp}
[1] "sPCAmPC"   "stats"     "graphics"  "grDevices" "utils"     "datasets" 
[7] "methods"   "base"     


In [16]:
# Copy Sn into R (RCall interpolates $Sn) and time the C++ routine
@time R"""

TestMat <- $Sn

results <- cpp_findmultPCs_deflation(TestMat, 2, c(4, 4), numIters=10)
"""

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern:  4 4
  Iteration |      Objective value |   Orthogonality Violation |       Time
          1 |                0.337 |                  2.00e+00 |      0.016
          2 |                0.309 |                  7.27e-01 |      0.031
          3 |                0.333 |                  3.33e-16 |      0.045
          4 |                0.333 |                  2.22e-16 |      0.059
          5 |                0.333 |                  0.00e+00 |      0.072
          6 |                0.333 |                  2.22e-16 |      0.085
          7 |                0.333 |                  1.11e-16 |      0.099
          8 |                0.333 |                  6.66e-16 |      0.112


  0.112801 seconds (340 allocations: 16.758 KiB)


RObject{VecSxp}
$objective_value
[1] 4.190207

$orthogonality_violation
[1] 3.330669e-16

$runtime
[1] 0.112

$x_best
            [,1]       [,2]
 [1,]  0.0000000  0.3726750
 [2,]  0.0000000  0.4911464
 [3,]  0.0000000  0.0000000
 [4,]  0.0000000  0.0000000
 [5,]  0.4898685  0.0000000
 [6,]  0.4893779  0.0000000
 [7,] -0.4575199  0.0000000
 [8,]  0.0000000 -0.5932694
 [9,]  0.0000000 -0.5176098
[10,] -0.5578653  0.0000000
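The Julia and C++ runs print their loadings separately. Here is a minimal sketch of an automated check, with hypothetical helper names: because a PC is only defined up to sign, each column of one result is flipped to agree with the other before supports and entries are compared. The matrices below are the `x_best` values printed by the Julia cell and by the R cell above; in a live session the R result could instead be copied into Julia with RCall's `rcopy`, e.g. `rcopy(R"results[['x_best']]")`.

```julia
using LinearAlgebra

# Hypothetical helper: flip the sign of each column of B to match the
# corresponding column of A (PCs are only defined up to sign).
function align_signs(A::AbstractMatrix, B::AbstractMatrix)
    s = [dot(A[:, j], B[:, j]) < 0 ? -1.0 : 1.0 for j in 1:size(A, 2)]
    return B .* s'
end

# Hypothetical helper: compare two loading matrices after sign alignment.
function compare_loadings(A::AbstractMatrix, B::AbstractMatrix; tol = 1e-8)
    Bs = align_signs(A, B)
    same_support = (abs.(A) .> tol) == (abs.(Bs) .> tol)
    maxdiff = maximum(abs.(A .- Bs))
    return (same_support = same_support, maxdiff = maxdiff)
end

# x_best from the Julia run and from the C++ run, as printed above
x_julia = [ 0.0        0.363441
            0.0        0.482893
            0.0        0.0
            0.0        0.0
            0.470507   0.0
            0.495916   0.0
           -0.466884   0.0
            0.0       -0.600025
            0.0       -0.524113
           -0.56099    0.0]
x_cpp = [ 0.0        0.3726750
          0.0        0.4911464
          0.0        0.0
          0.0        0.0
          0.4898685  0.0
          0.4893779  0.0
         -0.4575199  0.0
          0.0       -0.5932694
          0.0       -0.5176098
         -0.5578653  0.0]

result = compare_loadings(x_julia, x_cpp)   # same_support = true, maxdiff ≈ 0.0194
```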



In [9]:
x_best

10×2 Matrix{Float64}:
  0.0       -0.421568
  0.0       -0.529844
  0.0        0.0
  0.0        0.0
 -0.458548   0.0
 -0.468789   0.0
  0.536703   0.0
  0.0        0.543248
  0.0        0.496414
  0.530962   0.0

In [15]:
@time ofv_best, violation_best, runtime, x_best = findmultPCs_deflation(Sn, r, [k,k]; numIters = 10, verbose = true, violation_tolerance = 1e-4 )

x_best

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern: [4, 4]

  Iteration |      Objective value |   Orthogonality Violation |       Time 
          1 |                0.351 |                  2.00e+00 |      0.000 
          2 |                0.351 |                  2.00e+00 |      0.001 
          3 |                0.351 |                  2.00e+00 |      0.001 
          4 |                0.351 |                  2.00e+00 |      0.001 
          5 |                0.314 |                  8.20e-01 |      0.002 
          6 |                0.336 |                  3.33e-16 |      0.003 
          7 |                0.336 |                  1.11e-16 |      0.004 
          8 |                0.336 |                  3.33e-16 |      0.005 
          9 |                0.336 |                  2.22e-16 |      0.006 
  0.006895 seconds (34.23 k allocations: 2.758 MiB)


10×2 Matrix{Float64}:
  0.0       -0.421568
  0.0       -0.529844
  0.0        0.0
  0.0        0.0
  0.458548   0.0
  0.468789   0.0
 -0.536703   0.0
  0.0        0.543248
  0.0        0.496414
 -0.530962   0.0
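The `@time` figures in this notebook are single runs: both include printing of the iteration log, the R measurement also includes copying `Sn` into R, and a first call in a Julia session also includes compilation. A minimal sketch for a fairer comparison, using a hypothetical helper `min_time`, takes the minimum over several repeated runs after a warm-up call; `BenchmarkTools.@btime` is a more thorough alternative.

```julia
# Hypothetical helper: run f once as a warm-up (so compilation is not timed),
# then return the minimum wall-clock time in seconds over `reps` further runs.
function min_time(f; reps::Int = 5)
    f()
    return minimum(@elapsed(f()) for _ in 1:reps)
end

t = min_time(() -> sum(abs2, rand(10_000)))
```

For example, `min_time(() -> findmultPCs_deflation(Sn, r, [k, k]; numIters = 10, verbose = false))` times the Julia implementation, and `min_time(() -> R"cpp_findmultPCs_deflation($Sn, 2, c(4, 4), numIters = 10)")` times the C++ one through RCall.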

In [13]:
[x1 x2]

10×2 Matrix{Float64}:
 -0.5   0.0
  0.0  -0.5
  0.0   0.5
  0.0  -0.5
  0.0   0.0
 -0.5   0.0
  0.0  -0.5
  0.0   0.0
  0.5   0.0
 -0.5   0.0