#### Test Pipeline
This Julia notebook checks the correctness of the C++ code by comparing it with the original Julia implementation on the same instance.

The notebook proceeds as follows: 
- Generate a synthetic instance
- Run the original implementation of the algorithm (pure Julia)
- Run the C++ implementation of the algorithm (called through the R package `sPCAmPC`)

#### Step 1: Data Generation

Generates a ground-truth covariance matrix $S$ of the form $I + \beta x_1 x_1^\top + \beta x_2 x_2^\top$, where 
- $x_1, x_2$ are $k$-sparse unit vectors with non-overlapping supports 
- $\beta$ controls the signal-to-noise ratio

Then samples $n$ multivariate normal observations from $\mathcal{N}(0,S)$ and constructs the empirical covariance matrix $\Sigma$ (`Sn` in the code).

In [1]:
using Random, LinearAlgebra
p = 10 #Dimension
r = 2 #Number of sparse PCs
k = 4 #Sparsity of each PC
β = 1 #Signal strength


x1 = zeros(p); x1[1:k] = sign.(rand(k) .- .5)
x2 = zeros(p); x2[(k+1):(k+k)] = sign.(rand(k) .- .5)

# x1[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap)] = sign.(rand(k_overlap) .- .5)
# x2[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap_half)] = -x1[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap_half)]
# x2[(2*k_nonoverlapping+k_overlap_half+1):(2*k_nonoverlapping+k_overlap)] = x1[(2*k_nonoverlapping+k_overlap_half+1):(2*k_nonoverlapping+k_overlap)]

@assert sum(abs.(x1) .> 0) == k
@assert sum(abs.(x2) .> 0) == k
@assert abs(dot(x1,x2)) ≤ 1e-10

shufflecoords = randperm(p)
x1 = x1[shufflecoords]; x2=x2[shufflecoords] 

x1 /= norm(x1); x2 /= norm(x2) 

S = β*x1*x1'+β*x2*x2'+ Matrix(1.0*I, p, p)
S = (S + S')/2

10×10 Matrix{Float64}:
 1.0   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   0.0
 0.0   1.25   0.0    0.25  -0.25   0.25   0.0    0.0    0.0   0.0
 0.0   0.0    1.25   0.0    0.0    0.0   -0.25   0.25  -0.25  0.0
 0.0   0.25   0.0    1.25  -0.25   0.25   0.0    0.0    0.0   0.0
 0.0  -0.25   0.0   -0.25   1.25  -0.25   0.0    0.0    0.0   0.0
 0.0   0.25   0.0    0.25  -0.25   1.25   0.0    0.0    0.0   0.0
 0.0   0.0   -0.25   0.0    0.0    0.0    1.25  -0.25   0.25  0.0
 0.0   0.0    0.25   0.0    0.0    0.0   -0.25   1.25  -0.25  0.0
 0.0   0.0   -0.25   0.0    0.0    0.0    0.25  -0.25   1.25  0.0
 0.0   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   1.0
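
The structure of $S$ can be checked directly: because $x_1$ and $x_2$ are orthogonal unit vectors, both are eigenvectors of $S$ with eigenvalue $1 + \beta$, and every other direction has eigenvalue $1$. The sketch below rebuilds a matrix of the same form with fixed signs (an assumption made for reproducibility; the notebook draws them at random and shuffles the coordinates) and verifies these properties.

```julia
using LinearAlgebra

p, k, β = 10, 4, 1.0

# Two k-sparse unit vectors with disjoint supports (fixed signs, illustrative only)
x1 = zeros(p); x1[1:k] .= [1, -1, 1, 1] ./ sqrt(k)
x2 = zeros(p); x2[(k+1):(2k)] .= [-1, 1, 1, -1] ./ sqrt(k)

S = Matrix(1.0I, p, p) + β * x1 * x1' + β * x2 * x2'

# x1 and x2 are eigenvectors of S with eigenvalue 1 + β
@assert S * x1 ≈ (1 + β) * x1
@assert S * x2 ≈ (1 + β) * x2

# The spectrum is 1 (multiplicity p - 2) and 1 + β (multiplicity 2)
λ = eigvals(Symmetric(S))   # real eigenvalues, sorted in ascending order
@assert λ[1:p-2] ≈ ones(p - 2)
@assert λ[p-1:p] ≈ fill(1 + β, 2)

# Off-diagonal entries inside a support block have magnitude β / k
@assert maximum(abs.(S - Diagonal(S))) ≈ β / k
```

With $\beta = 1$ and $k = 4$ this gives the values $\pm 0.25$ off the diagonal and $1.25$ on it, matching the matrix printed above.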

In [2]:
n = 1000 #Sample size

Random.seed!(1234)

using Distributions
d = MvNormal(zeros(p), S)
X = rand(d, n) #p by n matrix of observations (one observation per column)

Sn = cov(X') #Sample covariance matrix

10×10 Matrix{Float64}:
  0.959347      0.0300463   …  -0.0139849   0.01331     0.0673849
  0.0300463     1.26725        -0.0996556  -0.0326079  -0.0730411
 -0.0275862    -0.0761811       0.297874   -0.268228    0.0234533
  0.0283325     0.259159       -0.0522333  -0.0160083  -0.0388555
 -0.0286119    -0.187683       -0.0115174   0.079388    0.00164511
  0.0195264     0.274873    …  -0.0579623  -0.0404154   0.0746167
  0.000424525  -0.00145207     -0.358957    0.216079    0.0221255
 -0.0139849    -0.0996556       1.35438    -0.274658   -0.0638955
  0.01331      -0.0326079      -0.274658    1.25414     0.0152739
  0.0673849    -0.0730411      -0.0638955   0.0152739   1.08632
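
The transpose in `cov(X')` matters: `cov` treats rows as observations, while `X` stores one observation per column. A minimal sketch of what that call computes is given below. To stay free of package dependencies it samples with `randn` and a Cholesky factor rather than `MvNormal`, and the small matrix `S` used here is a placeholder, not the notebook's matrix.

```julia
using Random, LinearAlgebra, Statistics

Random.seed!(1234)
p, n = 10, 1000

# Placeholder positive-definite covariance matrix (illustrative only)
S = Matrix(1.0I, p, p); S[1, 2] = S[2, 1] = 0.25

# Draw n observations from N(0, S) as columns: X = L * Z with S = L * L'
L = cholesky(Symmetric(S)).L
X = L * randn(p, n)          # p × n, one observation per column

Sn = cov(X')                 # cov expects observations in rows, hence the transpose

# Same quantity computed by hand: center each variable, then divide by n - 1
Xc = X .- mean(X, dims=2)
Sn_manual = Xc * Xc' / (n - 1)

@assert Sn ≈ Sn_manual
@assert Sn ≈ cov(X, dims=2)  # equivalent call without the transpose
```

The division by $n - 1$ corresponds to the default `corrected=true` behaviour of `cov`.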

In [3]:
# show(stdout, "text/plain", Sn)

In [4]:
# [x1 x2]

In [5]:
# [k, k]

#### Step 2: Julia benchmark

Applies the Julia implementation to the empirical covariance matrix $\Sigma$ (`Sn` in the code). The same call can be made with the true covariance matrix $S$ in place of `Sn`.

In [6]:
include("algorithm2.jl")

findmultPCs_deflation (generic function with 1 method)

In [14]:
ofv_best, violation_best, runtime, x_best = findmultPCs_deflation(Sn, r, [k,k]; numIters = 5, verbose = true, violation_tolerance = 1e-4 )

x_best

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern: [4, 4]

  Iteration |      Objective value |   Orthogonality Violation |       Time 
          1 |                0.354 |                  2.00e+00 |      0.027 


          2 |                0.354 |                  2.00e+00 |      0.043 
          3 |                0.354 |                  2.00e+00 |      0.064 
          4 |                0.354 |                  2.00e+00 |      0.089 
          5 |                0.341 |                  4.44e-16 |      0.104 


10×2 Matrix{Float64}:
  0.0        0.0
 -0.487808   0.0
  0.0       -0.48863
 -0.507242   0.0
  0.444067   0.0
 -0.554574   0.0
  0.0        0.523244
  0.0       -0.550406
  0.0        0.429546
  0.0        0.0
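
Before comparing against the C++ output, it can help to confirm that the returned loadings satisfy the constraints of the problem: each column has at most $k$ nonzero entries, has unit norm, and is orthogonal to the other. The sketch below hard-codes the `x_best` values printed above; the tolerance `1e-4` is an assumption chosen to absorb the six-digit print precision.

```julia
using LinearAlgebra

# Loadings copied from the Julia output above
x_best = [ 0.0        0.0
          -0.487808   0.0
           0.0       -0.48863
          -0.507242   0.0
           0.444067   0.0
          -0.554574   0.0
           0.0        0.523244
           0.0       -0.550406
           0.0        0.429546
           0.0        0.0 ]
k = 4

# Each column should have at most k nonzero entries
nnz_per_pc = vec(sum(x_best .!= 0, dims=1))
@assert all(nnz_per_pc .<= k)

# Each column should have unit Euclidean norm (up to the printed precision)
col_norms = [norm(c) for c in eachcol(x_best)]
@assert all(abs.(col_norms .- 1) .< 1e-4)

# Columns should be mutually orthogonal: the Gram matrix is close to the identity
G = x_best' * x_best
@assert norm(G - I) < 1e-4
```

Here the two supports are disjoint, so orthogonality holds exactly; the check becomes informative when the supports overlap.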

In [9]:
# show(stdout, "text/plain", x_best)

#### Step 3: R/C++ implementation

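Applies the R/C++ implementation (`cpp_findmultPCs_deflation` from the R package `sPCAmPC`, called through `RCall`) to the empirical covariance matrix $\Sigma$, rounded to three decimal places (`Sn2` in the code).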

In [8]:
using RCall

R"""library(sPCAmPC)"""
# R"""
# library(devtools)
# reload(pkg = "sPCAmPC", quiet = FALSE)"""


RObject{StrSxp}
[1] "sPCAmPC"   "stats"     "graphics"  "grDevices" "utils"     "datasets" 
[7] "methods"   "base"     


In [22]:
Sn2 = round.(Sn,digits=3)

10×10 Matrix{Float64}:
  0.959   0.03   -0.028   0.028  -0.029  …   0.0    -0.014   0.013   0.067
  0.03    1.267  -0.076   0.259  -0.188     -0.001  -0.1    -0.033  -0.073
 -0.028  -0.076   1.287  -0.01    0.009     -0.295   0.298  -0.268   0.023
  0.028   0.259  -0.01    1.235  -0.252     -0.039  -0.052  -0.016  -0.039
 -0.029  -0.188   0.009  -0.252   1.201      0.058  -0.012   0.079   0.002
  0.02    0.275  -0.036   0.288  -0.26   …  -0.033  -0.058  -0.04    0.075
  0.0    -0.001  -0.295  -0.039   0.058      1.344  -0.359   0.216   0.022
 -0.014  -0.1     0.298  -0.052  -0.012     -0.359   1.354  -0.275  -0.064
  0.013  -0.033  -0.268  -0.016   0.079      0.216  -0.275   1.254   0.015
  0.067  -0.073   0.023  -0.039   0.002      0.022  -0.064   0.015   1.086

In [24]:
R"""

TestMat <- $Sn2 

results <- cpp_findmultPCs_deflation(TestMat, 2, c(4, 4), numIters=7)
"""

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern:  4 4

  Iteration |      Objective value |   Orthogonality Violation |       Time
          1 |                0.354 |                  2.00e+00 |      0.019
          2 |                0.354 |                  2.00e+00 |      0.036
          3 |                0.354 |                  2.00e+00 |      0.051
          4 |                0.354 |                  2.00e+00 |      0.066
          5 |                0.354 |                  2.00e+00 |      0.080
          6 |                0.317 |                  9.85e-01 |      0.097
          7 |                0.341 |                  2.22e-16 |      0.111


RObject{VecSxp}
$objective_value
[1] 4.194444

$orthogonality_violation
[1] 2.220446e-16

$runtime
[1] 0.111

$x_best
            [,1]       [,2]
 [1,]  0.0000000  0.0000000
 [2,]  0.4878160  0.0000000
 [3,]  0.0000000 -0.4885468
 [4,]  0.5069928  0.0000000
 [5,] -0.4440590  0.0000000
 [6,]  0.5548021  0.0000000
 [7,]  0.0000000  0.5232810
 [8,]  0.0000000 -0.5504493
 [9,]  0.0000000  0.4295400
[10,]  0.0000000  0.0000000
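
The two runs can be compared numerically rather than by eye. Principal components are only defined up to sign, so the sketch below flips each C++ column to match its Julia counterpart before taking differences. It hard-codes the loadings printed by the Julia run (`numIters = 5` on `Sn`) and by the R/C++ run (`numIters=7` on the rounded `Sn2`), so agreement is expected only up to roughly the rounding level. The helper `align_signs` is hypothetical and assumes that column `j` of one result corresponds to column `j` of the other.

```julia
using LinearAlgebra

# Loadings copied from the Julia run above
x_julia = [ 0.0        0.0
           -0.487808   0.0
            0.0       -0.48863
           -0.507242   0.0
            0.444067   0.0
           -0.554574   0.0
            0.0        0.523244
            0.0       -0.550406
            0.0        0.429546
            0.0        0.0 ]

# Loadings copied from the R/C++ run above
x_cpp = [ 0.0         0.0
          0.4878160   0.0
          0.0        -0.4885468
          0.5069928   0.0
         -0.4440590   0.0
          0.5548021   0.0
          0.0         0.5232810
          0.0        -0.5504493
          0.0         0.4295400
          0.0         0.0 ]

# Hypothetical helper: flip the sign of each column of B so that it
# has a non-negative inner product with the matching column of A
function align_signs(A, B)
    B_aligned = copy(B)
    for j in axes(B, 2)
        if dot(A[:, j], B[:, j]) < 0
            B_aligned[:, j] .*= -1
        end
    end
    return B_aligned
end

x_cpp_aligned = align_signs(x_julia, x_cpp)

# Compare supports and entries after sign alignment
same_support = (x_julia .!= 0) == (x_cpp .!= 0)
max_abs_diff = maximum(abs.(x_julia - x_cpp_aligned))
```

A support mismatch would point to a genuine discrepancy between the implementations, whereas small entrywise differences are consistent with the rounded input.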



In [12]:
x_best

10×2 Matrix{Float64}:
 0.0        0.0
 0.0        0.448173
 0.519934   0.0
 0.502909   0.0
 0.462533   0.0
 0.512653   0.0
 0.0       -0.562901
 0.0        0.0
 0.0        0.507562
 0.0        0.473987

In [13]:
[x1 x2]

10×2 Matrix{Float64}:
 -0.5   0.0
  0.0  -0.5
  0.0   0.5
  0.0  -0.5
  0.0   0.0
 -0.5   0.0
  0.0  -0.5
  0.0   0.0
  0.5   0.0
 -0.5   0.0