#### Test Pipeline
This Julia notebook checks the correctness of the C++ implementation by comparing it with the original Julia code.

The notebook proceeds as follows: 
- Generate a synthetic instance
- Run the original implementation of the algorithm (pure Julia)
- Run the C++ implementation of the algorithm (called from R through RCall)

#### Step 1: Data Generation

Generates a ground-truth covariance matrix $S = I + \beta x_1 x_1^\top + \beta x_2 x_2^\top$, where 
- $x_1, x_2$ are $k$-sparse unit vectors with non-overlapping supports
- $\beta$ controls the signal-to-noise ratio

Then samples $n$ multivariate normal observations from $\mathcal{N}(0,S)$ and constructs the empirical covariance matrix $\Sigma$ (named `Sn` in the code).
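
As a quick sanity check on this construction, the sketch below uses a hypothetical toy instance with fixed, unshuffled supports (not the random ones generated in the next cell). It illustrates that when $x_1, x_2$ are orthonormal, $S$ has eigenvalue $1+\beta$ with multiplicity two and eigenvalue $1$ otherwise, so $x_1$ and $x_2$ span the leading eigenspace.

```julia
using LinearAlgebra

p, k, β = 10, 4, 1.0

# Toy sparse directions with disjoint supports, normalized to unit length
# (assumption: deterministic entries instead of random signs).
x1 = zeros(p); x1[1:k] .= 1 / sqrt(k)
x2 = zeros(p); x2[(k+1):(2k)] .= 1 / sqrt(k)

S = Matrix(1.0I, p, p) + β * x1 * x1' + β * x2 * x2'

# Eigenvalues of a Symmetric matrix are returned in ascending order.
λ = eigvals(Symmetric(S))
println(λ)
```

The two largest entries of `λ` should equal `1 + β`, and the remaining ones should equal `1`.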

In [1]:
using Random, LinearAlgebra
p = 10 #Dimension
r = 2 #Number of sparse PCs
k = 4 #Sparsity of each PC
β = 1 #Signal strength


x1 = zeros(p); x1[1:k] = sign.(rand(k) .- .5)
x2 = zeros(p); x2[(k+1):(k+k)] = sign.(rand(k) .- .5)

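# Disabled variant with partially overlapping supports (k_nonoverlapping, k_overlap, k_overlap_half are not defined here):
# x1[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap)] = sign.(rand(k_overlap) .- .5)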
# x2[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap_half)] = -x1[(2*k_nonoverlapping+1):(2*k_nonoverlapping+k_overlap_half)]
# x2[(2*k_nonoverlapping+k_overlap_half+1):(2*k_nonoverlapping+k_overlap)] = x1[(2*k_nonoverlapping+k_overlap_half+1):(2*k_nonoverlapping+k_overlap)]

@assert sum(abs.(x1) .> 0) == k
@assert sum(abs.(x2) .> 0) == k
@assert abs(dot(x1,x2)) ≤ 1e-10

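# Apply the same random permutation to both vectors so the supports stay disjoint
shufflecoords = randperm(p)
x1 = x1[shufflecoords]; x2 = x2[shufflecoords]

# Normalize to unit length
x1 /= norm(x1); x2 /= norm(x2)

# Ground-truth covariance; symmetrize to remove floating-point asymmetry
S = β*x1*x1' + β*x2*x2' + Matrix(1.0*I, p, p)
S = (S + S')/2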

10×10 Matrix{Float64}:
 1.0   0.0    0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0
 0.0   1.25   0.0    0.0    0.0    0.0   0.0   0.25  -0.25   0.25
 0.0   0.0    1.25   0.25  -0.25   0.25  0.0   0.0    0.0    0.0
 0.0   0.0    0.25   1.25  -0.25   0.25  0.0   0.0    0.0    0.0
 0.0   0.0   -0.25  -0.25   1.25  -0.25  0.0   0.0    0.0    0.0
 0.0   0.0    0.25   0.25  -0.25   1.25  0.0   0.0    0.0    0.0
 0.0   0.0    0.0    0.0    0.0    0.0   1.0   0.0    0.0    0.0
 0.0   0.25   0.0    0.0    0.0    0.0   0.0   1.25  -0.25   0.25
 0.0  -0.25   0.0    0.0    0.0    0.0   0.0  -0.25   1.25  -0.25
 0.0   0.25   0.0    0.0    0.0    0.0   0.0   0.25  -0.25   1.25

In [2]:
n = 1000 #Sample size

using Distributions
d = MvNormal(zeros(p), S)
X = rand(d, n) # p-by-n matrix of observations (one observation per column)

Sn = cov(X') # Sample covariance matrix (cov expects observations in rows, hence the transpose)

10×10 Matrix{Float64}:
  0.974057     0.0512088   …   0.0308505  -0.00491447   0.0180792
  0.0512088    1.21982         0.291571   -0.278138     0.273366
  0.00346936  -0.00424117     -0.0403878  -0.0433718   -0.0146031
 -0.0265371    0.0235388      -0.058609    0.0980771   -0.034894
  0.0403832   -0.014691       -0.0121609   0.0368603   -0.0393298
  0.0208152   -0.0421139   …   0.0402191  -0.0590935   -0.0307827
  0.0161977   -0.0580555       0.0410501   0.00719067  -0.0253777
  0.0308505    0.291571        1.23227    -0.291135     0.236669
 -0.00491447  -0.278138       -0.291135    1.22441     -0.329682
  0.0180792    0.273366        0.236669   -0.329682     1.23266
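
The sample covariance in the cell above relies on `cov` treating rows as observations, which is why `X` is transposed. As a minimal sketch on a small hypothetical random matrix (not the notebook's data), the same quantity can be written out explicitly as the centered cross-product divided by $n-1$:

```julia
using Random, Statistics, LinearAlgebra

Random.seed!(1)
p, n = 3, 50
X = randn(p, n)                 # one observation per column, as in the notebook

Sn = cov(X')                    # rows of X' are observations

# Manual computation: center each variable, then form the cross-product.
Xc = X .- mean(X, dims = 2)
Sn_manual = Xc * Xc' / (n - 1)

println(maximum(abs.(Sn - Sn_manual)))
```

`cov(X, dims = 2)` gives the same result without the explicit transpose.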

In [4]:
show(stdout, "text/plain", Sn)

10×10 Matrix{Float64}:
  0.974057     0.0512088    0.00346936  -0.0265371    0.0403832   0.0208152   0.0161977    0.0308505  -0.00491447   0.0180792
  0.0512088    1.21982     -0.00424117   0.0235388   -0.014691   -0.0421139  -0.0580555    0.291571   -0.278138     0.273366
  0.00346936  -0.00424117   1.20709      0.221238    -0.198467    0.244141    0.0360966   -0.0403878  -0.0433718   -0.0146031
 -0.0265371    0.0235388    0.221238     1.27112     -0.167242    0.191913    0.00553857  -0.058609    0.0980771   -0.034894
  0.0403832   -0.014691    -0.198467    -0.167242     1.28058    -0.200952   -0.0247817   -0.0121609   0.0368603   -0.0393298
  0.0208152   -0.0421139    0.244141     0.191913    -0.200952    1.18369     0.0202325    0.0402191  -0.0590935   -0.0307827
  0.0161977   -0.0580555    0.0360966    0.00553857  -0.0247817   0.0202325   0.940241     0.0410501   0.00719067  -0.0253777
  0.0308505    0.291571    -0.0403878   -0.058609    -0.0121609   0.0402191   0.0410501    1.2322

In [5]:
[x1 x2]

10×2 Matrix{Float64}:
  0.0   0.0
  0.5   0.0
  0.0  -0.5
  0.0  -0.5
  0.0   0.5
  0.0  -0.5
  0.0   0.0
  0.5   0.0
 -0.5   0.0
  0.5   0.0

In [6]:
[k, k]

2-element Vector{Int64}:
 4
 4

#### Step 2: Julia benchmark

Applies the Julia code to $S$ (true covariance matrix) and $\Sigma$ (empirical covariance matrix, `Sn` in the code)

In [8]:
include("algorithm2.jl")

findmultPCs_deflation (generic function with 1 method)

In [9]:
ofv_best, violation_best, runtime, x_best = findmultPCs_deflation(S, r, [k,k]; numIters = 20, verbose = true, violation_tolerance = 1e-4 )

x_best

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern: [4, 4]

  Iteration |      Objective value |   Orthogonality Violation |       Time 
          1 |                0.333 |                  2.00e+00 |      1.847 
          2 |                0.333 |                  1.00e-07 |      2.280 


10×2 Matrix{Float64}:
  0.0   0.0
 -0.5   0.0
  0.0  -0.5
  0.0  -0.5
  0.0   0.5
  0.0  -0.5
  0.0   0.0
 -0.5   0.0
  0.5   0.0
 -0.5   0.0
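
The objective value of 0.333 logged above is numerically consistent with the fraction of total variance explained by the two components, $\sum_i x_i^\top S x_i / \operatorname{tr}(S) = 4/12$. This reading is inferred from the numbers only and is an assumption, not something confirmed by `algorithm2.jl`. A minimal sketch of that computation, using the ground-truth vectors printed earlier and a hypothetical helper `explained_variance`:

```julia
using LinearAlgebra

# Ground-truth sparse directions as displayed by [x1 x2] earlier in the notebook.
x1 = [0, 0.5, 0, 0, 0, 0, 0, 0.5, -0.5, 0.5]
x2 = [0, 0, -0.5, -0.5, 0.5, -0.5, 0, 0, 0, 0]
β = 1.0
S = Matrix(1.0I, 10, 10) + β * x1 * x1' + β * x2 * x2'

# Hypothetical helper (assumed definition of the objective):
# variance captured by the columns of U, relative to the total variance tr(S).
explained_variance(S, U) = tr(U' * S * U) / tr(S)

fve = explained_variance(S, [x1 x2])
println(round(fve, digits = 3))
```

The value does not change if a column is multiplied by $-1$, which is why the sign-flipped first column of `x_best` yields the same objective.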

In [10]:
ofv_best, violation_best, runtime, x_best = findmultPCs_deflation(Sn, r, [k,k]; numIters = 20, verbose = true, violation_tolerance = 1e-4 )

x_best

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern: [4, 4]

  Iteration |      Objective value |   Orthogonality Violation |       Time 
          1 |                0.353 |                  2.00e+00 |      0.026 
          2 |                0.353 |                  2.00e+00 |      0.064 
          3 |                0.353 |                  2.00e+00 |      0.090 
          4 |                0.353 |                  2.00e+00 |      0.127 
          5 |                0.353 |                  2.00e+00 |      0.163 
          6 |                0.353 |                  2.00e+00 |      0.188 
          7 |                0.353 |                  2.00e+00 |      0.224 
          8 |                0.353 |                  2.00e+00 |      0.250 
          9 |                0.353 |                  2.00e+00 |      0.285 
         10 |                0.334 |                  1.00e-07 |      0.320 
         11 |                0.334 |                  1.00e-07 |      0.345 
         12 |                0.334 |                  1.00e-07 |      0.380 
         13 |                0.334 |                  1.00e-07 |      0.406 


10×2 Matrix{Float64}:
  0.0        0.0
  0.0        0.488698
  0.514884   0.0
  0.499587   0.0
 -0.500104   0.0
  0.484977   0.0
  0.0        0.0
  0.0        0.488606
  0.0       -0.523758
  0.0        0.498112

In [11]:
show(stdout, "text/plain", x_best)

10×2 Matrix{Float64}:
  0.0        0.0
  0.0        0.488698
  0.514884   0.0
  0.499587   0.0
 -0.500104   0.0
  0.484977   0.0
  0.0        0.0
  0.0        0.488606
  0.0       -0.523758
  0.0        0.498112

#### Step 3: R/C++ implementation

Applies the R/C++ code to $S$ (true covariance matrix) and $\Sigma$ (empirical covariance matrix, `Sn` in the code)

In [12]:
using RCall

In [11]:
R"""
install.packages('devtools', repos='http://cran.us.r-project.org', dependencies=TRUE)
"""


The downloaded binary packages are in
	/var/folders/tw/x4vcf7js2pdbgpn_tmnglgj40000gp/T//RtmpvkmsCb/downloaded_packages


│ Content type 'application/x-gzip' length 430872 bytes (420 KB)
│ downloaded 420 KB
│ 
└ @ RCall /Users/jeanpauphilet/.julia/packages/RCall/aK5sD/src/io.jl:172


RObject{StrSxp}
[1] "devtools"  "usethis"   "stats"     "graphics"  "grDevices" "utils"    
[7] "datasets"  "methods"   "base"     


In [16]:
R"""
library(devtools)
install_github('jeanpauphilet/sPCAmPC/R', auth_token ="XXX")
"""

RCall.REvalError: REvalError: Error in install_github("jeanpauphilet/sPCAmPC/R", auth_token = "XXX") : 
  could not find function "install_github"

In [24]:
# R"""library(sPCAmPC)"""
R"""
library(devtools)
reload(pkg = "sPCAmPC", quiet = FALSE)"""


RCall.REvalError: REvalError: Error in `package_file()`:
! 'sPCAmPC/' is not a directory.
Backtrace:
    ▆
 1. └─devtools::reload(pkg = "sPCAmPC/", quiet = FALSE)
 2.   └─devtools::as.package(pkg)
 3.     └─devtools::package_file(path = x)
 4.       └─cli::cli_abort("{.path {path}} is not a directory.")
 5.         └─rlang::abort(...)

In [20]:
R"""

TestMat <- $S 

TestKS <- matrix( c(4, 4), nrow = 1, ncol = 2, byrow = TRUE)

cpp_findmultPCs_deflation(TestMat, 2, TestKS, numIters=20)
"""

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern:  4 4

  Iteration |      Objective value |   Orthogonality Violation |       Time
          1 |                 0.25 |                  2.00e+00 |      0.144
          2 |                0.292 |                  1.00e-07 |      0.280
          3 |                0.292 |                  1.00e-07 |      0.401
          4 |                 0.25 |                  1.00e-07 |      0.597
          5 |                 0.25 |                  1.00e-07 |      0.716
          6 |                 0.25 |                  1.00e-07 |      0.854
          7 |                 0.25 |                  1.00e-07 |      0.957
          8 |                 0.25 |                  1.00e-07 |      1.121
          9 |                 0.25 |                  1.00e-07 |      1.273
         10 |                 0.25 |                  1.00e-07 |      1.452
         11 |                 0.25 | 


         13 |                 0.25 |                  1.00e-07 |      1.859
         14 |                 0.25 |                  1.00e-07 |      1.998
         15 |                 0.25 |                  1.00e-07 |      2.195
         16 |                 0.25 |                  1.00e-07 |      2.322
         17 |                 0.25 |                  1.00e-07 |      2.472
         18 |                 0.25 |                  1.00e-07 |      2.677
         19 |                 0.25 |                  1.00e-07 |      2.900
         20 |                 0.25 |                  1.00e-07 |      3.155


RObject{RealSxp}
           [,1]      [,2]
 [1,] 0.0000000 0.0000000
 [2,] 0.5773653 0.0000000
 [3,] 0.0000000 0.5773542
 [4,] 0.0000000 0.5773541
 [5,] 0.0000000 0.0000000
 [6,] 0.0000000 0.5773426
 [7,] 0.0000000 0.0000000
 [8,] 0.5773648 0.0000000
 [9,] 0.0000000 0.0000000
[10,] 0.5773207 0.0000000


In [21]:
R"""

TestMat <- $S 

cpp_findmultPCs_deflation(TestMat, 2, c(4, 4), numIters=20)
"""

RCall.REvalError: REvalError: Error: Not a matrix.
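
The call above fails because the sparsity pattern is passed as a plain vector `c(4, 4)`, whereas the working call in the previous cell passes a 1×2 matrix. One possible alternative is to build that matrix on the Julia side. The sketch below assumes that RCall converts a Julia `Matrix{Float64}` into an R numeric matrix when it is interpolated with `$`; the name `ks` is hypothetical and this path was not run in this notebook.

```julia
k = 4

# reshape turns the length-2 vector into a 1×2 matrix;
# converting to Float64 mirrors R's default numeric type for c(4, 4).
ks = Float64.(reshape([k, k], 1, :))

println(size(ks))
```

Under that assumption, `$ks` could replace `TestKS` inside the `R"""..."""` block.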

In [18]:
R"""

TestMat <- $Sn 

TestKS <- matrix( c(4, 4), nrow = 1, ncol = 2, byrow = TRUE)

cpp_findmultPCs_deflation(TestMat, 2, TestKS, numIters=20)
"""

---- Iterative deflation algorithm for sparse PCA with multiple PCs ---
Dimension: 10
Number of PCs: 2
Sparsity pattern:  4 4

  Iteration |      Objective value |   Orthogonality Violation |       Time
          1 |                0.353 |                  2.00e+00 |      0.105
          2 |                0.353 |                  2.00e+00 |      0.207
          3 |                0.353 |                  2.00e+00 |      0.312
          4 |                0.353 |                  2.00e+00 |      0.415
          5 |                0.353 |                  2.00e+00 |      0.518
          6 |                0.353 |                  2.00e+00 |      0.622
          7 |                0.353 |                  2.00e+00 |      0.727
          8 |                0.353 |                  2.00e+00 |      0.832
          9 |                0.353 |                  2.00e+00 |      0.940
         10 |                0.353 |                  2.00e+00 |      1.046
         11 |                0.334 |                  1.00e-07 |      1.153
         12 |                0.334 |                  1.00e-07 |      1.259
         13 |                0.334 |                  1.00e-07 |      1.366
         14 |                0.334 |                  1.00e-07 |      1.470


RObject{RealSxp}
            [,1]       [,2]
 [1,]  0.0000000  0.0000000
 [2,]  0.0000000  0.4929815
 [3,] -0.5128363  0.0000000
 [4,] -0.5027264  0.0000000
 [5,]  0.4992676  0.0000000
 [6,] -0.4847648  0.0000000
 [7,]  0.0000000  0.0000000
 [8,]  0.0000000  0.4882265
 [9,]  0.0000000 -0.5195989
[10,]  0.0000000  0.4986192


In [19]:
[x1 x2]

10×2 Matrix{Float64}:
  0.0   0.5
  0.0   0.0
  0.0  -0.5
  0.5   0.0
  0.0   0.0
  0.5   0.0
 -0.5   0.0
  0.0   0.5
  0.0  -0.5
  0.5   0.0
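
Sparse principal components are only defined up to sign and column order, so the Julia and C++ outputs cannot be compared entry by entry. A minimal sketch of one way to compare them, using the two `Sn` results printed above (copied by hand) and a hypothetical helper `best_abs_cosines`:

```julia
using LinearAlgebra

# Julia result for Sn (x_best) and C++ result for Sn, as printed in this notebook.
x_julia = [
     0.0        0.0
     0.0        0.488698
     0.514884   0.0
     0.499587   0.0
    -0.500104   0.0
     0.484977   0.0
     0.0        0.0
     0.0        0.488606
     0.0       -0.523758
     0.0        0.498112
]
x_cpp = [
     0.0        0.0
     0.0        0.4929815
    -0.5128363  0.0
    -0.5027264  0.0
     0.4992676  0.0
    -0.4847648  0.0
     0.0        0.0
     0.0        0.4882265
     0.0       -0.5195989
     0.0        0.4986192
]

# Hypothetical helper: for each column of A, the largest absolute cosine
# with any column of B. Values near 1 mean the same direction up to sign.
function best_abs_cosines(A, B)
    na = [norm(c) for c in eachcol(A)]
    nb = [norm(c) for c in eachcol(B)]
    C = abs.(A' * B) ./ (na * nb')
    return vec(maximum(C, dims = 2))
end

cosines = best_abs_cosines(x_julia, x_cpp)
same_support = (x_julia .!= 0) == (x_cpp .!= 0)
println(cosines, "  supports match: ", same_support)
```

The same helper could be applied to the results obtained on `S`, provided both runs use the same instance.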