# Simulation Study

In this notebook, we reproduce the simulation study for the configurations stated below. First, we load the required R scripts and libraries.

In [1]:
# scripts and libraries
remove(list = ls())
options(warn = -1)
suppressMessages(library(dplyr))
suppressMessages(library(data.table))
suppressMessages(library(xtable))

source("../source/simulations.R")
source("../source/vectorial_methods.R")
source("../source/auxiliar_methods.R")


## Low Dimensional Setting

The configuration for the low-dimensional case is:

In [2]:
m <- 11
r_values <- c(10,9,8)
i_values <- matrix(c(11,0,10,0,9,1,8,2),nrow=4,ncol=2,byrow = TRUE)
rownames(i_values) <- c('Case 1','Case 2','Case 3','Case 4')
Tt <- 100 # series length
S <- 500 # number of simulations
persistence <- "low" ; dist <- "t" # persistence and innovation process distribution
dependence <- TRUE
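
To make the layout of `i_values` concrete, the following minimal sketch rebuilds the matrix and lists the (case, r) combinations it implies. The column labels `i1` and `i2` are an assumption (they are not set in the cell above; the names are borrowed from the columns dropped later in the notebook), and the `expand.grid` enumeration is only illustrative: `run_simulation` may iterate over the configurations differently.

```r
# Rebuild the case matrix as in the configuration cell:
# byrow = TRUE fills one row (one case) at a time
i_values <- matrix(c(11, 0, 10, 0, 9, 1, 8, 2), nrow = 4, ncol = 2, byrow = TRUE)
rownames(i_values) <- c("Case 1", "Case 2", "Case 3", "Case 4")
colnames(i_values) <- c("i1", "i2")  # assumed labels, for readability only
r_values <- c(10, 9, 8)

print(i_values)

# Hypothetical enumeration of every (case, r) pair in the study
grid <- expand.grid(case = rownames(i_values), r = r_values,
                    stringsAsFactors = FALSE)
print(nrow(grid))
```

With four cases and three values of `r`, the grid has one row per combination, which matches the three values of `r` reported for each case in the tables below.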

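We run the simulation. Here the `run_simulation` call is commented out and previously saved results are loaded from a CSV file instead.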

In [3]:
# df_low_dimension <- run_simulation(c(1,1),m,r_values,i_values,Tt,S,
#                     dist = dist, persistence = persistence, dependence = dependence)
df_low_dimension <- read.csv("../databases/simulations_low_dimension.csv")

Then, we preprocess the output to display the results in a LaTeX table.

In [4]:
dt_low_dimension <- as.data.table(df_low_dimension)
dt_low_dimension[, mean_n_coint := sprintf("%.3f (%.3f)", mean_n_coint, sd_n_coint)]
dt_low_dimension[, mean_n_norms := sprintf("%.3f (%.3f)", mean_n_norms, sd_n_norms)]

# Remove SD columns as they are now embedded
dt_low_dimension[, c("sd_n_coint", "sd_n_norms") := NULL]
dt_low_dimension[, c("i1", "i2") := NULL]
setnames(dt_low_dimension, old = c("mean_n_coint","mean_n_norms"),
                     new = c("Dimension", "Subspace"))
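
The preprocessing step can be illustrated on a toy table. This is a minimal sketch, assuming the simulation output has one row per method and value of `r`; the numbers in `dt_toy` are made up for illustration and are not results from the study.

```r
library(data.table)

# Toy stand-in for the simulation output (hypothetical values)
dt_toy <- data.table(
  Method = c("PCA", "PLS"),
  r = c(10L, 10L),
  mean_n_norms = c(0.9134, 0.9221),
  sd_n_norms = c(0.5004, 0.5079)
)

# sprintf is vectorised, so each row becomes a "mean (sd)" string;
# assigning to the whole column with := replaces it, changing its type
dt_toy[, mean_n_norms := sprintf("%.3f (%.3f)", mean_n_norms, sd_n_norms)]

# The standard deviation is now embedded in the string, so drop its column
dt_toy[, sd_n_norms := NULL]
setnames(dt_toy, old = "mean_n_norms", new = "Subspace")

print(dt_toy)
```

Because the formatted column is a character vector, `xtable` prints it verbatim, so the `digits` argument only affects the remaining numeric columns.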

We print the LaTeX tables presented in the article, one per case.

In [5]:
# generate tables for each case 
cases <- rev(unique(dt_low_dimension[,Case]))
for(case in cases){
    dt_aux <- dt_low_dimension[Case == case]
    # drop the columns that are not shown in the table
    dt_aux[, c("Case", "m", "Dimension", "X") := NULL]
    dt_aux <- dt_aux[order(r,decreasing = TRUE)]
    print(xtable(dt_aux, caption = case, digits = 0), include.rownames = FALSE)
}

% latex table generated in R 4.1.2 by xtable 1.8-4 package
% Wed May 21 13:56:57 2025
\begin{table}[ht]
\centering
\begin{tabular}{lrl}
  \hline
Method & r & Subspace \\ 
  \hline
Johansen & 10 & 3.808 (0.532) \\ 
  PCA & 10 & 0.913 (0.500) \\ 
  PLS & 10 & 0.922 (0.508) \\ 
  Johansen & 9 & 3.540 (0.511) \\ 
  PCA & 9 & 1.288 (0.521) \\ 
  PLS & 9 & 1.197 (0.553) \\ 
  Johansen & 8 & 3.401 (0.423) \\ 
  PCA & 8 & 1.904 (0.348) \\ 
  PLS & 8 & 1.819 (0.395) \\ 
   \hline
\end{tabular}
\caption{Case 1} 
\end{table}
% latex table generated in R 4.1.2 by xtable 1.8-4 package
% Wed May 21 13:56:57 2025
\begin{table}[ht]
\centering
\begin{tabular}{lrl}
  \hline
Method & r & Subspace \\ 
  \hline
Johansen & 10 & 3.769 (0.499) \\ 
  PCA & 10 & 1.047 (0.516) \\ 
  PLS & 10 & 0.994 (0.480) \\ 
  Johansen & 9 & 3.560 (0.490) \\ 
  PCA & 9 & 1.355 (0.504) \\ 
  PLS & 9 & 1.225 (0.547) \\ 
  Johansen & 8 & 3.371 (0.408) \\ 
  PCA & 8 & 1.935 (0.347) \\ 
  PLS & 8 & 1.863 (0.367) \\ 
   \hline
\end

## High Dimensional Setting

Now, we set the cases for the high-dimensional setting:

In [6]:
m <- 300
r_values <- c(250,200,150)
i_values <- matrix(c(300,0,250,10,200,20,150,30),nrow=4,ncol=2,byrow = TRUE)
rownames(i_values) <- c('Case 1','Case 2','Case 3','Case 4')
Tt <- 100 # series length
S <- 100 # number of simulations
persistence <- "low" ; dist <- "t" # persistence and innovation process distribution
dependence <- TRUE

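We run the simulation. As in the low-dimensional case, the `run_simulation` call is commented out and previously saved results are loaded from a CSV file instead.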

In [10]:
# df_high_dimension <- run_simulation(c(1,1),m,r_values,i_values,Tt,S,
#                     dist = dist,persistence = persistence, dependence = dependence)
df_high_dimension <- read.csv("../databases/simulations_high_dimension.csv")

We preprocess the output in the same way as for the low-dimensional setting.

In [11]:
dt_high_dimension <- as.data.table(df_high_dimension)
dt_high_dimension[, mean_n_coint := sprintf("%.3f (%.3f)", mean_n_coint, sd_n_coint)]
dt_high_dimension[, mean_n_norms := sprintf("%.3f (%.3f)", mean_n_norms, sd_n_norms)]

# Remove SD columns as they are now embedded
dt_high_dimension[, c("sd_n_coint", "sd_n_norms") := NULL]
dt_high_dimension[, c("i1", "i2") := NULL]
setnames(dt_high_dimension, old = c("mean_n_coint","mean_n_norms"),
                     new = c("Dimension", "Subspace"))

Finally, we generate the tables, one per case.

In [12]:
# generate tables for each case 
cases <- rev(unique(dt_high_dimension[,Case]))
for(case in cases){
    dt_aux <- dt_high_dimension[Case == case]
    # drop the columns that are not shown in the table
    dt_aux[, c("Case", "m", "Dimension", "X") := NULL]
    dt_aux <- dt_aux[order(r,decreasing = TRUE)]
    print(xtable(dt_aux, caption = case), include.rownames = FALSE)
}

% latex table generated in R 4.1.2 by xtable 1.8-4 package
% Wed May 21 14:06:31 2025
\begin{table}[ht]
\centering
\begin{tabular}{lrl}
  \hline
Method & r & Subspace \\ 
  \hline
PCA & 250 & 10.811 (0.075) \\ 
  PLS & 250 & 15.902 (0.071) \\ 
  PCA & 200 & 15.538 (0.028) \\ 
  PLS & 200 & 18.147 (0.039) \\ 
  PCA & 150 & 19.100 (0.024) \\ 
  PLS & 150 & 20.364 (0.027) \\ 
   \hline
\end{tabular}
\caption{Case 1} 
\end{table}
% latex table generated in R 4.1.2 by xtable 1.8-4 package
% Wed May 21 14:06:31 2025
\begin{table}[ht]
\centering
\begin{tabular}{lrl}
  \hline
Method & r & Subspace \\ 
  \hline
PCA & 250 & 10.648 (0.137) \\ 
  PLS & 250 & 15.653 (0.070) \\ 
  PCA & 200 & 15.481 (0.075) \\ 
  PLS & 200 & 17.947 (0.034) \\ 
  PCA & 150 & 19.057 (0.056) \\ 
  PLS & 150 & 20.221 (0.025) \\ 
   \hline
\end{tabular}
\caption{Case 2} 
\end{table}
% latex table generated in R 4.1.2 by xtable 1.8-4 package
% Wed May 21 14:06:31 2025
\begin{table}[ht]
\centering
\begin{tabular}{lrl}
  \h