This repository contains the code to implement all examples in our paper: An Assumption-Free Exact Test For Fixed-Design Linear Models With Exchangeable Errors.
The folder code/ contains all R files for the implementation and bash files for submitting the jobs to a cluster running Slurm. Reproducing all experimental results in the paper takes ~20000 CPU hours. I used 256 cores for ~7 days.
CPT.R contains the implementation of the cyclic permutation test. It depends on the "gaoptim" package for the genetic algorithm, which was removed from CRAN on 2018-06-17. We keep it because it performs better than other CRAN packages for our purpose. The following code can be used to install the package from the CRAN archive.
devtools::install_url("https://cran.r-project.org/src/contrib/Archive/gaoptim/gaoptim_1.1.tar.gz")
CPT_expr_prepareX.R contains the template to search for a good ordering for CPT using the genetic algorithm. It takes four external inputs: Xdist for the type of design matrix, ratio for the ratio between n and p, ninds for the number of indices to be tested, and seed for the random seed. The output includes the design matrix (X), a weak ordering obtained by running the genetic algorithm with 1000 samples (ordering1), and a strong ordering obtained by running the genetic algorithm with 10000 samples (ordering). To reproduce the matrices used in the paper, run the following code in a shell.
cd CPT/code
./gen_jobs_prepareX.sh params_prepareX.txt
Each row in the file params_prepareX.txt contains the four external inputs, namely Xdist, ratio, ninds and seed, corresponding to one experiment. For each row of params_prepareX.txt, the bash file gen_jobs_prepareX.sh automatically generates a bash file in the results/ folder that calls CPT_expr_prepareX.R, submits it to a core, and saves the output in the data/ folder. To reduce the number of experiments or add other experiments, the simplest way is to create another .txt file that mimics params_prepareX.txt, list the desired experimental setups in it, and run ./gen_jobs_prepareX.sh NEW_TXT_FILE.txt.
CPT_expr.R contains the code template for all numerical experiments in Section 3 and Appendix A. It takes five external inputs: Xdist for the type of design matrix, epsdist for the error distribution, ratio for the ratio between n and p, ninds for the number of indices to be tested, and seed for the random seed. For given inputs, it requires CPT_expr_prepareX.R to have been run at least once with the same Xdist, ratio, ninds and seed so that its output is stored in the data/ folder; otherwise the job will halt. The output of CPT_expr.R is a data frame containing the inputs as well as the relative signal strength and the power for each method. To reproduce the results in Section 3 and Appendix A, first run the following code in a shell.
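As an illustration of creating a reduced parameter file, the sketch below filters a stand-in parameter file down to the rows with a single seed. It is only a sketch: the whitespace-separated layout, the column order (Xdist ratio ninds seed), the example values, and the names cpt_params_demo and params_small.txt are assumptions made for the example, not taken from the repository.

```shell
set -eu

# Work in a scratch directory so the real params_prepareX.txt is untouched.
demo_dir="cpt_params_demo"
mkdir -p "$demo_dir"

# Stand-in for params_prepareX.txt. The values and the whitespace-separated
# column order (Xdist ratio ninds seed) are assumptions for illustration.
cat > "$demo_dir/params_prepareX.txt" <<'EOF'
normal 25 10 1
normal 25 10 2
t1 25 10 1
t1 25 10 2
EOF

# Keep only the rows whose seed (4th column) equals 1.
awk '$4 == 1' "$demo_dir/params_prepareX.txt" > "$demo_dir/params_small.txt"

# Show the reduced file; each remaining row would become one job.
cat "$demo_dir/params_small.txt"
```

If the real file follows the same layout, the same awk filter could be applied to params_prepareX.txt inside code/, and the resulting file passed to ./gen_jobs_prepareX.sh in place of NEW_TXT_FILE.txt.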
cd CPT/code
./gen_jobs_CPT.sh params_CPT_expr.txt
Each row in the file params_CPT_expr.txt contains the five external inputs, namely Xdist, epsdist, ratio, ninds and seed, corresponding to one experiment. For each row of params_CPT_expr.txt, the bash file gen_jobs_CPT.sh automatically generates a bash file in the results/ folder that calls CPT_expr.R, submits it to a core, and saves the output in the data/ folder. To reduce the number of experiments or add other experiments, the simplest way is to create another .txt file that mimics params_CPT_expr.txt, list the desired experimental setups in it, and run ./gen_jobs_CPT.sh NEW_TXT_FILE.txt. After the jobs are done, run the following code to aggregate the results from each core.
R CMD BATCH agglomerate.R
This will generate a file power_CPT_expr.RData in the data/ folder. The file is currently included in the repository for users who do not want to rerun the experiments.
CPT_GA_SS.R contains the code template to compare the genetic algorithm with stochastic search, as in Figure 2. It takes four external inputs: Xdist for the type of design matrix, algo for the algorithm (GA or SS), popSize for the population size of GA (ignored for SS), and seed for the random seed. The output of CPT_GA_SS.R is a vector of O* values. To reproduce the results in Figure 2, first run the following code in a shell.
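Because CPT_expr.R halts when the matching CPT_expr_prepareX.R output is missing, it can help to compare the two parameter files before submitting jobs. The sketch below lists rows of a stand-in params_CPT_expr.txt whose (Xdist, ratio, ninds, seed) combination does not appear in a stand-in params_prepareX.txt. The whitespace-separated layout, the column order, the example values, and the names cpt_check_demo and missing.txt are assumptions for illustration. The check only compares the parameter files; it does not verify that the prepare jobs finished or that their output is present in data/.

```shell
set -eu

demo_dir="cpt_check_demo"
mkdir -p "$demo_dir"

# Stand-ins for the two parameter files (hypothetical values).
# Assumed columns of params_prepareX.txt:  Xdist ratio ninds seed
cat > "$demo_dir/params_prepareX.txt" <<'EOF'
normal 25 10 1
t1 25 10 1
EOF

# Assumed columns of params_CPT_expr.txt:  Xdist epsdist ratio ninds seed
cat > "$demo_dir/params_CPT_expr.txt" <<'EOF'
normal normal 25 10 1
normal cauchy 25 10 1
t1 normal 25 10 2
EOF

# First file: remember every prepared (Xdist, ratio, ninds, seed) combination.
# Second file: print the rows whose combination was never prepared.
awk 'NR == FNR { prepared[$1 " " $2 " " $3 " " $4] = 1; next }
     !(($1 " " $3 " " $4 " " $5) in prepared)' \
    "$demo_dir/params_prepareX.txt" "$demo_dir/params_CPT_expr.txt" \
    > "$demo_dir/missing.txt"

# An empty file means every experiment has a matching prepare row.
cat "$demo_dir/missing.txt"
```

In this stand-in data the third experiment uses seed 2, which has no prepare row, so it is the only row reported.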
cd CPT/code
./gen_jobs_GA_SS.sh params_GA_SS.txt
Each row in the file params_GA_SS.txt contains the four external inputs, namely Xdist, algo, popSize and seed, corresponding to one experiment. For each row of params_GA_SS.txt, the bash file gen_jobs_GA_SS.sh automatically generates a bash file in the results/ folder that calls CPT_GA_SS.R, submits it to a core, and saves the output in the data/ folder. To reduce the number of experiments or add other experiments, the simplest way is to create another .txt file that mimics params_GA_SS.txt, list the desired experimental setups in it, and run ./gen_jobs_GA_SS.sh NEW_TXT_FILE.txt. After the jobs are done, run the following code to aggregate the results from each core.
R CMD BATCH agglomerate.R
This will generate a file CPT_GA_SS.RData in the data/ folder. The file is currently included in the repository for users who do not want to rerun the experiments.
illustrate_perm.R produces Figure 1.
CPT_GA_plot.R produces Figure 2.
CPT_plot.R produces Figures 3-16.
expr_functions.R contains all helper functions.
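For adding experiments, a new parameter file can also be generated with a small loop instead of being written by hand. The sketch below writes one row per combination of algorithm and seed. The Xdist value "normal", the popSize value 100, the seeds 1 to 3, the whitespace-separated layout, and the file name params_GA_SS_new.txt are all placeholders chosen for the example; the actual accepted values should be taken from params_GA_SS.txt.

```shell
set -eu

demo_dir="cpt_grid_demo"
mkdir -p "$demo_dir"
out="$demo_dir/params_GA_SS_new.txt"
: > "$out"   # start from an empty file

# Assumed columns: Xdist algo popSize seed.
# "normal" and 100 are placeholder values, not taken from the repository.
for algo in GA SS; do
  for seed in 1 2 3; do
    printf '%s %s %s %s\n' normal "$algo" 100 "$seed" >> "$out"
  done
done

# Six rows, i.e. six jobs: two algorithms times three seeds.
cat "$out"
```

The popSize column is written for the SS rows as well so that every row has the same four fields, even though the value only matters for GA.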