# SAIGE Prototype

This notebook demonstrates a prototype of [SAIGE](https://saigegit.github.io/SAIGE-doc/) that supports reading from VCF Zarr stores.

The code for the prototype is available at https://github.com/Will-Tyler/SAIGE.

The full diff between the prototype and the upstream code is available [here](https://github.com/saigegit/SAIGE/compare/main...Will-Tyler:SAIGE:main). The main non-boilerplate change is the new [VCZ.cpp](https://github.com/Will-Tyler/SAIGE/blob/2f0ad0d5cf612136487c22ea60bf53271b5bfe0a/src/VCZ.cpp) file, which implements SAIGE's variant file access interface for VCF Zarr stores.
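
`VCZ.cpp` is C++ and is not reproduced here. To give a feel for the data it reads, the following Python sketch converts a block of a VCF Zarr `call_genotype` array into per-sample ALT allele dosages, the per-marker genotype vector an association test consumes. It assumes the layout described in the VCF Zarr specification as we understand it (`(variants, samples, ploidy)` integer allele indices, negative values for missing calls) and biallelic sites only; `genotype_dosage` is our own illustrative name, not a function from the prototype.

```python
import numpy as np


def genotype_dosage(call_genotype: np.ndarray) -> np.ndarray:
    """Convert a (variants, samples, ploidy) genotype block to ALT dosages.

    Allele indices are assumed to follow the VCF Zarr convention:
    0 = REF, 1 = ALT, negative values = missing. Biallelic sites only.
    A sample with any missing allele gets a NaN dosage.
    """
    gt = np.asarray(call_genotype)
    missing = (gt < 0).any(axis=-1)
    # Count ALT alleles (index > 0) across the ploidy dimension.
    dosage = (gt > 0).sum(axis=-1).astype(np.float64)
    dosage[missing] = np.nan
    return dosage


# Two variants x three diploid samples; -1 marks a missing allele.
block = np.array(
    [
        [[0, 0], [0, 1], [1, 1]],
        [[0, 1], [-1, -1], [0, 0]],
    ],
    dtype=np.int8,
)
print(genotype_dosage(block))
```

With the `zarr` package, a block of the real store could be loaded with something like `zarr.open(path, mode="r")["call_genotype"][start:stop]`, where `path`, `start` and `stop` are up to the reader.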

Before using this notebook, run the `setup.sh` script to install SAIGE and create a Conda environment for it.

## IMPORTANT!

TensorStore does not currently support string data, so we could not exactly replicate the output of the other backends. The numerical output was reproduced exactly.
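
One way to check a claim like this is to compare two results files column by column, skipping any column that cannot be parsed as numbers. The sketch below is a hypothetical helper (`numeric_columns_match` is our own name, not part of SAIGE). It assumes whitespace-delimited files with a header row, rows in the same order in both files, and missing values written as `NA`.

```python
import io
import math


def _as_floats(values):
    """Return the column as floats (NA -> nan), or None if it is not numeric."""
    try:
        return [math.nan if v == "NA" else float(v) for v in values]
    except ValueError:
        return None


def read_columns(handle):
    """Read a whitespace-delimited table with a header into {name: [str, ...]}."""
    header = handle.readline().split()
    columns = {name: [] for name in header}
    for line in handle:
        for name, value in zip(header, line.split()):
            columns[name].append(value)
    return columns


def numeric_columns_match(a, b, rel_tol=0.0):
    """Return the sorted names of numeric columns that differ between a and b.

    Columns that are non-numeric in either file (e.g. marker IDs, alleles)
    are skipped. With rel_tol=0.0 values must be exactly equal.
    """
    cols_a, cols_b = read_columns(a), read_columns(b)
    mismatched = []
    for name in cols_a.keys() & cols_b.keys():
        xs, ys = _as_floats(cols_a[name]), _as_floats(cols_b[name])
        if xs is None or ys is None:
            continue  # string column in at least one file
        same = len(xs) == len(ys) and all(
            (math.isnan(x) and math.isnan(y)) or math.isclose(x, y, rel_tol=rel_tol)
            for x, y in zip(xs, ys)
        )
        if not same:
            mismatched.append(name)
    return sorted(mismatched)


a = io.StringIO("CHR POS MarkerID BETA\n1 100 1:100:A:G 0.25\n1 200 1:200:C:T NA\n")
b = io.StringIO("CHR POS MarkerID BETA\n1 100 . 0.25\n1 200 . NA\n")
print(numeric_columns_match(a, b))  # → []
```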

In [1]:
!pip install tstrait



### Simulate phenotypes

In [2]:
import tskit
import tstrait

ts = tskit.load('../scaling/data/chr21_10_5.ts')
model = tstrait.trait_model(distribution='normal', mean=0, var=1)
sim_result = tstrait.sim_phenotype(ts=ts, model=model, h2=0.3)
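
Under the usual additive model, `h2` is the fraction of phenotypic variance explained by the genetic values, so environmental noise is drawn with mean 0 and variance Var(G)(1 - h2)/h2. As we understand tstrait's documentation, this is how it generates the `environmental_noise` column, but the sketch below is our own standard-library re-implementation for illustration; `simulate_env_noise` is a hypothetical name, not tstrait's API.

```python
import random
import statistics


def simulate_env_noise(genetic_values, h2, seed=None):
    """Draw Gaussian noise so that Var(G) / Var(P) is approximately h2.

    Noise has mean 0 and variance Var(G) * (1 - h2) / h2, where Var(G) is the
    population variance of the supplied genetic values. Requires 0 < h2 <= 1.
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    var_g = statistics.pvariance(genetic_values)
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd_e) for _ in genetic_values]


genetic = [0.0, 0.0, -2.1, 0.0, 0.5, 1.0] * 1000
noise = simulate_env_noise(genetic, h2=0.3, seed=1)
pheno_values = [g + e for g, e in zip(genetic, noise)]
# Realised heritability: close to 0.3, up to sampling noise.
print(statistics.pvariance(genetic) / statistics.pvariance(pheno_values))
```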

In [3]:
sim_result.trait

| | position | site_id | effect_size | causal_allele | allele_freq | trait_id |
|---|---|---|---|---|---|---|
| 0 | 18873290 | 324108 | -2.098933 | G | 4e-05 | 0 |


In [4]:
sim_result.phenotype

| | trait_id | individual_id | genetic_value | environmental_noise | phenotype |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.0 | -0.032131 | -0.032131 |
| 1 | 0 | 1 | 0.0 | -0.023550 | -0.023550 |
| 2 | 0 | 2 | 0.0 | -0.039558 | -0.039558 |
| 3 | 0 | 3 | 0.0 | -0.004427 | -0.004427 |
| 4 | 0 | 4 | 0.0 | 0.009059 | 0.009059 |
| ... | ... | ... | ... | ... | ... |
| 286713 | 0 | 286713 | 0.0 | -0.003493 | -0.003493 |
| 286714 | 0 | 286714 | 0.0 | -0.002509 | -0.002509 |
| 286715 | 0 | 286715 | 0.0 | -0.024216 | -0.024216 |
| 286716 | 0 | 286716 | 0.0 | -0.014687 | -0.014687 |


In [5]:
phenotype = sim_result.phenotype
phenotype['sample_id'] = 'tsk_' + phenotype['individual_id'].astype(str)
phenotype

| | trait_id | individual_id | genetic_value | environmental_noise | phenotype | sample_id |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.0 | -0.032131 | -0.032131 | tsk_0 |
| 1 | 0 | 1 | 0.0 | -0.023550 | -0.023550 | tsk_1 |
| 2 | 0 | 2 | 0.0 | -0.039558 | -0.039558 | tsk_2 |
| 3 | 0 | 3 | 0.0 | -0.004427 | -0.004427 | tsk_3 |
| 4 | 0 | 4 | 0.0 | 0.009059 | 0.009059 | tsk_4 |
| ... | ... | ... | ... | ... | ... | ... |
| 286713 | 0 | 286713 | 0.0 | -0.003493 | -0.003493 | tsk_286713 |
| 286714 | 0 | 286714 | 0.0 | -0.002509 | -0.002509 | tsk_286714 |
| 286715 | 0 | 286715 | 0.0 | -0.024216 | -0.024216 | tsk_286715 |
| 286716 | 0 | 286716 | 0.0 | -0.014687 | -0.014687 | tsk_286716 |


In [6]:
# Save phenotype data to disk in the format that SAIGE expects.
phenotype[['sample_id', 'phenotype']].to_csv('chr21_10_5.phenotypes.txt', sep='\t', index=False)
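
SAIGE joins the phenotype file to the genotype samples on the column named by `--sampleIDColinphenoFile`; the `tsk_` prefix added above presumably matches the default sample names tskit uses when writing VCF (`tsk_0`, `tsk_1`, ...). Before running step 1 it can be worth checking the file. The sketch below is a hypothetical helper (`check_pheno_file` is our own name) that assumes a tab-separated file with a header row; in practice the genotype sample IDs would come from the VCF header or the `.fam` file.

```python
import csv
import io


def check_pheno_file(handle, id_col, pheno_col, genotype_ids):
    """Sanity-check a tab-separated phenotype file before running SAIGE.

    Raises ValueError if a required column is missing or a phenotype value is
    not numeric (this also rejects missing values such as NA, which may be
    stricter than SAIGE). Returns (n_matched, n_unmatched): how many rows
    have an ID present in `genotype_ids` and how many do not.
    """
    genotype_ids = set(genotype_ids)
    matched = unmatched = 0
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {id_col, pheno_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        float(row[pheno_col])  # raises ValueError on non-numeric values
        if row[id_col] in genotype_ids:
            matched += 1
        else:
            unmatched += 1
    return matched, unmatched


demo = io.StringIO("sample_id\tphenotype\ntsk_0\t-0.03\ntsk_1\t0.01\ntsk_9\t0.2\n")
print(check_pheno_file(demo, "sample_id", "phenotype", ["tsk_0", "tsk_1"]))  # → (2, 1)

# On the real file (vcf_sample_ids is hypothetical, e.g. read from the VCF header):
# with open("chr21_10_5.phenotypes.txt", newline="") as f:
#     check_pheno_file(f, "sample_id", "phenotype", vcf_sample_ids)
```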

### SAIGE workflow - initial setup steps

In [7]:
!plink2 --vcf ../scaling/data/chr21_10_5.vcf.gz --make-bed --out ./chr21_10_5 --max-alleles 2

PLINK v2.00a3 SSE4.2 (18 Feb 2022)             www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to ./chr21_10_5.log.
Options in effect:
  --make-bed
  --max-alleles 2
  --out ./chr21_10_5
  --vcf ../scaling/data/chr21_10_5.vcf.gz

Start time: Wed Jan 29 18:00:35 2025
31922 MiB RAM detected; reserving 15961 MiB for main workspace.
Using up to 8 compute threads.
--vcf: 2365367 variants scanned.
--vcf: ./chr21_10_5-temporary.pgen + ./chr21_10_5-temporary.pvar.zst +
./chr21_10_5-temporary.psam written.
100000 samples (0 females, 0 males, 100000 ambiguous; 100000 founders) loaded
from ./chr21_10_5-temporary.psam.
2308391 out of 2365367 variants loaded from ./chr21_10_5-temporary.pvar.zst.
Note: No phenotype data present.
2308391 variants remaining after main filters.
Writing ./chr21_10_5.fam ... done.
Writing ./chr21_10_5.bim ... done.
Writing ./chr21_10_5.bed ... 11131619%
Error: File write failure: No space left on devi
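
The `--make-bed` step above ran out of disk space. A PLINK 1 `.bed` file stores two bits per genotype in variant-major order: a 3-byte magic header followed by ceil(samples / 4) bytes per variant. Plugging in the counts from the log gives a back-of-envelope size; this ignores the `.bim`/`.fam` files and the temporary `.pgen` files PLINK writes along the way.

```python
def plink_bed_size_bytes(n_variants: int, n_samples: int) -> int:
    """Size of a variant-major PLINK 1 .bed file: 3 magic bytes + packed genotypes."""
    bytes_per_variant = (n_samples + 3) // 4  # 2 bits per sample, padded to a byte
    return 3 + n_variants * bytes_per_variant


# Counts reported by plink2 above.
size = plink_bed_size_bytes(n_variants=2_308_391, n_samples=100_000)
print(f"{size:,} bytes = {size / 2**30:.1f} GiB")  # → 57,709,775,003 bytes = 53.7 GiB
```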

In [8]:
%%bash
#export PATH="/opt/miniconda3/bin:$PATH"
conda run -n saige Rscript SAIGE/extdata/step1_fitNULLGLMM.R     \
        --plinkFile=./chr21_10_5  \
        --useSparseGRMtoFitNULL=FALSE    \
        --phenoFile=./chr21_10_5.phenotypes.txt \
        --phenoCol=phenotype \
        --sampleIDColinphenoFile=sample_id \
        --invNormalize=TRUE     \
        --traitType=quantitative        \
        --outputPrefix=./chr21_10_5.model \
        --nThreads=24   \
        --IsOverwriteVarianceRatioFile=TRUE

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Linux Mint 21.3

Matrix products: default
BLAS/LAPACK: /home/benj/miniconda3/envs/saige/lib/libopenblasp-r0.3.21.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] optparse_1.7.3 SAIGE_1.3.6   

loaded via a namespace (and not attached):
[1] compiler_4.3.1     Matrix_1.6-1.1     Rcpp_1.0.11        getopt_1.20.4     
[5] grid_4.3.1         data.table_1.14.8  RcppParallel_5.1.7 lattice_0.21

Only one variance ratio will be estimated using randomly selected markers with MAC >= 20
27 th marker in geno  1 
MAC:  51 
G0 0 0 0 0 0 0 0 0 0 0 
CHR  1 
iter from getPCG1ofSigmaAndVector 1
53 th marker in geno  1 
MAC:  12626 
G0 0 0 0 0 0 0 0 0 0 1 
CHR  1 
iter from getPCG1ofSigmaAndVector 1
19 th marker in geno  1 
MAC:  41 
G0 0 0 0 0 0 0 0 0 0 0 
CHR  1 
iter from getPCG1ofSigmaAndVector 1
4 th marker in geno  1 
MAC:  87 
G0 0 0 0 0 0 0 0 0 0 0 
CHR  1 
iter from getPCG1ofSigmaAndVector 1
49 th marker in geno  1 
MAC:  1158 
G0 0 0 0 0 0 0 0 0 0 0 
CHR  1 
iter from getPCG1ofSigmaAndVector 1
40 th marker in geno  1 
MAC:  38 
G0 0 0 0 0 0 0 0 0 0 0 
CHR  1 
iter from getPCG1ofSigmaAndVector 1
41 th marker in geno  1 
MAC:  36061 
G0 1 2 2 1 1 2 0 2 2 2 
CHR  1 
iter from getPCG1ofSigmaAndVector 1
36 th marker in geno  1 
MAC:  135 
G0 0 0 0 0 0 0 0 0 0 0 
CHR  1 
iter from getPCG1ofSigmaAndVector 1
58 th marker in geno  1 
MAC:  167 
G0 0 0 0 0 0 0 0 0 0 0 
CHR  1 
iter from g

Loading required package: optparse
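
Step 1 above is run with `--invNormalize=TRUE`, which asks SAIGE to apply an inverse normal transformation to the quantitative trait before fitting the null model. A common rank-based form maps each value to the standard normal quantile of (rank - 0.5) / n, with tied values sharing their average rank. The sketch below implements that formula with the Python standard library; we assume SAIGE's transform is equivalent but have not checked its exact offset or tie handling, and the function names are our own.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def inverse_normal_transform(values):
    """Rank-based inverse normal transform: Phi^-1((rank - 0.5) / n)."""
    n = len(values)
    dist = NormalDist()  # standard normal, mean 0 and sd 1
    return [dist.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


print(inverse_normal_transform([-0.032, -0.024, -0.040, -0.004, 0.009]))
```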






# Benchmark

Here we run and time [step 2](https://saigegit.github.io/SAIGE-doc/docs/single_step2.html) of the single-variant association test on the BCF, VCF, Savvy, and VCF Zarr data.
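
SAIGE's step 2 log prints a timestamp when it starts each chunk of markers (10,000 markers per chunk in these runs, per the `Completed 10000/10000 markers` lines). Differences between consecutive timestamps give a per-chunk wall time that is easier to compare across backends than the cumulative `elapsed` figures. The parser below is our own sketch; it assumes the line format seen in these logs, and the sample lines are copied from the BCF log below.

```python
import re
from datetime import datetime

# Matches lines such as
# "(2025-01-29 18:34:31.052541) ---- Analyzing Chunk 22 :  chrom InitialChunk ----"
CHUNK_RE = re.compile(r"^\((\d{4}-\d{2}-\d{2} [\d:.]+)\) ---- Analyzing Chunk (\d+)")


def chunk_start_times(log_lines):
    """Return {chunk number: start datetime} for every chunk header in the log."""
    starts = {}
    for line in log_lines:
        match = CHUNK_RE.match(line.strip())
        if match:
            stamp, chunk = match.groups()
            starts[int(chunk)] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return starts


def mean_seconds_per_chunk(log_lines):
    """Mean wall time between headers of consecutively numbered chunks.

    Only pairs (c, c + 1) are used, so gaps left by truncated output are skipped.
    """
    starts = chunk_start_times(log_lines)
    gaps = [
        (starts[c + 1] - starts[c]).total_seconds()
        for c in sorted(starts)
        if c + 1 in starts
    ]
    if not gaps:
        raise ValueError("need at least two consecutive chunk headers")
    return sum(gaps) / len(gaps)


bcf_log = """\
(2025-01-29 18:34:31.052541) ---- Analyzing Chunk 22 :  chrom InitialChunk ----
(2025-01-29 18:34:48.728067) ---- Analyzing Chunk 23 :  chrom InitialChunk ----
(2025-01-29 18:35:07.329529) ---- Analyzing Chunk 24 :  chrom InitialChunk ----
(2025-01-29 18:35:24.396164) ---- Analyzing Chunk 25 :  chrom InitialChunk ----
""".splitlines()
print(f"{mean_seconds_per_chunk(bcf_log):.1f} s per chunk")  # → 17.8 s per chunk
```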

## BCF

In [9]:
%%bash

#export PATH="/opt/miniconda3/bin:$PATH"
time conda run -n saige Rscript SAIGE/extdata/step2_SPAtests.R        \
        --vcfFile=../scaling/data/chr21_10_5.bcf \
        --vcfFileIndex=../scaling/data/chr21_10_5.bcf.csi \
        --vcfField=GT   \
        --SAIGEOutputFile=./chr21_10_5.bcf_results.txt \
        --chrom=1       \
        --minMAF=0 \
        --minMAC=20 \
        --GMMATmodelFile=./chr21_10_5.model.rda \
        --varianceRatioFile=./chr21_10_5.model.varianceRatio.txt  \
        --is_Firth_beta=TRUE    \
        --pCutoffforFirth=0.05 \
        --is_output_moreDetails=TRUE    \
        --LOCO=FALSE

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Linux Mint 21.3

Matrix products: default
BLAS/LAPACK: /home/benj/miniconda3/envs/saige/lib/libopenblasp-r0.3.21.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.8   optparse_1.7.3      RhpcBLASctl_0.23-42
[4] SAIGE_1.3.6        

loaded via a namespace (and not attached):
[1] compiler_4.3.1     Matrix_1.6-1.1     Rcpp_1.0.11        getopt_1.20.4     
[5] grid_4.3.1     

Completed 10000/10000 markers in the chunk.
2723 markers were tested.
write to output
   user  system elapsed 
401.222   0.859 401.806 
isVcfEnd  FALSE 
(2025-01-29 18:34:31.052541) ---- Analyzing Chunk 22 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3027 markers were tested.
write to output
   user  system elapsed 
418.881   0.875 419.482 
isVcfEnd  FALSE 
(2025-01-29 18:34:48.728067) ---- Analyzing Chunk 23 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2877 markers were tested.
write to output
   user  system elapsed 
437.462   0.895 438.084 
isVcfEnd  FALSE 
(2025-01-29 18:35:07.329529) ---- Analyzing Chunk 24 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2814 markers were tested.
write to output
   user  system elapsed 
454.521   0.903 455.150 
isVcfEnd  FALSE 
(2025-01-29 18:35:24.396164) ---- Analyzing Chunk 25 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2896 markers were te

2885 markers were tested.
write to output
    user   system  elapsed 
1026.895    1.583 1028.337 
isVcfEnd  FALSE 
(2025-01-29 18:44:57.580146) ---- Analyzing Chunk 57 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2949 markers were tested.
write to output
    user   system  elapsed 
1044.598    1.603 1046.061 
isVcfEnd  FALSE 
(2025-01-29 18:45:15.306374) ---- Analyzing Chunk 58 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2917 markers were tested.
write to output
    user   system  elapsed 
1061.983    1.619 1063.463 
isVcfEnd  FALSE 
(2025-01-29 18:45:32.708925) ---- Analyzing Chunk 59 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2872 markers were tested.
write to output
    user   system  elapsed 
1079.272    1.631 1080.765 
isVcfEnd  FALSE 
(2025-01-29 18:45:50.013321) ---- Analyzing Chunk 60 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2857 markers were tested.
write to outpu

    user   system  elapsed 
1625.626    2.207 1627.789 
isVcfEnd  FALSE 
(2025-01-29 18:54:57.035075) ---- Analyzing Chunk 91 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2929 markers were tested.
write to output
    user   system  elapsed 
1644.230    2.243 1646.430 
isVcfEnd  FALSE 
(2025-01-29 18:55:15.674705) ---- Analyzing Chunk 92 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2835 markers were tested.
write to output
    user   system  elapsed 
1665.413    2.251 1667.623 
isVcfEnd  FALSE 
(2025-01-29 18:55:36.868626) ---- Analyzing Chunk 93 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2894 markers were tested.
write to output
    user   system  elapsed 
1682.534    2.267 1684.760 
isVcfEnd  FALSE 
(2025-01-29 18:55:54.001776) ---- Analyzing Chunk 94 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2910 markers were tested.
write to output
    user   system  elapsed 
1699.607    

2225.957    2.839 2228.902 
isVcfEnd  FALSE 
(2025-01-29 19:04:58.159929) ---- Analyzing Chunk 125 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2945 markers were tested.
write to output
    user   system  elapsed 
2243.385    2.863 2246.355 
isVcfEnd  FALSE 
(2025-01-29 19:05:15.600533) ---- Analyzing Chunk 126 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3005 markers were tested.
write to output
    user   system  elapsed 
2261.871    2.883 2264.862 
isVcfEnd  FALSE 
(2025-01-29 19:05:34.109437) ---- Analyzing Chunk 127 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2864 markers were tested.
write to output
    user   system  elapsed 
2278.833    2.919 2281.860 
isVcfEnd  FALSE 
(2025-01-29 19:05:51.105656) ---- Analyzing Chunk 128 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3075 markers were tested.
write to output
    user   system  elapsed 
2296.073    2.927 2299.109 
isVcfEnd

isVcfEnd  FALSE 
(2025-01-29 19:15:01.783453) ---- Analyzing Chunk 159 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2922 markers were tested.
write to output
    user   system  elapsed 
2847.555    3.419 2851.230 
isVcfEnd  FALSE 
(2025-01-29 19:15:20.48687) ---- Analyzing Chunk 160 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2880 markers were tested.
write to output
    user   system  elapsed 
2866.727    3.435 2870.463 
isVcfEnd  FALSE 
(2025-01-29 19:15:39.707355) ---- Analyzing Chunk 161 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2963 markers were tested.
write to output
    user   system  elapsed 
2883.916    3.455 2887.672 
isVcfEnd  FALSE 
(2025-01-29 19:15:56.918078) ---- Analyzing Chunk 162 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2933 markers were tested.
write to output
    user   system  elapsed 
2901.110    3.463 2904.875 
isVcfEnd  FALSE 
(2025-01-29 19:16:14

(2025-01-29 19:24:49.061827) ---- Analyzing Chunk 193 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2870 markers were tested.
write to output
    user   system  elapsed 
3432.028    4.003 3436.417 
isVcfEnd  FALSE 
(2025-01-29 19:25:05.655283) ---- Analyzing Chunk 194 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2811 markers were tested.
write to output
    user   system  elapsed 
3448.528    4.027 3452.961 
isVcfEnd  FALSE 
(2025-01-29 19:25:22.19841) ---- Analyzing Chunk 195 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2821 markers were tested.
write to output
    user   system  elapsed 
3464.996    4.047 3469.449 
isVcfEnd  FALSE 
(2025-01-29 19:25:38.686804) ---- Analyzing Chunk 196 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2745 markers were tested.
write to output
    user   system  elapsed 
3481.254    4.063 3485.724 
isVcfEnd  FALSE 
(2025-01-29 19:25:54.96371) ---- Anal

Completed 10000/10000 markers in the chunk.
2817 markers were tested.
write to output
    user   system  elapsed 
4026.639    4.499 4031.689 
isVcfEnd  FALSE 
(2025-01-29 19:35:00.93054) ---- Analyzing Chunk 228 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2914 markers were tested.
write to output
    user   system  elapsed 
4045.032    4.527 4050.111 
isVcfEnd  FALSE 
(2025-01-29 19:35:19.351064) ---- Analyzing Chunk 229 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2831 markers were tested.
write to output
    user   system  elapsed 
4062.061    4.535 4067.209 
isVcfEnd  FALSE 
(2025-01-29 19:35:36.452007) ---- Analyzing Chunk 230 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2930 markers were tested.
write to output
    user   system  elapsed 
4078.763    4.551 4083.928 
isVcfEnd  FALSE 
(2025-01-29 19:35:53.171053) ---- Analyzing Chunk 231 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the 

Loading required package: RhpcBLASctl










IOPub message rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_msg_rate_limit`.

Current values:
NotebookApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
NotebookApp.rate_limit_window=3.0 (secs)
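
Each chunk reads 10,000 markers, but only those passing `--minMAC=20` (with `--minMAF=0`) are tested, hence counts such as `2723 markers were tested` in the log above. The sketch below shows the kind of check involved. It assumes diploid samples, takes the minor allele count as min(AC, 2N - AC) over N called samples, and assumes inclusive thresholds; it is an illustration, not SAIGE's code, and SAIGE may handle missing genotypes differently.

```python
def minor_allele_count(alt_count: float, n_called: int, ploidy: int = 2) -> float:
    """Minor allele count for a biallelic site from its ALT allele count."""
    total = ploidy * n_called
    return min(alt_count, total - alt_count)


def passes_filters(alt_count, n_called, min_mac=20, min_maf=0.0, ploidy=2):
    """Keep a marker if MAC >= min_mac and MAF >= min_maf (inclusive thresholds assumed)."""
    mac = minor_allele_count(alt_count, n_called, ploidy)
    maf = mac / (ploidy * n_called)
    return mac >= min_mac and maf >= min_maf


n = 100_000  # samples in this dataset's VCF
alt_counts = [5, 19, 20, 199_990, 36_061]
print([passes_filters(ac, n) for ac in alt_counts])  # → [False, False, True, False, True]
```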






























## VCF

In [10]:
%%bash

#export PATH="/opt/miniconda3/bin:$PATH"
time conda run -n saige Rscript SAIGE/extdata/step2_SPAtests.R        \
        --vcfFile=../scaling/data/chr21_10_5.vcf.gz \
        --vcfFileIndex=../scaling/data/chr21_10_5.vcf.gz.csi \
        --vcfField=GT   \
        --SAIGEOutputFile=./chr21_10_5.vcf_results.txt \
        --chrom=1       \
        --minMAF=0 \
        --minMAC=20 \
        --GMMATmodelFile=./chr21_10_5.model.rda \
        --varianceRatioFile=./chr21_10_5.model.varianceRatio.txt  \
        --is_Firth_beta=TRUE    \
        --pCutoffforFirth=0.05 \
        --is_output_moreDetails=TRUE    \
        --LOCO=FALSE

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Linux Mint 21.3

Matrix products: default
BLAS/LAPACK: /home/benj/miniconda3/envs/saige/lib/libopenblasp-r0.3.21.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.8   optparse_1.7.3      RhpcBLASctl_0.23-42
[4] SAIGE_1.3.6        

loaded via a namespace (and not attached):
[1] compiler_4.3.1     Matrix_1.6-1.1     Rcpp_1.0.11        getopt_1.20.4     
[5] grid_4.3.1     

Completed 10000/10000 markers in the chunk.
2723 markers were tested.
write to output
   user  system elapsed 
897.498   0.879 898.162 
isVcfEnd  FALSE 
(2025-01-29 19:53:03.212055) ---- Analyzing Chunk 22 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3027 markers were tested.
write to output
   user  system elapsed 
940.571   0.907 941.266 
isVcfEnd  FALSE 
(2025-01-29 19:53:46.315674) ---- Analyzing Chunk 23 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2877 markers were tested.
write to output
   user  system elapsed 
981.513   0.931 982.364 
isVcfEnd  FALSE 
(2025-01-29 19:54:27.428937) ---- Analyzing Chunk 24 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2814 markers were tested.
write to output
    user   system  elapsed 
1023.012    0.955 1023.893 
isVcfEnd  FALSE 
(2025-01-29 19:55:08.960376) ---- Analyzing Chunk 25 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2896 markers w

write to output
    user   system  elapsed 
2282.614    1.559 2284.152 
isVcfEnd  FALSE 
(2025-01-29 20:16:09.202824) ---- Analyzing Chunk 56 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2885 markers were tested.
write to output
    user   system  elapsed 
2323.002    1.579 2324.562 
isVcfEnd  FALSE 
(2025-01-29 20:16:49.627463) ---- Analyzing Chunk 57 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2949 markers were tested.
write to output
    user   system  elapsed 
2363.148    1.595 2364.725 
isVcfEnd  FALSE 
(2025-01-29 20:17:29.788984) ---- Analyzing Chunk 58 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2917 markers were tested.
write to output
    user   system  elapsed 
2403.853    1.623 2405.459 
isVcfEnd  FALSE 
(2025-01-29 20:18:10.515598) ---- Analyzing Chunk 59 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2872 markers were tested.
write to output
    user   system  elaps

3659.001    2.267 3661.308 
isVcfEnd  FALSE 
(2025-01-29 20:39:06.359585) ---- Analyzing Chunk 90 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2814 markers were tested.
write to output
    user   system  elapsed 
3700.231    2.287 3702.560 
isVcfEnd  FALSE 
(2025-01-29 20:39:47.628039) ---- Analyzing Chunk 91 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2929 markers were tested.
write to output
    user   system  elapsed 
3740.338    2.303 3742.685 
isVcfEnd  FALSE 
(2025-01-29 20:40:27.735502) ---- Analyzing Chunk 92 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2835 markers were tested.
write to output
    user   system  elapsed 
3782.380    2.323 3784.751 
isVcfEnd  FALSE 
(2025-01-29 20:41:09.802024) ---- Analyzing Chunk 93 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2894 markers were tested.
write to output
    user   system  elapsed 
3823.038    2.343 3825.435 
isVcfEnd  FA

isVcfEnd  FALSE 
(2025-01-29 21:02:00.325433) ---- Analyzing Chunk 124 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2996 markers were tested.
write to output
    user   system  elapsed 
5071.980    2.787 5075.096 
isVcfEnd  FALSE 
(2025-01-29 21:02:40.146349) ---- Analyzing Chunk 125 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2945 markers were tested.
write to output
    user   system  elapsed 
5111.271    2.795 5114.396 
isVcfEnd  FALSE 
(2025-01-29 21:03:19.449358) ---- Analyzing Chunk 126 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3005 markers were tested.
write to output
    user   system  elapsed 
5150.471    2.799 5153.601 
isVcfEnd  FALSE 
(2025-01-29 21:03:58.651822) ---- Analyzing Chunk 127 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2864 markers were tested.
write to output
    user   system  elapsed 
5191.076    2.807 5194.215 
isVcfEnd  FALSE 
(2025-01-29 21:04:3

(2025-01-29 21:24:47.866347) ---- Analyzing Chunk 158 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2989 markers were tested.
write to output
    user   system  elapsed 
6438.654    3.139 6442.220 
isVcfEnd  FALSE 
(2025-01-29 21:25:27.270617) ---- Analyzing Chunk 159 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2922 markers were tested.
write to output
    user   system  elapsed 
6482.796    3.155 6486.441 
isVcfEnd  FALSE 
(2025-01-29 21:26:11.499555) ---- Analyzing Chunk 160 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2880 markers were tested.
write to output
    user   system  elapsed 
6523.233    3.175 6526.899 
isVcfEnd  FALSE 
(2025-01-29 21:26:51.949772) ---- Analyzing Chunk 161 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2963 markers were tested.
write to output
    user   system  elapsed 
6566.774    3.191 6570.457 
isVcfEnd  FALSE 
(2025-01-29 21:27:35.516577) ---- An

Completed 10000/10000 markers in the chunk.
2918 markers were tested.
write to output
    user   system  elapsed 
7871.108    3.719 7875.911 
isVcfEnd  FALSE 
(2025-01-29 21:49:20.969519) ---- Analyzing Chunk 193 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2870 markers were tested.
write to output
    user   system  elapsed 
7911.798    3.739 7916.717 
isVcfEnd  FALSE 
(2025-01-29 21:50:01.768155) ---- Analyzing Chunk 194 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2811 markers were tested.
write to output
    user   system  elapsed 
7953.785    3.763 7958.898 
isVcfEnd  FALSE 
(2025-01-29 21:50:43.948692) ---- Analyzing Chunk 195 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2821 markers were tested.
write to output
    user   system  elapsed 
7995.705    3.799 8000.972 
isVcfEnd  FALSE 
(2025-01-29 21:51:26.031777) ---- Analyzing Chunk 196 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the

2759 markers were tested.
write to output
    user   system  elapsed 
9281.719    4.519 9290.776 
isVcfEnd  FALSE 
(2025-01-29 22:12:55.826587) ---- Analyzing Chunk 227 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2817 markers were tested.
write to output
    user   system  elapsed 
9323.457    4.535 9332.575 
isVcfEnd  FALSE 
(2025-01-29 22:13:37.633773) ---- Analyzing Chunk 228 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2914 markers were tested.
write to output
    user   system  elapsed 
9365.427    4.543 9374.623 
isVcfEnd  FALSE 
(2025-01-29 22:14:19.681688) ---- Analyzing Chunk 229 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2831 markers were tested.
write to output
    user   system  elapsed 
9407.416    4.555 9416.709 
isVcfEnd  FALSE 
(2025-01-29 22:15:01.761038) ---- Analyzing Chunk 230 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2930 markers were tested.
write to o

Loading required package: RhpcBLASctl










IOPub message rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_msg_rate_limit`.

Current values:
NotebookApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
NotebookApp.rate_limit_window=3.0 (secs)
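
The cumulative `elapsed` values printed after each chunk give a second way to estimate per-chunk cost. The sketch below takes the four consecutive readings printed around chunks 22 to 25 in the BCF and VCF runs above (copied by hand from the logs) and computes the mean per-chunk time and the VCF/BCF ratio. A handful of chunks is only a spot check, not a full benchmark summary.

```python
from statistics import fmean


def per_chunk_seconds(cumulative_elapsed):
    """Differences between consecutive cumulative 'elapsed' readings (seconds)."""
    return [b - a for a, b in zip(cumulative_elapsed, cumulative_elapsed[1:])]


# Cumulative 'elapsed' values copied from the BCF and VCF logs above.
bcf_elapsed = [401.806, 419.482, 438.084, 455.150]
vcf_elapsed = [898.162, 941.266, 982.364, 1023.893]

bcf_chunk = fmean(per_chunk_seconds(bcf_elapsed))
vcf_chunk = fmean(per_chunk_seconds(vcf_elapsed))
# Prints: BCF 17.8 s/chunk, VCF 41.9 s/chunk, ratio 2.36
print(f"BCF {bcf_chunk:.1f} s/chunk, VCF {vcf_chunk:.1f} s/chunk, "
      f"ratio {vcf_chunk / bcf_chunk:.2f}")
```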






























## Savvy

In [12]:
%%bash

#export PATH="/opt/miniconda3/bin:$PATH"
time conda run -n saige Rscript SAIGE/extdata/step2_SPAtests.R        \
        --savFile=../scaling/data/chr21_10_5.sav \
        --vcfField=GT   \
        --SAIGEOutputFile=./chr21_10_5.sav_results.txt \
        --chrom=1       \
        --minMAF=0 \
        --minMAC=20 \
        --GMMATmodelFile=./chr21_10_5.model.rda \
        --varianceRatioFile=./chr21_10_5.model.varianceRatio.txt  \
        --is_Firth_beta=TRUE    \
        --pCutoffforFirth=0.05 \
        --is_output_moreDetails=TRUE    \
        --LOCO=FALSE

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Linux Mint 21.3

Matrix products: default
BLAS/LAPACK: /home/benj/miniconda3/envs/saige/lib/libopenblasp-r0.3.21.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.8   optparse_1.7.3      RhpcBLASctl_0.23-42
[4] SAIGE_1.3.6        

loaded via a namespace (and not attached):
[1] compiler_4.3.1     Matrix_1.6-1.1     Rcpp_1.0.11        getopt_1.20.4     
[5] grid_4.3.1     

write to output
   user  system elapsed 
 63.337   0.764  63.743 
isVcfEnd  FALSE 
(2025-01-29 22:49:50.534339) ---- Analyzing Chunk 22 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3027 markers were tested.
write to output
   user  system elapsed 
 65.266   0.772  65.680 
isVcfEnd  FALSE 
(2025-01-29 22:49:52.470888) ---- Analyzing Chunk 23 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2877 markers were tested.
write to output
   user  system elapsed 
 66.856   0.788  67.286 
isVcfEnd  FALSE 
(2025-01-29 22:49:54.077678) ---- Analyzing Chunk 24 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2814 markers were tested.
write to output
   user  system elapsed 
 68.363   0.796  68.801 
isVcfEnd  FALSE 
(2025-01-29 22:49:55.605849) ---- Analyzing Chunk 25 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2896 markers were tested.
write to output
   user  system elapsed 
 69.828   0.804  70.274

   user  system elapsed 
121.910   1.036 122.657 
isVcfEnd  FALSE 
(2025-01-29 22:50:49.448514) ---- Analyzing Chunk 57 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2949 markers were tested.
write to output
   user  system elapsed 
123.554   1.040 124.305 
isVcfEnd  FALSE 
(2025-01-29 22:50:51.096413) ---- Analyzing Chunk 58 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2917 markers were tested.
write to output
   user  system elapsed 
125.158   1.044 125.913 
isVcfEnd  FALSE 
(2025-01-29 22:50:52.703713) ---- Analyzing Chunk 59 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2872 markers were tested.
write to output
   user  system elapsed 
126.575   1.052 127.339 
isVcfEnd  FALSE 
(2025-01-29 22:50:54.130397) ---- Analyzing Chunk 60 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2857 markers were tested.
write to output
   user  system elapsed 
128.030   1.060 128.802 
isVcfEnd  FALS

175.231   1.328 176.276 
isVcfEnd  FALSE 
(2025-01-29 22:51:43.069345) ---- Analyzing Chunk 92 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2835 markers were tested.
write to output
   user  system elapsed 
176.717   1.332 177.767 
isVcfEnd  FALSE 
(2025-01-29 22:51:44.564328) ---- Analyzing Chunk 93 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2894 markers were tested.
write to output
   user  system elapsed 
178.433   1.340 179.491 
isVcfEnd  FALSE 
(2025-01-29 22:51:46.29759) ---- Analyzing Chunk 94 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2910 markers were tested.
write to output
   user  system elapsed 
179.952   1.360 181.031 
isVcfEnd  FALSE 
(2025-01-29 22:51:47.821091) ---- Analyzing Chunk 95 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2884 markers were tested.
write to output
   user  system elapsed 
181.380   1.364 182.463 
isVcfEnd  FALSE 
(2025-01-29 22:51:49.25

231.740   1.568 233.033 
isVcfEnd  FALSE 
(2025-01-29 22:52:39.832501) ---- Analyzing Chunk 127 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2864 markers were tested.
write to output
   user  system elapsed 
233.264   1.576 234.565 
isVcfEnd  FALSE 
(2025-01-29 22:52:41.356329) ---- Analyzing Chunk 128 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3075 markers were tested.
write to output
   user  system elapsed 
235.010   1.592 236.328 
isVcfEnd  FALSE 
(2025-01-29 22:52:43.119705) ---- Analyzing Chunk 129 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2890 markers were tested.
write to output
   user  system elapsed 
236.589   1.608 237.922 
isVcfEnd  FALSE 
(2025-01-29 22:52:44.715099) ---- Analyzing Chunk 130 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2926 markers were tested.
write to output
   user  system elapsed 
238.334   1.620 239.680 
isVcfEnd  FALSE 
(2025-01-29 22:52:

   user  system elapsed 
287.213   1.816 288.765 
isVcfEnd  FALSE 
(2025-01-29 22:53:35.557352) ---- Analyzing Chunk 162 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2933 markers were tested.
write to output
   user  system elapsed 
288.964   1.820 290.520 
isVcfEnd  FALSE 
(2025-01-29 22:53:37.319198) ---- Analyzing Chunk 163 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2848 markers were tested.
write to output
   user  system elapsed 
290.617   1.820 292.187 
isVcfEnd  FALSE 
(2025-01-29 22:53:38.9906) ---- Analyzing Chunk 164 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2860 markers were tested.
write to output
   user  system elapsed 
292.391   1.840 294.003 
isVcfEnd  FALSE 
(2025-01-29 22:53:40.809726) ---- Analyzing Chunk 165 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2822 markers were tested.
write to output
   user  system elapsed 
294.146   1.844 295.782 
isVcfEnd  FA

write to output
   user  system elapsed 
345.740   2.088 348.017 
isVcfEnd  FALSE 
(2025-01-29 22:54:34.817042) ---- Analyzing Chunk 197 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2834 markers were tested.
write to output
   user  system elapsed 
347.351   2.092 349.648 
isVcfEnd  FALSE 
(2025-01-29 22:54:36.446279) ---- Analyzing Chunk 198 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2995 markers were tested.
write to output
   user  system elapsed 
349.274   2.104 351.662 
isVcfEnd  FALSE 
(2025-01-29 22:54:38.463385) ---- Analyzing Chunk 199 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2770 markers were tested.
write to output
   user  system elapsed 
350.963   2.112 353.362 
isVcfEnd  FALSE 
(2025-01-29 22:54:40.160755) ---- Analyzing Chunk 200 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2758 markers were tested.
write to output
   user  system elapsed 
352.479   2.124 354

2939 markers were tested.
write to output
   user  system elapsed 
404.796   2.396 407.491 
isVcfEnd  FALSE 
(2025-01-29 22:55:34.298689) ---- Analyzing Chunk 232 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3030 markers were tested.
write to output
   user  system elapsed 
 406.53    2.40  409.23 
isVcfEnd  FALSE 
(2025-01-29 22:55:36.045939) ---- Analyzing Chunk 233 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2916 markers were tested.
write to output
   user  system elapsed 
408.436   2.412 411.148 
isVcfEnd  FALSE 
(2025-01-29 22:55:37.965203) ---- Analyzing Chunk 234 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2812 markers were tested.
write to output
   user  system elapsed 
410.068   2.412 412.781 
isVcfEnd  FALSE 
(2025-01-29 22:55:39.589198) ---- Analyzing Chunk 235 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2847 markers were tested.
write to output
   user  system el

Loading required package: RhpcBLASctl

IOPub message rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_msg_rate_limit`.

Current values:
NotebookApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
NotebookApp.rate_limit_window=3.0 (secs)

real	7m15.578s
user	6m58.972s
sys	0m4.029s

## Zarr

In [13]:
%%bash

#export PATH="/opt/miniconda3/bin:$PATH"
time conda run -n saige Rscript SAIGE/extdata/step2_SPAtests.R        \
        --vczFile=/home/benj/projects/vcf-zarr-publication/scaling/data/chr21_10_5.zarr \
        --vcfField=GT   \
        --SAIGEOutputFile=./chr21_10_5.vcz_results.txt \
        --chrom=1       \
        --minMAF=0 \
        --minMAC=20 \
        --GMMATmodelFile=./chr21_10_5.model.rda \
        --varianceRatioFile=./chr21_10_5.model.varianceRatio.txt  \
        --is_Firth_beta=TRUE    \
        --pCutoffforFirth=0.05 \
        --is_output_moreDetails=TRUE    \
        --LOCO=FALSE

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Linux Mint 21.3

Matrix products: default
BLAS/LAPACK: /home/benj/miniconda3/envs/saige/lib/libopenblasp-r0.3.21.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.8   optparse_1.7.3      RhpcBLASctl_0.23-42
[4] SAIGE_1.3.6        

loaded via a namespace (and not attached):
[1] compiler_4.3.1     Matrix_1.6-1.1     Rcpp_1.0.11        getopt_1.20.4     
[5] grid_4.3.1     

write to output
   user  system elapsed 
176.168  26.331 123.051 
isVczEnd  FALSE 
(2025-01-29 22:58:06.53773) ---- Analyzing Chunk 23 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2877 markers were tested.
write to output
   user  system elapsed 
184.463  27.652 128.808 
isVczEnd  FALSE 
(2025-01-29 22:58:12.296319) ---- Analyzing Chunk 24 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2814 markers were tested.
write to output
   user  system elapsed 
192.320  28.685 134.372 
isVczEnd  FALSE 
(2025-01-29 22:58:17.853174) ---- Analyzing Chunk 25 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2896 markers were tested.
write to output
   user  system elapsed 
199.685  29.762 139.803 
isVczEnd  FALSE 
(2025-01-29 22:58:23.289577) ---- Analyzing Chunk 26 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2933 markers were tested.
write to output
   user  system elapsed 
207.707  31.005 145.213 

   user  system elapsed 
450.405  65.701 314.465 
isVczEnd  FALSE 
(2025-01-29 23:01:17.93734) ---- Analyzing Chunk 58 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2917 markers were tested.
write to output
   user  system elapsed 
458.052  66.972 319.329 
isVczEnd  FALSE 
(2025-01-29 23:01:22.792466) ---- Analyzing Chunk 59 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2872 markers were tested.
write to output
   user  system elapsed 
466.580  68.094 325.157 
isVczEnd  FALSE 
(2025-01-29 23:01:28.637627) ---- Analyzing Chunk 60 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2857 markers were tested.
write to output
   user  system elapsed 
473.926  69.287 329.977 
isVczEnd  FALSE 
(2025-01-29 23:01:33.448284) ---- Analyzing Chunk 61 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2973 markers were tested.
write to output
   user  system elapsed 
481.631  70.485 334.775 
isVczEnd  FALSE

715.976 107.043 488.495 
isVczEnd  FALSE 
(2025-01-29 23:04:11.968308) ---- Analyzing Chunk 93 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2894 markers were tested.
write to output
   user  system elapsed 
723.391 108.260 493.227 
isVczEnd  FALSE 
(2025-01-29 23:04:16.691273) ---- Analyzing Chunk 94 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2910 markers were tested.
write to output
   user  system elapsed 
730.505 109.367 497.863 
isVczEnd  FALSE 
(2025-01-29 23:04:21.335266) ---- Analyzing Chunk 95 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2884 markers were tested.
write to output
   user  system elapsed 
738.949 110.604 503.572 
isVczEnd  FALSE 
(2025-01-29 23:04:27.05552) ---- Analyzing Chunk 96 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2887 markers were tested.
write to output
   user  system elapsed 
745.224 111.380 508.710 
isVczEnd  FALSE 
(2025-01-29 23:04:32.18

977.752 146.470 667.190 
isVczEnd  FALSE 
(2025-01-29 23:07:10.668499) ---- Analyzing Chunk 128 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3075 markers were tested.
write to output
   user  system elapsed 
985.950 147.613 672.803 
isVczEnd  FALSE 
(2025-01-29 23:07:16.274018) ---- Analyzing Chunk 129 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2890 markers were tested.
write to output
   user  system elapsed 
994.419 148.714 678.901 
isVczEnd  FALSE 
(2025-01-29 23:07:22.374181) ---- Analyzing Chunk 130 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2926 markers were tested.
write to output
    user   system  elapsed 
1001.469  149.869  683.667 
isVczEnd  FALSE 
(2025-01-29 23:07:27.139116) ---- Analyzing Chunk 131 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2791 markers were tested.
write to output
    user   system  elapsed 
1008.850  151.144  688.272 
isVczEnd  FALSE 
(2025-

isVczEnd  FALSE 
(2025-01-29 23:10:00.794814) ---- Analyzing Chunk 162 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2933 markers were tested.
write to output
    user   system  elapsed 
1240.053  187.102  843.058 
isVczEnd  FALSE 
(2025-01-29 23:10:06.542066) ---- Analyzing Chunk 163 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2848 markers were tested.
write to output
    user   system  elapsed 
1247.510  188.287  847.996 
isVczEnd  FALSE 
(2025-01-29 23:10:11.468365) ---- Analyzing Chunk 164 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2860 markers were tested.
write to output
    user   system  elapsed 
1254.989  189.593  852.756 
isVczEnd  FALSE 
(2025-01-29 23:10:16.22788) ---- Analyzing Chunk 165 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2822 markers were tested.
write to output
    user   system  elapsed 
1262.021  190.812  857.446 
isVczEnd  FALSE 
(2025-01-29 23:10:20

(2025-01-29 23:12:55.138699) ---- Analyzing Chunk 196 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2745 markers were tested.
write to output
    user   system  elapsed 
1475.243  220.237 1016.883 
isVczEnd  FALSE 
(2025-01-29 23:13:00.353358) ---- Analyzing Chunk 197 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2834 markers were tested.
write to output
    user   system  elapsed 
1480.662  220.784 1022.041 
isVczEnd  FALSE 
(2025-01-29 23:13:05.512509) ---- Analyzing Chunk 198 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2995 markers were tested.
write to output
    user   system  elapsed 
1486.801  221.476 1027.368 
isVczEnd  FALSE 
(2025-01-29 23:13:10.832183) ---- Analyzing Chunk 199 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2770 markers were tested.
write to output
    user   system  elapsed 
1492.919  222.232 1032.564 
isVczEnd  FALSE 
(2025-01-29 23:13:16.035604) ---- An

Completed 10000/10000 markers in the chunk.
2930 markers were tested.
write to output
    user   system  elapsed 
1708.276  252.463 1189.945 
isVczEnd  FALSE 
(2025-01-29 23:15:53.415706) ---- Analyzing Chunk 231 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2939 markers were tested.
write to output
    user   system  elapsed 
1716.549  253.712 1195.453 
isVczEnd  FALSE 
(2025-01-29 23:15:58.930204) ---- Analyzing Chunk 232 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
3030 markers were tested.
write to output
    user   system  elapsed 
1724.419  255.058 1200.560 
isVczEnd  FALSE 
(2025-01-29 23:16:04.031473) ---- Analyzing Chunk 233 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the chunk.
2916 markers were tested.
write to output
    user   system  elapsed 
1730.182  255.713 1205.611 
isVczEnd  FALSE 
(2025-01-29 23:16:09.082662) ---- Analyzing Chunk 234 :  chrom InitialChunk ---- 
Completed 10000/10000 markers in the

Loading required package: RhpcBLASctl

real	20m25.739s
user	29m20.935s
sys	4m21.062s
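Per-chunk cost is easier to compare across backends than the raw log. The `user system elapsed` reports in these logs appear to be cumulative (each `elapsed` value is larger than the one before it), so the wall-clock time spent on one chunk is the difference between consecutive reports. A minimal sketch, assuming the cell output has been saved as plain text; `per_chunk_elapsed` is a hypothetical helper, not part of SAIGE:

```python
import re

# A "user  system elapsed" header followed (after any whitespace, including
# the newline) by the three numbers printed underneath it.
TIMING_RE = re.compile(
    r"user\s+system\s+elapsed\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)


def per_chunk_elapsed(log_text):
    """Return the wall-clock seconds between consecutive timing reports.

    Assumes each report is cumulative since the start of the run, which is
    how the numbers in the logs above behave (they only ever increase).
    """
    elapsed = [float(match.group(3)) for match in TIMING_RE.finditer(log_text)]
    return [round(later - earlier, 3) for earlier, later in zip(elapsed, elapsed[1:])]


excerpt = """\
write to output
   user  system elapsed
 66.856   0.788  67.286
isVcfEnd  FALSE
(2025-01-29 22:49:54.077678) ---- Analyzing Chunk 24 :  chrom InitialChunk ----
Completed 10000/10000 markers in the chunk.
2814 markers were tested.
write to output
   user  system elapsed
 68.363   0.796  68.801
isVcfEnd  FALSE
(2025-01-29 22:49:55.605849) ---- Analyzing Chunk 25 :  chrom InitialChunk ----
Completed 10000/10000 markers in the chunk.
2896 markers were tested.
write to output
   user  system elapsed
 69.828   0.804  70.274
"""
print(per_chunk_elapsed(excerpt))  # [1.515, 1.473]
```

On the excerpts in this section, the first run's reports are roughly 1.5 s apart, while the Zarr run's are roughly 5 s apart (for example 123.051 → 128.808 → 134.372).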


## Final timings
Times are in seconds: user and system CPU time, and wall-clock elapsed time.

| Input format | user (s) | system (s) | elapsed (s) |
|---|---:|---:|---:|
| BCF | 4195.218 | 4.671 | 4200.508 |
| VCF | 9718.428 | 4.735 | 9727.995 |
| Savvy | 415.371 | 2.420 | 418.098 |
| Zarr | 1758.976 | 259.924 | 1223.767 |
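To put these timings on a common scale, divide a baseline's elapsed (wall-clock) time by each backend's elapsed time. A small sketch of that arithmetic; the elapsed values are copied from the final timings, and `speedup_vs` is an illustrative helper, not part of SAIGE:

```python
# Wall-clock (elapsed) seconds copied from the final timings above.
ELAPSED_SECONDS = {
    "BCF": 4200.508,
    "VCF": 9727.995,
    "Savvy": 418.098,
    "Zarr": 1223.767,
}


def speedup_vs(baseline, timings):
    """Return how many times faster each backend ran than `baseline`."""
    base = timings[baseline]
    return {name: round(base / seconds, 2) for name, seconds in timings.items()}


print(speedup_vs("VCF", ELAPSED_SECONDS))
# {'BCF': 2.32, 'VCF': 1.0, 'Savvy': 23.27, 'Zarr': 7.95}
```

On these numbers the Zarr run finishes about 8 times faster than VCF and about 3.4 times faster than BCF, but takes about 2.9 times as long as Savvy.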

### Compare results

In [14]:
import pandas as pd

vcz_results = pd.read_csv('chr21_10_5.vcz_results.txt', sep='\t')
bcf_results = pd.read_csv('chr21_10_5.bcf_results.txt', sep='\t')
sav_results = pd.read_csv('chr21_10_5.sav_results.txt', sep='\t')
vcf_results = pd.read_csv('chr21_10_5.vcf_results.txt', sep='\t')

In [15]:
vcz_results.shape == bcf_results.shape == sav_results.shape == vcf_results.shape

True

In [16]:
# A bare expression only displays the last result, so combine all three checks.
all(
    all(vcz_results.columns == other.columns)
    for other in (bcf_results, vcf_results, sav_results)
)

True

In [17]:
# Each column prints three lines, in order: VCZ vs BCF, VCZ vs VCF, VCZ vs Savvy.
for column in vcz_results.columns:
    print(column, all(vcz_results[column] == bcf_results[column]))
    print(column, all(vcz_results[column] == vcf_results[column]))
    print(column, all(vcz_results[column] == sav_results[column]))

CHR True
CHR True
CHR True
POS True
POS True
POS True
MarkerID False
MarkerID False
MarkerID False
Allele1 False
Allele1 False
Allele1 False
Allele2 False
Allele2 False
Allele2 False
AC_Allele2 True
AC_Allele2 True
AC_Allele2 True
AF_Allele2 True
AF_Allele2 True
AF_Allele2 True
MissingRate True
MissingRate True
MissingRate True
BETA True
BETA True
BETA True
SE True
SE True
SE True
Tstat True
Tstat True
Tstat True
var True
var True
var True
p.value True
p.value True
p.value True
N True
N True
N True
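All numerical columns match exactly; the only differences are in the string columns `MarkerID`, `Allele1` and `Allele2`, consistent with the TensorStore string limitation noted at the top of this notebook. The same check can be written to report only the mismatching columns. A minimal sketch using pandas' `Series.equals`; `mismatched_columns` is a hypothetical helper:

```python
import pandas as pd


def mismatched_columns(reference, other):
    """Return the names of columns whose values differ between two result tables.

    Series.equals treats NaN in the same position as equal and also requires
    matching dtypes, so it is more lenient about missing values (and stricter
    about types) than the element-wise `==` loop above.
    """
    if list(reference.columns) != list(other.columns) or reference.shape != other.shape:
        raise ValueError("result tables have different layouts")
    return [
        column
        for column in reference.columns
        if not reference[column].equals(other[column])
    ]


# Tiny stand-in tables; in the notebook these would be vcz_results, bcf_results, etc.
left = pd.DataFrame({"POS": [10, 20], "MarkerID": ["a", "b"], "p.value": [0.5, float("nan")]})
right = pd.DataFrame({"POS": [10, 20], "MarkerID": ["a", "c"], "p.value": [0.5, float("nan")]})
print(mismatched_columns(left, right))  # ['MarkerID']
```

In this notebook it would be called once per backend, e.g. `mismatched_columns(vcz_results, bcf_results)`.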


### Cleanup

For convenience, a `cleanup.sh` script is provided in the same folder as this notebook.