# Overview
`corrMatrix.py` is a command-line tool that calculates the matrix of measurement error correlations **R** with minimal user input.

# Arguments
The `corrMatrix.py` program takes the following arguments:
- `-data`: (Required) Comma-separated list of filepaths to all GWAS sumstats
- `-snp`: (Required) Comma-separated list of SNP-identifying (i.e., rsID) column names for all GWAS sumstats (order must correspond to the order of the list passed to `-data`)
- `-beta`: (Required) Comma-separated list of names of BETA columns in GWAS sumstats (order must correspond to `-data`)
- `-se`: (Required) Comma-separated list of names of SE columns in GWAS sumstats (order must correspond to `-data`)
- `-pt`: (Optional) P-value threshold. Only SNPs whose P-value exceeds this threshold in every GWAS are used to calculate **R**.
  - We recommend `-pt 0.05`, which is the default value
- `-names`: (Required) Comma-separated list of row and column names to assign to the correlation matrix **R**
  - Order corresponds to the order of data passed to `-data`
- `-out`: (Required) Filepath (extension optional) at which to write the correlation matrix **R** as a space-delimited file
  - The correlation matrix **R** will also be printed to the screen
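
To make the role of `-pt` concrete, here is a minimal sketch of the kind of calculation the arguments describe. It is not the tool's actual code. It assumes that P-values are derived from `beta / se` under a normal approximation and that **R** is the Pearson correlation of those Z-statistics across the retained SNPs. The function names (`two_sided_p`, `pearson`, `corr_matrix`) and the toy numbers are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p(beta, se):
    """Two-sided P-value for a Wald Z-statistic (normal approximation, an assumption)."""
    z = beta / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(studies, pt=0.05):
    """studies: one dict per GWAS (in -data order) mapping SNP id -> (beta, se)."""
    # Keep only SNPs that are present in every study.
    shared = set(studies[0])
    for study in studies[1:]:
        shared &= set(study)
    # Keep only SNPs whose P-value exceeds the threshold in every study.
    kept = sorted(
        snp for snp in shared
        if all(two_sided_p(*study[snp]) > pt for study in studies)
    )
    if len(kept) < 2:
        raise ValueError("fewer than two SNPs passed the P-value filter")
    z = [[study[snp][0] / study[snp][1] for snp in kept] for study in studies]
    k = len(studies)
    R = [[1.0 if i == j else pearson(z[i], z[j]) for j in range(k)] for i in range(k)]
    return R, len(kept)


# Toy usage with made-up numbers (not real GWAS data).
y1 = {"rs1": (0.01, 0.02), "rs2": (-0.02, 0.02), "rs3": (0.5, 0.02),
      "rs4": (0.0, 0.02), "rs5": (0.03, 0.02)}
x1 = {"rs1": (0.02, 0.02), "rs2": (-0.01, 0.02), "rs3": (0.01, 0.02),
      "rs4": (-0.01, 0.02), "rs6": (0.0, 0.02)}
R, n_used = corr_matrix([y1, x1], pt=0.05)
print(n_used, "SNPs used in correlation matrix estimation")
```

In the toy data, `rs3` is dropped because its P-value in `y1` is far below the threshold, and `rs5` and `rs6` are dropped because each appears in only one study.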
  
# Example
Download the `corrMatrix.py` tool from our GitHub:

```unix
wget https://raw.githubusercontent.com/noahlorinczcomi/MRBEE/main/corrMatrix.py
```

Find the directory containing all of your GWAS summary statistics data sets. In this example, it will be the following:

```unix
cd /newdir
ls
```

which will show you something like:
```unix
gwasOutcomeset1.csv.gz  gwasOutcomeset2.txt  gwasExposureset1.txt.gz  gwasExposureset2.csv  corrMatrix.py
```

As implied, the GWAS summary statistic files can have different extensions/delimiters.
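
For readers who want to load such mixed files themselves, the sketch below shows one way to read gzip-compressed or plain files with different delimiters using only the Python standard library. It is an illustration, not how `corrMatrix.py` reads its inputs. The helper names (`open_text`, `detect_delimiter`, `read_sumstats`) are hypothetical, and the delimiter is guessed from the header line alone, which assumes column names contain no delimiter characters.

```python
import csv
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def detect_delimiter(header_line, candidates=(",", "\t", ";", " ")):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def read_sumstats(path, snp_col, beta_col, se_col):
    """Return a dict mapping SNP id -> (beta, se) for one summary-statistics file."""
    with open_text(path) as fh:
        delimiter = detect_delimiter(fh.readline())
        fh.seek(0)  # rewind so DictReader sees the header row again
        reader = csv.DictReader(fh, delimiter=delimiter)
        return {
            row[snp_col]: (float(row[beta_col]), float(row[se_col]))
            for row in reader
        }
```

With the example files above, a call might look like `read_sumstats("gwasOutcomeset1.csv.gz", "rsID", "betaOutcome1", "seOutcome1")`.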

You can use `corrMatrix.py` like this:

```unix
python corrMatrix.py \
 -data gwasOutcomeset1.csv.gz,gwasOutcomeset2.txt,gwasExposureset1.txt.gz,gwasExposureset2.csv \
 -snp rsID,rsID,rsID,rsID \
 -beta betaOutcome1,betaOutcome2,betaExposure1,betaExposure2 \
 -se seOutcome1,seOutcome2,seExposure1,seExposure2 \
 -pt 0.05 \
 -names y1,y2,x1,x2 \
 -out R
```

which will print output similar to the following:
```unix
NOTE: -snp, -beta, -se, and -names flag declarations must be in the corresponding order of -data declarations
the program is running
          y1        y2        x1        x2
y1  1.000000  0.001672 -0.003739  0.000473
y2  0.001672  1.000000 -0.002589 -0.000470
x1 -0.003739 -0.002589  1.000000  0.005381
x2  0.000473 -0.000470  0.005381  1.000000
81038 SNPs used in correlation matrix estimation
```
This example took approximately 10 seconds to run for 1,000,000 SNPs.
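
Because every comma-separated list must follow the order of `-data`, it can help to build the command from one record per data set instead of typing five parallel lists by hand. The sketch below does that in Python. The `build_command` helper and the dictionary keys are hypothetical. Only the flags are taken from the argument list above.

```python
import sys


def build_command(datasets, out, pt=0.05, script="corrMatrix.py"):
    """Assemble the corrMatrix.py argument list from one dict per GWAS data set."""
    def joined(key):
        # Every list is built from the same sequence, so the orders always match.
        return ",".join(d[key] for d in datasets)

    return [
        sys.executable, script,
        "-data", joined("path"),
        "-snp", joined("snp"),
        "-beta", joined("beta"),
        "-se", joined("se"),
        "-pt", str(pt),
        "-names", joined("name"),
        "-out", out,
    ]


datasets = [
    {"path": "gwasOutcomeset1.csv.gz", "snp": "rsID", "beta": "betaOutcome1",
     "se": "seOutcome1", "name": "y1"},
    {"path": "gwasOutcomeset2.txt", "snp": "rsID", "beta": "betaOutcome2",
     "se": "seOutcome2", "name": "y2"},
    {"path": "gwasExposureset1.txt.gz", "snp": "rsID", "beta": "betaExposure1",
     "se": "seExposure1", "name": "x1"},
    {"path": "gwasExposureset2.csv", "snp": "rsID", "beta": "betaExposure2",
     "se": "seExposure2", "name": "x2"},
]
cmd = build_command(datasets, out="R")
print(" ".join(cmd[1:]))

# To actually run the tool from Python (requires corrMatrix.py and the data files):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting and line-continuation problems entirely.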

The correlation matrix **R** will be stored in the `newdir/` directory in a space-delimited file named `R.txt` unless otherwise specified by changing the argument given to the `-out` flag.
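
The sketch below reads such a file back into Python and runs basic sanity checks on it. The exact layout of the saved file is not documented here, so the code assumes it mirrors the printed matrix: a header row of names followed by one row per name, with an optional leading row label. The function names `read_corr_matrix` and `check_corr_matrix` are hypothetical.

```python
def read_corr_matrix(path):
    """Read a space-delimited matrix with a header row and optional row labels."""
    with open(path) as fh:
        rows = [line.split() for line in fh if line.strip()]
    names, body = rows[0], rows[1:]
    matrix = []
    for row in body:
        # Drop the leading row label when one is present (assumed layout).
        values = row[1:] if len(row) == len(names) + 1 else row
        matrix.append([float(v) for v in values])
    return names, matrix


def check_corr_matrix(matrix, tol=1e-8):
    """Raise ValueError unless the matrix is square, symmetric, with unit diagonal."""
    k = len(matrix)
    if any(len(row) != k for row in matrix):
        raise ValueError("matrix is not square")
    for i in range(k):
        if abs(matrix[i][i] - 1.0) > tol:
            raise ValueError(f"diagonal entry {i} is not 1")
        for j in range(i):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                raise ValueError(f"entries ({i},{j}) and ({j},{i}) differ")
```

A typical call would be `names, R = read_corr_matrix("R.txt")` followed by `check_corr_matrix(R)`.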

See `python corrMatrix.py --help` for additional guidance:
```unix

usage: Calculation of Correlation [-h] [-data DATA] [-snp SNP] [-beta BETA]
                                  [-se SE] [-pt PT] [-names NAMES] [-out OUT]

This program calculates the correlation between GWAS estimates for a pair of
phenotypes

optional arguments:
  -h, --help    show this help message and exit
  -data DATA    (Required) comma-separated list of filepaths to all GWAS
                sumstats
  -snp SNP      (Required) comma-separated list of SNP column names for all
                GWAS sumstats
  -beta BETA    (Required) comma-separated list of names of BETA columns in
                GWAS sumstats
  -se SE        (Required) comma-separated list of names of BETA columns in
                GWAS sumstats
  -pt PT        (Required) P-value threshold. Only SNPs with all GWAS
                estimates P>this threshold will be considered
  -names NAMES  (Required) Comma-separated list of row, column names to assign
                to the correlation matrix
  -out OUT      (Required) Directory (file location w/o extension) of location
                to write out correlation matrix
```