# SAIGE analysis for UK Biobank data: binary phenotypes

## Aim

To perform genetic association analysis for binary traits (asthma and diabetes) using the SAIGE software and UK Biobank imputed data for ~500K individuals.

## Method and workflow overview

1. Download data from UKB: phenotype files, genotype files (`.fam`, `.bed`, `.bim`) and imputed genotypes (`.bgen`, `.bgi`, `.sample`)
2. Create a conda environment for installing SAIGE on Yale's HRC cluster. Instructions are at https://github.com/weizhouUMICH/SAIGE
3. Run SAIGE to obtain summary statistics for the association analysis

## Input data

1. Genotype file for constructing the GRM (genetic relationship matrix), formatted as a PLINK binary file
2. Phenotype file (also contains the non-genetic covariates). The format is space- or tab-delimited with a header row (one column for sample IDs and one column for each phenotype)
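
The phenotype file layout described above can be checked before launching a long SAIGE run. The following is a minimal sketch, not part of the original workflow: the function name `check_phenotype_table` is hypothetical, the column names `IID` and `y_binary` are taken from the Step 1 command, and the three-row table is made-up toy data rather than UK Biobank content.

```python
import io


def check_phenotype_table(handle, id_col="IID", pheno_col="y_binary"):
    """Validate a space- or tab-delimited phenotype table and count cases/controls."""
    header = handle.readline().split()  # split() handles both spaces and tabs
    for col in (id_col, pheno_col):
        if col not in header:
            raise ValueError(f"missing column: {col}")
    id_idx, pheno_idx = header.index(id_col), header.index(pheno_col)

    seen_ids = set()
    counts = {"0": 0, "1": 0, "missing": 0}
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(fields)}"
            )
        sample_id = fields[id_idx]
        if sample_id in seen_ids:
            raise ValueError(f"line {line_no}: duplicate sample ID {sample_id}")
        seen_ids.add(sample_id)
        value = fields[pheno_idx]
        # Anything other than 0/1 (for example NA) is counted as missing
        counts[value if value in ("0", "1") else "missing"] += 1
    return counts


# Toy table (illustrative values only)
toy = io.StringIO("IID y_binary x1 x2\nS1 1 0.5 1\nS2 0 -1.2 0\nS3 NA 0.3 1\n")
summary = check_phenotype_table(toy)
print(summary)
```

For a real file, the same function can be called on `open(pheno_file1)`, assuming the phenotype column is coded 0/1.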

### Global parameter setting

In [None]:
[global]
# Genotype file in plink binary format
parameter: geno_file = path('/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv')
# Phenotype file for binary trait 1 (asthma)
parameter: pheno_file1 = path('/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720')
# Phenotype file for binary trait 2 (diabetes)
parameter: pheno_file2 = path('/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/diabetes_casesbyICD10andselfreport_controlswithoutautoiummune_030720')
# Path to bgen files
parameter: bgen_file = path('/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1:22}_v3.bgen')
# Path to bgen file index
parameter: bgen_index_file = path('/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1:22}_v3.bgen.bgi')
# Path to sample file
parameter: sample_file = path('/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample')
# Name of the output files for trait 1 and step 1 of SAIGE
parameter: out_file_pheno1 = 'asthma_UKB_controlsnoautoimmune'
# The output directory for generated files
parameter: out_dir = path('output')


### Step 1: fit the null logistic mixed model

In [None]:
Rscript step1_fitNULLGLMM.R \
        --plinkFile=${geno_file} \
        --phenoFile=${pheno_file1} \
        --phenoCol=y_binary \
        --covarColList=x1,x2 \
        --sampleIDColinphenoFile=IID \
        --traitType=binary \
        --outputPrefix=${out_dir}/${out_file_pheno1} \
        --nThreads=4 \
        --LOCO=FALSE
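
Step 2 reads two files that Step 1 writes under `--outputPrefix`: the model file (`.rda`) and the variance ratio file (`.varianceRatio.txt`). A small check like the one below can confirm both exist before Step 2 is submitted. This is a sketch only: the helper name `missing_step1_outputs` is hypothetical, and the demo runs in a temporary directory rather than in the real `output` directory.

```python
import tempfile
from pathlib import Path


def missing_step1_outputs(out_dir, prefix):
    """Return the Step 1 output files that Step 2 needs but that are not on disk."""
    expected = [
        Path(out_dir) / f"{prefix}.rda",                # fitted null model
        Path(out_dir) / f"{prefix}.varianceRatio.txt",  # estimated variance ratio
    ]
    return [str(p) for p in expected if not p.is_file()]


# Demo in a temporary directory where only the .rda file has been created
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "asthma_UKB_controlsnoautoimmune.rda").touch()
    missing = missing_step1_outputs(tmp, "asthma_UKB_controlsnoautoimmune")

print([Path(p).name for p in missing])
```

An empty list means both files are present and Step 2 can proceed.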

### Step 2: perform single-variant association tests

In [None]:
Rscript step2_SPAtests.R \
        --bgenFile=${bgen_file} \
        --bgenFileIndex=${bgen_index_file} \
        --minMAF=0.0001 \
        --minMAC=1 \
        --sampleFile=${sample_file} \
        --GMMATmodelFile=${out_dir}/${out_file_pheno1}.rda \
        --varianceRatioFile=${out_dir}/${out_file_pheno1}.varianceRatio.txt \
        --SAIGEOutputFile=${out_dir}/${out_file_pheno1}.SAIGE.bgen.txt \
        --numLinesOutput=2 \
        --IsOutputAFinCaseCtrl=TRUE
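
The `chr{1:22}` part of the bgen path in the global parameters appears to stand for one file per autosome, so Step 2 has to be run once per chromosome. The sketch below builds the 22 argument lists from the flags used in the command above. The function name `step2_commands`, the `{chrom}` template syntax, the shortened relative paths and the per-chromosome output name (`.chr<N>.SAIGE.bgen.txt`) are all assumptions made for illustration, not part of the original notebook.

```python
def step2_commands(bgen_template, sample_file, model_prefix, chromosomes=range(1, 23)):
    """Build one Step 2 argument list per chromosome (nothing is executed here)."""
    commands = []
    for chrom in chromosomes:
        bgen = bgen_template.format(chrom=chrom)
        commands.append([
            "Rscript", "step2_SPAtests.R",
            f"--bgenFile={bgen}",
            f"--bgenFileIndex={bgen}.bgi",
            "--minMAF=0.0001",
            "--minMAC=1",
            f"--sampleFile={sample_file}",
            f"--GMMATmodelFile={model_prefix}.rda",
            f"--varianceRatioFile={model_prefix}.varianceRatio.txt",
            # Hypothetical naming: one result file per chromosome
            f"--SAIGEOutputFile={model_prefix}.chr{chrom}.SAIGE.bgen.txt",
            "--numLinesOutput=2",
            "--IsOutputAFinCaseCtrl=TRUE",
        ])
    return commands


# Shortened placeholder paths; substitute the full paths from the global parameters
commands = step2_commands(
    bgen_template="ukb_imp_chr{chrom}_v3.bgen",
    sample_file="ukb32285_imputedindiv.sample",
    model_prefix="output/asthma_UKB_controlsnoautoimmune",
)
print(len(commands))
print(" ".join(commands[0][:4]))
```

Each list can be passed to `subprocess.run(cmd, check=True)` or written to a job script for the cluster scheduler.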

### Results

Results of the single-variant association analysis are written to the output directory, in the file with the suffix `.SAIGE.bgen.txt`.
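
As a first look at the results, the output table can be filtered for variants below a significance threshold. This sketch assumes the file is whitespace-delimited with a header row and a p-value column named `p.value`; check the header of your own output, since column names can differ between SAIGE versions. The function name `significant_variants`, the conventional genome-wide threshold of 5e-8 and the toy rows (made-up values and variant IDs) are illustrative choices, not results from this analysis.

```python
import io


def significant_variants(handle, p_col="p.value", threshold=5e-8):
    """Return rows (as dicts) whose p-value is below the threshold."""
    header = handle.readline().split()
    if p_col not in header:
        raise ValueError(f"missing column: {p_col}")
    hits = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        try:
            p_value = float(row[p_col])
        except ValueError:
            continue  # non-numeric entries such as NA are skipped
        if p_value < threshold:
            hits.append(row)
    return hits


# Toy table with made-up values, for illustration only
toy = io.StringIO(
    "CHR POS SNPID BETA SE p.value\n"
    "1 1000 rsA 0.10 0.02 3e-9\n"
    "1 2000 rsB 0.01 0.02 0.6\n"
    "2 3000 rsC -0.20 0.03 NA\n"
)
hits = significant_variants(toy)
print([row["SNPID"] for row in hits])
```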