# BOLT-LMM analyses for UK Biobank data: quantitative traits

## Aim

To perform genetic association analysis for quantitative traits (waist-to-hip ratio, waist circumference, BMI, blood lipids) using BOLT-LMM and UK Biobank imputed data from ~500K individuals.

## Method and workflow overview

1. Download data from UKB: phenotype file, genotype files (`.fam`, `.bed`, `.bim`) and imputed genotypes (`.bgen`, `.bgi`, `.sample`)
2. Install BOLT-LMM on Yale's Farnam cluster. To install from source code, follow the instructions here: https://data.broadinstitute.org/alkesgroup/BOLT-LMM/#x1-70002.2
3. Run BOLT-LMM analysis to obtain summary statistics for association analysis

## BOLT-LMM installation on Yale's cluster

For local installs, add these lines to your `.bash_profile`:
```
# local installs
export MY_PREFIX=~/software
export PATH=$MY_PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$MY_PREFIX/lib:$LD_LIBRARY_PATH
```

In [None]:
module load miniconda
module load R

In [None]:
mkdir -p ~/software/bin ~/software/lib && cd ~/software && \
wget https://data.broadinstitute.org/alkesgroup/BOLT-LMM/downloads/BOLT-LMM_v2.3.4.tar.gz && \
tar -zxvf BOLT-LMM_v2.3.4.tar.gz && \
rm -rf BOLT-LMM_v2.3.4.tar.gz && \
cp BOLT-LMM_v2.3.4/bolt ~/software/bin/ && \
cp BOLT-LMM_v2.3.4/lib/* ~/software/lib/
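
After copying the files, a quick sanity check is to confirm that the shell resolves `bolt` through `PATH`. The sketch below uses a hypothetical helper, `check_on_path`, which is not part of BOLT-LMM; it assumes the `.bash_profile` lines above have already been sourced in the current session.

```shell
# check_on_path NAME: report whether an executable is reachable through PATH.
# Hypothetical helper for illustration; not shipped with BOLT-LMM.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $(command -v "$1")"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example call after installation:
#   check_on_path bolt
```

If the helper reports `missing: bolt`, the most likely cause is that `~/software/bin` has not been added to `PATH` in the current shell.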

## Input data

1. Genotype file for constructing the GRM (genetic relationship matrix), formatted as a PLINK binary fileset: `--bfile=prefix`
2. Reference genetic map: `--geneticMapFile=tables/genetic_map_hg##.txt.gz`
3. Imputed genotype dosages in `.bgen` format, specified with `--bgenFile` and `--sampleFile`
4. Phenotype file (whitespace-delimited with column headers; the first two columns must be FID and IID), specified with `--phenoFile`. The phenotype to be analyzed is selected with `--phenoCol`
5. Covariates file (same format as the phenotype file), specified with `--covarFile`. Use `--covarCol` for qualitative covariates and `--qCovarCol` for quantitative ones. Use `--covarMaxLevels` to set the maximum number of categories allowed for a qualitative covariate. To specify an array of quantitative covariates, use a range such as `--qCovarCol=PC{1:20}`

Note: the reference genome used is **GRCh37/hg19**.
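
Because the phenotype and covariate files must start with FID and IID and are addressed by column name, it can help to check the header before submitting a long job. The following is a minimal sketch; `check_header` and `has_column` are hypothetical helper names, and the file name in the usage comment is a placeholder.

```shell
# check_header FILE: succeed only if the first two header fields are FID and IID.
check_header() {
    head -n 1 "$1" | awk '
        NR == 1 { ok = ($1 == "FID" && $2 == "IID") }
        END     { exit !ok }'
}

# has_column FILE NAME: succeed only if NAME appears in the header line.
has_column() {
    head -n 1 "$1" | awk -v name="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
        END     { exit !found }'
}

# Example usage (pheno.txt is a placeholder file name):
#   check_header pheno.txt && has_column pheno.txt residual
```

The same two checks apply to the covariates file, using the names passed to `--covarCol` and `--qCovarCol`.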

## Global parameter setting

In [None]:
[global]
# Genotype files in plink binary format
parameter: bfile = path('/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv') 
# Phenotype file for quantitative trait (BMI)
parameter: phenoFile = path('/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_caucasians_BMI_nopreg_adjagesex_residuals_andstandardized_022720') 
# Phenotype to be analyzed (specify the column)
parameter: phenoCol = 'residual'
# Covariate file path
parameter: covarFile = path('/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_withagesex_033120') # path to covariates file
# Qualitative covariates to be used in the analysis
parameter: covarCol1 = 'SEX'
# Maximum categories of covariates allowed 
parameter: covarMaxLevels = '10'
# Quantitative covariates to be used in the analysis
parameter: qCovarCol2 = 'AGE'
#parameter: qCovarCol3 = 'PC{1:10}' # if we are going to use PC as covariates uncomment
# Path to LD score file for the European population
parameter: LDscoresFile = path('~/software/BOLT-LMM_v2.3.4/tables/LDSCORE.1000G_EUR.tab.gz')
# Path to genetic map file used to interpolate genetic map coordinates from SNP physical (base pair) positions
parameter: geneticMapFile = path('~/software/BOLT-LMM_v2.3.4/tables/genetic_map_hg19.txt.gz')
# Specific number of threads to use
parameter: numThreads = '8'
# Name of the output file with BOLT-LMM association stats for the genotyped SNPs
parameter: statsFile = 'bolt_500UKB_selfRepWhite.BMI.stats.gz'
# Path to bgen files
parameter: bgenFile = path('/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1:22}_v3.bgen')
# Minimum MAF to be used
parameter: bgenMinMAF = '0.001'
# Minimum INFO score to be used
parameter: bgenMinINFO = '0.8'
# Path to sample file
parameter: sampleFile = path('/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample')
# The name of the output file where the association results are stored
parameter: statsFileBgenSnps = 'bolt_500K_selfRepWhite.BMI.bgen.stats.gz'
# the output directory for generated files
parameter: out_dir = path('~/results/pleiotropy/2020-04_bolt')
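
The `{1:22}` range in `bgenFile` stands for one file per autosome. A rough sketch of that expansion in plain shell, useful for confirming that all 22 files exist before the run, is given below. `expand_range` and `list_missing` are hypothetical helpers that mimic the range notation for checking purposes only; they do not reproduce BOLT-LMM's own parsing.

```shell
# expand_range TEMPLATE FIRST LAST: print TEMPLATE once per integer in
# FIRST..LAST, replacing the literal "{FIRST:LAST}" with that integer.
expand_range() {
    template="$1"
    first="$2"
    last="$3"
    i="$first"
    while [ "$i" -le "$last" ]; do
        printf '%s\n' "$template" | sed "s/{${first}:${last}}/${i}/"
        i=$((i + 1))
    done
}

# list_missing: read file names on stdin and report those that do not exist.
list_missing() {
    while IFS= read -r f; do
        [ -e "$f" ] || printf 'missing: %s\n' "$f"
    done
}

# Example usage (bgen_template is a placeholder variable holding the bgenFile path):
#   expand_range "$bgen_template" 1 22 | list_missing
```

An empty result from `list_missing` means every expanded file name was found.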

## Running BOLT-LMM

In [None]:
bolt \
    --bfile=${bfile} \
    --phenoFile=${phenoFile} \
    --phenoCol=${phenoCol} \
    --covarFile=${covarFile} \
    --covarCol=${covarCol1} \
    --covarMaxLevels=${covarMaxLevels} \
    --qCovarCol=${qCovarCol2} \
    --LDscoresFile=${LDscoresFile} \
    --geneticMapFile=${geneticMapFile} \
    --lmmForceNonInf \
    --numThreads=${numThreads} \
    --statsFile=${out_dir}/${statsFile} \
    --bgenFile=${bgenFile} \
    --bgenMinMAF=${bgenMinMAF} \
    --bgenMinINFO=${bgenMinINFO} \
    --sampleFile=${sampleFile} \
    --statsFileBgenSnps=${out_dir}/${statsFileBgenSnps} \
    --verboseStats

## Results
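
A minimal sketch for a first look at the output, assuming the run above completed and the statistics file contains a `P_BOLT_LMM` column (the non-infinitesimal mixed-model p-value requested with `--lmmForceNonInf`). `top_hits` is a hypothetical helper, and the default threshold of 5e-8 is the conventional genome-wide significance level rather than a value taken from this analysis.

```shell
# top_hits FILE [THRESHOLD]: print the header plus every row whose
# P_BOLT_LMM value is below THRESHOLD (default 5e-8).
# The column is located by name, so extra --verboseStats columns do not matter.
top_hits() {
    gzip -dc < "$1" | awk -v thr="${2:-5e-8}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "P_BOLT_LMM") col = i
            if (!col) {
                print "P_BOLT_LMM column not found" > "/dev/stderr"
                exit 1
            }
            print
            next
        }
        ($col + 0) < (thr + 0)'
}

# Example usage (stats_file is a placeholder for the statsFileBgenSnps output path):
#   top_hits "$stats_file" > significant_hits.tsv
```

Locating the column by header name rather than by position keeps the sketch independent of the exact column order in the output.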