## Analysis of BMI

This notebook applies [various LMM workflows](https://dianacornejo.github.io/pleiotropy_UKB/workflow) to perform association analysis for BMI.

## File paths on Yale cluster

- Genotype files in PLINK format:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv`
- Genotype files in bgen format:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/`
- Summary stats for imputed variants BOLT-LMM:
`/SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data`
- Summary stats for imputed variants FastGWA:
`/SAY/dbgapstg/scratch/UKBiobank/results/FastGWA_results/results_imputed_data`
- Phenotype files:
`/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis`
- Relationship file:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620`
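
Before submitting a job, it can help to confirm that the paths listed above are visible from the current node. The following is a minimal sketch and not part of the original workflow; the function name `check_paths` is hypothetical. Note that the PLINK entry is a file prefix, so one would test it with a suffix such as `.bed` appended.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original workflow): report which of the
# given paths exist and return non-zero if any of them is missing.
check_paths() {
    local missing=0
    local p
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok       $p"
        else
            echo "MISSING  $p"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (paths taken from the list above):
# check_paths /SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data \
#             /SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis
```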

In [1]:
%save INT-BMI_boltlmm.sbatch -f

#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 16
#SBATCH --mem-per-cpu 1G
#SBATCH --time 3-0:00:00
#SBATCH --job-name sos-submission
#SBATCH --output sos-submission-%J.out
#SBATCH --error sos-submission-%J.log

# Define bash variables for the different paths

cwd=~/scratch60/2020_04_bolt
bfile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.bed
sampleFile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
bgenFile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/*.bgen
phenoFile=~/project/phenotypes_UKB/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720
covarFile=~/project/phenotypes_UKB/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720
LDscoresFile=~/software/BOLT-LMM_v2.3.4/tables/LDSCORE.1000G_EUR.tab.gz
geneticMapFile=~/software/BOLT-LMM_v2.3.4/tables/genetic_map_hg19_withX.txt.gz
phenoCol=INT-BMI
covarCol=SEX
covarMaxLevels=10
qCovarCol=AGE
numThreads=20
bgenMinMAF=0.001
bgenMinINFO=0.8
job_size=1

# Run the workflow for the BMI trait. Here, sex and age were used as covariates

sos run ~/project/pleiotropy_UKB/workflow/LMM.ipynb boltlmm \
    --cwd $cwd \
    --bfile $bfile \
    --sampleFile $sampleFile \
    --bgenFile $bgenFile \
    --phenoFile $phenoFile \
    --covarFile $covarFile \
    --LDscoresFile $LDscoresFile \
    --geneticMapFile $geneticMapFile \
    --phenoCol $phenoCol \
    --covarCol $covarCol \
    --covarMaxLevels $covarMaxLevels \
    --qCovarCol $qCovarCol \
    --numThreads $numThreads \
    --bgenMinMAF $bgenMinMAF \
    --bgenMinINFO $bgenMinINFO \
    --job-size $job_size \
    -c ~/project/pleiotropy_UKB/farnam.yml -q farnam -J 40 \
    -s build &> 060820-sos-INT-BMI-boltlmm.log

In [None]:
%save INT-BMI_clumping.sbatch -f

#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 16
#SBATCH --mem-per-cpu 1G
#SBATCH --time 3-0:00:00
#SBATCH --job-name sos-submission
#SBATCH --output sos-submission-%J.out
#SBATCH --error sos-submission-%J.log

# Set the bash variables 
cwd=~/scratch60/plink-clumping
bfile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.bed
bgenFile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/*.bgen
sampleFile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
sumstatsFile=/home/dc2325/project/results/pleiotropy/2020-04_bolt/INT-BMI/ukb_imp_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM.snp_stats.all_chr.gz
unrelated_samples=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
clump_field=P_BOLT_LMM
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
job_size=1
numThreads=20

# Run the LD clumping workflow for the INT-BMI trait

sos run ~/project/pleiotropy_UKB/workflow/LDclumping.ipynb step \
    --cwd $cwd \
    --bfile $bfile \
    --sampleFile $sampleFile \
    --bgenFile $bgenFile \
    --numThreads $numThreads \
    -c ~/project/pleiotropy_UKB/farnam.yml -q farnam -J 40 \
    -s build &> sos-INT-BMI-060820.log
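
As a quick sanity check before clumping, one might count how many variants in the summary statistics fall below the index threshold (`clump_p1`). The sketch below is an illustration only and not part of the LDclumping workflow; the function name `count_significant` is hypothetical, and it assumes the summary-statistics file is gzipped and tab-delimited with a header row that contains the `clump_field` column.

```shell
#!/bin/bash
# Hypothetical helper (not part of the LDclumping workflow): count rows in a
# gzipped, tab-delimited summary-statistics file whose p-value column, looked
# up by header name, is below a threshold. Non-numeric entries are skipped.
count_significant() {
    local sumstats=$1 field=$2 threshold=$3
    gzip -dc "$sumstats" | awk -F'\t' -v field="$field" -v thr="$threshold" '
        NR == 1 {
            # Locate the requested column in the header row
            for (i = 1; i <= NF; i++) if ($i == field) col = i
            if (!col) { print "column not found: " field > "/dev/stderr"; exit 2 }
            next
        }
        $col ~ /^[0-9.eE+-]+$/ && ($col + 0) < (thr + 0) { n++ }
        END { if (col) print n + 0 }
    '
}

# Example call with the variables defined in the script above:
# count_significant "$sumstatsFile" "$clump_field" "$clump_p1"
```

If the count is zero, no variant would qualify as an index variant at that threshold.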