## Analysis of BMI

This notebook applies [various LMM workflows](https://dianacornejo.github.io/pleiotropy_UKB/workflow) to perform association analysis for BMI.

## File paths on Yale cluster

- Genotype files in PLINK format:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv`
- Genotype files in bgen format:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/`
- Summary stats for imputed variants BOLT-LMM:
`/SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data`
- Summary stats for imputed variants FastGWA:
`/SAY/dbgapstg/scratch/UKBiobank/results/FastGWA_results/results_imputed_data`
- Phenotype files:
`/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis`
- Relationship file:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620`
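
Before launching any workflow, it can help to confirm that these locations are visible from the current node. The following is a minimal sketch, not part of the original notebook: `check_paths` is a hypothetical helper that reports each path as `OK` or `MISSING`. It treats a path as present if the path itself exists or if some file named with that prefix plus a dot exists, since PLINK file sets are referred to by prefix.

```shell
# Hypothetical helper (not part of the repo): report which paths are visible.
check_paths() {
    missing=0
    for p in "$@"; do
        # Accept an exact path, or a prefix such as a PLINK file set (prefix.bed/.bim/.fam).
        if [ -e "$p" ] || ls "$p".* >/dev/null 2>&1; then
            echo "OK       $p"
        else
            echo "MISSING  $p"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status if anything is missing.
    [ "$missing" -eq 0 ]
}

check_paths \
    /SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv \
    /SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ \
    /SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis \
    || echo "Some paths are missing; check that you are logged in to the cluster."
```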

## 07/01/20 analysis

On the cluster, open this notebook in the JupyterLab server you set up over the SSH channel, then run the following cells.

### Bash variables for workflow configurations

In [1]:
tpl_file=../farnam.yml
#
lmm_dir=~/scratch60/2020-04_bolt/INT-BMI
lmm_sos=../workflow/LMM.ipynb
lmm_sbatch=../output/070120-bolt-INT-BMI.sbatch
##
bfile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.bed
sampleFile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
bgenFile=`echo /SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
phenoFile=~/project/phenotypes_UKB/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720
formatFile=~/project/UKBB_GWAS_DEV/data/boltlmm_template.yml
covarFile=~/project/phenotypes_UKB/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720
LDscoresFile=~/software/BOLT-LMM_v2.3.4/tables/LDSCORE.1000G_EUR.tab.gz
geneticMapFile=~/software/BOLT-LMM_v2.3.4/tables/genetic_map_hg19_withX.txt.gz
phenoCol=INT-BMI
covarCol=SEX
covarMaxLevels=10
qCovarCol=AGE
numThreads=20
bgenMinMAF=0.001
bgenMinINFO=0.8
lmm_job_size=1
#
clumping_dir=~/scratch60/plink-clumping
clumping_sos=../workflow/LD_clump_patch.ipynb
clumping_sbatch=../output/061620-INT-BMI_ldclumping.sbatch
##
sumstatsFiles=/home/dc2325/project/results/pleiotropy/2020-04_bolt/INT-BMI/ukb_imp_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM.snp_stats.all_chr.gz
unrelated_samples=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
ld_sample_size=1210
clump_field=P_BOLT_LMM
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
numThreads=20
clump_job_size=1
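
The cells below interpolate these variables into argument strings, so a variable that is unset in the current kernel session expands to an empty string and the corresponding flag silently loses its value. A minimal sketch of a guard, assuming a hypothetical helper named `require_vars` (not part of the repo):

```shell
# Hypothetical helper: fail early if any named variable is unset or empty.
require_vars() {
    any_missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset or empty: $name" >&2
            any_missing=1
        fi
    done
    return "$any_missing"
}

# Example: check a few of the variables defined above before building the argument strings.
require_vars lmm_dir bfile bgenFile phenoCol clumping_dir sumstatsFiles \
    || echo "Define the missing variables before running the next cells." >&2
```

Running the check in the same cell as the variable definitions keeps the failure close to its cause.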

### BOLT-LMM analysis

In [2]:
lmm_args="""boltlmm
    --cwd $lmm_dir 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
"""

sos run ../workflow/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch \
    --args "$lmm_args"

INFO: Running farnam: 
INFO: farnam (index=0) is ignored due to saved signature
INFO: farnam output:   ../output/070120-bolt-INT-BMI.sbatch
INFO: Workflow farnam (ID=966509325ed08711) is ignored with 1 ignored step.


### LD clumping: generate the job script and submit

In [3]:
clumping_args="""default 
    --cwd $clumping_dir 
    --bfile $bfile 
    --bgenFile $bgenFile 
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
"""

sos run ../workflow/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

INFO: Running farnam: 
INFO: farnam (index=0) is ignored due to saved signature
INFO: farnam output:   ../output/061620-INT-BMI_ldclumping.sbatch
INFO: Workflow farnam (ID=83fb83df19fb8afe) is ignored with 1 ignored step.
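
The two cells above only write the job scripts; they do not submit anything. Given the `.sbatch` extension, submission is presumably done with Slurm's `sbatch`. A minimal sketch, assuming a hypothetical wrapper `submit_job` and a `DRY_RUN` environment variable (both introduced here for illustration) so that the command can be previewed before anything is queued:

```shell
# Hypothetical wrapper: submit a generated script with sbatch, or preview it with DRY_RUN=1.
submit_job() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be run.
        echo "sbatch $script"
    else
        sbatch "$script"
    fi
}

# Preview the submissions for the two scripts generated above.
for f in ../output/070120-bolt-INT-BMI.sbatch ../output/061620-INT-BMI_ldclumping.sbatch; do
    DRY_RUN=1 submit_job "$f" || true
done
```

Dropping `DRY_RUN=1` submits the scripts for real.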


**Diana, I think you can take it from here. Please adapt the scripts below to use the bash variables, and use the template-generation command above to generate the job scripts from your notebook on the cluster. Generated scripts can be found under the `output` folder of the repo.**

In [None]:

# Set the bash variables 
cwd=~/scratch60/plink-clumping/chr7_region
region_file=~/scratch60/plink-clumping/chr7_region/INT-BMI_region.txt
pheno_path=~/project/phenotypes_UKB/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720
geno_path=~/scratch60/plink-clumping/chr7_region/bgenfilepath.txt
bgen_sample_file=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
sumstats_path=/home/dc2325/project/results/pleiotropy/2020-04_bolt/INT-BMI/ukb_imp_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM.snp_stats.all_chr.gz
unrelated_samples=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
format_config_path=~/project/UKBB_GWAS_DEV/data/boltlmm_template.yml

#Running the Region Extraction workflow for INT-BMI trait

sos run ~/project/UKBB_GWAS_DEV/workflow/Region_Extraction.ipynb \
    --cwd $cwd \
    --region-file $region_file \
    --pheno-path $pheno_path \
    --geno-path $geno_path \
    --bgen-sample-path $bgen_sample_file \
    --sumstats-path $sumstats_path \
    --format-config-path $format_config_path \
    --unrelated-samples $unrelated_samples \
    -s build &> 062320-sos-INT-BMI-region.log

In [None]:

# Set the bash variables 
cwd=~/scratch60/plink-clumping
bfile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.bed
bgenFile=`echo /SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
sumstatsFiles=`echo /home/dc2325/scratch60/plink-clumping/*.sumstats.gz`
unrelated_samples=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
ld_sample_size=1210
clump_field=P
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=OR
numThreads=20
job_size=1

#Running the LDclumping workflow for INT-BMI trait

sos run ~/project/UKBB_GWAS_DEV/workflow/LD_Clumping.ipynb default \
    --cwd $cwd \
    --bfile $bfile \
    --bgenFile $bgenFile \
    --sampleFile $sampleFile \
    --sumstatsFiles $sumstatsFiles \
    --unrelated_samples $unrelated_samples \
    --ld_sample_size $ld_sample_size \
    --clump_field $clump_field \
    --clump_p1 $clump_p1 \
    --clump_p2 $clump_p2 \
    --clump_r2 $clump_r2 \
    --clump_kb $clump_kb \
    --clump_annotate $clump_annotate \
    --numThreads $numThreads \
    --job_size $job_size \
    -c ~/project/UKBB_GWAS_DEV/farnam.yml -q farnam -J 40 \
    -s build &> 062420-sos-INT-BMI_asthma-ldclumping.log

In [None]:

# Set the bash variables 
cwd=~/scratch60/region_extract
region_file=~/scratch60/plink-clumping/asthma.sumstats_INT_BMI.sumstats.clumped_region
pheno_path=/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/normalized_phenotypes/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720
geno_path=~/scratch60/plink-clumping/chr7_region/bgenfilepath.txt
bgen_sample_file=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
sumstats_path=/SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data/INT-BMI/ukb_imp_v3.UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720.BoltLMM.snp_stats.all_chr.gz
unrelated_samples=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
format_config_path=~/project/UKBB_GWAS_DEV/data/boltlmm_template.yml
job_size=10

#Running the Region Extraction workflow for INT-BMI trait

sos run ~/project/UKBB_GWAS_DEV/workflow/Region_Extraction.ipynb \
    --cwd $cwd \
    --region-file $region_file \
    --pheno-path $pheno_path \
    --geno-path $geno_path \
    --bgen-sample-path $bgen_sample_file \
    --sumstats-path $sumstats_path \
    --format-config-path $format_config_path \
    --unrelated-samples $unrelated_samples \
    --job_size $job_size \
    -c ~/project/UKBB_GWAS_DEV/farnam.yml -q farnam -J 40 \
    -s build &> 070120-sos-INT-BMI-region.log
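
Each run above redirects both stdout and stderr to a log file with `&>`, so failures are not shown in the notebook output. A minimal sketch for checking a log afterwards: `scan_log` is a hypothetical helper, and it assumes that problems are logged on lines starting with `ERROR:` or `WARNING:`, by analogy with the `INFO:` lines shown earlier.

```shell
# Hypothetical helper: list ERROR/WARNING lines in a workflow log.
scan_log() {
    log=$1
    if [ ! -f "$log" ]; then
        echo "log not found: $log" >&2
        return 2
    fi
    # Print matching lines with their line numbers; grep exits 0 only if something matched.
    if grep -nE '^(ERROR|WARNING):' "$log"; then
        return 1
    fi
    echo "no ERROR/WARNING lines in $log"
}

scan_log 070120-sos-INT-BMI-region.log || true
```

The exit status distinguishes a clean log (0), a log with flagged lines (1), and a missing log (2), which makes the helper usable in a loop over several logs.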