## Analysis of Asthma

This notebook uses `Get_Job_Script.ipynb` to generate the sbatch scripts that run on Yale's cluster. The goal is to apply [various LMM workflows](https://github.com/statgenetics/UKBB_GWAS_dev/tree/master/workflow) to perform association analysis for the ASTHMA trait, then run LD clumping and extract the associated regions.

## File paths on Yale cluster

- Genotype files in PLINK format:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv`
- Genotype files in bgen format:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/`
- Summary stats for imputed variants BOLT-LMM:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data`
- Summary stats for imputed variants FastGWA:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data`
- Phenotype files:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis`
- Relationship file:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620`
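
Before generating any job script, it can be worth confirming that these locations are visible from the node you are on. The following is a minimal sketch; `check_paths` is a hypothetical helper (not part of the workflows), and the two directories passed to it are simply taken from the list above. Note that the PLINK entry above is a fileset prefix rather than a file, so it would need its `.bed` suffix to pass an existence test.

```shell
# Hypothetical helper: report which of the given paths are missing.
# Returns 0 when every path exists, 1 otherwise.
check_paths() {
    missing=0
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok       $p"
        else
            echo "MISSING  $p"
            missing=1
        fi
    done
    return "$missing"
}

# Example: two of the directories listed above
check_paths \
    /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset \
    /gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis \
    || echo "At least one path is not visible from this node"
```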

## File paths to specific phenotypic files for Asthma:

These were the files used in the analysis before the full pipeline was implemented:

```
phenoFile=~/project/phenotypes_UKB/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_fastGWA.phe
covarFile=~/project/phenotypes_UKB/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_fastGWA_covSEX.txt
qcovarFile=~/project/phenotypes_UKB/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_fastGWA_covAGE.txt
```

## 07/01/20 analysis

On the cluster, open this notebook in the JupyterLab server you set up over the ssh channel, then run the following cells.

## Bash variables for workflow configuration

In [6]:
# Common variables
UKBB_PATH=/gpfs/gibbs/pi/dewan/data/UKBiobank
USER_PATH=~/project
OUTPUT_PATH=../output

tpl_file=$USER_PATH/UKBB_GWAS_dev/farnam.yml
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
bgenFile=`echo /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
unrelated_samples=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
formatFile_fastgwa=$USER_PATH/UKBB_GWAS_dev/data/fastGWA_template.yml
formatFile_bolt=$USER_PATH/UKBB_GWAS_dev/data/boltlmm_template.yml
formatFile_saige=$USER_PATH/UKBB_GWAS_dev/data/saige_template.yml
container_lmm=$UKBB_PATH/lmm.sif
container_marp=$UKBB_PATH/marp.sif

# LMM directories
lmm_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_imputed_data/asthma
lmm_dir_bolt=$UKBB_PATH/results/BOLTLMM_results/results_imputed_data/asthma
lmm_dir_saige=$UKBB_PATH/results/SAIGE_results/results_imputed_data/asthma
lmm_sos=$USER_PATH/bioworkflows/GWAS/LMM.ipynb
lmm_sbatch_fastgwa=$OUTPUT_PATH/$(date +"%Y-%m-%d")_asthma-fastgwa.sbatch
lmm_sbatch_bolt=$OUTPUT_PATH/$(date +"%Y-%m-%d")_asthma-bolt.sbatch
lmm_sbatch_saige=$OUTPUT_PATH/$(date +"%Y-%m-%d")_asthma-saige.sbatch
phenoFile=$UKBB_PATH/phenotype_files/pleiotropy_R01/phenotypesforanalysis/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720

## LMM variables 
covarFile=$UKBB_PATH/phenotype_files/pleiotropy_R01/phenotypesforanalysis/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720
LDscoresFile=$UKBB_PATH/LDSCORE.1000G_EUR.tab.gz
geneticMapFile=$UKBB_PATH/genetic_map_hg19_withX.txt.gz
phenoCol=ASTHMA
covarCol=SEX
covarMaxLevels=10
qCovarCol=AGE
numThreads=20
bgenMinMAF=0.001
bgenMinINFO=0.8
lmm_job_size=1
ylim=0

### Specific to FastGWA
grmFile=$UKBB_PATH/results/FastGWA_results/results_imputed_data/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.grm.sp
### Specific to SAIGE
bgenMinMAC=4
trait_type=binary
loco=TRUE
sampleCol=IID

# LD clumping directories
clumping_dir=$UKBB_PATH/results/LD_clumping/asthma_INT-WHR_T2D_WAIST
clumping_sos=$USER_PATH/bioworkflows/GWAS/LD_Clumping.ipynb
clumping_sbatch=$OUTPUT_PATH/$(date +"%Y-%m-%d")_asthma_ldclumping.sbatch
## LD clumping variables
bfile_ref=$UKBB_PATH/results/LD_clumping/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.1210.ref_geno.bed
# For sumstatsFiles, if there is more than one file, provide each path
# In this case: asthma, INT-WHR, T2D and INT-WAIST
sumstatsFiles="/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/asthma/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720_ASTHMA.fastGWA.snp_stats.gz \
              /gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/INT-WHR/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-WHR_withagesex_042020_rankNorm_WHR.boltlmm.snp_stats.gz \
              /gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/T2D/diabetes_casesbyICD10andselfreport_controlswithoutautoiummune_030720_T2D.fastGWA.snp_stats.gz \
              /gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/INT-WAIST/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-WAIST_withagesex_042020_rankNorm_WAIST.boltlmm.snp_stats.gz"
ld_sample_size=1210
clump_field=P
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
numThreads=20
clump_job_size=1
clumpFile=$clumping_dir/asthma_INT-WHR_T2D_WAIST_ukbb.clumped

# Region extraction directories
extract_dir=$UKBB_PATH/results/region_extraction/asthma
extract_sos=$USER_PATH/bioworkflows/GWAS/Region_Extraction.ipynb
extract_sbatch=$OUTPUT_PATH/$(date +"%Y-%m-%d")_asthma-region.sbatch
## Region extraction variables
region_file=$UKBB_PATH/results/LD_clumping/asthma_INT-WHR_T2D/asthma_INT-WHR_T2D_ukbb.clumped_region
geno_path=$UKBB_PATH/results/UKBB_bgenfilepath.txt
sumstats_path=$UKBB_PATH/results/FastGWA_results/results_imputed_data/asthma/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720_ASTHMA.fastGWA.snp_stats.gz
extract_job_size=10

# Finemapping
finemap_dir=$UKBB_PATH/results/fine_mapping/asthma
finemap_sos=$USER_PATH/UKBB_GWAS_dev/workflow/SuSiE_test.ipynb
finemap_sbatch=$OUTPUT_PATH/$(date +"%Y-%m-%d")_asthma-finemap.sbatch
region_dir=$UKBB_PATH/results/region_extraction/asthma
region_file=$UKBB_PATH/results/LD_clumping/asthma_INT-WHR_T2D/asthma_INT-WHR_T2D_ukbb.clumped_region
sumstats_path=$UKBB_PATH/results/FastGWA_results/results_imputed_data/asthma/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720_ASTHMA.fastGWA.snp_stats.gz
N=230411
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
container_marp=/gpfs/gibbs/pi/dewan/data/UKBiobank/marp.sif
pip_cutoff=0.1
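
The `bgenFile` assignment in the cell above relies on bash brace expansion: `chr{1..22}` expands to one path per autosome, so `$bgenFile` holds 22 space-separated paths. A minimal sketch of the same idea with an explicit loop, which can be handy for inspecting the list before it is passed to `--bgenFile`; `bgen_dir`, `bgen_list`, and `n_files` are illustrative names that do not appear in the workflow.

```shell
# Build the per-chromosome bgen list with an explicit loop
# (the same list that `echo .../ukb_imp_chr{1..22}_v3.bgen` yields in bash).
bgen_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset
bgen_list=""
n_files=0
for chr in $(seq 1 22); do
    bgen_list="$bgen_list $bgen_dir/ukb_imp_chr${chr}_v3.bgen"
    n_files=$((n_files + 1))
done
bgen_list=${bgen_list# }   # drop the leading space

echo "Number of bgen files: $n_files"
# Print the first path of the list as a sanity check
echo "First: ${bgen_list%% *}"
```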




## BoltLMM job

In [2]:
lmm_args="""boltlmm
    --cwd $lmm_dir_bolt 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_bolt 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_bolt \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-07-17_asthma-bolt.sbatch
INFO: Workflow farnam (ID=e70247f4a0182528) is executed successfully with 1 completed step.
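
The file written by `--to-script` still has to be submitted to Slurm. A minimal sketch, assuming `sbatch` is available on the login node; `submit_script` and the `DRY_RUN` variable are hypothetical conveniences and not part of `Get_Job_Script.ipynb`.

```shell
# Hypothetical wrapper: submit a generated script only if it exists.
# With DRY_RUN=1 the command is printed instead of executed.
submit_script() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "No such script: $script" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sbatch $script"
    else
        sbatch "$script"
    fi
}

# Preview the submission of the BOLT-LMM script reported in the log above
DRY_RUN=1
submit_script ../output/2020-07-17_asthma-bolt.sbatch \
    || echo "Generate the script first"
```

Once a job has been submitted, `squeue -u $USER` lists it while it is queued or running.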


## FastGWA job

In [3]:
lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-09-15_asthma-fastgwa.sbatch
INFO: Workflow farnam (ID=d20de1f960d9c797) is executed successfully with 1 completed step.



## SAIGE job

In [None]:
lmm_args="""SAIGE
    --cwd $lmm_dir_saige 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_saige 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --bgenMinMAC $bgenMinMAC
    --trait_type $trait_type
    --loco $loco
    --sampleCol $sampleCol
    --job_size $lmm_job_size
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_saige \
    --args "$lmm_args"

## LD clumping job

In [8]:
clumping_args="""default 
    --cwd $clumping_dir 
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --genoFile $bgenFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
    --clumpFile $clumpFile
    --container_lmm $container_lmm
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb dewan \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

INFO: Running dewan: Configuration for Yale `pi_dewan` partition cluster
INFO: dewan (index=0) is ignored due to saved signature
INFO: dewan output:   output/2021-05-04_asthma_ldclumping.sbatch
INFO: Workflow dewan (ID=w00e3489ca81f19ac) is ignored with 1 ignored step.
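
The `clump_*` variables set in the configuration cell mirror PLINK 1.9's `--clump` options. The actual command is assembled inside `LD_Clumping.ipynb`, which is not shown here, so the following is only a sketch of what a roughly equivalent direct call might look like for a single summary-statistics file. `ref_prefix`, `sumstats`, and `out_prefix` are placeholder names, and the command is printed rather than run.

```shell
ref_prefix=ref_geno          # placeholder: PLINK fileset prefix of the LD reference panel
sumstats=trait.snp_stats     # placeholder: summary statistics with a P column
out_prefix=trait_clump       # placeholder: output prefix

# Assemble the command as a single line (backslash-newline is a continuation)
clump_cmd="plink --bfile $ref_prefix \
--clump $sumstats \
--clump-field P \
--clump-p1 5e-08 \
--clump-p2 1 \
--clump-r2 0.2 \
--clump-kb 2000 \
--clump-annotate BP \
--out $out_prefix"

echo "$clump_cmd"
```

Here `--clump-p1` is the significance threshold for index variants, `--clump-p2` the threshold for variants added to a clump, and `--clump-r2` and `--clump-kb` the LD and distance limits that define clump membership.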



## Region extract job

In [3]:
extract_args="""default
    --cwd $extract_dir
    --region-file $region_file
    --pheno-path $phenoFile
    --geno-path $geno_path
    --bgen-sample-path $sampleFile
    --sumstats-path $sumstats_path
    --format-config-path $formatFile_fastgwa
    --unrelated-samples $unrelated_samples
    --job-size $extract_job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $extract_sos \
    --to-script $extract_sbatch \
    --args "$extract_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-04-29_asthma-region.sbatch
INFO: Workflow farnam (ID=wda426cfd8ead4444) is executed successfully with 1 completed step.



## Hudson plot

### Asthma vs BMI

In [2]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/GWAS/Hudson_plot.ipynb
hudson_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/hudson_plots/pleiotropy/asthma_INT-BMI
hudson_sbatch=../output/$(date +"%Y-%m-%d")_asthma_vs_BMI_hudson.sbatch
sumstats_1=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/asthma/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720_ASTHMA.fastGWA.snp_stats.gz
sumstats_2=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/INT-BMI/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-BMI_withagesex_041720_rankNorm_BMI.boltlmm.snp_stats.gz
toptitle="Asthma_fastGWA"
bottomtitle="INT-BMI_boltlmm"
highlight_p_top=0
highlight_p_bottom=0
pval_filter=5e-06
job_size=1
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
#highlight_snp=
annotate_snp=0
phenocol1='asthma'
phenocol2='INT-BMI'

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --annotate_snp $annotate_snp
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-11-05_asthma_vs_BMI_hudson.sbatch
INFO: Workflow farnam (ID=44859da9e0f9e48b) is executed successfully with 1 completed step.



### Asthma vs WHR

In [None]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/GWAS/Hudson_plot.ipynb
hudson_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/hudson_plots/pleiotropy/asthma_INT-WHR
hudson_sbatch=../output/$(date +"%Y-%m-%d")_asthma_vs_WHR_hudson.sbatch
sumstats_1=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/asthma/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720_ASTHMA.fastGWA.snp_stats.gz
sumstats_2=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/INT-WHR/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-WHR_withagesex_042020_rankNorm_WHR.boltlmm.snp_stats.gz
toptitle="Asthma_fastGWA"
bottomtitle="INT-WHR_boltlmm"
highlight_p_top=0
highlight_p_bottom=0
pval_filter=5e-06
job_size=1
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
#highlight_snp=
annotate_snp=0
phenocol1='asthma'
phenocol2='INT-WHR'

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --annotate_snp $annotate_snp
    --container_lmm $container_lmm
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

### Asthma vs INT-WAIST

In [None]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/GWAS/Hudson_plot.ipynb
hudson_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/hudson_plots/pleiotropy/asthma_INT-WAIST
hudson_sbatch=../output/$(date +"%Y-%m-%d")_asthma_vs_WAIST_hudson.sbatch
sumstats_1=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/asthma/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720_ASTHMA.fastGWA.snp_stats.gz
sumstats_2=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data/INT-WAIST/UKB_caucasians_BMIwaisthip_AsthmaAndT2D_INT-WAIST_withagesex_042020_rankNorm_WAIST.boltlmm.snp_stats.gz
toptitle="Asthma_fastGWA"
bottomtitle="INT-WAIST_boltlmm"
highlight_p_top=0
highlight_p_bottom=0
pval_filter=5e-06
job_size=1
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
#highlight_snp=
annotate_snp=0
phenocol1='asthma'
phenocol2='INT-WAIST'

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --annotate_snp $annotate_snp
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

### Asthma vs T2D

In [None]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/GWAS/Hudson_plot.ipynb
hudson_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/hudson_plots/pleiotropy/asthma_T2D
hudson_sbatch=../output/$(date +"%Y-%m-%d")_asthma_vs_T2D_hudson.sbatch
sumstats_1=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/asthma/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720_ASTHMA.fastGWA.snp_stats.gz
sumstats_2=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data/T2D_091120/diabetes_casesbyICD10andselfreport_controlswithoutautoiummune_030720_T2D.fastGWA.snp_stats.gz
toptitle="Asthma_fastGWA"
bottomtitle="T2D_fastGWA"
highlight_p_top=0
highlight_p_bottom=0
pval_filter=5e-06
job_size=1
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif
#highlight_snp=
annotate_snp=0
phenocol1='asthma'
phenocol2='T2D'

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --annotate_snp $annotate_snp
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"
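
The four Hudson plot cells above differ only in the second trait: its output directory, bottom title, `phenocol2`, and summary-statistics file. A minimal sketch of how the first three could be derived from a trait label and the method that produced its summary statistics; `hudson_root` and `hudson_settings` are hypothetical names, and `sumstats_2` and the sbatch file name would still have to be set for each comparison as in the cells above.

```shell
# Illustrative only: derive the per-comparison settings from a trait label.
hudson_root=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/hudson_plots/pleiotropy

hudson_settings() {
    trait=$1      # e.g. INT-BMI, INT-WHR, INT-WAIST, T2D
    method=$2     # e.g. boltlmm or fastGWA
    hudson_dir=$hudson_root/asthma_$trait
    bottomtitle="${trait}_${method}"
    phenocol2=$trait
}

# Each entry is trait:method for the bottom panel
for pair in INT-BMI:boltlmm INT-WHR:boltlmm INT-WAIST:boltlmm T2D:fastGWA; do
    hudson_settings "${pair%%:*}" "${pair##*:}"
    echo "$hudson_dir  $bottomtitle  $phenocol2"
done
```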

## Epistasis job with PLINK

In [2]:
epistasis_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/set2_chr11_chr20
bfile=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_asthma_interaction.bed
phenoFile=~/scratch60/Epistasis/asthma/Asthma_casesbyICD10codesANDselfreport_controlsbyselfreportandicd10_noautoimmuneincontrols_forbolt030720_epistasis
setFile=~/scratch60/Epistasis/asthma/set2_chr11_chr20
epistasis_sos=~/project/UKBB_GWAS_dev/workflow/Epistasis.ipynb
epistasis_sbatch=../output/$(date +"%Y-%m-%d")_asthma-set2_chr11_chr20-epistasis.sbatch
numThreads=20
job_size=1
tpl_file=../farnam.yml
container_lmm=/gpfs/gibbs/pi/dewan/data/UKBiobank/lmm.sif

epistasis_args="""epistasis
    --cwd $epistasis_dir 
    --bfile $bfile
    --phenoFile $phenoFile
    --setFile $setFile
    --numThreads $numThreads
    --job_size $job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $epistasis_sos \
    --to-script $epistasis_sbatch \
    --args "$epistasis_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-10-02_asthma-set2_chr11_chr20-epistasis.sbatch
INFO: Workflow farnam (ID=5a1b05b3001b354a) is executed successfully with 1 completed step.


In [22]:
%save chr3_filter.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr3
#SBATCH --output ../output/chr3-%J.out
#SBATCH --error ../output/chr3-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bgen /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr3_v3.bgen \
   --sample /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
   --keep /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620 \
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr3_v3_unrelated

In [11]:
%save chr3_filter_exclude.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr3
#SBATCH --output ../output/chr3-%J.out
#SBATCH --error ../output/chr3-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr3_v3_unrelated \
   --exclude /home/dc2325/scratch60/Epistasis/asthma/ukb_asthma_interaction-merge.missnp \
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr3_v3_unrelated_exclude

In [1]:
%save chr3_filter_exclude_maf.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr3_maf
#SBATCH --output ../output/chr3_maf-%J.out
#SBATCH --error ../output/chr3_maf-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr3_v3_unrelated_exclude \
   --maf 0.05 \
   --geno 0.1 \
   --mind 0.1 \
   --make-bed --out /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_imp_chr3_v3_unrelated_exclude_maf

In [19]:
%save chr11_filter.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr11
#SBATCH --output ../output/chr11-%J.out
#SBATCH --error ../output/chr11-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bgen /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr11_v3.bgen \
   --sample /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
   --keep /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620 \
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr11_v3_unrelated 

In [12]:
%save chr11_filter_exclude.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr11
#SBATCH --output ../output/chr11-%J.out
#SBATCH --error ../output/chr11-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr11_v3_unrelated \
   --exclude /home/dc2325/scratch60/Epistasis/asthma/ukb_asthma_interaction-merge.missnp \
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr11_v3_unrelated_exclude

In [2]:
%save chr11_filter_exclude_maf.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr11_maf
#SBATCH --output ../output/chr11_maf-%J.out
#SBATCH --error ../output/chr11_maf-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr11_v3_unrelated_exclude \
   --maf 0.05 \
   --geno 0.1 \
   --mind 0.1 \
   --make-bed --out /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_imp_chr11_v3_unrelated_exclude_maf

In [20]:
%save chr20_filter.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr20
#SBATCH --output ../output/chr20-%J.out
#SBATCH --error ../output/chr20-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bgen /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr20_v3.bgen \
   --sample /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
   --keep /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620 \
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr20_v3_unrelated 

In [13]:
%save chr20_filter_exclude.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr20
#SBATCH --output ../output/chr20-%J.out
#SBATCH --error ../output/chr20-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr20_v3_unrelated \
   --exclude /home/dc2325/scratch60/Epistasis/asthma/ukb_asthma_interaction-merge.missnp \
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr20_v3_unrelated_exclude

In [3]:
%save chr20_filter_exclude_maf.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr20_maf
#SBATCH --output ../output/chr20_maf-%J.out
#SBATCH --error ../output/chr20_maf-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr20_v3_unrelated_exclude \
   --maf 0.05 \
   --geno 0.1 \
   --mind 0.1 \
   --make-bed --out /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_imp_chr20_v3_unrelated_exclude_maf

In [21]:
%save chr2_filter.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr2
#SBATCH --output ../output/chr2-%J.out
#SBATCH --error ../output/chr2-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bgen /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb_imp_chr2_v3.bgen \
   --sample /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
   --keep /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620 \
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr2_v3_unrelated 

In [14]:
%save chr2_filter_exclude.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr2
#SBATCH --output ../output/chr2-%J.out
#SBATCH --error ../output/chr2-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr2_v3_unrelated \
   --exclude /home/dc2325/scratch60/Epistasis/asthma/ukb_asthma_interaction-merge.missnp \
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr2_v3_unrelated_exclude

In [4]:
%save chr2_filter_exclude_maf.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr2_maf
#SBATCH --output ../output/chr2_maf-%J.out
#SBATCH --error ../output/chr2_maf-%J.log
module load PLINK/2_x86_64_20180428
plink2 \
   --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr2_v3_unrelated_exclude \
   --maf 0.05 \
   --geno 0.1 \
   --mind 0.1 \
   --make-bed --out /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_imp_chr2_v3_unrelated_exclude_maf
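
The twelve per-chromosome cells above differ only in the chromosome number and the filtering stage. A minimal sketch of how the first-stage scripts (bgen to PLINK bed, restricted to the unrelated samples) could be generated from one template instead of being saved cell by cell. The loop, the temporary `out_dir`, and the variable names are illustrative and not part of the original workflow.

```shell
# Illustrative generator for the first-stage (bgen -> bed) scripts.
# Scripts are written to a temporary directory here; choose your own location.
out_dir=$(mktemp -d)
bgen_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset
keep_file=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620

for chr in 2 3 11 20; do
    # Unquoted heredoc: $variables are expanded, \\ becomes a literal backslash,
    # and ~ is left as-is for the shell that later runs the script.
    cat > "$out_dir/chr${chr}_filter.sh" <<EOF
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 40G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/chr${chr}
#SBATCH --output ../output/chr${chr}-%J.out
#SBATCH --error ../output/chr${chr}-%J.log
module load PLINK/2_x86_64_20180428
plink2 \\
   --bgen $bgen_dir/ukb_imp_chr${chr}_v3.bgen \\
   --sample $bgen_dir/ukb32285_imputedindiv.sample \\
   --keep $keep_file \\
   --make-bed --out ~/scratch60/Epistasis/asthma/ukb_imp_chr${chr}_v3_unrelated
EOF
done

ls "$out_dir"
```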

In [19]:
%save merge_filter.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 60G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/allchr
#SBATCH --output ../output/allchr-%J.out
#SBATCH --error ../output/allchr-%J.log
#for chr in {'chr3','chr11','chr20'}; do echo "/home/dc2325/scratch60/Epistasis/asthma/ukb_imp_${chr}_v3_unrelated.bed /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_${chr}_v3_unrelated.bim /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_${chr}_v3_unrelated.fam\n" >> list_beds.txt; done
module load PLINK/1.90-beta5.3
plink \
  --bfile /home/dc2325/scratch60/Epistasis/asthma/ukb_imp_chr2_v3_unrelated_exclude \
  --merge-list /home/dc2325/scratch60/Epistasis/asthma/list_beds.txt \
  --make-bed --out /gpfs/gibbs/pi/dewan/data/UKBiobank/results/ukb_asthma_interaction \
  &> ../output/plink_interaction.log

In [6]:
%save merge_filter_maf.sh -f
#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 4
#SBATCH --mem 80G
#SBATCH --time 1-0:00:00
#SBATCH --job-name ../output/allchr-maf
#SBATCH --output ../output/allchr_maf-%J.out
#SBATCH --error ../output/allchr_maf-%J.log
#for chr in {'chr3','chr11','chr20'}; do echo "/gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_imp_${chr}_v3_unrelated_exclude_maf.bed /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_imp_${chr}_v3_unrelated_exclude_maf.bim /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_imp_${chr}_v3_unrelated_exclude_maf.fam\n" >> list_beds.txt; done
module load PLINK/1.90-beta5.3
plink \
  --bfile /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_imp_chr2_v3_unrelated_exclude_maf \
  --merge-list /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/list_beds.txt \
  --make-bed --out /gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction/ukb_asthma_interaction \
  &> ../output/asthma_interaction_maf.log
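
Both merge cells expect a `list_beds.txt` for `--merge-list`, and the loop that builds it is only present as a comment. With bash's default settings, `echo` without `-e` prints `\n` literally, so that loop would leave a stray `\n` at the end of every line, and `>>` would append duplicate lines on a rerun. A minimal sketch of an alternative using `printf`, shown for the MAF-filtered filesets; PLINK 1.9 accepts one fileset per line, given as the bed, bim, and fam paths. `data_dir` and `merge_list` are illustrative names, and the list is written to a temporary file here.

```shell
# Write one "bed bim fam" line per chromosome to be merged into the chr2 fileset.
data_dir=/gpfs/gibbs/pi/dewan/data/UKBiobank/results/interaction
merge_list=$(mktemp)   # in practice, the list_beds.txt path given to --merge-list

: > "$merge_list"      # start from an empty file so reruns do not append duplicates
for chr in 3 11 20; do
    prefix=$data_dir/ukb_imp_chr${chr}_v3_unrelated_exclude_maf
    printf '%s.bed %s.bim %s.fam\n' "$prefix" "$prefix" "$prefix" >> "$merge_list"
done

cat "$merge_list"
```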

# JAZF1 Gene

## Region Extraction Job for JAZF1 gene

In [None]:
# LD clumping has already been run to define the regions to extract.
# Because of the clumping, the JAZF1 gene is split across two regions:
#     7 27453215 28031275
#     7 28138193 28259233
# The file passed to --region-file should include both regions.

# Alternatively, the entire JAZF1 region can be extracted without relying on
# the LD clumping by listing only this region in the --region-file file:
#     7 27870196 28220414

region_file=$UKBB_PATH/results/LD_clumping/asthma_INT-WHR_T2D/asthma_INT-WHR_T2D_ukbb.clumped_region

extract_args="""default
    --cwd $extract_dir
    --region-file $region_file
    --pheno-path $phenoFile
    --geno-path $geno_path
    --bgen-sample-path $sampleFile
    --sumstats-path $sumstats_path
    --format-config-path $formatFile_fastgwa
    --unrelated-samples $unrelated_samples
    --job-size $extract_job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb dewan \
    --template-file $tpl_file \
    --workflow-file $extract_sos \
    --to-script $extract_sbatch \
    --args "$extract_args"
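
The comments in the cell above describe extracting the whole JAZF1 gene by giving `--region-file` a file that contains only the gene's coordinates. A minimal sketch, assuming the region file is plain whitespace-separated text with chromosome, start, and end columns and no header, as the commented coordinates suggest; the format expected by `Region_Extraction.ipynb` should be checked before use, and `jazf1_region_file` is a hypothetical name.

```shell
# Hypothetical single-region file covering the whole JAZF1 gene
# (coordinates taken from the comments in the cell above).
jazf1_region_file=$(mktemp)   # in practice, a path under your results directory
printf '%s %s %s\n' 7 27870196 28220414 > "$jazf1_region_file"

# Point the extraction workflow at this file instead of the clumped regions
region_file=$jazf1_region_file
cat "$region_file"
```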

## Finemapping Job for JAZF1 gene

In [None]:
region_file=$UKBB_PATH/results/LD_clumping/asthma_INT-WHR_T2D/asthma_INT-WHR_T2D_ukbb.clumped_region

finemap_args="""default
    --cwd $finemap_dir
    --region_dir $region_dir
    --region_file $region_file
    --sumstats_path $sumstats_path
    --container_lmm $container_lmm
    --container_marp $container_marp
    --pip_cutoff $pip_cutoff
    --N $N
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $finemap_sos \
    --to-script $finemap_sbatch \
    --args "$finemap_args"
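
The argument strings above interpolate shell variables that are defined in earlier cells, and an unset variable expands to an empty string, leaving a flag without a value. A minimal sketch of a guard that can be run before `sos run`, assuming a bash kernel; `require_vars` is a hypothetical helper, not part of SoS or the workflow notebooks.

```bash
# require_vars is a hypothetical helper, not part of SoS or the workflows
require_vars() {
    local missing=0 name
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name
        if [ -z "${!name:-}" ]; then
            echo "unset or empty: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example with names used by the finemapping cell
require_vars finemap_dir region_dir region_file sumstats_path pip_cutoff N \
    || echo "define the variables listed above before generating the job script"
```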

# 03/01/23 Burden test for asthma 200WES

## White European

In [7]:
lmm_dir_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_extwhite
lmm_sbatch_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_extwhite/asthma_200k_exomes-regenie-burden_$(date +"%Y-%m-%d").sbatch
phenoFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_White_asthma_pcs_rerun
covarFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_White_asthma_pcs_rerun
phenoCol=ASTHMA_ICD10orself_03_28_22
covarCol=sex
qCovarCol=`echo age asm_PC{1..10}`
genoFile=`echo ~/UKBiobank/data/exome_files/project_VCF/072721_run/plink/ukb23156_c{1..22}.merged.filtered.bed`
bfile=~/UKBiobank/genotype_files_processed/012323_white_european_460649ind_hg38/final_files_no_outliers/*.bed
anno_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.anno_file
set_list=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.set_list_file
mask_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.mask_file
aaf_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.aff_file
build_mask=max
aaf_bins='0.005 0.01'
tpl_file=~/project/bioworkflows/admin/csg.yml
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
container_marp=~/containers/marp.sif
container_lmm=~/containers/lmm.sif 
lmm_job_size=1
ylim=20
k=10
reverse_log_p=True
numThreads=20
formatFile_regenie=~/project/UKBB_GWAS_dev/data/regenie_template.yml
bsize=1000
## Trait type (bt = binary trait); leave empty for quantitative (qt) traits
trait=bt
minMAC=1
snpannofile=~/UKBiobank/results/ukb23155_200Kexomes_annovar/2021_10_12_hg38_exome/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.gz

lmm_args="""regenie_burden
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --trait $trait
    --anno_file $anno_file
    --set_list $set_list
    --mask_file $mask_file
    --aaf_file $aaf_file
    --aaf_bins $aaf_bins
    --build_mask $build_mask
    --job_size $lmm_job_size
    --ylim $ylim
    --k $k
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --minMAC $minMAC
    --snpannofile $snpannofile
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_extwhite/asthma_200k_exomes-regenie-burden_2023-03-01.sbatch
INFO: Workflow csg (ID=we2b126aa32fdea21) is executed successfully with 1 completed step.
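
The covariate and genotype lists in the cell above rely on bash brace expansion inside a command substitution. A short sketch of what those expressions produce, assuming a bash kernel (plain `sh` does not expand braces); the `demo_` names are illustrative and the genotype paths are shortened to bare file names.

```bash
# Brace expansion is a bash feature, so this needs bash rather than plain sh
demo_qcovar=$(echo age asm_PC{1..10})
demo_geno=$(echo ukb23156_c{1..22}.merged.filtered.bed)

# The expanded covariate list, space separated on one line
echo "$demo_qcovar"

# Number of whitespace-separated entries in each list
echo "$demo_qcovar" | wc -w
echo "$demo_geno" | wc -w
```

The first list holds `age` plus ten principal-component names, and the second holds one `.bed` file name per autosome.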



## Asian

In [8]:
lmm_dir_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_asian
lmm_sbatch_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_asian/asthma_ASN_200k_exomes-regenie-burden_$(date +"%Y-%m-%d").sbatch
phenoFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_Asian_asthma_pcs_rerun
covarFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_Asian_asthma_pcs_rerun
phenoCol=ASTHMA_ICD10orself_03_28_22
covarCol=sex
qCovarCol=`echo age asm_PC{1..10}`
genoFile=`echo ~/UKBiobank/data/exome_files/project_VCF/072721_run/plink/ukb23156_c{1..22}.merged.filtered.bed`
bfile=~/UKBiobank/genotype_files_processed/012323_asian_10189ind_hg38/final_files_no_outliers/*.bed
anno_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.anno_file
set_list=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.set_list_file
mask_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.mask_file
aaf_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.aff_file
build_mask=max
aaf_bins='0.005 0.01'
tpl_file=~/project/bioworkflows/admin/csg.yml
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
container_marp=~/containers/marp.sif
container_lmm=~/containers/lmm.sif 
lmm_job_size=1
ylim=20
k=10
reverse_log_p=True
numThreads=20
formatFile_regenie=~/project/UKBB_GWAS_dev/data/regenie_template.yml
bsize=1000
## Trait type (bt = binary trait); leave empty for quantitative (qt) traits
trait=bt
minMAC=1
snpannofile=~/UKBiobank/results/ukb23155_200Kexomes_annovar/2021_10_12_hg38_exome/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.gz

lmm_args="""regenie_burden
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --trait $trait
    --anno_file $anno_file
    --set_list $set_list
    --mask_file $mask_file
    --aaf_file $aaf_file
    --aaf_bins $aaf_bins
    --build_mask $build_mask
    --job_size $lmm_job_size
    --ylim $ylim
    --k $k
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --minMAC $minMAC
    --snpannofile $snpannofile
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_asian/asthma_ASN_200k_exomes-regenie-burden_2023-03-01.sbatch
INFO: Workflow csg (ID=w91378407f33b6b00) is executed successfully with 1 completed step.



## African

In [9]:
lmm_dir_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_african
lmm_sbatch_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_african/asthma_AFR_200k_exomes-regenie-burden_$(date +"%Y-%m-%d").sbatch
phenoFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_African_asthma_pcs_rerun
covarFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_African_asthma_pcs_rerun
phenoCol=ASTHMA_ICD10orself_03_28_22
covarCol=sex
qCovarCol=`echo age asm_PC{1..10}`
genoFile=`echo ~/UKBiobank/data/exome_files/project_VCF/072721_run/plink/ukb23156_c{1..22}.merged.filtered.bed`
bfile=~/UKBiobank/genotype_files_processed/012323_african_9096ind_hg38/final_files_no_outliers/*.bed
anno_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.anno_file
set_list=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.set_list_file
mask_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.mask_file
aaf_file=~/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/102121_burden_files/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.aff_file
build_mask=max
aaf_bins='0.005 0.01'
tpl_file=~/project/bioworkflows/admin/csg.yml
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
container_marp=~/containers/marp.sif
container_lmm=~/containers/lmm.sif 
lmm_job_size=1
ylim=20
k=10
reverse_log_p=True
numThreads=20
formatFile_regenie=~/project/UKBB_GWAS_dev/data/regenie_template.yml
bsize=1000
## Trait type (bt = binary trait); leave empty for quantitative (qt) traits
trait=bt
minMAC=1
snpannofile=~/UKBiobank/results/ukb23155_200Kexomes_annovar/2021_10_12_hg38_exome/ukb23155_chr1_chr22_091321.hg38.hg38_multianno.renamedcols.csv.gz

lmm_args="""regenie_burden
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --trait $trait
    --anno_file $anno_file
    --set_list $set_list
    --mask_file $mask_file
    --aaf_file $aaf_file
    --aaf_bins $aaf_bins
    --build_mask $build_mask
    --job_size $lmm_job_size
    --ylim $ylim
    --k $k
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --minMAC $minMAC
    --snpannofile $snpannofile
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome/030123_asthma_african/asthma_AFR_200k_exomes-regenie-burden_2023-03-01.sbatch
INFO: Workflow csg (ID=wff3fd1eddc919bbd) is executed successfully with 1 completed step.
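
The three burden-test cells above differ only in the output directory, the sbatch file name, the phenotype file, and the genotype-array directory. A minimal sketch of deriving those values from an ancestry label so that the shared settings are written once; the function name `set_ancestry_paths`, the labels `white`, `asian`, `african`, and the variable `bfile_dir` (the directory part of the `bfile` glob) are assumptions for illustration, while the path components are copied from the cells above.

```bash
# set_ancestry_paths is a hypothetical helper; the labels are illustrative
set_ancestry_paths() {
    local results="$HOME/UKBiobank/results_pleiotropy/REGENIE_results/results_burden_exome"
    local pheno="$HOME/UKBiobank/phenotype_files/pleiotropy"
    local geno="$HOME/UKBiobank/genotype_files_processed"
    local out pop tag array
    case "$1" in
        white)   out=030123_asthma_extwhite; pop=White;   tag=asthma;     array=012323_white_european_460649ind_hg38 ;;
        asian)   out=030123_asthma_asian;    pop=Asian;   tag=asthma_ASN; array=012323_asian_10189ind_hg38 ;;
        african) out=030123_asthma_african;  pop=African; tag=asthma_AFR; array=012323_african_9096ind_hg38 ;;
        *) echo "unknown ancestry: $1" >&2; return 1 ;;
    esac
    lmm_dir_regenie="$results/$out"
    lmm_sbatch_regenie="$lmm_dir_regenie/${tag}_200k_exomes-regenie-burden_$(date +"%Y-%m-%d").sbatch"
    phenoFile="$pheno/UKB_exome_${pop}_asthma_pcs_rerun"
    covarFile="$phenoFile"
    bfile_dir="$geno/$array/final_files_no_outliers"
}

# Print the job-script path that each ancestry would use
for ancestry in white asian african; do
    set_ancestry_paths "$ancestry"
    echo "$ancestry -> $lmm_sbatch_regenie"
done
```

Inside such a loop, the `lmm_args` string and the `sos run` call would follow the `set_ancestry_paths` call unchanged.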



# 03/30/23 Exome analysis univariate asthma 200WES - hg38 genotype array

## White European

In [1]:
## All filters are set to 0 because this version of the bfile has already been QC'ed, so there is no need to filter again here
maf_filter=0
geno_filter=0
hwe_filter=0
mind_filter=0
lmm_dir_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_whiteEUR/
lmm_sbatch_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_whiteEUR/asthma_200k_exomes-regenie_whiteEUR_$(date +"%Y-%m-%d").sbatch
phenoFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_White_asthma_pcs_rerun
covarFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_White_asthma_pcs_rerun
phenoCol=ASTHMA_ICD10orself_03_28_22
covarCol=sex
qCovarCol=`echo age asm_PC{1..10}` 
# Use the QC'ed exome files (variant and sample missingness < 10%)
genoFile=`echo ~/UKBiobank/data/exome_files/project_VCF/072721_run/plink/ukb23156_c{1..22}.merged.filtered.bed`
# Use the original bed files that passed QC with Megan's parameters: geno=0.01, mind=0.1, maf=0.01, hwe=5e-08
bfile=~/UKBiobank/genotype_files_processed/012323_white_european_460649ind_hg38/final_files_no_outliers/*.bed
tpl_file=~/project/bioworkflows/admin/csg.yml
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
container_marp=~/containers/marp.sif
container_lmm=~/containers/lmm.sif 
lmm_job_size=1
ylim=0
reverse_log_p=True
numThreads=20
formatFile_regenie=~/project/UKBB_GWAS_dev/data/regenie_template.yml
bsize=1000
trait=bt
## Use this MAC, which is the default in regenie
minMAC=5
label_annotate='SNP'
lowmem_dir=~/scratch60/predictions

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --label_annotate $label_annotate
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args" 

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_whiteEUR/asthma_200k_exomes-regenie_whiteEUR_2023-03-30.sbatch
INFO: Workflow csg (ID=w715b322866ff8f73) is executed successfully with 1 completed step.



## Asian

In [2]:
## All filters are set to 0 because this version of the bfile has already been QC'ed, so there is no need to filter again here
maf_filter=0
geno_filter=0
hwe_filter=0
mind_filter=0
lmm_dir_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_ASN/
lmm_sbatch_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_ASN/asthma_200k_exomes-regenie_ASN_$(date +"%Y-%m-%d").sbatch
phenoFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_Asian_asthma_pcs_rerun
covarFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_Asian_asthma_pcs_rerun
phenoCol=ASTHMA_ICD10orself_03_28_22
covarCol=sex
qCovarCol=`echo age asm_PC{1..10}` 
# Use the QC'ed exome files (variant and sample missingness < 10%)
genoFile=`echo ~/UKBiobank/data/exome_files/project_VCF/072721_run/plink/ukb23156_c{1..22}.merged.filtered.bed`
# Use the original bed files that passed QC with Megan's parameters: geno=0.01, mind=0.1, maf=0.01, hwe=5e-08
bfile=~/UKBiobank/genotype_files_processed/012323_asian_10189ind_hg38/final_files_no_outliers/*.bed
tpl_file=~/project/bioworkflows/admin/csg.yml
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
container_marp=~/containers/marp.sif
container_lmm=~/containers/lmm.sif 
lmm_job_size=1
ylim=0
reverse_log_p=True
numThreads=20
formatFile_regenie=~/project/UKBB_GWAS_dev/data/regenie_template.yml
bsize=1000
trait=bt
## Use this MAC, which is the default in regenie
minMAC=5
label_annotate='SNP'
lowmem_dir=~/scratch60/predictions

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --label_annotate $label_annotate
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args" 

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_ASN/asthma_200k_exomes-regenie_ASN_2023-03-30.sbatch
INFO: Workflow csg (ID=w2a3f749524205d45) is executed successfully with 1 completed step.



## African

In [3]:
## All filters are set to 0 because this version of the bfile has already been QC'ed, so there is no need to filter again here
maf_filter=0
geno_filter=0
hwe_filter=0
mind_filter=0
lmm_dir_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_AFR/
lmm_sbatch_regenie=~/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_AFR/asthma_200k_exomes-regenie_AFR_$(date +"%Y-%m-%d").sbatch
phenoFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_African_asthma_pcs_rerun
covarFile=~/UKBiobank/phenotype_files/pleiotropy/UKB_exome_African_asthma_pcs_rerun
phenoCol=ASTHMA_ICD10orself_03_28_22
covarCol=sex
qCovarCol=`echo age asm_PC{1..10}` 
# Use the QC'ed exome files (variant and sample missingness < 10%)
genoFile=`echo ~/UKBiobank/data/exome_files/project_VCF/072721_run/plink/ukb23156_c{1..22}.merged.filtered.bed`
# Use the original bed files that passed QC with Megan's parameters: geno=0.01, mind=0.1, maf=0.01, hwe=5e-08
bfile=~/UKBiobank/genotype_files_processed/012323_african_9096ind_hg38/final_files_no_outliers/*.bed
tpl_file=~/project/bioworkflows/admin/csg.yml
lmm_sos=~/project/bioworkflows/GWAS/LMM.ipynb
container_marp=~/containers/marp.sif
container_lmm=~/containers/lmm.sif 
lmm_job_size=1
ylim=0
reverse_log_p=True
numThreads=20
formatFile_regenie=~/project/UKBB_GWAS_dev/data/regenie_template.yml
bsize=1000
trait=bt
## Use this MAC, which is the default in regenie
minMAC=5
label_annotate='SNP'
lowmem_dir=~/scratch60/predictions

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --label_annotate $label_annotate
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args" 

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/UKBiobank/results_pleiotropy/REGENIE_results/results_exome_data/300323_asthma_AFR/asthma_200k_exomes-regenie_AFR_2023-03-30.sbatch
INFO: Workflow csg (ID=waa34e1de719fb8e4) is executed successfully with 1 completed step.

