# Analysis of hearing impairment phenotypes

This notebook uses `Get_Job_Script.ipynb` to automatically generate the sbatch scripts that run on Yale's cluster. The goal is to apply [various LMM workflows](https://github.com/statgenetics/UKBB_GWAS_dev/tree/master/workflow) to perform association analysis of several hearing impairment traits, run LD clumping, and extract the associated regions.

The phenotypes analyzed are:

1. Hearing aid f.3393
2. Hearing difficulty f.2247
3. Hearing difficulty with background noise f.2257
4. Combined phenotype f.2247 & f.2257

## File paths on Yale cluster

- Genotype files in PLINK format:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv`
- Genotype files in bgen format:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb39554_imputeddataset/`
- Summary stats for imputed variants BOLT-LMM:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/results/BOLTLMM_results/results_imputed_data`
- Summary stats for imputed variants FastGWA:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/results/FastGWA_results/results_imputed_data`
- Phenotype files:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/phenotype_files/hearing_impairment`
- Relationship file:
`/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620`

## Create symlinks to necessary folders in your home dir

```
ln -s /mnt/mfs/statgen/archive/UKBiobank_Yale_transfer ~/
ln -s /mnt/mfs/statgen/UKBiobank ~/
ln -s /mnt/mfs/statgen/containers ~/
```

## Fork and clone bioworkflows and UKBB_GWAS_dev repos to a folder called project

```
mkdir -p ~/project
cd ~/project
git clone https://github.com/statgenetics/UKBB_GWAS_dev.git
git clone https://github.com/cumc/bioworkflows.git
```



## 08/31/20 analysis

On the cluster, open this notebook in the JupyterLab server you set up over the SSH channel, then run the following cells.

## Bash variables for workflow configuration

### Yale's cluster

Run this cell when working on Yale's cluster

In [4]:
# Common variables Yale cluster
UKBB_PATH=/gpfs/gibbs/pi/dewan/data/UKBiobank
USER_PATH=$HOME/project
container_lmm=$UKBB_PATH/lmm.sif
container_marp=$UKBB_PATH/marp.sif
container_annovar=$UKBB_PATH/annovar.sif
hearing_pheno_path=$UKBB_PATH/phenotype_files/hearing_impairment
tpl_file=$USER_PATH/UKBB_GWAS_dev/farnam.yml
formatFile_fastgwa=$USER_PATH/UKBB_GWAS_dev/data/fastGWA_template.yml
formatFile_bolt=$USER_PATH/UKBB_GWAS_dev/data/boltlmm_template.yml
formatFile_saige=$USER_PATH/UKBB_GWAS_dev/data/saige_template.yml
formatFile_regenie=$USER_PATH/UKBB_GWAS_dev/data/regenie_template.yml
###bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
unrelated_samples=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620

# Cleaned Imputed data BGEN input
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

# Non-QC'ed Exome data PLINK input (as downloaded from the UKBB)
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`

### Columbia's cluster

Run this cell if running your jobs on Columbia's cluster

In [5]:
# Common variables Columbia's cluster
UKBB_PATH=$HOME/UKBiobank
UKBB_yale=$HOME/UKBiobank_Yale_transfer
USER_PATH=$HOME/project
container_lmm=$HOME/containers/lmm.sif
container_marp=$HOME/containers/marp.sif
container_annovar=$HOME/containers/gatk4-annovar.sif
hearing_pheno_path=$UKBB_PATH/phenotype_files/hearing_impairment
tpl_file=$USER_PATH/bioworkflows/admin/csg.yml
formatFile_fastgwa=$USER_PATH/UKBB_GWAS_dev/data/fastGWA_template.yml
formatFile_bolt=$USER_PATH/UKBB_GWAS_dev/data/boltlmm_template.yml
formatFile_saige=$USER_PATH/UKBB_GWAS_dev/data/saige_template.yml
formatFile_regenie=$USER_PATH/UKBB_GWAS_dev/data/regenie_template.yml
###bfile=$UKBB_yale/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
unrelated_samples=$UKBB_yale/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620

# Cleaned Imputed data BGEN input
genoFile=`echo $UKBB_yale/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_yale/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

# Non-QC'ed Exome data PLINK input (as downloaded from the UKBB)
genoFile=`echo $UKBB_yale/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
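
Both cluster cells above assign `genoFile` twice, so when a whole cell is run the exome `.bed` list overrides the imputed `.bgen` list. One way to make the choice explicit is sketched below. The helper name `geno_files_for` and the labels `imputed` and `exome` are assumptions made for illustration; the file name patterns are the ones used in the cells above.

```shell
# Hypothetical helper (not part of the workflows): print the 22 per-chromosome
# genotype paths for one data type, separated by spaces.
# $1: data type, "imputed" or "exome" (labels are assumptions)
# $2: directory that contains the dataset folders
# The body runs in a subshell so its variables do not leak into the session.
geno_files_for() (
    data_type=$1
    root=$2
    chr=1
    out=""
    while [ "$chr" -le 22 ]; do
        case "$data_type" in
            imputed) f="$root/ukb39554_imputeddataset/ukb_imp_chr${chr}_v3.bgen" ;;
            exome)   f="$root/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c${chr}_b0_v1.bed" ;;
            *)       echo "unknown data type: $data_type" >&2; exit 1 ;;
        esac
        # Append with a single space between paths
        out="$out${out:+ }$f"
        chr=$((chr + 1))
    done
    printf '%s\n' "$out"
)
```

On Yale's cluster this could be called as `genoFile=$(geno_files_for imputed $UKBB_PATH/genotype_files)`; on Columbia's cluster the second argument would be `$UKBB_yale`.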




## Shared variables for workflows and results

In [6]:
# Workflows
lmm_sos=$USER_PATH/bioworkflows/GWAS/LMM.ipynb
anno_sos=$USER_PATH/bioworkflows/variant-annotation/annovar.ipynb
clumping_sos=$USER_PATH/bioworkflows/GWAS/LD_Clumping.ipynb
extract_sos=$USER_PATH/bioworkflows/GWAS/Region_Extraction.ipynb

# LMM directories for imputed data
lmm_imp_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_imputed_data
lmm_imp_dir_bolt=$UKBB_PATH/results/BOLTLMM_results/results_imputed_data
lmm_imp_dir_saige=$UKBB_PATH/results/SAIGE_results/results_imputed_data
lmm_imp_dir_regenie=$UKBB_PATH/results/REGENIE_results/results_imputed_data

# LMM directories for exome data
lmm_exome_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_exome_data
lmm_exome_dir_bolt=$UKBB_PATH/results/BOLTLMM_results/results_exome_data
lmm_exome_dir_saige=$UKBB_PATH/results/SAIGE_results/results_exome_data
lmm_exome_dir_regenie=$UKBB_PATH/results/REGENIE_results/results_exome_data




## Specification of LMM variables

In [4]:
## LMM variables 
## Specific to Bolt_LMM
LDscoresFile=$UKBB_PATH/LDSCORE.1000G_EUR.tab.gz
geneticMapFile=$UKBB_PATH/genetic_map_hg19_withX.txt.gz
covarMaxLevels=10
numThreads=20
bgenMinMAF=0.001
bgenMinINFO=0.8
lmm_job_size=1
ylim=0

### Specific to FastGWA (depending on whether you run from Yale or Columbia)
### Keep only the grmFile line for your cluster: the last assignment wins
#### Yale's cluster
grmFile=$UKBB_PATH/results/FastGWA_results/results_imputed_data/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.grm.sp
#### Columbia's cluster
grmFile=$UKBB_yale/results/FastGWA_results/results_imputed_data/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.grm.sp

### Specific to SAIGE
bgenMinMAC=4
trait_type=binary
loco=TRUE
sampleCol=IID

### Specific to REGENIE
bsize=1000
lowmem=$HOME/scratch60/
lowmem_dir=$HOME/scratch60/predictions
trait=bt
minMAC=4
maf_filter=0.01
geno_filter=0.01
hwe_filter=0
mind_filter=0.1
reverse_log_p=True
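
Some job cells in this notebook reference variables, for example `$bgenFile` and `$lowmem_prefix`, that are not assigned in any cell shown here. An empty variable silently produces a flag with no value inside `lmm_args`. The sketch below is one way to catch this before generating a job script; `require_vars` is a hypothetical helper, not part of the workflows.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise. Pass variable NAMES, not values.
# The body runs in a subshell so its variables do not leak into the session.
require_vars() (
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (names must be plain identifiers)
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset or empty: $name" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

For instance, `require_vars bsize trait minMAC lowmem_dir || echo "fix the configuration first"` could be run just before a Regenie cell builds `lmm_args`.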

## Specification of LD clumping variables

In [4]:
# LD clumping directories
clumping_dir=$UKBB_PATH/results/LD_clumping

## LD clumping variables
# For sumstatsFiles, if there is more than one file, provide each path
# Keep only the bfile_ref line for your cluster: the last assignment wins
#### Yale's cluster
bfile_ref=$UKBB_PATH/results/LD_clumping/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.1210.ref_geno.bed
#### Columbia's cluster
bfile_ref=$UKBB_yale/results/LD_clumping/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.1210.ref_geno.bed
# Changes depending on which traits are analyzed
# In this case tinnitus only
sumstatsFiles=$UKBB_PATH/results/
ld_sample_size=1210
clump_field=P
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
numThreads=20
clump_job_size=1
clumpFile= 
clumregionFile=
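
The `clump_*` names above mirror the clumping options of PLINK 1.9 (`--clump-p1`, `--clump-p2`, `--clump-r2`, `--clump-kb`, `--clump-field`, `--clump-annotate`). This notebook does not show how `LD_Clumping.ipynb` uses them, so the sketch below only assumes such a mapping and prints a command instead of running it. `plink_clump_cmd` is a hypothetical helper, and the defaults repeat the values set in the cell above.

```shell
# Defaults copied from the cell above; existing values are kept if already set.
clump_field=${clump_field:-P}
clump_p1=${clump_p1:-5e-08}
clump_p2=${clump_p2:-1}
clump_r2=${clump_r2:-0.2}
clump_kb=${clump_kb:-2000}
clump_annotate=${clump_annotate:-BP}

# Hypothetical helper: print (do not run) a PLINK 1.9 clumping command.
# $1: reference bfile prefix, without the .bed extension
# $2: summary statistics file
plink_clump_cmd() {
    printf 'plink --bfile %s --clump %s --clump-field %s --clump-p1 %s --clump-p2 %s --clump-r2 %s --clump-kb %s --clump-annotate %s\n' \
        "$1" "$2" "$clump_field" "$clump_p1" "$clump_p2" "$clump_r2" "$clump_kb" "$clump_annotate"
}
```

Read this way, `clump_p1=5e-08` is the p-value threshold for index SNPs, `clump_p2=1` keeps every secondary SNP regardless of p-value, and SNPs are grouped with an index SNP when they lie within `clump_kb` kilobases and have r² of at least `clump_r2` with it.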

## Specification of Region extraction variables

In [None]:
# Region extraction directories
extract_dir=$UKBB_PATH/results/region_extraction

## Region extraction variables
region_file=$UKBB_PATH/results/LD_clumping/
geno_path=$UKBB_PATH/results/UKBB_bgenfilepath.txt
sumstats_path=$UKBB_PATH/results/FastGWA_results/results_imputed_data/
extract_job_size=10

# 1. Hearing aid user f.3393

## FastGWA job only white British

In [None]:
lmm_dir_fastgwa=$lmm_imp_dir_fastgwa/f3393_hearing_aid
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f3393_imp-fastgwa.sbatch
phenoFile=$hearing_pheno_path/200828_UKBB_Hearing_aid_f3393
covarFile=$hearing_pheno_path/200828_UKBB_Hearing_aid_f3393
phenoCol=hearing_aid_cat
covarCol=sex
qCovarCol=age_final_aid

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"
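
The same `sos run ... Get_Job_Script.ipynb` call is repeated after every job definition in this notebook, with only the cluster name, workflow file, output script and argument string changing. A small wrapper could reduce the repetition. `make_job_script` is a hypothetical name; the flags are exactly those used in the cell above, and `tpl_file` is expected to be set by one of the cluster configuration cells.

```shell
# Hypothetical wrapper around the repeated Get_Job_Script call.
# $1: cluster template name (farnam or csg in this notebook)
# $2: workflow notebook, e.g. $lmm_sos
# $3: path of the sbatch script to write
# $4: workflow argument string, e.g. "$lmm_args"
make_job_script() {
    sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb "$1" \
        --template-file "$tpl_file" \
        --workflow-file "$2" \
        --to-script "$3" \
        --args "$4"
}
```

The cell above would then end with `make_job_script farnam "$lmm_sos" "$lmm_sbatch_fastgwa" "$lmm_args"`.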

## FastGWA all whites

In [2]:
lmm_dir_fastgwa=$lmm_imp_dir_fastgwa/f3393_hearing_aid_expandedwhite
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f3393_expandedwhite_imp-fastgwa.sbatch
phenoFile=$hearing_pheno_path/120120_UKBB_Hearing_aid_f3393_expandedwhite
covarFile=$hearing_pheno_path/120120_UKBB_Hearing_aid_f3393_expandedwhite
phenoCol=hearing_aid_cat
covarCol=sex
qCovarCol=age_final_aid

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-12-02_f3393_expandedwhite_imp-fastgwa.sbatch
INFO: Workflow farnam (ID=852e6b72805bc1ab) is executed successfully with 1 completed step.


## FastGWA exome data

In [2]:
lmm_dir_fastgwa=$lmm_exome_dir_fastgwa/f3393_hearing_aid_exomes
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f3393_exomes-fastgwa.sbatch
phenoFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_Hearing_aid_f3393_128254ind_exomes
covarFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_Hearing_aid_f3393_128254ind_exomes
phenoCol=hearing_aid_cat
covarCol=sex
qCovarCol=age_final_aid

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --genoFile $genoFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-01-11_f3393_exomes-fastgwa.sbatch
INFO: Workflow farnam (ID=3f949028f44f4152) is executed successfully with 1 completed step.



## Regenie exome data

In [4]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f3393_hearing_aid_exomes_bfile
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_exome_bfile-regenie.sbatch
phenoFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_Hearing_aid_f3393_128254ind_exomes
covarFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_Hearing_aid_f3393_128254ind_exomes
phenoCol=hearing_aid_cat
covarCol=sex
qCovarCol=age_final_aid
#Use original bed files from the UKBB exome data
#bfile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/exome_files_snpsonly/ukb23155.filtered.merged.bed
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_prefix $lowmem_prefix
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-12_f3393_hearing_aid_exome_bfile-regenie.sbatch
INFO: Workflow farnam (ID=w03430bc454ddd1e8) is executed successfully with 1 completed step.



## Regenie imputed data: Expanded white control NA

This analysis uses the correct number of cases and controls, where the controls are the individuals who are NA for f.3393.

In [3]:
lmm_dir_regenie=$lmm_imp_dir_regenie/f3393_hearing_aid_impdata_newpheno
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_impdata-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
phenoCol=f3393_ctrl_na
covarCol=sex
qCovarCol=age_final_aid
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-07-13_f3393_hearing_aid_impdata-regenie.sbatch
INFO: Workflow farnam (ID=w535e731e96a51db8) is executed successfully with 1 completed step.



## Regenie: single variant association analysis with exome data on replication set 50K exomes

### f.3393 & Pure controls

In [2]:
# First run using only pure controls for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f3393_hearing_aid_exomes50K_pure_ctrl
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_exomes50K_pure_ctrl-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
phenoCol=f3393_ctrl_na
covarCol=sex
qCovarCol=age_final_aid
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-19_f3393_hearing_aid_exomes50K_pure_ctrl-regenie.sbatch
INFO: Workflow farnam (ID=wfb6905d41d076b6e) is executed successfully with 1 completed step.



### f.3393 & Controls NA for f.3393

In [4]:
# Second run using controls that are NA for f3393
lmm_dir_regenie=$lmm_exome_dir_regenie/f3393_hearing_aid_exomes50K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_exomes50K_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
phenoCol=f3393_ctrl_na
covarCol=sex
qCovarCol=age_final_aid
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam (index=0) is ignored due to saved signature
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-19_f3393_hearing_aid_exomes50K_ctrl_na-regenie.sbatch
INFO: Workflow farnam (ID=w06d19fc2ff71aba3) is ignored with 1 ignored step.



## Regenie on exome data (original un-QC'ed UKBB PLINK files) using the modified phenotype file with controls_na for f.3393

### f.3393 & Controls NA for f.3393

In [2]:
# First run using controls na for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f3393_hearing_aid_exomes200K_noqc_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/062421_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_104402ind
covarFile=$hearing_pheno_path/062421_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_104402ind
phenoCol=f3393_ctrl_na
covarCol=sex
qCovarCol=age_final_aid
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed
hwe_filter=5e-08

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-06-24_f3393_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
INFO: Workflow farnam (ID=w574ef8382b581afa) is executed successfully with 1 completed step.



## Regenie in exome data after VCF-QC 200K exomes

### f.3393 & Controls NA for f.3393

In [9]:
# Run using all controls for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f3393_hearing_aid_exomes200K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/f3393_hearing_aid_exomes200K_ctrl_na-regenie_$(date +"%Y-%m-%d").sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
phenoCol=f3393_ctrl_na
covarCol=sex
qCovarCol=age_final_aid
genoFile=`echo /mnt/mfs/statgen/UKBiobank/data/exome_files/project_VCF/plink_files/ukb23156_c{1..22}.merged.filtered.bed`
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/data/genotype_files/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/project/UKBB_GWAS_dev/output/f3393_hearing_aid_exomes200K_ctrl_na-regenie_2021-05-18.sbatch
INFO: Workflow csg (ID=w1682bd842f840e70) is executed successfully with 1 completed step.



## Regenie Burden with 50K exomes

In [None]:
lmm_dir_regenie=$lmm_exome_dir_regenie/burden/f3393_hearing_aid_exomes50K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_exomes50K_ctrl_na-regenie-burden.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_228760ind
phenoCol=f3393_ctrl_na
covarCol=sex
qCovarCol=age_aid
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed
anno_file=$lmm_exome_dir_regenie/burden/ukb32285_exomespb_chr1_22.hg38.hg38_multianno.anno_file
set_list=$lmm_exome_dir_regenie/burden/ukb32285_exomespb_chr1_22.hg38.hg38_multianno.set_list_file
mask_file=$lmm_exome_dir_regenie/burden/ukb32285_exomespb_chr1_22.hg38.hg38_multianno.mask_file
keep_gene=
build_mask=max
aaf_bins=0.005,0.01

lmm_args="""regenie_burden
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --trait $trait
    --anno_file $anno_file
    --set_list $set_list
    --mask_file $mask_file
    --keep_gene $keep_gene
    --aaf_bins $aaf_bins
    --build_mask $build_mask
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

## Create the anno_file, set_list_file and mask_files necessary for burden test

In [12]:
burden_dir=$UKBB_PATH/results/ukb23155_200Kexomes_annovar/burden_files
anno_sbatch_burden=$USER_PATH/UKBB_GWAS_dev/output/ukb23155_200Kexomes_burdenfiles_$(date +"%Y-%m-%d").sbatch
## Annotated exome file for 50K exomes UKBB
#annotated_file_hg38=$UKBB_PATH/results/ukb32285_exomespb_annovar/ukb32285_exomespb_chr1_22.hg38.hg38_multianno.csv
## Annotate exome file for 200K exomes UKBB
annotated_file_hg38=$UKBB_PATH/results/ukb23155_200Kexomes_annovar/ukb23155_chr1_chr22_exomedata.hg38.hg38_multianno.csv.gz
bim_name=$UKBB_PATH/results/ukb23155_200Kexomes_annovar/exome_bim_merge/ukb23155_chr1_chr22.bim
job_size=1
name_prefix='ukb23155_chr1_chr22_burden_files'
container_annovar=$HOME/containers/gatk4-annovar.sif

anno_args="""burden_files
    --cwd $burden_dir
    --annotated_file $annotated_file_hg38
    --bim_name $bim_name
    --name_prefix $name_prefix
    --job_size $job_size
    --container_annovar $container_annovar
"""

sos run $USER_PATH/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $anno_sos \
    --to-script $anno_sbatch_burden \
    --args "$anno_args"


INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg (index=0) is ignored due to saved signature
INFO: csg output:   /home/dmc2245/project/UKBB_GWAS_dev/output/ukb23155_200Kexomes_burdenfiles_2021-08-05.sbatch
INFO: Workflow csg (ID=w8ab0e2798f132664) is ignored with 1 ignored step.



## Regenie Burden with 200K exomes

This run uses the new phenotype file, with the pure-control and case definitions made by Fabiha.

In [None]:
lmm_dir_regenie=$lmm_exome_dir_regenie/burden/f3393_hearing_aid_200K_exomes
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_200k_exomes-regenie-burden.sbatch
phenoFile=$hearing_pheno_path/080421_UKBB_Hearing_aid_f3393_expandedwhite_6305cases_98082ctrl
covarFile=$hearing_pheno_path/080421_UKBB_Hearing_aid_f3393_expandedwhite_6305cases_98082ctrl
phenoCol=f3393
covarCol=sex
qCovarCol=age
# Run this with the un-QC'ed PLINK files while we wait for the QC'ed ones
genoFile=
#Use the original bed files for the genotype array for the expanded white on regenie step1
##Yale's cluster
#bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed
## Columbia's cluster
bfile=$UKBB_PATH/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed
anno_file=$lmm_exome_dir_regenie/burden/ukb32285_exomespb_chr1_22.hg38.hg38_multianno.anno_file
set_list=$lmm_exome_dir_regenie/burden/ukb32285_exomespb_chr1_22.hg38.hg38_multianno.set_list_file
mask_file=$lmm_exome_dir_regenie/burden/ukb32285_exomespb_chr1_22.hg38.hg38_multianno.mask_file
keep_gene=
build_mask=max
aaf_bins=0.005,0.01

lmm_args="""regenie_burden
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --trait $trait
    --anno_file $anno_file
    --set_list $set_list
    --mask_file $mask_file
    --keep_gene $keep_gene
    --aaf_bins $aaf_bins
    --build_mask $build_mask
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

## Bolt-LMM job

In [2]:
lmm_dir_bolt=$lmm_imp_dir_bolt/f3393_hearing_aid
lmm_sbatch_bolt=../output/$(date +"%Y-%m-%d")_f3393_hearing_aid_imp-bolt.sbatch
phenoFile=$hearing_pheno_path/200828_UKBB_Hearing_aid_f3393
covarFile=$hearing_pheno_path/200828_UKBB_Hearing_aid_f3393
phenoCol=hearing_aid_cat
covarCol=sex
qCovarCol=age_final_aid

lmm_args="""boltlmm
    --cwd $lmm_dir_bolt 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_bolt 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp    
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_bolt \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-10-20_f3393_hearing_aid_imp-bolt.sbatch
INFO: Workflow farnam (ID=c482417dfcba9f23) is executed successfully with 1 completed step.



## LD clumping job

### Imputed data

In [None]:
clumping_dir=$clumping_dir/f3393_hearing_aid
clumping_sbatch=../output/$(date +"%Y-%m-%d")_f3393_hearing_aid_ldclumping.sbatch
sumstatsFiles=$lmm_imp_dir_fastgwa/f3393_hearing_aid/200828_UKBB_Hearing_aid_f3393_hearing_aid_cat.fastGWA.snp_stats.gz

clumping_args="""default 
    --cwd $clumping_dir 
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --genoFile $genoFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

### Exome data: 

FIXME: decide how to build the LD reference file for the exome data.
1. Option 1: create the reference bed file from the exome data. However, this bfile is used to calculate LD between SNPs, so a reference file created from the genotype array is preferable. The drawback of the array-based reference is that variants present in the exome but absent from the genotype array won't be selected as index SNPs.
2. Option 2: use the bfile_ref created from the imputed data.
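
Either option needs a reference bfile built from a random subset of `ld_sample_size` individuals. A hedged sketch of drawing such a subset from a `.fam` file with `shuf` is shown here; the toy `.fam`, the file names and the sample size of 3 are assumptions for illustration, and the actual `reference` workflow may select samples differently.

```shell
# Toy .fam file with 10 individuals (FID IID PAT MAT SEX PHENO); real input
# would be the study's own .fam file
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "fam$i ind$i 0 0 0 -9"
done > "$tmpdir/toy.fam"

# Assumed sample size for the sketch; the notebook uses ld_sample_size=1200
ld_sample_size=3

# Draw a random subset without replacement and keep the FID and IID columns,
# the two-column format that PLINK's --keep option reads
shuf -n "$ld_sample_size" "$tmpdir/toy.fam" | awk '{print $1, $2}' > "$tmpdir/ref_samples.txt"

cat "$tmpdir/ref_samples.txt"
```

The resulting list could then be passed to PLINK (`--keep` together with `--make-bed`) to write the reference bed/bim/fam set.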

In [1]:
clumping_dir=$clumping_dir/f3393_hearing_aid_exome
clumping_sbatch=../output/$(date +"%Y-%m-%d")_f3393_hearing_aid_exome_ldclumping.sbatch
#clumping_sbatch=../output/$(date +"%Y-%m-%d")_refbedfile2_exome_ldclumping.sbatch
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
sumstatsFiles=$lmm_exome_dir_regenie/f3393_hearing_aid_exomes/010421_UKBB_Hearing_aid_f3393_128254ind_exomes_hearing_aid_cat.regenie.snp_stats.gz
sampleFile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_s200631.fam
bfile_ref=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_s200631_chr1_22_exomedata.1200.ref_geno.bed
ld_sample_size=1200
clump_field=P
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
numThreads=20

# Select samples with the filter_samples workflow and create the reference file with the reference workflow,
# then use the default workflow to run the LD clumping
clumping_args="""default
    --cwd $clumping_dir
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --genoFile $genoFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
    --container_lmm $container_lmm
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-07_f3393_hearing_aid_exome_ldclumping.sbatch
INFO: Workflow farnam (ID=w64d7831c0c82bb7e) is executed successfully with 1 completed step.



## Region extraction

### Imputed data

In [4]:
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_Hearing_aid_f3393
extract_dir=$UKBB_PATH/results/region_extraction/f3393_hearing_aid
extract_sos=~/project/UKBB_GWAS_dev/workflow/Region_Extraction_4.ipynb
extract_sbatch=../output/$(date +"%Y-%m-%d")_f3393_hearing_aid_imp-region.sbatch
region_file=$UKBB_PATH/results/LD_clumping/f3393_hearing_aid/*.clumped_region
geno_path=$UKBB_PATH/results/UKBB_bgenfilepath.txt
sumstats_path=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f3393_hearing_aid/*.snp_stats.gz

extract_args="""default
    --cwd $extract_dir
    --region-file $region_file
    --pheno-path $phenoFile
    --geno-path $geno_path
    --bgen-sample-path $sampleFile
    --sumstats-path $sumstats_path
    --format-config-path $formatFile_fastgwa
    --unrelated-samples $unrelated_samples
    --job-size $extract_job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $extract_sos \
    --to-script $extract_sbatch \
    --args "$extract_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-10-28_f3393_hearing_aid_imp-region.sbatch
INFO: Workflow farnam (ID=55af58288aa5cc61) is executed successfully with 1 completed step.
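
`region_file` and `sumstats_path` in the cell above are glob patterns. A pattern stored in a variable is not expanded at assignment, and it stays unexpanded inside the double-quoted argument string, so it reaches `--args` as a literal `*`. A minimal sketch for confirming how many files a pattern matches before generating the job script follows; `count_matches` is a hypothetical helper and the files are placeholders in a temporary directory.

```shell
# Hypothetical helper: count existing files that match a glob pattern
count_matches() {
    n=0
    # $1 is left unquoted on purpose so the shell expands the pattern here
    for f in $1; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

tmpdir=$(mktemp -d)
: > "$tmpdir/traitA.clumped_region"
region_file=$tmpdir/*.clumped_region   # stored literally, not expanded here

count_matches "$region_file"            # one placeholder file matches
count_matches "$tmpdir/*.snp_stats.gz"  # nothing matches this pattern
```

A count other than 1 for either pattern is worth resolving before the region extraction job is submitted.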



### Exome data

In [15]:
### Create the bedfilepath file
cd $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020
for file in ukb23155_c{1..22}_b0_v1.bed;
    do echo `pwd`/$file;
done | awk '{print NR " " $0}' > UKBB_exome_plinkfilepath.txt
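
The loop above writes one line per chromosome in the form `<index> <absolute path>`. A self-contained sketch of the same pattern on empty placeholder files follows; the three-chromosome range and the temporary directory are assumptions for illustration.

```shell
# Work in a throwaway directory so nothing on the cluster is touched
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Placeholder files standing in for the per-chromosome exome bed files
for chr in 1 2 3; do
    : > "ukb23155_c${chr}_b0_v1.bed"
done

# Number each absolute path; $0 is the whole input line in awk
for chr in 1 2 3; do
    echo "$(pwd)/ukb23155_c${chr}_b0_v1.bed"
done | awk '{print NR " " $0}' > UKBB_exome_plinkfilepath.txt

cat UKBB_exome_plinkfilepath.txt
```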

In [2]:
tpl_file=/home/dc2325/project/UKBB_GWAS_dev/farnam.yml
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/phenotypes_exome_data/010421_UKBB_Hearing_aid_f3393_128254ind_exomes
extract_dir=$UKBB_PATH/results/region_extraction/f3393_hearing_aid_exomes
# Original region extraction pipeline
extract_sos=~/project/bioworkflows/admin/Region_Extraction.ipynb 
#extract_sos=~/project/UKBB_GWAS_dev/workflow/Region_Extraction_4.ipynb
extract_sbatch=/home/dc2325/project/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_exome-regionextrac.sbatch
region_file=$UKBB_PATH/results/LD_clumping/f3393_hearing_aid_exome/*.clumped_region
geno_path=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/UKBB_exome_plinkfilepath.txt
sumstats_path=$UKBB_PATH/results/REGENIE_results/results_exome_data/f3393_hearing_aid_exomes/010421_UKBB_Hearing_aid_f3393_128254ind_exomes_hearing_aid_cat.regenie.snp_stats.gz
unrelated_samples=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
extract_job_size=10
sampleFile=
formatFile_regenie=
#container_lmm=$UKBB_PATH/lmm.sif

extract_args="""default
    --cwd $extract_dir
    --region-file $region_file
    --pheno-path $phenoFile
    --geno-path $geno_path
    --bgen-sample-path $sampleFile
    --sumstats-path $sumstats_path
    --format-config-path $formatFile_regenie
    --unrelated-samples $unrelated_samples
    --job-size $extract_job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $extract_sos \
    --to-script $extract_sbatch \
    --args "$extract_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-02-09_f3393_hearing_aid_exome-regionextrac.sbatch
INFO: Workflow farnam (ID=w60dbbcede9750add) is executed successfully with 1 completed step.



## Fine mapping

In [None]:
finemap_dir=$UKBB_PATH/results/fine_mapping/f3393_hearing_aid
finemap_sos=~/project/UKBB_GWAS_dev/workflow/SuSiE_test.ipynb
finemap_sbatch=../output/$(date +"%Y-%m-%d")_f3393_hearing_aid_imp-finemap.sbatch
region_dir=$UKBB_PATH/results/region_extraction/f3393_hearing_aid
region_file=$UKBB_PATH/results/LD_clumping/f3393_hearing_aid/200828_UKBB_Hearing_aid_f3393_hearing_aid_cat.fastGWA.snp_stats.clumped_region
sumstats_path=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f3393_hearing_aid/*.snp_stats.gz
N=230411
container_lmm=/home/dc2325/scratch60/lmm_v_1_4.sif

finemap_args="""default
    --cwd $finemap_dir
    --region_dir $region_dir
    --region_file $region_file
    --sumstats_path $sumstats_path
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $finemap_sos \
    --to-script $finemap_sbatch \
    --args "$finemap_args"

## Post-GWAS annotation: SNP-to-gene

### Annotate exome data (old pheno)

In [3]:
lmm_dir=$UKBB_PATH/results/REGENIE_results/results_exome_data/f3393_hearing_aid_exomes
postgwa_sbatch=../output/$(date +"%Y-%m-%d")_f3393_postgwa.sbatch
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f3393_hearing_aid_exomes/010421_UKBB_Hearing_aid_f3393_128254ind_exomes_hearing_aid_cat.regenie.snp_stats.gz
tpl_file=../farnam.yml
postgwa_sos=~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb
job_size=1
hg=38
postgwa_args="""default
    --cwd $lmm_dir
    --sumstatsFile $sumstatsFile
    --hg $hg
    --job_size $job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $postgwa_sos \
    --to-script $postgwa_sbatch \
    --args "$postgwa_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-01-28_f3393_postgwa.sbatch
INFO: Workflow farnam (ID=w6b10aab5fa809918) is executed successfully with 1 completed step.


### Merge bim files from 200K exome data

In [6]:
cwd=$USER_PATH/ukb23155_200Kexomes_annovar/exome_bim_merge
bimfiles=`echo $UKBB_yale/UKBiobank/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bim`
bim_name=$USER_PATH/ukb23155_200Kexomes_annovar/exome_bim_merge/ukb23155_chr1_chr22.bim
build='hg38'

sos run ~/project/bioworkflows/variant-annotation/annovar.ipynb bim_from_plink\
    --cwd $cwd \
    --bim_name $bim_name \
    --bimfiles $bimfiles \
    --build $build \
    --job_size $job_size \
    --container_annovar $container_annovar

INFO: Running bim_merge: Merge all the bimfiles into a single file to use later with awk Only need to run this cell once
INFO: bim_merge is completed.
INFO: bim_merge output:   /home/dc2325/project/results/exome_bim_merge/ukb23155_chr1_chr22.bim
INFO: Workflow bim_merge (ID=w7c5d4415eb1baffa) is executed successfully with 1 completed step.
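
The `bim_merge` step reports merging the per-chromosome bim files into one file. A minimal sketch of that idea follows, assuming a plain concatenation in chromosome order is sufficient (the workflow's actual implementation is not shown in this notebook); file names and contents are toy placeholders.

```shell
tmpdir=$(mktemp -d)

# Toy per-chromosome .bim files (chr, variant ID, cM, bp, allele 1, allele 2)
printf '1\t1:100:A:G\t0\t100\tG\tA\n' > "$tmpdir/toy_c1.bim"
printf '2\t2:200:C:T\t0\t200\tT\tC\n2\t2:300:G:A\t0\t300\tA\tG\n' > "$tmpdir/toy_c2.bim"

# Concatenate in chromosome order into a single file
for chr in 1 2; do
    cat "$tmpdir/toy_c${chr}.bim"
done > "$tmpdir/toy_chr1_chr2.bim"

# Number of variants in the merged file
wc -l < "$tmpdir/toy_chr1_chr2.bim"
```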



### Annotate exome data (old pheno) with bfile

In [13]:
UKBB_PATH=/gpfs/gibbs/pi/dewan/data/UKBiobank
ukbb=$UKBB_PATH
USER_PATH=/home/dc2325/project
cwd=/home/dc2325/scratch60/output/bfile_annovar
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f3393_hearing_aid_exomes_bfile/010421_UKBB_Hearing_aid_f3393_128254ind_exomes_hearing_aid_cat.regenie.snp_stats.gz
hg=38
job_size=1
container_annovar=/gpfs/gibbs/pi/dewan/data/UKBiobank/annovar.sif
bimfiles=`echo /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bim`
bim_name=/home/dc2325/project/results/exome_bim_merge/ukb23155_chr1_chr22.bim
humandb=/gpfs/ysm/datasets/db/annovar/humandb

sos run ~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb annovar \
    --cwd $cwd \
    --sumstatsFile $sumstatsFile\
    --bim_name $bim_name \
    --bimfiles $bimfiles\
    --hg $hg \
    --job_size $job_size \
    --humandb $humandb\
    --ukbb $ukbb \
    --container_annovar $container_annovar \
    -s build

INFO: Running annovar_1: Get the list of significantly associated SNPs
INFO: Step annovar_1 (index=0) is ignored with signature constructed
INFO: annovar_1 output:   /home/dc2325/scratch60/output/bfile_annovar/010421_UKBB_Hearing_aid_f3393_128254ind_exomes_hearing_aid_cat.regenie.snp_annotate
INFO: Running annovar_2: Get chr, start, end, ref_allele, alt_allele format
INFO: Step annovar_2 (index=0) is ignored with signature constructed
INFO: annovar_2 output:   /home/dc2325/scratch60/output/bfile_annovar/010421_UKBB_Hearing_aid_f3393_128254ind_exomes_hearing_aid_cat.regenie.avinput
INFO: Running annovar_3: Annotate variants file using ANNOVAR
INFO: annovar_3 is completed.
INFO: annovar_3 output:   /home/dc2325/scratch60/output/bfile_annovar/010421_UKBB_Hearing_aid_f3393_128254ind_exomes_hearing_aid_cat.regenie.hg38_multianno.csv
INFO: Workflow annovar (I

### Annotate exome data (new phenotype) with bfile

In [None]:
UKBB_PATH=/gpfs/gibbs/pi/dewan/data/UKBiobank
ukbb=$UKBB_PATH
USER_PATH=/home/dc2325/project
cwd=/home/dc2325/scratch60/output/200k_new_pheno_annovar
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_hearing_difficulty_exomes200K_noqc_ctrl_na/062421_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_144952ind_f2247_ctrl_na.regenie.snp_stats.gz
hg=38
job_size=1
container_annovar=$UKBB_PATH/annovar.sif
bimfiles=`echo /gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bim`
bim_name=/home/dc2325/project/results/exome_bim_merge/ukb23155_chr1_chr22.bim
humandb=/gpfs/ysm/datasets/db/annovar/humandb

sos run ~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb annovar \
    --cwd $cwd \
    --sumstatsFile $sumstatsFile\
    --bim_name $bim_name \
    --hg $hg \
    --job_size $job_size \
    --humandb $humandb\
    --ukbb $ukbb \
    --bimfiles $bimfiles\
    --container_annovar $container_annovar

## ANNOVAR annotation of 200K exomes, 50K exomes and bgen imputed variants

### 200K exomes

In [3]:
annovar_dir=$UKBB_PATH/results/annovar_exome
bedfiles=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
bimfiles=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bim`
bim_name=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_chr1_chr22_exomedata.bim
annovar_sbatch=../output/$(date +"%Y-%m-%d")_annovar_chr1_22_exomes_postgwa.sbatch
tpl_file=../farnam.yml
annovar_sos=~/project/UKBB_GWAS_dev/workflow/QC_Exome_UKBB.ipynb
job_size=1
humandb=/gpfs/ysm/datasets/db/annovar/humandb
ukbb=/gpfs/gibbs/pi/dewan/data/UKBiobank
container_annovar=$UKBB_PATH/annovar.sif 
name_prefix=ukb23155_chr1_chr22

# Use the bim_merge workflow first and then the annovar workflow
annovar_args="""annovar
    --cwd $annovar_dir \
    --bedfiles $bedfiles\
    --bimfiles $bimfiles \
    --bim_name $bim_name \
    --humandb $humandb \
    --ukbb $ukbb \
    --job_size $job_size \
    --name_prefix $name_prefix \
    --container_annovar $container_annovar
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $annovar_sos \
    --to-script $annovar_sbatch \
    --args "$annovar_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-07_annovar_chr1_22_exomes_postgwa.sbatch
INFO: Workflow farnam (ID=w696e79ce57c148dd) is executed successfully with 1 completed step.



### 50K exomes

In [3]:
# First run using only pure controls for f3393 
annovar_dir=$UKBB_PATH/results/annovar_exome
bfiles=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
bim_name=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bim
annovar_sbatch=~/project/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_annovar_chr1_22_50Kexomes_postgwa.sbatch
annovar_sos=~/project/bioworkflows/variant-annotation/annovar.ipynb
job_size=1
humandb=/gpfs/ysm/datasets/db/annovar/humandb
ukbb=/gpfs/gibbs/pi/dewan/data/UKBiobank
container_annovar=$UKBB_PATH/annovar.sif 
name_prefix=ukb23155_chr1_chr22_50Kexomes

# Use the bim_merge workflow first and then the annovar workflow
annovar_args="""annovar
    --cwd $annovar_dir \
    --bim_name $bim_name \
    --humandb $humandb \
    --ukbb $ukbb \
    --job_size $job_size \
    --name_prefix $name_prefix \
    --container_annovar $container_annovar
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $annovar_sos \
    --to-script $annovar_sbatch \
    --args "$annovar_args"


INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-30_annovar_chr1_22_50Kexomes_postgwa.sbatch
INFO: Workflow farnam (ID=w29025279f82f6eb9) is executed successfully with 1 completed step.



### BGEN imputed data

# 2. Hearing difficulty/problems f.2247

## FastGWA job white British

In [None]:
lmm_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_hearing_difficulty
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f2247_imp-fastgwa.sbatch
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_Hearing_difficulty_f2247
covarFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_Hearing_difficulty_f2247
phenoCol=hearing_diff_new
covarCol=sex
qCovarCol=age_final_diff

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

## FastGWA all white

In [3]:
lmm_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_hearing_difficulty_expandedwhite
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f2247_expanded_white_imp-fastgwa.sbatch
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/120120_UKBB_Hearing_difficulty_f2247_expandedwhite
covarFile=$UKBB_PATH/phenotype_files/hearing_impairment/120120_UKBB_Hearing_difficulty_f2247_expandedwhite
phenoCol=hearing_diff_new
covarCol=sex
qCovarCol=age_final_diff

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-12-02_f2247_expanded_white_imp-fastgwa.sbatch
INFO: Workflow farnam (ID=9da5aed9605f3f9f) is executed successfully with 1 completed step.


## FastGWA exome data

In [2]:
lmm_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_exome_data/f2247_hearing_difficulty_exomes
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f2247_exomes-fastgwa.sbatch
bfile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/exome_files_snpsonly/ukb23155.filtered.merged.bed
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
sampleFile=
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/phenotypes_exome_data/010421_UKBB_Hearing_difficulty_f2247_171970ind_exomes
covarFile=$UKBB_PATH/phenotype_files/hearing_impairment/phenotypes_exome_data/010421_UKBB_Hearing_difficulty_f2247_171970ind_exomes
formatFile_fastgwa=~/project/UKBB_GWAS_dev/data/fastGWA_template.yml
phenoCol="hearing_diff_new"
covarCol=sex
qCovarCol=age_final_diff
bgenMinMAF=0.001

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --genoFile $genoFile 
    --phenoFile $phenoFile
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-01-12_f2247_exomes-fastgwa.sbatch
INFO: Workflow farnam (ID=372a90dd555a65fb) is executed successfully with 1 completed step.



## Regenie exome data

In [5]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_hearing_difficulty_exomes_bfile
lmm_sbatch_regenie=../output/$(date +"%Y-%m-%d")_f2247_hearing_difficulty_exome_bfile-regenie.sbatch
phenoFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_Hearing_difficulty_f2247_171970ind_exomes
covarFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_Hearing_difficulty_f2247_171970ind_exomes
phenoCol=hearing_diff_new
covarCol=sex
qCovarCol=age_final_diff
#Use original bed files from the UKBB exome data
#bfile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/exome_files_snpsonly/ukb23155.filtered.merged.bed
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_prefix $lowmem_prefix
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-12_f2247_hearing_difficulty_exome_bfile-regenie.sbatch
INFO: Workflow farnam (ID=w6ce6b23ca9b23cce) is executed successfully with 1 completed step.



## Regenie imputed data: expanded white control NA

This run uses the new phenotype information with the imputed data.

In [5]:
lmm_dir_regenie=$lmm_imp_dir_regenie/f2247_hearing_difficulty_impdata_newpheno
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2247_hearing_difficulty_impdata-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_316411ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_316411ind
phenoCol=f2247_ctrl_na
covarCol=sex
qCovarCol=age_final_diff_new
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-07-13_f2247_hearing_difficulty_impdata-regenie.sbatch
INFO: Workflow farnam (ID=w80da2f9338cdcf29) is executed successfully with 1 completed step.



## Regenie: 50K exomes replication set

### f.2247 & pure controls

In [2]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_hearing_difficulty_exomes50K_pure_ctrl
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2247_hearing_difficulty_exomes50K_pure_ctrl-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_pure_ctrl_184909ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_pure_ctrl_184909ind
phenoCol=f2247_ctrl_pure
covarCol=sex
qCovarCol=age_final_diff_new
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-21_f2247_hearing_difficulty_exomes50K_pure_ctrl-regenie.sbatch
INFO: Workflow farnam (ID=wcaa8ffe2883ba03a) is executed successfully with 1 completed step.



In [None]:
# First run using controls na for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f3393_hearing_aid_exomes200K_noqc_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f3393_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/062421_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_104402ind
covarFile=$hearing_pheno_path/062421_UKBB_Hearing_aid_f3393_expandedwhite_z974included_ctrl_na_104402ind
phenoCol=f3393_ctrl_na
covarCol=sex
qCovarCol=age_final_aid
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed
hwe_filter=5e-08

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

### f.2247 Controls with NA for f.3393

In [6]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_hearing_difficulty_exomes50K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2247_hearing_difficulty_exomes50K_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_316411ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_316411ind
phenoCol=f2247_ctrl_na
covarCol=sex
qCovarCol=age_final_diff_new
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-19_f2247_hearing_difficulty_exomes50K_ctrl_na-regenie.sbatch
INFO: Workflow farnam (ID=wcb0151f3731055ae) is executed successfully with 1 completed step.
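
Before submitting the generated script, it can help to confirm that the columns named in `phenoCol`, `covarCol` and `qCovarCol` are present in the phenotype file. The following is a minimal sketch and not part of the workflow: `check_columns` is a hypothetical helper, and it assumes the phenotype file is whitespace-delimited with a header line. A toy file stands in for the real one.

```shell
# Hypothetical helper: confirm that every requested column name appears in
# the header (first line) of a whitespace-delimited file.
check_columns() {
    file=$1
    shift
    header=$(head -n 1 "$file")
    missing=0
    for col in "$@"; do
        found=0
        # Unquoted on purpose: split the header on spaces and tabs.
        for h in $header; do
            [ "$h" = "$col" ] && found=1
        done
        if [ "$found" -eq 0 ]; then
            echo "missing column: $col" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Toy file standing in for the real phenotype file (layout is an assumption).
toy=$(mktemp)
printf 'FID IID sex age_final_diff_new f2247_ctrl_na\n1 1 0 55 1\n' > "$toy"

check_columns "$toy" f2247_ctrl_na sex age_final_diff_new && echo "all columns present"
```

On the cluster the call would take `"$phenoFile" "$phenoCol" "$covarCol" "$qCovarCol"` instead of the toy arguments.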



## Regenie on exome data (original UKBB PLINK files, no QC) using the modified phenotype file with controls NA for f.3393

### f.2247 Controls with NA for f.3393

In [3]:
# First run using controls na for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_hearing_difficulty_exomes200K_noqc_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2247_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/062421_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_144952ind
covarFile=$hearing_pheno_path/062421_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_144952ind
phenoCol=f2247_ctrl_na
covarCol=sex
qCovarCol=age_final_diff_new
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed
hwe_filter=5e-08

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-06-24_f2247_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
INFO: Workflow farnam (ID=w82514504865b0a4f) is executed successfully with 1 completed step.
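
The brace expansion used for `genoFile` produces 22 paths whether or not the files exist, so a missing chromosome is not detected when the variable is set. The sketch below lists missing per-chromosome files before the job is generated. `missing_chroms` is a hypothetical helper, and the toy directory and file names are placeholders for the real exome folder.

```shell
# Hypothetical helper: print every per-chromosome file (chr 1..22) that does
# not exist. The pattern is printf-style; %s is replaced by the chromosome.
missing_chroms() {
    pattern=$1
    c=1
    while [ "$c" -le 22 ]; do
        f=$(printf "$pattern" "$c")
        [ -e "$f" ] || echo "$f"
        c=$((c + 1))
    done
}

# Toy directory standing in for the exome folder; chromosome 7 is left out.
demo=$(mktemp -d)
i=1
while [ "$i" -le 22 ]; do
    [ "$i" -eq 7 ] || : > "$demo/toy_c${i}_b0_v1.bed"
    i=$((i + 1))
done

# Prints only the path of the chromosome 7 file.
missing_chroms "$demo/toy_c%s_b0_v1.bed"
```

An empty output means all 22 files are in place.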



## Regenie in exome data after VCF-QC 200K exomes

### f.2247 Controls with NA for f.3393

In [10]:
# Run using all controls for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_hearing_difficulty_exomes200K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/f2247_hearing_difficulty_exomes200K_ctrl_na-regenie_$(date +"%Y-%m-%d").sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_316411ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_difficulty_f2247_expandedwhite_z974included_ctrl_na_316411ind
phenoCol=f2247_ctrl_na
covarCol=sex
qCovarCol=age_final_diff_new
genoFile=`echo /mnt/mfs/statgen/UKBiobank/data/exome_files/project_VCF/plink_files/ukb23156_c{1..22}.merged.filtered.bed`
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/data/genotype_files/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/project/UKBB_GWAS_dev/output/f2247_hearing_difficulty_exomes200K_ctrl_na-regenie_2021-05-18.sbatch
INFO: Workflow csg (ID=wdf2119aff8d7a187) is executed successfully with 1 completed step.
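
The cells in this notebook name the sbatch file in two ways: most put the date first, while the cell above puts it last. If uniform names are preferred, a small helper can build them. This is only a sketch; `sbatch_name` is a hypothetical function and the date-first layout is an arbitrary choice.

```shell
# Hypothetical helper: build "<dir>/<YYYY-MM-DD>_<label>.sbatch".
sbatch_name() {
    printf '%s/%s_%s.sbatch\n' "$1" "$(date +%Y-%m-%d)" "$2"
}

lmm_sbatch_regenie=$(sbatch_name ../output f2247_hearing_difficulty_exomes200K_ctrl_na-regenie)
echo "$lmm_sbatch_regenie"
```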



## LD clumping job

### Imputed data

In [None]:
clumping_dir=$UKBB_PATH/results/LD_clumping/f2247_hearing_difficulty
clumping_sos=~/project/bioworkflows/admin/LD_Clumping.ipynb
clumping_sbatch=../output/$(date +"%Y-%m-%d")_f2247_hearing_difficulty_ldclumping.sbatch
sumstatsFiles=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_hearing_difficulty/200828_UKBB_Hearing_difficulty_f2247_hearing_diff_new.fastGWA.snp_stats.gz

clumping_args="""default 
    --cwd $clumping_dir 
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --bgenFile $bgenFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

### Exome data

In [4]:
tpl_file=../farnam.yml
clumping_dir=$UKBB_PATH/results/LD_clumping/f2247_hearing_difficulty_exome
clumping_sos=~/project/bioworkflows/admin/LD_Clumping.ipynb
clumping_sbatch=../output/$(date +"%Y-%m-%d")_f2247_hearing_diff_exome_ldclumping.sbatch
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
sumstatsFiles=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_hearing_difficulty_exomes/010421_UKBB_Hearing_difficulty_f2247_171970ind_exomes_hearing_diff_new.regenie.snp_stats.gz
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
sampleFile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_s200631.fam
unrelated_samples=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
bfile_ref=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_s200631_chr1_22_exomedata.1200.ref_geno.bed
container_lmm=$UKBB_PATH/lmm.sif
ld_sample_size=1200
clump_field=P
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
numThreads=20
clump_job_size=1

# Select samples with the filter_samples workflow and create the reference file with the reference workflow,
# then use the default workflow to run the LD clumping
clumping_args="""default
    --cwd $clumping_dir
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --genoFile $genoFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
    --container_lmm $container_lmm
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-07_f2247_hearing_diff_exome_ldclumping.sbatch
INFO: Workflow farnam (ID=w923c2b21e51a953e) is executed successfully with 1 completed step.
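
With `clump_p1=5e-08`, only variants below that p-value can become index variants, so it is worth checking how many there are before running the clumping job. The sketch below counts them in a gzipped summary statistics file. `count_below` is a hypothetical helper; it assumes the file is whitespace-delimited with a header line containing the column named in `clump_field`. The toy file is illustrative only.

```shell
# Hypothetical helper: count rows whose value in column <field> is below
# <threshold>. Usage: count_below <sumstats.gz> <field> <threshold>
count_below() {
    gzip -dc "$1" | awk -v field="$2" -v thr="$3" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == field) col = i
            if (!col) { bad = 1; exit 1 }
            next
        }
        ($col + 0) < (thr + 0) { n++ }
        END { if (bad) exit 1; print n + 0 }'
}

# Toy summary statistics: two of the three variants pass 5e-08.
demo=$(mktemp -d)
toy="$demo/toy.snp_stats.gz"
printf 'SNP CHR BP P\nrs1 1 100 1e-10\nrs2 1 200 0.03\nrs3 2 300 4e-09\n' | gzip -c > "$toy"

count_below "$toy" P 5e-08   # prints 2
```

On the cluster the call would be `count_below "$sumstatsFiles" "$clump_field" "$clump_p1"`.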



## Post-GWAS annotation

In [8]:
lmm_dir=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_hearing_difficulty_exomes
postgwa_sbatch=../output/$(date +"%Y-%m-%d")_f2247_postgwa.sbatch
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_hearing_difficulty_exomes/010421_UKBB_Hearing_difficulty_f2247_171970ind_exomes_hearing_diff_new.regenie.snp_stats.gz
tpl_file=../farnam.yml
postgwa_sos=~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb
job_size=1
hg=38
postgwa_args="""default
    --cwd $lmm_dir
    --sumstatsFile $sumstatsFile
    --hg $hg
    --job_size $job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $postgwa_sos \
    --to-script $postgwa_sbatch \
    --args "$postgwa_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-01-28_f2247_postgwa.sbatch
INFO: Workflow farnam (ID=w50d0cee396725518) is executed successfully with 1 completed step.


In [None]:
UKBB_PATH=/gpfs/gibbs/pi/dewan/data/UKBiobank
cwd=/home/dc2325/scratch60/output/bfile_annovar
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_hearing_difficulty_exomes_bfile/010421_UKBB_Hearing_difficulty_f2247_171970ind_exomes_hearing_diff_new.regenie.snp_stats.gz
hg=38
job_size=1
container_annovar=$UKBB_PATH/annovar.sif
bimfiles=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bim`
bim_name=/home/dc2325/scratch60/output/ukb23155_chr1_chr22.bim
humandb=/gpfs/ysm/datasets/db/annovar/humandb

sos run ~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb annovar \
    --cwd $cwd \
    --sumstatsFile $sumstatsFile \
    --bim_name $bim_name \
    --hg $hg \
    --job_size $job_size \
    --humandb $humandb \
    --ukbb $UKBB_PATH \
    --container_annovar $container_annovar \
    -s build
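
The cell above defines `bimfiles` (the 22 per-chromosome `.bim` files) and `bim_name` (a single genome-wide `.bim`). How the merged file is produced is not shown here. One simple way, assuming the per-chromosome files only need to be concatenated in chromosome order, is sketched below with toy files. `merge_bims` is a hypothetical helper and may differ from what the workflow does internally.

```shell
# Hypothetical helper: concatenate .bim files, in the order given, into one
# output file. Usage: merge_bims <out.bim> <chr1.bim> <chr2.bim> ...
merge_bims() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Toy .bim files (tab-separated: chr, id, cM, bp, allele1, allele2).
demo=$(mktemp -d)
printf '1\trs1\t0\t100\tA\tG\n1\trs2\t0\t200\tC\tT\n' > "$demo/toy_c1.bim"
printf '2\trs3\t0\t150\tG\tA\n' > "$demo/toy_c2.bim"

merge_bims "$demo/toy_merged.bim" "$demo/toy_c1.bim" "$demo/toy_c2.bim"
awk 'END { print NR " variants" }' "$demo/toy_merged.bim"   # 3 variants
```

With the variables from the cell above, the call would be `merge_bims "$bim_name" $bimfiles`, leaving `$bimfiles` unquoted so that it splits into the 22 paths.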

# 3. Hearing difficulty with background noise f.2257

## FastGWA job white British

In [None]:
lmm_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2257_hearing_background_noise
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f2257_imp-fastgwa.sbatch
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_Hearing_background_noise_f2257
covarFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_Hearing_background_noise_f2257
phenoCol=hearing_noise_cat
covarCol=sex
qCovarCol=age_final_noise

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

## FastGWA job all white

In [6]:
lmm_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2257_hearing_background_noise_expandedwhite
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f2257_expandedwhite_imp-fastgwa.sbatch
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/120120_UKBB_Hearing_background_noise_f2257_expandedwhite
covarFile=$UKBB_PATH/phenotype_files/hearing_impairment/120120_UKBB_Hearing_background_noise_f2257_expandedwhite
phenoCol=hearing_noise_cat
covarCol=sex
qCovarCol=age_final_noise

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb dewan \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running dewan: Configuration for Yale `pi_dewan` partition cluster
INFO: dewan is completed.
INFO: dewan output:   ../output/2020-12-02_f2257_expandedwhite_imp-fastgwa.sbatch
INFO: Workflow dewan (ID=5e423268c0ec2f3b) is executed successfully with 1 completed step.
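
The `lmm_args` string relies on variables such as `$bfile`, `$bgenFile` and `$grmFile` that are set in earlier cells. If one of them is empty, the generated script ends up with an option that has no value. The sketch below checks a list of variable names before the string is built. `require_vars` is a hypothetical helper, and the `toy_*` variables are placeholders.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
require_vars() {
    rc=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name. The names come from
        # this script, so using eval is acceptable here.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset or empty: $name" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Toy values; on the cluster these come from the earlier setup cells.
toy_grmFile=/tmp/toy.grm.sp
unset toy_bgenFile
require_vars toy_grmFile toy_bgenFile || echo "set the variables listed above first"
```

In the notebook the call would name the real variables, for example `require_vars bfile sampleFile bgenFile grmFile`.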


## Regenie exome data

In [7]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2257_hearing_noise_exomes_bfile
lmm_sbatch_regenie=../output/$(date +"%Y-%m-%d")_f2257_hearing_noise_exome_bfile-regenie.sbatch
phenoFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_Hearing_background_noise_f2257_175531ind_exomes
covarFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_Hearing_background_noise_f2257_175531ind_exomes
phenoCol=hearing_noise_cat
covarCol=sex
qCovarCol=age_final_noise
#Use original bed files from the UKBB exome data
#bfile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/exome_files_snpsonly/ukb23155.filtered.merged.bed
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_prefix $lowmem_prefix
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-12_f2257_hearing_noise_exome_bfile-regenie.sbatch
INFO: Workflow farnam (ID=wd8efa7feebdc6a0e) is executed successfully with 1 completed step.
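
`sos` colors its `INFO` messages on a terminal. If the output of `sos run` is saved to a log file, the color escape sequences can be removed with a filter such as the one sketched below. `strip_ansi` is a hypothetical helper and handles only color (SGR) sequences.

```shell
# Hypothetical helper: remove ANSI color sequences (ESC [ ... m) from stdin.
strip_ansi() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}

printf '\033[32mfarnam\033[0m is \033[32mcompleted\033[0m.\n' | strip_ansi
# prints: farnam is completed.
```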



## Regenie imputed data: expanded white control NA

In [6]:
lmm_dir_regenie=$lmm_imp_dir_regenie/f2257_hearing_noise_impdata_newpheno
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2257_hearing_noise_impdata-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_ctrl_na_363603ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_ctrl_na_363603ind
phenoCol=f2257_ctrl_na
covarCol=sex
qCovarCol=age_final_noise
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-07-13_f2257_hearing_noise_impdata-regenie.sbatch
INFO: Workflow farnam (ID=w072e7fe313d2ff85) is executed successfully with 1 completed step.



## Regenie: 50K exomes replication set

### f.2257 & Pure controls

In [3]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2257_hearing_background_noise_exomes50K_pure_ctrl
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2257_hearing_background_noise_exomes50K_pure_ctrl-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_pure_ctrl_232101ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_pure_ctrl_232101ind
phenoCol=f2257_ctrl_pure
covarCol=sex
qCovarCol=age_final_noise
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-21_f2257_hearing_background_noise_exomes50K_pure_ctrl-regenie.sbatch
INFO: Workflow farnam (ID=w33d438bc153fd1f4) is executed successfully with 1 completed step.



### f.2257 & controls NA for f.3393

In [4]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2257_hearing_background_noise_exomes50K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2257_hearing_background_noise_exomes50K_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_ctrl_na_363603ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_ctrl_na_363603ind
phenoCol=f2257_ctrl_na
covarCol=sex
qCovarCol=age_final_noise
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-21_f2257_hearing_background_noise_exomes50K_ctrl_na-regenie.sbatch
INFO: Workflow farnam (ID=w0012483d59f4307c) is executed successfully with 1 completed step.



## Regenie on exome data (original UKBB PLINK files, no QC) using the modified phenotype file with controls NA for f.3393

### f.2257 Controls with NA for f.3393

In [4]:
# First run using controls na for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f2257_hearing_difficulty_exomes200K_noqc_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2257_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/062421_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_ctrl_na_166199ind
covarFile=$hearing_pheno_path/062421_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_ctrl_na_166199ind
phenoCol=f2257_ctrl_na
covarCol=sex
qCovarCol=age_final_noise
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed
hwe_filter=5e-08

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-06-24_f2257_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
INFO: Workflow farnam (ID=wdb0e3955c5caa7bd) is executed successfully with 1 completed step.



## Regenie in exome data after VCF-QC 200K exomes

### f.2257 & controls NA for f.3393

In [11]:
# Run using all controls for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f2257_hearing_background_noise_exomes200K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/f2257_hearing_background_noise_exomes200K_ctrl_na-regenie_$(date +"%Y-%m-%d").sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_ctrl_na_363603ind
covarFile=$hearing_pheno_path/041521_UKBB_Hearing_background_noise_f2257_expandedwhite_z974included_ctrl_na_363603ind
phenoCol=f2257_ctrl_na
covarCol=sex
qCovarCol=age_final_noise
genoFile=`echo /mnt/mfs/statgen/UKBiobank/data/exome_files/project_VCF/plink_files/ukb23156_c{1..22}.merged.filtered.bed`
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/data/genotype_files/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/project/UKBB_GWAS_dev/output/f2257_hearing_background_noise_exomes200K_ctrl_na-regenie_2021-05-18.sbatch
INFO: Workflow csg (ID=w5cbf62c4eb06bba8) is executed successfully with 1 completed step.



## LD clumping job

### Imputed data

In [None]:
clumping_dir=$UKBB_PATH/results/LD_clumping/f2257_hearing_background_noise
clumping_sos=~/project/bioworkflows/admin/LD_Clumping.ipynb
clumping_sbatch=../output/$(date +"%Y-%m-%d")_f2257_hearing_background_noise_ldclumping.sbatch
sumstatsFiles=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2257_hearing_background_noise/200828_UKBB_Hearing_background_noise_f2257_hearing_noise_cat.fastGWA.snp_stats.gz

clumping_args="""default 
    --cwd $clumping_dir 
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --bgenFile $bgenFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

### Exome data

In [5]:
tpl_file=../farnam.yml
clumping_dir=$UKBB_PATH/results/LD_clumping/f2257_hearing_background_noise_exome
clumping_sos=~/project/bioworkflows/admin/LD_Clumping.ipynb
clumping_sbatch=../output/$(date +"%Y-%m-%d")_f2257_hearing_background_noise_exome_ldclumping.sbatch
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
sumstatsFiles=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2257_hearing_noise_exomes/010421_UKBB_Hearing_background_noise_f2257_175531ind_exomes_hearing_noise_cat.regenie.snp_stats.gz
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
sampleFile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_s200631.fam
unrelated_samples=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
bfile_ref=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_s200631_chr1_22_exomedata.1200.ref_geno.bed
container_lmm=$UKBB_PATH/lmm.sif
ld_sample_size=1200
clump_field=P
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
numThreads=20
clump_job_size=1

# Select samples with the filter_samples workflow and create the reference file with the reference workflow,
# then use the default workflow to run the LD clumping
clumping_args="""default
    --cwd $clumping_dir
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --genoFile $genoFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
    --container_lmm $container_lmm
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-07_f2257_hearing_diff_exome_ldclumping.sbatch
INFO: Workflow farnam (ID=w7c7b3789d732c7e0) is executed successfully with 1 completed step.



## Post-GWAS annotation

In [9]:
lmm_dir=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2257_hearing_noise_exomes
postgwa_sbatch=../output/$(date +"%Y-%m-%d")_f2257_postgwa.sbatch
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2257_hearing_noise_exomes/010421_UKBB_Hearing_background_noise_f2257_175531ind_exomes_hearing_noise_cat.regenie.snp_stats.gz
tpl_file=../farnam.yml
postgwa_sos=~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb
job_size=1
hg=38
postgwa_args="""default
    --cwd $lmm_dir
    --sumstatsFile $sumstatsFile
    --hg $hg
    --job_size $job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $postgwa_sos \
    --to-script $postgwa_sbatch \
    --args "$postgwa_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-01-28_f2257_postgwa.sbatch
INFO: Workflow farnam (ID=w9856930ebcb27a9d) is executed successfully with 1 completed step.


In [None]:
UKBB_PATH=/gpfs/gibbs/pi/dewan/data/UKBiobank
cwd=/home/dc2325/scratch60/output/bfile_annovar
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2257_hearing_noise_exomes_bfile/010421_UKBB_Hearing_background_noise_f2257_175531ind_exomes_hearing_noise_cat.regenie.snp_stats.gz
hg=38
job_size=1
container_annovar=$UKBB_PATH/annovar.sif
bimfiles=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bim`
bim_name=/home/dc2325/scratch60/output/ukb23155_chr1_chr22.bim
humandb=/gpfs/ysm/datasets/db/annovar/humandb

sos run ~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb annovar \
    --cwd $cwd \
    --sumstatsFile $sumstatsFile \
    --bim_name $bim_name \
    --hg $hg \
    --job_size $job_size \
    --humandb $humandb \
    --ukbb $UKBB_PATH \
    --container_annovar $container_annovar \
    -s build
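
Brace expansion such as `{1..22}` in `bimfiles` is a bash feature and is purely textual: it produces all 22 paths whether or not the files exist. A minimal sketch of checking the per-chromosome files before launching the workflow, written with a plain loop and run here against a temporary toy directory (`exome_dir` and the deliberately missing chromosome 7 are illustrative assumptions):

```shell
# Toy stand-in for the exome directory; real files live under $UKBB_PATH
exome_dir=$(mktemp -d)
chr=1
while [ "$chr" -le 22 ]; do
    # Leave chromosome 7 out on purpose so the check has something to report
    if [ "$chr" -ne 7 ]; then
        : > "$exome_dir/ukb23155_c${chr}_b0_v1.bim"
    fi
    chr=$((chr + 1))
done

# Collect the chromosomes whose .bim file is absent
missing=""
chr=1
while [ "$chr" -le 22 ]; do
    if [ ! -f "$exome_dir/ukb23155_c${chr}_b0_v1.bim" ]; then
        missing="$missing $chr"
    fi
    chr=$((chr + 1))
done

echo "missing chromosomes:${missing:- none}"
rm -rf "$exome_dir"
```

Pointing `exome_dir` at the real exome folder and dropping the toy setup loop turns this into a quick pre-flight check.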

# 4. Combined phenotype f.2247 & f.2257

## FastGWA job white British

In [None]:
lmm_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_f2257_combined
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f2247_f2257_imp-fastgwa.sbatch
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_f2247_f2257
covarFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_f2247_f2257
phenoCol=f2247_f2257
covarCol=sex
qCovarCol=age

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"
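
The `lmm_args` block above depends on many variables (`$bfile`, `$grmFile`, `$formatFile_fastgwa`, and others) that are set in earlier cells. If one of them is empty, the shell silently expands it to nothing and the flag is left without a value. A minimal sketch of a pre-flight check, where `check_vars` is a hypothetical helper and the `demo_*` values are placeholders:

```shell
# check_vars NAME...: report every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_vars() {
    cv_status=0
    for cv_name in "$@"; do
        # Indirect lookup of the variable whose name is stored in cv_name
        eval "cv_value=\${$cv_name:-}"
        if [ -z "$cv_value" ]; then
            echo "unset or empty: $cv_name" >&2
            cv_status=1
        fi
    done
    return "$cv_status"
}

# Placeholder values; in the notebook these come from earlier cells
demo_bfile=/path/to/genotypes.bed
demo_grmFile=""

if check_vars demo_bfile demo_grmFile; then
    echo "all variables set"
else
    echo "fix the variables listed above before generating the sbatch script"
fi
```

In the notebook the call would list the real names, for example `check_vars bfile sampleFile bgenFile grmFile`, right before the `sos run` command.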

## FastGWA job all white

In [7]:
lmm_dir_fastgwa=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_f2257_combined_expandedwhite
lmm_sbatch_fastgwa=../output/$(date +"%Y-%m-%d")_f2247_f2257_expandedwhite_imp-fastgwa.sbatch
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/120120_UKBB_f2247_f2257_expandedwhite
covarFile=$UKBB_PATH/phenotype_files/hearing_impairment/120120_UKBB_f2247_f2257_expandedwhite
phenoCol=f2247_f2257
covarCol=sex
qCovarCol=age

lmm_args="""fastGWA
    --cwd $lmm_dir_fastgwa 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_fastgwa 
    --covarFile $covarFile  
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --grmFile $grmFile
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb dewan \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_fastgwa \
    --args "$lmm_args"

INFO: Running dewan: Configuration for Yale `pi_dewan` partition cluster
INFO: dewan is completed.
INFO: dewan output:   ../output/2020-12-02_f2247_f2257_expandedwhite_imp-fastgwa.sbatch
INFO: Workflow dewan (ID=cca88c31ebd37730) is executed successfully with 1 completed step.
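
Each generated script is prefixed with `$(date +"%Y-%m-%d")`, so regenerating a job on a later day creates a new file rather than overwriting the old one. Since ISO dates sort chronologically as plain text, the newest script for a job can be picked by name. A minimal sketch on a temporary toy directory (the file names and `out_dir` are made up for illustration; on the cluster the chosen file would then be submitted with `sbatch`):

```shell
# Toy output directory with two generations of the same job script
out_dir=$(mktemp -d)
: > "$out_dir/2020-11-15_example_imp-fastgwa.sbatch"
: > "$out_dir/2020-12-02_example_imp-fastgwa.sbatch"

# Globs expand in sorted order, so the last match carries the latest date
latest=""
for f in "$out_dir"/*_example_imp-fastgwa.sbatch; do
    latest=$f
done

echo "newest script: ${latest##*/}"
rm -rf "$out_dir"
```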


## Bolt-LMM job

In [3]:
lmm_dir_bolt=$UKBB_PATH/results/BOLTLMM_results/results_imputed_data/f2247_f2257_combined
lmm_sbatch_bolt=../output/$(date +"%Y-%m-%d")_f2247_f2257_imp-bolt.sbatch
phenoFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_f2247_f2257
covarFile=$UKBB_PATH/phenotype_files/hearing_impairment/200828_UKBB_f2247_f2257
phenoCol=f2247_f2257
covarCol=sex
qCovarCol=age
lmm_option='lmmForceNonInf'

lmm_args="""boltlmm
    --cwd $lmm_dir_bolt 
    --bfile $bfile 
    --sampleFile $sampleFile
    --bgenFile $bgenFile 
    --phenoFile $phenoFile 
    --formatFile $formatFile_bolt 
    --covarFile $covarFile 
    --LDscoresFile $LDscoresFile 
    --geneticMapFile $geneticMapFile 
    --phenoCol $phenoCol 
    --covarCol $covarCol 
    --covarMaxLevels $covarMaxLevels 
    --qCovarCol $qCovarCol 
    --lmm_option $lmm_option
    --numThreads $numThreads 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO 
    --job_size $lmm_job_size
    --ylim $ylim
    --container_lmm $container_lmm
    --container_marp $container_marp    
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_bolt \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-10-28_f2247_f2257_imp-bolt.sbatch
INFO: Workflow farnam (ID=6f1d2712738ba187) is executed successfully with 1 completed step.



## Regenie exome data

In [8]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_f2257_combined_exomes_bfile
lmm_sbatch_regenie=../output/$(date +"%Y-%m-%d")_f2247_f2257_combined_exome_bfile-regenie.sbatch
phenoFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_f2247_f2257_136862ind_exomes
covarFile=$hearing_pheno_path/phenotypes_exome_data/010421_UKBB_f2247_f2257_136862ind_exomes
phenoCol=f2247_f2257
covarCol=sex
qCovarCol=age_combined
#Use original bed files from the UKBB exome data
#bfile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/exome_files_snpsonly/ukb23155.filtered.merged.bed
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_prefix $lowmem_prefix
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-12_f2247_f2257_combined_exome_bfile-regenie.sbatch
INFO: Workflow farnam (ID=wbd137a88958a3c38) is executed successfully with 1 completed step.



## Regenie imputed data: expanded white controls NA

In [7]:
lmm_dir_regenie=$lmm_imp_dir_regenie/f2247_f2257_combined_impdata_newpheno
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2247_f2257_combined_impdata-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_f2247_f2257_expandedwhite_z974included_ctrl_na_299916ind
covarFile=$hearing_pheno_path/041521_UKBB_f2247_f2257_expandedwhite_z974included_ctrl_na_299916ind
phenoCol=f2247_f2257_ctrl_na
covarCol=sex
qCovarCol=age_combined
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-07-13_f2247_f2257_combined_impdata-regenie.sbatch
INFO: Workflow farnam (ID=w5bc39c94a28aa7c9) is executed successfully with 1 completed step.



## Regenie: 50K replication set

### combined phenotype & pure controls

In [5]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_f2257_exomes50K_pure_ctrl
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2247_f2257_exomes50K_pure_ctrl-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_f2247_f2257_expandedwhite_z974included_pure_ctrl_168414ind
covarFile=$hearing_pheno_path/041521_UKBB_f2247_f2257_expandedwhite_z974included_pure_ctrl_168414ind
phenoCol=f2247_f2257_ctrl_pure
covarCol=sex
qCovarCol=age_combined
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-21_f2247_f2257_exomes50K_pure_ctrl-regenie.sbatch
INFO: Workflow farnam (ID=w5e147b05ebd7fc54) is executed successfully with 1 completed step.



### Combined phenotype & controls NA for f.3393

In [6]:
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_f2257_exomes50K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2247_f2257_exomes50K_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_f2247_f2257_expandedwhite_z974included_ctrl_na_299916ind
covarFile=$hearing_pheno_path/041521_UKBB_f2247_f2257_expandedwhite_z974included_ctrl_na_299916ind
phenoCol=f2247_f2257_ctrl_na
covarCol=sex
qCovarCol=age_combined
genoFile=/gpfs/gibbs/pi/dewan/data/UKBiobank/genotype_files/ukb28374_exomedata/ukb32285_exomespb_chr1_22.bed
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-04-21_f2247_f2257_exomes50K_ctrl_na-regenie.sbatch
INFO: Workflow farnam (ID=w1a01bde43afe534a) is executed successfully with 1 completed step.



## Regenie in exome data (original UKBB PLINK files, no QC) using the modified phenotype file with controls NA for f.3393

### Combined phenotype Controls with NA for f.3393

In [5]:
# First run using controls na for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_f2257_exomes200K_noqc_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_f2247_f2257_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
phenoFile=$hearing_pheno_path/062421_UKBB_f2247_f2257_expandedwhite_z974included_ctrl_na_137245ind
covarFile=$hearing_pheno_path/062421_UKBB_f2247_f2257_expandedwhite_z974included_ctrl_na_137245ind
phenoCol=f2247_f2257_ctrl_na
covarCol=sex
qCovarCol=age_combined
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_originalgenotypefilesdownloaded083019/UKB_genotypedatadownloaded083019.bed
hwe_filter=5e-08

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   /home/dc2325/project/UKBB_GWAS_dev/output/2021-06-24_f2247_f2257_hearing_aid_exomes200K_noqc_ctrl_na-regenie.sbatch
INFO: Workflow farnam (ID=w8700ff76175d3c64) is executed successfully with 1 completed step.



## Regenie in exome data after VCF-QC 200K exomes

### Combined phenotype & controls NA for f.3393

In [12]:
# Run using all controls for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/f2247_f2257_exomes200K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/f2247_f2257_exomes200K_ctrl_na-regenie_$(date +"%Y-%m-%d").sbatch
phenoFile=$hearing_pheno_path/041521_UKBB_f2247_f2257_expandedwhite_z974included_ctrl_na_299916ind
covarFile=$hearing_pheno_path/041521_UKBB_f2247_f2257_expandedwhite_z974included_ctrl_na_299916ind
phenoCol=f2247_f2257_ctrl_na
covarCol=sex
qCovarCol=age_combined
genoFile=`echo /mnt/mfs/statgen/UKBiobank/data/exome_files/project_VCF/plink_files/ukb23156_c{1..22}.merged.filtered.bed`
#Use the original bed files for the genotype array for the expanded white on regenie step1
bfile=$UKBB_PATH/data/genotype_files/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/project/UKBB_GWAS_dev/output/f2247_f2257_exomes200K_ctrl_na-regenie_2021-05-18.sbatch
INFO: Workflow csg (ID=w6bd7047f416af43a) is executed successfully with 1 completed step.



## LD clumping job

### Imputed data

In [None]:
clumping_dir=$UKBB_PATH/results/LD_clumping/f2247_f2257_combined
clumping_sos=~/project/bioworkflows/admin/LD_Clumping.ipynb
clumping_sbatch=../output/$(date +"%Y-%m-%d")_f2247_f2257_combined_ldclumping.sbatch
sumstatsFiles=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_f2257_combined/200828_UKBB_f2247_f2257_f2247_f2257.fastGWA.snp_stats.gz

clumping_args="""default 
    --cwd $clumping_dir 
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --bgenFile $bgenFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

### Exome data

In [6]:
tpl_file=../farnam.yml
clumping_dir=$UKBB_PATH/results/LD_clumping/f2247_f2257_combined_exome
clumping_sos=~/project/bioworkflows/admin/LD_Clumping.ipynb
clumping_sbatch=../output/$(date +"%Y-%m-%d")_f2247_f2257_combined_exome_ldclumping.sbatch
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated082020removedwithdrawnindiv.bed
sumstatsFiles=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_f2257_combined_exomes/010421_UKBB_f2247_f2257_136862ind_exomes_f2247_f2257.regenie.snp_stats.gz
genoFile=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bed`
sampleFile=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_s200631.fam
unrelated_samples=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620
bfile_ref=$UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_s200631_chr1_22_exomedata.1200.ref_geno.bed
container_lmm=$UKBB_PATH/lmm.sif
ld_sample_size=1200
clump_field=P
clump_p1=5e-08
clump_p2=1
clump_r2=0.2
clump_kb=2000
clump_annotate=BP
numThreads=20
clump_job_size=1

# First select the samples with the filter_samples workflow and create the reference file with the reference workflow,
# then run the LD clumping with the default workflow
clumping_args="""default
    --cwd $clumping_dir
    --bfile $bfile
    --bfile_ref $bfile_ref 
    --genoFile $genoFile
    --sampleFile $sampleFile 
    --sumstatsFiles $sumstatsFiles 
    --unrelated_samples $unrelated_samples 
    --ld_sample_size $ld_sample_size 
    --clump_field $clump_field
    --clump_p1 $clump_p1 
    --clump_p2 $clump_p2 
    --clump_r2 $clump_r2 
    --clump_kb $clump_kb 
    --clump_annotate $clump_annotate 
    --numThreads $numThreads 
    --job_size $clump_job_size
    --container_lmm $container_lmm
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $clumping_sos \
    --to-script $clumping_sbatch \
    --args "$clumping_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-07_f2247_f2257_combined_exome_ldclumping.sbatch
INFO: Workflow farnam (ID=wf81e3f1aaede6aab) is executed successfully with 1 completed step.



### Post-GWAS annotation Snp-to-gene

In [10]:
lmm_dir=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_f2257_combined_exomes
postgwa_sbatch=../output/$(date +"%Y-%m-%d")_f2247_f2257_postgwa.sbatch
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_f2257_combined_exomes/010421_UKBB_f2247_f2257_136862ind_exomes_f2247_f2257.regenie.snp_stats.gz
tpl_file=../farnam.yml
postgwa_sos=~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb
job_size=1
hg=38

postgwa_args="""default
    --cwd $lmm_dir
    --sumstatsFile $sumstatsFile
    --hg $hg
    --job_size $job_size
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $postgwa_sos \
    --to-script $postgwa_sbatch \
    --args "$postgwa_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-01-28_f2247_f2257_postgwa.sbatch
INFO: Workflow farnam (ID=w311a9815d7e57ba3) is executed successfully with 1 completed step.


### Post-GWAS annotation ANNOVAR

In [None]:
lmm_dir=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_f2257_combined_exomes
postgwa_sbatch=../output/$(date +"%Y-%m-%d")_f2247_f2257_postgwa.sbatch
postgwa_sos=~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb
tpl_file=../farnam.yml
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_f2257_combined_exomes/010421_UKBB_f2247_f2257_136862ind_exomes_f2247_f2257.regenie.snp_stats.gz
hg=38
job_size=1
container_annovar=/home/dc2325/scratch60/annovar.sif
bimfiles=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bim`
bim_name=/home/dc2325/scratch60/output/ukb23155_chr1_chr22.bim
humandb=/gpfs/ysm/datasets/db/annovar/humandb

annovar_args="""annovar
    --cwd $lmm_dir
    --hg $hg
    --bimfiles $bimfiles
    --bim_name $bim_name
    --sumstatsFile $sumstatsFile
    --humandb $humandb
    --job_size $job_size
    --container_annovar $container_annovar
"""

# Pass the annovar arguments built above (not the snp-to-gene postgwa_args)
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $postgwa_sos \
    --to-script $postgwa_sbatch \
    --args "$annovar_args"

In [None]:
UKBB_PATH=/gpfs/gibbs/pi/dewan/data/UKBiobank
cwd=/home/dc2325/scratch60/output/bfile_annovar
sumstatsFile=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_f2257_combined_exomes_bfile/010421_UKBB_f2247_f2257_136862ind_exomes_f2247_f2257.regenie.snp_stats.gz
hg=38
job_size=1
container_annovar=$UKBB_PATH/annovar.sif
bimfiles=`echo $UKBB_PATH/genotype_files/ukb28374_exomedata/exome_data_OCT2020/ukb23155_c{1..22}_b0_v1.bim`
bim_name=/home/dc2325/scratch60/output/ukb23155_chr1_chr22.bim
humandb=/gpfs/ysm/datasets/db/annovar/humandb

sos run ~/project/UKBB_GWAS_dev/workflow/snptogene.ipynb annovar \
    --cwd $cwd \
    --sumstatsFile $sumstatsFile \
    --bim_name $bim_name \
    --hg $hg \
    --job_size $job_size \
    --humandb $humandb \
    --ukbb $UKBB_PATH \
    --container_annovar $container_annovar \
    -s build

# 5. Hudson plot

## f2247_f2257 combined vs Hdiff (Wells paper)

In [2]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/admin/Hudson_plot.ipynb
hudson_dir=$UKBB_PATH/results/hudson_plots/hearing_impairment/
hudson_sbatch=../output/$(date +"%Y-%m-%d")_f2247_f2257_vs_hdiff_hudson.sbatch
sumstats_1=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_f2257_combined/200828_UKBB_f2247_f2257_f2247_f2257.fastGWA.snp_stats.gz
sumstats_2=/home/dc2325/project/HI_UKBB/2019_Wells_sumstats/HD_EA_gwas_sumstats.txt.gz
toptitle="f2247_f2257_combined"
bottomtitle="Hdiff_Wells_GWAS"
highlight_p_top=0.0
highlight_p_bottom=0.0
pval_filter=5e-08
highlight_snp=/home/dc2325/project/HI_UKBB/2019_Wells_sumstats/hdiff_snps_wells
job_size=1
container_lmm=$UKBB_PATH/lmm.sif

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --highlight_snp $highlight_snp
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-10-26_f2247_f2257_vs_hdiff_hudson.sbatch
INFO: Workflow farnam (ID=f34e7c33d379875d) is executed successfully with 1 completed step.
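
The threshold `5e-08` used for `pval_filter` is the conventional genome-wide significance level. A minimal sketch of counting how many variants in a summary-statistics table fall below it, using `awk` on a small made-up table. The column layout (`SNP CHR BP P`) is an assumption, and the real `snp_stats.gz` files may be laid out differently:

```shell
pval_filter=5e-08

# Made-up summary statistics for illustration
toy_sumstats=$(mktemp)
cat > "$toy_sumstats" <<'EOF'
SNP CHR BP P
rs1 1 1000 3e-09
rs2 1 2000 0.04
rs3 2 500 4.9e-08
rs4 3 750 5e-08
EOF

# Skip the header, compare the P column numerically, count the hits
n_sig=$(awk -v thr="$pval_filter" 'NR > 1 && ($4 + 0) < (thr + 0) { n++ } END { print n + 0 }' "$toy_sumstats")

echo "variants with P < $pval_filter: $n_sig"
rm -f "$toy_sumstats"
```

For a gzipped file, the table can be streamed with `gzip -dc` into the same `awk` command, after adjusting the column number to match the real header.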



## f3393 vs Haid (Wells paper)

In [2]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/admin/Hudson_plot.ipynb
hudson_dir=$UKBB_PATH/results/hudson_plots/hearing_impairment/
hudson_sbatch=../output/$(date +"%Y-%m-%d")_f3393_vs_Haid_hudson.sbatch
sumstats_1=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f3393_hearing_aid/200828_UKBB_Hearing_aid_f3393_hearing_aid_cat.fastGWA.snp_stats.gz
sumstats_2=/home/dc2325/project/HI_UKBB/2019_Wells_sumstats/HAID_EA_gwas_sumstats.txt.gz
toptitle="f_3393_hearing_aid"
bottomtitle="Haid_Wells_GWAS"
highlight_p_top=0.0
highlight_p_bottom=0.0
pval_filter=5e-08
highlight_snp=/home/dc2325/project/HI_UKBB/2019_Wells_sumstats/haid_snps_wells
job_size=1
container_lmm=$UKBB_PATH/lmm.sif

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --highlight_snp $highlight_snp
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-10-22_f3393_vs_Haid_hudson.sbatch
INFO: Workflow farnam (ID=f48b56e3e091ef16) is executed successfully with 1 completed step.



## f.2247 exome and imputed data

In [7]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/admin/Hudson_plot.ipynb
hudson_dir=$UKBB_PATH/results/hudson_plots
hudson_sbatch=../output/$(date +"%Y-%m-%d")_f2247_imp_exome_hudson.sbatch
sumstats_1=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_hearing_difficulty/200828_UKBB_Hearing_difficulty_f2247_hearing_diff_new.fastGWA.snp_stats.gz
sumstats_2=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_hearing_difficulty_exomes/010421_UKBB_Hearing_difficulty_f2247_171970ind_exomes_hearing_diff_new.regenie.snp_stats.gz
toptitle="f2247_imputed"
bottomtitle="f2247_exome"
phenocol1="f2247_imputed"
phenocol2="f2247_exome"
highlight_p_top=5e-08
highlight_p_bottom=5e-08
pval_filter=5e-08
job_size=1
container_lmm=$UKBB_PATH/lmm.sif

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-02_f2247_imp_exome_hudson.sbatch
INFO: Workflow farnam (ID=w2d252da39eb028af) is executed successfully with 1 completed step.



## f.2257 exome and imputed data

In [8]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/admin/Hudson_plot.ipynb
hudson_dir=$UKBB_PATH/results/hudson_plots
hudson_sbatch=../output/$(date +"%Y-%m-%d")_f2257_imp_exome_hudson.sbatch
sumstats_1=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2257_hearing_background_noise/200828_UKBB_Hearing_background_noise_f2257_hearing_noise_cat.fastGWA.snp_stats.gz
sumstats_2=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2257_hearing_noise_exomes/010421_UKBB_Hearing_background_noise_f2257_175531ind_exomes_hearing_noise_cat.regenie.snp_stats.gz
toptitle="f2257_imputed"
bottomtitle="f2257_exome"
phenocol1="f2257_imputed"
phenocol2="f2257_exome"
highlight_p_top=5e-08
highlight_p_bottom=5e-08
pval_filter=5e-08
job_size=1
container_lmm=$UKBB_PATH/lmm.sif

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-02_f2257_imp_exome_hudson.sbatch
INFO: Workflow farnam (ID=w0c5959e07610ebcd) is executed successfully with 1 completed step.



## Combined phenotype

In [8]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/admin/Hudson_plot.ipynb
hudson_dir=$UKBB_PATH/results/hudson_plots
hudson_sbatch=../output/$(date +"%Y-%m-%d")_f2247_f2257_imp_exome_hudson.sbatch
sumstats_1=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f2247_f2257_combined/200828_UKBB_f2247_f2257_f2247_f2257.fastGWA.snp_stats.gz
sumstats_2=$UKBB_PATH/results/REGENIE_results/results_exome_data/f2247_f2257_combined_exomes/010421_UKBB_f2247_f2257_136862ind_exomes_f2247_f2257.regenie.snp_stats.gz
toptitle="Combined_f2247_f2257_imputed"
bottomtitle="Combined_f2247_f2257_exome"
phenocol1="f2247_f2257_imputed"
phenocol2="f2247_f2257_exome"
highlight_p_top=5e-08
highlight_p_bottom=5e-08
pval_filter=5e-08
job_size=1
container_lmm=$UKBB_PATH/lmm.sif

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-03_f2247_f2257_imp_exome_hudson.sbatch
INFO: Workflow farnam (ID=w093ec0ac945b39c0) is executed successfully with 1 completed step.



## f3393 exome and imputed data

In [3]:
tpl_file=../farnam.yml
hudson_sos=~/project/bioworkflows/admin/Hudson_plot.ipynb
hudson_dir=$UKBB_PATH/results/hudson_plots
hudson_sbatch=../output/$(date +"%Y-%m-%d")_f3393_imp_exome_hudson.sbatch
sumstats_1=$UKBB_PATH/results/FastGWA_results/results_imputed_data/f3393_hearing_aid/200828_UKBB_Hearing_aid_f3393_hearing_aid_cat.fastGWA.snp_stats.gz
sumstats_2=$UKBB_PATH/results/REGENIE_results/results_exome_data/f3393_hearing_aid_exomes/010421_UKBB_Hearing_aid_f3393_128254ind_exomes_hearing_aid_cat.regenie.snp_stats.gz
toptitle="f3393_imputed"
bottomtitle="f3393_exome"
phenocol1="f3393_imputed"
phenocol2="f3393_exome"
highlight_p_top=5e-08
highlight_p_bottom=5e-08
pval_filter=5e-08
job_size=1
container_lmm=$UKBB_PATH/lmm.sif

hudson_args="""hudson
    --cwd $hudson_dir
    --sumstats_1 $sumstats_1
    --sumstats_2 $sumstats_2
    --toptitle $toptitle
    --bottomtitle $bottomtitle
    --phenocol1 $phenocol1
    --phenocol2 $phenocol2
    --job_size $job_size
    --highlight_p_top $highlight_p_top
    --highlight_p_bottom $highlight_p_bottom
    --pval_filter $pval_filter
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $hudson_sos \
    --to-script $hudson_sbatch \
    --args "$hudson_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2021-02-03_f3393_imp_exome_hudson.sbatch
INFO: Workflow farnam (ID=w7da36fe5bdb68b67) is executed successfully with 1 completed step.



# 6. Fine mapping

## f.3393 hearing aid

In [1]:
tpl_file=../farnam.yml
finemap_sos=~/project/UKBB_GWAS_dev/SuSiE_RSS.ipynb
finemap_dir=$UKBB_PATH/results/fine_mapping/f3393_hearing_aid
finemap_sbatch=../output/$(date +"%Y-%m-%d")_f3393_hearing_aid_susie.sbatch
sumstatFile=$UKBB_PATH/results/region_extraction/f3393_hearing_aid/10_126783170_126813028/200828_UKBB_Hearing_aid_f3393_hearing_aid_cat.fastGWA.snp_stats_10_126783170_126813028.sumstats.gz
ldFile=$UKBB_PATH/results/region_extraction/f3393_hearing_aid/10_126783170_126813028/200828_UKBB_Hearing_aid_f3393_hearing_aid_cat.fastGWA.snp_stats_10_126783170_126813028.sample_ld.gz
N=230411
job_size=1
container_lmm=$UKBB_PATH/lmm.sif

finemap_args="""default
    --cwd $finemap_dir
    --sumstatFile $sumstatFile
    --ldFile $ldFile
    --N $N
    --job_size $job_size
    --container_lmm $container_lmm
"""
sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb farnam \
    --template-file $tpl_file \
    --workflow-file $finemap_sos \
    --to-script $finemap_sbatch \
    --args "$finemap_args"

INFO: Running farnam: Configuration for Yale `farnam` cluster
INFO: farnam is completed.
INFO: farnam output:   ../output/2020-10-15_f3393_hearing_aid_susie.sbatch
INFO: Workflow farnam (ID=fdf18b3eb0fe563d) is executed successfully with 1 completed step.
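
The region folder and file names in the cell above embed the locus as `10_126783170_126813028`. Assuming this label follows a `chromosome_start_end` pattern (inferred from the naming, not stated in the notebook), the sketch below splits such a label into its parts and reports the interval length. The function name `region_to_interval` is hypothetical and not part of any workflow used here.

```shell
# Hypothetical helper: split a chrom_start_end label into its fields
# and print "chr<chrom>:<start>-<end> <end minus start>".
region_to_interval() {
    region=$1
    chrom=${region%%_*}      # text before the first underscore
    rest=${region#*_}        # text after the first underscore
    start=${rest%%_*}
    end=${rest#*_}
    echo "chr${chrom}:${start}-${end} $((end - start))"
}

# Label taken from the region_extraction paths in the cell above
region_to_interval 10_126783170_126813028
```

This can serve as a quick sanity check that the summary statistics file and the LD file passed to `--sumstatFile` and `--ldFile` refer to the same region.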



# 7. Mendelian-like phenotype with UKBB 200K exome data (plink_geno_mind)

In [3]:
# Run using all controls for f3393 
lmm_dir_regenie=$lmm_exome_dir_regenie/mendelian_like_exomes200K_ctrl_na
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/mendelian_like_exomes200K_ctrl_na-regenie_$(date +"%Y-%m-%d").sbatch
phenoFile=$hearing_pheno_path/full_mendilian-like_pheno_file.tsv
covarFile=$hearing_pheno_path/full_mendilian-like_pheno_file.tsv
phenoCol=mendilian-like
covarCol=sex
#qCovarCol=age_final_aid
genoFile=`echo /mnt/mfs/statgen/UKBiobank/data/exome_files/project_VCF/plink_files/plink_geno_mind/ukb23156_c{1..22}.merged.filtered.bed`
# For regenie step 1, use the original genotype-array bed files of the expanded white sample
bfile=$UKBB_PATH/data/genotype_files/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

INFO: Running csg: Configuration for Columbia csg partition cluster
INFO: csg is completed.
INFO: csg output:   /home/dmc2245/project/UKBB_GWAS_dev/output/mendelian_like_exomes200K_ctrl_na-regenie_2021-07-30.sbatch
INFO: Workflow csg (ID=w31f7be3d104e7443) is executed successfully with 1 completed step.
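
The `genoFile` assignment above builds its list with shell brace expansion, which generates all 22 per-chromosome paths whether or not the files exist, so a missing chromosome file would only surface when the job runs. A minimal sketch of a pre-flight check follows. `list_missing` is a hypothetical helper, not part of the workflows, and it assumes the paths contain no whitespace.

```shell
# Hypothetical helper: print every path in a whitespace-separated list
# (such as $genoFile) that does not exist on disk.
list_missing() {
    # $1 is left unquoted in the loop on purpose so the shell splits it into paths
    for f in $1; do
        [ -e "$f" ] || echo "$f"
    done
}

# Possible use in the cell above, once genoFile and bfile are set:
#   list_missing "$genoFile $bfile"
```

An empty result means every listed file was found. Any printed path should be fixed before generating the sbatch script.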



## Regenie imputed data: Expanded white control NA (08/10/21 analysis)

#### Analysis for f2247 & f2257 (080421)


In [None]:
lmm_dir_regenie=$lmm_imp_dir_regenie/081021_Combined_f2247_f2257
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_Combined_f2247_f2257-regenie.sbatch
phenoFile=$hearing_pheno_path/080421_UKBB_Combined_f2247_f2257_expandedwhite_39049cases_98082ctrl
covarFile=$hearing_pheno_path/080421_UKBB_Combined_f2247_f2257_expandedwhite_39049cases_98082ctrl
phenoCol=f2247_f2257
covarCol=sex
qCovarCol=age
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

#### Analysis for Hearing_aid_f3393 (080421)

In [None]:
lmm_dir_regenie=$lmm_imp_dir_regenie/081021_Hearing_aid_f3393
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_Hearing_aid_f3393-regenie.sbatch
phenoFile=$hearing_pheno_path/080421_UKBB_Hearing_aid_f3393_expandedwhite_6305cases_98082ctrl
covarFile=$hearing_pheno_path/080421_UKBB_Hearing_aid_f3393_expandedwhite_6305cases_98082ctrl
phenoCol=f3393
covarCol=sex
qCovarCol=age
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

#### Hearing_difficulty_f2247

In [None]:
lmm_dir_regenie=$lmm_imp_dir_regenie/081021_Hearing_difficulty_f2247
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_Hearing_difficulty_f2247-regenie.sbatch
phenoFile=$hearing_pheno_path/080421_UKBB_Hearing_difficulty_f2247_expandedwhite_46237cases_98082ctrl
covarFile=$hearing_pheno_path/080421_UKBB_Hearing_difficulty_f2247_expandedwhite_46237cases_98082ctrl
phenoCol=f2247
covarCol=sex
qCovarCol=age
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

#### Hearing_noise_f2257

In [None]:
lmm_dir_regenie=$lmm_imp_dir_regenie/081021_Hearing_noise_f2257
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_Hearing_noise_f2257-regenie.sbatch
phenoFile=$hearing_pheno_path/080421_UKBB_Hearing_noise_f2257_expandedwhite_66656cases_98082ctrl
covarFile=$hearing_pheno_path/080421_UKBB_Hearing_noise_f2257_expandedwhite_66656cases_98082ctrl
phenoCol=f2257
covarCol=sex
qCovarCol=age
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"

#### Mendelian

In [None]:
lmm_dir_regenie=$lmm_imp_dir_regenie/081021_Mendelian
lmm_sbatch_regenie=$USER_PATH/UKBB_GWAS_dev/output/$(date +"%Y-%m-%d")_Mendelian-regenie.sbatch
phenoFile=$hearing_pheno_path/080421_UKBB_Mendelian_expandedwhite_1520cases_98082ctrl
covarFile=$hearing_pheno_path/080421_UKBB_Mendelian_expandedwhite_1520cases_98082ctrl
phenoCol=mendelian
covarCol=sex
qCovarCol=age
genoFile=`echo $UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb_imp_chr{1..22}_v3.bgen`
sampleFile=$UKBB_PATH/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample

#Use the original bed files for the genotype array on regenie step1
bfile=$UKBB_PATH/genotype_files/pleiotropy_geneticfiles/UKB_expandedwhite_qcgenotypefiles/UKB_expandedwhiteonly_phenotypeindepqc_410905indiv_528206snps_102720.bed

lmm_args="""regenie
    --cwd $lmm_dir_regenie 
    --bfile $bfile 
    --genoFile $genoFile
    --sampleFile $sampleFile
    --phenoFile $phenoFile 
    --formatFile $formatFile_regenie 
    --phenoCol $phenoCol
    --covarCol $covarCol  
    --qCovarCol $qCovarCol
    --bsize $bsize
    --lowmem_dir $lowmem_dir
    --trait $trait 
    --bgenMinMAF $bgenMinMAF 
    --bgenMinINFO $bgenMinINFO
    --maf_filter $maf_filter
    --geno_filter $geno_filter
    --hwe_filter $hwe_filter
    --mind_filter $mind_filter
    --minMAC $minMAC
    --job_size $lmm_job_size
    --ylim $ylim
    --reverse_log_p $reverse_log_p
    --numThreads $numThreads
    --container_lmm $container_lmm
    --container_marp $container_marp
"""

sos run ~/project/bioworkflows/admin/Get_Job_Script.ipynb csg \
    --template-file $tpl_file \
    --workflow-file $lmm_sos \
    --to-script $lmm_sbatch_regenie \
    --args "$lmm_args"
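
The five Regenie cells above differ only in the trait label, the phenotype column, and the case count. Every other setting is identical. As a sketch of how that repetition could be driven from a small table, the loop below rebuilds the per-trait values. The helper `pheno_file_for` and the fallback value for `hearing_pheno_path` are hypothetical. In the notebook each cell was written out separately.

```shell
# hearing_pheno_path is set earlier in the notebook; this fallback is a
# placeholder so the sketch can run on its own.
hearing_pheno_path=${hearing_pheno_path:-/path/to/hearing_impairment}

# Hypothetical helper: $1 = trait label, $2 = number of cases
pheno_file_for() {
    echo "$hearing_pheno_path/080421_UKBB_${1}_expandedwhite_${2}cases_98082ctrl"
}

# label:phenoCol:cases, copied from the five cells above
traits="Combined_f2247_f2257:f2247_f2257:39049
Hearing_aid_f3393:f3393:6305
Hearing_difficulty_f2247:f2247:46237
Hearing_noise_f2257:f2257:66656
Mendelian:mendelian:1520"

for entry in $traits; do
    label=${entry%%:*}
    rest=${entry#*:}
    phenoCol=${rest%%:*}
    cases=${rest#*:}
    phenoFile=$(pheno_file_for "$label" "$cases")
    echo "081021_${label}  phenoCol=$phenoCol  phenoFile=$phenoFile"
    # The lmm_args block and the sos run call from the cells above would go here.
done
```

Keeping the trait table in one place makes a mismatch between a label and its case count easier to spot than in five near-identical cells.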