## Analysis of type 2 diabetes (T2D)

This notebook applies [various LMM workflows](https://dianacornejo.github.io/pleiotropy_UKB/workflow) to perform association analysis for T2D.

## File paths on Yale cluster

- Genotype files in PLINK format:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv`
- Genotype files in bgen format:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/`
- Summary stats for imputed variants BOLT-LMM:
`/SAY/dbgapstg/scratch/UKBiobank/results/BOLTLMM_results/results_imputed_data`
- Summary stats for imputed variants FastGWA:
`/SAY/dbgapstg/scratch/UKBiobank/results/FastGWA_results/results_imputed_data`
- Phenotype files:
`/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis`
- Relationship file:
`/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/unrelated_n307259/UKB_unrelatedcauc_phenotypes_asthmat2dbmiwaisthip_agesex_waisthipratio_040620`
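
The PLINK entry in the list above is a file prefix, not a single file: a PLINK binary fileset consists of `.bed`, `.bim` and `.fam` files sharing that prefix. A minimal sketch of a pre-flight check is shown below; `check_plink_prefix` is a hypothetical helper and is not part of the workflow.

```shell
#!/bin/bash
# Hypothetical helper: verify that all three members of a PLINK binary
# fileset (.bed, .bim, .fam) are readable for a given prefix.
check_plink_prefix() {
    local prefix=$1 missing=0 ext
    for ext in bed bim fam; do
        if [ ! -r "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"   # 0 if the fileset is complete, 1 otherwise
}

# Example call with the PLINK prefix listed above.
prefix=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv
if check_plink_prefix "$prefix"; then
    echo "PLINK fileset complete"
else
    echo "PLINK fileset incomplete"
fi
```

Running this before submitting a job may catch a mistyped prefix or an unmounted file system early, rather than after the job has waited in the queue.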

## File paths to specific phenotype files for T2D

These were the files used in the analysis before the full pipeline was implemented.

```
phenoFile=~/project/phenotypes_UKB/diabetes_casesbyICD10andselfreport_controlswithoutautoiummune_052820.phe
covarFile=~/project/phenotypes_UKB/diabetes_casesbyICD10andselfreport_controlswithoutautoiummune_052820_covSEX.txt
qcovarFile=~/project/phenotypes_UKB/diabetes_casesbyICD10andselfreport_controlswithoutautoiummune_052820_covAGE.txt
```
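
The two covariate paths above appear to share the phenotype file's stem, with `_covSEX.txt` and `_covAGE.txt` appended. Assuming that pattern holds (it is inferred from these three paths and is not a documented convention), the covariate paths could be derived from `phenoFile` instead of being typed out separately:

```shell
#!/bin/bash
# Assumption: the covariate files share the phenotype file's stem.
phenoFile=~/project/phenotypes_UKB/diabetes_casesbyICD10andselfreport_controlswithoutautoiummune_052820.phe

stem=${phenoFile%.phe}           # strip the .phe extension
covarFile=${stem}_covSEX.txt     # categorical covariate (sex)
qcovarFile=${stem}_covAGE.txt    # quantitative covariate (age)

printf '%s\n' "$phenoFile" "$covarFile" "$qcovarFile"
```

Deriving the names this way keeps the date stamp (`052820`) in one place, so the three paths cannot drift apart when the phenotype file is regenerated.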

In [None]:
%save T2D_fastGWA.sbatch -f

#!/bin/bash
#SBATCH --partition general
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1
#SBATCH --cpus-per-task 16
#SBATCH --mem-per-cpu 1G
#SBATCH --time 3-0:00:00
#SBATCH --job-name sos-submission
#SBATCH --output sos-submission-%J.out
#SBATCH --error sos-submission-%J.log

# Define bash variables for the input paths and analysis parameters.

cwd=~/scratch60/2020_04_fastGWA/T2D
bfile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/pleiotropy_geneticfiles/UKB_Caucasians_phenotypeindepqc120319_updated020720removedwithdrawnindiv.bed
sampleFile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
bgenFile=/SAY/dbgapstg/scratch/UKBiobank/genotype_files/ukb39554_imputeddataset/*.bgen
phenoFile=/SAY/dbgapstg/scratch/UKBiobank/phenotype_files/pleiotropy_R01/phenotypesforanalysis/diabetes_casesbyICD10andselfreport_controlswithoutautoiummune_030720
phenoCol=T2D
LDscoresFile=~/software/BOLT-LMM_v2.3.4/tables/LDSCORE.1000G_EUR.tab.gz
geneticMapFile=~/software/BOLT-LMM_v2.3.4/tables/genetic_map_hg19_withX.txt.gz
covarCol=SEX
covarMaxLevels=10
qCovarCol=AGE
numThreads=6
bgenMinMAF=0.001
bgenMinINFO=0.8
job_size=1

# Run the fastGWA workflow for the T2D trait, with sex and age as covariates.

sos run ~/project/pleiotropy_UKB/workflow/LMM.ipynb fastGWA \
    --cwd $cwd \
    --bfile $bfile \
    --sampleFile $sampleFile \
    --bgenFile $bgenFile \
    --phenoFile $phenoFile \
    --phenoCol $phenoCol \
    --covarCol $covarCol \
    --covarMaxLevels $covarMaxLevels \
    --qCovarCol $qCovarCol \
    --numThreads $numThreads \
    --bgenMinMAF $bgenMinMAF \
    --bgenMinINFO $bgenMinINFO \
    --job-size $job_size \
    -c ~/project/pleiotropy_UKB/farnam.yml -q farnam -J 100 \
    -s build &> 052820-sos-T2D-fastGWA.log
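
The workflow selects the phenotype and covariates by column name (`--phenoCol`, `--covarCol`, `--qCovarCol`), so checking the phenotype file's header before submission may catch a misnamed column early. The sketch below assumes the file starts with a whitespace-delimited header line. `has_column` is a hypothetical helper, and the demo file with its `FID`/`IID` columns is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the first line of a whitespace-delimited
# file contains a field exactly equal to the given column name.
has_column() {
    local file=$1 col=$2
    head -n 1 "$file" | tr -s ' \t' '\n' | grep -qx -- "$col"
}

# Demo on a made-up file; on the cluster, point this at "$phenoFile".
demo=$(mktemp)
printf 'FID IID T2D SEX AGE\n1 1 0 1 55\n' > "$demo"

for col in T2D SEX AGE; do
    if has_column "$demo" "$col"; then
        echo "found $col"
    else
        echo "missing $col"
    fi
done
rm -f "$demo"
```

Matching whole fields with `grep -x` avoids false positives from partial names, for example `T2` matching `T2D`.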