# eMERGE Fagerström Test for Nicotine Dependence (FTND) GWAS
__Author:__ Jesse Marks

This document logs the steps taken to process the eMERGE data and perform the FTND GWAS. The FTND is a standard instrument for assessing the intensity of physical addiction to nicotine. For more information, see [this website](https://cde.drugabuse.gov/instrument/d7c0b0f5-b865-e4de-e040-bb89ad43202b).

The genotype data were imputed on the [Michigan Imputation Server](https://imputationserver.sph.umich.edu/index.html).

* We use the `FTNDboth_cat` variable, which combines former smokers who have a lifetime FTND (N=736) with current smokers who have a current FTND (N=78). Combining the two groups maximizes the sample size (N=814), which matters because the severe category is small.

## FTNDboth_cat variable description
| FTNDboth_cat | Frequency |
|--------------|-----------|
| 0            | 537       |
| 1            | 217       |
| 2            | 60        |

The total FTND score is binned into categories as follows: 0 = scores 0-3, 1 = scores 4-6, and 2 = scores 7 and above (the severe category).
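The binning rule can be sketched as a small shell function. This is only an illustration of the thresholds stated above, not the code that produced `FTNDboth_cat`; the name `ftnd_cat` and the `NA` code for invalid input are assumptions made for the example.

```shell
#!/bin/sh
# Map a total FTND score (0-10) to the three-level category.
# Thresholds follow the conversion described above: 0 = 0-3, 1 = 4-6, 2 = 7+.
# `ftnd_cat` is a hypothetical helper, not part of the original pipeline.
ftnd_cat() {
    score=${1%.0}                       # accept values written as 3.0
    case $score in
        ''|*[!0-9]*) echo NA; return ;; # empty or non-integer input (NA is an assumed code)
    esac
    if [ "$score" -gt 10 ]; then
        echo NA                         # outside the valid FTND range
    elif [ "$score" -le 3 ]; then
        echo 0
    elif [ "$score" -le 6 ]; then
        echo 1
    else
        echo 2
    fi
}

# Print the category at each boundary score
for s in 0 3 4 6 7 10; do
    echo "FTND=${s} -> category $(ftnd_cat "$s")"
done
```

The boundary scores 3, 6, and 7 are the ones worth checking, since an off-by-one there would move subjects between categories.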

## Convert Phenotype Data from Stata format to .csv file

In [2]:
### R console ###
library(haven)

# convert Stata data into comma separated
setwd('C:/Users/jmarks/Desktop/Projects/Nicotine/eMerge/phenotype/') # local machine
pheno = read_dta("FTND Phenotype.dta")
write.csv(pheno, file = "FTND_pheno.csv")
pheno[1:5,]

dbgap_subject_id,emergeid,px060701_smoking_cigarette_quant,v96,v97,v98,px070301_exposure_smoke_childhoo,v332,v333,px070301_exposure_smoke_adulthoo,...,wstFTND4,curFTND5,wstFTND5,curFTND6,wstFTND6,CurFTND,WstFTND,curFTND_cat,wstFTND_cat,FTNDboth_cat
197114,16214874,,,,,,,,,...,,,,,,,,,,
196041,16214875,,,,,1.0,99.0,99.0,1.0,...,,,,,,,,,,
199794,16214879,,,,,1.0,99.0,99.0,1.0,...,1.0,,0.0,,0.0,,3.0,,0.0,0.0
197593,16214881,,,,,2.0,,,2.0,...,,,,,,,,,,
199622,16214896,,,,,1.0,6.0,18.0,1.0,...,0.0,,0.0,,0.0,,1.0,,0.0,0.0


## Copy phenotype data to EC2

In [None]:
### local machine ###
cd /cygdrive/c/Users/jmarks/Desktop/Projects/Nicotine/eMerge/phenotype
scp -i ~/.ssh/gwas_rsa FTND_pheno.csv ec2-user@35.171.207.199:/shared/s3/emerge/data/phenotype
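After the copy, it may be worth confirming that the category counts in the transferred file match the frequency table above. The following is a rough sketch under several assumptions: `tabulate_last_col` is a hypothetical helper, `FTNDboth_cat` is taken to be the last column (as in the preview above), and no field is assumed to contain an embedded comma.

```shell
#!/bin/sh
# Count how often each value occurs in the last column of a CSV file.
# `tabulate_last_col` is a hypothetical helper; it assumes no field contains
# an embedded comma (a full CSV parser would be needed to handle that).
tabulate_last_col() {
    awk -F',' '
        NR == 1 { next }                 # skip the header row
        {
            value = $NF
            sub(/\r$/, "", value)        # strip a carriage return from Windows line endings
            gsub(/"/, "", value)         # drop quotes added by write.csv
            if (value == "" || value == "NA") value = "missing"
            count[value]++
        }
        END { for (v in count) print v, count[v] }
    ' "$1" | sort
}
```

On the EC2 instance this could be called as `tabulate_last_col /shared/s3/emerge/data/phenotype/FTND_pheno.csv`. Subjects without an FTND score are reported under `missing`.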

## Inflate imputation results

In [None]:
### EC2 console ###
cd /shared/s3/emerge/data/genotype/imputed

# inflate chr results
for f in {1..22};do
echo '#!/bin/bash' > chr_$f.sh
echo '' >> chr_$f.sh
echo 'unzip -P "ScSu1byrJL49kO" chr_'$f'.zip' >> chr_$f.sh
done

for chr in {1..22}; do
sh /shared/bioinformatics/software/scripts/qsub_job.sh \
--job_name inflate_chr${chr} \
--script_prefix test/chr${chr}_results \
--mem 5 \
--priority 0 \
--program bash chr_${chr}.sh
done
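Once the jobs finish, a quick check can flag chromosomes that did not inflate. The sketch below is an assumption-laden example: `check_inflated` is a hypothetical helper, and the `chrN.dose.vcf.gz` / `chrN.info.gz` file names are inferred from the file patterns used in the S3 transfer section of this log rather than confirmed from the archives themselves.

```shell
#!/bin/sh
# List autosomes whose inflated imputation files are missing or empty.
# `check_inflated` is a hypothetical helper; the chrN.dose.vcf.gz and
# chrN.info.gz names are an assumption based on patterns used later in this log.
check_inflated() {
    dir=$1
    missing=0
    chr=1
    while [ "$chr" -le 22 ]; do
        for suffix in dose.vcf.gz info.gz; do
            # -s is true only if the file exists and is non-empty
            if [ ! -s "${dir}/chr${chr}.${suffix}" ]; then
                echo "missing: chr${chr}.${suffix}"
                missing=$((missing + 1))
            fi
        done
        chr=$((chr + 1))
    done
    echo "total missing: ${missing}"
    [ "$missing" -eq 0 ]                 # non-zero exit status if anything is missing
}
```

For example, `check_inflated /shared/s3/emerge/data/genotype/imputed` would print one line per missing file and return a non-zero status if any are absent.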

In [6]:
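### R console ###
pheno[1:10,]   # first 10 subjects
length(pheno)  # number of columns (variables) in the data frame
names(pheno)   # variable names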

dbgap_subject_id,emergeid,px060701_smoking_cigarette_quant,v96,v97,v98,px070301_exposure_smoke_childhoo,v332,v333,px070301_exposure_smoke_adulthoo,...,wstFTND4,curFTND5,wstFTND5,curFTND6,wstFTND6,CurFTND,WstFTND,curFTND_cat,wstFTND_cat,FTNDboth_cat
197114,16214874,,,,,,,,,...,,,,,,,,,,
196041,16214875,,,,,1.0,99.0,99.0,1.0,...,,,,,,,,,,
199794,16214879,,,,,1.0,99.0,99.0,1.0,...,1.0,,0.0,,0.0,,3.0,,0.0,0.0
197593,16214881,,,,,2.0,,,2.0,...,,,,,,,,,,
199622,16214896,,,,,1.0,6.0,18.0,1.0,...,0.0,,0.0,,0.0,,1.0,,0.0,0.0
198387,16214899,,,,,,,,1.0,...,,,,,,,,,,
196059,16214926,,,,,,,,1.0,...,,,,,,,,,,
198933,16214927,,,,,,,,,...,,,,,,,,,,
196928,16214948,,,,,1.0,99.0,19.0,1.0,...,1.0,,,,0.0,,,,,
196629,16215019,,0.0,,,,,,,...,,,,,,,,,,


## S3 data transfer

In [None]:
# Copy phenotype data
cd /shared/s3/emerge/data/phenotype
aws s3 cp ./ s3://rti-nd/eMERGE/emerge_ftnd/data/phenotype \
    --recursive --exclude="*" --include="*ped.gz" --quiet &

# Copy association test results
cd /shared/s3/emerge_ftnd/data/assoc_tests
aws s3 cp ./ s3://rti-nd/eMERGE/emerge_ftnd/results/rvtest/ \
    --recursive --exclude="*" --include="*MetaScore*gz*" --quiet &
aws s3 cp ./ s3://rti-nd/eMERGE/emerge_ftnd/results/figures/ \
    --recursive --exclude="*" --include="*.png.gz" --quiet &



ancestry=ea
# copy imputation files
cd /shared/sandbox/emerge_ftnd/genotype/imputed/
aws s3 cp ./ s3://rti-nd/eMERGE/emerge_ftnd/data/genotype/imputed/${ancestry}/ \
    --recursive --exclude "*" --include "*dose.vcf.gz" --quiet &
aws s3 cp ./ s3://rti-nd/eMERGE/emerge_ftnd/data/genotype/imputed/${ancestry}/ \
    --recursive --exclude "*" --include "*dose.vcf.gz.tbi" --quiet &
aws s3 cp ./ s3://rti-nd/eMERGE/emerge_ftnd/data/genotype/imputed/${ancestry}/ \
    --recursive --exclude "*" --include "*.info.gz" --quiet &
aws s3 cp snp_stats/ s3://rti-nd/eMERGE/emerge_ftnd/data/genotype/imputed/${ancestry}/ \
    --recursive --exclude "*" --include "*.txt" --quiet &
aws s3 cp qc_report/ s3://rti-nd/eMERGE/emerge_ftnd/data/genotype/imputed/${ancestry}/ \
    --recursive --exclude "*" --include "*.html" --quiet &
aws s3 cp logs/ s3://rti-nd/eMERGE/emerge_ftnd/data/genotype/imputed/${ancestry}/ \
    --recursive --exclude "*" --include "*.log" --quiet &
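Because every transfer above runs in the background with `--quiet`, a failed copy is easy to miss. One possible pattern is to record each job's process ID and `wait` on it, as sketched below. The names `upload_patterns` and `UPLOAD_CMD` are hypothetical; `UPLOAD_CMD` defaults to `echo`, so the block only prints the commands (a dry run) unless it is set to `aws s3 cp`.

```shell
#!/bin/sh
# Launch one background job per include pattern, then wait for each and count failures.
# UPLOAD_CMD is a hypothetical hook: it defaults to `echo` (a dry run that only
# prints the arguments); set UPLOAD_CMD="aws s3 cp" to perform real transfers.
UPLOAD_CMD=${UPLOAD_CMD:-echo}
dest="s3://rti-nd/eMERGE/emerge_ftnd/data/genotype/imputed/ea/"

upload_patterns() {
    pids=""
    for pattern in "$@"; do
        $UPLOAD_CMD ./ "$dest" --recursive --exclude "*" --include "$pattern" --quiet &
        pids="$pids $!"                  # remember the background job's process ID
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))   # wait returns that job's exit status
    done
    echo "failed transfers: ${failed}"
    [ "$failed" -eq 0 ]
}

upload_patterns "*dose.vcf.gz" "*dose.vcf.gz.tbi" "*.info.gz"
```

Waiting on each process ID individually, rather than calling a bare `wait`, is what makes the per-job exit status available.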