# HHS2 Preeclampsia

```
Hi Jesse,
 
I have uploaded all the DNAm and phenotype data to S3 at s3://rti-hhs2-preeclampsia/data/post_qc/. You may need to merge the two phenotype files “s3://rti-hhs2-preeclampsia/data/post_qc/phenotype/phenotype_with_cell_type_estimate_svs.csv” and “s3://rti-hhs2-preeclampsia/data/post_qc/phenotype/phenotype.txt” using the PublicID and the prefix of Sample_ID.
 
Please note that they want to run the EWAS using m-values.  I have created M values separated by chromosome there. Here is the list of analyses we want to run:
 
DNAm (M-values) ~ pss1 + cell type proportions + SVs (1-7) ; using linear regression
CaseControl ~ DNAm (M-values) + cell type proportions + SVs (1-7) ; using logistic regression
 
If time allows, then run
DNAm (M-values) ~ pss1 + cell type proportions + SVs (1-7) + crace + crace*pss1; using linear regression
CaseControl ~ DNAm (M-values) + cell type proportions + SVs (1-7) + crace + crace*DNAm; using logistic regression
``` 

## Phenotype
Inspect both phenotype files, then merge them on `PublicID`, which corresponds to the prefix of `Sample_ID` (the text before the first `-`).

In [None]:
cd ~/hhs2-preeclampsia/
aws s3 cp s3://rti-hhs2-preeclampsia/data/post_qc/phenotype/phenotype.txt .
head phenotype.txt
#CaseControl     crace   TechRep PTB     PE      SB      pOUTCOME_CA     GAwksCA PEgHTN  SGA_alex        GA      pss1    CHOL    TRIG    hsCRP   HDL     iliac_v1       bmi_v1  sbp_v1  dbp_v1  hip_v1  PublicID
#Case    1               0       1       0       1       37      2       3       11      11      169     162     0.37    76      85      20.741998016    114   70       93      16242I
#Case    1               0       1       0       1       37      3       1       9       10      172     75      0.13    77      96      19.77010294     110   74       96.5    05181N

aws s3 cp s3://rti-hhs2-preeclampsia/data/post_qc/phenotype/phenotype_with_cell_type_estimate_svs.csv .
head phenotype_with_cell_type_estimate_svs.csv
#Sample_Name.1,Sample_Name,Sample_Plate,Sample_Well,Basename,Pool_ID,Sentrix_ID,Sentrix_Position,Object_Identifier,Deepwell_Plate,Deepwell_Position,Deepwell_Row,Deepwell_Column,DNA_Source,Extraction_Method,Chip_Location,Population,MTA,SIF_Sex,Experiment_Name,Sample_ID,Sample_Type,filenames,Bcell,CD4T,CD8T,Mono,Neu,NK,PC1,PC2,PC3,PC4,PC5,PC6,PC7
#205707910081_R01C01,WG3006827-DNAA09-P2415-S1181_14960L-1-3544094585,WG0001431-MSA4,A09,205707910081_R01C01,NA,205707910081,R01C01,S1181_14960L-1-3544094585,WG3006827-DNA,A09,A,9,Whole Blood,qiasymphony,205707910081_R01C01,White,Pittsburgh,FALSE,G3516,14960L-1-3544094585,invest,205707910081_R01C01,0.0294848684187368,0.123319636942128,0.102935063193843,0.081028837478977,0.6483524902393,-1.77428424755414e-20,8360.47340131729,2449.94076744983,5840.93421832148,1409.06242614306,-3629.14933347429,70.7285336161499,-121.561348873756




In [None]:
# merge the phenotype data

pheno0 <- read.csv("phenotype.txt", sep="\t")
pheno1 <- read.csv("phenotype_with_cell_type_estimate_svs.csv")

dim(pheno0) # 396  22
dim(pheno1) # 395  36

head(pheno0[["PublicID"]])
#[1] "16242I" "05181N" "06454W" "15773F" "05661W" "02659B"

head(pheno1[["Sample_ID"]])
#[1] "14960L-1-3544094585" "01456U-1-3544094586" "06285T-1-3544094587" "16453S-1-3544094588" "05038S-1-3544094589" "09578M-1-3544094590"

# Sample_ID looks like "14960L-1-3544094585"; keep the part before the first "-"
split_id <- strsplit(pheno1[["Sample_ID"]], split="-", fixed=TRUE)
pheno1$PublicID <- vapply(split_id, `[`, character(1), 1) # matches PublicID in pheno0

# merge on PublicID (inner join: rows without a match in both files are dropped)
pheno_merged <- merge(pheno0, pheno1, by="PublicID")

dim(pheno_merged) # 391  58

outfile <- "phenotype_merged_with_cell_type_estimate_svs.tsv"
write.table(pheno_merged, file=outfile, quote=FALSE, row.names=FALSE, sep="\t")
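
The inputs have 396 and 395 rows but the merged table has 391, which suggests a few IDs have no partner in the other file. Below is a minimal shell sketch for listing them. It assumes both files are in the working directory with LF line endings, and that the CSV has no quoted fields containing commas (the naive comma split would break otherwise). The function names are illustrative and not part of any existing tooling.

```shell
# Print the unique PublicID values of a tab-separated file (column found by header name).
public_ids_from_tsv() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "PublicID") col = i; next }
        col     { print $col }
    ' "$1" | LC_ALL=C sort -u
}

# Print the unique Sample_ID prefixes (text before the first "-") of a comma-separated file.
sample_id_prefixes_from_csv() {
    awk -F',' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Sample_ID") col = i; next }
        col     { split($col, parts, "-"); print parts[1] }
    ' "$1" | LC_ALL=C sort -u
}

# List IDs that appear in only one file.
# Flush-left lines: only in the TSV. Tab-indented lines: only in the CSV.
unmatched_ids() {
    tsv_ids=$(mktemp) || return 1
    csv_ids=$(mktemp) || return 1
    public_ids_from_tsv "$1" > "$tsv_ids"
    sample_id_prefixes_from_csv "$2" > "$csv_ids"
    LC_ALL=C comm -3 "$tsv_ids" "$csv_ids"
    rm -f "$tsv_ids" "$csv_ids"
}
```

Calling `unmatched_ids phenotype.txt phenotype_with_cell_type_estimate_svs.csv` prints one ID per line; an empty result would mean every ID has a match and the row difference comes from something else, such as duplicates.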

In [None]:
# upload the merged phenotype file to S3
aws s3 cp phenotype_merged_with_cell_type_estimate_svs.tsv s3://rti-hhs2-preeclampsia/data/post_qc/phenotype/

## Methylation Data
Download the M-values for one chromosome and look at the first row.

In [None]:
aws s3 cp s3://rti-hhs2-preeclampsia/data/post_qc/dnam/mVals_chr22.rda .

# The remaining lines are R (run in an R session)
load("mVals_chr22.rda")
head(m_chr, 1)
#           206451050062_R01C01 206451050062_R02C01 206451050062_R03C01
#cg10218524          -0.2960794          -0.6274631          -0.1558079

## EWAS



In [None]:
main_dir=/home/ubuntu/hhs2-preeclampsia/ewas/biocloud_gwas_workflows/ewas_association_testing

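# Record the commit of the workflow repo used for this run
cd "$main_dir"
git rev-parse HEAD > git_hash.txt

# Zip the workflow directory from two levels up so archive paths start at biocloud_gwas_workflows/
cd ../../
zip -r biocloud_gwas_workflows/ewas_association_testing/biocloud_gwas_workflows.zip biocloud_gwas_workflows/ewas_association_testing/*

# Submit the workflow to the local workflow server
curl -X POST "http://localhost:8000/api/workflows/v1" -H "accept: application/json" \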
    -F "workflowSource=@${main_dir}/main.wdl" \
    -F "workflowInputs=@${main_dir}/inputs.json" \
    -F "workflowDependencies=@${main_dir}/biocloud_gwas_workflows.zip" \
    -F "workflowOptions=@${main_dir}/cannabis_charge_code.json"


#job=9a3bdb8b-6f9e-402f-9f5c-7d99c444a458 # mod1 failed
#job=e49a9157-e6f0-44cd-9695-219d265b9412 # mod1 failed
#job=291bda72-8a79-4247-979d-90c4bc409d8b # mod1 failed
#job=c0499925-bdab-4329-b9ae-3002e587b4b6  # mod1 failed
#job=57fe52ed-6711-450c-97f1-19d562c3f5a7 # mod1
job=2fe6fc3f-76a7-411f-83d9-a800cc0be9c3

curl -X GET "http://localhost:8000/api/workflows/v1/${job}/status" 
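
Rather than re-running the status call by hand, the check can be wrapped in a polling loop. This is a sketch under assumptions: the endpoint is taken to return JSON with a `"status"` field, and `Succeeded`, `Failed`, and `Aborted` are taken to be the terminal states (as in Cromwell's REST API, which this server appears to be). The helper names `parse_status`, `is_terminal`, and `wait_for_workflow` are hypothetical.

```shell
# Extract the value of the "status" field from a JSON response read on stdin.
parse_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed (exit 0) if the given status means the workflow has finished.
# Assumption: these three names are the terminal states.
is_terminal() {
    case "$1" in
        Succeeded|Failed|Aborted) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll the status endpoint until the workflow finishes.
# $1 = workflow ID, $2 = seconds between polls (default 60).
# Note: if the server is unreachable the status stays empty and the loop keeps polling.
wait_for_workflow() {
    job_id=$1
    interval=${2:-60}
    while :; do
        status=$(curl -s -X GET "http://localhost:8000/api/workflows/v1/${job_id}/status" | parse_status)
        echo "$(date '+%Y-%m-%d %H:%M:%S') ${job_id}: ${status:-unknown}"
        if is_terminal "$status"; then
            break
        fi
        sleep "$interval"
    done
    # Exit status reflects whether the workflow succeeded
    [ "$status" = "Succeeded" ]
}
```

With `job` set as above, `wait_for_workflow "$job" 120` would print a timestamped status line every two minutes and return non-zero if the workflow ends in anything other than `Succeeded`.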


In [None]:
pheno_file=phenotype_merged_with_cell_type_estimate_svs.rda
#dnam_file=mVals_chr9_noInf.rda
dnam_file=mVals_chr9.rda
sample_name=Basename
test_var=pss1
output_basename=test22
covariates='Bcell CD4T CD8T Mono Neu NK PC1 PC2 PC3 PC4 PC5 PC6 PC7'
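# Run the EWAS script directly on one chromosome (variables quoted to guard against word splitting)
Rscript /opt/ewas.R \
    --phenotype-file "${pheno_file}" \
    --dnam "${dnam_file}" \
    --sample-name "${sample_name}" \
    --test-var "${test_var}" \
    --covariates "${covariates}" \
    --output "${output_basename}"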
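
The call above covers a single chromosome. Since the M-values are stored one file per chromosome, the same command could be repeated over all of them. The sketch below assumes the files are named `mVals_chr<N>.rda` for autosomes 1 to 22 (only chr9 and chr22 appear in these notes) and that an output basename such as `pss1_chr9` is acceptable. `run_ewas_chr`, `run_all_autosomes`, and the `DRY_RUN` switch are hypothetical helpers, not part of the workflow.

```shell
pheno_file=phenotype_merged_with_cell_type_estimate_svs.rda
sample_name=Basename
test_var=pss1
covariates='Bcell CD4T CD8T Mono Neu NK PC1 PC2 PC3 PC4 PC5 PC6 PC7'

# Run the EWAS script for one chromosome.
# If DRY_RUN is set to a non-empty value, print the command instead of running it.
run_ewas_chr() {
    chr=$1
    ${DRY_RUN:+echo} Rscript /opt/ewas.R \
        --phenotype-file "$pheno_file" \
        --dnam "mVals_chr${chr}.rda" \
        --sample-name "$sample_name" \
        --test-var "$test_var" \
        --covariates "$covariates" \
        --output "${test_var}_chr${chr}"
}

# Run chromosomes 1..22 in order, stopping at the first failure.
run_all_autosomes() {
    n=1
    while [ "$n" -le 22 ]; do
        run_ewas_chr "$n" || return 1
        n=$((n + 1))
    done
}
```

Setting `DRY_RUN=1` before calling `run_all_autosomes` prints the 22 commands for review without executing anything.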