# UUI GWAS

Some collaborators reached out to Megan Carnes to ask if she wanted to include a previously published Urgency Urinary Incontinence (UUI) GWAS in a large analysis they are doing. All they need are the MAGMA results. Unfortunately, we cannot find the results of the previously published GWAS: the analyses were performed on the old MIDAS computing platform at RTI International, and the results appear to have been deleted. So we need to rerun these analyses: replicate the results from the 2015 paper, [Genetic Contributions to Urgency Urinary Incontinence in Women](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4439377/), then run MAGMA, and finally pass the results on to the collaborators.


**Data locations**: `s3://rti-common/dbGaP/phs000315_whi_garnet/`<br>
**charge code**: 0160470.000.044 (Grier Page Fellows Fund)<br>
**dbGaP**: [WHI GARNET](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000315.v8.p3&phv=173865&phd=&pha=&pht=2982&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1)



## Data description

### Imputed Genotype data
* ChildStudyConsentSet_phs000746.WHI.v3.p3.c1.HMB-IRB/Genotype/

These were imputed with an older imputation panel, I believe. For now we are going to focus on the observed genotype data, because we need to process the observed genotypes in order to perform the PCA that provides the genotype PCs used as covariates in the GWAS model.

### Observed Genotype
* ChildStudyConsentSet_phs000315.WHI.v8.p3.c1.HMB-IRB/GenotypeFiles/
* ChildStudyConsentSet_phs000315.WHI.v8.p3.c2.HMB-IRB/GenotypeFiles/

These are consent groups 1 and 2. We need to merge them and run the merged data through the genotype array QC workflow.

# Phenotype

## Create a map file
We only want the GARNET subset (phs000315) of the data. So we will create a mapping file and filter the phenotype files down with it.

In [None]:
cd PhenotypeFiles/dbGaP-30943/
mkdir processing/

tail -n +11 phs000200.v12.pht001032.v9.p3.WHI_Sample.MULTI.txt  | cut -f1-5,8 > processing/all_subjectid.txt
wc -l processing/all_subjectid.txt # 118701

cd processing/

head -1 all_subjectid.txt | cut -f4-5 > phs000315_subject_sampleid_map.txt
awk -F "\t" '$6=="phs000315.v8.p3" {print $4,$5}' OFS="\t" all_subjectid.txt  >> \
    phs000315_subject_sampleid_map.txt

# total number of samples in this map file
wc -l phs000315_subject_sampleid_map.txt #4984 

# number of positive controls in the map file (SUBJID starts with "NA")
# note: awk's substr() is 1-based, so the first two characters are substr($1, 1, 2)
awk 'substr($1, 1, 2) == "NA"' \
    phs000315_subject_sampleid_map.txt | wc -l  # 54
# 4,929 + 54 = 4,983 (plus a header making it 4,984)

# remove positive controls
awk 'substr($1, 1, 2) != "NA"' \
    phs000315_subject_sampleid_map.txt > phs000315_subject_sampleid_map_no_positive_controls.txt


# Example map entry. Notice that SAMPLE_ID matches the second column of the genotype FAM file.
head -1 phs000315_subject_sampleid_map.txt ;\
    grep 122129 phs000315_subject_sampleid_map_no_positive_controls.txt
# SUBJID  SAMPLE_ID
# 753703 122129
zcat GARNET_WHI_TOP_sample_level_c2.fam.gz  | head -1 # genotype FAM file
# 111106895 122129 0 0 0 -9
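
The spot check above looks at a single sample. The same comparison can be run for every sample in the map. This is a minimal sketch, assuming the map is tab-delimited with a header line and the FAM file has been decompressed. `count_missing_samples` is a hypothetical helper name and is not part of the original workflow.

```shell
# Hypothetical helper: count map SAMPLE_IDs that do not appear in a FAM file.
#   $1 = map file with a header line (SUBJID, SAMPLE_ID)
#   $2 = uncompressed PLINK FAM file (individual ID in column 2)
count_missing_samples() {
    awk 'NR == FNR { in_fam[$2] = 1; next }   # FAM file (read first): remember column 2
         FNR > 1 && !($2 in in_fam) { n++ }   # map file: skip header, count IDs not seen
         END { print n + 0 }' "$2" "$1"
}

# Example call (file names are placeholders):
# count_missing_samples phs000315_subject_sampleid_map_no_positive_controls.txt all_consent_groups.fam
```

Because the map covers both consent groups, a count of 0 would only be expected if the FAM files for c1 and c2 are concatenated before the check.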



## Investigate duplicates
The following files appear to contain more than one row for some subjects:
* phs000200.v12.pht001005.v6.p3.c1.f37_rel1.HMB-IRB.txt # incont+frqincon+cghincon c1
* phs000200.v12.pht001005.v6.p3.c2.f37_rel1.HMB-IRB-NPU.txt # incont+frqincon+cghincon c2
* phs000200.v12.pht001019.v6.p3.c1.f80_rel1.HMB-IRB.txt # bmix+bmicx c1
* phs000200.v12.pht001019.v6.p3.c2.f80_rel1.HMB-IRB-NPU.txt # bmix+bmicx c2

<br>

It was decided to keep the duplicate subjects and decide what to do with them later.

In [None]:
# see if any of the phenotype files contain duplicates (multiple samples from same subject)
for file in phs000*.txt; do
    echo "$file"
    tail -n +12 "$file" | cut -f2 | wc -l            # col2 is SUBJID: total rows
    tail -n +12 "$file" | cut -f2 | sort -u | wc -l  # unique values
done

phs000200.v12.pht000998.v6.p3.c1.f2_rel1.HMB-IRB.txt # age+race c1
  117675
  117675
phs000200.v12.pht000998.v6.p3.c2.f2_rel1.HMB-IRB-NPU.txt # age+race c2
   25538
   25538
phs000200.v12.pht001000.v7.p3.c1.f31_rel1.HMB-IRB.txt # parity c1
  117609
  117609
phs000200.v12.pht001000.v7.p3.c2.f31_rel1.HMB-IRB-NPU.txt # parity c2 
   25518
   25518
phs000200.v12.pht001005.v6.p3.c1.f37_rel1.HMB-IRB.txt # incont+frqincon+cghincon c1
  168638
  117671
phs000200.v12.pht001005.v6.p3.c2.f37_rel1.HMB-IRB-NPU.txt # incont+frqincon+cghincon c2
   26050
   25533
phs000200.v12.pht001019.v6.p3.c1.f80_rel1.HMB-IRB.txt # bmix+bmicx c1
  551936
  117675
phs000200.v12.pht001019.v6.p3.c2.f80_rel1.HMB-IRB-NPU.txt # bmix+bmicx c2
   63629
   25536
phs000200.v12.pht001032.v9.p3.WHI_Sample.MULTI.txt # study (all consent groups and substudies)
  118700
   59416
phs000200.v12.pht001514.v6.p3.c1.f134_rel1.HMB-IRB.txt # parkinsons+diabetes c1
  109964
  109964
phs000200.v12.pht001514.v6.p3.c2.f134_rel1.HMB-IRB-NPU.txt # parkinsons+diabetes c2
     621
     621
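
To see which subjects account for the extra rows, the same `tail`/`cut` pipeline can be extended to list the repeated IDs. This is a minimal sketch. `duplicated_subjects` is a hypothetical helper, and it assumes, as in the loop above, that data rows start on line 12 and that SUBJID is column 2.

```shell
# Hypothetical helper: print "SUBJID<TAB>row_count" for every subject with more than one row.
#   $1 = dbGaP phenotype file (header block in lines 1-11, SUBJID in column 2)
duplicated_subjects() {
    tail -n +12 "$1" | cut -f2 | sort | uniq -c |
        awk '$1 > 1 { print $2 "\t" $1 }'    # uniq -c prints the count first, then the ID
}

# Example call:
# duplicated_subjects phs000200.v12.pht001019.v6.p3.c1.f80_rel1.HMB-IRB.txt | head
```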

## Filter and merge phenotype files

| Data File Name* | Variable to keep | Variable Description                        | Note                                                     |
|-----------------|------------------|---------------------------------------------|----------------------------------------------------------|
| pht001032       | STUDY            | DbGaP top-level study or substudy accession | Filter to Value = E (GARNET STUDY phs000315)             |
| All             | SUBJID           | WHI dbGaP Subject ID                        | Used for file merges and should match genotype files     |
| pht000998       | AGE              | Age at screening                            |                                                          |
| pht001000       | PARITY           | Number of Term Pregnancies                  |                                                          |
| pht001019       | BMIX             | BMI                                         |                                                          |
| pht001019       | BMICX            | BMI Categorical                             |                                                          |
| pht000998       | RACE             | Racial or ethnic group                      |                                                          |
| pht001514       | F134PARKINS      | Parkinsons disease ever                     | We will drop where = 1 (yes). Expect low number (20-ish) |
| pht001005       | INCONT           | Ever leaked urine                           | Used to define case/control status                       |
| pht001005       | TOINCON          | Leak when can't get to toilet               | Used to define case status                               |
| pht001005       | FRQINCON         | How often leaked urine                      | Used to define case status                               |
| pht001005       | LEAKAMT          | How much urine do you lose                  | Used to define case status                               |
| pht000998       | DIAB             | Diabetes ever                               |                                                          |

**Cases (pht001005)**
* INCONT (ever leak) = Yes (1)
* TOINCON 
* FRQINCON (frequency) = 3, 4, or 5 -> more than once a month
* LEAKAMT

**Controls**
* INCONT (ever leak) = No (0)

<br>

We will use the [PLINK](https://www.cog-genomics.org/plink/1.9/formats#fam) standard for coding case/control status.<br>
Phenotype value: `1` = control, `2` = case, `-9`/`0`/non-numeric = missing data (for a case/control phenotype).
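
The PLINK coding can be sketched as follows. This is only an illustration, not the final case definition: the criteria for TOINCON and LEAKAMT are not stated above, so this sketch applies only the INCONT and FRQINCON rules. Two further assumptions: subjects who leak but do not meet the frequency threshold are coded as missing, and the input is a comma-separated file with the columns `SUBJID,INCONT,TOINCON,FRQINCON,LEAKAMT`. `code_uui_status` is a hypothetical helper name.

```shell
# Hypothetical helper: code each row as 1 (control), 2 (case), or -9 (missing).
#   $1 = CSV with header SUBJID,INCONT,TOINCON,FRQINCON,LEAKAMT
# Only INCONT and FRQINCON are used; the TOINCON and LEAKAMT criteria are not applied here.
code_uui_status() {
    awk -F',' '
        NR == 1 { print "SUBJID,UUI"; next }                    # write a new header
        {
            status = -9                                         # default: missing
            if ($2 == "0") status = 1                           # INCONT = 0 -> control
            else if ($2 == "1" && $4 ~ /^[345]$/) status = 2    # INCONT = 1, FRQINCON 3-5 -> case
            print $1 "," status
        }' "$1"
}
```

A file with several rows per subject gets one coded value per row, so repeated visits still need to be resolved separately.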




In [226]:
setwd("~/projects/uui/PhenotypeFiles/dbGaP-30943/")
list.files()

In [227]:
# load all data
subject_sampleids <- read.delim("processing/phs000315_subject_sampleid_map_no_positive_controls.txt",
                              header = T, 
                              sep = "\t")

head(subject_sampleids) 
length(subject_sampleids$SUBJID)


age_race_c1 <- read.delim("phs000200.v12.pht000998.v6.p3.c1.f2_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)
age_race_c2 <- read.delim("phs000200.v12.pht000998.v6.p3.c2.f2_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)
parity_c1 <- read.delim("phs000200.v12.pht001000.v7.p3.c1.f31_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)
parity_c2 <- read.delim("phs000200.v12.pht001000.v7.p3.c2.f31_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

bmi_c1 <- read.delim("phs000200.v12.pht001019.v6.p3.c1.f80_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

bmi_c2 <- read.delim("phs000200.v12.pht001019.v6.p3.c2.f80_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

parkinsons_c1 <- read.delim("phs000200.v12.pht001514.v6.p3.c1.f134_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

parkinsons_c2 <- read.delim("phs000200.v12.pht001514.v6.p3.c2.f134_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

diabetes_c1 <- read.delim("phs000200.v12.pht000998.v6.p3.c1.f2_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

diabetes_c2 <- read.delim("phs000200.v12.pht000998.v6.p3.c2.f2_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

case_control_c1 <- read.delim("phs000200.v12.pht001005.v6.p3.c1.f37_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

case_control_c2 <- read.delim("phs000200.v12.pht001005.v6.p3.c2.f37_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "\t",
                              skip = 10)

|   | SUBJID | SAMPLE_ID |
|---|--------|-----------|
|   | `<int>` | `<int>` |
| 1 | 729534 | 100034 |
| 2 | 716669 | 100046 |
| 3 | 719273 | 100134 |
| 4 | 857580 | 100146 |
| 5 | 777019 | 100155 |
| 6 | 725283 | 100210 |


## AGE & RACE

In [219]:
length(age_race_c1$SUBJID)
length(age_race_c2$SUBJID)

head(age_race_c1)

age_race_c1_matches <- which(age_race_c1$SUBJID %in% subject_sampleids$SUBJID)
age_race_c1_matches_df <- age_race_c1[age_race_c1_matches, c("SUBJID","AGE", "RACE")]


age_race_c2_matches <- which(age_race_c2$SUBJID %in% subject_sampleids$SUBJID)
age_race_c2_matches_df <- age_race_c2[age_race_c2_matches, c("SUBJID","AGE", "RACE")]

age_race_c1c2_merged <- rbind(age_race_c1_matches_df, age_race_c2_matches_df)
head(age_race_c1c2_merged)
length(age_race_c1c2_merged$SUBJID)

outfile <-  "processing/age_race_c1c2_merged.csv"
write.csv(age_race_c1c2_merged, outfile, quote=F, row.names=F)

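|   | dbGaP_Subject_ID | SUBJID | F2DAYS | AGE | AREA3Y | OTHSTDY | EXSTDY | BRCA_F2 | COLON_F2 | COLON10Y | ⋯ | AVAILDM | INTHRT | AVAILHRT | TALKDOC | HRTINFDR | HELPFILL | AGER | HORMSTAT | AGEHYST | DIABTRT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ⋯ | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 220079 | 700001 | -41 | 74 | 1 | 0 |  | 0 | 0 |  | ⋯ |  | 0 |  |  |  | 0 | 3 | 1 | 2.0 | 0 |
| 2 | 221745 | 700003 | -55 | 59 | 1 | 0 |  | 0 | 0 |  | ⋯ | 1.0 | 0 |  |  |  | 0 | 1 | 1 | 2.0 | 0 |
| 3 | 215143 | 700004 | -8 | 56 | 1 | 0 |  | 0 | 0 |  | ⋯ | 1.0 | 0 |  |  |  | 0 | 1 | 1 |  | 0 |
| 4 | 214904 | 700005 | -241 | 64 | 1 | 0 |  | 0 | 0 |  | ⋯ |  | 1 | 1.0 | 2.0 | 1.0 | 0 | 2 | 1 | 2.0 | 0 |
| 5 | 220352 | 700006 | -44 | 58 | 1 | 0 |  | 0 | 0 |  | ⋯ |  | 0 |  |  |  | 0 | 1 | 0 | 1.0 | 0 |
| 6 | 216549 | 700007 | -82 | 74 | 1 | 0 |  | 0 | 0 |  | ⋯ | 1.0 | 0 |  |  |  | 0 | 3 | 2 | 2.0 | 0 |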


|     | SUBJID | AGE | RACE |
|-----|--------|-----|------|
|     | `<int>` | `<int>` | `<int>` |
| 31  | 700032 | 57 | 4 |
| 75  | 700078 | 59 | 3 |
| 88  | 700091 | 63 | 4 |
| 102 | 700106 | 52 | 3 |
| 115 | 700122 | 57 | 4 |
| 194 | 700216 | 60 | 3 |


## Parity

In [220]:
length(parity_c1$SUBJID)
length(parity_c2$SUBJID)
head(parity_c1)

parity_c1_matches <- which(parity_c1$SUBJID %in% subject_sampleids$SUBJID)
parity_c1_matches_df <- parity_c1[parity_c1_matches, c("SUBJID", "PARITY")]
length(parity_c1_matches)

parity_c2_matches <- which(parity_c2$SUBJID %in% subject_sampleids$SUBJID)
parity_c2_matches_df <- parity_c2[parity_c2_matches, c("SUBJID","PARITY")]
length(parity_c2_matches)

parity_c1c2_merged <- rbind(parity_c1_matches_df, parity_c2_matches_df)
head(parity_c1c2_merged)

outfile <-  "processing/parity_c1c2_merged.csv"
write.csv(parity_c1c2_merged, outfile, quote=F, row.names=F)

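|   | dbGaP_Subject_ID | SUBJID | F31DAYS | MENARCHE | MENSREG | MENSREGA | MENOPSEA | MENSWO1Y | MENSWOD | ANYMENSA | ⋯ | BRSTREMO | GRAVID | PARITY | FULLTRMR | NUMLIVER | AGEFBIR | BOOPH | BRSTFDMO | BRSTDIS | MENO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ⋯ | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 220079 | 700001 | -30 | 5 | 2 | 8.0 | 47.0 | 0 |  | 47 | ⋯ |  | 3 | 5 | 1 | 3 | 2 | 0.0 | 3.0 | 0 | 47 |
| 2 | 221745 | 700003 | -7 | 4 | 1 |  | 47.0 | 0 |  | 47 | ⋯ |  | 2 | 1 | 1 | 1 | 2 |  |  | 0 | 48 |
| 3 | 215143 | 700004 | -2 | 5 | 1 | 5.0 |  | 0 |  | 52 | ⋯ |  | 2 | 2 | 1 | 2 | 2 | 0.0 | 0.0 | 2 | 52 |
| 4 | 214904 | 700005 | -16 | 7 | 2 | 8.0 | 43.0 | 0 |  | 43 | ⋯ |  | 2 | 1 | 1 | 1 | 3 | 1.0 | 0.0 | 0 | 42 |
| 5 | 220352 | 700006 | -22 | 4 | 1 |  | 38.0 | 0 |  | 38 | ⋯ |  | 2 | 2 | 1 | 2 | 2 | 0.0 | 0.0 | 0 | 50 |
| 6 | 216549 | 700007 | -13 | 5 | 1 | 5.0 | 42.0 | 0 |  | 42 | ⋯ |  | 2 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 42 |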


|     | SUBJID | PARITY |
|-----|--------|--------|
|     | `<int>` | `<int>` |
| 31  | 700032 |  |
| 75  | 700078 | 5.0 |
| 88  | 700091 | 5.0 |
| 102 | 700106 | 1.0 |
| 115 | 700122 | 1.0 |
| 194 | 700216 | 5.0 |


## BMI

In [225]:
length(bmi_c1$SUBJID)
length(bmi_c2$SUBJID)
head(bmi_c1)

bmi_c1_matches <- which(bmi_c1$SUBJID %in% subject_sampleids$SUBJID)
bmi_c1_matches_df <- bmi_c1[bmi_c1_matches, c("SUBJID", "BMIX", "BMICX", "F80VTYP", "F80VY", "F80DAYS")]
length(bmi_c1_matches)

bmi_c2_matches <- which(bmi_c2$SUBJID %in% subject_sampleids$SUBJID)
bmi_c2_matches_df <- bmi_c2[bmi_c2_matches, c("SUBJID", "BMIX", "BMICX", "F80VTYP", "F80VY", "F80DAYS")]
length(bmi_c2_matches)

bmi_c1c2_merged <- rbind(bmi_c1_matches_df, bmi_c2_matches_df)
head(bmi_c1c2_merged)

outfile <-  "processing/bmi_c1c2_merged.csv"
write.csv(bmi_c1c2_merged, outfile, quote=F, row.names=F)

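|   | dbGaP_Subject_ID | SUBJID | F80VTYP | F80VY | F80DAYS | PULSE30 | SYSTBP1 | DIASBP1 | SYSTBP2 | DIASBP2 | ⋯ | WAISTX | HIPX | WHEXPECT | SYST | SYSTOL | DIAS | DIASTOL | BMIX | BMICX | WHRX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ⋯ | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` |
| 1 | 220079 | 700001 | 1 | 0 | -30 | 38 | 130 |  | 130 | 66 | ⋯ | 92.5 | 102.5 | 1 | 130 | 2 | 66 | 1 | 33.17012 | 4 | 0.90244 |
| 2 | 220079 | 700001 | 3 | 3 | 1001 | 36 | 138 | 86.0 | 140 | 86 | ⋯ | 91.0 | 103.5 | 1 | 139 | 2 | 86 | 1 | 31.63451 | 4 | 0.87923 |
| 3 | 221745 | 700003 | 1 | 0 | -7 | 32 | 136 | 90.0 | 136 | 94 | ⋯ | 71.0 | 100.0 | 1 | 136 | 2 | 92 | 2 | 22.94974 | 2 | 0.71 |
| 4 | 215143 | 700004 | 1 | 0 | -2 | 30 | 120 | 80.0 | 122 | 78 | ⋯ | 72.0 | 99.0 | 1 | 121 | 2 | 79 | 1 | 21.29872 | 2 | 0.72727 |
| 5 | 215143 | 700004 | 3 | 1 | 354 | 30 | 110 | 70.0 | 104 | 70 | ⋯ | 71.0 | 97.0 | 1 | 107 | 1 | 70 | 1 | 22.00399 | 2 | 0.73196 |
| 6 | 215143 | 700004 | 3 | 2 | 721 | 30 | 126 | 76.0 | 118 | 74 | ⋯ | 69.0 | 94.0 | 0 | 122 | 2 | 75 | 1 | 20.02002 | 2 | 0.73404 |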


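|     | SUBJID | BMIX | BMICX | F80VTYP | F80VY | F80DAYS |
|-----|--------|------|-------|---------|-------|---------|
|     | `<int>` | `<dbl>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 140 | 700032 | 32.55208 | 4 | 1 | 0 | -40 |
| 141 | 700032 | 30.89018 | 4 | 3 | 1 | 393 |
| 142 | 700032 | 32.03896 | 4 | 3 | 2 | 720 |
| 143 | 700032 | 31.8659 | 4 | 3 | 3 | 1093 |
| 144 | 700032 | 31.86117 | 4 | 3 | 4 | 1471 |
| 145 | 700032 | 31.6459 | 4 | 3 | 5 | 1835 |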


## Parkinsons

In [246]:
length(parkinsons_c1$SUBJID)
length(parkinsons_c2$SUBJID)
head(parkinsons_c1)

parkinsons_c1_matches <- which(parkinsons_c1$SUBJID %in% subject_sampleids$SUBJID)
parkinsons_c1_matches_df <- parkinsons_c1[parkinsons_c1_matches, c("SUBJID","F134PARKINS")]
parkinsons_c1_matches_df_all_col <- parkinsons_c1[parkinsons_c1_matches, ]
length(parkinsons_c1_matches)

parkinsons_c2_matches <- which(parkinsons_c2$SUBJID %in% subject_sampleids$SUBJID)
parkinsons_c2_matches_df <- parkinsons_c2[parkinsons_c2_matches, c("SUBJID","F134PARKINS")]
parkinsons_c2_matches_df_all_col <- parkinsons_c2[parkinsons_c2_matches, ]
length(parkinsons_c2_matches)

parkinsons_c1c2_merged <- rbind(parkinsons_c1_matches_df, parkinsons_c2_matches_df)
parkinsons_c1c2_merged_all_col <- rbind(parkinsons_c1_matches_df_all_col, parkinsons_c2_matches_df_all_col)
head(parkinsons_c1c2_merged)


outfile <-  "processing/parkinsons_c1c2_merged.csv"
outfile2 <-  "processing/parkinsons_c1c2_merged_all_col.csv"
write.csv(parkinsons_c1c2_merged, outfile, quote=F, row.names=F)
write.csv(parkinsons_c1c2_merged_all_col, outfile2, quote=F, row.names=F)

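|   | dbGaP_Subject_ID | SUBJID | F134VTYP | F134VY | F134DAYS | F134WHOM | F134PARKINS | F134DIAB |
|---|---|---|---|---|---|---|---|---|
|   | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 220079 | 700001 | 3 | 7 | 2518 | 1 | 0 | 0 |
| 2 | 221745 | 700003 | 3 | 9 | 3263 | 1 | 0 | 0 |
| 3 | 215143 | 700004 | 3 | 10 | 3576 | 1 | 0 | 0 |
| 4 | 214904 | 700005 | 3 | 8 | 2839 | 1 | 0 | 0 |
| 5 | 220352 | 700006 | 3 | 8 | 2958 | 1 | 0 | 0 |
| 6 | 222081 | 700008 | 3 | 10 | 3688 | 1 | 0 | 0 |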


|     | SUBJID | F134PARKINS |
|-----|--------|-------------|
|     | `<int>` | `<int>` |
| 30  | 700032 | 0 |
| 70  | 700078 | 0 |
| 94  | 700106 | 0 |
| 107 | 700122 | 0 |
| 179 | 700216 | 0 |
| 184 | 700222 | 0 |


## Diabetes

In [245]:
length(diabetes_c1$SUBJID)
length(diabetes_c2$SUBJID)
head(diabetes_c1)

diab_c1_matches <- which(diabetes_c1$SUBJID %in% subject_sampleids$SUBJID)
diab_c1_matches_df <- diabetes_c1[diab_c1_matches, c("SUBJID","DIAB")]
diab_c1_matches_df_all_col <- diabetes_c1[diab_c1_matches, ]
length(diab_c1_matches)

diab_c2_matches <- which(diabetes_c2$SUBJID %in% subject_sampleids$SUBJID)
diab_c2_matches_df <- diabetes_c2[diab_c2_matches, c("SUBJID","DIAB")]
diab_c2_matches_df_all_col <- diabetes_c2[diab_c2_matches, ]
length(diab_c2_matches)

diab_c1c2_merged <- rbind(diab_c1_matches_df, diab_c2_matches_df)
diab_c1c2_merged_all_col <- rbind(diab_c1_matches_df_all_col, diab_c2_matches_df_all_col)
head(diab_c1c2_merged)


outfile <-  "processing/diabetes_c1c2_merged.csv"
outfile2 <-  "processing/diabetes_c1c2_merged_all_col.csv"
write.csv(diab_c1c2_merged, outfile, quote=F, row.names=F)
write.csv(diab_c1c2_merged_all_col, outfile2, quote=F, row.names=F)

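|   | dbGaP_Subject_ID | SUBJID | F2DAYS | AGE | AREA3Y | OTHSTDY | EXSTDY | BRCA_F2 | COLON_F2 | COLON10Y | ⋯ | AVAILDM | INTHRT | AVAILHRT | TALKDOC | HRTINFDR | HELPFILL | AGER | HORMSTAT | AGEHYST | DIABTRT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ⋯ | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 220079 | 700001 | -41 | 74 | 1 | 0 |  | 0 | 0 |  | ⋯ |  | 0 |  |  |  | 0 | 3 | 1 | 2.0 | 0 |
| 2 | 221745 | 700003 | -55 | 59 | 1 | 0 |  | 0 | 0 |  | ⋯ | 1.0 | 0 |  |  |  | 0 | 1 | 1 | 2.0 | 0 |
| 3 | 215143 | 700004 | -8 | 56 | 1 | 0 |  | 0 | 0 |  | ⋯ | 1.0 | 0 |  |  |  | 0 | 1 | 1 |  | 0 |
| 4 | 214904 | 700005 | -241 | 64 | 1 | 0 |  | 0 | 0 |  | ⋯ |  | 1 | 1.0 | 2.0 | 1.0 | 0 | 2 | 1 | 2.0 | 0 |
| 5 | 220352 | 700006 | -44 | 58 | 1 | 0 |  | 0 | 0 |  | ⋯ |  | 0 |  |  |  | 0 | 1 | 0 | 1.0 | 0 |
| 6 | 216549 | 700007 | -82 | 74 | 1 | 0 |  | 0 | 0 |  | ⋯ | 1.0 | 0 |  |  |  | 0 | 3 | 2 | 2.0 | 0 |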


|     | SUBJID | DIAB |
|-----|--------|------|
|     | `<int>` | `<int>` |
| 31  | 700032 | 0 |
| 75  | 700078 | 0 |
| 88  | 700091 | 0 |
| 102 | 700106 | 0 |
| 115 | 700122 | 0 |
| 194 | 700216 | 0 |


## Case Control UUI
* INCONT
* TOINCON
* FRQINCON
* LEAKAMT

In [244]:
length(case_control_c1$SUBJID)
length(case_control_c2$SUBJID)
head(case_control_c1)

case_control_c1_matches <- which(case_control_c1$SUBJID %in% subject_sampleids$SUBJID)
case_control_c1_matches_df <- case_control_c1[case_control_c1_matches, c("SUBJID", "INCONT", "TOINCON", "FRQINCON", "LEAKAMT")]
case_control_c1_matches_df_all_col <- case_control_c1[case_control_c1_matches, ]

length(case_control_c1_matches_df$SUBJID)

case_control_c2_matches <- which(case_control_c2$SUBJID %in% subject_sampleids$SUBJID)
case_control_c2_matches_df <- case_control_c2[case_control_c2_matches, c("SUBJID", "INCONT", "TOINCON", "FRQINCON", "LEAKAMT")]
case_control_c2_matches_df_all_col <- case_control_c2[case_control_c2_matches, ]
length(case_control_c2_matches_df$SUBJID)

case_control_c1c2_merged <- rbind(case_control_c1_matches_df, case_control_c2_matches_df)
case_control_c1c2_merged_all_col <- rbind(case_control_c1_matches_df_all_col, case_control_c2_matches_df_all_col)
head(case_control_c1c2_merged)
length(case_control_c1c2_merged$SUBJID)

outfile <-  "processing/case_control_c1c2_merged.csv"
outfile2 <-  "processing/case_control_c1c2_merged_all_col.csv"
write.csv(case_control_c1c2_merged, outfile, quote=F, row.names=F)
write.csv(case_control_c1c2_merged_all_col, outfile2, quote=F, row.names=F)

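|   | dbGaP_Subject_ID | SUBJID | F37VTYP | F37VY | F37DAYS | LISTEN | GOODADVC | TAKEDR | GOODTIME | HLPPROB | ⋯ | OPTIMISM | PAIN | PHYLIMIT | PHYSFUN | PSHTDEP | SLPDSTRB | SOCFUNC | SOCSTRN | SOCSUPP | SYMPTOM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | ⋯ | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` |
| 1 | 220079 | 700001 | 1 | 0 | -30 | 4 | 4 | 5 | 4 | 5 | ⋯ |  | 50.0 | 75 | 70 | 0.00173 | 11.0 | 100.0 | 10 | 41 |  |
| 2 | 221745 | 700003 | 1 | 0 | -7 | 4 | 4 | 4 | 4 | 4 | ⋯ | 20.0 | 100.0 | 100 | 100 | 0.00144 | 1.0 | 100.0 | 9 | 34 | 0.08824 |
| 3 | 215143 | 700004 | 1 | 0 | -2 | 5 | 4 | 3 | 3 | 3 | ⋯ | 22.0 | 75.0 | 100 | 80 | 0.00132 | 9.0 | 100.0 | 10 | 30 | 0.38235 |
| 4 | 215143 | 700004 | 2 | 9 | 3102 | 3 | 3 | 2 | 3 | 3 | ⋯ | 19.0 | 87.5 | 100 | 95 | 0.00173 | 7.0 | 100.0 | 6 | 26 | 0.26471 |
| 5 | 214904 | 700005 | 1 | 0 | -16 | 3 | 3 | 1 | 3 | 1 | ⋯ | 20.0 | 75.0 | 100 | 95 | 0.00084 |  | 87.5 | 19 | 18 | 0.64706 |
| 6 | 214904 | 700005 | 2 | 7 | 2465 | 3 | 2 | 1 | 3 | 2 | ⋯ | 25.0 | 37.5 | 0 | 50 | 0.00144 | 4.0 | 100.0 | 9 | 19 |  |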


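|     | SUBJID | INCONT | TOINCON | FRQINCON | LEAKAMT |
|-----|--------|--------|---------|----------|---------|
|     | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 45  | 700032 | 0 |  |  |  |
| 46  | 700032 | 0 |  |  |  |
| 115 | 700078 | 1 | 1.0 | 2.0 | 2.0 |
| 116 | 700078 | 1 | 1.0 | 5.0 | 3.0 |
| 135 | 700091 | 0 |  |  |  |
| 136 | 700091 | 1 | 1.0 | 4.0 | 2.0 |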


## Merge all dataframes

In [224]:
#cbind(age_race_c1c2_merged, case_control_c1c2_merged)
#merge(age_race_c1c2_merged, case_control_c1c2_merged, by = "SUBJID")
#case_control_c1c2_merged
#parkinsons_diab_c1c2_merged
#bmi_c1c2_merged
#parity_c1c2_merged
#age_race_c1c2_merged
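
The merge itself is not implemented in the cell above. As a command-line illustration of joining the per-variable CSV files on SUBJID, here is a minimal sketch. `left_join_csv` is a hypothetical helper, and it assumes that both files have a header whose first column is `SUBJID`, that values contain no embedded commas, and that the second file has at most one row per subject. Files with repeated visits, such as the BMI and case/control files, would have to be reduced to one row per subject first.

```shell
# Hypothetical helper: left-join two CSV files on their first column (SUBJID).
#   $1 = left file (every row is kept)
#   $2 = right file (at most one row per SUBJID; unmatched rows are filled with NA)
left_join_csv() {
    awk -F',' '
        function na_fill(n,    i, s) {       # "NA" repeated n times, comma-separated
            s = "NA"
            for (i = 2; i <= n; i++) s = s ",NA"
            return s
        }
        NR == FNR {                          # right file is read first: index it by SUBJID
            key = $1
            n_extra = NF - 1                 # number of non-key columns
            sub(/^[^,]*,/, "")               # drop the key column, keep the rest of the line
            rest[key] = $0
            next
        }
        { print $0 "," (($1 in rest) ? rest[$1] : na_fill(n_extra)) }
    ' "$2" "$1"
}

# Example call:
# left_join_csv processing/age_race_c1c2_merged.csv processing/parity_c1c2_merged.csv
```

The header line is joined like any other row because both headers share the key `SUBJID`.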

# Genotype QC

Update to build 37 from build 36.

The head of the array manifest file shows that it is in build 36 (see the `GenomeBuild` column). We also spot-checked several variants. We don't know if these data are 
```
head HumanOmni1-Quad_v1-0_B.csv

Illumina, Inc.,,,,,,,,,,,,,,,,,,,
[Heading],,,,,,,,,,,,,,,,,,,,
Descriptor File Name,HumanOmni1-Quad_v1-0_B.bpm,,,,,,,,,,,,,,,,,,,
Assay Format,Infinium HD Super,,,,,,,,,,,,,,,,,,,
Date Manufactured,6/15/2009,,,,,,,,,,,,,,,,,,,
Loci Count ,1140419,,,,,,,,,,,,,,,,,,,
[Assay],,,,,,,,,,,,,,,,,,,,
IlmnID,Name,IlmnStrand,SNP,AddressA_ID,AlleleA_ProbeSeq,AddressB_ID,AlleleB_ProbeSeq,GenomeBuild,Chr,MapInfo,Ploidy,Species,Source,SourceVersion,SourceStrand,SourceSeq,TopGenomicSeq,BeadSetID,Exp_Clusters,Intensity_Only
200006-0_T_R_1526882018,200006,TOP,[A/G],60702346,AGACTGTGGATGAATAATGCTGGTGAGTGTCTGGCCCTCGGGGAGGCCCA,,,36,9,139046223,diploid,Homo sapiens,Unknown,0,BOT,ACATGCCCCACTCAGCGCCACCCCCGTCCTCCCCTCCCAGGTTGCCTAGCTGTCCCCAGC[T/C]TGGGCCTCCCCGAGGGCCAGACACTCACCAGCATTATTCATCCACAGTCTCCCAGGATCA,TGATCCTGGGAGACTGTGGATGAATAATGCTGGTGAGTGTCTGGCCCTCGGGGAGGCCCA[A/G]GCTGGGGACAGCTAGGCAACCTGGGAGGGGAGGACGGGGGTGGCGCTGAGTGGGGCATGT,163,3,0
```


Information about liftover on PLINK files.
https://www.biostars.org/p/252938/

In [None]:
mkdir -p ~/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/{c1,c2}

# download genotype data and array info
cd ~/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/c1
aws s3 cp s3://rti-common/dbGaP/phs000315_whi_garnet/PhenoGenotypeFiles/ChildStudyConsentSet_phs000315.WHI.v8.p3.c1.HMB-IRB/GenotypeFiles/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1.HMB-IRB.tar .
aws s3 cp s3://rti-common/dbGaP/phs000315_whi_garnet/PhenoGenotypeFiles/ChildStudyConsentSet_phs000315.WHI.v8.p3.c1.HMB-IRB/GenotypeFiles/phg000139.v1.GARNET_WHI.marker-info.MULTI.tar .

cd ~/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/c2
aws s3 cp s3://rti-common/dbGaP/phs000315_whi_garnet/PhenoGenotypeFiles/ChildStudyConsentSet_phs000315.WHI.v8.p3.c2.HMB-IRB-NPU/GenotypeFiles/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2.HMB-IRB-NPU.tar .
aws s3 cp s3://rti-common/dbGaP/phs000315_whi_garnet/PhenoGenotypeFiles/ChildStudyConsentSet_phs000315.WHI.v8.p3.c2.HMB-IRB-NPU/GenotypeFiles/phg000139.v1.GARNET_WHI.marker-info.MULTI.tar .

# get SNP intersection see the following:
# s3://rti-shared/shared_data/pre_qc/uhs1234/genotype/array/observed/0001/20210331_uhs1234_pre_qc_preparation.html


# unarchive
cd ../c1
tar -xvf phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1.HMB-IRB.tar
tar -xvf phg000139.v1.GARNET_WHI.marker-info.MULTI.tar

cd ../c2
tar -xvf phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2.HMB-IRB-NPU.tar
tar -xvf phg000139.v1.GARNET_WHI.marker-info.MULTI.tar

# get genotype array chip info
cd ../
ls c1/phg000139.v1.GARNET_WHI.marker-info.MULTI #HumanOmni1-Quad_v1-0_B.csv  README_SNP-info.txt
ls c2/phg000139.v1.GARNET_WHI.marker-info.MULTI #HumanOmni1-Quad_v1-0_B.csv  README_SNP-info.txt

# download the build 36 strand (flip) file
cd ~/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/
wget https://www.well.ox.ac.uk/~wrayner/strand/HumanOmni1-Quad_v1-0_B-b36-strand.zip
unzip HumanOmni1-Quad_v1-0_B-b36-strand.zip

## download strand updating tool
wget https://www.well.ox.ac.uk/~wrayner/strand/update_build.sh

## LiftOver

### consent group1

In [None]:
cd /home/ec2-user/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/

# convert bed/bim/fam to PED/MAP format
# (write to liftOver/, the directory that liftOverPlink.py and the later steps read from)
mkdir -p liftOver/
docker run -v $PWD:/data/ -it rtibiocloud/plink:v1.9_178bb91 plink \
    --bfile /data/GARNET_WHI_TOP_sample_level_c1 \
    --recode \
    --out /data/liftOver/garnet_whi_c1

# download database and script
#wget https://raw.githubusercontent.com/Shicheng-Guo/Gscutility/master/ibdqc.pl
# apply liftOverPlink.py to update hg18 (build 36) to hg19 (build 37)
python ~/bin/liftover/liftOverPlink.py \
    -m liftOver/garnet_whi_c1.map \
    -p liftOver/garnet_whi_c1.ped \
    -o liftOver/garnet_whi_c1_hg19 \
    -c ~/bin/liftover/hg18ToHg19.over.chain.gz \
    -e ~/bin/liftover/liftOver

#Converting MAP file to UCSC BED file...
#SUCC:  map->bed succ
#Lifting BED file...
#Reading liftover chains
#Mapping coordinates
#SUCC:  liftBed succ
#Converting lifted BED file back to MAP...
#SUCC:  bed->map succ
#Updating PED file...
#SUCC:  liftPed succ
#cleaning up BED files...

# convert back to bed/bim/fam
cd /home/ec2-user/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver
docker run -v $PWD:/data/ -it rtibiocloud/plink:v1.9_178bb91 plink \
    --file /data/garnet_whi_c1_hg19 \
    --make-bed \
    --out /data/garnet_whi_c1_hg19

### consent group2

In [None]:
cd /home/ec2-user/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/
gunzip GARNET*

# convert bed/bim/fam to PED/MAP format
mkdir liftover/
docker run -v $PWD:/data/ -it rtibiocloud/plink:v1.9_178bb91 plink \
    --bfile /data/GARNET_WHI_TOP_sample_level_c2 \
    --recode \
    --out /data/liftover/garnet_whi_c2

# download database and script
#wget https://raw.githubusercontent.com/Shicheng-Guo/Gscutility/master/ibdqc.pl
# apply liftOverPlink.py to update hg18 (build 36) to hg19 (build 37)
python ~/bin/liftover/liftOverPlink.py \
    -m liftover/garnet_whi_c2.map \
    -p liftover/garnet_whi_c2.ped \
    -o liftover/garnet_whi_c2_hg19 \
    -c ~/bin/liftover/hg18ToHg19.over.chain.gz \
    -e ~/bin/liftover/liftOver

#Converting MAP file to UCSC BED file...
#SUCC:  map->bed succ
#Lifting BED file...
#Reading liftover chains
#Mapping coordinates
#SUCC:  liftBed succ
#Converting lifted BED file back to MAP...
#SUCC:  bed->map succ
#Updating PED file...
#SUCC:  liftPed succ
#cleaning up BED files...

# convert back to bed/bim/fam
cd /home/ec2-user/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover
docker run -v $PWD:/data/ -it rtibiocloud/plink:v1.9_178bb91 plink \
    --file /data/garnet_whi_c2_hg19 \
    --make-bed \
    --out /data/garnet_whi_c2_hg19
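
The log above mentions a MAP to UCSC BED conversion before the lift. The step matters because PLINK MAP positions are 1-based while BED intervals are 0-based and half-open, so a SNP at position `p` becomes the interval `[p-1, p)`. The sketch below illustrates that conversion. It is not the code from `liftOverPlink.py`: `map_to_ucsc_bed` is a hypothetical helper, and the chromosome recoding (23 and 25 to X, 24 to Y, 26 to M) is an assumption based on PLINK's numeric chromosome codes.

```shell
# Hypothetical helper: convert a PLINK MAP file to a UCSC BED file for liftOver.
#   $1 = MAP file (columns: chromosome, variant ID, genetic distance, bp position)
map_to_ucsc_bed() {
    awk -v OFS='\t' '{
        chrom = $1
        if (chrom == 23 || chrom == 25) chrom = "X"   # assumption: XY (pseudo-autosomal) treated as X
        else if (chrom == 24) chrom = "Y"
        else if (chrom == 26) chrom = "M"
        print "chr" chrom, $4 - 1, $4, $2             # 0-based start, 1-based end, variant ID
    }' "$1"
}

# Example call:
# map_to_ucsc_bed liftover/garnet_whi_c2.map > garnet_whi_c2.hg18.bed
```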

## Strand testing
The genotypes need to be converted to the forward strand. We don't know which strand orientation the data are currently in because it is not explicitly stated; it could be TOP, Illumina (Ilmn), or Source. We will apply each of these three strand files and compare the results against a 1000 Genomes reference legend to see which one gives the best alignment.

In [None]:
# download these files
cd /home/ec2-user/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/
wget https://www.well.ox.ac.uk/~wrayner/strand/HumanOmni1-Quad_v1-0_B-b37-strand.zip
wget https://www.well.ox.ac.uk/~wrayner/strand/sourceStrand/HumanOmni1-Quad_v1-0_B-b37.Source.strand.zip
wget https://www.well.ox.ac.uk/~wrayner/strand/ilmnStrand/HumanOmni1-Quad_v1-0_B-b37.Ilmn.strand.zip
#s3://rti-common/chip_info/HumanOmni1-Quad_v1-0_B/
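# these are zip archives, so extract them with unzip (-o overwrites files extracted earlier)
for z in *.zip; do unzip -o "$z"; done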

wget https://www.well.ox.ac.uk/~wrayner/strand/update_build.sh

# use python code from:
#https://github.com/RTIInternational/biocloud_docker_tools/blob/master/check_strand/v1/check_strand.py

# download reference legend file
aws s3 cp s3://rti-common/ref_panels/1000G/2014.10/legend_with_chr/1000GP_Phase3_chr1.legend.gz .


In [None]:
# apply the flip with different strand orientations to discover which orientation our data are in
cd /home/ec2-user/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/

# interactive session
docker run -it -v $PWD:/data/ rtibiocloud/plink:v1.9_178bb91 bash

cd /data/

#Required parameters:
#1. The original bed stem (not including file extension suffix)
#2. The strand file to apply
#3. The new stem for output


## c1
orig=c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver/garnet_whi_c1_hg19

# TOP strand
strand=HumanOmni1-Quad_v1-0_B-b37.strand
new_stem=c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver/garnet_whi_c1_hg19_top_strand_flipped
bash update_build.sh $orig $strand $new_stem

# Source strand
strand=HumanOmni1-Quad_v1-0_B-b37.Source.strand
new_stem=c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver/garnet_whi_c1_hg19_source_strand_flipped
bash update_build.sh $orig $strand $new_stem

# Illumina strand
strand=HumanOmni1-Quad_v1-0_B-b37.Ilmn.strand
new_stem=c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver/garnet_whi_c1_hg19_ilum_strand_flipped
bash update_build.sh $orig $strand $new_stem



## c2
orig=c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/garnet_whi_c2_hg19

# TOP strand
strand=HumanOmni1-Quad_v1-0_B-b37.strand
new_stem=c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/garnet_whi_c2_hg19_top_strand_flipped
bash update_build.sh $orig $strand $new_stem

# Source strand
strand=HumanOmni1-Quad_v1-0_B-b37.Source.strand
new_stem=c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/garnet_whi_c2_hg19_source_strand_flipped
bash update_build.sh $orig $strand $new_stem

# Illumina strand
strand=HumanOmni1-Quad_v1-0_B-b37.Ilmn.strand
new_stem=c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/garnet_whi_c2_hg19_ilum_strand_flipped
bash update_build.sh $orig $strand $new_stem

### consent group1
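The checks below show that our data were originally in TOP strand orientation: after applying the TOP strand file, 74,992 of the 74,994 comparable variants are on the plus strand, whereas the unflipped, Source, and Illumina versions each leave roughly half of the variants needing a flip.
We therefore use the TOP-flipped files as the forward-strand data.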

In [None]:
# No strand flip
python3 check_strand.py \
    --bim c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver/garnet_whi_c1_hg19.bim \
    --ref 1000GP_Phase3_chr1.legend.gz

#94456 variants in bim on chromosomes in ref
#75132 non-A/T, non-C/G, non-monomorphic variants in common with reference
#37259 plus strand variants
#37873 non-plus strand variants fixed by strand flip


# TOP strand flip
python3 check_strand.py \
    --bim c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver/garnet_whi_c1_hg19_top_strand_flipped.bim \
    --ref 1000GP_Phase3_chr1.legend.gz

#92340 variants in bim on chromosomes in ref
#74994 non-A/T, non-C/G, non-monomorphic variants in common with reference
#74992 plus strand variants
#2 non-plus strand variants fixed by strand flip


# Source strand flip
python3 check_strand.py \
    --bim c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver/garnet_whi_c1_hg19_source_strand_flipped.bim \
    --ref 1000GP_Phase3_chr1.legend.gz

#92340 variants in bim on chromosomes in ref
#74994 non-A/T, non-C/G, non-monomorphic variants in common with reference
#37375 plus strand variants
#37619 non-plus strand variants fixed by strand flip


# Illumina strand flip
python3 check_strand.py \
    --bim c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver/garnet_whi_c1_hg19_ilum_strand_flipped.bim \
    --ref 1000GP_Phase3_chr1.legend.gz

#92340 variants in bim on chromosomes in ref
#74994 non-A/T, non-C/G, non-monomorphic variants in common with reference
#38708 plus strand variants
#36286 non-plus strand variants fixed by strand flip


In [None]:
# Compress the files, then upload the TOP strand flipped set to S3 under the forward_strand name
cd /home/ec2-user/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/c1/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c1/sample_level_unfiltered_PLINK_set/liftOver
gzip *

for ext in {bed,bim,fam}; do
    aws s3 cp garnet_whi_c1_hg19_top_strand_flipped.$ext.gz \
    s3://rti-shared/shared_data/pre_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/garnet_whi_c1_hg19_forward_strand.$ext.gz
done
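Because `gzip *` compresses every file in the directory, including the candidate file sets that are not uploaded, a narrower variant may be preferable. A minimal sketch, assuming only one PLINK file set needs to be compressed; `compress_plink_set` is a hypothetical helper, and `gzip -c` is used so the uncompressed originals are kept.

```shell
# Sketch: compress only the bed/bim/fam files of one PLINK file set.
# compress_plink_set is an illustrative helper (assumption).
compress_plink_set() {
    # $1 = PLINK file stem (path without extension)
    local stem=$1 ext
    for ext in bed bim fam; do
        # Stop early if any of the three files is missing
        [ -f "$stem.$ext" ] || { echo "missing $stem.$ext" >&2; return 1; }
        # Write a .gz copy next to the original, leaving the original in place
        gzip -c "$stem.$ext" > "$stem.$ext.gz"
    done
}

# Example usage (commented out so the sketch has no side effects):
# compress_plink_set garnet_whi_c1_hg19_top_strand_flipped
```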

### consent group2
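The same strand checks are repeated for consent group 2. After applying the TOP strand file, 74,778 of the 74,780 comparable variants are on the plus strand, so these data were also in TOP strand orientation. 
We use the TOP strand flipped files as the forward strand data.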
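The choice between candidates can also be made mechanically by picking the file set that needs the fewest additional flips. A rough sketch using the consent group 2 counts reported in this section; `pick_forward_candidate` and the short labels are hypothetical.

```shell
# Sketch: choose the candidate with the fewest variants still needing a flip.
# pick_forward_candidate and the labels are illustrative (assumptions).
pick_forward_candidate() {
    # Reads "label plus_count flip_count" lines on stdin,
    # sorts numerically by flip_count, and prints the first label.
    sort -k3,3n | head -n 1 | cut -d' ' -f1
}

printf '%s\n' \
    "none 37139 37779" \
    "top 74778 2" \
    "source 37258 37522" \
    "ilum 38604 36176" | pick_forward_candidate
```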

In [None]:
# No strand flip
python3 check_strand.py \
    --bim c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/garnet_whi_c2_hg19.bim \
    --ref 1000GP_Phase3_chr1.legend.gz

#94456 variants in bim on chromosomes in ref
#74918 non-A/T, non-C/G, non-monomorphic variants in common with reference
#37139 plus strand variants
#37779 non-plus strand variants fixed by strand flip


# TOP strand flip
python3 check_strand.py \
    --bim c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/garnet_whi_c2_hg19_top_strand_flipped.bim \
    --ref 1000GP_Phase3_chr1.legend.gz

#92340 variants in bim on chromosomes in ref
#74780 non-A/T, non-C/G, non-monomorphic variants in common with reference
#74778 plus strand variants
#2 non-plus strand variants fixed by strand flip


# Source strand flip
python3 check_strand.py \
    --bim c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/garnet_whi_c2_hg19_source_strand_flipped.bim \
    --ref 1000GP_Phase3_chr1.legend.gz

#92340 variants in bim on chromosomes in ref
#74780 non-A/T, non-C/G, non-monomorphic variants in common with reference
#37258 plus strand variants
#37522 non-plus strand variants fixed by strand flip


# Illumina strand flip
python3 check_strand.py \
    --bim c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/garnet_whi_c2_hg19_ilum_strand_flipped.bim \
    --ref 1000GP_Phase3_chr1.legend.gz

#92340 variants in bim on chromosomes in ref
#74780 non-A/T, non-C/G, non-monomorphic variants in common with reference
#38604 plus strand variants
#36176 non-plus strand variants fixed by strand flip

In [None]:
# Compress the files, then upload the TOP strand flipped set to S3 under the forward_strand name
cd /home/ec2-user/rti-shared/shared_data/pre_qc/whi_garnet/genotype/array/observed/0001/c2/phg000139.v1.GARNET_WHI.genotype-calls-matrixfmt.c2/sample_level_unfiltered_PLINK_set/liftover/
gzip *

for ext in {bed,bim,fam}; do
    aws s3 cp garnet_whi_c2_hg19_top_strand_flipped.$ext.gz \
    s3://rti-shared/shared_data/pre_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/garnet_whi_c2_hg19_forward_strand.$ext.gz
done


## Submit Workflow

### consent group1

In [None]:
# create directories
mkdir -p ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/{wf_input,wf_output}
mkdir -p ~/bioinformatics/methods/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/

# Set up config file for QC pipeline (use vim to modify)
cd ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/wf_input

# Zip biocloud_gwas_workflows repo
cd ~/
git clone --recursive https://github.com/RTIInternational/biocloud_gwas_workflows
cd biocloud_gwas_workflows/
git rev-parse HEAD > ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/wf_input/git_hash.txt
cd ~/
zip \
    --exclude=*/var/* \
    --exclude=*.git/* \
    --exclude=*/test/* \
    --exclude=*/.idea/* \
    -r ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/wf_input/biocloud_gwas_workflows.zip \
    biocloud_gwas_workflows/


# submit job
# Open session in terminal 1
ssh -i ~/.ssh/cromwell.pem -L localhost:8000:localhost:8000 ec2-user@54.208.171.34

cd ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/wf_output/
# Submit job in terminal 2
curl -X POST "http://localhost:8000/api/workflows/v1" -H "accept: application/json" \
    -F "workflowSource=@/home/ec2-user/biocloud_gwas_workflows/genotype_array_qc/genotype_array_qc_wf.wdl" \
    -F "workflowInputs=@/home/ec2-user/bioinformatics/methods/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/whi_garnet_c1_genotype_qc.json" \
    -F "workflowDependencies=@/home/ec2-user/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c1/wf_input/biocloud_gwas_workflows.zip" \
    -F "workflowOptions=@/home/ec2-user/bin/cromwell/gpp_fellows_fund_charge_code.json" >> job_id.txt

job=d94c9f96-1e89-4a3e-ac58-79966e4f2921

# check job status in terminal 2
curl -X GET "http://localhost:8000/api/workflows/v1/$job/status"   

# Monitor job in terminal 1
tail -f /tmp/cromwell-server.log
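Instead of re-running the status call by hand, the status endpoint can be polled until the workflow reaches a final state. A minimal sketch, assuming the SSH tunnel to port 8000 is open and that the endpoint returns JSON with a `status` field (as Cromwell does); `parse_status`, `is_terminal`, `wait_for_workflow`, and `CROMWELL_URL` are hypothetical names.

```shell
# Sketch: poll the Cromwell status endpoint until the workflow finishes.
# All function and variable names here are illustrative (assumptions).
CROMWELL_URL=${CROMWELL_URL:-http://localhost:8000}

parse_status() {
    # Extract the "status" value from the JSON read on stdin.
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

is_terminal() {
    # Succeeded, Failed, and Aborted are final workflow states.
    case "$1" in
        Succeeded|Failed|Aborted) return 0 ;;
        *) return 1 ;;
    esac
}

wait_for_workflow() {
    # $1 = workflow ID, $2 = seconds between polls (default 60)
    local job=$1 interval=${2:-60} status
    while true; do
        status=$(curl -s -X GET "$CROMWELL_URL/api/workflows/v1/$job/status" | parse_status)
        echo "$(date '+%F %T') $job ${status:-unknown}"
        if is_terminal "$status"; then break; fi
        sleep "$interval"
    done
    # Return success only if the workflow succeeded
    [ "$status" = "Succeeded" ]
}

# Example usage (commented out; requires the tunnel and a real job ID):
# wait_for_workflow "$job" 120
```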



### consent group2

In [None]:
# create directories
mkdir -p ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/{wf_input,wf_output}
mkdir -p ~/bioinformatics/methods/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/

# Set up config file for QC pipeline (use vim to modify)
cd ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/wf_input

# Zip biocloud_gwas_workflows repo
cd ~/
git clone --recursive https://github.com/RTIInternational/biocloud_gwas_workflows
cd biocloud_gwas_workflows/
git rev-parse HEAD > ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/wf_input/git_hash.txt
cd ~/
zip \
    --exclude=*/var/* \
    --exclude=*.git/* \
    --exclude=*/test/* \
    --exclude=*/.idea/* \
    -r ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/wf_input/biocloud_gwas_workflows.zip \
    biocloud_gwas_workflows/


# submit job
# Open session in terminal 1
ssh -i ~/.ssh/cromwell.pem -L localhost:8000:localhost:8000 ec2-user@54.208.171.34

cd ~/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/wf_output/
# Submit job in terminal 2
curl -X POST "http://localhost:8000/api/workflows/v1" -H "accept: application/json" \
    -F "workflowSource=@/home/ec2-user/biocloud_gwas_workflows/genotype_array_qc/genotype_array_qc_wf.wdl" \
    -F "workflowInputs=@/home/ec2-user/bioinformatics/methods/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/whi_garnet_c2_genotype_qc.json" \
    -F "workflowDependencies=@/home/ec2-user/rti-shared/shared_data/post_qc/phs000315_whi_garnet/genotype/array/observed/0001/c2/wf_input/biocloud_gwas_workflows.zip" \
    -F "workflowOptions=@/home/ec2-user/bin/cromwell/gpp_fellows_fund_charge_code.json" >> job_id.txt

job=

# check job status in terminal 2
curl -X GET "http://localhost:8000/api/workflows/v1/$job/status"   
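The submission response appended to `job_id.txt` contains the workflow ID, so `job` can be filled in from that file rather than copied by hand. A minimal sketch, assuming each response is JSON with an `id` field (as Cromwell returns on submission) and that responses may be appended without newlines between them; `latest_job_id` is a hypothetical helper.

```shell
# Sketch: read the most recent workflow ID from the appended responses.
# latest_job_id is an illustrative helper (assumption).
latest_job_id() {
    # $1 = file holding appended submission responses
    # grep -o prints every "id":"..." match on its own line;
    # tail keeps the last one and sed strips everything but the value.
    grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
        | tail -n 1 \
        | sed 's/.*"\([^"]*\)"$/\1/'
}

# Example usage (commented out; requires job_id.txt from the submit step):
# job=$(latest_job_id job_id.txt)
# curl -X GET "http://localhost:8000/api/workflows/v1/$job/status"
```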
