# UUI GWAS

Some collaborators reached out to Megan Carnes to ask if she wanted to include a previously published Urgency Urinary Incontinence (UUI) GWAS in a big analysis they are doing. All they need are the MAGMA results. Unfortunately, we cannot find the results of the previously published GWAS -- the analyses were performed on the old MIDAS computing platform  at RTI International, but it seems they were deleted. So, we need to rerun these analyses. Basically, just replicate the results from the 2015 paper, [Genetic Contributions to Urgency Urinary Incontinence in Women](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4439377/), then run MAGMA, and finally pass on these results to the collaborators.


**Data locations**: `s3://rti-common/dbGaP/phs000315_whi_garnet/`<br>
**charge code**: 0160470.000.044 (Grier Page Fellows Fund)<br>
**dbGaP**: [WHI GARNET](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000315.v8.p3&phv=173865&phd=&pha=&pht=2982&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1)



## Data description

### Imputed Genotype data
* ChildStudyConsentSet_phs000746.WHI.v3.p3.c1.HMB-IRB/Genotype/

For now we are going to focus on the observed genotype data because we need to process the observed genotypes in order to perform the PCA to incorporate the genotype PCs as covariates in the GWAS model. These were imputed with an older imputation panel I believe. 

### Observed Genotype
* ChildStudyConsentSet_phs000315.WHI.v8.p3.c1.HMB-IRB/GenotypeFiles/
* ChildStudyConsentSet_phs000315.WHI.v8.p3.c2.HMB-IRB/GenotypeFiles/

Consent group 1 and 2. We need to merge these and run them through the genotype array QC workflow.

# Phenotype

## Create a map file
We only want the GARNET subset (phs000315) of the data. So we will create a mapping file and filter the phenotype files down with it.

In [None]:
cd PhenotypeFiles/dbGaP-30943/
mkdir processing/

tail -n +11 phs000200.v12.pht001032.v9.p3.WHI_Sample.MULTI.txt  | cut -f1-5,8 > processing/all_subjectid.txt
wc -l processing/all_subjectid.txt # 118701

cd processing/

head -1 all_subjectid.txt | cut -f4-5 > phs000315_subject_sampleid_map.txt
awk -F "\t" '$6=="phs000315.v8.p3" {print $4,$5}' all_subjectid.txt  >> \
    phs000315_subject_sampleid_map.txt

wc -l phs000315_subject_sampleid_map.txt #4984 


# example map
zcat GARNET_WHI_TOP_sample_level_c2.fam.gz  | head -1
# 111106895 122129 0 0 0 -9

head -1 phs000315_subject_sampleid_map.txt ;grep 122129 phs000315_subject_sampleid_map.txt
# SUBJID  SAMPLE_ID
# 753703 122129


## Filter and merge phenotype files

| Data File Name* | Variable to keep | Variable Description                        | Note                                                     |
|-----------------|------------------|---------------------------------------------|----------------------------------------------------------|
| pht001032       | STUDY            | DbGaP top-level study or substudy accession | Filter to Value = E (GARNET STUDY phs000315)             |
| All             | SUBJID           | WHI dbGaP Subject ID                        | Used for file merges and should match genotype files     |
| pht000998       | AGE              | Age at screening                            |                                                          |
| pht001000       | PARITY           | Number of Term Pregnancies                  |                                                          |
| pht001019       | BMIX             | BMI                                         |                                                          |
| pht001019       | BMICX            | BMI Categorical                             |                                                          |
| pht000998       | RACE             | Racial or ethnic group                      |                                                          |
| pht001514       | F134PARKINS      | Parkinsons disease ever                     | We will drop where = 1 (yes). Expect low number (20-ish) |
| pht001514       | F134DIAB         | Diabetes/high blood sugar ever              |                                                          |
| pht001005       | INCONT           | Ever leaked urine                           | Used to define case/control status                       |
| pht001005       | FRQINCON         | How often leaked urine                      | Used to define case status                               |
| pht001005       | CGHINCON         | Leak urine when cough, laugh                | Used to define cases status                              |

**Cases (pht001005)**
* INCONT (ever leak) = Yes (1)
* CGHINCON (Leak urine when cough) = Yes (1)
* FRQINCON (frequency) = 3,4, or 5 -> more than once a month

**Controls**
* INCONT (ever leak) = Yes (0)

<br>

We will use the [Plink](https://www.cog-genomics.org/plink/1.9/formats#fam) standard for coding case/controls.<br>
Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)

In [29]:
setwd("~/Downloads/uui-gwas/PhenotypeFiles/dbGaP-30943/")
list.files()

In [40]:
# load all data
subject_sampleids <- read.delim("processing/phs000315_subject_sampleid_map.txt",
                              header = T, 
                              sep = "",
                               colClasses = c("character", "character"))
head(subject_sampleids)
length(subject_sampleids$SUBJID)

age_race_c1 <- read.delim("phs000200.v12.pht000998.v6.p3.c1.f2_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "",
                              skip = 10)
age_race_c2 <- read.delim("phs000200.v12.pht000998.v6.p3.c2.f2_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "",
                              skip = 10)
parity_c1 <- read.delim("phs000200.v12.pht001000.v7.p3.c1.f31_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "",
                              skip = 10)
parity_c2 <- read.delim("phs000200.v12.pht001000.v7.p3.c2.f31_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "",
                              skip = 10)

bmi_c1 <- read.delim("phs000200.v12.pht001019.v6.p3.c1.f80_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "",
                              skip = 10)

bmi_c2 <- read.delim("phs000200.v12.pht001019.v6.p3.c2.f80_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "",
                              skip = 10)

parkinsons_diab_c1 <- read.delim("phs000200.v12.pht001514.v6.p3.c1.f134_rel1.HMB-IRB.txt.gz",
                              header = T, 
                              sep = "",
                              skip = 10)

parkinsons_diab_c2 <- read.delim("phs000200.v12.pht001514.v6.p3.c2.f134_rel1.HMB-IRB-NPU.txt.gz",
                              header = T, 
                              sep = "",
                              skip = 10)

case_control_c1 <- read.delim("phs000200.v12.pht001005.v6.p3.c1.f37_rel1.HMB-IRB.txt",
                              header = T, 
                              sep = "",
                              skip = 10)

case_control_c2 <- read.delim("phs000200.v12.pht001005.v6.p3.c2.f37_rel1.HMB-IRB-NPU.txt",
                              header = T, 
                              sep = "",
                              skip = 10)

Unnamed: 0_level_0,SUBJID,SAMPLE_ID
Unnamed: 0_level_1,<chr>,<chr>
1,729534,100034
2,716669,100046
3,719273,100134
4,857580,100146
5,777019,100155
6,725283,100210


In [46]:
# visual inspection of each loaded dataset
cat("Age and race")
head(age_race_c1)
head(age_race_c2)

cat("\n\n\n\n\n\nParity")
head(parity_c1)
head(parity_c2)

cat("\n\n\n\n\n\nBMI (BMIX) and Categorical BMI (BMICX)")
head(bmi_c1)
head(bmi_c2)

cat("\n\n\n\n\n\nParkinsons disease (F134PARKINS) and diabetes (F134DIAB)")
head(parkinsons_diab_c1)
head(parkinsons_diab_c2)

cat("\n\n\n\n\n\nCase/Control: NCONT, CGHINCON, and FRQINCON.")
head(case_control_c1)
head(case_control_c2)

Age and race

Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F2DAYS,AGE,AREA3Y,OTHSTDY,EXSTDY,BRCA_F2,COLON_F2,COLON10Y,⋯,AVAILDM,INTHRT,AVAILHRT,TALKDOC,HRTINFDR,HELPFILL,AGER,HORMSTAT,AGEHYST,DIABTRT
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<int>,<int>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,220079,700001,-41,74,1,0,0,0,0,0,⋯,,,,,,,,,,
2,221745,700003,-55,59,1,0,0,0,0,0,⋯,,,,,,,,,,
3,215143,700004,-8,56,1,0,0,0,0,0,⋯,,,,,,,,,,
4,214904,700005,-241,64,1,0,0,0,0,0,⋯,,,,,,,,,,
5,220352,700006,-44,58,1,0,0,0,0,0,⋯,,,,,,,,,,
6,216549,700007,-82,74,1,0,0,0,0,0,⋯,,,,,,,,,,


Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F2DAYS,AGE,AREA3Y,OTHSTDY,EXSTDY,BRCA_F2,COLON_F2,COLON10Y,⋯,AVAILDM,INTHRT,AVAILHRT,TALKDOC,HRTINFDR,HELPFILL,AGER,HORMSTAT,AGEHYST,DIABTRT
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<int>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,213433,700002,-73,76,1,0,0,0,0,0,⋯,,,,,,,,,,
2,220665,700039,-191,52,1,1,0,1,0,0,⋯,,,,,,,,,,
3,218329,700062,-49,61,1,0,0,0,0,0,⋯,,,,,,,,,,
4,219588,700092,-111,68,1,0,0,0,0,0,⋯,,,,,,,,,,
5,212624,700112,-85,60,1,0,0,0,0,0,⋯,,,,,,,,,,
6,221933,700120,851,78,1,0,0,0,0,0,⋯,,,,,,,,,,








Parity

Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F31DAYS,MENARCHE,MENSREG,MENSREGA,MENOPSEA,MENSWO1Y,MENSWOD,ANYMENSA,⋯,BRSTREMO,GRAVID,PARITY,FULLTRMR,NUMLIVER,AGEFBIR,BOOPH,BRSTFDMO,BRSTDIS,MENO
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<lgl>,<lgl>
1,220079,700001,-30,5,2,8,47,0,47,1,⋯,,,,,,,,,,
2,221745,700003,-7,4,1,47,0,47,1,49,⋯,,,,,,,,,,
3,215143,700004,-2,5,1,5,0,52,1,49,⋯,,,,,,,,,,
4,214904,700005,-16,7,2,8,43,0,43,1,⋯,,,,,,,,,,
5,220352,700006,-22,4,1,38,0,38,1,50,⋯,,,,,,,,,,
6,216549,700007,-13,5,1,5,42,0,42,1,⋯,,,,,,,,,,


Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F31DAYS,MENARCHE,MENSREG,MENSREGA,MENOPSEA,MENSWO1Y,MENSWOD,ANYMENSA,⋯,BRSTREMO,GRAVID,PARITY,FULLTRMR,NUMLIVER,AGEFBIR,BOOPH,BRSTFDMO,BRSTDIS,MENO
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<lgl>,<lgl>
1,213433,700002,-32,6,1,6,39,0,39,1,⋯,,,,,,,,,,
2,220665,700039,-37,4,1,6,46,0,48,1,⋯,,,,,,,,,,
3,218329,700062,-29,4,0,40,1,2,40,1,⋯,,,,,,,,,,
4,219588,700092,-34,3,1,3,0,61,1,40,⋯,,,,,,,,,,
5,212624,700112,-71,4,1,6,0,1,8,0,⋯,,,,,,,,,,
6,221933,700120,-3,6,1,6,0,40,0,1,⋯,,,,,,,,,,








BMI and Categorical BMI

Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F80VTYP,F80VY,F80DAYS,PULSE30,SYSTBP1,DIASBP1,SYSTBP2,DIASBP2,⋯,WAISTX,HIPX,WHEXPECT,SYST,SYSTOL,DIAS,DIASTOL,BMIX,BMICX,WHRX
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,220079,700001,1,0,-30,38,130,130,66,155.3,⋯,102.5,1.0,130,2,66,1,33.17012,4.0,0.90244,
2,220079,700001,3,3,1001,36,138,86,140,86.0,⋯,91.0,103.5,1,139,2,86,1.0,31.63451,4.0,0.87923
3,221745,700003,1,0,-7,32,136,90,136,94.0,⋯,71.0,100.0,1,136,2,92,2.0,22.94974,2.0,0.71
4,215143,700004,1,0,-2,30,120,80,122,78.0,⋯,72.0,99.0,1,121,2,79,1.0,21.29872,2.0,0.72727
5,215143,700004,3,1,354,30,110,70,104,70.0,⋯,71.0,97.0,1,107,1,70,1.0,22.00399,2.0,0.73196
6,215143,700004,3,2,721,30,126,76,118,74.0,⋯,69.0,94.0,0,122,2,75,1.0,20.02002,2.0,0.73404


Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F80VTYP,F80VY,F80DAYS,PULSE30,SYSTBP1,DIASBP1,SYSTBP2,DIASBP2,⋯,WAISTX,HIPX,WHEXPECT,SYST,SYSTOL,DIAS,DIASTOL,BMIX,BMICX,WHRX
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,213433,700002,1,0,-32,25,108,68,104,70,⋯,97.6,119.0,1,106,1,69,1,32.4515,4,0.82017
2,213433,700002,3,3,1094,36,150,82,152,82,⋯,98.0,116.0,1,151,3,82,1,31.77369,4,0.84483
3,220665,700039,1,0,-37,37,122,82,118,82,⋯,118.0,122.0,1,120,1,82,1,35.77999,5,0.96721
4,220665,700039,3,3,1216,32,130,76,128,78,⋯,110.3,117.1,1,129,2,77,1,32.41437,4,0.94193
5,218329,700062,1,0,-29,42,166,90,166,90,⋯,77.5,104.0,1,166,3,90,2,28.70128,3,0.74519
6,219588,700092,1,0,-61,30,150,88,148,86,⋯,98.0,107.0,1,149,3,87,1,29.62069,3,0.91589








Parkinsons disease and diabetes

Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F134VTYP,F134VY,F134DAYS,F134WHOM,F134PARKINS,F134DIAB
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
1,220079,700001,3,7,2518,1,0,0
2,221745,700003,3,9,3263,1,0,0
3,215143,700004,3,10,3576,1,0,0
4,214904,700005,3,8,2839,1,0,0
5,220352,700006,3,8,2958,1,0,0
6,222081,700008,3,10,3688,1,0,0


Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F134VTYP,F134VY,F134DAYS,F134WHOM,F134PARKINS,F134DIAB
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
1,218239,701801,3,11,4002,1,0,0
2,222619,702513,3,10,3449,1,0,1
3,1437884,712170,3,8,2993,1,0,0
4,1437980,712374,3,12,4152,1,0,0
5,387496,712453,3,9,3309,1,0,0
6,1438091,712595,3,10,3661,1,0,0








Case/Control: NCONT, CGHINCON, and FRQINCON.

Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F37VTYP,F37VY,F37DAYS,LISTEN,GOODADVC,TAKEDR,GOODTIME,HLPPROB,⋯,OPTIMISM,PAIN,PHYLIMIT,PHYSFUN,PSHTDEP,SLPDSTRB,SOCFUNC,SOCSTRN,SOCSUPP,SYMPTOM
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,220079,700001,1,0,-30,4,4,5,4,5,⋯,,,,,,,,,,
2,221745,700003,1,0,-7,4,4,4,4,4,⋯,,,,,,,,,,
3,215143,700004,1,0,-2,5,4,3,3,3,⋯,,,,,,,,,,
4,215143,700004,2,9,3102,3,3,2,3,3,⋯,,,,,,,,,,
5,214904,700005,1,0,-16,3,3,1,3,1,⋯,,,,,,,,,,
6,214904,700005,2,7,2465,3,2,1,3,2,⋯,,,,,,,,,,


Unnamed: 0_level_0,dbGaP_Subject_ID,SUBJID,F37VTYP,F37VY,F37DAYS,LISTEN,GOODADVC,TAKEDR,GOODTIME,HLPPROB,⋯,OPTIMISM,PAIN,PHYLIMIT,PHYSFUN,PSHTDEP,SLPDSTRB,SOCFUNC,SOCSTRN,SOCSUPP,SYMPTOM
Unnamed: 0_level_1,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,213433,700002,1,0,-32,4,4,4,4,4,⋯,,,,,,,,,,
2,220665,700039,1,0,-37,5,5,4,4,4,⋯,,,,,,,,,,
3,218329,700062,1,0,-29,2,5,5,2,2,⋯,,,,,,,,,,
4,219588,700092,1,0,-34,3,4,3,2,4,⋯,,,,,,,,,,
5,212624,700112,1,0,-71,5,5,4,3,5,⋯,,,,,,,,,,
6,221933,700120,1,0,-3,5,5,5,4,5,⋯,,,,,,,,,,
