### Interaction analysis of HLA genotypes and phenotypes in the UK Biobank

#### Genotypes and phenotypes 

The genotypes and phenotypes used in this analysis are from [DeBoever et al. 2018](https://www.nature.com/articles/s41467-018-03910-9).

#### Statistical analysis

We use the [Interact](https://cran.r-project.org/web/packages/Interact/) package for interaction testing, and the [glinternet](https://cran.r-project.org/web/packages/glinternet/index.html) and [hierNet](https://cran.r-project.org/web/packages/hierNet/index.html) packages for multivariate modeling.

In [3]:
library('glinternet')
library('hierNet')
library('Interact')

In [5]:
# Load HLA genotypes and phenotypes data 
load('/oak/stanford/groups/mrivas/users/jolivier/repos/hla-assoc/scripts/output/make_dosage_rds/hla_phenotypes.Rda')
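`load()` restores every object saved in the `.Rda` file into the calling environment and invisibly returns their names, which is what `objects()` inspects next. A minimal sketch, using a hypothetical toy file in place of `hla_phenotypes.Rda`, of loading into a separate environment instead so the restored objects cannot overwrite anything already in the workspace:

```r
# Toy stand-ins for the saved objects (hypothetical; not the real data)
covars <- data.frame(age = c(52, 61), sex = c(0, 1))
phenotypes <- data.frame(HC337 = c(1, 2))
rda_path <- tempfile(fileext = ".Rda")
save(covars, phenotypes, file = rda_path)
rm(covars, phenotypes)

# load() into a fresh environment; it returns the names of the restored objects
hla_env <- new.env()
loaded <- load(rda_path, envir = hla_env)
print(loaded)

# Retrieve one object explicitly from that environment
pheno <- get("phenotypes", envir = hla_env)
print(pheno)
```

With the real file, `load(path, envir = hla_env)` followed by `ls(hla_env)` plays the role of `objects()`.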

In [6]:
objects()

The file contains several objects, including data frames documented by [Guhan](https://med.stanford.edu/rivaslab/People.html):
- `covars`: Age, sex, genotyping array, and the first 40 principal components (PCs) for the individuals listed.
- `dosages`: Raw allelotype dosages for the individuals listed.
- `rounded_dosages`: Rounded allelotype dosages (cleaner, but less informative) for the individuals listed. Possible rounded dosages are 0, 1, and 2.
- `phenotypes`: Subset of the master phenotype file with the phenotypes of interest for the individuals listed, coded as 1 = unaffected, 2 = affected, -9 = missing.
- `master.phenotype`: Master phenotype file across all 500,000 individuals.
- `master.phenotype.genotype`: Phenotype and HLA allele genotype data for 337,208 individuals.
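Raw dosages are typically continuous values between 0 and 2. How `rounded_dosages` was derived is not documented here; the sketch below assumes simple rounding followed by clamping to the range 0 to 2, with made-up dosage values. Note that base R's `round()` rounds halves to even, so a dosage of exactly 0.5 becomes 0 while 1.5 becomes 2.

```r
# Hypothetical raw dosages for one allele across six individuals
raw <- c(0.03, 0.5, 0.98, 1.5, 1.62, 2.001)

# Assumed scheme: round to the nearest integer, then clamp to the valid range 0..2
round_dosage <- function(d) pmin(pmax(round(d), 0), 2)

rounded <- round_dosage(raw)
print(rounded)
```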

We will focus on the `master.phenotype.genotype` data frame for the analysis.

In [7]:
colnames(master.phenotype.genotype)

We will test the analysis on [type 1 diabetes](https://en.wikipedia.org/wiki/Diabetes_mellitus_type_1#Genetics), [celiac disease](https://en.wikipedia.org/wiki/Coeliac_disease#Genetics), [ankylosing spondylitis](https://en.wikipedia.org/wiki/Ankylosing_spondylitis#Genetic_testing), and [psoriasis](https://en.wikipedia.org/wiki/Psoriasis#Genetics), all of which have well-established HLA associations.

The phenotype codes in the latest master phenotype file, prepared by Matthew Aguirre, can be mapped to disease names using [`icdinfo.txt`](https://github.com/rivas-lab/ukbb-tools/blob/master/02_phenotyping/icdinfo.txt) on GitHub.

We begin with type 1 diabetes (`HC337`), whose GWAS results can be browsed in the [Global Biobank Engine](https://biobankengine.stanford.edu/coding/HC337).

In [8]:
dim(master.phenotype.genotype)

Select columns for covariates

In [10]:
colnames(master.phenotype.genotype)[c(3,4,seq(6,15,1))]
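`seq(6,15,1)` gives the same values as `6:15`, so this index vector selects 12 columns. Selecting by name is less fragile if the column order of the file ever changes. A sketch on a toy data frame; the column names below are hypothetical and not taken from the real file:

```r
# Index-based selection, as in the notebook: 12 positions in total
idx <- c(3, 4, seq(6, 15, 1))
print(length(idx))

# Toy data frame: two ID columns, then age, sex, array, and PCs (hypothetical names)
toy <- data.frame(FID = 1:3, IID = 1:3, age = c(50, 61, 47), sex = c(0, 1, 1),
                  array = c(1, 0, 1), PC1 = c(0.1, -0.2, 0.3), PC2 = c(0.0, 0.5, -0.1))

# Name-based selection: robust to columns being inserted or reordered
covariate_names <- c("age", "sex", "array", paste0("PC", 1:2))
X_cov <- toy[, covariate_names]
print(colnames(X_cov))
```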

Look up the column indices of the HLA allele columns `A_101` and `DPA1_401` and of the type 1 diabetes phenotype `HC337`:

In [11]:
which(colnames(master.phenotype.genotype) == "A_101")

In [12]:
which(colnames(master.phenotype.genotype) == "DPA1_401")

In [13]:
which(colnames(master.phenotype.genotype) == "HC337")
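The later cells use hard-coded positions (`2655:3016` for the alleles, `2206` for the phenotype), presumably the results of these lookups. A sketch of a hypothetical helper, `col_range()`, that returns every column name between two named columns, so the range is computed from names rather than copied by hand (the toy columns are made up):

```r
# Hypothetical helper: all column names from `first` through `last`, inclusive
col_range <- function(df, first, last) {
  i <- match(first, colnames(df))
  j <- match(last, colnames(df))
  if (is.na(i) || is.na(j) || i > j) stop("columns not found or out of order")
  colnames(df)[i:j]
}

# Toy data frame with made-up allele columns between a phenotype and a covariate
toy <- data.frame(HC337 = c(1, 2), A_101 = c(0, 1), A_201 = c(2, 1),
                  DPA1_401 = c(0, 0), age = c(40, 55))
alleles <- col_range(toy, "A_101", "DPA1_401")
print(alleles)
```

On the real data, `master.phenotype.genotype[, col_range(master.phenotype.genotype, "A_101", "DPA1_401")]` would select the allele block by name.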

In [14]:
library(ggplot2)

In [23]:
# Design matrix: HLA allele dosages (columns 2655:3016) plus covariates; dim() gives rows and columns
dim(cbind(master.phenotype.genotype[,2655:3016], master.phenotype.genotype[,c(3,4,6:15)]))

In [25]:
ncol(cbind(master.phenotype.genotype[,2655:3016],master.phenotype.genotype[,c(3,4,seq(6,15,1))]))
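glinternet's `numLevels` argument gives, for each column of `X`, the number of levels: 1 for a continuous variable, or the number of categories for a categorical one (categorical columns must be coded 0, 1, 2, ...). With 362 allele columns and 12 covariate columns, that is 374 ones. A sketch that derives the vector from the matrix width instead of hard-coding 374, assuming every column is treated as continuous as in the fitting cell below; the random matrix is only a toy stand-in:

```r
n_alleles <- length(2655:3016)         # 362 HLA allele columns
n_covariates <- length(c(3, 4, 6:15))  # 12 covariate columns
print(n_alleles + n_covariates)

# Toy stand-in for the real design matrix (5 individuals, 374 columns)
X <- matrix(runif(5 * 374), nrow = 5)

# All columns treated as continuous, so each gets 1 level
num_levels <- rep(1, ncol(X))
```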

In [27]:
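# glinternet's binomial family expects a 0/1 response:
# recode 1 (unaffected) -> 0 and 2 (affected) -> 1; -9 (missing) is kept here and filtered out before fitting
master.phenotype.genotype[master.phenotype.genotype[,2206] == 1,2206] <- 0
master.phenotype.genotype[master.phenotype.genotype[,2206] == 2,2206] <- 1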
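The two in-place replacements only work in this order: recoding 2 to 1 first and then 1 to 0 would turn every affected case into 0. A sketch of an order-independent alternative that maps all codes in one step and turns -9 into `NA`, using a hypothetical helper `recode_case_control()` on toy values:

```r
# Toy phenotype vector in the file's coding: 1 = unaffected, 2 = affected, -9 = missing
pheno <- c(1, 2, -9, 2, 1)

# Map each code to its new value in one step; unmatched codes (-9) become NA
recode_case_control <- function(x) c(0, 1)[match(x, c(1, 2))]
y <- recode_case_control(pheno)
print(y)

# The order pitfall: recoding 2 -> 1 first, then 1 -> 0, collapses both groups to 0
bad <- pheno
bad[bad == 2] <- 1
bad[bad == 1] <- 0
print(bad)
```

glinternet does not accept missing responses, so rows with `is.na(y)` would still need to be dropped before fitting.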

In [None]:
keep <- master.phenotype.genotype[,2206] != -9  # drop individuals with missing phenotype (-9); glinternet expects a numeric matrix X
fit <- glinternet(as.matrix(cbind(master.phenotype.genotype[keep,2655:3016], master.phenotype.genotype[keep,c(3,4,6:15)])), master.phenotype.genotype[keep,2206], numLevels = rep(1,374), family = "binomial")
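glinternet searches over all pairwise interactions among the 374 input columns, i.e. `choose(374, 2)` = 69,751 candidate pairs, using a group-lasso penalty under which an interaction is only included together with its main effects. To sanity-check a single pair it selects, one option (not part of the original workflow) is an ordinary logistic regression with an explicit product term. A sketch on simulated data, where `a` and `b` stand in for two hypothetical allele dosages and `age` for a covariate:

```r
n_pairs <- choose(374, 2)  # candidate pairwise interactions among 374 columns
print(n_pairs)

set.seed(1)
n <- 2000
a <- rbinom(n, 2, 0.2)     # hypothetical allele dosage 1 (0, 1, or 2 copies)
b <- rbinom(n, 2, 0.3)     # hypothetical allele dosage 2
age <- rnorm(n, 55, 8)     # toy covariate

# Simulate a case-control outcome with a true a-by-b interaction
eta <- -2 + 0.4 * a + 0.3 * b + 0.8 * a * b
y <- rbinom(n, 1, plogis(eta))

# a * b expands to a + b + a:b, so the main effects stay in the model with the interaction
fit_ab <- glm(y ~ a * b + age, family = binomial)
print(summary(fit_ab)$coefficients["a:b", ])
```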


In [30]:
# Check the recoding: the values should now be 0, 1, and -9 (missing)
unique(master.phenotype.genotype[,2206])