analysis/Fig.Uk3.Rmd

---
title: "Primary correlation patterns identified by mash in GTEx data"
author: "Sarah Urbut"
output: workflowr::wflow_html
---

"Uk3" is the covariance matrix corresponding to the output of the
ExtremeDeconvolution algorithm that was initialized with the rank3 SVD
approximation of $X^TX$. It is the pattern of sharing identified from
the dominant covariance matrix (the one with the largest mixture
weight).

Here we plot the correlation matrix and the first 3 eigenvectors of
"Uk3". This provides a visualization of the primary patterns of
genetic sharing identified by our method, MASH. This code should
reproduce Figure 3 of the paper.

```{r knitr, message = FALSE, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, fig.width = 8,
                      fig.height = 4,fig.align = "center")
```

## Set up environment

First, we load some packages into the R environment.

```{r load-pkgs, message = FALSE}
library(lattice)
library(ggplot2)
library(colorRamps)
library(fields)
```

## Load data and MASH results

We load some GTEx summary statistics, as well as some of the results
generated from the MASH analysis of the GTEx data.

```{r load-results, echo = TRUE}
covmat <- readRDS(paste("../output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3",
                        "lite.single.expanded.rds",sep = "."))
pis    <- readRDS(paste("../output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3",
                        "lite.single.expanded.V1.pihat.rds",sep = "."))$pihat
z.stat <- readRDS("../data/MatrixEQTLSumStats.Portable.Z.rds")$test.z
pi.mat <- matrix(pis[-length(pis)],ncol = 54,nrow = 22,byrow = TRUE)
names  <- colnames(z.stat)
colnames(pi.mat) <-
  c("ID","X'X","SVD","F1","F2","F3","F4","F5","SFA_Rank5",names,"ALL")
```

Next, we load the tissue indices and tissue names:

```{r load-tissues-names}
k <- 3
x           <- cov2cor(covmat[[k]])
x[x < 0]    <- 0
colnames(x) <- names
rownames(x) <- names
h <- read.table("../output/uk3rowindices.txt")[,1]
```

For the plots of the eigenvectors, we load the colours that are
conventionally used to represent the tissues in plots.

```{r load-tissue-colors}
missing.tissues <- c(7,8,19,20,24,25,31,34,37)
color.gtex <- read.table("../data/GTExColors.txt",sep = "\t",
                         comment.char = '')[-missing.tissues,]
```

## Summarize relative importance of the covariance matrices

The posterior mixture weights give the relative importance of the
covariance matrices for capturing patterns in the data.

```{r plot-mixture-weights, fig.width = 8,fig.height = 4}
barplot(colSums(pi.mat),las = 2,cex.names = 0.5)
```

Here we see that the SVD component has the largest weight.

## Generate heatmap of Uk3 covariance matrix

Now we produce the heatmap showing the full covariance matrix.

```{r heatmap-uk3, fig.width = 10, fig.height = 8}
smat <- (x[(h),(h)])
smat[lower.tri(smat)] <- NA
clrs <- colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
                               "#E0F3F8","#91BFDB","#4575B4")))(64)
lat <- x[rev(h),rev(h)]
lat[lower.tri(lat)] <- NA
n <- nrow(lat)
print(levelplot(lat[n:1,],col.regions = clrs,xlab = "",ylab = "",
                colorkey = TRUE,at = seq(0,1,length.out = 64),
                scales = list(cex = 0.6,x = list(rot = 45))))
```

The eigenvectors capture the predominant patterns in the UK3
covariance matrix.

```{r plot-eigenvectors, fig.width = 6, fig.height = 3}
k <- 3
vold  <- svd(covmat[[k]])$v
u     <- svd(covmat[[k]])$u
d     <- svd(covmat[[k]])$d
v     <- vold[h,] # Shuffle so correct order
names <- names[h]
color.gtex <- color.gtex[h,]
for (j in 1:3)
  barplot(v[,j]/v[,j][which.max(abs(v[,j]))],names = "",cex.names = 0.5,
          las = 2,main = paste0("EigenVector",j,"Uk",k),
          col = as.character(color.gtex[,2]))
```

The first eigenvector reflects broad sharing among tissues, with all
effects in the same direction; the second eigenvector captures
differences between brain (and, to a less extent, testis and
pituitary) vs other tissues; the third eigenvector primarily captures
effects that are stronger in whole blood than elsewhere.