-
Notifications
You must be signed in to change notification settings - Fork 11
/
Fig.Uk3.Rmd
120 lines (97 loc) · 4 KB
/
Fig.Uk3.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
---
title: "Primary correlation patterns identified by mash in GTEx data"
author: "Sarah Urbut"
output: workflowr::wflow_html
---
"Uk3" is the covariance matrix corresponding to the output of the
ExtremeDeconvolution algorithm that was initialized with the rank3 SVD
approximation of $X^TX$. It is the pattern of sharing identified from
the dominant covariance matrix (the one with the largest mixture
weight).
Here we plot the correlation matrix and the first 3 eigenvectors of
"Uk3". This provides a visualization of the primary patterns of
genetic sharing identified by our method, MASH. This code should
reproduce Figure 3 of the paper.
```{r knitr, message = FALSE, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, fig.width = 8,
fig.height = 4,fig.align = "center")
```
## Set up environment
First, we load some packages into the R environment.
```{r load-pkgs, message = FALSE}
library(lattice)
library(ggplot2)
library(colorRamps)
library(fields)
```
## Load data and MASH results
We load some GTEx summary statistics, as well as some of the results
generated from the MASH analysis of the GTEx data.
```{r load-results, echo = TRUE}
covmat <- readRDS(paste("../output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3",
"lite.single.expanded.rds",sep = "."))
pis <- readRDS(paste("../output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3",
"lite.single.expanded.V1.pihat.rds",sep = "."))$pihat
z.stat <- readRDS("../data/MatrixEQTLSumStats.Portable.Z.rds")$test.z
pi.mat <- matrix(pis[-length(pis)],ncol = 54,nrow = 22,byrow = TRUE)
names <- colnames(z.stat)
colnames(pi.mat) <-
c("ID","X'X","SVD","F1","F2","F3","F4","F5","SFA_Rank5",names,"ALL")
```
Next, we load the tissue indices and tissue names:
```{r load-tissues-names}
k <- 3
x <- cov2cor(covmat[[k]])
x[x < 0] <- 0
colnames(x) <- names
rownames(x) <- names
h <- read.table("../output/uk3rowindices.txt")[,1]
```
For the plots of the eigenvectors, we load the colours that are
conventionally used to represent the tissues in plots.
```{r load-tissue-colors}
missing.tissues <- c(7,8,19,20,24,25,31,34,37)
color.gtex <- read.table("../data/GTExColors.txt",sep = "\t",
comment.char = '')[-missing.tissues,]
```
## Summarize relative importance of the covariance matrices
The posterior mixture weights give the relative importance of the
covariance matrices for capturing patterns in the data.
```{r plot-mixture-weights, fig.width = 8,fig.height = 4}
barplot(colSums(pi.mat),las = 2,cex.names = 0.5)
```
Here we see that the SVD component has the largest weight.
## Generate heatmap of Uk3 covariance matrix
Now we produce the heatmap showing the full covariance matrix.
```{r heatmap-uk3, fig.width = 10, fig.height = 8}
smat <- (x[(h),(h)])
smat[lower.tri(smat)] <- NA
clrs <- colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
"#E0F3F8","#91BFDB","#4575B4")))(64)
lat <- x[rev(h),rev(h)]
lat[lower.tri(lat)] <- NA
n <- nrow(lat)
print(levelplot(lat[n:1,],col.regions = clrs,xlab = "",ylab = "",
colorkey = TRUE,at = seq(0,1,length.out = 64),
scales = list(cex = 0.6,x = list(rot = 45))))
```
The eigenvectors capture the predominant patterns in the UK3
covariance matrix.
```{r plot-eigenvectors, fig.width = 6, fig.height = 3}
k <- 3
vold <- svd(covmat[[k]])$v
u <- svd(covmat[[k]])$u
d <- svd(covmat[[k]])$d
v <- vold[h,] # Shuffle so correct order
names <- names[h]
color.gtex <- color.gtex[h,]
for (j in 1:3)
barplot(v[,j]/v[,j][which.max(abs(v[,j]))],names = "",cex.names = 0.5,
las = 2,main = paste0("EigenVector",j,"Uk",k),
col = as.character(color.gtex[,2]))
```
The first eigenvector reflects broad sharing among tissues, with all
effects in the same direction; the second eigenvector captures
differences between brain (and, to a less extent, testis and
pituitary) vs other tissues; the third eigenvector primarily captures
effects that are stronger in whole blood than elsewhere.