-
Notifications
You must be signed in to change notification settings - Fork 11
/
ExpressionAnalysis.Rmd
160 lines (130 loc) · 6.02 KB
/
ExpressionAnalysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
---
title: "Comparison of genes with tissue-specific eQTLs against other genes"
author: "Sarah Urbut, Gao Wang, Peter Carbonetto and Matthew Stephens"
output: workflowr::wflow_html
---
In this analysis, we assess whether tissue-specific eQTLs we
identified can be explained by tissue-specific expression.
Specifically, we take genes with tissue-specific eQTLs, and examine
the distribution of expression in the eQTL-affected tissue relative to
expression in other tissues.
Compare plots below with Supplementary Figure 6 in the manuscript.
```{r knitr, message = FALSE, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, fig.width = 5.5,
fig.height = 4,fig.align = "center",
comment = "#")
```
## Load data and mash results
In the next code chunk, we load some GTEx summary statistics (average
gene expression values and *z*-scores), as well as some of the results
generated from the mash analysis of the GTEx data.
Expression is here defined as median across individuals of the log
Reads per Kilobase Mapped (RPKM).
```{r load-results}
data <- read.csv("../data/AvgExpr.csv.gz",header = TRUE)
out <- readRDS("../data/MatrixEQTLSumStats.Portable.Z.rds")
maxz <- out$test.z
qtl.names <-
sapply(1:length(rownames(maxz)),
function(x) unlist(strsplit(rownames(maxz)[x], "[_]"))[[1]])
rownames(data) <- data[,1]
expr.data <- data[,-1]
out <- readRDS(paste("../output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3",
"lite.single.expanded.V1.posterior.rds",sep = "."))
pmash <- out$posterior.means
lfsr <- out$lfsr
colnames(lfsr) <- colnames(pmash) <- colnames(maxz)
rownames(lfsr) <- rownames(pmash) <- rownames(maxz)
expr.sort <- expr.data[rownames(expr.data)%in%qtl.names,]
a <- match(qtl.names,rownames(expr.sort))
expr.sort <- expr.sort[a,]
missing.tissues <- c(7,8,19,20,24,25,31,34,37)
exp.sort <- expr.sort[,-missing.tissues]
colnames(exp.sort) <- colnames(maxz)
```
## Define plotting functions
Here we define a couple functions used to compare the densities and
CDFs of gene expression levels.
```{r define-plotting-functions}
plot_tissuespecifictwo = function(tissuename,lfsr,ymax,curvedata,title,
thresh=0.05,subset=1:44,exclude=0.01) {
index_tissue=which(colnames(lfsr) %in% tissuename);
ybar=title
# Create a matrix showing whether or not lfsr satisfies threshold.
sigmat = lfsr <= thresh;
sigs=which(rowSums(sigmat[,index_tissue,drop=FALSE])==length(tissuename) &
rowSums(sigmat[,-index_tissue,drop=FALSE])==0)
norm.spec=density(curvedata[sigs,index_tissue])
norm.other=density(curvedata[-sigs,index_tissue])
xmin=min(norm.spec$x,norm.other$x)-1
ymin=min(norm.spec$y,norm.other$y)
xmax=max(norm.spec$x,norm.other$x)+1
plot(norm.other,xlim = c(xmin,xmax),ylim=c(0,ymax),
col="black",main=tissuename,xlab="log(RPKM)")
lines(norm.spec,col="red")
legend("right",legend = c("All Genes","Tissue Specific"),
col=c("black","red"),pch=c(1,1))
}
plot_tissuespecificthree = function(tissuename,lfsr,ymax,curvedata,title,
thresh=0.05,subset=1:44,exclude=0.01) {
index_tissue=which(colnames(lfsr) %in% tissuename);
ybar=title
# Create a matrix showing whether or not lfsr satisfies threshold.
sigmat = lfsr <= thresh
sigs=which(rowSums(sigmat[,index_tissue,drop=FALSE])==length(tissuename) &
rowSums(sigmat[,-index_tissue,drop=FALSE])==0)
norm.spec=ecdf(curvedata[sigs,index_tissue])
norm.other=ecdf(curvedata[-sigs,index_tissue])
plot(norm.other,ylim=c(0,ymax),
col="black",main=tissuename,xlab="log(RPKM)")
lines(norm.spec,col="red",cex=0.1)
legend("right",legend = c("All Genes","Tissue Specific"),
col=c("black","red"),pch=c(1,1))
}
```
## Distribution of expression levels in testis
The two plots below compare the densities and cumulative distribution
functions of the expression levels for all genes (black), and for
genes identified as having a “tissue-specific” eQTL (red) in testis.
```{r testis}
plot_tissuespecifictwo(tissuename = "Testis",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05 ,ymax=0.5)
plot_tissuespecificthree(tissuename = "Testis",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05 ,ymax=1)
```
The distribution functions are reasonably similar.
## Distribution of expression levels in thyroid
Next we show the same two plots for thyroid.
```{r thyroid}
plot_tissuespecifictwo(tissuename = "Thyroid",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05,ymax = 0.5)
plot_tissuespecificthree(tissuename = "Thyroid",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05,ymax = 1)
```
## Distribution of expression levels in whole blood
Next, we look at the distributions in whole blood cells.
```{r whole-blood}
plot_tissuespecifictwo(tissuename = "Whole_Blood",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05,ymax=0.5)
plot_tissuespecificthree(tissuename = "Whole_Blood",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05 ,ymax=1)
```
## Distribution of expression levels in fibroblasts
Finally, we examine the gene expression distributions in fibroblasts.
```{r fibroblasts}
plot_tissuespecifictwo(tissuename = "Cells_Transformed_fibroblasts",
lfsr = lfsr,curvedata = log(exp.sort),title = "Test",
thresh = 0.05,ymax=0.5)
plot_tissuespecificthree(tissuename = "Cells_Transformed_fibroblasts",
lfsr = lfsr,curvedata = log(exp.sort),title = "Test",
thresh = 0.05 ,ymax=1)
```
In each case, the distribution functions are similar. This tells us
that tissue-specific eQTLs are not simply reflecting tissue-specific
expression.