-
Notifications
You must be signed in to change notification settings - Fork 11
/
Fig.ExpressionAnalysis.Rmd
152 lines (124 loc) · 5.71 KB
/
Fig.ExpressionAnalysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
---
title: "Comparison of genes with tissue-specific eQTLs against other genes"
author: "Sarah Urbut"
output: workflowr::wflow_html
---
In this analysis, we assess whether tissue-specific eQTLs we
identified can be explained by tissue-specific expression.
Specifically, we take genes with tissue-specific eQTLs, and examine
the distribution of expression in the eQTL-affected tissue relative to
expression in other tissues.
## Load data and MASH results
In the next code chunk, we load some GTEx summary statistics ( average
gene expression values and z-scores), as well as some of the results
generated from the MASH analysis of the GTEx data.
Expression is here defined as median across individuals of the log
Reads per Kilobase Mapped (RPKM).
```{r load-results}
data <- read.csv("../data/AvgExpr.csv.gz",header = TRUE)
out <- readRDS("../data/MatrixEQTLSumStats.Portable.Z.rds")
maxz <- out$test.z
qtl.names <-
sapply(1:length(rownames(maxz)),
function(x) unlist(strsplit(rownames(maxz)[x], "[_]"))[[1]])
rownames(data) <- data[,1]
expr.data <- data[,-1]
out <- readRDS(paste("../output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3",
"lite.single.expanded.V1.posterior.rds",sep = "."))
pmash <- out$posterior.means
lfsr <- out$lfsr
colnames(lfsr) <- colnames(pmash) <- colnames(maxz)
rownames(lfsr) <- rownames(pmash) <- rownames(maxz)
expr.sort <- expr.data[rownames(expr.data)%in%qtl.names,]
a <- match(qtl.names,rownames(expr.sort))
expr.sort <- expr.sort[a,]
missing.tissues <- c(7,8,19,20,24,25,31,34,37)
exp.sort <- expr.sort[,-missing.tissues]
colnames(exp.sort) <- colnames(maxz)
```
## Define plotting functions
Here we define a couple functions used to compare the densities and
CDFs of gene expression levels.
```{r define-plotting-functions}
plot_tissuespecifictwo = function(tissuename,lfsr,ymax,curvedata,title,
thresh=0.05,subset=1:44,exclude=0.01) {
index_tissue=which(colnames(lfsr) %in% tissuename);
ybar=title
# Create a matrix showing whether or not lfsr satisfies threshold.
sigmat = lfsr <= thresh;
sigs=which(rowSums(sigmat[,index_tissue,drop=FALSE])==length(tissuename) &
rowSums(sigmat[,-index_tissue,drop=FALSE])==0)
norm.spec=density(curvedata[sigs,index_tissue])
norm.other=density(curvedata[-sigs,index_tissue])
xmin=min(norm.spec$x,norm.other$x)-1
ymin=min(norm.spec$y,norm.other$y)
xmax=max(norm.spec$x,norm.other$x)+1
plot(norm.other,xlim = c(xmin,xmax),ylim=c(0,ymax),
col="black",main=tissuename,xlab="log(RPKM)")
lines(norm.spec,col="red")
legend("right",legend = c("All Genes","Tissue Specific"),
col=c("black","red"),pch=c(1,1))
}
plot_tissuespecificthree = function(tissuename,lfsr,ymax,curvedata,title,
thresh=0.05,subset=1:44,exclude=0.01) {
index_tissue=which(colnames(lfsr) %in% tissuename);
ybar=title
# Create a matrix showing whether or not lfsr satisfies threshold.
sigmat = lfsr <= thresh
sigs=which(rowSums(sigmat[,index_tissue,drop=FALSE])==length(tissuename) &
rowSums(sigmat[,-index_tissue,drop=FALSE])==0)
norm.spec=ecdf(curvedata[sigs,index_tissue])
norm.other=ecdf(curvedata[-sigs,index_tissue])
plot(norm.other,ylim=c(0,ymax),
col="black",main=tissuename,xlab="log(RPKM)")
lines(norm.spec,col="red",cex=0.1)
legend("right",legend = c("All Genes","Tissue Specific"),
col=c("black","red"),pch=c(1,1))
}
```
## Distribution of expression levels in testis
The two plots below compare the densities and cumulative distribution
functions of the expression levels for all genes (black), and for
genes identified as having a “tissue-specific” eQTL (red) in testis.
```{r testis}
plot_tissuespecifictwo(tissuename = "Testis",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05 ,ymax=0.5)
plot_tissuespecificthree(tissuename = "Testis",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05 ,ymax=1)
```
The distribution functions are reasonably similar.
## Distribution of expression levels in thyroid
Next we show the same two plots for thyroid.
```{r thyroid}
plot_tissuespecifictwo(tissuename = "Thyroid",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05,ymax = 0.5)
plot_tissuespecificthree(tissuename = "Thyroid",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05,ymax = 1)
```
## Distribution of expression levels in whole blood
Next, we look at the distributions in whole blood cells.
```{r whole-blood}
plot_tissuespecifictwo(tissuename = "Whole_Blood",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05,ymax=0.5)
plot_tissuespecificthree(tissuename = "Whole_Blood",lfsr = lfsr,
curvedata = log(exp.sort),title = "Test",
thresh = 0.05 ,ymax=1)
```
## Distribution of expression levels in fibroblasts
Finally, we examine the gene expression distributions in fibroblasts.
```{r fibroblasts}
plot_tissuespecifictwo(tissuename = "Cells_Transformed_fibroblasts",
lfsr = lfsr,curvedata = log(exp.sort),title = "Test",
thresh = 0.05,ymax=0.5)
plot_tissuespecificthree(tissuename = "Cells_Transformed_fibroblasts",
lfsr = lfsr,curvedata = log(exp.sort),title = "Test",
thresh = 0.05 ,ymax=1)
```
In each case, the distribution functions are similar. This tells us
that tissue-specific eQTLs are not simply reflecting tissue-specific
expression.