/
peco.Rmd
155 lines (121 loc) · 5 KB
/
peco.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
title: "Predicting cell cycle phase using peco"
author: "Joyce Hsiao"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{peco}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
`peco` is a supervised approach for predicting cell cycle phase in a
continuum using single-cell RNA sequencing data. The R package
provides functions to build a training data set and predict cell cycle
on a continuum.
Our work shows that peco is able to predict continuous cell
cylce phase using a small set of cyclic genes, _CDK1_, _UBE2C_,
_TOP2A_, _HISTH1E_, and _HISTH1C_ (These were dentified as cell cycle marker
genes in studies of yeast ([Spellman et al, 1998][spellman]) and HeLa
cells ([Whitfield et al, 2002][whitfield]).
Below, we present two use cases.
In the first use case, we show how to use the built-in training
dataset to predict continuous cell cycle.
In the second use case, we show how to create a training data set,
then build a predictor using the training data.
```{r setup, include=FALSE}
knitr::opts_chunk$set(results = "hold",collapse = TRUE,comment = "#>",
fig.align = "center")
```
`peco` uses `SingleCellExperiment` class objects:
```{r load-pkgs, message=FALSE, warning=FALSE}
library(SingleCellExperiment)
library(doParallel)
library(foreach)
library(peco)
```
About the training data
-----------------------
`training_human` provides training data for 101 significant
cyclic genes. The data include:
+ `predict.yy`, a gene-by-sample matrix (101 x 888) that storing
predicted cyclic expression values.
+ `cellcycle_peco_ordered`, cell cycle phase in a unit circle
(angle), ordered from 0 to 2$\pi$.
+ `cellcycle_function`, functions used for prediction corresponding to
the top 101 cyclic genes identified.
+ `sigma`, standard error associated with cyclic trends of gene
expression.
+ `pve`, proportion of variance explained by the cyclic trend.
```{r loading-training-data}
data(training_human)
```
Predict cell cycle phase using gene expression data
---------------------------------------------------
`peco` is integrated with `SingleCellExperiment` objects in
Bioconductor. Here we illustrate using a `SingleCellExperiment` object
to perform cell cycle phase prediction.
`sce_top101genes` is a `SingleCellExpression` object for 101 genes and
888 single-cell samples.
```{r load-genes}
data(sce_top101genes)
assays(sce_top101genes)
```
Transform expression to quantile-normalizesd counts-per-million (CPM)
values; `peco` uses the `cpm_quantNormed` slot as the input data for
predictions.
```{r transform-expression, message=FALSE}
sce_top101genes <- data_transform_quantile(sce_top101genes)
assays(sce_top101genes)
```
Generate predictions using `cycle_npreg_outsample`.
```{r predict-cell-cycle}
pred_top101genes <-
cycle_npreg_outsample(Y_test = sce_top101genes,
sigma_est = training_human$sigma[rownames(sce_top101genes),],
funs_est = training_human$cellcycle_function[rownames(sce_top101genes)],
method.trend = "trendfilter",get_trend_estimates = FALSE)
```
`pred_top101genes$Y` contains a `SingleCellExperiment` object with
the predicted cell cycle phase in the `colData` slot:
```{r inspect-predictions}
head(colData(pred_top101genes$Y)$cellcycle_peco)
```
View the predictions for the *CDK1* gene (Ensembl id
ENSG00000170312). Because *CDK1* is a known cell cycle gene, this
visualization serves as a "sanity check" for the predictions.
```{r plot-predictions-single-gene, fig.height=4, fig.width=5}
x <- seq(0,2*pi,length.out = 100)
plot(x = colData(pred_top101genes$Y)$theta_shifted,
y = assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",],
pch = 21,col = "white",bg = "darkblue",
main = "CDK1",xlab = "FUCCI phase",
ylab = "quantile-normalized expression")
lines(x = x,y = training_human$cellcycle_function[["ENSG00000170312"]](x),
col = "tomato",lwd = 2)
```
Visualize cyclic expression trend based on predicted phase
----------------------------------------------------------
Next, we view the predictions for the top six genes. Here we use
`fit_cyclical_many` to estimate cell cycle.
```{r plot-predictions-top-genes, fig.height=8, fig.width=7, message=FALSE}
theta_predict <- colData(pred_top101genes$Y)$cellcycle_peco
names(theta_predict) <- rownames(colData(pred_top101genes$Y))
yy_input <- assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,]
fit_cyclic <- fit_cyclical_many(Y = yy_input,theta = theta_predict)
gene_symbols <- rowData(pred_top101genes$Y)[rownames(yy_input),"hgnc"]
x <- seq(0,2*pi,length.out = 100)
par(mfrow = c(3,2))
for (i in 1:6) {
plot(x = fit_cyclic$cellcycle_peco_ordered,y = yy_input[i,],
pch = 1,col = "darkblue",main = gene_symbols[i],
xlab = "FUCCI phase",ylab = "normalized expression")
lines(x = x,y = fit_cyclic$cellcycle_function[[i]](x),
col = "tomato",lwd = 2)
}
```
## Session information
```{r session-info}
sessionInfo()
```
[spellman]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC25624
[whitfield]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC117619