/
PRECAST.DLPFC4.Rmd
269 lines (217 loc) · 9.6 KB
/
PRECAST.DLPFC4.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
---
title: 'PRECAST: Four DLPFC Sample Analysis'
author: "Wei Liu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{PRECAST: Four DLPFC Sample Analysis}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
tidy = TRUE,
fig.width = 11,
tidy.opts = list(width.cutoff = 95),
message = FALSE,
warning = FALSE,
time_it = TRUE
)
all_times <- list() # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
now <- NULL
function(before, options) {
if (before) {
now <<- Sys.time()
} else {
res <- difftime(Sys.time(), now, units = "secs")
all_times[[options$label]] <<- res
}
}
}))
```
This vignette introduces the PRECAST workflow for the analysis of four spatial transcriptomics datasets. The workflow consists of three steps
* Independent preprocessing and model setting
* Probabilistic embedding and clustering using PRECAST model
* Downstream analysis (i.e. visualization of clusters and embeddings)
We demonstrate the use of PRECAST to four human dorsolateral prefrontal cortex Visium data. Download the data to R
```{r eval =FALSE}
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(SingleCellExperiment))
name_ID4 <- as.character(c(151673, 151674, 151675, 151676))
### Read data in an online manner
n_ID <- length(name_ID4)
url_brainA <- "https://github.com/feiyoung/DR-SC.Analysis/raw/main/data/DLPFC_data/"; url_brainB <- ".rds"
seuList <- list()
if(!require(ProFAST)){
remotes::install_github("feiyoung/ProFAST")
}
for(i in 1:n_ID){
# i <- 1
cat('input brain data', i, '\n')
# load and read data
dlpfc <- readRDS(url(paste0(url_brainA, name_ID4[i],url_brainB) ))
count <- dlpfc@assays@data$counts
row.names(count) <- ProFAST::transferGeneNames(row.names(count), species = "Human")
seu1 <- CreateSeuratObject(counts = count,
meta.data = as.data.frame(colData(dlpfc)),
min.cells = 10,min.features = 10)
seuList[[i]] <- seu1
}
# saveRDS(seuList, file='seuList4.RDS')
```
The package can be loaded with the command:
```{r eval = FALSE}
library(PRECAST)
```
## Compare PRECAST with DR-SC in analyzing one sample
First, we view the the spatial transcriptomics data with Visium platform.
```{r eval = FALSE}
seuList <- readRDS("seuList4.RDS")
seuList ## a list including Seurat objects
```
check the meta data that must include the spatial coordinates named "row" and "col", respectively.
If the names are not, they are required to rename them.
```{r eval = FALSE}
metadataList <- lapply(seuList, function(x) x@meta.data)
for(r in seq_along(metadataList)){
meta_data <- metadataList[[r]]
cat(all(c("row", "col") %in% colnames(meta_data)), '\n') ## the names are correct!
}
```
### Prepare the PRECASTObject.
Create a PRECASTObj object to prepare for PRECAST models. Here, we only show the `HVGs` method to select the 2000 highly variable genes, but users are able to choose more genes or also use the `SPARK-X` to choose the spatially variable genes.
```{r eval = FALSE}
set.seed(2023)
preobj <- CreatePRECASTObject(seuList = seuList, selectGenesMethod="HVGs",
gene.number = 2000) #
```
### Add the model setting
```{r eval = FALSE}
## check the number of genes/features after filtering step
preobj@seulist
## Add adjacency matrix list for a PRECASTObj object to prepare for PRECAST model fitting.
PRECASTObj <- AddAdjList(preobj, platform = "Visium")
## Add a model setting in advance for a PRECASTObj object. verbose =TRUE helps outputing the
## information in the algorithm.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal = TRUE, coreNum = 1,
maxIter=30, verbose = TRUE)
```
### Fit PRECAST
For function `PRECAST`, users can specify the number of clusters $K$ or set `K` to be an integer vector by using modified BIC(MBIC) to determine $K$. Here, we use user-specified number of clusters.
```{r eval = FALSE}
### Given K
PRECASTObj <- PRECAST(PRECASTObj, K= 7)
```
Use the function `SelectModel()` to re-organize the fitted results in PRECASTObj.
```{r eval = FALSE}
## backup the fitting results in resList
resList <- PRECASTObj@resList
PRECASTObj <- SelectModel(PRECASTObj)
ari_precast <- sapply(1:length(seuList), function(r) mclust::adjustedRandIndex(PRECASTObj@resList$cluster[[r]], PRECASTObj@seulist[[r]]$layer_guess_reordered))
mat <- matrix(round(ari_precast,2), nrow=1)
name_ID4 <- as.character(c(151673, 151674, 151675, 151676))
colnames(mat) <- name_ID4
DT::datatable(mat)
```
<details>
<summary>**Other options**</summary>
Users are also able to set multiple K, then choose the best one
```{r eval =FALSE}
PRECASTObj2 <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, coreNum = 4,
maxIter=30, verbose = TRUE) # set 4 cores to run in parallel.
PRECASTObj2 <- PRECAST(PRECASTObj2, K= 5:8)
## backup the fitting results in resList
resList2 <- PRECASTObj2@resList
PRECASTObj2 <- SelectModel(PRECASTObj2)
```
Besides, user can also use different initialization method by setting `int.model`, for example, set `int.model=NULL`; see the functions `AddParSetting()` and `model_set()` for more details.
</details>
### Integration and Visualization
Integrate the all samples using the `IntegrateSpaData` function. For computational efficiency, this function exclusively integrates the variable genes. Specifically, in cases where users do not specify the `PRECASTObj@seuList` or `seuList` argument within the `IntegrateSpaData` function, it automatically focuses on integrating only the variable genes. The default setting for `PRECASTObj@seuList` is `NULL` when `rawData.preserve` in `CreatePRECASTObject` is set to `FALSE`. For instance:
```{r eval = FALSE}
print(PRECASTObj@seuList)
seuInt <- IntegrateSpaData(PRECASTObj, species='Human')
seuInt
## The low-dimensional embeddings obtained by PRECAST are saved in PRECAST reduction slot.
```
<details>
<summary>**Integrating all genes**</summary>
There are two ways to use `IntegrateSpaData` integrating all genes, which will require more memory. We recommand running for all genes on server. The first one is to set value for `PRECASTObj@seuList`.
```{r eval =FALSE}
## assign the raw Seurat list object to it.
## For illustration, we generate a new seuList with more genes;
## For integrating all genes, users can set `seuList <- bc2`.
genes <- c(row.names(PRECASTObj@seulist[[1]]), row.names(seuList[[1]])[1:10])
seuList_sub <- lapply(seuList, function(x) x[genes,])
PRECASTObj@seuList <- seuList_sub #
seuInt <- IntegrateSpaData(PRECASTObj, species='Human')
seuInt
```
The second method is to set a value for the argument `seuList`:
```{r eval =FALSE}
PRECASTObj@seuList <- NULL
## At the same time, we can set subsampling to speed up the computation.
seuInt <- IntegrateSpaData(PRECASTObj, species='Human', seuList=seuList_sub, subsample_rate = 0.5)
seuInt
```
</details>
First, user can choose a beautiful color schema using `chooseColors()`.
```{r eval = FALSE}
cols_cluster <- chooseColors(palettes_name = 'Classic 20', n_colors=7, plot_colors = TRUE)
```
Show the spatial scatter plot for clusters
```{r eval = FALSE, fig.height = 8, fig.width=9}
p12 <- SpaPlot(seuInt, item='cluster', batch=NULL,point_size=1, cols=cols_cluster, combine=TRUE, nrow.legend=7)
p12
# users can plot each sample by setting combine=FALSE
```
Users can re-plot the above figures for specific need by returning a ggplot list object. For example, we plot the spatial heatmap using a common legend.
```{r eval = FALSE, fig.height = 8, fig.width=8.5}
library(ggplot2)
pList <- SpaPlot(seuInt, item='cluster', batch=NULL,point_size=2.5, cols=cols_cluster, combine=FALSE, nrow.legend=7)
pList <- lapply(pList, function(x) x + coord_flip() + scale_x_reverse())
drawFigs(pList, layout.dim = c(2,2), common.legend = TRUE, legend.position = 'right', align='hv')
```
Show the spatial UMAP/tNSE RGB plot to illustrate the performance in extracting features.
```{r eval = FALSE, fig.height = 6, fig.width=5.5}
seuInt <- AddUMAP(seuInt)
p13List <- SpaPlot(seuInt, batch=NULL,item='RGB_UMAP',point_size=2, combine=FALSE, text_size=15)
p13List <- lapply(p13List, function(x) x + coord_flip() + scale_x_reverse())
drawFigs(p13List, layout.dim = c(2,2), common.legend = TRUE, legend.position = 'right', align='hv')
#seuInt <- AddTSNE(seuInt)
#SpaPlot(seuInt, batch=NULL,item='RGB_TSNE',point_size=2, combine=T, text_size=15)
```
Show the tSNE plot based on the extracted features from PRECAST to check the performance of integration.
```{r eval = FALSE, fig.height = 4.5, fig.width=12}
seuInt <- AddTSNE(seuInt, n_comp = 2)
p1 <- dimPlot(seuInt, item='cluster', point_size = 0.5, font_family='serif', cols=cols_cluster,border_col="gray10", nrow.legend=14, legend_pos='right') # Times New Roman
p2 <- dimPlot(seuInt, item='batch', point_size = 0.5, font_family='serif', legend_pos='right')
drawFigs(list(p1, p2), layout.dim = c(1,2), legend.position = 'right', align='hv')
```
Combined differential expression analysis
```{r eval = FALSE}
library(Seurat)
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 5
dat_deg %>%
group_by(cluster) %>%
top_n(n = n, wt = avg_log2FC) -> top10
```
Plot DE genes' heatmap for each spatial domain identified by PRECAST.
```{r eval = FALSE, fig.height = 6, fig.width=8}
library(ggplot2)
## HeatMap
p1 <- DotPlot(seuInt, features = unique(top10$gene), col.min = 0, col.max = 1) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size=8))
p1
```
<details>
<summary>**Session Info**</summary>
```{r}
sessionInfo()
```
</details>