/
FunctionalAndStructuralPipeline.Rmd
430 lines (309 loc) · 15.6 KB
/
FunctionalAndStructuralPipeline.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
---
title: "Learning functional and structural spatial relationships with MISTy"
author:
- name: Leoni Zimmermann
affiliation:
- Heidelberg University, Heidelberg, Germany
- name: Jovan Tanevski
affiliation:
- Heidelberg University and Heidelberg University Hospital, Heidelberg, Germany
- Jožef Stefan Institute, Ljubljana, Slovenia
email: jovan.tanevski@uni-heidelberg.de
date: "`r Sys.Date()`"
package: mistyR
output:
rmarkdown::pdf_document:
df_print: kable
extra_dependencies:
nowidow: ["defaultlines=3", "all"]
vignette: >
%\VignetteIndexEntry{Learning functional and structural spatial relationships with MISTy}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
## Introduction
10X Visium captures spatially resolved transcriptomic profiles in spots containing multiple cells. In this vignette, we will use the gene expression information from Visium data to infer pathway and transcription factor activity and separately investigate spatial relationships between them and the cell-type composition. In addition, we will examine spatial relationships of ligands and receptors.
Load the necessary R packages:
```{r message=FALSE, warning=FALSE}
# MISTy
library(mistyR)
# For using Python
library(reticulate)
# Seurat
library(Seurat)
# Data manipulation
library(tidyverse)
# Pathways
library(decoupleR)
#Cleaning names
library(janitor)
```
We will use some functions in python since the computation time is significantly shorter than in R. Python chunks start with a #In Python. Install and load the necessary package for Python:
```{r}
py_install(c("decoupler","omnipath"), pip =TRUE)
```
```{python}
# In Python:
import decoupler as dc
```
## Get and load data
For this showcase, we use a 10X Visium spatial slide from [Kuppe et al., 2022](https://doi.org/10.1038/s41586-022-05060-x), where they created a spatial multi-omic map of human myocardial infarction. The tissue example data comes from the human heart of patient 14, which is in a chronic state following myocardial infarction. The Seurat object contains, among other things, the normalized and raw gene counts. First, we have to download and extract the file:
```{r}
download.file("https://zenodo.org/records/6580069/files/10X_Visium_ACH005.tar.gz?download=1",
destfile = "10X_Visium_ACH005.tar.gz", method = "curl")
untar("10X_Visium_ACH005.tar.gz")
```
The next step is to load the data, extract the normalized gene counts of genes expressed in at least 5% of the spots, and pixel coordinates. It is recommended to use pixel coordinates instead of row and column numbers since the rows are shifted and therefore do not express the real distance between the spots.
```{r}
seurat <- readRDS("ACH005/ACH005.rds")
expression_raw <- as.matrix(GetAssayData(seurat, layer = "counts", assay = "SCT"))
geometry <- GetTissueCoordinates(seurat, scale = NULL)
# Only take genes that expressed in at least 5% of the spots
expression <- expression_raw[rownames(expression_raw[(rowSums(expression_raw > 0) / ncol(expression_raw)) >= 0.05,]),]
```
Let's take a look at the slide itself and some of the cell-type niches defined by [Kuppe et al.](https://doi.org/10.1038/s41586-022-05060-x):
```{r}
SpatialPlot(seurat, alpha = 0)
SpatialPlot(seurat, group.by = "celltype_niche")
```
## Extract cell-type composition
The Seurat Object of the tissue slide also contains the estimated cell type proportions from cell2location. We extract them into a separate object we will later use with MISTy and visualize some of the cell types:
```{r fig.height=10, fig.width=10}
# Rename to more informative names
rownames(seurat@assays$c2l_props@data) <- rownames(seurat@assays$c2l_props@data) %>%
recode('Adipo' = 'Adipocytes',
'CM' = 'Cardiomyocytes',
'Endo' = 'Endothelial',
'Fib' = 'Fibroblasts',
'PC' = 'Pericytes',
'prolif' = 'Proliferating',
'vSMCs' = 'Vascular-SMCs')
# Extract into a separate object
composition <- as_tibble(t(seurat[["c2l_props"]]$data))
# Visualize cell types
DefaultAssay(seurat) <- "c2l_props"
SpatialFeaturePlot(seurat,
keep.scale = NULL,
features = c('Vascular-SMCs', "Cardiomyocytes", "Endothelial", "Fibroblasts"),
ncol = 2)
```
## Pathway activities on cell-type composition
Let's investigate the relationship between the cell-type compositions and pathway activities in our example slide. But before we create the views, we need to estimate the pathway activities. For this we will take pathway gene sets from [`PROGENy`](https://saezlab.github.io/progeny/index.html) and estimate the activity with [`decoupleR`](https://saezlab.github.io/decoupleR/):
```{r}
# Obtain genesets
model <- get_progeny(organism = "human", top = 500)
# Use multivariate linear model to estimate activity
est_path_act <- run_mlm(expression, model,.mor = NULL)
```
We add the result to the Seurat Object and plot the estimated activities to see the distribution over the slide:
```{r message=FALSE, warning=FALSE}
# Delete progeny assay from Kuppe et al.
seurat[['progeny']] <- NULL
# Put estimated pathway activities object into the correct format
est_path_act_wide <- est_path_act %>%
pivot_wider(id_cols = condition, names_from = source, values_from = score) %>%
column_to_rownames("condition")
# Clean names
colnames(est_path_act_wide) <- est_path_act_wide %>%
clean_names(parsing_option = 0) %>%
colnames(.)
# Add
seurat[['progeny']] <- CreateAssayObject(counts = t(est_path_act_wide))
SpatialFeaturePlot(seurat, features = c("jak.stat", "hypoxia"), image.alpha = 0)
```
### MISTy Views
For the MISTy view, we will use cell type compositions per spot as the intraview and add the estimated [`PROGENy`](https://saezlab.github.io/progeny/index.html) pathway activities as juxta and paraviews. The size of the neighborhood and the kernel, as well as the kernel family, should be chosen depending on the experiment. Here both distances were chosen to enclose only a small number of neighboring spots.
```{r message=FALSE, warning=FALSE}
# Clean names
colnames(composition) <- composition %>% clean_names(parsing_option = 0) %>% colnames(.)
# create intra from cell-type composition
comp_views <- create_initial_view(composition)
# juxta & para from pathway activity
path_act_views <- create_initial_view(est_path_act_wide) %>%
add_juxtaview(geometry, neighbor.thr = 130) %>%
add_paraview(geometry, l= 200, family = "gaussian")
# Combine views
com_path_act_views <- comp_views %>%
add_views(create_view("juxtaview.path.130", path_act_views[["juxtaview.130"]]$data, "juxta.path.130"))%>%
add_views(create_view("paraview.path.200", path_act_views[["paraview.200"]]$data, "para.path.200"))
```
Then run MISTy and collect the results:
```{r message=FALSE, warning=FALSE}
run_misty(com_path_act_views, "result/comp_path_act")
misty_results_com_path_act <- collect_results("result/comp_path_act/")
```
### Downstream analysis
With the collected results, we can now answer the following questions:
#### 1. To what extent can the analyzed surrounding tissues' pathway activities explain the cell-type composition of the spot compared to the intraview?
Here we can look at two different statistics: `intra.R2` shows the variance explained by the intraview alone, and `gain.R2` shows the increase in explainable variance when we additionally consider the other views (here juxta and para).
```{r}
misty_results_com_path_act %>%
plot_improvement_stats("intra.R2")%>%
plot_improvement_stats("gain.R2")
```
The juxta and paraview particularly increase the explained variance for mast cells and adipocytes.
In general, the significant gain in R2 can be interpreted as the following:
"We can better explain the expression of marker X when we consider additional views other than the intrinsic view."
To see the individual contributions of the views we can use:
```{r}
misty_results_com_path_act %>%
plot_view_contributions()
```
We see, that the intraview explains the most variance for nearly all cell types (as expected).
#### 2. What are the specific relations that can explain the cell-type composition?
We can individually show the importance of the markers from each viewpoint as predictors of the spot intrinsic cell-type composition to explain the contributions.
Let's look at the juxtaview:
```{r}
misty_results_com_path_act %>%
plot_interaction_heatmap("juxta.path.130", clean = TRUE)
```
We observe that TNFa is a significant predictor for adipocytes. We can compare their distributions:
```{r message=FALSE, warning=FALSE}
SpatialFeaturePlot(seurat, features = "tnfa", image.alpha = 0)
DefaultAssay(seurat) <- "c2l_props"
SpatialFeaturePlot(seurat, features = "Adipocytes", image.alpha = 0)
```
We observe similar distributions for both.
## Pathway activities on cell-type composition - Linear Model
The default model used by MISTy to model each view is the random forest. However, there are different models to choose from, like the faster and more interpretable linear model.
Another option we haven't used yet is `bypass.intra`. With this, we bypass training the baseline model that predicts the intraview with features from the intraview itself. We will still be able to see how the other views explain the intraview. We will use the same view composition as before:
```{r message=FALSE, warning=FALSE}
run_misty(com_path_act_views, "result/comp_path_act_linear", model.function = linear_model, bypass.intra = TRUE)
misty_results_com_path_act_linear <- collect_results("result/comp_path_act_linear")
```
### Downstream analysis
Let's check again the `gain.R2` and view contributions:
```{r}
misty_results_com_path_act_linear %>%
plot_improvement_stats("gain.R2") %>%
plot_view_contributions()
```
For the specific target-predictor interaction, we look again at the juxtaview:
```{r}
misty_results_com_path_act_linear %>%
plot_interaction_heatmap("juxta.path.130", clean = TRUE)
```
Visualize the activity of the JAK-STAT pathway and myeloid distribution:
```{r}
SpatialFeaturePlot(seurat, features = "jak.stat", image.alpha = 0)
DefaultAssay(seurat) <- "c2l_props"
SpatialFeaturePlot(seurat, features = "Myeloid", image.alpha = 0)
```
## Pathway activities and Transcriptionfactors on cell-type composition
In addition to the estimated pathway activities, we can also add a view to examine the relationship between cell-type composition and TF activity. First, we need to estimate the TF activity with decoupler. It is recommended to compute it with Python, as it is significantly faster:
```{r}
expression_df <- as.data.frame(t(expression))
```
```{python}
# In Python:
net = dc.get_collectri()
acts_tfs= dc.run_ulm(
mat = r.expression_df,
net = net,
verbose = True,
use_raw = False,
)
```
The object with the estimation contains two elements: The first are the estimates and their respective p-values can be found in the second element.
```{r}
est_TF <- py$acts_tfs
```
To speed up the following model training, we calculate the 1000 most variable genes expressed. We then extract the TF from the highly variable genes to create a MISTy view.
```{r}
# Highly variable genes
hvg <- FindVariableFeatures(expression, selection.method = "vst", nfeatures = 1000) %>%
filter(variable == TRUE)
hvg_expr <- expression[rownames(hvg), ]
# Extract TF from the highly variable genes
hvg_TF<- est_TF[[1]][, colnames(est_TF[[1]]) %in% rownames(hvg_expr)]
```
### Misty Views
We will combine the intraview from the cell-type composition and paraviews from the estimated pathway and TF activities:
```{r message=FALSE, warning=FALSE}
TF_view <- create_initial_view(hvg_TF) %>%
add_paraview(geometry, l = 200) # This may still take some time
# Combine Views
comp_TF_path_views <- comp_views %>% add_views(create_view("paraview.TF.200", TF_view[["paraview.200"]]$data, "para.TF.200")) %>%
add_views(create_view("paraview.path.200", path_act_views[["paraview.200"]]$data, "para.path.200"))
# Run Misty
run_misty(comp_TF_path_views, "result/comp_TF_path", model.function = linear_model, bypass.intra = TRUE)
misty_results_comp_TF_pathway <- collect_results("result/comp_TF_path")
```
### Downstream analysis
```{r}
misty_results_comp_TF_pathway %>%
plot_improvement_stats("gain.R2") %>%
plot_view_contributions()
```
When plotting the interaction heatmap, we can restrict the result by applying a `trim`, that only shows targets above a defined value for a chosen metric like `gain.R2`.
```{r}
misty_results_comp_TF_pathway %>%
plot_interaction_heatmap("para.TF.200",
clean = TRUE,
trim.measure = "gain.R2",
trim = 20)
```
The TF MYC is an important predictor of fibroblasts:
```{r}
DefaultAssay(seurat) <- "SCT"
SpatialFeaturePlot(seurat, features = "MYC", image.alpha = 0)
DefaultAssay(seurat) <- "c2l_props"
SpatialFeaturePlot(seurat, features = "Fibroblasts", image.alpha = 0)
```
Indeed, we can see a similar distribution.
## Ligand-Receptor
Finally, we want to learn about the spatial relationship of receptors and ligands on the tissue slide. We will access the consensus resource from [`LIANA`](https://liana-py.readthedocs.io/en/latest/) after downloading it from Github, pulling out the ligands and receptors from the before-determined highly variable genes:
```{r message=FALSE, warning=FALSE}
download.file("https://raw.githubusercontent.com/saezlab/liana-py/main/liana/resource/omni_resource.csv",
destfile = "omni_resource.csv", method = "curl")
# Ligand Receptor Resource
omni_resource <- read_csv("omni_resource.csv")%>%
filter(resource == "consensus")
# Get highly variable ligands
ligands <- omni_resource %>%
pull(source_genesymbol) %>%
unique()
hvg_lig <- hvg_expr[rownames(hvg_expr) %in% ligands,]
# Get highly variable receptors
receptors <- omni_resource %>%
pull(target_genesymbol) %>%
unique()
hvg_recep <- hvg_expr[rownames(hvg_expr) %in% receptors,]
# Clean names
rownames(hvg_lig) <- hvg_lig %>%
clean_names(parsing_option = 0) %>%
rownames(.)
rownames(hvg_recep) <- hvg_recep %>% clean_names(parsing_option = 0) %>%
rownames(.)
```
### Misty Views
We are going to create a combined view with the receptors in the intraview as targets and the ligands in the paraview as predictors:
```{r message=FALSE, warning=FALSE}
# Create views and combine them
receptor_view <- create_initial_view(as.data.frame(t(hvg_recep)))
ligand_view <- create_initial_view(as.data.frame(t(hvg_lig))) %>%
add_paraview(geometry, l = 200, family = "gaussian")
lig_recep_view <- receptor_view %>% add_views(create_view("paraview.ligand.200", ligand_view[["paraview.200"]]$data, "para.lig.200"))
run_misty(lig_recep_view, "results/lig_recep", bypass.intra = TRUE)
misty_results_lig_recep <- collect_results("results/lig_recep")
```
### Downstream analysis
Let's look at important interactions. An additional way to reduce the number of interactions shown in the heatmap is applying a `cutoff`, that introduces an importance threshold:
```{r}
misty_results_lig_recep %>%
plot_interaction_heatmap("para.lig.200", clean = TRUE, cutoff = 2, trim.measure ="gain.R2", trim = 10)
```
Remember that MISTy does not only infer interactions between ligands and their respective receptor, but rather all possible interactions between ligands and receptors. We can visualize one of the interactions with high importance:
```{r}
DefaultAssay(seurat) <- "SCT"
SpatialFeaturePlot(seurat, features = "CRLF1", image.alpha = 0)
SpatialFeaturePlot(seurat, features = "COMP", image.alpha = 0)
```
The plots show a co-occurrence of the ligand and receptor, although they are not an annotated receptor-ligand pair.
## See also
`browseVignettes("mistyR")`
## Session Info
Here is the output of `sessionInfo()` at the point when this document was compiled.
```{r}
sessionInfo()
```