---
title: "Dimensionality reduction"
author: "Timothy Keyes"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
description: >
  Read this vignette to learn how to visualize single-cell phenotypes in
  low-dimensional space using dimensionality reduction algorithms (PCA, UMAP, tSNE).
vignette: >
  %\VignetteIndexEntry{Dimensionality reduction}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup, message = FALSE}
library(tidytof)
library(dplyr)
library(ggplot2)
```
A useful tool for visualizing the phenotypic relationships between single cells and clusters of cells is dimensionality reduction, a form of unsupervised machine learning used to represent high-dimensional datasets in a smaller number of dimensions.
`{tidytof}` includes several dimensionality reduction algorithms commonly used by biologists: principal component analysis (PCA), t-distributed stochastic neighbor embedding (tSNE), and uniform manifold approximation and projection (UMAP). To apply these to a dataset, use `tof_reduce_dimensions()`.
## Dimensionality reduction with `tof_reduce_dimensions()`
Here is an example call to `tof_reduce_dimensions()` in which we use tSNE to visualize data in `{tidytof}`'s built-in `phenograph_data` dataset.
```{r}
data(phenograph_data)

# perform the dimensionality reduction
phenograph_tsne <-
  phenograph_data |>
  tof_preprocess() |>
  tof_reduce_dimensions(method = "tsne")

# select only the tsne embedding columns
phenograph_tsne |>
  select(contains("tsne")) |>
  head()
```
By default, `tof_reduce_dimensions()` adds the reduced-dimension feature embeddings to the input `tof_tbl` and returns the augmented `tof_tbl` (that is, a `tof_tbl` with a new column for each embedding dimension). To return only the feature embeddings themselves, set `augment` to `FALSE` (as in `tof_cluster()`).
```{r}
phenograph_data |>
  tof_preprocess() |>
  tof_reduce_dimensions(method = "tsne", augment = FALSE)
```
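As a rough sketch of what `augment` controls, the chunk below rebuilds an augmented table by hand from the embeddings-only output. It assumes (this is not stated above) that the embeddings come back with one row per input cell, in the same order as the input. PCA is used here only because it is deterministic; the names `preprocessed`, `pca_embeddings`, and `augmented_by_hand` are illustrative.

```{r}
library(tidytof)
library(dplyr)

data(phenograph_data)

# preprocess once so both tables describe the same cells
preprocessed <-
  phenograph_data |>
  tof_preprocess()

# embeddings only: one column per embedding dimension
pca_embeddings <-
  preprocessed |>
  tof_reduce_dimensions(method = "pca", augment = FALSE)

# assumption: rows are returned in input order, one per cell
stopifnot(nrow(pca_embeddings) == nrow(preprocessed))

# bind the embedding columns back onto the cells by position
augmented_by_hand <- bind_cols(preprocessed, pca_embeddings)

# every embedding column should now appear in the combined table
all(colnames(pca_embeddings) %in% colnames(augmented_by_hand))
```

In practice, leaving `augment` at its default is simpler; the manual version is mainly useful for seeing that the two outputs differ only in which columns are kept.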
Changing the `method` argument results in different low-dimensional embeddings:
```{r}
phenograph_data |>
  tof_reduce_dimensions(method = "umap", augment = FALSE)

phenograph_data |>
  tof_reduce_dimensions(method = "pca", augment = FALSE)
```
## Method specifications for `tof_reduce_*()` functions
`tof_reduce_dimensions()` provides a high-level API for three lower-level functions: `tof_reduce_pca()`, `tof_reduce_umap()`, and `tof_reduce_tsne()`. The help file for each function describes the arguments specific to its algorithm. For example, `tof_reduce_pca()` takes the `num_comp` argument to determine how many principal components should be returned:
```{r}
# 2 principal components
phenograph_data |>
  tof_reduce_pca(num_comp = 2)
```
```{r}
# 3 principal components
phenograph_data |>
  tof_reduce_pca(num_comp = 3)
```
See `?tof_reduce_pca`, `?tof_reduce_umap`, and `?tof_reduce_tsne` for additional details.
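To make the idea of "number of components" concrete, here is a minimal sketch using base R's `stats::prcomp()` on simulated data. It is independent of `{tidytof}` and is not meant to show how `tof_reduce_pca()` is implemented; `toy_cells` and `keep_components()` are hypothetical names introduced only for this illustration.

```{r}
set.seed(2024)

# toy data: 200 hypothetical cells x 5 hypothetical markers
toy_cells <- matrix(
  rnorm(200 * 5),
  nrow = 200,
  ncol = 5,
  dimnames = list(NULL, paste0("marker_", 1:5))
)

# fit PCA on centered, scaled markers
pca_fit <- prcomp(toy_cells, center = TRUE, scale. = TRUE)

# keeping k components means keeping the first k columns of the scores
keep_components <- function(fit, k) {
  fit$x[, seq_len(k), drop = FALSE]
}

scores_2 <- keep_components(pca_fit, 2)
scores_3 <- keep_components(pca_fit, 3)

dim(scores_2)
dim(scores_3)

# asking for more components does not change the earlier ones
all.equal(scores_2, scores_3[, 1:2])
```

Because principal components are ordered by the variance they explain, requesting a third component in this sketch only appends a column; the first two are unchanged.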
## Visualization using `tof_plot_cells_embedding()`
Regardless of the method used, reduced-dimension feature embeddings can be visualized using `{ggplot2}` (or any graphics package). `{tidytof}` also provides some helper functions for easily generating dimensionality reduction plots from a `tof_tbl` or tibble with columns representing embedding dimensions:
```{r}
# plot the tsne embeddings using color to distinguish between clusters
phenograph_tsne |>
  tof_plot_cells_embedding(
    embedding_cols = contains(".tsne"),
    color_col = phenograph_cluster
  )

# plot the tsne embeddings using color to represent CD11b expression
phenograph_tsne |>
  tof_plot_cells_embedding(
    embedding_cols = contains(".tsne"),
    color_col = cd11b
  ) +
  ggplot2::scale_fill_viridis_c()
```
Such visualizations can help describe, qualitatively, the phenotypic differences between the clusters in a dataset. In the plots above, for instance, one of the clusters has high CD11b expression, whereas the others have lower CD11b expression.
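Because the embeddings are ordinary columns, a similar plot can be built with `{ggplot2}` directly. The chunk below is one possible sketch, not the code behind `tof_plot_cells_embedding()`. It looks up the embedding column names rather than hard-coding them, and assumes the tSNE result contains at least two embedding columns; `tsne_tbl`, `tsne_cols`, and `tsne_plot` are illustrative names.

```{r}
library(tidytof)
library(dplyr)
library(ggplot2)

data(phenograph_data)

# recompute the embedding so this chunk stands on its own
tsne_tbl <-
  phenograph_data |>
  tof_preprocess() |>
  tof_reduce_dimensions(method = "tsne")

# look up the embedding column names instead of hard-coding them
tsne_cols <-
  tsne_tbl |>
  select(contains("tsne")) |>
  colnames()

# assumption: the first two embedding columns are the plot axes
tsne_plot <-
  tsne_tbl |>
  ggplot(aes(
    x = .data[[tsne_cols[1]]],
    y = .data[[tsne_cols[2]]],
    color = phenograph_cluster
  )) +
  geom_point(size = 0.5, alpha = 0.6) +
  labs(x = tsne_cols[1], y = tsne_cols[2], color = "cluster") +
  theme_bw()

tsne_plot
```

Building the plot by hand gives full control over geoms, scales, and facets, at the cost of a few more lines than the helper function.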
# Session info
```{r}
sessionInfo()
```