/
quickstart.Rmd
177 lines (127 loc) · 5.35 KB
/
quickstart.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
---
title: "Quick start to Harmony"
author: "Korsunsky et al.: Fast, sensitive, and accurate integration of single
cell data with Harmony"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Quick start to Harmony}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, echo=FALSE}
library(ggplot2)
colors_use <- c(`jurkat` = '#810F7C', `t293` = '#D09E2D',`half` = '#006D2C')
do_scatter <- function(umap_use, meta_data, label_name, no_guides = TRUE,
do_labels = TRUE, nice_names,
palette_use = colors_use,
pt_size = 4, point_size = .5, base_size = 12,
do_points = TRUE, do_density = FALSE, h = 6, w = 8) {
umap_use <- umap_use[, 1:2]
colnames(umap_use) <- c('X1', 'X2')
plt_df <- umap_use %>% data.frame() %>%
cbind(meta_data) %>%
dplyr::sample_frac(1L)
plt_df$given_name <- plt_df[[label_name]]
if (!missing(nice_names)) {
plt_df %<>%
dplyr::inner_join(nice_names, by = "given_name") %>%
subset(nice_name != "" & !is.na(nice_name))
plt_df[[label_name]] <- plt_df$nice_name
}
plt <- plt_df %>%
ggplot2::ggplot(aes_string("X1", "X2", col = label_name, fill = label_name)) +
theme_test(base_size = base_size) +
theme(panel.background = element_rect(fill = NA, color = "black")) +
guides(color = guide_legend(override.aes = list(stroke = 1, alpha = 1,
shape = 16, size = 4)),
alpha = FALSE) +
scale_color_manual(values = palette_use) +
scale_fill_manual(values = palette_use) +
theme(plot.title = element_text(hjust = .5)) +
labs(x = "PC 1", y = "PC 2")
if (do_points)
plt <- plt + geom_point(shape = '.')
if (do_density)
plt <- plt + geom_density_2d()
if (no_guides)
plt <- plt + guides(col = FALSE, fill = FALSE, alpha = FALSE)
if (do_labels) {
data_labels <- plt_df %>%
dplyr::group_by_(label_name) %>%
dplyr::summarise(X1 = mean(X1), X2 = mean(X2)) %>%
dplyr::ungroup()
plt <- plt + geom_label(data = data_labels, label.size = NA,
aes_string(label = label_name),
color = "white", size = pt_size, alpha = 1,
segment.size = 0) +
guides(col = FALSE, fill = FALSE)
}
return(plt)
}
```
# Introduction
Harmony is an algorithm for performing integration of single cell genomics
datasets. Please check out our latest
[manuscript on Nature Methods](https://www.nature.com/articles/s41592-019-0619-0).
![](main.jpg){width=100%}
# Installation
Install Harmony from CRAN with standard commands.
```{r eval=FALSE}
install.packages('harmony')
```
Once Harmony is installed, load it up!
```{r}
library(harmony)
```
# Integrating cell line datasets from 10X
The example below follows Figure 2 in the manuscript.
We downloaded 3 cell line datasets from the 10X website. The first two (jurkat
and 293t) come from pure cell lines while the *half* dataset is a 50:50
mixture of Jurkat and HEK293T cells. We inferred cell type with the canonical
marker XIST, since the two cell lines come from 1 male and 1 female donor.
* support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/jurkat
* support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/293t
* support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/jurkat:293t_50:50
We library normalized the cells, log transformed the counts, and scaled the
genes. Then we performed PCA and kept the top 20 PCs. The PCA embeddings and
meta data are available as part of this package.
```{r}
data(cell_lines)
V <- cell_lines$scaled_pcs
meta_data <- cell_lines$meta_data
```
Initially, the cells cluster by both dataset (left) and cell type (right).
```{r, fig.width=5, fig.height=3, fig.align="center"}
p1 <- do_scatter(V, meta_data, 'dataset') +
labs(title = 'Colored by dataset')
p2 <- do_scatter(V, meta_data, 'cell_type') +
labs(title = 'Colored by cell type')
cowplot::plot_grid(p1, p2)
```
Let's run Harmony to remove the influence of dataset-of-origin from the cell
embeddings.
```{r}
harmony_embeddings <- harmony::RunHarmony(
V, meta_data, 'dataset', verbose=FALSE
)
```
After Harmony, the datasets are now mixed (left) and the cell types are still
separate (right).
```{r, fig.width=5, fig.height=3, fig.align="center"}
p1 <- do_scatter(harmony_embeddings, meta_data, 'dataset') +
labs(title = 'Colored by dataset')
p2 <- do_scatter(harmony_embeddings, meta_data, 'cell_type') +
labs(title = 'Colored by cell type')
cowplot::plot_grid(p1, p2, nrow = 1)
```
# Next Steps
## Interfacing to software packages
You can also run Harmony as part of an established pipeline in several packages, such as Seurat and SingleCellExperiment. For these vignettes, please [visit our website](https://portals.broadinstitute.org/harmony/).
## Detailed breakdown of the Harmony algorithm
For more details on how each part of Harmony works, consult our more detailed
[vignette](https://portals.broadinstitute.org/harmony/advanced.html)
"Detailed Walkthrough of Harmony Algorithm".
# Session Info
```{r}
sessionInfo()
```