Skip to content

Commit

Permalink
Update vignettes
Browse files Browse the repository at this point in the history
  • Loading branch information
ChristophH committed Mar 20, 2019
1 parent a6ae237 commit 55ff33b
Show file tree
Hide file tree
Showing 20 changed files with 277 additions and 114 deletions.
6 changes: 4 additions & 2 deletions inst/doc/batch_correction.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,10 @@ knitr::opts_chunk$set(
old_theme <- theme_set(theme_classic(base_size=8))

## ------------------------------------------------------------------------
# some of the vst steps can use multiple cores, set number here
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)

# load data and cluster identities, and calculate some cell attributes
cm <- readRDS(file = '~/Projects/data/BipolarCell2016_GSE81904_sparse_matrix.Rds')
Expand Down
6 changes: 4 additions & 2 deletions inst/doc/batch_correction.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,10 @@ In this vignette we show how the regression model in the variance stabilizing tr
We first load the data and transform without using the batch information.

```{r}
# some of the vst steps can use multiple cores, set number here
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)
# load data and cluster identities, and calculate some cell attributes
cm <- readRDS(file = '~/Projects/data/BipolarCell2016_GSE81904_sparse_matrix.Rds')
Expand Down
18 changes: 10 additions & 8 deletions inst/doc/batch_correction.html

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion inst/doc/correcting.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,10 @@ knitr::opts_chunk$set(
old_theme <- theme_set(theme_classic(base_size=8))

## ------------------------------------------------------------------------
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)

cm <- readRDS('~/Projects/data/in-lineage_dropseq_CGE3_digitial_expression.Rds')

Expand Down
5 changes: 4 additions & 1 deletion inst/doc/correcting.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,10 @@ We use data from a recent publication: [Mayer, Hafemeister, Bandler et al., Natu

First load the data and run variance stabilizing transformation.
```{r}
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)
cm <- readRDS('~/Projects/data/in-lineage_dropseq_CGE3_digitial_expression.Rds')
Expand Down
11 changes: 7 additions & 4 deletions inst/doc/correcting.html
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@

<meta name="author" content="Christoph Hafemeister" />

<meta name="date" content="2019-03-14" />
<meta name="date" content="2019-03-20" />

<title>Correcting UMI counts</title>

Expand Down Expand Up @@ -309,7 +309,7 @@

<h1 class="title toc-ignore">Correcting UMI counts</h1>
<h4 class="author"><em>Christoph Hafemeister</em></h4>
<h4 class="date"><em>2019-03-14</em></h4>
<h4 class="date"><em>2019-03-20</em></h4>

</div>

Expand All @@ -319,7 +319,10 @@ <h4 class="date"><em>2019-03-14</em></h4>
<h2>Load data and transform</h2>
<p>We use data from a recent publication: <a href="https://dx.doi.org/10.1038/nature25999">Mayer, Hafemeister, Bandler et al., Nature 2018</a> <a href="http://rdcu.be/JA5l">(free read-only version)</a>. We load a subset of the cells, namely one of the CGE E12.5 dropseq samples with contaminating cell populations removed. These cells come from a developing continuum and provide a nice example for de-noising and count correction.</p>
<p>First load the data and run variance stabilizing transformation.</p>
<pre class="r"><code>options(mc.cores = 4)
<pre class="r"><code># some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)

cm &lt;- readRDS('~/Projects/data/in-lineage_dropseq_CGE3_digitial_expression.Rds')

Expand All @@ -331,7 +334,7 @@ <h2>Load data and transform</h2>
#&gt; Get Negative Binomial regression parameters per gene
#&gt; Using 2000 genes, 4686 cells
#&gt; Second step: Pearson residuals using fitted parameters for 19249 genes
#&gt; Wall clock passed: Time difference of 1.003154 mins</code></pre>
#&gt; Wall clock passed: Time difference of 52.24091 secs</code></pre>
<p>To place the cells on a maturation trajectory, we perform PCA and fit a principal curve to the first two dimensions, similar to the approach used in the original publication.</p>
<pre class="r"><code>pca &lt;- irlba::prcomp_irlba(t(vst_out$y), n = 2)

Expand Down
6 changes: 5 additions & 1 deletion inst/doc/differential_expression.R
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,11 @@ class(pbmc_data)
dim(pbmc_data)

## ---- fig.width=4, fig.height=2.5----------------------------------------
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)

set.seed(43)
vst_out <- sctransform::vst(pbmc_data, latent_var = c('log_umi_per_gene'), return_gene_attr = TRUE, return_cell_attr = TRUE, show_progress = FALSE)

Expand Down
6 changes: 5 additions & 1 deletion inst/doc/differential_expression.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,11 @@ dim(pbmc_data)
`pbmc_data` is a sparse matrix of UMI counts (32,738 genes as rows and 2,638 cells as columns). Perform the variance stabilizing transformation:

```{r, fig.width=4, fig.height=2.5}
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)
set.seed(43)
vst_out <- sctransform::vst(pbmc_data, latent_var = c('log_umi_per_gene'), return_gene_attr = TRUE, return_cell_attr = TRUE, show_progress = FALSE)
```
Expand Down
12 changes: 8 additions & 4 deletions inst/doc/differential_expression.html
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@

<meta name="author" content="Christoph Hafemeister" />

<meta name="date" content="2019-03-14" />
<meta name="date" content="2019-03-20" />

<title>Differential Expression</title>

Expand Down Expand Up @@ -309,7 +309,7 @@

<h1 class="title toc-ignore">Differential Expression</h1>
<h4 class="author"><em>Christoph Hafemeister</em></h4>
<h4 class="date"><em>2019-03-14</em></h4>
<h4 class="date"><em>2019-03-20</em></h4>

</div>

Expand All @@ -329,7 +329,11 @@ <h3>Load some data</h3>
dim(pbmc_data)
#&gt; [1] 32738 2638</code></pre>
<p><code>pbmc_data</code> is a sparse matrix of UMI counts (32,738 genes as rows and 2,638 cells as columns). Perform the variance stabilizing transformation:</p>
<pre class="r"><code>options(mc.cores = 4)
<pre class="r"><code># some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)

set.seed(43)
vst_out &lt;- sctransform::vst(pbmc_data, latent_var = c('log_umi_per_gene'), return_gene_attr = TRUE, return_cell_attr = TRUE, show_progress = FALSE)
#&gt; Calculating cell attributes for input UMI matrix
Expand All @@ -340,7 +344,7 @@ <h3>Load some data</h3>
#&gt; Found 14 outliers - those will be ignored in fitting/regularization step
#&gt; Second step: Pearson residuals using fitted parameters for 12519 genes
#&gt; Calculating gene attributes
#&gt; Wall clock passed: Time difference of 33.44188 secs</code></pre>
#&gt; Wall clock passed: Time difference of 28.50119 secs</code></pre>
<p>Perform differential expression test between the two monocyte clusters.</p>
<pre class="r"><code>res1 &lt;- sctransform::compare_expression(x = vst_out, umi = pbmc_data, group = pbmc_clusters,
val1 = 'CD14+ Monocytes',
Expand Down
37 changes: 29 additions & 8 deletions inst/doc/seurat.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,37 +8,58 @@ knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
digits = 2,
fig.width=8, fig.height=5, dpi=100, out.width = '70%'
fig.width=13, fig.height=6.5, dpi=100, out.width = '95%'
)
library('Seurat')
#old_theme <- theme_set(theme_classic(base_size=8))
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)

## ----eval=FALSE----------------------------------------------------------
# devtools::install_github(repo = 'ChristophH/sctransform', ref = 'develop')
# devtools::install_github(repo = 'satijalab/seurat', ref = 'release/3.0')
# library(Seurat)
# library(sctransform)
# library(ggplot2)

## ----load_data, warning=FALSE, message=FALSE, cache = T------------------
pbmc_data <- Read10X(data.dir = "~/Downloads/pbmc3k_filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc_data)

## ----logtrans, warning=FALSE, message=FALSE, cache = T-------------------
pbmc_logtransform <- pbmc
pbmc_logtransform <- NormalizeData(pbmc_logtransform, verbose = FALSE)
pbmc_logtransform <- FindVariableFeatures(pbmc_logtransform, verbose = FALSE)
pbmc_logtransform <- ScaleData(pbmc_logtransform, verbose = FALSE)
pbmc_logtransform <- RunPCA(pbmc_logtransform, verbose = FALSE)
pbmc_logtransform <- RunUMAP(pbmc_logtransform,dims = 1:20, verbose = FALSE)

## ----apply_sct, warning=FALSE, message=FALSE, cache = T------------------
# Note that this single command replaces NormalizeData, ScaleData, and FindVariableFeatures.
# Transformed data will be available in the SCT assay, which is set as the default after running sctransform
pbmc <- SCTransform(object = pbmc, verbose = FALSE)

## ----pca, fig.width=5, fig.height=5, cache = T---------------------------
## ----pca, cache = T------------------------------------------------------
# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(object = pbmc, verbose = FALSE)
pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)
pbmc <- FindNeighbors(object = pbmc, dims = 1:20, verbose = FALSE)
pbmc <- FindClusters(object = pbmc, verbose = FALSE)
DimPlot(object = pbmc, label = TRUE) + NoLegend()

## ----fplot, fig.width = 10, fig.height=10, cache = F---------------------
# These are now standard steps in the Seurat workflow for visualization and clustering
FeaturePlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4"), pt.size = 0.3)
FeaturePlot(object = pbmc, features = c("S100A4","CCR7","CD4","ISG15"), pt.size = 0.3)
FeaturePlot(object = pbmc, features = c("TCL1A","FCER2","XCL1","FCGR3A"), pt.size = 0.3)
## ----viz, cache = T------------------------------------------------------
pbmc_logtransform$clusterID <- Idents(pbmc)
Idents(pbmc_logtransform) <- 'clusterID'
plot1 <- DimPlot(object = pbmc, label = TRUE) + NoLegend() + ggtitle('sctransform')
plot2 <- DimPlot(object = pbmc_logtransform, label = TRUE)
plot2 <- plot2 + NoLegend() + ggtitle('Log-normalization')
CombinePlots(list(plot1,plot2))

## ----fplot, cache = T----------------------------------------------------
VlnPlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4","S100A4","CCR7","ISG15","CD3D"), pt.size = 0.2,ncol = 4)

## ----fplot2, cache = T---------------------------------------------------
FeaturePlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4","S100A4","CCR7"), pt.size = 0.2,ncol = 3)

## ----fplot3, cache = T---------------------------------------------------
FeaturePlot(object = pbmc, features = c("CD3D","ISG15","TCL1A","FCER2","XCL1","FCGR3A"), pt.size = 0.2,ncol = 3)

74 changes: 59 additions & 15 deletions inst/doc/seurat.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -21,48 +21,92 @@ knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
digits = 2,
fig.width=8, fig.height=5, dpi=100, out.width = '70%'
fig.width=13, fig.height=6.5, dpi=100, out.width = '95%'
)
library('Seurat')
#old_theme <- theme_set(theme_classic(base_size=8))
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)
```

This vignette shows how to use the sctransform wrapper in Seurat.
Install sctransform and Seurat v3

Install sctransform and Seurat v3.

```{r eval=FALSE}
devtools::install_github(repo = 'ChristophH/sctransform', ref = 'develop')
devtools::install_github(repo = 'satijalab/seurat', ref = 'release/3.0')
library(Seurat)
library(sctransform)
library(ggplot2)
```

Load data and create Seurat object

```{r load_data, warning=FALSE, message=FALSE, cache = T}
pbmc_data <- Read10X(data.dir = "~/Downloads/pbmc3k_filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc_data)
```
Apply sctransform normalization

For reference, we first apply the standard Seurat workflow, with log-normalization

```{r logtrans, warning=FALSE, message=FALSE, cache = T}
pbmc_logtransform <- pbmc
pbmc_logtransform <- NormalizeData(pbmc_logtransform, verbose = FALSE)
pbmc_logtransform <- FindVariableFeatures(pbmc_logtransform, verbose = FALSE)
pbmc_logtransform <- ScaleData(pbmc_logtransform, verbose = FALSE)
pbmc_logtransform <- RunPCA(pbmc_logtransform, verbose = FALSE)
pbmc_logtransform <- RunUMAP(pbmc_logtransform,dims = 1:20, verbose = FALSE)
```

For comparison, we now apply sctransform normalization

```{r apply_sct, warning=FALSE, message=FALSE, cache = T}
# Note that this single command replaces NormalizeData, ScaleData, and FindVariableFeatures.
# Transformed data will be available in the SCT assay, which is set as the default after running sctransform
pbmc <- SCTransform(object = pbmc, verbose = FALSE)
```

Perform dimensionality reduction by PCA and UMAP embedding
```{r pca, fig.width=5, fig.height=5, cache = T}

```{r pca, cache = T}
# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(object = pbmc, verbose = FALSE)
pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)
pbmc <- FindNeighbors(object = pbmc, dims = 1:20, verbose = FALSE)
pbmc <- FindClusters(object = pbmc, verbose = FALSE)
DimPlot(object = pbmc, label = TRUE) + NoLegend()
```
Users can individually annotate clusters based on canonical markers. However, the sctransform normalization reveals sharper biological distinctions compared to the [standard Seurat workflow](https://satijalab.org/seurat/pbmc3k_tutorial.html), in a few ways:
* Clear separation of three CD8 T cell populations (naive, memory, effector), based on CD8A, GZMK, CCL5, GZMK expression
* Clear separation of three CD4 T cell populations (naive, memory, IFN-activated) based on S100A4, CCR7, IL32, and ISG15
* Additional developmental sub-structure in B cell cluster, based on TCL1A, FCER2
* Additional separation of NK cells into CD56dim vs. bright clusters, based on XCL1 and FCGR3A
```{r fplot, fig.width = 10, fig.height=10, cache = F}
# These are now standard steps in the Seurat workflow for visualization and clustering
FeaturePlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4"), pt.size = 0.3)
FeaturePlot(object = pbmc, features = c("S100A4","CCR7","CD4","ISG15"), pt.size = 0.3)
FeaturePlot(object = pbmc, features = c("TCL1A","FCER2","XCL1","FCGR3A"), pt.size = 0.3)

Visualize the clustering results on the sctransform and log-normalized embeddings.
```{r viz, cache = T}
pbmc_logtransform$clusterID <- Idents(pbmc)
Idents(pbmc_logtransform) <- 'clusterID'
plot1 <- DimPlot(object = pbmc, label = TRUE) + NoLegend() + ggtitle('sctransform')
plot2 <- DimPlot(object = pbmc_logtransform, label = TRUE)
plot2 <- plot2 + NoLegend() + ggtitle('Log-normalization')
CombinePlots(list(plot1,plot2))
```

Users can individually annotate clusters based on canonical markers. However, the sctransform normalization reveals sharper biological distinctions compared to the log-normalized analysis. For example, note how clusters 0, 1, 9, and 11 (all distinct clusters of T cells), are blended together in log-normalized analyses. The sctransform analysis reveals:


* Clear separation of three CD8 T cell clusters (naive, memory, effector), based on CD8A, GZMK, CCL5, GZMK expression
* Clear separation of three CD4 T cell clusters (naive, memory, IFN-activated) based on S100A4, CCR7, IL32, and ISG15
* Additional developmental sub-structure in B cell cluster, based on TCL1A, FCER2
* Additional sub-structure within NK cell cluster (CD56dim vs. bright), based on XCL1 and FCGR3A

Visualize canonical marker genes as violin plots.

```{r fplot, cache = T}
VlnPlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4","S100A4","CCR7","ISG15","CD3D"), pt.size = 0.2,ncol = 4)
```

Visualize canonical marker genes on the sctransform embedding.

```{r fplot2, cache = T}
FeaturePlot(object = pbmc, features = c("CD8A","GZMK","CCL5","S100A4","S100A4","CCR7"), pt.size = 0.2,ncol = 3)
```

```{r fplot3, cache = T}
FeaturePlot(object = pbmc, features = c("CD3D","ISG15","TCL1A","FCER2","XCL1","FCGR3A"), pt.size = 0.2,ncol = 3)
```
54 changes: 35 additions & 19 deletions inst/doc/seurat.html

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion inst/doc/variance_stabilizing_transformation.R
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,10 @@ ggplot(cell_attr, aes(n_umi, n_gene)) +
geom_density_2d(size = 0.3)

## ---- fig.width=4, fig.height=2.5----------------------------------------
options(mc.cores = 4)
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)

set.seed(44)
vst_out <- sctransform::vst(pbmc_data, latent_var = c('log_umi'), return_gene_attr = TRUE, return_cell_attr = TRUE, show_progress = FALSE)
sctransform::plot_model_pars(vst_out)
Expand Down
5 changes: 4 additions & 1 deletion inst/doc/variance_stabilizing_transformation.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -93,7 +93,10 @@ Here the other variables can be used to model the differences in sequencing dept
The `vst` function estimates model parameters and performs the variance stabilizing transformation. Here we use the log10 of the total UMI counts of a cell as variable for sequencing depth for each cell. After data transformation we plot the model parameters as a function of gene mean (geometric mean).

```{r, fig.width=4, fig.height=2.5}
options(mc.cores = 4)
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)
set.seed(44)
vst_out <- sctransform::vst(pbmc_data, latent_var = c('log_umi'), return_gene_attr = TRUE, return_cell_attr = TRUE, show_progress = FALSE)
sctransform::plot_model_pars(vst_out)
Expand Down
57 changes: 30 additions & 27 deletions inst/doc/variance_stabilizing_transformation.html

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions vignettes/batch_correction.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,10 @@ In this vignette we show how the regression model in the variance stabilizing tr
We first load the data and transform without using the batch information.

```{r}
# some of the vst steps can use multiple cores, set number here
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)
# load data and cluster identities, and calculate some cell attributes
cm <- readRDS(file = '~/Projects/data/BipolarCell2016_GSE81904_sparse_matrix.Rds')
Expand Down
5 changes: 4 additions & 1 deletion vignettes/correcting.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,10 @@ We use data from a recent publication: [Mayer, Hafemeister, Bandler et al., Natu

First load the data and run variance stabilizing transformation.
```{r}
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)
cm <- readRDS('~/Projects/data/in-lineage_dropseq_CGE3_digitial_expression.Rds')
Expand Down
6 changes: 5 additions & 1 deletion vignettes/differential_expression.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,11 @@ dim(pbmc_data)
`pbmc_data` is a sparse matrix of UMI counts (32,738 genes as rows and 2,638 cells as columns). Perform the variance stabilizing transformation:

```{r, fig.width=4, fig.height=2.5}
options(mc.cores = 4)
# some of the vst steps can use multiple cores
# We use the Future API for parallel processing; set parameters here
future::plan(strategy = 'multicore', workers = 4)
options(future.globals.maxSize = 10 * 1024 ^ 3)
set.seed(43)
vst_out <- sctransform::vst(pbmc_data, latent_var = c('log_umi_per_gene'), return_gene_attr = TRUE, return_cell_attr = TRUE, show_progress = FALSE)
```
Expand Down

0 comments on commit 55ff33b

Please sign in to comment.