Commit

Update 10x scRNA preprocessing tutorial
pavanvidem committed May 18, 2023
1 parent 62154c9 commit 5d8104e
Showing 4 changed files with 493 additions and 622 deletions.
19 changes: 9 additions & 10 deletions topics/single-cell/tutorials/scrna-preprocessing-tenx/tutorial.md
@@ -42,6 +42,7 @@ contributors:
 - hrhotz
 - blankenberg
 - nomadscientist
+- pavanvidem

gitter: Galaxy-Training-Network/galaxy-single-cell

@@ -426,7 +427,7 @@ To get a high quality count matrix we must apply the **DropletUtils** tool, whic
> - *"Expected Number of Cells"*: `3000`
> - *"Upper Quantile"*: `0.99`
> - *"Lower Proportion"*: `0.1`
-> - *"Format for output matrices"*: `Tabular`
+> - *"Format for output matrices"*: `Bundled (barcodes.tsv, genes.tsv, matrix.mtx)`
>
> > <comment-title>Default Parameter</comment-title>
> >
@@ -487,22 +488,20 @@ A useful diagnostic for droplet-based data is the barcode rank plot, which shows

![knee]({% link topics/single-cell/images/scrna-pre-processing/tenx_knee.png %} "Barcode Ranks: The separating thresholds of high and low quality cells")

-The knee and inflection points on the curve mark the transition between two components of the total count distribution. This is assumed to represent the difference between empty droplets with little RNA and cell-containing droplets with much more RNA, and gives us a rough idea of how many cells to expect in our sample.
+The knee and inflection points on the curve mark the transition between two components of the total count distribution. This is assumed to represent the difference between cells with high RNA content and empty droplets with little RNA, and gives us a rough idea of how many cells to expect in our sample.

> ### Question {% icon question %} Questions
>
-> 1. How many cells do we expect to see in our sample based on the above plot?
-> 1. How many high quality cells do we expect to see in our sample bed on the above plot?
-> 1. What is the minimum number of UMIs that we can expect to see for high quality cells?
+> 1. How many presumably real cells can we expect to see in our sample based on the above plot?
+> 1. How many cells can we expect to see with a cut-off at the inflection line?
>
> > <solution-title></solution-title>
-> > 1. We see the blue knee line cross the threshold of barcodes at just below than the 10000 Rank on the horizontal log scale, which is shown in the expanded view of our data as `"knee = 5300"`. This is in good accordance with the `5200` cells shown in the STARsolo log output previously.
-> > 1. This threshold is given by the inflection line, which is given at `"inflection = 260"`, so 260 cells.
-> > 1. The vertical drop in the chart occurs at a log X-axis position just above `1e+02`, so we can estimate ~ 200 UMIs minimum per cell.
+> > 1. We see the blue knee line cross the threshold of barcodes at just below the 10000 rank on the horizontal log scale, which is shown in the expanded view of our data as `"knee = 4861"`. This line intersects the ranked cells near 100 on the x-axis, so we can expect 100-200 cells with high RNA content.
+> > 1. This threshold is given by the inflection line, which is given at `"inflection = 260"`. The vertical drop in the ranked cells at the inflection line lies between 100 and 10000 on the x-axis. Because the axis is log-scaled, the drop is close to 100. Hence, we can expect between 200 and 400 cells.
> {: .solution}
{: .question}

-On large 10x datasets we can use these thresholds as metrics to utilise in our own custom filtering, which is once again provided by the **DropletUtils** tool.
+On large 10x datasets we can use these thresholds as metrics to utilise in our own custom filtering, which is once again provided by the **DropletUtils** tool. In this case, we will use a slightly less stringent threshold of 200 (instead of 260 at the inflection line) and an FDR threshold of 0.01, so that no more than about 1% of our detected cells are expected to be false positives.

> <hands-on-title>Custom Filtering</hands-on-title>
>
@@ -515,7 +514,7 @@ On large 10x datasets we can use these thresholds as metrics to utilise in our o
> - *"Method"*: `EmptyDrops`
> - *"Lower-bound Threshold"*: `200`
> - *"FDR Threshold"*: `0.01`
-> - *"Format for output matrices"*: `Tabular`
+> - *"Format for output matrices"*: `Bundled (barcodes.tsv, genes.tsv, matrix.mtx)`
>
{: .hands_on}
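
As an aside to the hunk above, the following is a minimal Python sketch of what the *"Lower-bound Threshold"* and *"FDR Threshold"* parameters control. It is not the DropletUtils implementation. The function names, the toy UMI totals, and the p-values are all hypothetical; in EmptyDrops the p-values come from testing each barcode against an ambient RNA profile, and the Benjamini-Hochberg correction used here is assumed to match its FDR calculation.

```python
def barcode_ranks(totals):
    """Return (rank, total) pairs ordered by decreasing total UMI count,
    i.e. the data behind a barcode rank plot."""
    ordered = sorted(totals, reverse=True)
    return list(enumerate(ordered, start=1))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for position in range(n - 1, -1, -1):
        index = order[position]
        running_min = min(running_min, pvalues[index] * n / (position + 1))
        adjusted[index] = running_min
    return adjusted


def call_cells(totals, pvalues, lower=200, fdr=0.01):
    """Return indices of barcodes whose total exceeds `lower` and whose
    adjusted p-value is at or below `fdr`.

    Barcodes at or below `lower` are treated as empty droplets and are
    not tested, which is the assumption this sketch makes.
    """
    candidates = [i for i, total in enumerate(totals) if total > lower]
    adjusted = benjamini_hochberg([pvalues[i] for i in candidates])
    return [i for i, q in zip(candidates, adjusted) if q <= fdr]


# Hypothetical toy data: one total UMI count and one p-value per barcode.
toy_totals = [5000, 1200, 450, 210, 180, 90, 15]
toy_pvalues = [0.0001, 0.0002, 0.004, 0.2, 0.5, 0.9, 0.95]

print(barcode_ranks(toy_totals)[:3])
print(call_cells(toy_totals, toy_pvalues))  # -> [0, 1, 2]
```

In this toy example the barcode with 210 UMIs passes the lower bound but is rejected by the FDR threshold, while the three barcodes with 180 UMIs or fewer are never tested.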

@@ -0,0 +1,27 @@
+- doc: Test outline for scRNA-seq-Preprocessing-TenX
+  job:
+    Homo_sapiens.GRCh37.75.gtf:
+      class: File
+      location: https://zenodo.org/api/files/ef2cd156-eff7-43b1-91cd-e86f91d5b315/Homo_sapiens.GRCh37.75.gtf
+      filetype: gtf
+    subset_pbmc_1k_v3_S1_L001_R1_001.fastq.gz:
+      class: File
+      location: https://zenodo.org/api/files/ef2cd156-eff7-43b1-91cd-e86f91d5b315/subset_pbmc_1k_v3_S1_L001_R1_001.fastq.gz
+      filetype: fastqsanger.gz
+    subset_pbmc_1k_v3_S1_L001_R2_001.fastq.gz:
+      class: File
+      location: https://zenodo.org/api/files/ef2cd156-eff7-43b1-91cd-e86f91d5b315/subset_pbmc_1k_v3_S1_L001_R2_001.fastq.gz
+      filetype: fastqsanger.gz
+    subset_pbmc_1k_v3_S1_L002_R1_001.fastq.gz:
+      class: File
+      location: https://zenodo.org/api/files/ef2cd156-eff7-43b1-91cd-e86f91d5b315/subset_pbmc_1k_v3_S1_L002_R1_001.fastq.gz
+      filetype: fastqsanger.gz
+    subset_pbmc_1k_v3_S1_L002_R2_001.fastq.gz:
+      class: File
+      location: https://zenodo.org/api/files/ef2cd156-eff7-43b1-91cd-e86f91d5b315/subset_pbmc_1k_v3_S1_L002_R2_001.fastq.gz
+      filetype: fastqsanger.gz
+    3M-february-2018.txt.gz:
+      class: File
+      location: https://zenodo.org/api/files/ef2cd156-eff7-43b1-91cd-e86f91d5b315/3M-february-2018.txt.gz
+      filetype: txt
+  outputs: {}