Add video transcripts to plates, batches, and barcodes slide deck #2493

Merged
merged 2 commits into from Apr 13, 2021
@@ -6,6 +6,7 @@
subtopic: single-cell
priority: 4

video: yes
zenodo_link: ""
tags:
- single-cell
@@ -51,12 +52,22 @@

.left[Plates are *N x M* arrays of wells that cells are sorted into, to then be individually amplified and sequenced.]

???
- Sorting plates are 2 dimensional arrays of wells that individual cells are placed into.
- Each cell in each well is tagged with a unique barcode and then primed for amplification and sequencing.
- The way in which this is performed has significant pros and cons, as we will see.

---

### Sorting Plates

.image-80[![slide6](../../images/wabexampleplates.png)]

???
- Here we see 3 different sorting strategies, each with different advantages, for the same treatment sample.
- In the top image: We see all wells are filled, to maximise the number of cells on a plate
- In the middle image: We see only the inner wells are filled, to maintain a protective border during library preparation
- In the bottom image: We see a striping pattern, which mitigates contamination of material from adjoining wells

---

@@ -66,6 +77,11 @@

.center[What is the problem with this plate setup?]

???
- However, if we consider different treatment samples, then we must consider different challenges.
- What is the advantage of the top example, i.e. Several different treatments on one plate?
- What is the advantage of the bottom example, i.e. One treatment on one plate?

---

### Setting up Plates
@@ -74,13 +90,24 @@

.center[Batch effect (plate vs plate) cannot be separated from treatment effect in either scenario.]

???
- Trick question!
- In both examples, the treatment effect cannot be separated from any technical effect from the plate itself.
- For example, the biological effect of the yellow cells might just be the technical confounding effect from Batch 1.
- What can we do instead to reduce the effect of these technical confounding effects across treatments?

---
### Setting up Plates

.image-50[![slide7](../../images/wabbalancedbatches.png)]

.left[Either of these is a better set-up. Mixing columns is good, but not required. Ultimately, batch effect can now be separated from variable effect.]

???
- The answer is that we can balance the treatments across our plates better.
- By ensuring the same treatment occurs in comparable amounts in different plates, we can reduce the effect that each plate would have on the treatment.
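
The balancing idea can be sketched in a few lines of Python. This is only an illustration: the function name `balanced_layout`, the two treatments `A` and `B`, and the 96-well plate size are assumptions chosen for the example, and the sketch ignores where on the plate each treatment is placed.

```python
from collections import Counter

def balanced_layout(treatments, n_plates, wells_per_plate=96):
    """Give every plate an equal share of every treatment (hypothetical helper)."""
    share = wells_per_plate // len(treatments)
    layout = {}
    for plate in range(1, n_plates + 1):
        wells = []
        for treatment in treatments:
            wells.extend([treatment] * share)
        layout[plate] = wells
    return layout

layout = balanced_layout(["A", "B"], n_plates=2)
for plate, wells in layout.items():
    # Each plate carries the same number of cells from each treatment
    print(plate, Counter(wells))
```

Because every plate holds both treatments in equal amounts, a plate-specific technical effect touches both treatments equally and can be told apart from the treatment effect.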


---

### Setting up Plates
@@ -90,8 +117,9 @@
.center[Can't mix samples on plates? Separate replicates evenly and process together.]

???
- If putting multiple treatments on the same plate is not an option, then having enough separated replicates will also allow for batch correction.
- Here, you can assess the variation between replicates, and thus batches, as well as between treatments.

If putting multiple treatments on the same plate is not an option, then having enough separated replicates will also allow for batch correction. Here, you can assess the variation between replicates (and thus batches) as well as between treatments.
---

### What about sequencing lanes?
@@ -102,6 +130,11 @@

.center[This works well, but what if you have too many samples for one lane?]

???
- So now it's time to sequence our samples! How do we combine samples into sequencing lanes?
- In many cases, it is enough to sequence 1 or 2 plates together in a single sequencing lane.
- But what if we wish to sequence 8 plates?

---

### What about sequencing lanes?
@@ -112,21 +145,32 @@

.center[Does this look ok?]

???
- In the example shown, Treatment A would go to the first sequencing lane.
- Treatment B would go to the second sequencing lane.
- Is it okay to sequence different treatments across different sequencing lanes?

--


.center[No! You've turned each treatment (A & B) into a batch!]

???
- No, because you've turned Treatment A and Treatment B into separate batches.
- This gives us the same unbalanced design we saw with the sorting plates.

---

### What about sequencing lanes?



.image-75[![slide7](../../images/wablanesgood.png)]

.center[This is the way to balance your batches at the lane-level.]

???
- Instead, we should balance each treatment across the different sequencing lanes.
- This way, each treatment is sequenced at the same time as another treatment, reducing the technical effect from the sequencing on the treatment.
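
A quick way to check a lane assignment for this problem is sketched below. The sample lists and the function names are hypothetical, and the check only looks at whether each treatment is present in each lane, not at the amounts.

```python
from collections import defaultdict

def treatments_by_lane(samples):
    """Collect the set of treatments found in each lane."""
    lanes = defaultdict(set)
    for lane, treatment in samples:
        lanes[lane].add(treatment)
    return dict(lanes)

def every_lane_has_every_treatment(samples):
    """True when no lane is missing a treatment."""
    by_lane = treatments_by_lane(samples)
    all_treatments = set().union(*by_lane.values())
    return all(found == all_treatments for found in by_lane.values())

# Hypothetical (lane, treatment) assignments
one_treatment_per_lane = [(1, "A"), (1, "A"), (2, "B"), (2, "B")]
mixed_lanes = [(1, "A"), (1, "B"), (2, "A"), (2, "B")]

print(every_lane_has_every_treatment(one_treatment_per_lane))  # False
print(every_lane_has_every_treatment(mixed_lanes))             # True
```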

---

### Distinguishing cells in a plate
@@ -137,6 +181,11 @@
* Cells are selected from a plate by their *barcodes*
* Barcodes must be unique
+ e.g. 96 wells in a plate, need 96 barcodes to sequence them together

???
- How do we distinguish each cell in each well?
- Do we remember the physical location of that cell on that well, or do we tag it somehow?
- The answer is that each cell is given a unique barcode that distinguishes it from other cells, which have different cell barcodes.

---

@@ -148,7 +197,9 @@
* Transcripts with different cell barcodes originate from different cells

???

- What exactly are cell barcodes?
- These are usually short nucleotide barcodes that are added to all transcripts of a specific cell
- The idea is that transcripts with different cell barcodes originate from different cells.

---

@@ -159,6 +210,9 @@
* Many different cell barcodes are used across many different cells
* Each well in a plate contains a cell, indexed by its cell barcode

???
- Within a plate, many different barcodes are used, and each cell is indexed by the cell barcode assigned to that well.
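
As a rough sketch of how barcodes index cells, the snippet below groups reads by their cell barcode. The barcodes, well names, and gene names are all made up for illustration, and real reads are of course not simple tuples.

```python
from collections import defaultdict

# Hypothetical barcode-to-well table for three wells of a plate
barcode_to_well = {"AACGT": "A1", "CCGTA": "A2", "GGTAC": "A3"}

# Hypothetical reads as (cell barcode, transcript) pairs
reads = [
    ("AACGT", "GeneX"),
    ("CCGTA", "GeneY"),
    ("AACGT", "GeneZ"),
    ("TTTTT", "GeneX"),  # barcode that is not in the table
]

def demultiplex(reads, barcode_to_well):
    """Group transcripts by the well (cell) their barcode points to."""
    per_well = defaultdict(list)
    unassigned = []
    for barcode, transcript in reads:
        well = barcode_to_well.get(barcode)
        if well is None:
            unassigned.append((barcode, transcript))
        else:
            per_well[well].append(transcript)
    return dict(per_well), unassigned

per_well, unassigned = demultiplex(reads, barcode_to_well)
print(per_well)
print(unassigned)
```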

---

### Questions about Cell Barcodes
@@ -173,11 +227,24 @@
1. How many cell barcodes are needed if you combine 10 plates into a single sequencing lane?

1. What would be the minimum length of the barcodes for each of the previous questions?

???
- Consider a 12 by 8 plate with 96 wells.
- How many cell barcodes are needed if you sequence a single plate?
- How many cell barcodes are needed if you combine 10 plates into a single sequencing lane?
- What would be the minimum length of the barcodes for each of the previous questions?

--

.footnote[
1. *96 unique barcodes per plate*
1. *96 x 10 = 960 unique barcodes per lane*
]

???
- For one plate you would just need 96 unique barcodes
- For 10 plates you would need 960 unique barcodes

---

### Questions about Cell Barcodes
@@ -197,6 +264,12 @@
| $$4^4 = 256$$ | <small>Yes, 4 bases is enough to cover 96 barcodes (and more!) |

]

???
- To answer the third question, we need to make some assumptions about the barcodes.
- If we assume a barcode is made up of a chain of 4 possible nucleotides, then we can solve the problem as a power of 4.
- For a single plate with only 96 barcodes, we require our barcodes to be of at least 4 base pairs in length.

--
.pull-right[

@@ -211,15 +284,14 @@

]

???
- For 10 plates with 960 barcodes, we require our barcodes to be of at least 5 base pairs in length.
- However these calculations also make one serious assumption about the barcodes.
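
The minimum lengths for 96 and 960 barcodes can be reproduced with a short calculation. The helper name `min_barcode_length` is made up for this sketch, and it assumes every possible sequence of 4 nucleotides is usable as a barcode.

```python
def min_barcode_length(n_barcodes, alphabet_size=4):
    """Smallest length L such that alphabet_size ** L >= n_barcodes."""
    length = 1
    while alphabet_size ** length < n_barcodes:
        length += 1
    return length

print(min_barcode_length(96))   # 4, since 4**3 = 64 < 96 <= 4**4 = 256
print(min_barcode_length(960))  # 5, since 4**4 = 256 < 960 <= 4**5 = 1024
```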

---

### Barcode Safeguarding

<!-- TODO:
* Find a way to get <style>.MathJax_Display { display: inline; }</style> working so that MathJax can render inline
* The MathJax selector seems to apply directly to the element, which is impossible to override!
-->

.pull-left[
* Is 5 nucleotides really enough to capture 960 cells?

@@ -239,9 +311,12 @@


???
If we assume that every barcode is separated from every other barcode by 1 bp, then the answer is 'yes'
- For example, we assume that all our barcodes are at minimum separated by a single base pair.
- This is known as an edit distance of 1.
- If we assume that every barcode is separated from every other barcode by 1 base pair, then 5 nucleotides is indeed enough to delineate 960 cells.
- However, if there was even just one small sequencing error of 1 base pair on any barcode, it would immediately change that erroneous barcode into another real barcode.
- That is to say, we have no way of detecting sequencing errors if we use an edit distance of 1.

That's right, sequencing errors can assign a read to a completely different cell from the one the transcript originated from.
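
The problem with an edit distance of 1 can be illustrated with a small sketch. It assumes that all 4^5 = 1024 possible barcodes of length 5 are in use and that a sequencing error is a single substitution; the example barcode `AACGT` is arbitrary.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Every possible barcode of length 5 (edit distance of 1 between neighbours)
all_barcodes = {"".join(p) for p in product(NUCLEOTIDES, repeat=5)}

def single_substitutions(barcode):
    """Yield every sequence that differs from the barcode at exactly one position."""
    for i, original in enumerate(barcode):
        for base in NUCLEOTIDES:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]

mutants = list(single_substitutions("AACGT"))
print(len(mutants))                             # 15 possible single errors
print(all(m in all_barcodes for m in mutants))  # True: each one is another valid barcode
```

Every one of the 15 possible single-base errors lands on another valid barcode, so the error goes unnoticed and the read is assigned to the wrong cell.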

---

@@ -266,9 +341,19 @@

]

???
- Let us try to rectify this by increasing the edit distance to 2.
- In the example barcodes shown we see barcodes of length 5, separated by 2 base pairs.
- How many unique cell barcodes can we make given these restrictions?

--

$$ 4^{5-1} = 256 $$


???
- The answer is 256, which is only about a quarter of the 960 barcodes that we need.
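
The count for length 5 and an edit distance of 2 can be checked by construction. The sketch below is one possible scheme and not necessarily how real barcode sets are designed: it takes all 4^4 sequences of length 4 and appends a fifth "checksum" nucleotide, and it counts distance as substitutions only (Hamming distance), which is an assumption.

```python
from itertools import combinations, product

NUCLEOTIDES = "ACGT"

def with_checksum(prefix):
    """Append one nucleotide so that the base indices sum to 0 modulo 4."""
    total = sum(NUCLEOTIDES.index(base) for base in prefix)
    return prefix + NUCLEOTIDES[-total % 4]

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

barcodes = [with_checksum("".join(p)) for p in product(NUCLEOTIDES, repeat=4)]
min_distance = min(hamming(a, b) for a, b in combinations(barcodes, 2))

print(len(barcodes))  # 256
print(min_distance)   # 2
```

Changing a single base of any of these barcodes breaks the checksum, so a single sequencing error produces a sequence that is not in the set and can be detected.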

---

### Edit Distance : General Principle
@@ -288,6 +373,13 @@
.center[`AAA CCC GGG TTT`]
.center[<small>4 barcodes</small>]
]

???
- Let us explore this more explicitly.
- Here we will be using barcodes only of length 3, and we will explore different edit distances of 1, 2, and 3.
- These yield 64, 16, and 4 barcodes respectively.


--
.pull-right[
Number of Barcodes :
@@ -299,6 +391,9 @@

]

???
- This can be summarized by the formula given: for barcodes of length N and an edit distance of d, the number of available barcodes is 4 to the power of N minus d plus one.
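
The counts of 64, 16, and 4 for barcodes of length 3 can be reproduced with a brute-force search. The greedy search below is an illustrative choice and not a method from the slides, the expression 4^(N - d + 1) is inferred from the three counts above, and distance is again counted as substitutions only.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_barcodes(length, min_distance, alphabet="ACGT"):
    """Walk all sequences in order, keeping each one that is far enough from those kept."""
    chosen = []
    for candidate in product(alphabet, repeat=length):
        word = "".join(candidate)
        if all(hamming(word, kept) >= min_distance for kept in chosen):
            chosen.append(word)
    return chosen

for d in (1, 2, 3):
    found = len(greedy_barcodes(3, d))
    print(d, found, 4 ** (3 - d + 1))
```

For an edit distance of 3 the search keeps exactly `AAA`, `CCC`, `GGG`, and `TTT`.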

---

### How many available barcodes are there?
@@ -315,6 +410,12 @@
* Must take sequencing errors into account

]

???
- To summarize, barcodes are built from only 4 nucleotides, and the number of available barcodes depends on their length and edit distance.
- This is to ensure that sequencing errors are taken into account.


--
.pull-right[
* Availability is balance between *barcode size* vs. *sequencing errors*
@@ -328,11 +429,10 @@
]

???
* The availability of barcodes and the number of cells they can label is a design choice, where the technician must balance two opposing forces: barcode size vs sequencing errors.

* For the first, this means that for a barcode $$N$$ bases long, there will be $$4^N$$ barcodes available. Typically barcodes tend to span 4-10 bases ($$4^4 = 256$$ to $$4^{10} = 1048576$$), since longer barcodes tend to be more susceptible to sequencing errors.

* The true number of barcodes used is actually smaller than $$4^N$$ due to the measures used to space barcodes apart from one another to reduce sequencing errors.
- The availability of barcodes and the number of cells they can label is a design choice, where the technician must balance two opposing forces: barcode size vs sequencing errors.
- For the size, this means that for a barcode N bases long, there will be 4 to the N barcodes available.
- Typically barcodes tend to span 4 to 10 bases, since longer barcodes tend to be more susceptible to sequencing errors.
- The true number of barcodes used is smaller than 4 to the N, due to the countermeasures used to space barcodes apart from one another, in order to reduce sequencing errors.

---

@@ -347,6 +447,9 @@
| Must increase edit distance significantly | A smaller edit distance is more acceptable |
| Can accommodate large edit distances | *Cannot* accommodate large edit distances |

???
- To weigh the pros and cons of longer and shorter barcodes, we need to take into account the likelihood of sequencing errors for a given size, and the resulting barcode address space given by the edit distance.

---

### Cell Barcodes: Summary
@@ -362,6 +465,14 @@
* Barcode use is *limited* by length and read depth

]

???
- To summarize cell barcodes, we should note the following
- A single barcode sequence indexes a single cell
- Every transcript in a specific cell has the same cell barcode
- Barcodes are designed for smaller plate-based protocols, while for split-pool and similar techniques they are randomised
- Barcode use is limited by length and read depth

--
<hline>
.pull-right[#### Need to balance:
@@ -376,9 +487,9 @@
</small>
]


???
The number of reads you want per cell determines how many cells you run in a sequencing lane, which in turn tells you how many barcodes you need.
- The number of reads wanted per cell determines how many cells you run in a sequencing lane, which in turn tells you how many barcodes you need.
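
This chain of reasoning can be written out as a short calculation. The lane yield and the reads wanted per cell below are hypothetical numbers chosen only to make the arithmetic easy to follow, and the 96-well plate size is carried over from the earlier slides.

```python
WELLS_PER_PLATE = 96

lane_reads = 300_000_000  # hypothetical number of reads from one sequencing lane
reads_per_cell = 500_000  # hypothetical sequencing depth wanted per cell

# How many cells fit into one lane at this depth
cells_per_lane = lane_reads // reads_per_cell

# How many full plates that corresponds to, and how many unique barcodes they need
plates_per_lane = cells_per_lane // WELLS_PER_PLATE
barcodes_needed = plates_per_lane * WELLS_PER_PLATE

print(cells_per_lane)   # 600
print(plates_per_lane)  # 6
print(barcodes_needed)  # 576
```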

---

# Summary
@@ -390,11 +501,23 @@
* Cell Barcodes are *designed* to the Plate/Lane setup

]

???
- From the content shown here, you have learned the following.
- Cell Barcodes are sequences attached to transcripts to indicate what cell a transcript came from
- Cell Barcodes are designed for the Plate and Lane setup.


--
.pull-right[

* Indicate what cell a transcript came from.
* Indicates what cell a transcript came from.

* Reduce sequencing errors

]

???
- Cell barcodes indicate what cell a transcript came from
- They reduce sequencing errors when spaced appropriately