# Lecture 4: Quantification of Single-Cell RNA-seq Data

**Course:** Single-Cell Neurogenomics  
**Date:** December 13, 2025  
**Estimated Time:** 60 minutes  

---

## Learning Objectives

- Understand the principles of FASTQ data processing in single-cell RNA-seq
- Learn to interpret QC metrics and gene-cell matrices
- Generate and analyze barcode rank plots
- Prepare outputs for downstream single-cell analysis

---

## Introduction

Single-cell quantification pipelines such as Cell Ranger, kb-python, and alevin convert raw sequencing reads (FASTQ files) into gene-by-cell count matrices. This assignment focuses on understanding and interpreting those outputs.

---

## Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scanpy as sc

sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(8, 6))
print("Libraries loaded successfully!")

---

## Task 1: Understanding Count Matrix Structure (20 points)

### Instructions
1. Load the PBMC 3k dataset
2. Examine the count matrix structure (cells, genes, sparsity)
3. Visualize the distribution of total counts per cell
4. Create a barcode rank plot (cells ranked by total counts)

### Hints
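- Use `sc.datasets.pbmc3k()` to load the data
- Use a log scale on both axes of the barcode rank plot
- Identify the "knee" point separating cells from empty droplets. Note that `sc.datasets.pbmc3k()` returns an already filtered matrix (2,700 cells), so the knee may be much less pronounced than in a raw, unfiltered matrix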
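
To make the idea of a barcode rank plot concrete, here is a minimal sketch on synthetic data, not on the PBMC dataset. The mixture of 100 "real" cells and 400 "empty" droplets, the Poisson rates, and the largest-drop knee heuristic are all assumptions chosen for illustration; production pipelines use more robust methods.

```python
import numpy as np

# Synthetic stand-in for a cells x genes count matrix (assumed, not PBMC data):
# 100 "real" cells with high counts and 400 "empty" droplets with ambient counts.
rng = np.random.default_rng(0)
n_real, n_empty, n_genes = 100, 400, 200
real = rng.poisson(lam=2.0, size=(n_real, n_genes))
empty = rng.poisson(lam=0.05, size=(n_empty, n_genes))
counts = np.vstack([real, empty])

# Structure: shape and sparsity (fraction of zero entries).
n_cells = counts.shape[0]
sparsity = np.mean(counts == 0)
print(f"{n_cells} barcodes x {n_genes} genes, sparsity = {sparsity:.1%}")

# Total counts per barcode (row sums), sorted in descending order.
total_counts = counts.sum(axis=1)
sorted_counts = np.sort(total_counts)[::-1]
ranks = np.arange(1, n_cells + 1)

# Crude knee heuristic (an assumption for this toy example): the rank just
# before the largest drop in log10(counts + 1) between neighbouring barcodes.
log_counts = np.log10(sorted_counts + 1)
knee_rank = int(np.argmax(-np.diff(log_counts))) + 1
print(f"Estimated knee at rank {knee_rank}")
```

The `ranks` and `sorted_counts` arrays can be passed directly to `plt.loglog(ranks, sorted_counts)` to draw the plot. For an `AnnData` object, `adata.X` is typically a sparse matrix, so the row sums would need to be flattened (for example with `np.asarray(adata.X.sum(axis=1)).ravel()`).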

In [None]:
# TODO: Load data and examine structure


# Calculate total counts per cell


# Create barcode rank plot


---

## Task 2: Interpreting QC Metrics (25 points)

### Instructions
1. Calculate key QC metrics:
   - Total UMI counts per cell
   - Number of genes detected per cell
   - Percentage of counts from mitochondrial genes (human gene names starting with `MT-`)
2. Create violin plots for each metric
3. Identify potential quality thresholds
4. Visualize relationships between metrics using scatter plots
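
The three metrics above reduce to simple row-wise operations on the count matrix. The sketch below computes them with NumPy on a small synthetic matrix; the gene list and Poisson rates are made up for illustration. The variable names mirror the columns that `sc.pp.calculate_qc_metrics` writes to `adata.obs` when called with `qc_vars=["mt"]`.

```python
import numpy as np

# Toy cells x genes matrix with assumed gene names and expression rates.
rng = np.random.default_rng(1)
gene_names = np.array(["MT-CO1", "MT-ND1", "ACTB", "GAPDH", "CD3E", "MS4A1"])
counts = rng.poisson(lam=[5, 3, 20, 15, 1, 1], size=(100, 6))

# Total UMI counts per cell.
total_counts = counts.sum(axis=1)

# Number of genes with at least one count in each cell.
n_genes_by_counts = (counts > 0).sum(axis=1)

# Percentage of counts coming from mitochondrial genes ("MT-" prefix).
is_mito = np.char.startswith(gene_names, "MT-")
mito_counts = counts[:, is_mito].sum(axis=1)
pct_counts_mt = 100 * mito_counts / np.maximum(total_counts, 1)  # avoid 0/0

print("median total counts:", np.median(total_counts))
print("median genes detected:", np.median(n_genes_by_counts))
print("median % mito:", round(float(np.median(pct_counts_mt)), 1))
```

With scanpy, the equivalent is usually to flag the genes with `adata.var["mt"] = adata.var_names.str.startswith("MT-")` and then call `sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)`.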

In [None]:
# TODO: Calculate and visualize QC metrics


# Create violin plots


# Scatter plots for relationships


---

## Task 3: Sequencing Saturation Analysis (20 points)

### Instructions
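1. Simulate reduced sequencing depth by subsampling the data (the count matrix stores UMI counts rather than raw reads, so downsample counts as an approximation)
2. Calculate genes detected at different sequencing depths
3. Plot a saturation curve (genes detected versus depth)
4. Determine whether sequencing depth is sufficient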
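
One way to approximate lower sequencing depth is binomial thinning: each count is kept independently with probability equal to the target fraction. The sketch below applies this to a synthetic matrix; the gamma-Poisson simulation and the ten depth fractions are assumptions, and thinning UMI counts is only a rough proxy for subsampling raw reads.

```python
import numpy as np

# Synthetic cells x genes matrix with gene-specific rates (assumed values).
rng = np.random.default_rng(2)
gene_rates = rng.gamma(shape=0.5, scale=2.0, size=300)
counts = rng.poisson(lam=gene_rates, size=(200, 300))

# Depth fractions from 10% to 100% of the original counts.
fractions = np.linspace(0.1, 1.0, 10)

median_genes = []
for f in fractions:
    # Binomial thinning: keep each individual count with probability f.
    sub = rng.binomial(counts, f)
    genes_per_cell = (sub > 0).sum(axis=1)
    median_genes.append(float(np.median(genes_per_cell)))

# Gain in detected genes over the last 10% of depth; a small gain relative
# to the total suggests the curve is flattening.
last_gain = median_genes[-1] - median_genes[-2]
for f, g in zip(fractions, median_genes):
    print(f"{f:.0%} depth: median {g:.0f} genes per cell")
print("gain from the final step:", last_gain)
```

Plotting `fractions` against `median_genes` gives the saturation curve. For the real dataset, `sc.pp.downsample_counts` offers a related subsampling approach.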

In [None]:
# TODO: Analyze sequencing saturation


# Create saturation curve


---

## Task 4: Gene Detection Analysis (20 points)

### Instructions
1. Calculate how many cells each gene is detected in
2. Identify highly detected genes (>50% of cells)
3. Identify rarely detected genes (<1% of cells)
4. Visualize gene detection distribution
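
Gene detection is the column-wise counterpart of the per-cell metrics: count the cells in which each gene has a nonzero value, then divide by the number of cells. The sketch below uses a synthetic matrix with three assumed groups of genes (abundant, moderate, very rare) to show how the 50% and 1% cut-offs from the instructions translate into boolean masks.

```python
import numpy as np

# Synthetic data: 5 abundant, 5 moderate, and 5 very rare genes (assumed rates).
rng = np.random.default_rng(3)
n_cells = 1000
rates = np.array([3.0] * 5 + [0.2] * 5 + [0.001] * 5)
counts = rng.poisson(lam=rates, size=(n_cells, rates.size))

# Number and fraction of cells in which each gene is detected.
n_cells_by_gene = (counts > 0).sum(axis=0)
detection_frac = n_cells_by_gene / n_cells

# Gene categories from the instructions above.
highly_detected = detection_frac > 0.5
rarely_detected = detection_frac < 0.01

print("highly detected genes:", int(highly_detected.sum()))
print("rarely detected genes:", int(rarely_detected.sum()))
```

A histogram of `detection_frac` (for example `plt.hist(detection_frac, bins=50)`) is one simple way to visualize the distribution.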

In [None]:
# TODO: Analyze gene detection


# Identify gene categories


# Visualize distribution


---

## Task 5: Preparing Data for Analysis (15 points)

### Instructions
1. Filter cells based on QC thresholds
2. Remove genes detected in fewer than 3 cells
3. Compare dataset before and after filtering
4. Save filtered data
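
Filtering amounts to building boolean masks over rows (cells) and columns (genes) and applying them in order. The sketch below shows the pattern on synthetic data; the `min_genes` threshold is a hypothetical value suited only to this toy matrix, and thresholds for real data should come from the QC plots in Task 2.

```python
import numpy as np

# Synthetic matrix with 10 empty barcodes and 5 never-detected genes.
rng = np.random.default_rng(4)
counts = rng.poisson(lam=1.0, size=(300, 50))
counts[:, :5] = 0   # five genes with no counts in any cell
counts[:10, :] = 0  # ten barcodes with no counts at all

min_genes = 3  # hypothetical per-cell threshold for this toy example
min_cells = 3  # keep genes detected in at least 3 cells

# Step 1: drop low-quality cells.
cell_mask = (counts > 0).sum(axis=1) >= min_genes
filtered = counts[cell_mask]

# Step 2: drop genes detected in fewer than min_cells of the remaining cells.
gene_mask = (filtered > 0).sum(axis=0) >= min_cells
filtered = filtered[:, gene_mask]

print("before:", counts.shape)
print("after: ", filtered.shape)
```

In scanpy, the same two steps are commonly written as `sc.pp.filter_cells(adata, min_genes=...)` followed by `sc.pp.filter_genes(adata, min_cells=3)`, and the result can be saved with `adata.write(...)` using a file name of your choice.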

In [None]:
# TODO: Filter and prepare data


# Compare before/after


---

## Grading Rubric

| Task | Points |
|------|--------|
| Task 1 | 20 |
| Task 2 | 25 |
| Task 3 | 20 |
| Task 4 | 20 |
| Task 5 | 15 |
| **Total** | **100** |