# Lecture 5: Quality Control, Normalization, and Preprocessing

**Date:** December 19, 2025 | **Time:** 60 minutes

## Learning Objectives
- Perform quality control on scRNA-seq data using scanpy
- Apply normalization and feature selection
- Handle technical artifacts and batch effects
- Perform dimensionality reduction (PCA)

---

## Setup

In [None]:
import scanpy as sc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(6, 6))

## Task 1: Quality Control Filtering (25 points)

### Instructions
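1. Load the PBMC 3k dataset (`sc.datasets.pbmc3k()`)
2. Flag mitochondrial genes (names starting with `MT-`) in `adata.var["mt"]`, then calculate QC metrics (use `sc.pp.calculate_qc_metrics` with `qc_vars=["mt"]`)
3. Filter cells: min_genes=200, min_counts=500, max_counts=10000, and at most 5% mitochondrial counts (`pct_counts_mt` in `.obs`)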
4. Filter genes: detected in at least 3 cells
5. Compare cell/gene counts before and after
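
To make the cell-level thresholds concrete, here is a minimal NumPy sketch on a made-up 4 × 5 count matrix. It is not scanpy's implementation. The gene names, the counts, and the toy `MIN_GENES` value (3 instead of 200, because the matrix has only 5 genes) are assumptions chosen for illustration.

```python
import numpy as np

# Toy count matrix: rows are cells, columns are genes (values are invented).
gene_names = np.array(["CD3E", "LYZ", "MT-CO1", "MT-ND1", "NKG7"])
counts = np.array([
    [300, 250, 10, 5, 100],      # plausible healthy cell
    [100, 50, 200, 150, 20],     # high mitochondrial fraction
    [2, 0, 0, 0, 1],             # nearly empty droplet
    [6000, 5000, 100, 50, 900],  # very high total counts
])

# Toy thresholds; the task itself uses min_genes=200.
MIN_GENES, MIN_COUNTS, MAX_COUNTS, MAX_PCT_MT = 3, 500, 10000, 5

# Per-cell QC metrics, analogous to what a QC step reports per cell.
is_mt = np.char.startswith(gene_names, "MT-")
total_counts = counts.sum(axis=1)
n_genes = (counts > 0).sum(axis=1)
pct_mt = 100 * counts[:, is_mt].sum(axis=1) / total_counts

# A cell is kept only if it passes every threshold.
keep = (
    (n_genes >= MIN_GENES)
    & (total_counts >= MIN_COUNTS)
    & (total_counts <= MAX_COUNTS)
    & (pct_mt <= MAX_PCT_MT)
)
filtered = counts[keep]

print("cells before:", counts.shape[0], "cells after:", filtered.shape[0])
```

Only the first cell passes all four checks here. The other three each fail for a different reason, which is the pattern to look for when comparing counts before and after filtering.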

In [None]:
# TODO: Perform QC filtering


## Task 2: Normalization (20 points)

### Instructions
1. Normalize counts to 10,000 per cell (`sc.pp.normalize_total`)
2. Log-transform the data (`sc.pp.log1p`)
3. Store raw counts in `.raw` (assign `adata.raw = adata` before step 1; once normalized, `adata.X` no longer holds raw counts)
4. Visualize distribution before/after normalization
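
The arithmetic behind total-count normalization followed by a log transform can be sketched in plain NumPy. This is a conceptual illustration on invented numbers, not the scanpy code path. The variable names are assumptions.

```python
import numpy as np

# Two toy cells with the same expression profile but a 20x difference
# in sequencing depth (values are invented).
counts = np.array([
    [1.0, 3.0, 0.0, 6.0],      # 10 counts in total
    [20.0, 60.0, 0.0, 120.0],  # 200 counts in total
])

target_sum = 1e4  # the per-cell total requested in the task

# Divide each cell by its own total, then rescale to the target sum.
per_cell_total = counts.sum(axis=1, keepdims=True)
normalized = counts / per_cell_total * target_sum

# log1p computes log(1 + x), so zero counts stay at zero.
logged = np.log1p(normalized)

print(normalized)
print(logged.round(3))
```

After normalization the two rows are identical, which shows that the depth difference has been removed while relative expression is preserved.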

In [None]:
# TODO: Normalize and log-transform


## Task 3: Feature Selection (25 points)

### Instructions
1. Identify highly variable genes using `sc.pp.highly_variable_genes`
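2. Select top 2000 HVGs (`n_top_genes=2000`)
3. Plot gene variability (dispersion or variance, depending on `flavor`) against mean expression (`sc.pl.highly_variable_genes`)
4. Print the top 20 HVGs ranked by the variability metric stored in `.var` (for example `dispersions_norm` with the default flavor)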
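
As a rough intuition for how genes are ranked, the sketch below computes a simple dispersion (variance divided by mean) per gene and sorts by it. This is a deliberate simplification: scanpy's HVG flavors add further steps such as binning genes by mean expression and normalizing within bins, which are omitted here. The gene labels and values are invented.

```python
import numpy as np

# Toy expression matrix: 6 cells x 3 genes (values are invented).
gene_names = np.array(["stable_gene", "bursty_gene", "constant_gene"])
expr = np.array([
    [4.0, 0.0, 1.0],
    [5.0, 0.0, 1.0],
    [6.0, 20.0, 1.0],
    [4.0, 0.0, 1.0],
    [5.0, 0.0, 1.0],
    [6.0, 10.0, 1.0],
])

mean = expr.mean(axis=0)
var = expr.var(axis=0, ddof=1)

# Dispersion relates the variance to the mean, so two genes with the same
# mean can still be told apart by how variable they are.
dispersion = var / mean

# Rank genes from most to least variable and keep the top n.
n_top = 1
order = np.argsort(dispersion)[::-1]
top_genes = gene_names[order[:n_top]]

for name, m, d in zip(gene_names[order], mean[order], dispersion[order]):
    print(f"{name}: mean={m:.2f}, dispersion={d:.2f}")
```

`stable_gene` and `bursty_gene` have the same mean, but only the second shows high cell-to-cell variability. That is the property HVG selection is meant to capture.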

In [None]:
# TODO: Identify highly variable genes


## Task 4: Dimensionality Reduction with PCA (20 points)

### Instructions
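1. Scale each gene to zero mean and unit variance (`sc.pp.scale`)
2. Perform PCA with 50 components (`sc.tl.pca`)
3. Plot the explained variance ratio per PC (elbow plot, `sc.pl.pca_variance_ratio`)
4. Visualize cells in PC space (`sc.pl.pca`)
5. Choose the number of PCs to keep for downstream analysis based on the elbow plot, and justify your choice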
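
The relationship between scaling, PCA, and the variance-ratio plot can be sketched with NumPy's SVD on synthetic data. This is an illustrative sketch, not what `sc.tl.pca` does internally. The synthetic data and the 90% cumulative-variance cutoff are assumptions; the cutoff is one common heuristic and not a rule stated in this assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 cells x 10 genes driven by 2 latent factors plus noise.
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.1 * rng.normal(size=(100, 10))

# Scale each gene to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD: rows of Vt are the principal axes, U * S the cell coordinates.
U, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
pcs = U * S
explained_var = S**2 / (X_scaled.shape[0] - 1)
variance_ratio = explained_var / explained_var.sum()

# Heuristic (an assumption): keep the smallest number of PCs whose
# cumulative explained variance reaches 90%.
cumulative = np.cumsum(variance_ratio)
n_pcs = int(np.searchsorted(cumulative, 0.90) + 1)

print("variance ratio:", variance_ratio.round(3))
print("PCs needed for 90% variance:", n_pcs)
```

Plotting `variance_ratio` against the PC index gives the elbow plot. The point where the curve flattens is the usual visual guide, and the cumulative threshold is just a way to make that choice reproducible.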

In [None]:
# TODO: Perform PCA and visualize


## Task 5: Cell Cycle Scoring (10 points)

### Instructions
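1. Score cell cycle phase using `sc.tl.score_genes_cell_cycle` (requires S-phase and G2/M-phase marker gene lists, passed as `s_genes` and `g2m_genes`)
2. Visualize the distribution of cells across phases (the `phase` column in `.obs`)
3. Check whether cell cycle phase separates cells along PC1/PC2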
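
The idea behind phase assignment can be sketched as follows: score each cell by the mean expression of a marker set relative to a reference, then pick the phase with the higher score, falling back to G1 when both scores are negative. This is a simplified sketch. scanpy samples its reference genes differently, whereas this version uses all non-marker genes. The tiny marker lists and expression values are assumptions for illustration only.

```python
import numpy as np

# Toy log-normalized expression: 3 cells x 6 genes (values are invented).
genes = np.array(["MCM5", "PCNA", "CDK1", "TOP2A", "ACTB", "GAPDH"])
expr = np.array([
    [3.0, 2.5, 0.2, 0.1, 1.0, 1.0],  # S-phase markers high
    [0.1, 0.3, 2.8, 3.0, 1.0, 1.0],  # G2/M markers high
    [0.2, 0.1, 0.3, 0.2, 1.0, 1.0],  # neither marker set high
])

# Small illustrative marker subsets (real marker lists are much longer).
s_genes = ["MCM5", "PCNA"]
g2m_genes = ["CDK1", "TOP2A"]

is_s = np.isin(genes, s_genes)
is_g2m = np.isin(genes, g2m_genes)
is_ref = ~(is_s | is_g2m)  # simplification: all other genes as reference

# Score = mean marker expression minus mean reference expression.
reference = expr[:, is_ref].mean(axis=1)
s_score = expr[:, is_s].mean(axis=1) - reference
g2m_score = expr[:, is_g2m].mean(axis=1) - reference

# Higher score wins; cells with both scores negative are called G1.
phase = np.where(g2m_score > s_score, "G2M", "S")
phase = np.where((s_score < 0) & (g2m_score < 0), "G1", phase)

print(phase)
```

Coloring a PC1/PC2 scatter by the resulting phase labels is a quick way to see whether the leading components are driven by cell cycle rather than by cell identity.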

In [None]:
# TODO: Score and visualize cell cycle
