# Lecture 6: Downstream Analysis and Batch Correction

**Date:** December 20, 2025 | **Time:** 60 minutes

## Learning Objectives
- Perform clustering and cell type annotation
- Conduct differential expression analysis
- Apply batch correction methods
- Visualize results with UMAP

---

## Setup

In [None]:
import scanpy as sc
import numpy as np
import pandas as pd

sc.settings.verbosity = 3

## Task 1: Neighborhood Graph and UMAP (20 points)

### Instructions
1. Load preprocessed PBMC data (or continue from Lecture 5)
2. Compute neighborhood graph using `sc.pp.neighbors`
3. Calculate UMAP embedding (`sc.tl.umap`)
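4. Visualize UMAP colored by total counts and number of detected genes (typically `total_counts` and `n_genes_by_counts` or `n_genes` in `adata.obs`, depending on how QC was run)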
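
As a starting point, the steps above might be sketched as follows. This is a minimal sketch, not a reference solution: the helper names `pick_qc_columns` and `neighbors_and_umap` are hypothetical, the defaults (`n_neighbors=15`, `random_state=0`) are illustrative, and it assumes `adata` has already been normalized and log-transformed as in Lecture 5.

```python
# Candidate QC columns: `total_counts` and `n_genes_by_counts` are written by
# sc.pp.calculate_qc_metrics, `n_genes` by sc.pp.filter_cells(min_genes=...).
QC_CANDIDATES = ("total_counts", "n_genes_by_counts", "n_genes")


def pick_qc_columns(obs_columns):
    """Return the QC candidates that are actually present, in candidate order."""
    present = set(obs_columns)
    return [name for name in QC_CANDIDATES if name in present]


def neighbors_and_umap(adata, n_neighbors=15, random_state=0):
    """Build the kNN graph and UMAP in place, then plot the available QC columns."""
    import scanpy as sc  # local import: the helper above does not need scanpy

    if "X_pca" not in adata.obsm:
        sc.tl.pca(adata)  # the neighbor graph is built in PCA space
    sc.pp.neighbors(adata, n_neighbors=n_neighbors)
    sc.tl.umap(adata, random_state=random_state)

    colors = pick_qc_columns(adata.obs.columns)
    if colors:
        sc.pl.umap(adata, color=colors)
    return colors
```

Calling `neighbors_and_umap(adata)` stores the graph in `adata.obsp` and the embedding in `adata.obsm["X_umap"]`, which the later tasks reuse.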

In [None]:
# TODO: Compute neighbors and UMAP


## Task 2: Clustering (25 points)

### Instructions
1. Perform Leiden clustering (`sc.tl.leiden`)
2. Try different resolutions (0.4, 0.8, 1.2)
3. Visualize clusters on UMAP
4. Calculate cluster sizes
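
One possible way to organize the resolution sweep is to store each clustering under its own `adata.obs` key so the results can be compared side by side. The sketch below assumes the neighbor graph from Task 1 exists and that a Leiden backend (for example the `leidenalg` package) is installed; the key naming scheme `leiden_r<resolution>` and the helper names are made up for illustration.

```python
RESOLUTIONS = (0.4, 0.8, 1.2)


def leiden_key(resolution):
    """Name of the adata.obs column that holds the clustering for one resolution."""
    return f"leiden_r{resolution}"


def cluster_at_resolutions(adata, resolutions=RESOLUTIONS):
    """Run Leiden once per resolution and return the cluster sizes for each run."""
    import scanpy as sc  # local import: leiden_key works without scanpy

    sizes = {}
    for resolution in resolutions:
        key = leiden_key(resolution)
        sc.tl.leiden(adata, resolution=resolution, key_added=key)
        # value_counts gives the number of cells per cluster label
        sizes[key] = adata.obs[key].value_counts()

    # One UMAP panel per resolution (requires sc.tl.umap to have been run)
    sc.pl.umap(adata, color=list(sizes))
    return sizes
```

Higher resolutions generally split the data into more, smaller clusters, so comparing the returned size tables is a quick way to see where clusters start to fragment.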

In [None]:
# TODO: Perform clustering


## Task 3: Marker Gene Identification (25 points)

### Instructions
1. Find marker genes using `sc.tl.rank_genes_groups`
2. Use the Wilcoxon rank-sum test (`method="wilcoxon"`) to compare each cluster against all remaining cells
3. Visualize top markers with `sc.pl.rank_genes_groups`
4. Plot known markers: CD3D, CD79A, CST3, NKG7
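
A hedged sketch of these steps is shown below. It assumes the clusters live in `adata.obs["leiden"]` (adjust `groupby` if a different key was used in Task 2); `present_markers` and `find_markers` are hypothetical helper names. Filtering the known markers first avoids a plotting error when a gene was removed during preprocessing.

```python
KNOWN_MARKERS = ["CD3D", "CD79A", "CST3", "NKG7"]


def present_markers(markers, var_names):
    """Keep only the marker genes that exist in the dataset, preserving order."""
    available = set(var_names)
    return [gene for gene in markers if gene in available]


def find_markers(adata, groupby="leiden", n_genes=20):
    """Rank genes per cluster (each cluster vs. the rest) and plot the results."""
    import scanpy as sc  # local import: present_markers works without scanpy

    # reference="rest" is the default, i.e. each cluster vs. all other cells
    sc.tl.rank_genes_groups(adata, groupby=groupby, method="wilcoxon")
    sc.pl.rank_genes_groups(adata, n_genes=n_genes, sharey=False)

    # Plotting reads from adata.raw by default when it is set
    var_names = adata.raw.var_names if adata.raw is not None else adata.var_names
    genes = present_markers(KNOWN_MARKERS, var_names)
    if genes:
        sc.pl.umap(adata, color=genes)

    # Long-format table with one row per (cluster, gene) pair
    return sc.get.rank_genes_groups_df(adata, group=None)
```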

In [None]:
# TODO: Find and visualize marker genes


## Task 4: Cell Type Annotation (20 points)

### Instructions
1. Annotate clusters based on marker genes:
   - CD3D/CD3E: T cells
   - CD79A/MS4A1: B cells
   - CD14/LYZ: Monocytes
   - NKG7/GNLY: NK cells
2. Create new column with cell type labels
3. Visualize on UMAP
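
The annotation step could be sketched as below, using the marker table from the instructions. The overlap heuristic in `label_cluster` is an illustrative assumption, not an established method: it counts how many markers of each cell type appear among a cluster's top-ranked genes and resolves ties by dictionary order, so the result should always be checked against the marker plots from Task 3. The sketch also assumes `sc.tl.rank_genes_groups` has been run and that clusters are stored in `adata.obs["leiden"]`.

```python
MARKERS = {
    "T cells": ["CD3D", "CD3E"],
    "B cells": ["CD79A", "MS4A1"],
    "Monocytes": ["CD14", "LYZ"],
    "NK cells": ["NKG7", "GNLY"],
}


def label_cluster(top_genes, markers=MARKERS):
    """Pick the cell type whose markers overlap most with a cluster's top genes."""
    top = set(top_genes)
    hits = {cell_type: len(top & set(genes)) for cell_type, genes in markers.items()}
    best = max(hits, key=hits.get)  # ties go to the first entry in `markers`
    return best if hits[best] > 0 else "Unknown"


def annotate(adata, groupby="leiden", n_top=50):
    """Write adata.obs['cell_type'] from the ranked marker genes and plot it."""
    import scanpy as sc  # local import: label_cluster works without scanpy

    ranked = sc.get.rank_genes_groups_df(adata, group=None)
    labels = {}
    for cluster, rows in ranked.groupby("group", observed=True):
        labels[str(cluster)] = label_cluster(rows["names"].head(n_top))

    adata.obs["cell_type"] = adata.obs[groupby].astype(str).map(labels).astype("category")
    sc.pl.umap(adata, color="cell_type")
    return labels
```

In practice a hand-written dictionary such as `{"0": "T cells", "1": "B cells"}` passed to `.map(...)` is often the more transparent choice once the clusters have been inspected.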

In [None]:
# TODO: Annotate cell types


## Task 5: Batch Correction with scVI (10 points)

### Instructions
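1. Load a dataset that has a batch column in `adata.obs` and still contains raw counts (note that `sc.datasets.pbmc3k_processed()` is a single sample with no batch labels, so it is not sufficient on its own)
2. Apply scVI for batch correction
3. Compare UMAP colored by batch before and after correction
4. Check whether cells from different batches now mix within each cluster or cell type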
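
A minimal sketch of the scVI workflow follows, assuming the `scvi-tools` package is installed. The column name `"batch"`, the layer name `"counts"`, and both helper names are assumptions made for illustration; scVI models raw counts, so the sketch checks for them before training.

```python
def missing_inputs(obs_columns, layer_names, batch_key, counts_layer):
    """List what is missing before scVI can be set up (empty list means ready)."""
    problems = []
    if batch_key not in set(obs_columns):
        problems.append(f"adata.obs has no '{batch_key}' column")
    if counts_layer not in set(layer_names):
        problems.append(f"adata.layers has no '{counts_layer}' layer with raw counts")
    return problems


def integrate_with_scvi(adata, batch_key="batch", counts_layer="counts"):
    """Train scVI, store the latent space, and plot UMAPs before and after."""
    problems = missing_inputs(adata.obs.columns, adata.layers.keys(), batch_key, counts_layer)
    if problems:
        raise ValueError("; ".join(problems))

    import scanpy as sc  # local imports: the check above needs neither package
    import scvi

    # Before: UMAP from the uncorrected representation, if it was computed
    if "X_umap" in adata.obsm:
        sc.pl.umap(adata, color=batch_key, title="Before scVI")

    scvi.model.SCVI.setup_anndata(adata, layer=counts_layer, batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()
    adata.obsm["X_scVI"] = model.get_latent_representation()

    # After: rebuild the graph and UMAP on the scVI latent space
    # (this overwrites the previous neighbors graph and X_umap)
    sc.pp.neighbors(adata, use_rep="X_scVI")
    sc.tl.umap(adata)
    sc.pl.umap(adata, color=batch_key, title="After scVI")
    return model
```

A quick numerical check of mixing is the per-cluster batch composition, for example `pd.crosstab(adata.obs["leiden"], adata.obs["batch"], normalize="index")` after re-clustering on the corrected graph; clusters dominated by a single batch suggest residual batch effects or genuinely batch-specific cell populations.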

In [None]:
# TODO: Apply batch correction
