# Lecture 1: Introduction to Neurogenomics

**Course:** Bioinformatics Research with AI in Neurosciences (BRAIN)  
**Date:** December 5, 2025  
**Instructor:** BRAIN Course Team  
**Duration:** 90 minutes

---

## Learning Objectives

By the end of this lecture, you will be able to:

1. **Understand** the genetic basis of brain function and disease
2. **Learn** key neurogenomic technologies and methods
3. **Analyze** and interpret neurogenomic data
4. **Explore** clinical and translational applications

---

## Table of Contents

1. [Introduction to Neurogenomics](#introduction)
2. [Setup: Installing Required Libraries](#setup-installing-required-libraries)
3. [The Central Dogma in the Brain](#central-dogma)
4. [Brain Cell Type Diversity](#cell-diversity)
5. [Loading Real Brain Data](#data-exploration)
6. [Gene Expression Patterns Across Brain Regions](#expression-patterns)
7. [Comparing Neuronal and Glial Gene Expression](#neuron-glia)
8. [Disease-Associated Genes in the Brain](#disease-genes)
9. [Practical Applications](#applications)
10. [Summary and Key Takeaways](#summary)

---

## 1. Introduction to Neurogenomics {#introduction}

### What is Neurogenomics?

**Neurogenomics** is the study of how the genome influences the development, function, and dysfunction of the nervous system. This interdisciplinary field combines:

- **Genomics**: Study of genes and their functions
- **Neuroscience**: Study of the nervous system
- **Bioinformatics**: Computational analysis of biological data

### Why is Neurogenomics Important?

The human brain contains:
- **~86 billion neurons**
- **~85 billion non-neuronal cells** (mostly glia)
- **>1,000 distinct cell types**
- **~100 trillion synapses**

Each cell type has unique:
- Gene expression profiles
- Molecular signatures
- Functional properties
- Disease vulnerabilities

### Key Applications

1. **Understanding brain development**: How genes control neuronal differentiation
2. **Disease mechanisms**: Genetic basis of Alzheimer's, Parkinson's, schizophrenia
3. **Drug discovery**: Identifying therapeutic targets
4. **Precision medicine**: Personalized treatment based on genetic profiles

### Historical Context

```
1990s: Human Genome Project
2000s: Microarray technology for brain gene expression
2010s: RNA-seq revolutionizes transcriptomics
2015+: Single-cell RNA-seq reveals cellular heterogeneity
2020+: Spatial transcriptomics maps gene expression in tissue context
```

**Landmark Publications:**
- Hawrylycz et al. (2012) - Allen Human Brain Atlas. *Nature* 489:391-399
- Yao et al. (2023) - Whole Mouse Brain Cell Atlas. *Nature* 624:317-332

---

## 2. Setup: Installing Required Libraries

We'll use the **scverse ecosystem** - a collection of Python tools specifically designed for single-cell analysis.

### Key Libraries:

- **scanpy**: Single-cell analysis in Python
- **anndata**: Annotated data structures
- **numpy**: Numerical computing
- **pandas**: Data manipulation
- **matplotlib/seaborn**: Visualization

```bash
# Installation (run in terminal)
pip install scanpy numpy pandas matplotlib seaborn
```

```python
# Import required libraries
import scanpy as sc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting aesthetics
# scanpy verbosity levels: 0 = errors, 1 = warnings, 2 = info, 3 = hints
sc.settings.verbosity = 2
sc.settings.set_figure_params(
    dpi=100,              # Resolution
    frameon=False,        # No frame around plots
    figsize=(8, 6),       # Default figure size
    fontsize=12           # Font size
)

# Set seaborn style for professional plots
sns.set_style("whitegrid")
sns.set_context("notebook", font_scale=1.2)

# Print library versions for reproducibility
print("Library Versions:")
print(f"  scanpy: {sc.__version__}")
print(f"  numpy: {np.__version__}")
print(f"  pandas: {pd.__version__}")
print("\n✓ Setup complete!")
```

---

## 3. The Central Dogma in the Brain {#central-dogma}

### DNA → RNA → Protein

The **Central Dogma of Molecular Biology** describes how genetic information flows:

```
DNA (Genome)
    ↓ Transcription
RNA (Transcriptome) ← We measure this in RNA-seq!
    ↓ Translation  
Protein (Proteome)
    ↓
Function (Phenotype)
```
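To make the flow concrete, here is a minimal sketch that transcribes a short DNA sequence into mRNA and translates it into a peptide. The sequence is made up, the codon table is deliberately partial (only the codons this toy sequence uses), and the helper names `transcribe` and `translate` are illustrative, not part of any library.

```python
# Partial codon table: only the codons used in this toy example.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GCC": "A",  # alanine
    "AAG": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop
}

def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGGCCAAGTGGTAA"   # hypothetical 5-codon gene
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGCCAAGUGGUAA
print(protein)  # MAKW
```

RNA-seq measures the middle layer of this chain: it counts mRNA molecules, not the proteins they encode.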

### Why Study the Transcriptome?

**Advantages:**
1. **Dynamic**: Changes with cell state, development, disease
2. **Measurable**: RNA-seq provides quantitative data
3. **Cell-type-specific**: Each cell type has unique expression
4. **Comprehensive**: Can measure all ~20,000 protein-coding genes simultaneously

**In the Brain:**
- Neurons express ~10,000-15,000 genes
- Different neuronal subtypes have distinct expression profiles
- Gene expression changes with:
  - Development (fetal → adult)
  - Activity (synaptic plasticity)
  - Disease (Alzheimer's, Parkinson's)
  - Aging (cellular senescence)

---

## 4. Brain Cell Type Diversity {#cell-diversity}

### Major Cell Classes

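The brain contains diverse cell types, each with specialized functions. The percentages below are rough estimates; actual proportions vary widely by brain region, species, and counting method.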

#### 1. **Neurons** (52% of brain cells)
- **Function**: Information processing and transmission
- **Subtypes**: 
  - Excitatory (glutamatergic): ~80% of cortical neurons
  - Inhibitory (GABAergic): ~20% of cortical neurons
  - Modulatory (dopaminergic, serotonergic, etc.)
- **Key markers**: `SNAP25`, `SYT1`, `RBFOX3` (NeuN)

#### 2. **Astrocytes** (30% of brain cells)
- **Function**: Support, metabolic coupling, synaptic modulation
- **Key markers**: `GFAP`, `AQP4`, `SLC1A2`

#### 3. **Oligodendrocytes** (10% of brain cells)
- **Function**: Myelin production, saltatory conduction
- **Key markers**: `MBP`, `MOG`, `PLP1`

#### 4. **Microglia** (5% of brain cells)
- **Function**: Immune surveillance, synaptic pruning
- **Key markers**: `CX3CR1`, `P2RY12`, `TMEM119`

#### 5. **Oligodendrocyte Precursor Cells (OPCs)** (3% of brain cells)
- **Function**: Generate new oligodendrocytes for myelination and myelin plasticity
- **Key markers**: `PDGFRA`, `CSPG4`, `SOX10`

### Visualization of Cell Type Hierarchy

```
Brain Cells
├── Neurons
│   ├── Glutamatergic (Excitatory)
│   │   ├── Layer 2/3 pyramidal
│   │   ├── Layer 4 stellate
│   │   └── Layer 5/6 pyramidal
│   ├── GABAergic (Inhibitory)
│   │   ├── Parvalbumin (PV+)
│   │   ├── Somatostatin (SST+)
│   │   └── VIP+
│   └── Modulatory
│       ├── Dopaminergic
│       ├── Serotonergic
│       └── Cholinergic
└── Glia
    ├── Astrocytes
    ├── Oligodendrocytes
    ├── OPCs
    └── Microglia
```

---

## 5. Loading Real Brain Data {#data-exploration}

### Dataset: Allen Mouse Brain Cell Atlas (10x Genomics)

The reference dataset for this topic is the **Allen Mouse Brain Cell Atlas**, published in *Nature* (2023):

**Citation:**  
Yao, Z. et al. (2023). A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain. *Nature* 624, 317–332.

**Dataset Details:**
- **Technology**: 10x Genomics Chromium 3' v3
- **Species**: *Mus musculus* (mouse)
- **Brain regions**: Multiple (we'll use a subset)
- **Cell count**: ~1,000 cells (downsampled for demonstration)
- **Genes**: ~20,000

**Data Access:**
- Allen Brain Map portal: https://portal.brain-map.org/
- scanpy does not bundle this atlas. The code cell below therefore loads scanpy's built-in `paul15` dataset (mouse hematopoiesis) as a lightweight stand-in for practicing the AnnData workflow, so the cell and gene counts it prints will differ from the figures above.

### Understanding the AnnData Object

Single-cell data in Python is stored in an **AnnData** (Annotated Data) object:

```
AnnData object
├── .X          : Expression matrix (cells × genes)
├── .obs        : Cell metadata (observations)
├── .var        : Gene metadata (variables)
├── .uns        : Unstructured annotations
├── .obsm       : Multi-dimensional cell annotations (PCA, UMAP)
└── .layers     : Additional matrices (raw counts, normalized)
```

```python
# Load a small demonstration dataset from scanpy's built-in data.
# scanpy does not ship an Allen brain dataset, so paul15 (mouse hematopoiesis)
# stands in here to demonstrate the AnnData workflow.
# In a real analysis you would load actual brain data instead.
print("Loading demonstration dataset (scanpy's paul15)...")
print("This may take a moment on first run (downloads ~10MB)\n")

adata = sc.datasets.paul15()

# Display basic information
print("="*70)
print("Dataset Loaded Successfully!")
print("="*70)
print(f"\n{adata}\n")

# Print detailed information
print("Dataset Dimensions:")
print(f"  • Number of cells: {adata.n_obs:,}")
print(f"  • Number of genes: {adata.n_vars:,}")
print(f"  • Matrix shape: {adata.X.shape}")
print(f"  • Matrix type: {type(adata.X)}")

# Check data sparsity (works for both sparse and dense matrices)
n_nonzero = adata.X.nnz if hasattr(adata.X, 'nnz') else np.count_nonzero(adata.X)
sparsity = 1 - n_nonzero / (adata.n_obs * adata.n_vars)
print(f"  • Data sparsity: {sparsity*100:.1f}% zeros")
print("    (Typical for single-cell data: 90-99% sparse)")

print("\n" + "="*70)
print("Cell Metadata (first 5 cells):")
print("="*70)
print(adata.obs.head())

print("\n" + "="*70)
print("Gene Metadata (first 5 genes):")
print("="*70)
print(adata.var.head())
```

### Understanding the Output

**What we just loaded:**

1. **Expression Matrix (`.X`)**:
   - Each row = one cell
   - Each column = one gene
   - Values = gene expression counts (UMI counts from sequencing)
   - **Sparse matrix**: Most values are zero (genes not expressed in that cell)

2. **Cell Metadata (`.obs`)**:
   - Information about each cell
   - May include: cell type, cluster, batch, QC metrics
   - Each row corresponds to a cell in `.X`

3. **Gene Metadata (`.var`)**:
   - Information about each gene
   - May include: gene symbols, biotype, chromosomal location
   - Each row corresponds to a gene (column) in `.X`

**Why is the data sparse?**
- Not all genes are expressed in every cell
- Technical limitations: low capture efficiency (~10-20%)
- Biological reality: cell-type-specific expression
- Example: Neurons express neuronal genes, not glial genes
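The effect of capture efficiency on sparsity can be illustrated with a small simulation. This is a sketch under stated assumptions: the "true" molecule counts are drawn from a Poisson distribution with an arbitrary mean, and each molecule is captured independently with probability 0.1, a value chosen from the low end of the range quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" mRNA content: 500 cells x 2,000 genes.
true_molecules = rng.poisson(lam=2.0, size=(500, 2000))

# Each molecule is captured independently with probability 0.1 (assumed).
capture_efficiency = 0.1
observed_counts = rng.binomial(true_molecules, capture_efficiency)

def fraction_zeros(matrix: np.ndarray) -> float:
    """Fraction of entries in the matrix that are exactly zero."""
    return 1.0 - np.count_nonzero(matrix) / matrix.size

print(f"Zeros before capture: {fraction_zeros(true_molecules):.1%}")
print(f"Zeros after capture:  {fraction_zeros(observed_counts):.1%}")
```

With these assumed parameters the zero fraction rises from roughly 14% to roughly 82%, even though every gene is "expressed" at the same average level. Real data are sparser still, because many genes are truly silent in a given cell type.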

---

## 6. Gene Expression Patterns Across Brain Regions {#expression-patterns}

Different brain regions have distinct gene expression profiles that reflect their specialized functions.

### Brain Regions and Their Functions

| Region | Function | Key Cell Types | Marker Genes |
|--------|----------|----------------|-------------|
| **Cortex** | Higher cognition, sensory processing | Pyramidal neurons, interneurons | *Slc17a7*, *Gad1* |
| **Hippocampus** | Memory formation, spatial navigation | CA pyramidal, dentate granule | *Prox1*, *Wfs1* |
| **Striatum** | Motor control, reward | Medium spiny neurons | *Drd1*, *Drd2*, *Adora2a* |
| **Cerebellum** | Motor coordination, balance | Purkinje cells, granule cells | *Pcp2*, *Gabra6* |
| **Hypothalamus** | Homeostasis, hormones | Neurosecretory neurons | *Oxt*, *Avp*, *Pomc* |

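### Example: Computing Quality Control Metrics

The demonstration dataset carries no brain-region labels, so the cells below compute per-cell and per-gene quality control metrics, the first step before any regional comparison.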

```python
# Calculate basic quality metrics
print("Computing quality control metrics...")

# Total counts per cell (library size)
adata.obs['total_counts'] = np.array(adata.X.sum(axis=1)).flatten()

# Number of genes detected per cell
adata.obs['n_genes'] = np.array((adata.X > 0).sum(axis=1)).flatten()

# Total counts per gene (across all cells)
adata.var['total_counts'] = np.array(adata.X.sum(axis=0)).flatten()

# Number of cells expressing each gene
adata.var['n_cells'] = np.array((adata.X > 0).sum(axis=0)).flatten()

print("✓ Metrics computed!\n")

# Display statistics
print("="*70)
print("Cell-Level Statistics:")
print("="*70)
print("Total UMI counts per cell:")
print(f"  • Mean:   {adata.obs['total_counts'].mean():>10,.0f}")
print(f"  • Median: {adata.obs['total_counts'].median():>10,.0f}")
print(f"  • Min:    {adata.obs['total_counts'].min():>10,.0f}")
print(f"  • Max:    {adata.obs['total_counts'].max():>10,.0f}")

print("\nGenes detected per cell:")
print(f"  • Mean:   {adata.obs['n_genes'].mean():>10,.0f}")
print(f"  • Median: {adata.obs['n_genes'].median():>10,.0f}")
print(f"  • Min:    {adata.obs['n_genes'].min():>10,.0f}")
print(f"  • Max:    {adata.obs['n_genes'].max():>10,.0f}")

print("\n" + "="*70)
print("Gene-Level Statistics:")
print("="*70)
print("Cells expressing each gene:")
print(f"  • Mean:   {adata.var['n_cells'].mean():>10,.0f}")
print(f"  • Median: {adata.var['n_cells'].median():>10,.0f}")

# Find highly expressed genes
top_genes = adata.var.nlargest(10, 'total_counts')
print("\nTop 10 Most Highly Expressed Genes:")
print("(These are often housekeeping genes or highly abundant transcripts)")
print("-" * 50)
for i, (gene, row) in enumerate(top_genes.iterrows(), 1):
    print(f"{i:2d}. {gene:15s} - Total counts: {row['total_counts']:>10,.0f}")
```

```python
# Visualize data quality
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Distribution of total counts per cell
axes[0, 0].hist(adata.obs['total_counts'], bins=50, color='steelblue', 
                edgecolor='black', alpha=0.7)
axes[0, 0].axvline(adata.obs['total_counts'].mean(), color='red', 
                   linestyle='--', linewidth=2, label=f"Mean: {adata.obs['total_counts'].mean():.0f}")
axes[0, 0].set_xlabel('Total UMI Counts per Cell', fontsize=11)
axes[0, 0].set_ylabel('Number of Cells', fontsize=11)
axes[0, 0].set_title('A. Distribution of Library Sizes', fontsize=12, fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# 2. Distribution of genes per cell
axes[0, 1].hist(adata.obs['n_genes'], bins=50, color='coral', 
                edgecolor='black', alpha=0.7)
axes[0, 1].axvline(adata.obs['n_genes'].mean(), color='red', 
                   linestyle='--', linewidth=2, label=f"Mean: {adata.obs['n_genes'].mean():.0f}")
axes[0, 1].set_xlabel('Genes Detected per Cell', fontsize=11)
axes[0, 1].set_ylabel('Number of Cells', fontsize=11)
axes[0, 1].set_title('B. Distribution of Gene Detection', fontsize=12, fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)

# 3. Relationship between counts and genes
axes[1, 0].scatter(adata.obs['total_counts'], adata.obs['n_genes'], 
                   alpha=0.5, s=10, c='mediumseagreen')
axes[1, 0].set_xlabel('Total UMI Counts', fontsize=11)
axes[1, 0].set_ylabel('Genes Detected', fontsize=11)
axes[1, 0].set_title('C. Counts vs Genes Detected', fontsize=12, fontweight='bold')
axes[1, 0].grid(alpha=0.3)

# Add correlation coefficient
corr = np.corrcoef(adata.obs['total_counts'], adata.obs['n_genes'])[0, 1]
axes[1, 0].text(0.05, 0.95, f'Correlation: {corr:.3f}', 
                transform=axes[1, 0].transAxes, fontsize=10,
                verticalalignment='top',
                bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# 4. Gene detection across cells
axes[1, 1].hist(adata.var['n_cells'], bins=50, color='mediumpurple', 
                edgecolor='black', alpha=0.7)
axes[1, 1].set_xlabel('Number of Cells Expressing Gene', fontsize=11)
axes[1, 1].set_ylabel('Number of Genes', fontsize=11)
axes[1, 1].set_title('D. Gene Detection Across Cells', fontsize=12, fontweight='bold')
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("Interpretation of Plots:")
print("="*70)
print("""
A. Library Size Distribution:
   - Shows sequencing depth per cell
   - Ideally: narrow distribution around mean
   - Low counts → poor quality cells
   - Very high counts → potential doublets

B. Gene Detection:
   - Genes detected per cell is usually lower than the ~10,000-15,000
     a neuron expresses, because capture efficiency is limited
   - Low gene counts → damaged/dying cells
   - Correlates with library size

C. Counts vs Genes:
   - Strong positive correlation expected
   - More sequencing → more genes detected
   - Outliers may indicate technical issues

D. Gene Prevalence:
   - Most genes: detected in few cells (cell-type-specific)
   - Some genes: detected in many cells (housekeeping)
   - Rare genes: potential artifacts or true rare transcripts
""")
```
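The interpretation above (low counts suggest poor-quality cells, very high counts suggest doublets) can be turned into a simple filter. One common heuristic, sketched here, flags cells that lie more than a few median absolute deviations (MADs) from the median. The function name `flag_outliers`, the cutoff of 3 MADs, and the example counts are all assumptions for illustration; appropriate thresholds depend on the dataset.

```python
import numpy as np

def flag_outliers(values: np.ndarray, n_mads: float = 3.0) -> np.ndarray:
    """Return a boolean mask that is True where a value lies more than
    n_mads median absolute deviations away from the median."""
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return np.abs(values - median) > n_mads * mad

# Hypothetical library sizes (UMI counts per cell), with one very low
# and one very high cell added by hand.
total_counts = np.array([4800, 5100, 5000, 4950, 5200, 5050, 300, 21000])

# Work on the log scale because library sizes are right-skewed.
is_outlier = flag_outliers(np.log1p(total_counts))
print(is_outlier)
```

In the notebook, the same function could be applied to `adata.obs['total_counts']` and `adata.obs['n_genes']`, keeping only cells where neither mask is `True`.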

---

## 7. Comparing Neuronal and Glial Gene Expression {#neuron-glia}

### Cell Type-Specific Markers

Different cell types express distinct sets of genes. These **marker genes** can be used to identify cell types:

#### Neuronal Markers:
- **Pan-neuronal**: Expressed by all neurons
  - `SNAP25`: Synaptic vesicle trafficking
  - `RBFOX3` (NeuN): Nuclear neuronal marker
  - `SYT1`: Synaptic transmission
  - `SYN1`: Synapsin, presynaptic

- **Excitatory neurons**:
  - `SLC17A7` (VGLUT1): Glutamate transporter
  - `CAMK2A`: Calcium/calmodulin kinase
  - `GRIA1`: Glutamate receptor

- **Inhibitory neurons**:
  - `GAD1`, `GAD2`: GABA synthesis
  - `SLC32A1` (VGAT): GABA transporter

#### Glial Markers:
- **Astrocytes**:
  - `GFAP`: Glial fibrillary acidic protein
  - `AQP4`: Water channel
  - `SLC1A2`: Glutamate transporter

- **Oligodendrocytes**:
  - `MBP`: Myelin basic protein
  - `MOG`: Myelin oligodendrocyte glycoprotein
  - `PLP1`: Proteolipid protein

- **Microglia**:
  - `CX3CR1`: Chemokine receptor
  - `P2RY12`: Purinergic receptor
  - `TMEM119`: Transmembrane protein

### Why are markers important?

1. **Cell type identification**: Classify cells in single-cell data
2. **Quality control**: Verify expected cell types
3. **Disease studies**: Track cell-type-specific changes
4. **Drug targets**: Cell-type-specific therapeutics
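As a rough illustration of marker-based cell type identification, the sketch below scores each cell by the mean expression of each marker set and assigns the best-scoring label. The expression values are simulated (low random background plus a fixed boost for one marker set per cell), so this shows the mechanics only; real annotation usually works on clusters and uses more robust scoring.

```python
import numpy as np
import pandas as pd

# Marker sets taken from the lists above (human gene symbols).
markers = {
    "Neuron": ["SNAP25", "SYT1", "RBFOX3"],
    "Astrocyte": ["GFAP", "AQP4", "SLC1A2"],
    "Oligodendrocyte": ["MBP", "MOG", "PLP1"],
    "Microglia": ["CX3CR1", "P2RY12", "TMEM119"],
}
genes = [g for gene_list in markers.values() for g in gene_list]

# Hypothetical log-normalized expression for 4 cells: low random
# background everywhere, plus a boost for one marker set per cell.
rng = np.random.default_rng(1)
expr = pd.DataFrame(
    rng.uniform(0.0, 0.5, size=(4, len(genes))),
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
    columns=genes,
)
for cell, cell_type in zip(expr.index, markers):
    expr.loc[cell, markers[cell_type]] += 3.0

# Score = mean expression of each marker set; label = best-scoring set.
scores = pd.DataFrame({ct: expr[gl].mean(axis=1) for ct, gl in markers.items()})
labels = scores.idxmax(axis=1)
print(scores.round(2))
print(labels)
```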

---

## 8. Disease-Associated Genes in the Brain {#disease-genes}

### Major Neurodegenerative Diseases

Understanding which genes are dysregulated in disease is crucial for:
- Identifying disease mechanisms
- Discovering therapeutic targets
- Developing biomarkers
- Stratifying patients

#### Alzheimer's Disease (AD)
**Key genes:**
- `APP`: Amyloid precursor protein
- `PSEN1`, `PSEN2`: Presenilin (γ-secretase complex)
- `APOE`: Apolipoprotein E (ε4 allele = risk factor)
- `MAPT`: Microtubule-associated protein tau
- `TREM2`: Microglial receptor (immune response)

**Pathology:**
- Amyloid-β plaques
- Neurofibrillary tangles (tau)
- Neuroinflammation
- Synaptic loss

**Publications:**
- Mathys et al. (2019). Single-cell transcriptomic analysis of Alzheimer's disease. *Nature* 570:332-337

#### Parkinson's Disease (PD)
**Key genes:**
- `SNCA`: α-synuclein (Lewy bodies)
- `LRRK2`: Leucine-rich repeat kinase 2
- `PRKN` (formerly `PARK2`), `PINK1`, `PARK7` (DJ-1): Mitochondrial quality control
- `GBA1` (formerly `GBA`): Glucocerebrosidase

**Pathology:**
- Dopaminergic neuron loss (substantia nigra)
- Lewy bodies (α-synuclein aggregates)
- Mitochondrial dysfunction

#### Schizophrenia
**Risk genes (GWAS):**
- `CACNA1C`: Calcium channel
- `TCF4`: Transcription factor
- `GRIN2A`: NMDA receptor
- Complement system genes (`C4A`)

**Neurobiology:**
- Synaptic pruning abnormalities
- Dopamine dysregulation
- GABAergic interneuron deficits

#### Autism Spectrum Disorder (ASD)
**Risk genes:**
- `CHD8`: Chromatin remodeler
- `SCN2A`: Sodium channel
- `SHANK3`: Postsynaptic scaffolding
- `MECP2`: Methyl-CpG binding (Rett syndrome)

**Neurobiology:**
- Excitation/inhibition imbalance
- Synaptic connectivity alterations
- Developmental timing disruption

---

```python
# Example: Examining expression of disease-associated genes
print("="*70)
print("Examining Disease-Associated Gene Expression")
print("="*70)

# Define disease-associated genes (mouse symbols).
# Older annotations use Park2 and Gba; newer ones may list Prkn and Gba1.
disease_genes = {
    "Alzheimer's": ['App', 'Psen1', 'Apoe', 'Mapt', 'Trem2'],
    "Parkinson's": ['Snca', 'Lrrk2', 'Park2', 'Pink1', 'Gba'],
    'Schizophrenia': ['Cacna1c', 'Grin2a', 'Gad1'],
    'ASD': ['Chd8', 'Scn2a', 'Shank3', 'Mecp2']
}

# Check which disease genes are present in our dataset
print("\nDisease Genes Present in Dataset:\n")
for disease, genes in disease_genes.items():
    present_genes = [g for g in genes if g in adata.var_names]
    print(f"{disease:20s}: {len(present_genes)}/{len(genes)} genes detected")
    if present_genes:
        print(f"  → {', '.join(present_genes)}")

# Note: Gene naming conventions
print("\n" + "="*70)
print("Note on Gene Names:")
print("="*70)
print("""
Gene nomenclature varies by species:
  • Human:  All caps (e.g., APOE, SNCA)
  • Mouse:  First letter caps (e.g., Apoe, Snca)
  • Rat:    First letter caps (e.g., Apoe, Snca)

Always check gene naming when working with different species!
""")

print("\n" + "="*70)
print("Clinical Relevance:")
print("="*70)
print("""
Understanding disease gene expression at single-cell resolution enables:

1. Cell-type-specific pathology:
   - Which cell types are most affected?
   - Are there vulnerable neuronal subtypes?

2. Disease mechanisms:
   - Gene expression changes in disease vs healthy
   - Identification of dysregulated pathways

3. Therapeutic targets:
   - Cell-type-specific drug targets
   - Biomarkers for early detection

4. Precision medicine:
   - Genetic risk stratification
   - Personalized treatment strategies
""")
```
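Because of these naming conventions, a human-style symbol such as `APOE` will not match a mouse dataset that stores `Apoe`. The sketch below shows two simple workarounds: converting letter case, and a case-insensitive lookup. The helper `human_to_mouse_style` and the short gene list standing in for `adata.var_names` are hypothetical. Changing case does not look up orthologs, so for real analyses an ortholog table (for example from Ensembl or MGI) is the safer choice.

```python
def human_to_mouse_style(symbol: str) -> str:
    """Convert an all-caps human-style symbol to mouse-style capitalization.

    This only changes letter case. It does NOT look up orthologs, so it is
    wrong whenever the mouse gene has a different name.
    """
    return symbol[:1].upper() + symbol[1:].lower()

human_symbols = ["APOE", "SNCA", "CACNA1C", "SHANK3"]
mouse_style = [human_to_mouse_style(s) for s in human_symbols]
print(mouse_style)  # ['Apoe', 'Snca', 'Cacna1c', 'Shank3']

# Case-insensitive lookup against a dataset's gene names
# (the list below is a hypothetical stand-in for adata.var_names).
dataset_genes = ["Apoe", "Snca", "Gfap"]
lookup = {g.upper(): g for g in dataset_genes}
found = [lookup[s] for s in human_symbols if s in lookup]
print(found)  # ['Apoe', 'Snca']
```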

---

## 9. Practical Applications {#applications}

### How is Neurogenomics Applied?

#### 1. Drug Discovery
**Example: Alzheimer's Disease**
- Identify genes upregulated in disease
- Find druggable targets (membrane proteins, enzymes)
- Test compounds in model systems
- Clinical trials

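**Examples of Therapeutic Strategies:**
- BACE inhibitors (targeting APP processing): late-stage clinical trials were halted without showing benefit
- Anti-amyloid antibodies: lecanemab has received FDA approval; aducanumab received accelerated approval and was later discontinued
- TREM2 agonists (enhancing microglial function): investigational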

#### 2. Biomarker Development
**Early Detection:**
- Gene expression signatures in blood/CSF
- Predict disease before symptoms
- Monitor disease progression
- Assess treatment response

**Examples:**
- Tau/Aβ ratio in CSF (Alzheimer's)
- α-synuclein in blood (Parkinson's)
- Inflammatory markers (multiple sclerosis)

#### 3. Patient Stratification
**Precision Medicine:**
- Genetic risk scores (polygenic risk scores)
- Treatment response prediction
- Clinical trial enrollment
- Personalized treatment plans

**Example: APOE Genotyping**
- ε4/ε4: 10-15× increased AD risk
- ε4/ε3: 3-4× increased AD risk  
- ε3/ε3: baseline risk
- ε2/ε3: protective
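
APOE is a single locus with a large effect. The polygenic risk scores mentioned above instead combine many variants of small effect, typically as a weighted sum of risk-allele dosages. A minimal sketch, with made-up effect sizes and genotypes:

```python
import numpy as np

# Hypothetical per-variant effect sizes (log odds ratios) from a GWAS.
effect_sizes = np.array([0.20, -0.10, 0.05, 0.30])

# Risk-allele dosages (0, 1, or 2 copies) for three hypothetical individuals.
dosages = np.array(
    [[0, 1, 2, 0],
     [1, 0, 1, 1],
     [2, 2, 0, 2]]
)

# Polygenic risk score = sum over variants of (dosage x effect size).
prs = dosages @ effect_sizes
print(prs)
```

The raw score has no meaning on its own; in practice it is interpreted relative to the score distribution in a reference population.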

#### 4. Understanding Development
**Neurodevelopment:**
- Neurogenesis and gliogenesis
- Circuit formation
- Synaptic pruning
- Myelination

**Applications:**
- Developmental disorders (autism, intellectual disability)
- Critical period plasticity
- Regenerative medicine

#### 5. Cell Type Atlases
**Brain Cell Atlases:**
- Catalog all brain cell types
- Define molecular signatures
- Map spatial organization
- Track changes across lifespan

**Major Projects:**
- Allen Brain Atlas
- Human Cell Atlas
- BRAIN Initiative Cell Census Network (BICCN)
- Chan Zuckerberg Initiative (CZI)

---

## 10. Summary and Key Takeaways {#summary}

### What We Learned Today

#### 1. **Neurogenomics Fundamentals**
- Integration of genomics and neuroscience
- Study of gene expression in the brain
- Central dogma: DNA → RNA → Protein
- Importance for understanding brain function and disease

#### 2. **Brain Cell Diversity**
- Neurons: excitatory, inhibitory, modulatory
- Glia: astrocytes, oligodendrocytes, microglia, OPCs
- >1,000 distinct cell types in the brain
- Each with unique gene expression profiles

#### 3. **Working with Real Data**
- AnnData object structure
- Loading and inspecting single-cell datasets
- Understanding data sparsity
- Quality control metrics

#### 4. **Cell Type Markers**
- Pan-neuronal markers (SNAP25, SYT1)
- Glial markers (GFAP, MBP, CX3CR1)
- Used for cell type identification
- Critical for analysis and interpretation

#### 5. **Disease Relevance**
- Alzheimer's: APP, APOE, MAPT
- Parkinson's: SNCA, LRRK2
- Schizophrenia: CACNA1C, GRIN2A
- Cell-type-specific vulnerabilities

#### 6. **Applications**
- Drug discovery and target identification
- Biomarker development
- Precision medicine
- Understanding neurodevelopment
- Building brain cell atlases

### Key Concepts to Remember

:::{.callout-important}
## Essential Concepts

1. **Single-cell resolution** reveals cellular heterogeneity hidden in bulk data
2. **Gene expression patterns** define cell identity and function
3. **Disease genes** show cell-type-specific expression and vulnerability
4. **Quality control** is essential before any analysis
5. **Proper data interpretation** requires biological context
:::

### Looking Ahead

**Next Lecture:**
- Introduction to Single-Cell Technology
- How scRNA-seq works
- 10X Genomics platform
- Data generation pipeline

**Throughout the Course:**
- Preprocessing and quality control
- Dimensionality reduction and clustering
- Cell type annotation
- Differential expression analysis
- Trajectory inference
- Spatial transcriptomics
- Deep learning approaches

---

## Additional Resources

### Recommended Reading

**Key Reviews:**
1. Zeng, H. & Sanes, J.R. (2017). Neuronal cell-type classification: challenges, opportunities and the path forward. *Nat Rev Neurosci* 18:530-546.
   
2. Yuste, R. et al. (2020). A community-based transcriptomics classification and nomenclature of neocortical cell types. *Nat Neurosci* 23:1456-1468.

3. Hodge, R.D. et al. (2019). Conserved cell types with divergent features in human versus mouse cortex. *Nature* 573:61-68.

**Landmark Papers:**
4. Zeisel, A. et al. (2015). Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. *Science* 347:1138-1142.

5. Lake, B.B. et al. (2016). Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain. *Science* 352:1586-1590.

**Disease Studies:**
6. Mathys, H. et al. (2019). Single-cell transcriptomic analysis of Alzheimer's disease. *Nature* 570:332-337.

7. Kamath, T. et al. (2022). Single-cell genomic profiling of human dopamine neurons identifies a population that selectively degenerates in Parkinson's disease. *Nat Neurosci* 25:588-595.

### Online Resources

**Databases:**
- Allen Brain Atlas: https://portal.brain-map.org/
- Human Cell Atlas: https://www.humancellatlas.org/
- BRAIN Initiative: https://braininitiative.nih.gov/
- CellxGene: https://cellxgene.cziscience.com/

**Tutorials:**
- Scanpy: https://scanpy-tutorials.readthedocs.io/
- Seurat: https://satijalab.org/seurat/
- Current best practices: Luecken & Theis (2019). *Mol Syst Biol* 15:e8746

**Course Materials:**
- Harvard: https://github.com/hbctraining/scRNA-seq
- Broad Institute: https://singlecellcourse.org/

---

## Homework Assignment

### Assignment 1: Exploring Brain Gene Expression

**Due:** Next lecture  
**Points:** 10

**Tasks:**
1. Load a brain dataset (provided or from Allen Brain Atlas)
2. Calculate and visualize QC metrics
3. Identify top 20 expressed genes
4. Check for presence of known marker genes
5. Write a brief report (1-2 pages) with:
   - Dataset description
   - QC observations
   - Marker gene findings
   - Biological interpretation

**Submission:**
- Jupyter notebook (.ipynb)
- PDF report
- Submit via course portal

---

*End of Lecture 1*

**Questions?** Contact: brain-course@university.edu