# Lecture 1: Introduction to Neurogenomics

**Course:** Single-Cell Neurogenomics  
**Date:** December 5, 2025  
**Estimated Time:** 60 minutes  

---

## Learning Objectives

By the end of this assignment, you will be able to:
- Explain the genetic basis of brain function and disease
- Describe key neurogenomic technologies and methods
- Analyze and interpret neurogenomic data
- Discuss clinical and translational applications

---

## Introduction

Neurogenomics is the study of how the genome influences the development and function of the nervous system. This field combines neuroscience and genomics to understand:
- Gene expression patterns in different brain regions
- Genetic variants associated with neurological diseases
- Cell-type-specific gene expression in the brain
- Molecular mechanisms underlying brain function

In this assignment, you will work with brain gene expression data to explore fundamental concepts in neurogenomics.

---

## Setup

First, import the necessary libraries:

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("Libraries imported successfully!")

---

## Task 1: Exploring Brain Gene Expression Data (15 points)

### Background
Different brain regions have distinct gene expression profiles that reflect their specialized functions. In this task, you'll create and explore a simulated dataset representing gene expression across different brain regions.

### Instructions
1. Create a DataFrame with gene expression data for 5 key neuronal marker genes across 4 brain regions
2. The genes should be: 'SLC17A7' (glutamatergic neurons), 'GAD1' (GABAergic neurons), 'TH' (dopaminergic neurons), 'SLC6A4' (serotonergic neurons), and 'GFAP' (astrocytes)
3. The brain regions should be: 'Cortex', 'Hippocampus', 'Striatum', 'Cerebellum'
4. Display the first few rows and calculate summary statistics

### Hints
- Use `pd.DataFrame()` to create the data structure
- Expression values should be realistic (e.g., normalized counts between 0 and 100)
- Use `.head()` and `.describe()` methods

In [None]:
# TODO: Create a DataFrame with gene expression data
# Columns: brain regions, Rows: genes

# Your code here


# Display the data


# Show summary statistics


**Expected Output:** A DataFrame showing expression values for 5 genes across 4 brain regions, along with summary statistics.
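
One way to build such a dataset is sketched below. The variable name `expression` and every number in the table are invented placeholders chosen for illustration; they are not measurements from any atlas or study.

```python
import pandas as pd

# Simulated, made-up expression values (arbitrary normalized units, 0-100).
# Placeholders for illustration only, not real measurements.
expression = pd.DataFrame(
    {
        "Cortex":      [85.0, 40.0, 2.0, 3.0, 55.0],
        "Hippocampus": [78.0, 35.0, 1.5, 4.0, 60.0],
        "Striatum":    [10.0, 90.0, 12.0, 5.0, 45.0],
        "Cerebellum":  [65.0, 50.0, 1.0, 2.0, 70.0],
    },
    index=["SLC17A7", "GAD1", "TH", "SLC6A4", "GFAP"],  # rows are genes
)

# First rows of the table (with five genes, this shows all of them)
print(expression.head())

# describe() summarizes each column, i.e. each brain region
summary = expression.describe()
print(summary)
```

Because `.describe()` works column by column, the statistics above are per region. Calling `expression.T.describe()` would give per-gene statistics instead.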

---

## Task 2: Visualizing Region-Specific Gene Expression (20 points)

### Background
Visualization is crucial for understanding gene expression patterns. Heatmaps are particularly useful for displaying expression data across multiple genes and regions simultaneously.

### Instructions
1. Create a heatmap showing the expression of all genes across all brain regions
2. Add appropriate labels and a color bar
3. Use an appropriate color scheme (e.g., 'viridis', 'RdYlBu_r')
4. Include a title: "Gene Expression Patterns Across Brain Regions"

### Hints
- Use `sns.heatmap()` function
- Parameters: `annot=True` to show values, `cmap` for color scheme
- Use `plt.xlabel()`, `plt.ylabel()`, and `plt.title()` for labels

In [None]:
# TODO: Create a heatmap of gene expression

# Your code here


plt.show()

**Expected Output:** A heatmap with genes on one axis, brain regions on another, and color intensity representing expression levels.
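
A heatmap along these lines could be drawn as in the following sketch. It rebuilds the same invented placeholder table so that it runs on its own; the name `expression`, the values, the axis labels, and the color bar label are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Simulated placeholder values (not real measurements); rows are genes
expression = pd.DataFrame(
    {
        "Cortex":      [85.0, 40.0, 2.0, 3.0, 55.0],
        "Hippocampus": [78.0, 35.0, 1.5, 4.0, 60.0],
        "Striatum":    [10.0, 90.0, 12.0, 5.0, 45.0],
        "Cerebellum":  [65.0, 50.0, 1.0, 2.0, 70.0],
    },
    index=["SLC17A7", "GAD1", "TH", "SLC6A4", "GFAP"],
)

fig, ax = plt.subplots(figsize=(8, 5))
sns.heatmap(
    expression,
    annot=True,            # write each value inside its cell
    fmt=".1f",             # one decimal place for the annotations
    cmap="viridis",
    cbar_kws={"label": "Expression (simulated units)"},
    ax=ax,
)
ax.set_xlabel("Brain region")
ax.set_ylabel("Gene")
ax.set_title("Gene Expression Patterns Across Brain Regions")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a plain script, follow this with `plt.show()`.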

---

## Task 3: Identifying Region-Enriched Genes (25 points)

### Background
Some genes show enriched expression in specific brain regions, which can indicate their functional importance in those areas. Identifying these patterns is a fundamental task in neurogenomics.

### Instructions
1. For each gene, identify which brain region has the highest expression
2. Calculate the fold-change between the highest and lowest expressing regions
3. Create a bar plot showing the maximum expression level for each gene
4. Print a summary statement for each gene indicating its enriched region

### Hints
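- With genes as rows, use `.idxmax(axis=1)` to find the region with the highest expression for each gene
- Use `.max(axis=1)` and `.min(axis=1)` to calculate the fold-change (max divided by min) for each gene
- Create a bar plot with genes on the x-axis and maximum expression on the y-axis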

In [None]:
# TODO: Identify region-enriched genes

# Find the region with maximum expression for each gene


# Calculate fold-change (max/min expression)


# Print summary for each gene


# Create bar plot of maximum expression levels


**Expected Output:** 
- Text output showing which region each gene is enriched in
- Fold-change values for each gene
- A bar plot showing maximum expression levels
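
The steps above might look like the following sketch. It assumes genes are rows and regions are columns, reuses the same invented placeholder values, and assumes every value is strictly positive so that the max/min ratio is defined. All variable names are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Simulated placeholder values (not real measurements); rows are genes
expression = pd.DataFrame(
    {
        "Cortex":      [85.0, 40.0, 2.0, 3.0, 55.0],
        "Hippocampus": [78.0, 35.0, 1.5, 4.0, 60.0],
        "Striatum":    [10.0, 90.0, 12.0, 5.0, 45.0],
        "Cerebellum":  [65.0, 50.0, 1.0, 2.0, 70.0],
    },
    index=["SLC17A7", "GAD1", "TH", "SLC6A4", "GFAP"],
)

# axis=1 works across the columns of each row, giving one result per gene
top_region = expression.idxmax(axis=1)
max_expr = expression.max(axis=1)
min_expr = expression.min(axis=1)

# Fold-change between highest and lowest region (assumes min_expr > 0)
fold_change = max_expr / min_expr

for gene in expression.index:
    print(
        f"{gene}: enriched in {top_region[gene]} "
        f"(max {max_expr[gene]:.1f}, fold-change {fold_change[gene]:.1f}x)"
    )

# Bar plot of the maximum expression level per gene
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(max_expr.index, max_expr.values, color="steelblue")
ax.set_xlabel("Gene")
ax.set_ylabel("Maximum expression (simulated units)")
ax.set_title("Maximum Expression per Gene")
fig.tight_layout()
```

If any region had an expression of zero, the ratio would be undefined (division by zero); adding a small pseudocount to both numerator and denominator before dividing is one common workaround.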

---

## Task 4: Comparing Cell Type Markers (20 points)

### Background
Different cell types in the brain express distinct marker genes. Understanding these expression patterns is essential for cell type identification and characterization.

### Instructions
1. Create a grouped bar plot comparing the expression of the neuronal markers (SLC17A7, GAD1, TH) with that of the glial marker (GFAP) across all brain regions
2. Use different colors for neuronal vs glial markers
3. Add a legend and appropriate labels
4. Calculate and print the average expression of neuronal markers vs the glial marker

### Hints
- Select specific rows from your DataFrame
- Use `.plot(kind='bar')` or create subplots
- Calculate mean across all regions using `.mean(axis=1)`

In [None]:
# TODO: Compare neuronal vs glial markers

# Define marker categories


# Create grouped bar plot


# Calculate and print average expression


**Expected Output:** 
- A grouped bar plot showing expression patterns
- Printed average expression values for each marker category
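
A possible approach is sketched below, using the marker lists from the instructions and the same invented placeholder values. The variable names and the specific colors (shades of blue for neuronal markers, orange for the glial marker) are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Simulated placeholder values (not real measurements); rows are genes
expression = pd.DataFrame(
    {
        "Cortex":      [85.0, 40.0, 2.0, 3.0, 55.0],
        "Hippocampus": [78.0, 35.0, 1.5, 4.0, 60.0],
        "Striatum":    [10.0, 90.0, 12.0, 5.0, 45.0],
        "Cerebellum":  [65.0, 50.0, 1.0, 2.0, 70.0],
    },
    index=["SLC17A7", "GAD1", "TH", "SLC6A4", "GFAP"],
)

# Marker categories as listed in the task instructions
neuronal_markers = ["SLC17A7", "GAD1", "TH"]
glial_markers = ["GFAP"]

subset = expression.loc[neuronal_markers + glial_markers]
colors = ["#08519c", "#3182bd", "#9ecae1", "#e6550d"]  # blues, then orange

# Transpose so regions sit on the x-axis with one bar per gene in each group
fig, ax = plt.subplots(figsize=(9, 5))
subset.T.plot(kind="bar", color=colors, ax=ax, rot=0)
ax.set_xlabel("Brain region")
ax.set_ylabel("Expression (simulated units)")
ax.set_title("Neuronal vs. Glial Marker Expression")
ax.legend(title="Marker gene")
fig.tight_layout()

# Mean across regions for each gene, then averaged within each category
neuronal_mean = expression.loc[neuronal_markers].mean(axis=1).mean()
glial_mean = expression.loc[glial_markers].mean(axis=1).mean()
print(f"Average neuronal marker expression: {neuronal_mean:.2f}")
print(f"Average glial marker expression: {glial_mean:.2f}")
```

Averaging markers of different neuronal populations into a single number hides large differences between them (here the simulated TH values are far lower than the others), so the per-gene means are worth reporting alongside the category average.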

---

## Task 5: Clinical Relevance - Gene Expression Correlations (20 points)

### Background
Understanding co-expression patterns between genes can reveal functional relationships and regulatory networks. This is particularly important for understanding disease mechanisms.

### Instructions
1. Calculate the correlation matrix between all genes
2. Create a correlation heatmap with annotations
3. Identify which pair of genes shows the strongest positive correlation
4. Identify which pair shows the strongest negative correlation
5. Discuss what these correlations might mean biologically

### Hints
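- `.corr()` computes correlations between columns, so with genes as rows, transpose the DataFrame first (`.T.corr()`) to get gene-gene correlations
- Use `sns.heatmap()` with `annot=True`
- Exclude the diagonal (every gene correlates perfectly with itself), then take the largest and smallest remaining values to find the strongest positive and negative pairs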

In [None]:
# TODO: Calculate and visualize gene correlations

# Calculate correlation matrix


# Create correlation heatmap


# Identify strongest correlations


# Print findings


**Expected Output:** 
- A correlation heatmap showing relationships between genes
- Identification of strongest positive and negative correlations
- Brief interpretation of biological significance
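
The correlation step could be sketched as follows, again on the invented placeholder table with genes as rows. The variable names are illustrative. Each correlation here rests on only four data points (one per region), so the numbers demonstrate the mechanics rather than support any biological conclusion.

```python
import numpy as np
import pandas as pd

# Simulated placeholder values (not real measurements); rows are genes
expression = pd.DataFrame(
    {
        "Cortex":      [85.0, 40.0, 2.0, 3.0, 55.0],
        "Hippocampus": [78.0, 35.0, 1.5, 4.0, 60.0],
        "Striatum":    [10.0, 90.0, 12.0, 5.0, 45.0],
        "Cerebellum":  [65.0, 50.0, 1.0, 2.0, 70.0],
    },
    index=["SLC17A7", "GAD1", "TH", "SLC6A4", "GFAP"],
)

# corr() correlates columns, so transpose to make genes the columns.
# The default method is Pearson correlation.
gene_corr = expression.T.corr()

# Keep only the upper triangle above the diagonal: this drops the
# self-correlations (always 1) and the duplicate mirrored pairs.
upper = np.triu(np.ones(gene_corr.shape, dtype=bool), k=1)
pairs = gene_corr.where(upper).stack().dropna()

strongest_pos = pairs.idxmax()  # (gene_a, gene_b) with the highest r
strongest_neg = pairs.idxmin()  # (gene_a, gene_b) with the lowest r

print(f"Strongest positive: {strongest_pos}, r = {pairs[strongest_pos]:.2f}")
print(f"Strongest negative: {strongest_neg}, r = {pairs[strongest_neg]:.2f}")
```

The matrix `gene_corr` can be passed directly to `sns.heatmap(gene_corr, annot=True)` for the visualization; a diverging color map centered on zero (for example `cmap="RdYlBu_r"` with `vmin=-1, vmax=1`) makes positive and negative correlations easy to tell apart.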

---

## Reflection Questions (Bonus: 10 points)

Answer the following questions based on your analysis:

1. **Question 1:** Why do different brain regions show distinct gene expression patterns? What biological processes might explain these differences?

2. **Question 2:** How could understanding gene expression patterns in healthy brain tissue help in studying neurological diseases?

3. **Question 3:** What are the limitations of analyzing bulk tissue gene expression versus single-cell approaches in neurogenomics?

**Your Answers:**

1. [Your answer here]

2. [Your answer here]

3. [Your answer here]

---

## Submission Guidelines

1. Complete all tasks with proper code and outputs
2. Ensure all plots are properly labeled and visible
3. Answer reflection questions thoroughly
4. Save your notebook with outputs included
5. Submit the completed notebook file

---

## Grading Rubric

| Component | Points | Criteria |
|-----------|--------|----------|
| Task 1 | 15 | Data structure created correctly, summary stats displayed |
| Task 2 | 20 | Heatmap properly formatted with labels and appropriate colors |
| Task 3 | 25 | Correct identification of enriched regions, fold-change calculations, and visualization |
| Task 4 | 20 | Proper comparison between cell type markers with visualization |
| Task 5 | 20 | Correlation analysis with interpretation |
| Reflection (bonus) | 10 | Thoughtful answers demonstrating understanding |
| **Total** | **110** | 100 points from Tasks 1-5 plus 10 bonus points |

---

## Additional Resources

- Allen Brain Atlas: https://portal.brain-map.org/
- Gene Expression Omnibus (GEO): https://www.ncbi.nlm.nih.gov/geo/
- GTEx Portal (brain tissue data): https://gtexportal.org/
- PsychENCODE Consortium: http://psychencode.org/