# Lecture 1: Introduction to Neurogenomics - SOLUTION

**Course:** Single-Cell Neurogenomics  
**Date:** December 5, 2025  
**Estimated Time:** 60 minutes  

---

## Learning Objectives

By the end of this assignment, you will be able to:
- Explain the genetic basis of brain function and disease
- Describe key neurogenomic technologies and methods
- Analyze and interpret neurogenomic data
- Identify clinical and translational applications

---

## Setup

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("Libraries imported successfully!")

---

## Task 1: Exploring Brain Gene Expression Data (15 points)

In [None]:
# Create a DataFrame with gene expression data
# Illustrative (synthetic) values loosely based on known biology
gene_expression = pd.DataFrame({
    'Cortex': [65, 48, 5, 8, 35],
    'Hippocampus': [72, 52, 3, 12, 28],
    'Striatum': [45, 38, 88, 6, 22],
    'Cerebellum': [55, 42, 4, 7, 30]
}, index=['SLC17A7', 'GAD1', 'TH', 'SLC6A4', 'GFAP'])

# Display the data
print("Gene Expression Data Across Brain Regions:")
print("="*50)
print(gene_expression)
print("\n")

# Show summary statistics
print("Summary Statistics:")
print("="*50)
print(gene_expression.describe())
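
Many plotting and grouping functions expect one observation per row rather than a genes-by-regions matrix. As a minimal sketch, the same table can be reshaped to long format. The column names `gene`, `region`, and `expression` are illustrative choices, not part of the assignment.

```python
import pandas as pd

# Same illustrative values as the Task 1 DataFrame
gene_expression = pd.DataFrame({
    'Cortex': [65, 48, 5, 8, 35],
    'Hippocampus': [72, 52, 3, 12, 28],
    'Striatum': [45, 38, 88, 6, 22],
    'Cerebellum': [55, 42, 4, 7, 30]
}, index=['SLC17A7', 'GAD1', 'TH', 'SLC6A4', 'GFAP'])

# Reshape from wide (genes x regions) to long (one row per gene-region pair).
# 'gene', 'region', and 'expression' are assumed labels for illustration.
long_df = (
    gene_expression
    .rename_axis('gene')
    .reset_index()
    .melt(id_vars='gene', var_name='region', value_name='expression')
)

print(long_df.head())
print(f"\nRows: {len(long_df)} (genes x regions)")
```

The long table can then be filtered or grouped by `region` or `gene` without transposing the original matrix.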

---

## Task 2: Visualizing Region-Specific Gene Expression (20 points)

In [None]:
# Create a heatmap of gene expression
plt.figure(figsize=(10, 6))
sns.heatmap(gene_expression, annot=True, cmap='viridis', fmt='d', cbar_kws={'label': 'Expression Level'})
plt.title('Gene Expression Patterns Across Brain Regions', fontsize=14, fontweight='bold')
plt.xlabel('Brain Region', fontsize=12)
plt.ylabel('Gene', fontsize=12)
plt.tight_layout()
plt.show()
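
On a raw-value heatmap, a single high value (here TH in the striatum) dominates the color scale and flattens the regional differences of lower-expressed genes such as SLC6A4. One common remedy, sketched below, is to z-score each gene across regions before plotting. The use of the population standard deviation (`ddof=0`) is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# Same illustrative values as the Task 1 DataFrame
gene_expression = pd.DataFrame({
    'Cortex': [65, 48, 5, 8, 35],
    'Hippocampus': [72, 52, 3, 12, 28],
    'Striatum': [45, 38, 88, 6, 22],
    'Cerebellum': [55, 42, 4, 7, 30]
}, index=['SLC17A7', 'GAD1', 'TH', 'SLC6A4', 'GFAP'])

# Per-gene (row-wise) mean and standard deviation across regions
row_mean = gene_expression.mean(axis=1)
row_std = gene_expression.std(axis=1, ddof=0)  # ddof=0 is an assumed choice

# Subtract each gene's mean and divide by its standard deviation
zscores = gene_expression.sub(row_mean, axis=0).div(row_std, axis=0)

print(zscores.round(2))
```

Passing `zscores` to `sns.heatmap(zscores, annot=True, fmt='.2f', cmap='coolwarm', center=0)` would then show each gene's regional pattern on a shared scale, at the cost of hiding absolute expression levels.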

---

## Task 3: Identifying Region-Enriched Genes (25 points)

In [None]:
# Find the region with maximum expression for each gene
max_regions = gene_expression.idxmax(axis=1)
max_expression = gene_expression.max(axis=1)
min_expression = gene_expression.min(axis=1)

# Calculate fold-change (max/min expression)
fold_change = max_expression / min_expression

# Print summary for each gene
print("Region-Enriched Gene Analysis:")
print("="*70)
for gene in gene_expression.index:
    print(f"{gene} is most highly expressed in {max_regions[gene]}")
    print(f"  Max expression: {max_expression[gene]:.1f}, Min expression: {min_expression[gene]:.1f}")
    print(f"  Fold-change: {fold_change[gene]:.2f}x\n")

# Create bar plot of maximum expression levels
plt.figure(figsize=(10, 6))
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A', '#98D8C8']
plt.bar(gene_expression.index, max_expression, color=colors, edgecolor='black', linewidth=1.2)
plt.xlabel('Gene', fontsize=12, fontweight='bold')
plt.ylabel('Maximum Expression Level', fontsize=12, fontweight='bold')
plt.title('Maximum Expression Level for Each Gene', fontsize=14, fontweight='bold')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
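
The max/min ratio above works because no gene has zero expression in any region. With real data, zeros are common and the ratio becomes undefined. A hedged sketch of one widely used workaround is a log2 fold-change with a pseudocount. The pseudocount of 1 is an assumed value and should be chosen to suit the data's scale.

```python
import numpy as np
import pandas as pd

# Same illustrative values as the Task 1 DataFrame
gene_expression = pd.DataFrame({
    'Cortex': [65, 48, 5, 8, 35],
    'Hippocampus': [72, 52, 3, 12, 28],
    'Striatum': [45, 38, 88, 6, 22],
    'Cerebellum': [55, 42, 4, 7, 30]
}, index=['SLC17A7', 'GAD1', 'TH', 'SLC6A4', 'GFAP'])

pseudocount = 1  # assumed value; avoids division by zero when min expression is 0

# log2 of (max + pseudocount) / (min + pseudocount) for each gene
log2_fc = np.log2(
    (gene_expression.max(axis=1) + pseudocount)
    / (gene_expression.min(axis=1) + pseudocount)
)

# Rank genes from most to least region-enriched
print(log2_fc.sort_values(ascending=False).round(2))
```

On the log2 scale, a value of 1 corresponds to a two-fold difference between the highest and lowest region, which makes genes with very different ranges easier to compare.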

---

## Task 4: Comparing Cell Type Markers (20 points)

In [None]:
# Define marker categories
neuronal_markers = ['SLC17A7', 'GAD1', 'TH']
glial_marker = ['GFAP']

# Create grouped bar plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Plot neuronal markers
gene_expression.loc[neuronal_markers].T.plot(kind='bar', ax=ax1, color=['#FF6B6B', '#4ECDC4', '#45B7D1'])
ax1.set_title('Neuronal Marker Expression', fontsize=12, fontweight='bold')
ax1.set_xlabel('Brain Region', fontsize=11)
ax1.set_ylabel('Expression Level', fontsize=11)
ax1.legend(title='Neuronal Markers', loc='upper right')
ax1.grid(axis='y', alpha=0.3)
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45, ha='right')

# Plot glial marker
gene_expression.loc[glial_marker].T.plot(kind='bar', ax=ax2, color='#98D8C8', legend=False)
ax2.set_title('Glial Marker Expression', fontsize=12, fontweight='bold')
ax2.set_xlabel('Brain Region', fontsize=11)
ax2.set_ylabel('Expression Level', fontsize=11)
ax2.grid(axis='y', alpha=0.3)
ax2.set_xticklabels(ax2.get_xticklabels(), rotation=45, ha='right')

plt.tight_layout()
plt.show()

# Calculate and print average expression
print("\nAverage Expression Across All Regions:")
print("="*50)
neuronal_avg = gene_expression.loc[neuronal_markers].mean(axis=1).mean()
glial_avg = gene_expression.loc[glial_marker].mean(axis=1).mean()

print(f"Average Neuronal Marker Expression: {neuronal_avg:.2f}")
print(f"Average Glial Marker Expression: {glial_avg:.2f}")
print(f"\nInterpretation: Averaged over these markers, neuronal expression is {neuronal_avg/glial_avg:.2f}x the glial marker (GFAP) level.")
print("Note: marker genes differ in baseline expression, so this ratio does not measure the neuron-to-glia cell ratio.")
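
A single average across all regions hides how the neuronal-to-glial marker ratio varies by region. The sketch below computes the same comparison per region with `groupby`. The `marker_class` mapping is a helper introduced here for illustration and mirrors the `neuronal_markers` and `glial_marker` lists above.

```python
import pandas as pd

# Same illustrative values as the Task 1 DataFrame
gene_expression = pd.DataFrame({
    'Cortex': [65, 48, 5, 8, 35],
    'Hippocampus': [72, 52, 3, 12, 28],
    'Striatum': [45, 38, 88, 6, 22],
    'Cerebellum': [55, 42, 4, 7, 30]
}, index=['SLC17A7', 'GAD1', 'TH', 'SLC6A4', 'GFAP'])

# Hypothetical helper: gene -> marker class, matching the lists used in Task 4
marker_class = pd.Series({
    'SLC17A7': 'neuronal', 'GAD1': 'neuronal', 'TH': 'neuronal',
    'GFAP': 'glial',
})

# Keep only the classified genes, then average within each class per region
subset = gene_expression.loc[marker_class.index]
per_region = subset.groupby(marker_class).mean()

# Ratio of mean neuronal marker to mean glial marker expression in each region
ratio = per_region.loc['neuronal'] / per_region.loc['glial']

print(per_region.round(2))
print("\nNeuronal/glial marker ratio by region:")
print(ratio.round(2))
```

In this dataset the striatal ratio is driven largely by TH, which illustrates why a pooled average of unrelated markers should be interpreted with care.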

---

## Task 5: Clinical Relevance - Gene Expression Correlations (20 points)

In [None]:
# Calculate correlation matrix
correlation_matrix = gene_expression.T.corr()

print("Gene Expression Correlation Matrix:")
print("="*50)
print(correlation_matrix)
print("\n")

# Create correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
            fmt='.2f', square=True, linewidths=1, cbar_kws={'label': 'Correlation Coefficient'})
plt.title('Gene-Gene Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Identify strongest correlations
# Mask the diagonal (self-correlations) with NaN. DataFrame.mask avoids writing
# into .values, which can be read-only under pandas copy-on-write.
corr_no_diag = correlation_matrix.mask(np.eye(len(correlation_matrix), dtype=bool))

# Find strongest positive correlation
max_corr = corr_no_diag.max().max()
max_idx = corr_no_diag.stack().idxmax()

# Find strongest negative correlation (if any)
min_corr = corr_no_diag.min().min()
min_idx = corr_no_diag.stack().idxmin()

# Print findings
print("\nCorrelation Analysis Results:")
print("="*70)
print(f"Strongest Positive Correlation: {max_idx[0]} and {max_idx[1]} (r = {max_corr:.3f})")
print(f"Strongest Negative Correlation: {min_idx[0]} and {min_idx[1]} (r = {min_corr:.3f})")

print("\n" + "="*70)
print("BIOLOGICAL INTERPRETATION:")
print("="*70)
print("\n1. Positive Correlations:")
print("   - High correlation between SLC17A7 and GAD1 suggests coordinated regulation")
print("     of excitatory and inhibitory neurotransmission (E/I balance)")
print("   - These genes may be co-regulated by similar transcription factors")
print("\n2. Negative Correlations:")
print("   - TH is negatively correlated with every other marker in this dataset")
print("   - Reflects its concentration in the dopaminergic system (here the striatum, which receives dense dopaminergic input)")
print("   - Suggests distinct developmental programs for different neuronal subtypes")
print("\n3. Clinical Relevance:")
print("   - Understanding these patterns helps identify disease mechanisms")
print("   - Disruption of E/I balance is implicated in autism and epilepsy")
print("   - TH expression changes are central to Parkinson's disease")
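
Because the correlation matrix is symmetric, every gene pair appears twice, and each coefficient here rests on only four regions, so the values are exploratory rather than statistically robust. As a minimal sketch, the unique pairs can be listed and ranked from the upper triangle. The column names `gene_a`, `gene_b`, and `r` are assumed labels.

```python
import numpy as np
import pandas as pd

# Same illustrative values as the Task 1 DataFrame
gene_expression = pd.DataFrame({
    'Cortex': [65, 48, 5, 8, 35],
    'Hippocampus': [72, 52, 3, 12, 28],
    'Striatum': [45, 38, 88, 6, 22],
    'Cerebellum': [55, 42, 4, 7, 30]
}, index=['SLC17A7', 'GAD1', 'TH', 'SLC6A4', 'GFAP'])

# Pearson correlation between genes (rows), computed across the four regions
corr = gene_expression.T.corr()

# Indices of the upper triangle, excluding the diagonal (k=1)
rows, cols = np.triu_indices(len(corr), k=1)

# One row per unique gene pair, sorted from most positive to most negative r
pairs = pd.DataFrame({
    'gene_a': corr.index[rows],
    'gene_b': corr.columns[cols],
    'r': corr.to_numpy()[rows, cols],
}).sort_values('r', ascending=False, ignore_index=True)

print(pairs.round(3))
```

The first and last rows of `pairs` give the strongest positive and negative correlations, and the full table shows how close the runner-up pairs are.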

---

## Reflection Questions (Bonus: 10 points)

### Question 1: Why do different brain regions show distinct gene expression patterns?

**Answer:** Different brain regions show distinct gene expression patterns due to:
- **Functional specialization:** Each region performs unique functions requiring specific molecular machinery
- **Developmental origin:** Regions arise from different progenitor zones with distinct transcriptional programs
- **Cell type composition:** Varying proportions of neurons, glia, and neuronal subtypes
- **Circuit properties:** Local connectivity and neurotransmitter systems differ across regions
- **Environmental inputs:** Activity-dependent gene expression responds to local circuit activity

### Question 2: How could understanding gene expression patterns help in studying neurological diseases?

**Answer:** Understanding healthy gene expression patterns is crucial for disease research because:
- **Baseline comparison:** Provides reference to identify disease-related changes
- **Biomarker discovery:** Region-specific markers can indicate disease progression
- **Therapeutic targets:** Identifies genes/pathways that could be modulated for treatment
- **Disease mechanisms:** Disrupted expression patterns reveal pathological processes
- **Precision medicine:** Patient-specific expression profiles guide personalized treatment
- **Early detection:** Subtle changes may precede clinical symptoms

### Question 3: What are the limitations of bulk tissue analysis versus single-cell approaches?

**Answer:** Limitations of bulk tissue analysis:
- **Cell type heterogeneity:** Averages across all cells, masking cell-type-specific changes
- **Rare populations:** Signals from rare cell types (<1-5% of tissue) are often too diluted to detect
- **Cell-cell interactions:** Tissue homogenization discards spatial relationships between cells
- **State transitions:** Cannot capture dynamic processes or intermediate states
- **Disease specificity:** Changes in small affected populations are diluted

Single-cell approaches overcome these by:
- Resolving cell-type-specific expression
- Identifying rare and novel cell types
- Capturing cellular heterogeneity and states
- Revealing cell-type-specific disease mechanisms
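
The dilution effect can be made concrete with a small calculation. The sketch below models a bulk measurement as the proportion-weighted average of per-cell-type expression. All numbers are hypothetical and chosen only for illustration: a rare population making up 5% of the tissue increases expression of one gene four-fold, while the remaining 95% of cells are unchanged.

```python
import numpy as np

# Hypothetical two-population tissue; every number here is made up for illustration
fractions = np.array([0.95, 0.05])   # common cell type, rare cell type
control = np.array([10.0, 10.0])     # per-cell-type expression of one gene (control)
disease = np.array([10.0, 40.0])     # 4x increase in the rare cell type only

# Bulk signal modeled as the proportion-weighted average across cell types
bulk_control = np.dot(fractions, control)
bulk_disease = np.dot(fractions, disease)

bulk_fc = bulk_disease / bulk_control   # what bulk analysis would observe
cell_type_fc = disease / control        # what cell-type-resolved analysis would observe

print(f"Bulk fold-change: {bulk_fc:.2f}x")
print(f"Per-cell-type fold-change: {cell_type_fc}")
```

Under these assumptions the four-fold change in the rare population appears as only a 1.15-fold change in bulk, which could easily fall within technical noise.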

---

## Summary

In this assignment, you've learned to:
- ✓ Organize and explore neurogenomic data
- ✓ Visualize gene expression patterns across brain regions
- ✓ Identify region-enriched genes and interpret their biological significance
- ✓ Compare expression between different cell type markers
- ✓ Analyze gene-gene correlations and their clinical relevance

These foundational skills will be essential as we move forward into single-cell analysis!