Materials for EN.601.449/649 Computational Genomics: Applied Comparative Genomics

schatzlab/appliedgenomics2023

JHU EN.601.449/EN.601.649: Computational Genomics: Applied Comparative Genomics

Prof: Michael Schatz (mschatz @ cs.jhu.edu)
TA: Alex Sweeten (asweete1 @ jhu.edu)
Class Hours: Monday + Wednesday @ 3:00pm - 4:15pm, Hodson 311
Schatz Office Hours: By appointment
Sweeten Office Hours: Monday from 4:30pm - 5:30pm (over Zoom) and by appointment

The primary goal of the course is for students to be grounded in theory and leave the course empowered to conduct independent genomic analyses. We will study the leading computational and quantitative approaches for comparing and analyzing genomes starting from raw sequencing data. The course will focus on human genomics and human medical applications, but the techniques will be broadly applicable across the tree of life. The topics will include genome assembly & comparative genomics, variant identification & analysis, gene expression & regulation, personal genome analysis, and cancer genomics. The grading will be based on assignments, a midterm exam, class presentations, and a significant class project. There are no formal course prerequisites, although the course will require familiarity with UNIX scripting and/or programming to complete the assignments and course project.

Prerequisites

Course Resources:

Related Courses & Readings

Related Textbooks

Schedule

Class Date Day Topic Assignments Readings
1 28-Aug Mon Introduction Sign Up for Piazza * Biological data sciences in genome research (Schatz, 2015, Genome Research)
* Big Data: Astronomical or Genomical? (Stephens et al, 2015, PLOS Biology)
2 30-Aug Wed Genomic Technologies Assignment 1: Genomic Fundamentals * Molecular Structure of Nucleic Acid (Watson and Crick, 1953, Nature)
* Coming of age: ten years of next-generation sequencing technologies (Goodwin et al, 2016, Nature Reviews Genetics)
* Piercing the dark matter: bioinformatics of long-range sequencing and mapping (Sedlazeck et al, 2018, Nature Reviews Genetics)
* 4-Sep Mon $${\color{red}\text{Labor Day}}$$
3 6-Sep Wed Whole Genome Assembly Assignment 2: Assembly * Toward simplifying and accurately formulating fragment assembly. (Myers, 1995, J. Comp. Bio.)
* Velvet: Algorithms for de novo short read assembly using de Bruijn graphs (Zerbino and Birney, 2008, Genome Research)
* SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing (Bankevich et al, 2012)
* Quake: quality-aware detection and correction of sequencing errors (Kelley et al, 2010, Genome Biology)
* MUMmer: Alignment of Whole Genomes (Delcher et al, 1999, NAR)
4 11-Sep Mon The human genome and intro to long reads * The complete sequence of a human genome (Nurk et al, 2022)
* MHAP: Assembling large genomes with single-molecule sequencing and locality-sensitive hashing (Berlin et al, 2015, Nature Biotech)
* FALCON-unzip: Phased diploid genome assembly with single-molecule real-time sequencing (Chin et al, 2016, Nature Methods)
* Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies (Rhie et al. 2020)
5 13-Sep Wed Pan-Genomics * Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing (Kovaka et al, 2023, Nature Methods)
* Piercing the dark matter: bioinformatics of long-range sequencing and mapping (Sedlazeck et al, 2018, Nature Reviews Genetics)
* A draft human pangenome reference (Liao et al, 2023)
6 18-Sep Mon Genomics in the Cloud * Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) (Schatz et al, 2022, Cell Genomics)
7 20-Sep Wed Read Mapping Assignment 3: Mappability and Mapping * How to map billions of short reads onto genomes (Trapnell and Salzberg, 2009, Nature Biotech)
* Bowtie: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (Langmead et al, 2009, Genome Biology)
* BWA-MEM: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (Li, 2013, arXiv)
* Sapling: Accelerating Suffix Array Queries with Learned Data Models (Kirsche et al, 2020, bioRxiv)
8 25-Sep Mon Variant Analysis * Haplotype-based variant detection from short-read sequencing (Garrison and Marth, arXiv, 2012)
* The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data (McKenna et al, 2010, Genome Research)
* A universal SNP and small-indel variant caller using deep neural networks (Poplin et al, 2018, Nature Biotechnology)
* SAM/BAM/Samtools: The Sequence Alignment/Map format and SAMtools (Li et al, 2009, Bioinformatics)
* IGV: Integrative genomics viewer (Robinson et al, 2011, Nature Biotech)
9 27-Sep Wed Structural Variant Analysis * Accurate detection of complex structural variations using single-molecule sequencing (Sedlazeck et al, 2018, Nature Methods)
* Characterizing the Major Structural Variant Alleles of the Human Genome (Audano et al, 2019, Cell)
* Resolving the complexity of the human genome using single-molecule sequencing (Chaisson et al, 2015, Nature)
10 2-Oct Mon Sketching * Mash: fast genome and metagenome distance estimation using MinHash (Ondov et al, 2016, Genome Biology)
11 4-Oct Wed Genome Arithmetic and Plane Sweep Assignment 4: BWT and Modimizers * BEDTools: a flexible suite of utilities for comparing genomic features (Quinlan & Hall, 2010, Bioinformatics)
* A Parallel Algorithm for N-Way Interval Set Intersection (Layer & Quinlan, 2016, IEEE Proceedings)
12 9-Oct Mon Machine Learning Primer * What are decision trees? (Kingsford and Salzberg, 2008, Nature Biotechnology)
* What is a hidden Markov model? (Eddy, 2004, Nature Biotechnology)
* Deep learning in biomedicine (Wainberg et al, 2018, Nature Biotechnology)
* Visualizing Data Using t-SNE
13 11-Oct Wed Functional Analysis 1: Annotation Project Proposal * BLAST: Basic Local Alignment Search Tool
* Glimmer: Microbial gene identification using interpolated Markov models
* MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects
14 16-Oct Mon Functional Analysis 2: RNA-seq * RNA-Seq: a revolutionary tool for transcriptomics (Wang et al, 2009, Nature Reviews Genetics)
* Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks (Trapnell et al, 2012, Nature Protocols)
* Salmon provides fast and bias-aware quantification of transcript expression (Patro et al, 2017, Nature Methods)
* Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications (Krueger and Andrews, 2011, Bioinformatics)
15 18-Oct Wed Functional Analysis 3: Methyl-seq, ChIP-seq, and Hi-C Assignment 5: Functional Genomics * ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions (Furey, 2012, Nature Reviews Genetics)
* PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls (Rozowsky et al, 2009, Nature Biotech)
* Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome (Lieberman-Aiden et al, 2009, Science)
16 23-Oct Mon Functional Analysis 4: Regulatory States, ENCODE, GTEx, RoadMap * An integrated encyclopedia of DNA elements in the human genome (The ENCODE Project Consortium, Nature, 2012)
* Genetic effects on gene expression across human tissues (GTEx Consortium, Nature, 2017)
* Integrative analysis of 111 reference human epigenomes (Roadmap Epigenome Consortium, Nature, 2015)
* ChromHMM: automating chromatin-state discovery and characterization (Ernst & Kellis, 2012, Nature Methods)
* Segway: Unsupervised pattern discovery in human chromatin structure through genomic segmentation (Hoffman et al, 2012, Nature Methods)
17 25-Oct Wed Functional Analysis 5: Single Cell Genomics * Ginkgo: Interactive analysis and assessment of single-cell copy-number variations (Garvin et al, 2015, Nature Methods)
* The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells (Trapnell et al, Nature Biotech, 2014)
* Eleven grand challenges in single-cell data science (Lähnemann et al, Genome Biology, 2020)
18 30-Oct Mon Human Evolution Preliminary Project Report * An integrated map of genetic variation from 1,092 human genomes (1000 Genomes Consortium, 2012, Nature)
* Analysis of protein-coding genetic variation in 60,706 humans (Lek et al, 2016, Nature)
* A Draft Sequence of the Neandertal Genome (Green et al. 2010, Science)
* Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals (Vernot et al, 2016, Science)
19 1-Nov Wed $${\color{blue}\text{Midterm review}}$$
20 6-Nov Mon $${\color{blue}\text{Midterm}}$$ *In class exam*
21 8-Nov Wed Human Genetic Diseases * Genome-Wide Association Studies (Bush & Moore, 2012, PLOS Comp Bio)
* The contribution of de novo coding mutations to autism spectrum disorder (Iossifov et al, 2014, Nature)
22 13-Nov Mon Cancer Genomics Prelim Report Due * The Hallmarks of Cancer (Hanahan & Weinberg, 2000, Cell)
* Evolution of Cancer Genomes (Yates & Campbell, 2012, Nature Reviews Genetics)
* Comprehensive molecular portraits of human breast tumours (TCGA, 2012, Nature)
23 15-Nov Wed Microbiome and Metagenomics * Kraken: ultrafast metagenomic sequence classification using exact alignments (Wood and Salzberg, 2014, Genome Biology)
* Chapter 12: Human Microbiome Analysis (Morgan and Huttenhower)
* 20-Nov Mon $${\color{red}\text{Thanksgiving Break}}$$
* 22-Nov Wed $${\color{red}\text{Thanksgiving Break}}$$
24 27-Nov Mon $${\color{blue}\text{In class project presentation}}$$ Project presentation
25 29-Nov Wed $${\color{blue}\text{In class project presentation}}$$ Project presentation
26 4-Dec Mon $${\color{blue}\text{In class project presentation}}$$ Project presentation
* 6-Dec Wed $${\color{red}\text{No Class}}$$
* 17-Dec Sun $${\color{blue}\text{Final Report Due}}$$ Final Report
