JHU EN.600.649: Computational Genomics: Applied Comparative Genomics

Michael Schatz (mschatz @
Class Hours: Tuesday + Thursday @ 1:30p - 2:45p in Shaffer 304
Office Hours: Tuesday + Thursday @ 3-4p in Malone 323 and by appointment

The primary goal of the course is for students to be grounded in theory and leave the course empowered to conduct independent genomic analyses. We will study the leading computational and quantitative approaches for comparing and analyzing genomes starting from raw sequencing data. The course will focus on human genomics and human medical applications, but the techniques will be broadly applicable across the tree of life. The topics will include genome assembly & comparative genomics, variant identification & analysis, gene expression & regulation, personal genome analysis, and cancer genomics. The grading will be based on assignments, a midterm exam, class presentations, and a significant class project. There are no formal course prerequisites, although the course will require familiarity with UNIX scripting and/or programming to complete the assignments and course project.


Course Resources:

Related Courses & Readings


# Date Lecture Readings & Resources Assignment
1. Tu 1/31 Introduction * Biological data sciences in genome research (Schatz, 2015, Genome Research)
* Big Data: Astronomical or Genomical? (Stephens et al, 2015, PLOS Biology)
Sign Up for Piazza
2. Th 2/2 Genomic Technologies * Molecular Structure of Nucleic Acid (Watson and Crick, 1953, Nature)
* Coming of age: ten years of next-generation sequencing technologies (Goodwin et al, 2016, Nature Reviews Genetics)
* High‐throughput sequencing for biology and medicine (Soon et al, 2013, Molecular Systems Biology)
3. Tu 2/7 Whole Genome Assembly * Velvet: Algorithms for de novo short read assembly using de Bruijn graphs (Zerbino and Birney, 2008, Genome Research)
* Quake: quality-aware detection and correction of sequencing errors (Kelley et al, 2010, Genome Biology)
* Allpaths-LG: High-quality draft assemblies of mammalian genomes from massively parallel sequence data (Gnerre et al, 2011, PNAS)
4. Th 2/9 Whole Genome Assembly and Alignment * Toward simplifying and accurately formulating fragment assembly. (Myers, 1995, J. Comp. Bio.)
* MHAP: Assembling large genomes with single-molecule sequencing and locality-sensitive hashing (Berlin et al, 2015, Nature Biotech)
* Genome assembly forensics: finding the elusive mis-assembly (Phillippy et al, 2008, Genome Biology)
* MUMmer: Alignment of Whole Genomes (Delcher et al, 1999, NAR)
Assignment 1
5. Tu 2/14 Read Mapping (Guest Lecture by Ben Langmead) * Bowtie: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (Langmead et al, 2009, Genome Biology)
* BWA-MEM: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (Li, 2013, arXiv)
* SAM/BAM/Samtools: The Sequence Alignment/Map format and SAMtools (Li et al, 2009, Bioinformatics)
* IGV: Integrative genomics viewer (Robinson et al, 2011, Nature Biotech)
6. Th 2/16 Read Mapping (Guest Lecture by Ben Langmead) * How to map billions of short reads onto genomes (Trapnell and Salzberg, 2009, Nature Biotech)
7. Tu 2/21 Variant Analysis * PolyBayes: A general approach to single-nucleotide polymorphism discovery (Marth et al, 1999, Nature Genetics)
* GATK: A framework for variation discovery and genotyping using next-generation DNA sequencing data (DePristo et al, 2011, Nature Genetics)
* Scalpel: Accurate de novo and transmitted indel detection in exome-capture data using microassembly (Narzisi et al, 2014, Nature Methods)
8. Th 2/23 Structural Variant Analysis * Genome structural variation discovery and genotyping (Alkan et al, 2011, Nature Reviews Genetics)
* LUMPY: a probabilistic framework for structural variant discovery (Layer et al, 2014, Genome Biology)
* Assembly Reconciliation (Zimin et al, 2008, Bioinformatics)
* Ginkgo: Interactive analysis and assessment of single-cell copy-number variations (Garvin et al, 2015, Nature Methods)
9. Tu 2/28 Third Generation Sequencing * Resolving the complexity of the human genome using single-molecule sequencing (Chaisson et al, 2015, Nature)
* FALCON-unzip: Phased diploid genome assembly with single-molecule real-time sequencing (Chin et al, 2016, Nature Methods)
* Assemblytics: a web analytics tool for the detection of variants from an assembly (Nattestad et al, 2016, Bioinformatics)
* NanoCorr: Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome (Goodwin et al, 2015, Genome Research)
10. Th 3/2 Genome Arithmetic & Plane Sweep Algorithms * BEDTools: a flexible suite of utilities for comparing genomic features (Quinlan & Hall, 2010, Bioinformatics)
* A Parallel Algorithm for N-Way Interval Set Intersection (Layer & Quinlan, 2016, IEEE Proceedings)
11. Tu 3/7 Genome Annotation * BLAST: Basic Local Alignment Search Tool
* Glimmer: Microbial gene identification using interpolated Markov models
* MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects
* What is a hidden Markov model?
Assignment 2
12. Th 3/9 Functional Analysis 1: RNA-seq * RNA-Seq: a revolutionary tool for transcriptomics (Wang et al, 2009, Nature Reviews Genetics)
* Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks (Trapnell et al, 2012, Nature Protocols)
* Salmon provides fast and bias-aware quantification of transcript expression (Patro et al, 2017, Nature Methods)
* Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications (Krueger and Andrews, 2011, Bioinformatics)
13. Tu 3/14 Snow Day!
14. Th 3/16 Functional Analysis 2: Methyl-seq & ChIP-seq * ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions (Furey, 2012, Nature Reviews Genetics)
* PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls (Rozowsky et al, 2009, Nature Biotech)
Project Proposal
Tu 3/21 Spring Break!
Th 3/23 Spring Break!
15. Tu 3/28 Functional Analysis 3: Regulatory States * ChromHMM: automating chromatin-state discovery and characterization (Ernst & Kellis, 2012, Nature Methods)
* Segway: Unsupervised pattern discovery in human chromatin structure through genomic segmentation (Hoffman et al, 2012, Nature Methods)
16. Th 3/30 Functional Analysis 4: ENCODE * An integrated encyclopedia of DNA elements in the human genome (The ENCODE Project Consortium, 2012, Nature)
17. Tu 4/4 Midterm Review
18. Th 4/6 Midterm Exam
19. Tu 4/11 Midterm Discussion
20. Th 4/13 Human Evolution * An integrated map of genetic variation from 1,092 human genomes (1000 Genomes Consortium, 2012, Nature)
* Analysis of protein-coding genetic variation in 60,706 humans (Lek et al, 2016, Nature)
* A Draft Sequence of the Neandertal Genome (Green et al, 2010, Science)
* Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals (Vernot et al, 2016, Science)
* Visualizing Data Using t-SNE
21. Tu 4/18 Human Genetic Diseases * Genome-Wide Association Studies (Bush & Moore, 2012, PLOS Comp Bio)
* The contribution of de novo coding mutations to autism spectrum disorder (Iossifov et al, 2014, Nature)
22. Th 4/20 Cancer Genomics & Adventures in Overfitting * The Hallmarks of Cancer (Hanahan & Weinberg, 2000, Cell)
* Evolution of Cancer Genomes (Yates & Campbell, 2012, Nature Reviews Genetics)
* Comprehensive molecular portraits of human breast tumours (TCGA, 2012, Nature)
Preliminary Project Report
23. Tu 4/25 Microbiome and Metagenomics * Kraken: ultrafast metagenomic sequence classification using exact alignments (Wood and Salzberg, 2014, Genome Biology)
* Chapter 12: Human Microbiome Analysis (Morgan and Huttenhower)
24. Th 4/27 Genomic Futures * "Snyderome" Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes (Chen et al, 2012, Cell)
* Identifying Personal Genomes by Surname Inference (Gymrek et al, 2013, Science)
Project Presentations
25. Tu 5/2 Project Presentations Project Presentations
26. Th 5/4 Project Presentations Project Presentations
27. Wed 5/17 Final Project Report Due! Project Report

