
Comp790-166: Computational Biology (Spring 2022)

Details

Instructor: Natalie Stanley

Email: natalies@cs.unc.edu

Note: Please send me an email if you enrolled late and did not receive the welcome email!

Time and Place: Monday/Wednesday 9:30am-10:45am in Sitterson 115 (SN0115) and on Zoom. This course will be offered as a hybrid course. I will be present in SN0115 for each class meeting, and I will also be on Zoom and recording the lecture. Please watch your email for the Zoom link.

Info and Attributes: This is a 3-credit full-semester course and fulfills the 'Applications' category for CS students. It is a lecture-style class (I will teach the lectures) and includes two homework assignments and a course project. Please make sure you selected the 3-credit option when you enrolled.

Office Hours: Monday 10:45am-noon or by appointment

Description

Modern, high-throughput assays allow us to efficiently profile a variety of biological processes to gain a systems-level understanding of health and disease. Recent technologies and experimental assays generate an abundance of detailed information that needs to be extracted, summarized, and interpreted. In this course we will discuss the methodology used to extract signal from data generated by some of the most cutting-edge experimental paradigms, such as single-cell assays and imaging, including how to process these data, engineer features from them, and combine them. We will go into detail about the methods and theory underlying bioinformatics algorithms, which draw on numerical linear algebra, graph signal processing, and machine learning. While computational biology is a very broad field, we will focus here on applications in single-cell biology (CyTOF, single-cell RNA sequencing), multiomics/multimodal analysis, systems immunology, and benchmarking. For each class of algorithms introduced for a task on biological data, we will also go over the necessary theory and mathematical intuition. The course covers the foundations of biomedical data science and does not assume any biological knowledge.

Prerequisites

Students should be strong programmers in Python, Julia, or R, and be comfortable with linear algebra and basic probability. I do not assume any prior biological knowledge; any relevant concepts will be introduced. Please feel free to talk to me about any of these prerequisites.

Course Structure

This course will be mostly lecture-based, with two homework assignments and a course project. I will suggest several publicly available biological datasets and open problems for you to work on for the project. Overall, the project is intended to give you an opportunity to implement or apply the methodology discussed in the papers we will read together. The final project writeup will also give you practice writing up results and communicating ideas. You are welcome to work in teams for this project.

Students will also pick any two days during the semester to answer a set of reading questions about one of the assigned papers.

Topics and Tentative Schedule

This is the preliminary set of topics.

Date | Topic | Reading | Notes | Code
Monday, January 10, 2022 | Intro, Bioinformatics vs. Computational Biology | [Systems Immunology, Just Getting Started], [Grand Challenges in Single-Cell Data Science] | [Lecture 1 Notes] |
Wednesday, January 12, 2022 | Linear Algebra Review, Matrix Rank, Building Graphs from Data, Graph Laplacian, Graph Diffusion | [SLMP, pages 10-22], [Data Matrices + Low Rank], [Random Projection Trees], [LargeVis] | [Lecture 2 Notes] | [graph tools for python]
Monday, January 17, 2022 | No class for MLK Day, and no regular office hours today. Please email me to set up a meeting for anything urgent. | | |
Wednesday, January 19, 2022 | Overflow from last time: the diffusion point of view of the graph Laplacian; building graphs from data; Graph Partitioning Fundamentals | [Module Detection Benchmarking in Biological Data], [BigClam], [SBM for single-cell] | [Lecture 3 Notes] | [LargeVis]
Monday, January 24, 2022 | Graph Partitioning (Modularity-Based {Louvain, Leiden}); Finding disease-relevant modules from protein-protein and/or gene-gene interaction networks | [Fast Unfolding of Communities in Large Networks] | [Lecture 4 Notes] | [SNAP], [Louvain], [Leiden], [graph-tool (SBM)]
Wednesday, January 26, 2022 | Finish probabilistic graph partitioning {SBM, Affiliation Model}; Graph Embeddings (such as node2vec); Single-Cell Day 1: Intro to Single-Cell Profiling and Data Structures | [Node2Vec], [Representation Learning on Graphs], [Review: Graph Embedding in Comp Bio], [Vicus]; omitted for lack of time, but we will come back to it later: Graph Signal Processing (GSP) Basics [Intro to Graph Signal Processing in Machine Learning] | [Lecture 5 Notes] | [node2vec]
Monday, January 31, 2022 | Single-Cell Day 1: Intro to Single-Cell {Data Structure, Technologies, Pre-Processing}, Automating Gating and Cell-Population Discovery in Single-Cell Data | [Spade], [Single-Cells, Many Features] | [Lecture 6 Notes] | [FCS file tutorial], [Spade]
Wednesday, February 2, 2022 | Graph-based automated gating, imputation in single-cell data | [phenograph] | [Lecture 7 Notes] | [phenograph], [FastPG]
Monday, February 7, 2022 | Imputation for single-cell data, Branch-Preserving Visualization for Single-Cell Data | [MAGIC], [PHATE] | [Lecture 8 Notes] | [MAGIC], [Phate]
Wednesday, February 9, 2022 | [Homework 1 Assigned] Branch-Preserving Visualization for Cellular Differentiation; Data Augmentation for Single-Cell Data; Identifying prototypical cells of a particular experimental condition with graph signal processing | [SUGAR] | [Lecture 9 Notes] | [sugar]
Monday, February 14, 2022 | Finish data augmentation for sparse single-cell landscapes, start graph signal processing background | [Graph Signal Processing Review Article], [MELD] | [Lecture 10 Notes] | [MELD], [pyGSP]
Wednesday, February 16, 2022 | [Project Proposal Signup Sheet]; [Project Proposal Template] Finish introducing graph signal processing, low-pass filtering, MELD for selecting condition-specific prototypical cells | See references from Monday. Also, [The Emerging Field of Graph Signal Processing] | [Lecture 11 Notes] |
Monday, February 21, 2022 | Finish up GSP background, Identifying condition-specific or experiment-specific prototypical cells with MELD, Start Differential Abundance Analysis with Cydar | [Cydar] | [Lecture 12 Notes] | [cydar]
Wednesday, February 23, 2022 (Homework 1 is due by 11:59pm Eastern time on February 25) | Differential Abundance Day 2: Cydar, Milo | [Milo] | [Lecture 13 Notes] | [Milo]
Monday, February 28, 2022 | [Please sign up for your project presentations here!] Finish differential abundance analysis (Milo and Cydar), Contrastive PCA for dealing with background data | [Contrastive PCA] | [Lecture 14 Notes] | [Contrastive PCA]
Wednesday, March 2, 2022 | Please bring your laptops to class! Trajectory inference: guest lecture and tutorial by Jolene Ranek | [PAGA], [A Comparison of Single-Cell Trajectory Inference Methods], [Slingshot] | [Lecture 15 Notes], [Colab Notebook for trajectory inference] | [PAGA]
Monday, March 7, 2022 | Batch 1 of project presentations [Link to Presentation Schedule] | | |
Wednesday, March 9, 2022 | Batch 2 of project presentations [Link to Presentation Schedule] | | |
Monday, March 14, 2022 | Spring Break: no class! | | |
Wednesday, March 16, 2022 | Spring Break: no class! | | |
Monday, March 21, 2022 | Finish differential abundance analysis (Milo), Contrastive PCA, Merging Multiple Single-Cell Datasets (Conos) | [Conos] | [Lecture 18 Notes] | [Conos]
Wednesday, March 23, 2022 | Pseudotime, Diffusion, and Cellular Differentiation | [Diffusion Maps for Differentiation], [SLICER (developed at UNC)], [Original Diffusion Maps (Coifman)] | [Lecture 19 Notes] | [Diffusion Maps (Scanpy)], [SLICER]
Monday, March 28, 2022 | Last single-cell lecture. Branching trajectories with SLICER. Begin combining biological data from multiple modalities using Grassmann Embedding | [Subspace Merging on Grassmann Manifold], [Rayleigh Ritz Business (Spectral Clustering...)] | [Lecture 20 Notes] | [Grassmann Cluster]
Wednesday, March 30, 2022 | Finish multimodal data integration with Grassmann + Rayleigh-Ritz, Start MOFA integration | [MOFA], [MOFA+] | [Lecture 21 Notes] | [MOFA]
Monday, April 4, 2022 | Integrating multiple heterogeneous graphs (e.g., multiple relational definitions) | [Mashup] | [Lecture 22 Notes] | [Mashup]
Wednesday, April 6, 2022 | [Project Presentation Signup], [Final Project Writeup Template], [Homework 2 Assigned and due Fri April 22] Graph Alignment and Summarization | [REGAL (graph alignment)], [Refining Network Alignment] | [Lecture 23 Notes] | [REGAL], [RefiNA]
Monday, April 11, 2022 | Graph Alignment Refinement, Summarization, and Compression | See papers from last time | [Lecture 24 Notes] |
Wednesday, April 13, 2022 | Label Propagation and Graph Neural Networks | [Correct and Smooth] | [Lecture 25 Notes] |
Monday, April 18, 2022 | [Project Rubric] Imaging Modalities and Spatial Regularization | [LEAPH] | [Lecture 26 Notes] |
Wednesday, April 20, 2022 | Technical Writing in Comp Bio and Wrap-Up; Summary of the Semester with respect to graphs in biomedicine | [Watch: How to be a Machine Learning Biologist], [Graph Representation Learning in Biomedicine] | [Lecture 27 Notes] |
Monday, April 25, 2022 | [Zoom option available] Project Presentations Day 1 | | |
Wednesday, April 27, 2022 | [No Zoom option today] Project Presentations Day 2 | | |
Thursday, May 5, 2022 | Final Project Writeups Due | | |

Key Dates

  • Homework 1 Due: February 23 (assigned February 9)
  • Project Proposal Due: March 9
  • Homework 2 Due: April 20
  • Final Project Due: Final Exam Day (May 5)

Homework, Project, Reading, Grading, Etc

Homework

There will be two homework assignments to practice implementing particular concepts. Concepts are often easier to understand and use once you have implemented them yourself. I will provide code and hints in Python, but I am happy to read and run code written in Python, R, Julia, or MATLAB. Please submit your homework writeup as a PDF.

Background Resources

Most of what we discuss in class will come from papers. However, I suggest the following textbooks as background references. Conveniently, they are also available for free.

  • [PRML] Pattern Recognition and Machine Learning by Chris Bishop [Link]

  • [SLMP] Spectral Learning on Matrices and Tensors by Majid Janzamin et al. [Link]

  • The Matrix Cookbook [Link]

  • [PML] Probabilistic Machine Learning: An Introduction by Kevin Murphy [Link]

  • [GRL] Graph Representation Learning by William Hamilton [Link]

  • My favorite general linear algebra resource: Matrix Computations by Golub and Van Loan

Readings

Each student must choose two class meetings at which to answer reading questions for one of the discussed papers. Please answer the following questions and send your responses to me in PDF format by the beginning of that class meeting.

Reading Questions

Please choose two papers over the course of the semester, and email your answers to natalies+comp790@cs.unc.edu before the corresponding class meeting begins at 9:30am.

  1. Please explain in 2 sentences or less what the problem being solved is.

  2. What were the main contributions of the authors in this work? (You can answer in a few bullet points).

  3. Please describe 1-2 computational experiments that the authors implemented to test their method.

  4. Were the authors the first to attempt this particular problem? If not, did they compare their results to other baselines? Do you think that their evaluation was objective?

  5. Do you think that the authors provided enough evidence for why their developed method is an important contribution? If yes, please describe their reasoning here. If you do not think they adequately justified why they worked on this particular problem, please describe your thoughts on that here.

  6. What is one follow-up idea or extension from this work?

Final Project

I will provide you with several examples of publicly available biological datasets and problems (https://github.com/natalies-teaching/Comp790-166-CompBio-Spring2022/blob/main/Datasets.md). Halfway through the semester, you will submit your project proposal and present your idea to the class. The proposal will be a short document describing 1) the problem, 2) background on other people's attempts to solve this problem, 3) background on your proposed solution, and 4) the data you will use to test your method. At the end of the semester, you will write a short paper explaining your method and results, and present your results to the class.

Grading

Grading will be based on the following:

  1. Reading Questions: 15% (two sets, satisfactorily completed during the semester)
  2. Homework 1: 20%
  3. Homework 2: 20%
  4. Project Proposal: 10%
  5. Project final writeup: 30%
  6. Class Participation and Attendance: 5%
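The percentages above combine into a single weighted average. The Python sketch below shows that arithmetic; it is an unofficial illustration, not a grading tool. It assumes every component is scored on a 0-100 scale, and the component names and example scores are hypothetical.

```python
# Unofficial sketch: weights copied from the grading list above.
# The component names are hypothetical labels chosen for illustration.
WEIGHTS = {
    "reading_questions": 0.15,
    "homework_1": 0.20,
    "homework_2": 0.20,
    "project_proposal": 0.10,
    "project_writeup": 0.30,
    "participation": 0.05,
}

# The listed percentages should account for the whole grade (100%).
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade, assuming each score is on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Weighted sum: each component contributes weight * score.
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {
        "reading_questions": 100,
        "homework_1": 90,
        "homework_2": 80,
        "project_proposal": 95,
        "project_writeup": 88,
        "participation": 100,
    }
    print(round(final_grade(example), 2))
```

With these hypothetical scores the script prints 89.9: the 30% writeup weight means a single point there moves the final grade more than a point on any other component.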

Lateness Policy

I understand that things come up. If you need more time on your homework or project proposal, you are welcome to talk to me about it. However, if you simply turn in your homework late without any prior notice, I will deduct 10% of the points per day. Late submissions of the course project will not be accepted.
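One possible reading of the homework deduction rule is sketched below in Python, for illustration only. The syllabus does not spell out these details, so the sketch assumes that (a) the 10% is taken from the assignment's maximum possible points rather than from the points earned, (b) a partial day counts as a full day, and (c) a score cannot go below zero. The function name is hypothetical, and it applies only to homework submitted late without prior notice.

```python
import math


def late_homework_score(raw_points: float, max_points: float, days_late: float) -> float:
    """Apply an assumed late deduction to an un-excused homework submission.

    Assumptions (not stated in the syllabus): 10% of max_points is deducted
    per day, partial days round up to whole days, and the floor is zero.
    """
    if days_late <= 0:
        return raw_points  # on time: no deduction
    penalty_days = math.ceil(days_late)  # partial days count as full days (assumption)
    deduction = 0.10 * max_points * penalty_days
    return max(0.0, raw_points - deduction)


if __name__ == "__main__":
    # Hypothetical: 85/100, submitted 1.5 days late -> two days of deductions.
    print(late_homework_score(85, 100, 1.5))
```

Under these assumptions the example prints 65.0. If the deduction were instead taken from the points earned, the arithmetic would change, which is why the assumptions are kept explicit in the function's docstring.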

Mask Usage

I don't anticipate any problems with this, but if you intend to join the class in person, please make sure to wear a face mask covering your nose and mouth.

Accessibility Statement

The University of North Carolina at Chapel Hill facilitates the implementation of reasonable accommodations, including resources and services, for students with disabilities, chronic medical conditions, a temporary disability or pregnancy complications resulting in barriers to fully accessing University courses, programs and activities. Accommodations are determined through the Office of Accessibility Resources and Service (ARS) for individuals with documented qualifying disabilities in accordance with applicable state and federal laws. See the ARS Website for contact information: https://ars.unc.edu or email ars@unc.edu.

(source: https://ars.unc.edu/faculty-staff/syllabus-statement)

Diversity Statement

I value the perspectives of individuals from all backgrounds reflecting the diversity of our students. I broadly define diversity to include race, gender identity, national origin, ethnicity, religion, social class, age, sexual orientation, political background, and physical and learning ability. I strive to make this classroom an inclusive space for all students. Please let me know if there is anything I can do to improve; I appreciate suggestions.
