- Welcome to the EPP 575 RNA Sequencing Workshop wiki!
- Modality - Zoom/Videos
- Code of Conduct
- Course data
- Day 1: Thursday, Jan. 13
- Day 2: Friday, Jan. 14
- Day 3: Tuesday, Jan. 18
- Day 4: Wednesday, Jan. 19
- Day 5: Thursday, Jan. 20
- Day 6: Friday, Jan. 21
- Previous course materials
This wiki is a resource for students and instructors of the Introduction to Linux and RNASeq Workshop.
Catalog Information: EPP 575
Dates: January 13-14 and 18-21, 9 am to 11:30 am
Location: Ag Campus, PBB 160
Registered students should review the syllabus, which contains important information on attendance and grading.
Everything is now on Zoom. The instructor has emailed the link to everyone; please email the instructor if you need it resent.
Recordings of all classes will be posted below for anyone who is unable to attend or wishes to review the material later.
This one-credit course meets over two weeks. Its overall goal is to introduce the computational analysis of RNA sequencing data. The first two classes, taught on January 13 and 14, 2022, cover the basics of command-line programming on UTIA’s Computational Resources server. The remainder of the course, taught from January 18 to 21, 2022, applies these basics to RNA sequencing data, covering basic steps such as quality assessment, read mapping, and differential expression analysis.
We will use the following software in this class to process mRNA reads. While we picked what we consider to be respected community standards, each step of the pipeline has many viable alternatives, as we will discuss during the class.
- FastQC - Examine read quality metrics
- Skewer - Trim adapters and low-quality bases from reads
- STAR - Map reads to reference genome
- samtools - Manipulate SAM/BAM files, lightweight alignment viewer
- IGV - Sophisticated alignment viewer
- HTSeq-count - Quantify reads per feature (e.g., gene)
- DESeq2 - Statistical testing of differential gene expression
- BLAST - Search reads against a sequence database for sequence similarity
It is strongly recommended that everyone have:
- Basic Linux experience.
- Basic R experience.
- An understanding of RNA from a biological standpoint and RNA sequencing from a laboratory perspective.
Students are responsible for bringing their laptops to class each day.
This course has been designed and will be taught by a group of students, staff, and faculty at UT:
- Meg Staton, Assistant Professor, Entomology and Plant Pathology
- Matthew Huff, Research Associate in Bioinformatics, Entomology and Plant Pathology
- Lav Yadav, Postdoctoral Scholar, Entomology and Plant Pathology
- Tara Rickman, Research Associate, Entomology and Plant Pathology
- Ryan Kuster, PhD Student, Entomology and Plant Pathology
- Trinity Hamm, PhD Student, Entomology and Plant Pathology
We value the participation of every member of the scientific community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to others. Any concerns should be emailed directly to Meg Staton.
This course will utilize publicly available data associated with Arabidopsis thaliana, a well-understood model organism.
- Understand grading and learning environment for the course
- Have a working account on Centaur and access to the class project directory
- Develop (or review) basic usage of the Linux command line
| Time | Activity | Instructor |
|---|---|---|
| 9:00am | Attendance, Introductory Slides, and Syllabus Overview | Meg |
| 9:30am | Log in and start Get Set Up, Software Carpentry Lesson 1 - Intro to the Shell | Meg |
| 9:40am | Software Carpentry Lesson 2 - Navigating Files and Directories | Meg |
| 10:15am | 10 minute break | |
| 10:25am | Software Carpentry Lesson 3 - Working with Files and Directories with Practice Questions | Trinity (Backup: Meg) |
- Use the 'touch' command to create a file named yourusername.txt. Take a screenshot of your terminal and send it to Meg and Matt via email or Slack.
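For anyone curious about what 'touch' does, here is a minimal Python sketch of the same behaviour. It is not a substitute for the assignment, which asks for the shell command. The helper name `touch` and the temporary working directory are illustrative assumptions, not part of the course materials.

```python
import tempfile
from pathlib import Path


def touch(path):
    """Create an empty file if it is missing; leave existing contents alone."""
    target = Path(path)
    # Path.touch creates the file or, if it exists, only updates its timestamp.
    target.touch(exist_ok=True)
    return target


# Work in a throwaway directory so nothing is left behind (an assumption
# made for this sketch; on the server you would work in your own directory).
with tempfile.TemporaryDirectory() as workdir:
    created = touch(Path(workdir) / "yourusername.txt")
    print(created.name, "exists:", created.exists(), "size:", created.stat().st_size)
```

As with the shell command, calling the helper on an existing file does not erase what is in it.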
Recording of the class session (hosted on Google Drive).
- Understand the power and utility of high-performance computing systems by seeing a BLAST example
- Become comfortable moving around in a Linux filesystem
| Time | Activity | Instructor |
|---|---|---|
| 9:00am | Directory Organization and Lab Notebooks | Meg |
| 9:20am | Finish Software Carpentry Lesson 3 - Working with Files and Directories with Practice Questions | Trinity |
| 9:45am | Break | |
| 10:00am | ISAAC - the High Performance Cluster for UT | Victor Hazelwood, Director of High Performance & Scientific Computing (HPSC) |
| 10:30am | Lab: BLAST and Transferring Files - SCP Slides and SCP Lab | Meg |
| 11:00am | Software Carpentry Lesson 4 - Pipes and Filters | Meg |
- Understand why quality assessment of reads matters and what metrics are important
- Understand why read trimming may be needed and when it is not
- Learn to perform quality assessment and trimming of reads
| Time | Activity | Instructor |
|---|---|---|
| 9:00am | Lab: BLAST and Transferring Files - SCP Slides and SCP Lab | Meg |
| 9:30am | Slides: Introduction to RNA Sequencing | Matt |
| | Discuss FASTQ/FASTA formats, paired-end vs. single-end | Matt |
| | 10 minute break | |
| 10:00am | Lab: Quality Assessment (FastQC/MultiQC) | Matt |
| 10:40am | Slides: Trimming | Ryan |
| 11:00am | Lab: Trimming reads | Ryan |
- Quality Assessment: Send the file labeled EPP575_raw_multiqc_report.html to firstname.lastname@example.org and email@example.com. This file must contain quality assessment information of both read pairs for sample SRR17062759.
- Trimming Reads: Comment on the differences you find in the FastQC html reports on the 'trimming_example' data before and after trimming with Skewer. Send a brief summary (5 sentences or fewer) to firstname.lastname@example.org and email@example.com.
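The quality scores that FastQC summarizes can be read directly from a FASTQ record. The sketch below is a simplified illustration, assuming the common Phred+33 encoding and a made-up record. The function names and the 3'-end trimming rule are hypothetical teaching aids and do not reproduce Skewer's algorithm.

```python
# A made-up four-line FASTQ record: header, sequence, separator, qualities.
EXAMPLE_FASTQ = """\
@read1
ACGTACGTAC
+
IIIIIIII##
"""


def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, sequence, plus, quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Convert each quality character to its Phred score (ASCII code minus offset)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(sequence, quality, min_q=20):
    """Remove bases from the 3' end until the last base has a score >= min_q."""
    scores = phred_scores(quality)
    keep = len(scores)
    while keep > 0 and scores[keep - 1] < min_q:
        keep -= 1
    return sequence[:keep], quality[:keep]


for read_id, seq, qual in parse_fastq(EXAMPLE_FASTQ):
    scores = phred_scores(qual)
    mean_q = sum(scores) / len(scores)
    trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
    print(read_id, f"mean quality {mean_q:.1f}", "trimmed to", trimmed_seq)
```

In the example record, 'I' encodes a Phred score of 40 and '#' a score of 2, so the two trailing bases are removed by the threshold of 20.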
- Understand the general algorithm behind read mapping and how to use a read mapping software program on the command line
- Apply basic principles of experimental design to evaluate a transcriptome project
- Visualize reads mapped to a genome and use that visualization to evaluate if read mapping was successful
| Time | Activity | Instructor |
|---|---|---|
| 9:00am | Slides: Read Mapping | Lav |
| | Discuss GFF3 format | |
| 9:30am | Lab: Read mapping (STAR) | |
| | As reads map, slides discussing SAM/BAM/CRAM formats | |
| | 10 minute break | |
| 10:30am | Slides: Visualizing Mapped Reads | Matt |
| 11:00am | Lab: Visualization (IGV) | |
- Search for MAX2's position in IGV, take a screenshot of the whole screen, and send this screenshot to firstname.lastname@example.org and email@example.com.
- Learn the steps involved in converting mapped reads into gene expression data
- Apply HTSeq-count to our mapped data and produce the read counts necessary for gene expression analysis
- Consider automation of the approaches learned so far in this course
| Time | Activity | Instructor |
|---|---|---|
| 9:00am | Slides: Counting Reads | Ryan |
| 9:15am | Lab: Counting Reads (HTSeq-count) | Ryan |
| 9:30am | Slides: Transcriptome Project Design (Slides 1-17) | Matt |
| | 10 minute break | |
| 10:15am | Slides: Scaling Up Processes (Optional) | Matt |
| 11:00am | UTIACR Calendar and RClone (Slides 18-26) | Matt |
Additional training on For Loops: Software Carpentry.
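The idea of turning mapped reads into per-gene counts, and of repeating that step for every sample with a for loop, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the gene coordinates, sample names, and reads are invented, and the overlap rule is a simplification rather than HTSeq-count's actual implementation. The `__no_feature` and `__ambiguous` labels mirror the special counters that HTSeq-count reports.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("Chr1", 100, 500),
    "geneB": ("Chr1", 450, 900),
    "geneC": ("Chr2", 10, 300),
}

# Hypothetical samples, each a list of mapped reads as (chromosome, start, end).
SAMPLES = {
    "sample1": [("Chr1", 120, 170), ("Chr1", 460, 510), ("Chr2", 50, 100)],
    "sample2": [("Chr1", 600, 650), ("Chr2", 400, 450)],
}


def overlapping_genes(read, genes):
    """Return the names of all genes whose interval overlaps the read."""
    chrom, start, end = read
    return [
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start <= g_end and end >= g_start
    ]


def count_reads(reads, genes):
    """Count reads per gene; reads hitting no gene or several genes are tallied separately."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        hits = overlapping_genes(read, genes)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# The for loop applies the same counting step to every sample.
for sample_name, reads in SAMPLES.items():
    print(sample_name, dict(count_reads(reads, GENES)))
```

The loop at the end is the part that scales: adding a sample to `SAMPLES` requires no change to the counting code.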
- Become comfortable with the concept of read normalization
- Apply DESeq2 to count data to detect significant changes in gene expression
- Visualize gene expression patterns across samples
- Gain intuition on networks of gene interactions using gene ontology (GO)
| Time | Activity | Instructor |
|---|---|---|
| 9:00am | Slides: DESeq2 Differential Expression analysis | Matt and Lav |
| 9:30am | Lab: Identify Differentially Expressed Genes (DESeq2) | |
| | 10 minute break | |
| 10:45am | Slides: GO Enrichment; Lab: agriGO analysis | Tara |
| 11:15am | Closing thoughts | |
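To build intuition for read normalization, the following Python sketch computes per-sample size factors with the median-of-ratios idea that underlies DESeq2's default normalization. DESeq2 itself is an R package, so this is only an illustration of the concept, not its implementation. The count table and function names are invented for the example.

```python
import math
from statistics import median

# Hypothetical raw counts: gene -> [count in sampleA, count in sampleB].
# sampleB was sequenced about twice as deeply as sampleA.
COUNTS = {
    "gene1": [100, 200],
    "gene2": [50, 100],
    "gene3": [30, 60],
    "gene4": [0, 10],
}


def size_factors(counts):
    """Median-of-ratios size factors, one per sample."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # skip genes with a zero count: their log geometric mean is undefined
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio to the per-gene geometric mean is the sample's size factor.
    return [math.exp(median(ratios)) for ratios in log_ratios]


def normalize(counts, factors):
    """Divide each raw count by its sample's size factor."""
    return {gene: [c / f for c, f in zip(row, factors)] for gene, row in counts.items()}


factors = size_factors(COUNTS)
print("size factors:", [round(f, 3) for f in factors])
for gene, row in normalize(COUNTS, factors).items():
    print(gene, [round(v, 1) for v in row])
```

After normalization, gene1 through gene3 have equal values in both samples, which reflects that their raw differences were explained entirely by sequencing depth.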