MattHuff edited this page Jan 20, 2022 · 51 revisions

Welcome to the EPP 575 RNA Sequencing Workshop wiki!

The purpose of this wiki page is to provide a resource for students and instructors for the Introduction to Linux and RNASeq Workshop.

Catalog Information: EPP 575

Dates: January 13-14 and 18-21, 9 am to 11:30 am

Location: Ag Campus, PBB 160

Registered students, please review the syllabus, which contains important information on attendance and grading.

Modality - Zoom/Videos

Everything is now on Zoom. The instructor has emailed the link to everyone; please email if you need it resent.

Recordings of every class will be posted below for anyone who is unable to attend or wishes to review the material later.


This one-credit course meets over a period of two weeks. The overall goal of the section is to provide an introduction to computational analysis of RNA sequencing data. The first two classes, taught on January 13 and 14, 2022, cover the basics of command line programming on UTIA’s Computational Resources server. The remainder of the course, taught from January 18 to 21, 2022, utilizes these basics in analyzing RNA sequencing data. Basic steps such as quality assessment, read mapping, and differential expression analysis will be covered.


We will use the following software in this class to process mRNA reads. While we picked what we consider to be respected community standards, each step of the pipeline has many viable alternatives, as we will discuss during the class.

  • FastQC - Examine read quality metrics
  • Skewer - Trim adapters and low quality bases from reads
  • STAR - Map reads to reference genome
  • samtools - Manipulate SAM/BAM files, lightweight alignment viewer
  • IGV - Sophisticated alignment viewer
  • HTSeq-count - Quantify reads per feature (i.e., gene)
  • DESeq2 - Statistical testing of differential gene expression
  • BLAST - Search reads against a sequence database for sequence similarity


It is strongly recommended that everyone have:

  • Basic Linux experience.
  • Basic R experience.
  • An understanding of RNA from a biological standpoint and RNA sequencing from a laboratory perspective.

Students are responsible for bringing their laptops to class each day.


This course has been designed and will be taught by a group of students, staff, and faculty at UT:

  • Meg Staton, Assistant Professor, Entomology and Plant Pathology
  • Matthew Huff, Research Associate in Bioinformatics, Entomology and Plant Pathology
  • Lav Yadav, Postdoctoral Scholar, Entomology and Plant Pathology
  • Tara Rickman, Research Associate, Entomology and Plant Pathology
  • Ryan Kuster, PhD student in Entomology and Plant Pathology
  • Trinity Hamm, PhD student in Entomology and Plant Pathology

Code of Conduct

We value the participation of every member of the scientific community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to others. Any concerns should be emailed directly to Meg Staton.

Course data

This course will utilize publicly available data associated with Arabidopsis thaliana, a well-understood model organism.


Day 1: Thursday, Jan. 13


  • Understand grading and learning environment for the course
  • Have a working account on Centaur and access to the class project directory
  • Develop (or review) basic usage of the Linux command line
Material | Instructor
9:00am - Attendance and Introductory Slides and Syllabus Overview | Meg
9:30am - Log in and start Get Set Up, Software Carpentry Lesson 1 - Intro to the Shell | Meg
9:40am - Software Carpentry Lesson 2 - Navigating Files and Directories | Meg
10:15am - 10 minute break
10:25am - Software Carpentry Lesson 3 - Working with Files and Directories with Practice Questions | Trinity (Backup: Meg)


  • Use the 'touch' command to create a file named yourusername.txt. Take a screenshot of your terminal and send it to Meg and Matt via email or Slack.
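
For anyone unsure how 'touch' behaves, here is a minimal sketch. It assumes the shell variable USER holds your login name (falling back to the whoami command if it does not) and that you run it in a directory where you have write permission; the helper name make_user_file is only illustrative.

```shell
# Create an empty file named after the current user, then list it.
# Assumption: $USER is set; otherwise `whoami` supplies the name.
make_user_file() {
    name="${USER:-$(whoami)}"
    # touch creates the file if it is missing, or updates its timestamp if it exists
    touch "${name}.txt"
    # ls -l confirms the file exists and shows its size and modification time
    ls -l "${name}.txt"
}

make_user_file
```

Running touch a second time on the same file does not erase its contents; it only refreshes the modification time.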

Recording of the class session (hosted on Google Drive).

Day 2: Friday, Jan. 14


  • Understand the power and utility of high performance computational systems by seeing a BLAST example
  • Become comfortable moving around in a Linux filesystem
Material | Instructor
9:00am - Directory Organization and Lab Notebooks | Meg
9:20am - Finish Software Carpentry Lesson 3 - Working with Files and Directories with Practice Questions | Trinity
9:45am - Break
10:00am - ISAAC - the High Performance Cluster for UT | Victor Hazelwood, Director of High Performance & Scientific Computing (HPSC)
10:30am - Lab: BLAST and Transferring Files - SCP Slides and SCP Lab | Meg
11:00am - Software Carpentry Lesson 4 - Pipes and Filters | Meg


  • TBD

Recording of the class session.

Day 3: Tuesday, Jan. 18


  • Understand why quality assessment of reads matters and what metrics are important
  • Understand why read trimming may be needed and when it is not
  • Learn to perform quality assessment and trimming of reads
Material | Instructor
9:00am - Lab: BLAST and Transferring Files - SCP Slides and SCP Lab | Meg
9:30am - Slides: Introduction to RNA Sequencing | Matt
Discuss FASTQ/FASTA formats, paired-end vs single-end | Matt
10 minute break
10:00am - Lab: Quality Assessment (FastQC/MultiQC) | Matt
10:40am - Slides: Trimming | Ryan
11:00am - Lab: Trimming reads | Ryan


  • Quality Assessment: Send the file labeled EPP575_raw_multiqc_report.html to the instructors. This file must contain quality assessment information for both read pairs for sample SRR17062759.
  • Trimming Reads: Comment on the differences you find in the FastQC html reports on the 'trimming_example' data before and after trimming with Skewer. Send a brief summary (5 sentences or fewer) to the instructors.

Recording of the class session.

Day 4: Wednesday, Jan. 19


  • Understand the general algorithm behind read mapping and how to use a read mapping software program on the command line
  • Apply basic principles of experimental design to evaluate a transcriptome project
  • Visualize reads mapped to a genome and use that visualization to evaluate if read mapping was successful
Material | Instructor
9:00am - Slides: Read Mapping | Lav
Discuss GFF3 format
9:30am - Lab: Read mapping (STAR)
As reads map, Slides discussing SAM/BAM/CRAM formats
10 minute break
10:30am - Slides: Visualizing Mapped Reads | Matt
11:00am - Lab: Visualization (IGV)


Recording of the class session.

Day 5: Thursday, Jan. 20


  • Learn the steps involved in converting mapped reads into gene expression data
  • Apply HTSeq-count to our mapped data and produce the read counts necessary for gene expression analysis
  • Consider automation of the approaches learned so far in this course
Material | Instructor
9:00am - Slides: Counting Reads | Ryan
9:15am - Lab: Counting Reads (HTSeq-count) | Ryan
9:30am - Slides: Transcriptome Project Design (Slides 1-17) | Matt
10 minute break
10:15am - Slides: Scaling Up Processes (Optional) | Matt
11:00am - UTIACR Calendar and RClone (Slides 18-26) | Matt


Additional training on For Loops: Software Carpentry.
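
As a rough illustration of how a for loop can automate a step across samples, here is a dry-run sketch that only prints the commands it would run. The sample IDs other than SRR17062759, the qc_reports output directory, the _1/_2 FASTQ file naming, and the function name build_commands are all assumptions made for this example, not the course's actual setup.

```shell
# Dry run: print one FastQC command per sample instead of executing it.
# SAMPLE_B and SAMPLE_C are hypothetical placeholders for additional samples.
samples="SRR17062759 SAMPLE_B SAMPLE_C"

build_commands() {
    for sample in $samples; do
        # Assumption: paired-end reads are named <sample>_1.fastq and <sample>_2.fastq
        echo "fastqc -o qc_reports ${sample}_1.fastq ${sample}_2.fastq"
    done
}

build_commands
```

Printing the commands first makes it easy to check file names before committing compute time; removing the echo (and the surrounding quotes) would run the commands for real.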

Recording of the class session.

Day 6: Friday, Jan. 21


  • Become comfortable with the concept of read normalization
  • Apply DESeq2 to count data to detect significant changes in gene expression
  • Visualize gene expression patterns across samples
  • Gain intuition on networks of gene interactions using gene ontology (GO)
Material | Instructor
9:00am - Slides: DESeq2 Differential Expression analysis | Matt and Lav
9:30am - Lab: Identify Differentially Expressed Genes (DESeq2)
10 minute break
10:45am - Slides: GO Enrichment; Lab: agriGO analysis | Tara
11:15am - Closing thoughts

Homework: How many significant GO terms are reported in the agriGO analysis? Send your response to the instructors.

Previous course materials

2019 RNASeq Workshop Homepage