Home

RNA sequence protocol for assessing Alternative Splicing

This repository contains a protocol to analyse RNA-seq data, focusing on alternative splicing & polyadenylation, authored by Oliver Ziff.
The contents are based on multiple resources including:
- RNAseq worksheet
- Biostars handbook
- rnaseq.wiki
- Data Camp
- Coursera
- and most importantly the experience of established experts in RNAseq analysis within the Luscombe lab - my host laboratory.
The protocol uses a combination of Bash (Unix command line) and R scripts.
FAQs: https://journals.plos.org/ploscompbiol/article/file?type=supplementary&id=info:doi/10.1371/journal.pcbi.1004393.s009
Tools: https://journals.plos.org/ploscompbiol/article/file?type=supplementary&id=info:doi/10.1371/journal.pcbi.1004393.s004

Chapters

RNA seq workflow
Wet-lab RNA sequencing phase
Accessing sequencing data
QC of sequencing files
Alignment
Visualisation in IGV browser
QC of aligned reads
Read quantification
Differential expression analysis
Splicing analysis
Gene enrichment analysis

RNA-seq Workflow

Introduction

The aim of RNA-seq is to interrogate relative transcript abundance and diversity. Its accuracy is superior to microarrays and similar to qPCR.

(Figure: Central Dogma of molecular biology)

(Figure: Steps of RNA-Seq)

Analysis goals:

transcript discovery
genome annotation
alternative expression analysis
gene fusion detection
viral detection
RNA editing detection (CRISPR/Cas9)

Wet-lab sequencing phase:

Extract & isolate RNA
Prepare library: break RNA into small fragments, enrich nonribosomal RNA, convert to cDNA, construct fragment library (add sequencing adapters, PCR amplify)
Sequence the cDNA library (high-throughput sequencing): generate single- or paired-end reads of 30-300 bp in length. Key concepts: flow cell, base calling & quality scores, replicates (technical = multiple lanes in a flow cell; biological = multiple samples from each condition).
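
The quality score assigned during base calling is stored in FASTQ files as one ASCII character per base. As a minimal sketch, assuming the common Phred+33 encoding and unwrapped 4-line FASTQ records (the helper name `mean_phred` is illustrative and not part of the protocol), the mean quality of each read could be computed with awk:

```shell
#!/usr/bin/env bash
# Print the mean Phred quality of each read in a FASTQ stream on stdin.
# Assumes Phred+33 encoding and 4-line records (no wrapped sequences).
mean_phred() {
  awk '
    BEGIN {
      # Build a lookup table from each printable ASCII character to its code.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    NR % 4 == 1 { name = substr($1, 2) }   # header line without the leading @
    NR % 4 == 0 {                          # quality line
      total = 0
      n = length($0)
      for (i = 1; i <= n; i++) total += ord[substr($0, i, 1)] - 33
      printf "%s\t%.1f\n", name, (n > 0 ? total / n : 0)
    }
  '
}
```

For example, `zcat sample.fastq.gz | mean_phred` (with a hypothetical file name) would list one read name and mean score per line, which can help decide on a filtering threshold.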

(Figure: Preparing RNA-seq library)

Bioinformatic phase:

https://www.biostarhandbook.com/rnaseq/rnaseq-intro.html

Process raw reads: FASTQ files (downloaded from SRA), quality scores (Phred), paired- vs single-end sequencing, FastQC quality control, variability, spike-ins, blocking & randomisation, filtering out low-quality reads & artifacts (adapter sequence reads).
Align (map) reads to a reference genome (FASTA, GFF, GTF): annotation file (BED), alignment program (STAR, HISAT), reference genomes (GENCODE, Ensembl), generating a genome index, creating & manipulating BAM/SAM files containing sequence alignment data.
Visualise & explore alignment data in IGV and RStudio: ggplot2, bias identification with QoRTs.
Quantify reads (estimate abundance) with gene-based read counting.
Compare abundances between conditions & replicates (differential expression): normalise by adjusting each gene's read count for the total aligned reads within each sample. Summarise the data with pairwise correlation, hierarchical clustering and PCA to look for differences between samples & identify outliers to consider excluding.
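
Adjusting each gene's read count for the total aligned reads in a sample is often expressed as counts per million (CPM): CPM = gene count / total counts in the sample x 1,000,000. The sketch below only illustrates that arithmetic. It assumes a tab-separated table with a header row, gene IDs in the first column and one raw-count column per sample; the function name `counts_to_cpm` and the table layout are assumptions, and differential expression packages such as DESeq apply their own normalisation instead.

```shell
#!/usr/bin/env bash
# Convert a raw count table to counts per million (CPM), one column per sample.
# $1: tab-separated file with a header row and gene IDs in column 1.
counts_to_cpm() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                      # first pass: sum each sample column
      if (FNR > 1) for (i = 2; i <= NF; i++) total[i] += $i
      next
    }
    FNR == 1 { print; next }         # second pass: copy the header unchanged
    {
      for (i = 2; i <= NF; i++)
        $i = sprintf("%.1f", (total[i] > 0 ? $i / total[i] * 1e6 : 0))
      print
    }
  ' "$1" "$1"                        # the file is read twice, hence two arguments
}
```

Usage would look like `counts_to_cpm counts.tsv > cpm.tsv`, where `counts.tsv` is a hypothetical merged count table.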


Compare mutant vs wild type gene expression
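
The bioinformatic steps above can be outlined as a small Bash pipeline. This is a hedged sketch rather than the protocol's actual scripts: the function names, all file and directory names, the thread count and the `--sjdbOverhang 100` value are placeholder assumptions, and it handles a single-end sample only. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Outline of QC -> alignment -> quantification for one single-end sample.
# All paths below are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

rnaseq_pipeline() {
  local sample="$1" genome_fa="$2" gtf="$3" threads="${4:-4}"
  local bam="aligned/${sample}.Aligned.sortedByCoord.out.bam"

  run mkdir -p qc star_index aligned counts

  # 1. Quality control of the raw reads, summarised with MultiQC.
  run fastqc -o qc "${sample}.fastq.gz"
  run multiqc qc -o qc

  # 2. Build the STAR genome index (needed once per genome and annotation).
  run STAR --runMode genomeGenerate --runThreadN "$threads" \
      --genomeDir star_index --genomeFastaFiles "$genome_fa" \
      --sjdbGTFfile "$gtf" --sjdbOverhang 100

  # 3. Align reads, write a coordinate-sorted BAM file and index it.
  run STAR --runThreadN "$threads" --genomeDir star_index \
      --readFilesIn "${sample}.fastq.gz" --readFilesCommand zcat \
      --outSAMtype BAM SortedByCoordinate \
      --outFileNamePrefix "aligned/${sample}."
  run samtools index "$bam"

  # 4. Gene-based read counting with featureCounts (part of Subread).
  run featureCounts -T "$threads" -a "$gtf" -o "counts/${sample}.txt" "$bam"
}
```

Calling `rnaseq_pipeline sample1 genome.fa genes.gtf 8` prints the planned commands so they can be reviewed before rerunning with `DRY_RUN=0`.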

Requirements

On the CAMP cluster most packages are preinstalled, but to use them you need to load them with the module load function (ml):

    ml STAR
    ml ncbi-vdb
    ml fastq-tools
    ml SAMtools
    ml RSeQC
    ml QoRTs
    ml multiqc
    ml Subread
    ml Java

Use module spider to search for packages.
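
After loading modules it can be worth confirming that each executable is actually on the PATH before starting a long job. A minimal sketch using only the shell's `command -v` builtin; the function name `check_tools` is illustrative, and the executable names that each module provides on CAMP are assumptions:

```shell
#!/usr/bin/env bash
# Report which of the given executables can be found on PATH.
# Returns 0 if all are found, 1 if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools STAR samtools featureCounts multiqc || exit 1` at the top of a job script would stop early if a module was not loaded.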

Install conda and activate bioconda
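
A minimal sketch of the channel configuration, assuming conda is already installed (for example via Miniconda) and following the channel order recommended in the Bioconda documentation. The function name `setup_bioconda`, the environment name `rnaseq` and the package list are illustrative assumptions, not part of the protocol:

```shell
#!/usr/bin/env bash
# Configure the Bioconda channels and create an environment with common tools.
# Returns 1 without changing anything if conda is not installed.
setup_bioconda() {
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH; install it first" >&2
    return 1
  fi
  # Channels added later take higher priority, so conda-forge ends up first.
  conda config --add channels bioconda
  conda config --add channels conda-forge
  conda config --set channel_priority strict
  # "rnaseq" is a hypothetical environment name.
  conda create -y -n rnaseq star samtools subread fastqc multiqc
}
```

After running `setup_bioconda`, the environment would be activated with `conda activate rnaseq`.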

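Installing packages in R:

    install.packages("package_name")

Bioconductor is a free software project for genomic analyses based on the R programming language. Install Bioconductor packages by sourcing the installer script:

    source("https://bioconductor.org/biocLite.R")
    biocLite("package_name")
    biocLite("erccdashboard")   # erccdashboard (for artificial spike-in quantification)
    biocLite("DESeq")

On R 3.5 and later, biocLite has been replaced by BiocManager: run install.packages("BiocManager") and then BiocManager::install("package_name").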

Even though packages have been installed into R locally, they need to be loaded into the working session before use:

    library("erccdashboard")
    library("DESeq")
