Tripathy lab RNAseq data processing and QC pipeline

This repository holds documentation and pseudo-tutorials for RNAseq (including bulk, sc/snRNAseq, and patch-seq) data processing and quality control methods used in the Tripathy lab at the Krembil Centre for Neuroinformatics in the Centre for Addiction and Mental Health (CAMH). Some of the content in this document is specific to the CAMH high-performance computing cluster (SCC), though most of the information should be generally applicable.

Data processing

Overview

Data processing of RNAseq data generally involves transforming the data from random fragments of RNA sequences into human-readable information. Most commonly, we perform alignment of RNAseq reads to an annotated reference genome followed by quantification to produce count matrices. We can also produce plots of aligned RNAseq reads to better understand mRNA species and composition in samples.
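To make the "count matrix" idea concrete, here is a minimal Python sketch of combining per-sample quantification tables into one genes-by-samples matrix. It is not the lab's code. It assumes input in the tab-separated layout of RSEM's `genes.results` files (with `gene_id` and `expected_count` columns). The function names, sample names, and values are made up for illustration.

```python
import csv
import io


def read_expected_counts(handle):
    """Return {gene_id: expected_count} from an RSEM-style genes.results table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


def build_count_matrix(per_sample):
    """Combine {sample: {gene: count}} into (genes, samples, matrix).

    Rows of the matrix are genes, columns are samples; a gene that is
    missing from a sample's table is filled with 0.0.
    """
    samples = sorted(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    matrix = [[per_sample[s].get(g, 0.0) for s in samples] for g in genes]
    return genes, samples, matrix


# Made-up example tables standing in for two per-sample result files.
HEADER = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
TABLE_1 = HEADER + "GeneA\tTxA1\t1000\t900\t10.00\t5.0\t4.0\n" + "GeneB\tTxB1\t2000\t1900\t0.00\t0.0\t0.0\n"
TABLE_2 = HEADER + "GeneA\tTxA1\t1000\t900\t3.50\t2.0\t1.5\n" + "GeneC\tTxC1\t1500\t1400\t7.00\t3.0\t2.5\n"

per_sample = {
    "cell_1": read_expected_counts(io.StringIO(TABLE_1)),
    "cell_2": read_expected_counts(io.StringIO(TABLE_2)),
}
genes, samples, matrix = build_count_matrix(per_sample)

for gene, row in zip(genes, matrix):
    print(gene, row)
```

With real data, each `io.StringIO(...)` would be replaced by an opened results file for one sample.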

Currently, we have four pipeline-tutorials at various stages of development:

STAR-RSEM for alignment and quantification (also recommended for beginners to RNAseq data processing): https://github.com/sonnyc247/PSQ_Pipeline/tree/master/Code/Alignment/STAR-RSEM
STAR generation of coordinate-sorted BAM files for gene quantification and read-pile-up plots: https://github.com/sonnyc247/PSQ_Pipeline/tree/master/Code/Alignment/STAR-SumOverlaps-Sashimi
Kallisto for isoform-level pseudo-alignment and quantification: https://github.com/sonnyc247/PSQ_Pipeline/tree/master/Code/Alignment/Isoform-Quantifiers/Kallisto
Salmon for isoform-level pseudo-alignment and quantification: https://github.com/sonnyc247/PSQ_Pipeline/tree/master/Code/Alignment/Isoform-Quantifiers/Salmon

Quality assessment and control

Overview

It is difficult to know from processed data whether the RNAseq data from a particular experiment or sample is of high or low quality. Information from before, during, and after RNAseq data processing can be used in combination to assess the quality of RNAseq data from a particular sample.

Currently, this work and pipeline are in development. Some example code and tasks that we currently use to assess RNAseq quality are stored in https://github.com/sonnyc247/PSQ_Pipeline/tree/master/Code/Quality_Assessment.
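As one illustration of "after processing" information, the Python sketch below computes two commonly used per-sample summaries from a count matrix: library size (total counts) and the number of genes detected. It is not taken from the Quality_Assessment code linked above. The function name, the `min_genes` threshold, and the example values are hypothetical, and a suitable cutoff depends on the experiment.

```python
def qc_summary(samples, matrix, min_genes=2):
    """Summarize each sample (column) of a genes-by-samples count matrix.

    min_genes is an arbitrary illustrative cutoff, not a recommended value.
    """
    summary = {}
    for j, sample in enumerate(samples):
        column = [row[j] for row in matrix]
        genes_detected = sum(1 for count in column if count > 0)
        summary[sample] = {
            "library_size": sum(column),       # total counts in the sample
            "genes_detected": genes_detected,  # genes with a nonzero count
            "passes": genes_detected >= min_genes,
        }
    return summary


# Made-up matrix: 3 genes (rows) by 2 samples (columns).
example_samples = ["cell_1", "cell_2"]
example_matrix = [
    [10.0, 0.0],
    [5.0, 0.0],
    [0.0, 2.0],
]
qc = qc_summary(example_samples, example_matrix)

for name, stats in qc.items():
    print(name, stats)
```

In practice, summaries like these would be considered alongside information from before and during processing rather than used on their own.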

Acknowledgements

Special thanks to Justin Chee (https://github.com/cheejus2), Jordan Sicherman (https://github.com/jsicherman), and Derek Howard (https://github.com/derekhoward) for help and consultation in developing this pipeline.

This work was supported in part by funding provided by Brain Canada, in partnership with Health Canada, for the Canadian Open Neuroscience Platform initiative.
