pangxueyu233/Pipeline-of-transcriptome

Pipeline of transcriptome data analysis

Overview

This workshop records the complete processing steps for transcriptome data analysis in the CC-LY Lab, written by Xiangyu Pan and Xuelan Chen. The page is designed to be easy to read and helpful for newcomers to bioinformatics. We will try to maintain and update Pipeline-of-transcriptome regularly. The pipeline is also flexible: you can extend it with additional analysis steps and tools, such as GSEA, TF enrichment, bulk RNA-seq data deconvolution, or anything else. We also welcome comments and requests that help improve and optimize this page. Finally, we hope you come away with a good grasp of basic transcriptome data analysis.

The analysis pipeline includes:

  • Alignment
  • Transcription quantification
    • GenomicFeatures and Rsamtools
    • StringTie
    • RSEM
  • DEG identification
    • The summary of the methods to calculate the p-value
  • GO/KEGG enrichment
  • GSEA
  • Alternative splicing
  • Motif/TF identification
  • RNA editing
  • Mutation
  • etc.

1. The pre-processing steps

On this page, GenomicAlignments and Rsamtools are used to quantify read counts from transcriptome data. In the old version, we used FPKM and TPM for heatmap visualization and gene set enrichment analysis. In the latest version, DESeq2-normalized data, which better reduce the effect of library size, are used to describe the expression pattern of each gene. Pathway enrichment, especially GSEA, is also based on the DESeq2-normalized data.
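To make the library-size correction concrete, here is a minimal Python sketch of the median-of-ratios idea behind DESeq2's size factors. It is an illustration only, not DESeq2's actual implementation; the function names and the toy counts are hypothetical, and in practice you would let DESeq2 do this in R.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios idea.

    counts: dict mapping gene name -> list of raw counts (one per sample).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        # A gene with a zero count has a zero geometric mean, so skip it.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio per sample is robust to a few strongly changed genes.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return {g: [c / s for c, s in zip(row, sf)] for g, row in counts.items()}


# Toy data (hypothetical): sample 2 was sequenced twice as deeply as sample 1.
toy = {
    "geneA": [10, 20],
    "geneB": [100, 200],
    "geneC": [50, 100],
    "geneD": [0, 3],
}
factors = size_factors(toy)
normalized = normalize(toy)
print(factors)
print(normalized["geneA"])
```

After normalization, geneA has the same value in both samples, because the two-fold difference in the raw counts came only from sequencing depth.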

The DESeq2 pipeline is also used here to identify differentially expressed genes (DEGs). Several essential parameters set the cutoffs for DEG detection in this pipeline; they are explained in detail on the following pages. To visualize the functions of the DEGs directly, clusterProfiler is implemented in this pipeline, and GO/KEGG enrichment of the DEGs can be run with default parameters. GSEA is also integrated on the following page.
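As a rough illustration of what such cutoffs do, the Python sketch below splits a DESeq2-style results table into up- and down-regulated genes. The thresholds (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are common conventions assumed here, not necessarily the values used in this pipeline, and the gene names and numbers are made up.

```python
import math
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float
    padj: float  # adjusted p-value; NaN stands in for DESeq2's NA


def call_degs(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (up, down) gene lists that pass both cutoffs."""
    up, down = [], []
    for r in results:
        # Genes without an adjusted p-value (NA) cannot be called.
        if math.isnan(r.padj) or r.padj >= padj_cutoff:
            continue
        if r.log2_fold_change >= lfc_cutoff:
            up.append(r.gene)
        elif r.log2_fold_change <= -lfc_cutoff:
            down.append(r.gene)
    return up, down


# Hypothetical results table.
toy_results = [
    GeneResult("geneA", 2.3, 0.001),         # significant, up
    GeneResult("geneB", -1.7, 0.02),         # significant, down
    GeneResult("geneC", 0.4, 0.0001),        # significant, but small change
    GeneResult("geneD", 3.0, 0.2),           # large change, not significant
    GeneResult("geneE", -2.5, float("nan")), # no adjusted p-value
]
up_genes, down_genes = call_degs(toy_results)
print(up_genes, down_genes)
```

Keeping the two cutoffs as parameters makes it easy to see how the DEG list grows or shrinks when either one is relaxed or tightened.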

  • Before using this pipeline, install the following software:
#STAR
STAR_2.6.0a

#Rscript
R scripting front-end version 3.5.1 (2018-07-02)

2. The post-processing steps

After finishing the pre-processing steps, you can proceed directly to the quantification of genes and the identification of DEGs. You can visit that page by clicking here.

3. The optional methods in transcripts quantification and p-value calculation

3.1 The summary of transcript quantification methods

3.2 The summary of some statistical methods

  • When we compare the expression levels of a candidate gene between biological groups, statistical power is important because it determines how much confidence we can place in the results. To better support hypotheses about candidate genes, especially when analyzing multiple clinical cohorts, we can draw on additional methods of p-value calculation.
  • Here, I have compiled a summary of the methods for calculating p-values in DEG identification. You can visit it by clicking here.
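As one example of an alternative way to obtain a p-value for a single candidate gene, the Python sketch below runs an exact two-sided permutation test on the difference in group means. This method is chosen here only for illustration and is not necessarily among those in the linked summary; the group names and expression values are hypothetical.

```python
from itertools import combinations


def abs_mean_diff(a, b):
    """Absolute difference between the means of two groups."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Enumerates every way to relabel the pooled samples, so it is only
    practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs_mean_diff(group_a, group_b)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in range(len(pooled)) if i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs_mean_diff(a, b) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical normalized expression of one candidate gene.
tumor = [8.1, 7.9, 8.4, 8.0]
normal = [6.2, 6.5, 6.0, 6.3]
p_value = permutation_p_value(tumor, normal)
print(p_value)
```

With four samples per group there are only 70 possible relabelings, so the smallest achievable two-sided p-value is 2/70. This shows how group size limits statistical power regardless of how large the expression difference is.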

4. The identification of alternative splicing events

5. Keep updating
