This workshop records the complete transcriptome data analysis workflow used in the CC-LY Lab, written by Xiangyu Pan and Xuelan Chen. It is designed to be easy to read for newcomers to bioinformatics. We will try to maintain and update the Pipeline-of-transcriptome in a timely manner. The pipeline is also flexible: you can extend it with further analysis steps and tools, such as GSEA, TF enrichment, bulk RNA-seq data deconvolution, or anything else. We welcome comments and requests that help improve and optimize this page. Finally, we hope you come away with a good grasp of basic transcriptome data analysis.
- Alignment
- Transcription quantification
  - GenomicFeatures and Rsamtools
  - Stringtie
  - RSEM
- DEG identification
- The summary of the methods to calculate the p-value
- GO/KEGG enrichment
- GSEA
- Alternative splicing
- Motif/TF identification
- RNA editing
- Mutation
- etc.
In this pipeline, GenomicAlignments and Rsamtools are used to count reads from the transcriptome data. In the old version, we used FPKM and TPM for heatmap visualization and gene set enrichment analysis. In the latest version, we use DESeq2-normalized data instead, which better reduces the effects of gene body length and library size, to describe the expression pattern of each gene. Pathway enrichment, especially GSEA, is also based on the DESeq2-normalized data.
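To make the library-size part of that normalization concrete, here is a minimal sketch of the median-of-ratios size-factor idea that DESeq2 uses. It is written in plain Python for illustration only and is not the pipeline's own code (the pipeline runs DESeq2 in R). The function names and the toy counts are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: dict mapping gene -> list of raw counts, one per sample."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with a zero count in any sample are skipped.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count to its geometric mean, on the log scale.
        log_ratios = [math.log(counts[g][j]) - lgm for g, lgm in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return {g: [c / s for c, s in zip(row, sf)] for g, row in counts.items()}

# Hypothetical toy data: sample 2 is sequenced twice as deeply as sample 1.
toy = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [50, 100], "geneD": [0, 3]}
sf = size_factors(toy)
norm = normalize(toy)
print(sf)
print(norm["geneA"])
```

In this toy case the second size factor is twice the first, so genes whose counts differ only because of sequencing depth end up with equal normalized values in both samples.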
The DESeq2 pipeline is also used to identify differentially expressed genes (DEGs). Several essential parameters set the cutoffs for DEG detection; they are explained in detail on the following pages. To visualize the functions of the DEGs directly, clusterProfiler is included in this pipeline: DEGs are tested for enrichment against the GO and KEGG databases with default parameters. GSEA is also integrated, as described on a following page.
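As background for the GO/KEGG step, over-representation analysis is commonly based on a hypergeometric test: given a background of genes, how surprising is the overlap between the DEG list and the genes annotated to one term? The sketch below illustrates that calculation in plain Python. It is an assumption-laden illustration, not clusterProfiler's code; the function name and the example numbers are hypothetical.

```python
from math import comb

def enrichment_pvalue(n_background, n_in_term, n_degs, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of genes in the background set
    n_in_term:    background genes annotated to the term
    n_degs:       number of DEGs drawn from the background
    n_overlap:    DEGs that are annotated to the term
    """
    total = comb(n_background, n_degs)
    upper = min(n_in_term, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20000 background genes, 150 in the term,
# 500 DEGs, 12 of which fall in the term.
p = enrichment_pvalue(20000, 150, 500, 12)
print(p)
```

In practice the p-values from all tested terms are then adjusted for multiple testing before any term is reported as enriched.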
- Before using this pipeline, install the following software:
#STAR
STAR_2.6.0a
#Rscript
R scripting front-end version 3.5.1 (2018-07-02)
- Then you can start with The alignment of bulk RNA-seq. After you finish the pre-processing steps, you can go straight to The quantification of genes and the identification of DEG. You can visit the page by clicking here.
- If you want to quantify the expression level of each transcript in bulk RNA-seq, I suggest following the next pipeline, which quantifies with stringtie and/or RSEM.
- When we compare the expression levels of a candidate gene between biological groups, statistical power is important because it determines how much confidence we can place in the results. To better support a hypothesis about candidate genes, especially when analyzing multiple clinical cohorts, we can draw on additional methods of p-value calculation.
- Here, I have written a summary of the methods to calculate the p-value in DEG identification. You can visit it by clicking here.
- After you have learned all the steps above, you can move on to The identification of alternative splicing events.
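Returning to the point about p-value calculation for a candidate gene: one simple, assumption-light option is an exact permutation test on the difference in group means. The sketch below is only an illustration in plain Python and is not necessarily one of the methods covered in the linked summary. The function name and the expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for the difference in means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes, so it is only practical for small sample sizes.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical normalized expression of one candidate gene in two groups.
control = [1.0, 2.0, 3.0]
tumor = [7.0, 8.0, 9.0]
p_value = exact_permutation_pvalue(control, tumor)
print(p_value)
```

With three samples per group there are only 20 possible splits, so the smallest achievable two-sided p-value is 0.1. This shows directly why small cohorts limit statistical power regardless of the test chosen.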