Pipeline of transcriptome data analysis

Overview

This workshop documents the complete transcriptome data analysis workflow used in the CC-LY Lab, written by Xiangyu Pan and Xuelan Chen. It is designed to be helpful and easy to read for newcomers to bioinformatics. We will try to maintain and update the Pipeline-of-transcriptome in a timely manner. The pipeline is also flexible: you can extend it with additional analysis steps and tools, such as GSEA, TF enrichment, bulk RNA-seq data deconvolution, or anything else. We also welcome comments and requests that help improve and optimize this page. Finally, we hope you come away with a good grasp of basic transcriptome data analysis.

The analysis pipeline includes

1. The pre-processing steps

On this page, GenomicAlignments and Rsamtools are used to quantify read counts from transcriptome data. In the old version, we used FPKM and TPM for heatmap visualization and gene set enrichment analysis. In the latest version, DESeq2-normalized data, which we found much better at reducing the effects of gene body and library size, are used to describe the expression pattern of each gene. Pathway enrichment is also based on the DESeq2-normalized data, especially for GSEA.
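To make the normalization step more concrete, the following is a minimal sketch of the median-of-ratios idea behind DESeq2 size factors, written in plain Python rather than R. The function names `size_factors` and `normalize` and the toy count matrix are assumptions made for illustration; in the actual pipeline DESeq2 performs this step itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios method.

    counts: list of genes, each gene a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c == 0 for c in gene):
            continue
        # Log of the geometric mean of this gene across all samples.
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor of a sample is the median ratio over the kept genes.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(gene, sf)] for gene in counts]


if __name__ == "__main__":
    # Hypothetical toy matrix: sample 2 was sequenced twice as deeply.
    toy = [[10, 20], [100, 200], [5, 10], [0, 7]]
    print(size_factors(toy))
    print(normalize(toy))
```

In the toy matrix the second sample has twice the depth of the first, so its size factor comes out twice as large and the normalized counts of the first three genes agree between the two samples.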

The DESeq2 pipeline is also used to identify differentially expressed genes (DEGs). Several essential parameters set the cutoffs for DEG detection in this pipeline; they are explained in detail on the following pages. To visualize the functions of the DEGs directly, clusterProfiler is implemented in this pipeline, and GO/KEGG enrichment of the DEGs is run with default parameters. We have also integrated GSEA on a following page.
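As an illustration of what a DEG cutoff looks like in practice, here is a small Python sketch that splits a results table into up- and down-regulated genes. The column names `padj` and `log2FoldChange` follow the DESeq2 results table; the thresholds (0.05 and 1.0), the function name `call_degs`, and the gene names are hypothetical and are not necessarily the values used in this pipeline.

```python
def call_degs(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split a DESeq2-style results table into up- and down-regulated genes.

    results: list of dicts with keys "gene", "log2FoldChange" and "padj".
    A value of None stands in for the NA that DESeq2 reports for some genes.
    Both cutoffs are illustrative defaults, not values taken from the pipeline.
    """
    up, down = [], []
    for row in results:
        padj = row["padj"]
        lfc = row["log2FoldChange"]
        if padj is None or lfc is None:
            continue  # genes without a usable statistic are never called
        if padj >= padj_cutoff:
            continue  # not significant after multiple-testing adjustment
        if lfc >= lfc_cutoff:
            up.append(row["gene"])
        elif lfc <= -lfc_cutoff:
            down.append(row["gene"])
    return up, down


if __name__ == "__main__":
    # Hypothetical results rows for demonstration only.
    table = [
        {"gene": "GeneA", "log2FoldChange": 2.3, "padj": 0.001},
        {"gene": "GeneB", "log2FoldChange": -1.8, "padj": 0.01},
        {"gene": "GeneC", "log2FoldChange": 0.4, "padj": 0.0001},
        {"gene": "GeneD", "log2FoldChange": 3.0, "padj": None},
    ]
    print(call_degs(table))
```

The resulting gene lists are the kind of input that a GO/KEGG over-representation analysis would take.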

Before using this pipeline, the following software should be installed:

#STAR
STAR_2.6.0a

#Rscript
R scripting front-end version 3.5.1 (2018-07-02)

Then you can start with The alignment of bulk RNA-seq.

2. The post-processing steps

After you finish the pre-processing steps, you can move directly to The quantification of genes and the identification of DEG. You can visit the page by clicking here.

3. Optional methods for transcript quantification and p-value calculation

3.1 Summary of transcript quantification methods

Sometimes you may want to quantify the expression level of each transcript in bulk RNA-seq data. In that case, I suggest following the next pipeline, which quantifies transcripts with stringtie and/or RSEM.

Part 1. The quantification of transcripts by stringtie

Part 2. The quantification of transcripts by RSEM
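Both stringtie and RSEM report abundances in FPKM and TPM units. The sketch below only shows how these two units are defined from counts and feature lengths; it is not how either tool estimates abundance (both rely on their own models and on effective lengths). The function names and the toy numbers are assumptions for illustration.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


if __name__ == "__main__":
    # Hypothetical fragment counts and transcript lengths (in base pairs).
    counts = [100, 200, 300]
    lengths = [1000, 2000, 3000]
    print(fpkm(counts, lengths))
    print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, which makes proportions comparable across samples; FPKM values do not share that property.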

3.2 Summary of some statistical methods

When we compare the expression level of a candidate gene between biological groups, statistical power is important because it determines how confident we can be in the results. To better support hypotheses about candidate genes, especially when analyzing multiple clinical cohorts, we can draw on additional methods of p-value calculation.
Here, I have put together a summary of methods for calculating p-values in DEG identification. You can visit it by clicking here.
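As one example of an alternative way to obtain a p-value when comparing a candidate gene between two groups, the sketch below implements an exact two-sided permutation test on the difference of group means. This is an illustrative choice and may not be among the methods covered on the linked summary page; the function name and the expression values are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes, so it is only practical for small sample sizes.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        split_sum = sum(pooled[i] for i in idx)
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(split_sum)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


if __name__ == "__main__":
    # Hypothetical normalized expression values of one gene in two groups.
    control = [1, 2, 3]
    treated = [10, 11, 12]
    print(permutation_p_value(control, treated))
```

With three samples per group there are only 20 possible splits, so the smallest achievable two-sided p-value is 0.1. This is a direct illustration of why small cohorts limit statistical power.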

4. The identification of alternative splicing events

After you have learned all the steps above, you can move on to The identification of alternative splicing events.


pangxueyu233/Pipeline-of-transcriptome

Folders and files

Latest commit

History

Repository files navigation

Pipeline of transcriptome data analysis

Overview

The analysis pipeline included

1. The pre-processing steps

2. The post-processing steps

3. The optional methods in transcripts quantification and p-value calculation

3.1 The summary of quantification of transcripts methods

3.2 The summary of some statistic methods

4. The identification of alternative splicing events

5. Keep updating

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages