
python scripts and software command line arguments used to assess expression estimate performance for different sequencing strategies


harvardinformatics/rnaseq_readlength_assessment


Assessing sequencing strategy effects on expression estimator performance

This repository accompanies our draft manuscript "Short paired-end reads trump long single-end reads for expression analysis", which summarizes our findings on the robustness of gene- and isoform-level expression estimates, and of downstream differential expression results, generated with alternative RNA-seq sequencing strategies. The primary focus of this work was to determine whether short paired-end sequencing outperforms single-end sequencing of longer reads when both strategies sequence the same number of nucleotides and are therefore equivalent in cost. When an annotated genome is available, a common practice is to obtain gene-level expression estimates using single-end reads, particularly under budgetary constraints. Yet this approach does not take advantage of the information contained in the full library fragment, i.e. the potentially greater mapping specificity that comes from sequencing both ends of a fragment. Sequencing the same read length with a paired-end strategy would double the cost relative to single-end sequencing. Our intuition was that shorter paired-end reads might nonetheless improve expression estimates relative to single-end sequencing, over and above any performance penalty resulting from the shorter reads. We demonstrate that this is in fact the case.
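The equal-cost comparison described above comes down to simple arithmetic: a paired-end run sequences two reads per fragment, so it matches the nucleotide total of a single-end run when each read is half as long. The sketch below illustrates this. The function names, the read lengths (100 bp single-end versus 2 x 50 bp paired-end), and the fragment count are hypothetical values chosen for illustration and are not taken from the manuscript.

```python
def total_nucleotides(read_length: int, n_fragments: int, paired: bool) -> int:
    """Total sequenced bases for a run (hypothetical helper, not from this repository)."""
    reads_per_fragment = 2 if paired else 1
    return read_length * reads_per_fragment * n_fragments


def equal_cost_paired_length(single_end_length: int) -> int:
    """Paired-end read length that matches a single-end run at the same fragment count."""
    if single_end_length % 2 != 0:
        raise ValueError("single-end length must be even to split into two equal reads")
    return single_end_length // 2


# Illustrative values only: 100 bp single-end versus 2 x 50 bp paired-end.
n_fragments = 20_000_000
se_length = 100
pe_length = equal_cost_paired_length(se_length)

se_total = total_nucleotides(se_length, n_fragments, paired=False)
pe_total = total_nucleotides(pe_length, n_fragments, paired=True)
print(se_total, pe_total, se_total == pe_total)
```

Under this assumption the two strategies sequence the same number of bases, whereas a paired-end run at the full single-end read length would sequence twice as many.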

This repository provides generic command line arguments for kallisto and RSEM, the expression estimation tools we used. Because we also evaluated downstream effects on differential expression tests, we provide example R scripts for sleuth, which we used to analyze kallisto expression estimates, and limma-voom, which we used to analyze RSEM-derived estimates.
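As a rough illustration of how single-end and paired-end invocations of these tools differ, the sketch below assembles argument lists in Python without running anything. It is not the command set provided in this repository. It relies only on standard options of each tool (`kallisto quant -i/-o/--single/-l/-s` and `rsem-calculate-expression --paired-end`), and all file names, the fragment length, and its standard deviation are hypothetical placeholders. Helper names are likewise invented for this example.

```python
from typing import List, Optional, Sequence


def kallisto_quant_command(index: str, out_dir: str, reads: Sequence[str],
                           single_end: bool = False,
                           fragment_length: Optional[float] = None,
                           fragment_sd: Optional[float] = None) -> List[str]:
    """Build a kallisto quant argument list (hypothetical helper)."""
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir]
    if single_end:
        # kallisto cannot infer fragment size from single-end reads,
        # so the mean (-l) and standard deviation (-s) must be supplied.
        if fragment_length is None or fragment_sd is None:
            raise ValueError("single-end mode needs fragment_length and fragment_sd")
        if len(reads) != 1:
            raise ValueError("single-end mode expects one FASTQ file")
        cmd += ["--single", "-l", str(fragment_length), "-s", str(fragment_sd)]
    elif len(reads) != 2:
        raise ValueError("paired-end mode expects two FASTQ files")
    return cmd + list(reads)


def rsem_command(reference: str, sample_name: str, reads: Sequence[str],
                 paired_end: bool = True) -> List[str]:
    """Build an rsem-calculate-expression argument list (hypothetical helper)."""
    if len(reads) != (2 if paired_end else 1):
        raise ValueError("number of FASTQ files does not match the sequencing mode")
    cmd = ["rsem-calculate-expression"]
    if paired_end:
        cmd.append("--paired-end")
    return cmd + list(reads) + [reference, sample_name]


# Placeholder paths and fragment-size values, for illustration only.
print(" ".join(kallisto_quant_command("tx.idx", "pe_out", ["s_R1.fq.gz", "s_R2.fq.gz"])))
print(" ".join(kallisto_quant_command("tx.idx", "se_out", ["s_R1.fq.gz"],
                                      single_end=True, fragment_length=200, fragment_sd=20)))
print(" ".join(rsem_command("rsem_ref", "sample_pe", ["s_R1.fq.gz", "s_R2.fq.gz"])))
```

Returning argument lists rather than shell strings keeps the sketch easy to inspect and, if desired, to pass to `subprocess.run`. Any aligner or threading options would need to be added for a real analysis.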
