Majority of FPKM values are 0 (StringTie 1.2.3) #61
Thank you for the well-documented report. We took a look at this case and acknowledge that it exposed some issues that we are going to correct (for example, the missing gene_id causing some ugly "(null)" entries and "inf" TPM values; these will be fixed soon). This could also be taken as a warning that your reference annotation might be wrong (or incomplete) here. That is in fact the main problem with the -e option: StringTie is forced to take the given transcripts for granted as the valid, true set of transcripts sampled by the RNA-Seq experiment, and if that set is wrong, the GIGO (garbage in, garbage out) principle might kick in :). On the other hand, if you allow StringTie to look at the "evidence" (the actual read alignments) and assemble the transcripts by itself, you will see that it generates some transcripts which overlap the reference annotation significantly but have different start and end coordinates. The cov and FPKM estimates for those assembled transcripts will be non-zero.
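The "(null)" entries mentioned above come from reference transcripts that lack a gene_id. One way to catch this before running with -e is to scan the annotation for such records. This is a minimal sketch assuming a GTF-style attribute column (`key "value";` pairs); a GFF3 file uses `key=value` attributes and would need a different parser. The function name `transcripts_missing_gene_id` is hypothetical.

```python
import re

# Matches GTF attribute pairs such as: gene_id "g1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def transcripts_missing_gene_id(gtf_lines):
    """Return transcript_ids of GTF 'transcript' records that have no gene_id attribute."""
    missing = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only inspect transcript-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" not in attrs:
            missing.append(attrs.get("transcript_id", "<no transcript_id>"))
    return missing
```

Usage, with a hypothetical annotation path: `with open("annotation.gtf") as fh: print(transcripts_missing_gene_id(fh))`.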
@gpertea, thanks for the thorough explanation. It's very enlightening.
I made a mistake with the strandedness option (--rna-strandness) in HISAT2, which is probably one of the reasons the alignments were discarded. The samples were prepared with the Illumina TruSeq Stranded mRNA HT library preparation kit, and I recently learnt that the option should be switched to
As we were only interested in quantifying the expression of those transcripts of interest with StringTie, we left this out of our initial plan! We'll try it out and compare the result with the current annotation. Thanks for the suggestion. Cheers and have a nice weekend,
Hiya,
We are using StringTie version 1.2.3 to compute the relative expression of a few transcripts of interest, all of which are single-exon transcripts. We found that most of the computed FPKM values are 0, even though there are reads mapping to the annotated regions. This is the case in both the t_data.ctab file and the output transcripts GTF file. The link to a folder containing all the files (including a subset of the BAM and GFF files) is here
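To put a number on "most of the computed FPKM values are 0", a short script can read t_data.ctab and list the zero-FPKM transcripts. This is a minimal sketch assuming the tab-separated t_data.ctab header contains `t_name` and `FPKM` columns; the helper name `zero_fpkm_transcripts` is hypothetical.

```python
import csv


def zero_fpkm_transcripts(lines):
    """Return (names of transcripts with FPKM == 0, total transcript count) from t_data.ctab lines."""
    zero, total = [], 0
    # DictReader accepts any iterable of strings, e.g. an open file handle.
    for row in csv.DictReader(lines, delimiter="\t"):
        total += 1
        if float(row["FPKM"]) == 0.0:
            zero.append(row["t_name"])
    return zero, total
```

Usage: `with open("t_data.ctab") as fh: zero, total = zero_fpkm_transcripts(fh)`, then report `len(zero)` out of `total`.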
We used the following command right after mapping the reads to the reference genome using HISAT with the `--rna-strandness FR` option:
`stringtie -e -b $sampleName -G $annotation $bam -p 2 -o $output`
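When the same quantification is repeated for several samples, building the command as an argument list avoids shell-quoting problems with $sampleName, $annotation, $bam and $output. The sketch below only reuses the options from the command above (-e, -b, -G, -p, -o); the functions `build_stringtie_cmd` and `run_stringtie` and their argument names are hypothetical, and `shlex.join` needs Python 3.8 or later.

```python
import shlex
import subprocess


def build_stringtie_cmd(sample_name, annotation, bam, output, threads=2):
    """Build the argv for `stringtie -e` quantification of one sample (hypothetical helper)."""
    return [
        "stringtie",
        "-e",               # only estimate abundances of the reference transcripts given with -G
        "-b", sample_name,  # directory that receives the Ballgown *.ctab tables
        "-G", annotation,   # reference annotation
        bam,                # sorted read alignments
        "-p", str(threads),
        "-o", output,       # output transcripts GTF
    ]


def run_stringtie(cmd, dry_run=True):
    """Print the command line; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

Running with `dry_run=True` first makes it easy to compare the generated command with the one above before launching it.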
The alignments show reads mapping to our region of interest, and so does the e_data.ctab file.
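One way to check the mismatch described here is to join the exon-level table with the transcript-level table and flag transcripts whose FPKM is 0 even though their exons have reads. This sketch assumes the Ballgown-style tables StringTie writes with -b: e_data.ctab with `e_id` and `rcount` columns, e2t.ctab linking `e_id` to `t_id`, and t_data.ctab with `t_id`, `t_name` and `FPKM`. The function names are hypothetical.

```python
import csv


def read_ctab(handle):
    """Read a tab-separated *.ctab table into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def zero_fpkm_with_reads(t_rows, e_rows, e2t_rows):
    """Return (t_name, summed exon rcount) for transcripts with FPKM == 0 but reads on their exons."""
    reads_by_exon = {row["e_id"]: float(row["rcount"]) for row in e_rows}
    reads_by_tx = {}
    for link in e2t_rows:
        # Sum the read counts of every exon linked to this transcript.
        reads_by_tx[link["t_id"]] = reads_by_tx.get(link["t_id"], 0.0) + reads_by_exon.get(link["e_id"], 0.0)
    flagged = []
    for t in t_rows:
        reads = reads_by_tx.get(t["t_id"], 0.0)
        if float(t["FPKM"]) == 0.0 and reads > 0:
            flagged.append((t["t_name"], reads))
    return flagged
```

Each table can be loaded with `with open(path) as fh: rows = read_ctab(fh)`. A non-empty result is the pattern reported in this issue and points at alignments that StringTie did not assign to the reference transcripts.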
Running StringTie in default mode (without the annotation file) produces the following result.
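To relate the default-mode (assembled) transcripts back to the reference transcripts of interest, one can look for assembled transcripts on the same chromosome and strand that cover most of each reference interval, even if their start and end coordinates differ. This is a deliberately simple sketch using 1-based, inclusive GTF coordinates and a hypothetical 50% coverage threshold; gffcompare does this kind of comparison far more thoroughly. It ignores exon structure, which is reasonable here only because the transcripts of interest are single-exon. `Interval`, `overlap_fraction` and `matching_assemblies` are hypothetical names.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    strand: str
    start: int  # 1-based, inclusive (GTF convention)
    end: int    # 1-based, inclusive


def overlap_fraction(a, b):
    """Fraction of interval a covered by interval b (0.0 on a different chromosome or strand)."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    return max(shared, 0) / (a.end - a.start + 1)


def matching_assemblies(references, assembled, min_fraction=0.5):
    """Map each reference name to the assembled transcripts covering at least min_fraction of it."""
    return {
        ref.name: [(asm.name, asm.start, asm.end)
                   for asm in assembled
                   if overlap_fraction(ref, asm) >= min_fraction]
        for ref in references
    }
```

The pairwise loop is quadratic, which is fine for a handful of transcripts of interest but not for a genome-wide comparison.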
Please can you have a look?
Cheers and thanks in advance,
Joanne