Majority of FPKMs are zero #6

Closed
srithegreat opened this issue Mar 25, 2015 · 2 comments

Comments

@srithegreat

Hi,

I am using the NCBI GRCh38 assembly, in which the chromosomes are named as NC_.xxx, and I aligned my RNA-Seq data (101 bp paired-end) to it. I used the annotation GFF with the same chromosome IDs, but with the -b option all my FPKM values are zero. Could this have anything to do with the annotation file?

Srikanth

@gpertea
Owner

gpertea commented Mar 27, 2015

Could you please confirm and clarify:

  • Are you running at least v1.0.1? I think it was only prior to the v1.0 release that we had a bug that would zero the FPKMs when Ballgown output was enabled.
  • In which output file do you see these zero FPKM values (t_data.ctab, the output transcripts GTF, or both)?
  • When you say "my FPKM values", are you referring to specific target transcripts that you know are expressed in the sample but have their FPKM reported as zero only when you use the -b option? In other words, without the -b option, are the FPKM values non-zero for the same transcripts? (That would rule out the annotation file as the cause of this anomaly.)

When either of the -b/-B options is used, all the transcripts given in the reference annotation file are reported in the *.ctab files, not just the "expressed" ones. Since the majority of those reference transcripts are not expressed, their FPKMs are written as 0.000000, so the t_data.ctab file will contain many zero FPKMs, but not all of them should be zero.
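To check quickly whether all FPKMs are zero or only most of them, the values in t_data.ctab can be tallied. The following is a minimal sketch, assuming t_data.ctab is a tab-delimited file with a header row that includes a column named `FPKM` (the layout Ballgown reads). The function name `summarize_fpkm` and the sample rows are made up for illustration.

```python
import csv
import io


def summarize_fpkm(ctab_lines):
    """Return (zero_count, nonzero_count) for the FPKM column.

    `ctab_lines` is any iterable of text lines, e.g. an open file handle.
    Assumes a tab-delimited table whose header contains an "FPKM" column.
    """
    reader = csv.DictReader(ctab_lines, delimiter="\t")
    zero = 0
    nonzero = 0
    for row in reader:
        if float(row["FPKM"]) > 0:
            nonzero += 1
        else:
            zero += 1
    return zero, nonzero


# Illustrative, made-up rows; real files come from running StringTie with -b/-B.
example = io.StringIO(
    "t_id\tchr\tstrand\tstart\tend\tt_name\tnum_exons\tlength\tgene_id\tgene_name\tcov\tFPKM\n"
    "1\tNC_000001.11\t+\t100\t900\ttx1\t2\t500\tg1\tGENE1\t0.000000\t0.000000\n"
    "2\tNC_000001.11\t-\t2000\t3500\ttx2\t3\t900\tg2\tGENE2\t8.100000\t12.345678\n"
    "3\tNC_000001.11\t+\t5000\t5600\ttx3\t1\t600\tg3\tGENE3\t0.000000\t0.000000\n"
)
zero, nonzero = summarize_fpkm(example)
print(f"zero FPKM: {zero}, non-zero FPKM: {nonzero}")
```

For a real run, pass an open file instead, e.g. `with open("t_data.ctab") as fh: summarize_fpkm(fh)`. A non-zero count of exactly 0 would point to a problem rather than to normal unexpressed reference transcripts.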

It's rather unusual to have genome indexes and annotation using the NC_* accessions instead of the more meaningful chromosome numbers/names. That should not be a problem for StringTie; I am just saying that it may be worth double-checking that the chromosome names in the BAM header do indeed match the ones in the annotation file.
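One way to do that double check is to compare the sequence names in the BAM header with the first column of the annotation. This is a minimal sketch, assuming the header text has already been dumped (for example with `samtools view -H`) and the annotation is a standard tab-delimited GFF/GTF. The function names and the sample data are hypothetical.

```python
def bam_header_names(header_text):
    """Collect reference sequence names from the SN: field of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def annotation_names(gff_text):
    """Collect sequence IDs from column 1 of a GFF/GTF, skipping comments and blanks."""
    names = set()
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(header_text, gff_text):
    """Report which sequence names are shared and which appear on one side only."""
    bam = bam_header_names(header_text)
    ann = annotation_names(gff_text)
    return {
        "shared": bam & ann,
        "only_in_bam": bam - ann,
        "only_in_annotation": ann - bam,
    }


# Illustrative, made-up inputs showing a naming mismatch.
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:NC_000001.11\tLN:248956422\n"
gff = "##gff-version 3\nchr1\tRefSeq\texon\t100\t200\t.\t+\t.\tID=exon1\n"
print(compare_names(header, gff))
```

An empty `shared` set means no annotated transcript can be matched to any alignment, which would be one explanation for all-zero FPKMs.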

@srithegreat
Author

Thanks for the reply.
I am using the latest version. I figured out the cause of the zero FPKMs: the library I was using was a stranded library, and I had not initially aligned it with the proper strandedness. I see that StringTie does not have the library type argument anymore.
When I re-aligned the data with the correct strandedness using HISAT and then re-ran StringTie, I now see non-zero FPKMs.

Regards,
Srikanth

Srikanth S. Manda
Research Scholar
Pandey Lab
McKusick-Nathans Institute of Genetic Medicine
Johns Hopkins University School of Medicine
Miller Research Building, Room 560
733 North Broadway
Baltimore, Maryland 21205

@gpertea gpertea closed this as completed Apr 3, 2015