gffcompare run on stringtie --merge gtf file results in zeros in the summary text #21

Closed
adeslatt opened this issue Jul 6, 2017 · 2 comments

Comments

@adeslatt

adeslatt commented Jul 6, 2017

Hello,

I am getting effectively empty output (all zeros) in the gffcompare summary.
Each set of paired-end sequence data was aligned using hisat2, following the published protocol for StringTie. I then ran stringtie on each alignment with a command similar to:

stringtie -B -G Homo_sapiens.GRCh38.88.gtf rnaseq1.bam -o rnaseq1.gtf
stringtie -B -G Homo_sapiens.GRCh38.88.gtf rnaseq2.bam -o rnaseq2.gtf
...
stringtie -B -G Homo_sapiens.GRCh38.88.gtf rnaseqn.bam -o rnaseqn.gtf

I then put the names of these output files in a text file I call stringtiefilelist:

rnaseq1.gtf
rnaseq2.gtf
...
rnaseqn.gtf

I run stringtie merge as follows:

stringtie --merge  -G Homo_sapiens.GRCh38.88.gtf  -o stringtie_merged.gtf stringtiefilelist

Each of these steps produces expected output. But then when I run gffcompare

gffcompare -r gencode.v.25.annotation.gtf -o stringtie_merged.txt stringtie_merged.gtf

I get empty output, so to speak: all sensitivity and precision values are zero.

#= Summary for dataset: stringtie_merged.gtf
#     Query mRNAs :  238442 in   34534 loci  (238442 multi-exon transcripts)
#            (21005 multi-transcript loci, ~6.9 transcripts per locus)
# Reference mRNAs :  171986 in   33152 loci  (171986 multi-exon)
# Super-loci w/ reference transcripts:        0
#-----------------| Sensitivity | Precision  |
        Base level:     0.0     |     0.0    |
        Exon level:     0.0     |     0.0    |
      Intron level:     0.0     |     0.0    |
Intron chain level:     0.0     |     0.0    |
  Transcript level:     0.0     |     0.0    |
       Locus level:     0.0     |     0.0    |

     Matching intron chains:       0
       Matching transcripts:       0
              Matching loci:       0

          Missed exons:  544338/544338  (100.0%)
           Novel exons:  634910/634910  (100.0%)
        Missed introns:  350157/350157  (100.0%)
         Novel introns:  398915/398915  (100.0%)
           Missed loci:   33152/33152   (100.0%)
            Novel loci:   34534/34534   (100.0%)

Is there perhaps a problem in the parameters I am using?

Do I need to convert the GTF file to a GFF file? An earlier successful run of mine used a GFF file as input, so perhaps that is the reason.

@gpertea
Owner

gpertea commented Jul 6, 2017

Does the chromosome naming convention match that of the reference? If not, that would be the surest way to miss everything, as seems to be the case here.
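
One way to check this is to compare the sequence names in column 1 of the two files. The following is a minimal sketch, not part of gffcompare or StringTie: `seqnames` and `shared_seqnames` are hypothetical helper names, and the sketch assumes tab-separated GTF/GFF input with `#` comment lines.

```shell
# List the distinct sequence (chromosome) names found in column 1 of a
# GTF/GFF file, skipping comment lines and blank lines.
seqnames() {
    awk -F '\t' '!/^#/ && $1 != "" { print $1 }' "$1" | sort -u
}

# Print the sequence names that two annotation files have in common.
# The body runs in a subshell so the temporary variables do not leak.
shared_seqnames() (
    a=$(mktemp) || exit 1
    b=$(mktemp) || exit 1
    seqnames "$1" > "$a"
    seqnames "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
)

# Example call with the file names from this issue:
#   shared_seqnames stringtie_merged.gtf gencode.v.25.annotation.gtf
```

If the call prints nothing, the two files share no sequence names (for example `1` in one file and `chr1` in the other), so no query transcript can overlap any reference transcript.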

@adeslatt
Author

adeslatt commented Jul 6, 2017

That was it!
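
The thread does not record how the mismatch was resolved. For anyone who hits the same symptom, one possible approach is to rewrite the sequence names in one file to match the other. The sketch below is an illustration under stated assumptions, not the fix used here: `to_chr_names` is a hypothetical helper, and it assumes the query uses bare names (`1`..`22`, `X`, `Y`, `MT`) while the reference uses `chr1`..`chr22`, `chrX`, `chrY`, `chrM`. Any other name, such as a scaffold, is left unchanged and should be checked by hand.

```shell
# Hypothetical helper: convert bare primary-chromosome names in column 1
# to "chr"-prefixed names. Comment and blank lines pass through unchanged.
to_chr_names() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF == 0       { print; next }   # keep headers and blanks
         $1 == "MT"            { $1 = "chrM" }   # mitochondrial genome
         $1 ~ /^([0-9]+|X|Y)$/ { $1 = "chr" $1 } # autosomes and X/Y
                               { print }' "$1"
}

# Example call with the merged file from this issue:
#   to_chr_names stringtie_merged.gtf > stringtie_merged.chr.gtf
```

A simpler alternative, where it fits the analysis, is to pass gffcompare the same annotation file for `-r` that was given to stringtie with `-G`, so both sides use one naming convention.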

@gpertea gpertea closed this as completed Jul 8, 2017