Issue with translating genes on complement strand #1

reubwn · 2017-11-24T11:47:10Z

Hello,

I've come across an issue with how CDS features are printed for genes encoded on the complementary strand. The problem shows up clearly when using the --translate flag, which produces lots of erroneous translations riddled with stop codons (*).

I give an example below.

The EMBL output for an affected gene looks like:

FT   gene            complement(123273..128445)
FT                   /locus_tag="BANY_locus6"
FT                   /note="source:GenomeHubs"
FT                   /note="ID:BANY.1.2.g00007"
FT   mRNA            complement(join(128366..128445,126919..127115,124188..124
FT                   406,123273..123484))
FT                   /locus_tag="BANY_locus6"
FT                   /note="source:GenomeHubs"
FT                   /note="ID:BANY.1.2.t00007"
FT   exon            complement(128366..128445)
FT                   /locus_tag="BANY_locus6"
FT                   /note="source:GenomeHubs"
FT                   /note="ID:BANY.1.2.t00007-E1"
FT   exon            complement(126919..127115)
FT                   /locus_tag="BANY_locus6"
FT                   /note="source:GenomeHubs"
FT                   /note="ID:BANY.1.2.t00007-E2"
FT   exon            complement(124188..124406)
FT                   /locus_tag="BANY_locus6"
FT                   /note="source:GenomeHubs"
FT                   /note="ID:BANY.1.2.t00007-E3"
FT   exon            complement(123273..123484)
FT                   /locus_tag="BANY_locus6"
FT                   /note="source:GenomeHubs"
FT                   /note="ID:BANY.1.2.t00007-E4"
FT   CDS             complement(join(<128366..128445,126919..127115,124188..12
FT                   4406,123273..>123484))
FT                   /locus_tag="BANY_locus6"
FT                   /codon_start=1
FT                   /note="source:GenomeHubs"
FT                   /note="ID:BANY.1.2.t00007-CDS"
FT                   /translation="QKFI*SNIWC*HLVIRS*TTNALTLVCVTFSACRRGSSIRCRVVS
FT                   LHVAAALSSRAMEIPPRAMTTPL*VSS*QTNMDRE*RASNDRHTVVQRNVWRTCEDRKI
FT                   DS*RRNSNRKRLSV*GRCR*CCF*MWFR*L**MGSSYKL*FGEKCEIIKISKPIKSHWA
FT                   KENNLNLNELLSDGEYKELYRLAMIKWSEDMREKDYGCFCRAACENDVSTSNFTVQR*E
FT                   KVWQRFFN*SLKRK"
FT                   /transl_table=1

The mRNA feature looks fine, but there are some puzzling < and > characters in the CDS feature that I think may be the problem. The translation is then subsequently messed up, and in fact appears to be the translation for the exons in reverse order, as QKFI* corresponds to the first 4 "codons" of the last exon (E4, 123273..123484).
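To make the reversed-exon effect concrete, here is a minimal sketch on a made-up toy sequence (not the real BANY00001 data). It assumes the usual reading of `complement(join(...))`: splice the parts in the order listed, then reverse-complement the result. The helper names are hypothetical.

```python
# Toy illustration only: sequence and coordinates are invented for this example.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def spliced_minus_strand(genome, parts):
    """Join 1-based inclusive (start, end) parts in the order given,
    then reverse-complement the joined sequence."""
    joined = "".join(genome[start - 1:end] for start, end in parts)
    return reverse_complement(joined)

genome = "AAACCCGGGTTTACGTACGT"
exon_low = (1, 6)     # lowest coordinates: LAST exon of a minus-strand transcript
exon_high = (13, 18)  # highest coordinates: FIRST exon of a minus-strand transcript

# Parts listed by increasing coordinate: transcript starts with the first exon.
ascending = spliced_minus_strand(genome, [exon_low, exon_high])

# Parts listed by decreasing coordinate: transcript starts with the last exon.
descending = spliced_minus_strand(genome, [exon_high, exon_low])

print(ascending)
print(descending)
```

With the parts listed in decreasing order, the spliced sequence begins with the reverse complement of the lowest-coordinate exon, which is the same pattern as the translation above starting inside E4.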

Hopefully an easy issue, and thanks for a great tool, this is going to be extremely useful :-)

Or maybe there is something odd in the GFF? The entry for this gene is:

BANY00001       GenomeHubs      gene    123273  128445  .       -       .       ID=BANY.1.2.g00007
BANY00001       GenomeHubs      mRNA    123273  128445  .       -       .       ID=BANY.1.2.t00007;Parent=BANY.1.2.g00007
BANY00001       GenomeHubs      exon    128366  128445  .       -       .       ID=BANY.1.2.t00007-E1;Parent=BANY.1.2.t00007
BANY00001       GenomeHubs      exon    126919  127115  .       -       .       ID=BANY.1.2.t00007-E2;Parent=BANY.1.2.t00007
BANY00001       GenomeHubs      exon    124188  124406  .       -       .       ID=BANY.1.2.t00007-E3;Parent=BANY.1.2.t00007
BANY00001       GenomeHubs      exon    123273  123484  .       -       .       ID=BANY.1.2.t00007-E4;Parent=BANY.1.2.t00007
BANY00001       GenomeHubs      CDS     128366  128445  .       -       0       ID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007
BANY00001       GenomeHubs      CDS     126919  127115  .       -       2       ID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007
BANY00001       GenomeHubs      CDS     124188  124406  .       -       1       ID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007
BANY00001       GenomeHubs      CDS     123273  123484  .       -       1       ID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007

Running biopython version: 1.67 and bcbio-gff version: 0.6.4


Juke34 · 2017-11-24T13:21:04Z

Interesting, we will look into it. Could you provide the FASTA for the sequence BANY00001?

Thank you for pointing that out.

reubwn · 2017-11-24T14:22:09Z

No problem, here's the file. Cheers!

BANY00001.fa.gz

Juke34 · 2017-11-24T18:22:45Z

It looks like it's due to your unusual GFF3. In your file, when a gene is on the minus strand, its sub-features (exons and CDS) are listed in reverse order, i.e. by decreasing coordinate. I will add a fix to systematically sort those sub-features when we go through them to avoid such cases.
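As a rough sketch of what sorting the sub-features could look like (this is not the actual patch; the function name is hypothetical, and the partial-location markers `<` and `>` are left out for simplicity), the parts can be ordered by start coordinate before the location string is assembled:

```python
def minus_strand_location(parts):
    """Build an EMBL-style location string for a minus-strand feature
    from 1-based inclusive (start, end) pairs.

    The parts are sorted by start coordinate first, so the order in
    which they were read from the GFF3 does not matter.
    """
    ordered = sorted(parts)
    spans = ",".join(f"{start}..{end}" for start, end in ordered)
    if len(ordered) == 1:
        return f"complement({spans})"
    return f"complement(join({spans}))"

# CDS coordinates from the GFF3 in this issue, in file order (decreasing).
cds_parts = [
    (128366, 128445),
    (126919, 127115),
    (124188, 124406),
    (123273, 123484),
]

location = minus_strand_location(cds_parts)
print(location)
```

Sorting at this point keeps the rest of the pipeline independent of how a given GFF3 producer happens to order minus-strand sub-features.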

reubwn · 2017-11-24T18:59:19Z

Ah, gff3, the file format with no fixed format... This gff was downloaded directly from an ensembl database (ensembl.lepbase.org) too. Could you say what exactly is odd about it?

Thanks for the fix!

Juke34 · 2017-11-24T20:05:08Z

Usually all the CDS or exon features are sorted in increasing order of their locations, regardless of strand.
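For anyone wanting to check their own file against that convention, here is a small sketch. It assumes tab-separated GFF3 lines with nine columns and a single `Parent` value per feature; the function name is hypothetical.

```python
from collections import defaultdict

def parents_not_ascending(gff_lines, feature_type="CDS"):
    """Return Parent IDs whose features of the given type are NOT
    listed by increasing start coordinate."""
    starts = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        parent = attrs.get("Parent")
        if parent is not None:
            starts[parent].append(int(cols[3]))
    return [parent for parent, seen in starts.items() if seen != sorted(seen)]

# Two of the CDS lines from this issue, with tabs restored.
lines = [
    "BANY00001\tGenomeHubs\tCDS\t128366\t128445\t.\t-\t0\tID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007",
    "BANY00001\tGenomeHubs\tCDS\t126919\t127115\t.\t-\t2\tID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007",
]

flagged = parents_not_ascending(lines)
print(flagged)
```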

Juke34 · 2017-11-27T09:57:29Z

Issue fixed.

Juke34 closed this as completed Nov 27, 2017
