Converts to embl but with interrogation marks in SQ #35

Closed
Iseez opened this issue Aug 14, 2019 · 3 comments

@Iseez

Iseez commented Aug 14, 2019

I'm working with yeasts and I really need their EMBL files, but when I run the program (using bash and Python), I run into the following problem.
The following warnings pop up in the terminal (though I don't think they are the cause of the problem):

17:25:17 ERROR feature: >>trna<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
17:25:17 WARNING feature: Unknown qualifier 'NAME' - skipped
17:25:17 ERROR feature: >>trna_exon<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
17:25:30 ERROR feature: >>UTR<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
Conversion done

As the final line says, the conversion completes. When I open the generated EMBL file the features are fine, but the sequence is all question marks.

FT /transl_table=12
XX
SQ Sequence 2596028 BP; 0 A; 0 C; 0 G; 0 T; 2596028 other;
?????????? ?????????? ?????????? ?????????? ?????????? ?????????? 60
?????????? ?????????? ?????????? ?????????? ?????????? ?????????? 120

If I keep scrolling, it looks as if the conversion had started again:

 ????????                                                            2596028

//
ID XXX; XXX; linear; genomic DNA; XXX; XXX; 2596667 BP.
XX
AC XXX;
XX
AC * SOME_YEAST
XX
PR Project:XXX;

After that, the only feature present is "gap", and the sequence (SQ) looks the way it is supposed to:
FT gap 2556681..2556981
FT /estimated_length=301
XX
SQ Sequence 2596667 BP; 806943 A; 475017 C; 477105 G; 804281 T; 33321 other;
AATCTGCTCA GTAAGGCCCA TAAATCGGCT CTGCATTTCT TCTGTGGGCA TTTTGCCGTA 60
CTTTTTTAAT TATGTTGCAG ACGAAACTGA ATCAAGCTCG TCGACAGCTT CGTACAGCCT 120

I have no idea why this happens. I really hope you can help me figure out what is going on;
I really need those EMBL files.

@Juke34
Collaborator

Juke34 commented Aug 15, 2019

Your GFF file contains annotations attached to a sequence that is absent from the provided FASTA file. The tool therefore creates an unknown sequence (filled with ? because it cannot guess the DNA sequence) whose length corresponds to the end position of the last feature found on it. Check your FASTA file: it should be the same one that was used for the annotation.
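To see which sequences trigger this, one can list every seqid used in the GFF3 that has no matching FASTA header, together with the largest end coordinate on it (per the explanation above, that is roughly the length of the ?-filled placeholder). The following is a minimal Python sketch, not part of EMBLmyGFF3; it assumes the FASTA ID is the first word of each header line and that feature lines are tab-separated with the standard nine GFF3 columns. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect FASTA IDs, assumed to be the first word after '>' on each header."""
    ids: Set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def missing_seqids(gff_lines: Iterable[str], known: Set[str]) -> Dict[str, int]:
    """Map each GFF3 seqid (column 1) absent from `known` to its largest end (column 5)."""
    missing: Dict[str, int] = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # only sequences follow this directive, no more features
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a nine-column feature line
        seqid, end = cols[0], int(cols[4])
        if seqid not in known:
            missing[seqid] = max(end, missing.get(seqid, 0))
    return missing


if __name__ == "__main__":
    fasta = [">chrI\n", "ACGT\n", ">chrII some description\n", "GGCC\n"]
    gff = [
        "##gff-version 3\n",
        "chrI\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1\n",
        "chrIII\tsrc\tgene\t10\t500\t.\t+\t.\tID=g2\n",
        "chrIII\tsrc\tgene\t600\t900\t.\t-\t.\tID=g3\n",
    ]
    print(missing_seqids(gff, fasta_ids(fasta)))  # {'chrIII': 900}
```

With real files, open the GFF3 and FASTA files and pass the file objects directly, since both functions only iterate over lines; the file names are up to you.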

Alternatively, you can filter out the annotations that are not attached to sequences present in the FASTA file using the script gff3_sq_keep_annotation_from_fastaSeq.pl from the GAAS toolkit.
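If installing GAAS is not an option, a rough version of the same filtering fits in a few lines. This is a hypothetical Python stand-in, not the GAAS script, and it only looks at column 1 (seqid): feature lines whose seqid has no FASTA record are dropped, as are `##sequence-region` directives for those seqids; other comments, blank lines and any embedded `##FASTA` section are copied unchanged. It assumes the FASTA ID is the first word of each header.

```python
from typing import Iterable, Iterator, Set


def read_fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect FASTA IDs (first word after '>' on each header line)."""
    ids: Set[str] = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def keep_annotations_on(gff_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield GFF3 lines, dropping features whose seqid is not in `keep`."""
    in_fasta = False
    for line in gff_lines:
        if in_fasta or line.startswith("##FASTA"):
            in_fasta = True
            yield line  # embedded sequence section is copied unchanged
            continue
        if line.startswith("##sequence-region"):
            parts = line.split()
            # drop the directive only when it names a seqid we are removing
            if len(parts) > 1 and parts[1] not in keep:
                continue
            yield line
            continue
        if line.startswith("#") or not line.strip():
            yield line  # other comments, directives and blank lines are kept
            continue
        if line.split("\t", 1)[0] in keep:
            yield line  # feature on a sequence present in the FASTA file
```

Usage, with placeholder file names: open `genome.fasta`, `annotation.gff3` and an output file, then call `out.writelines(keep_annotations_on(gff, read_fasta_ids(fasta)))`. Parent/child consistency is not checked, so the GAAS script remains the safer choice if it is available.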

@Iseez
Author

Iseez commented Aug 21, 2019

Thanks, that worked.

One more question: when I was trying to obtain the EMBL file for a different species, I encountered the following error:

Traceback (most recent call last):
  File "/cm/shared/apps/emblmygff3/1.2.6/bin/EMBLmyGFF3", line 11, in <module>
    load_entry_point('EMBLmyGFF3==1.2.6', 'console_scripts', 'EMBLmyGFF3')()
  File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 1383, in main
    writer.write_all( outfile )
  File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 1179, in write_all
    self._add_mandatory()
  File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 195, in _add_mandatory
    if seq[end] == 'n' :
IndexError: string index out of range

Is the problem due to the files I'm using as input?
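In Python, `seq[end]` raises `IndexError: string index out of range` when `end` is not a valid index into `seq` (for example `end >= len(seq)`), so the failing line was handed a position outside the sequence. One input problem worth ruling out (an assumption, not confirmed in this thread) is a feature whose end coordinate exceeds the length of its FASTA record. A minimal, hypothetical Python sketch of that check, assuming the FASTA ID is the first word of each header and standard nine-column GFF3 feature lines:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def fasta_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Sum residue counts per FASTA record (ID = first word after '>')."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # blank lines add 0
    return lengths


def features_past_end(
    gff_lines: Iterable[str], lengths: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (seqid, end, sequence_length) for features ending beyond their sequence."""
    hits: List[Tuple[str, int, int]] = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # no feature lines after this directive
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[0] not in lengths:
            continue  # not a feature line, or seqid missing from the FASTA
        end = int(cols[4])
        if end > lengths[cols[0]]:
            hits.append((cols[0], end, lengths[cols[0]]))
    return hits


if __name__ == "__main__":
    fasta = [">chrI\n", "ACGTAC\n", "GT\n", ">chrII\n", "AAAA\n"]
    gff = [
        "##gff-version 3\n",
        "chrI\t.\tgene\t1\t8\t.\t+\t.\tID=a\n",
        "chrII\t.\tgene\t2\t6\t.\t-\t.\tID=b\n",
    ]
    print(features_past_end(gff, fasta_lengths(fasta)))  # [('chrII', 6, 4)]
```

An empty result does not rule out the input files; it only excludes this one kind of coordinate mismatch.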

@Juke34
Collaborator

Juke34 commented Aug 21, 2019

The original question is resolved, so I am closing this issue. I created a new issue reporting the other problem you encountered => see #38

Juke34 closed this as completed Aug 21, 2019