Contigs are disorganised in the EMBL file #70
We might change this order, but I guess the change would only be aesthetic. It would affect the order in the EMBL flat file, but I don't think it would change anything about the ordering in the ENA archive. Anyway, we could check whether changing EMBLmyGFF3's behavior is an easy task. If so, we will change it.
Yes, it's just a question of aesthetics. I have no idea what order is kept in the ENA archive, but I received the .gz file after submitting the annotated sequences, and the order is the same as in the flat file. Honestly, I don't know whether that .gz file is the one that will be released to the public, as this is my first time submitting to ENA. I only uploaded the assembly (I'm only interested in the contig level, not scaffolds) and the sequence annotations. In fact, I didn't know about the AGP file. Should I upload this file too? Is this format correct for that file? I haven't seen this file before...
Thanks for your help!
If order matters, then yes, you should upload an AGP file. See the dedicated section in the ENA help. Look at biostars.org too; there were several questions related to AGP files. You can then try the validator to make sure the file is well formed: https://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/agp_validate.cgi
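As a rough illustration of what an AGP file encodes, here is a minimal sketch that writes AGP 2.1 rows placing contigs, in a chosen order, into a single scaffold. The function name `contigs_to_agp`, the scaffold name, the fixed 100 bp gap, the `+` orientation for every contig and the `align_genus` linkage evidence (contigs ordered against a reference) are all assumptions for the example, not anything ENA or EMBLmyGFF3 prescribes. Check the result with the NCBI validator linked above before submitting.

```python
def contigs_to_agp(scaffold_id, contigs, gap_length=100):
    """Hypothetical helper: return AGP 2.1 rows joining `contigs` in the given order.

    `contigs` is a list of (name, length) pairs. Assumptions: every contig is
    in '+' orientation, and consecutive contigs are separated by a fixed-size
    gap of type 'scaffold' whose linkage evidence is 'align_genus'.
    """
    rows = []
    pos = 1   # 1-based coordinate on the scaffold (the AGP "object")
    part = 1  # AGP part_number, incremented for every row (contig or gap)
    for i, (name, length) in enumerate(contigs):
        if i > 0:
            # Gap row: object, beg, end, part, 'N', gap length, gap type, linkage, evidence
            end = pos + gap_length - 1
            rows.append("\t".join(map(str, [
                scaffold_id, pos, end, part, "N", gap_length,
                "scaffold", "yes", "align_genus"])))
            pos, part = end + 1, part + 1
        # Component row: object, beg, end, part, 'W', contig, comp. beg, comp. end, orientation
        end = pos + length - 1
        rows.append("\t".join(map(str, [
            scaffold_id, pos, end, part, "W", name, 1, length, "+"])))
        pos, part = end + 1, part + 1
    return rows


if __name__ == "__main__":
    order = [("contig14", 5000), ("contig1", 3000), ("contig3", 4000)]
    for row in contigs_to_agp("scaffold1", order):
        print(row)
```

The list order passed in is the order written to the file, so the reference-based ordering is carried by the AGP rather than by the order of records in the EMBL flat file.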
I got an answer: order matters.
So we need to update EMBLmyGFF3 to fix this problem.
@ireneortega I need some feedback from you to close this issue. I have tried on my side and EMBLmyGFF3 v2.1 seems to work as expected. Did you use an older version? Could you tell me your Python version, EMBLmyGFF3 version and Biopython version? Otherwise, could you try EMBLmyGFF3 v2.1 to see whether you get the same problem?
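A minimal sketch for collecting the Python and Biopython versions asked for here. It assumes Biopython is importable as `Bio` (which exposes `Bio.__version__`) and is written to run on both Python 2.7 and Python 3. It does not look up the EMBLmyGFF3 version; report that from your installation separately.

```python
import sys


def report_versions():
    """Return the running Python version and the installed Biopython version."""
    info = {"python": sys.version.split()[0]}
    try:
        import Bio  # Biopython's top-level package
        info["biopython"] = Bio.__version__
    except ImportError:
        info["biopython"] = "not installed"
    return info


print(report_versions())
```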
@Juke34 My contig order is not kept in the EMBL file the way it is in the FASTA and GFF files, so the contigs come out ordered as I described at the beginning. After assembly, the contigs are named contig1, contig2, etc., and then they are reordered, for example contig14, contig1, contig3, etc. I want the EMBL file to show the contigs in that same order. What I got is: contig10, contig11...contig19, contig1, contig20... I am using EMBLmyGFF3 v2.1 with Python 2.7.18 and Biopython 1.76.
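To check whether an output keeps the FASTA order, a small sketch like the following can help. It reads sequence IDs from FASTA text in file order and reports the first position where another ordering differs. Taking the ID as the first whitespace-delimited token after `>` is an assumption about how the headers are written; the `observed` list (for example the record IDs in the EMBL output) has to be collected separately, and the function names are hypothetical.

```python
def fasta_ids(fasta_text):
    """Return sequence IDs in the order they appear in FASTA text.

    Assumes the ID is the first whitespace-delimited token after '>'.
    Empty header lines are skipped.
    """
    return [line[1:].split()[0]
            for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].strip()]


def first_mismatch(expected, observed):
    """Return (index, expected_id, observed_id) at the first difference, or None."""
    for i, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            return i, e, o
    n = min(len(expected), len(observed))
    if len(expected) != len(observed):
        # One list is longer; report the first extra item (None for the shorter list).
        return (n,
                expected[n] if n < len(expected) else None,
                observed[n] if n < len(observed) else None)
    return None


if __name__ == "__main__":
    fasta = ">contig14 len=5000\nACGT\n>contig1\nACGT\n>contig3\nACGT\n"
    ids = fasta_ids(fasta)
    print(ids)
    print(first_mismatch(ids, ["contig1", "contig14", "contig3"]))
```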
OK, then it should be fixed if you install Python >= 3.6.
I tried with this order:
The order from the fasta is kept by
I tried moving the GFF features around, and the final order is still respected. So updating Python should fix the problem.
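One plausible reason the Python version matters is dictionary ordering: a plain `dict` keeps insertion order only from Python 3.7 (3.6 as a CPython implementation detail), while on Python 2.7 its iteration order is arbitrary, which could scramble contigs into an order like contig10, contig11, contig1. This is an assumption about the cause; how EMBLmyGFF3 actually stores records is not shown in this thread. The sketch below, with a hypothetical `index_records` helper, uses `collections.OrderedDict`, which preserves insertion order on every Python version.

```python
from collections import OrderedDict


def index_records(records):
    """Hypothetical helper: map sequence ID -> record, remembering input order.

    OrderedDict keeps insertion order on all Python versions, including 2.7.
    A plain dict only guarantees this from Python 3.7 onwards.
    """
    store = OrderedDict()
    for seq_id, record in records:
        store[seq_id] = record
    return store


if __name__ == "__main__":
    fasta_order = ["contig14", "contig1", "contig3", "contig10", "contig2"]
    records = [(seq_id, "placeholder record") for seq_id in fasta_order]
    # Iterating the index yields the IDs in FASTA order.
    print(list(index_records(records)))
```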
I've just installed EMBLmyGFF3 through conda with Python 3.6, and the same problem appeared (contig1, contig10, contig11...).
Try with branch 2.2. Then it should work properly.
Please feel free to reopen the issue if you still encounter the problem in v2.2 of EMBLmyGFF3.
My genome has 30 contigs named consecutively: contig1, contig2, contig3, ..., contig29 and contig30. When I create the EMBL file, the contigs are ordered this way: contig1, contig10, contig11...contig19, contig2, contig20, ... I don't like that order because my contigs are ordered against a reference genome, so the genome ends up out of order in the EMBL file I have to submit. I want the EMBL file to keep the contigs in this order: contig1, contig2, contig3, ..., contig29 and contig30. What can I do to keep the order of my contigs?
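The order contig1, contig10, contig11...contig19, contig2 is exactly what plain string (lexicographic) sorting produces, because "contig10" is compared character by character with "contig2". The sketch below contrasts that with a "natural" sort key that compares the numeric parts as integers. It only illustrates the two orderings; it is not what EMBLmyGFF3 does, and the `natural_key` name is hypothetical. If the FASTA file already lists contig1 to contig30 in order, keeping the FASTA order gives the same result without any sorting.

```python
import re


def natural_key(seq_id):
    """Split an ID like 'contig10' into ['contig', 10, ''] so digit runs compare as numbers.

    re.split with a capturing group alternates text and digit runs, so odd
    positions are always digit runs and even positions are always text.
    """
    parts = re.split(r"(\d+)", seq_id)
    return [int(tok) if i % 2 else tok for i, tok in enumerate(parts)]


if __name__ == "__main__":
    ids = ["contig1", "contig2", "contig10", "contig19", "contig20", "contig3"]
    print(sorted(ids))                   # lexicographic: contig1, contig10, contig19, contig2, ...
    print(sorted(ids, key=natural_key))  # natural: contig1, contig2, contig3, contig10, ...
```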