How to add reference sequences to reduce missing genes #79
Dear zskysafe, Sorry for my late reply; there was some problem with my network. I saw that you raised a new issue, #78, and I think you were adding new aa sequences to I will run a test with your https://github.com/linzhi2013/MitoZ/files/5188671/Arthropoda_CDS_protein.fa.txt and check the code, then get back to you asap. Cheers |
Hi zskysafe, That was due to an inconsistency in the format of NCBI accession numbers. The mitogenomes in their RefSeq database have accession numbers like For the consistency, the accession numbers of non-RefSeq mitogenome in the
That's to say, your Best |
I can't wait to test whether it works, but an error stopped me. |
It just means that "Nematoda" is not in the NCBI taxonomy database. What is your full species name? When I searched the NCBI taxonomy online database, I could only find https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=6231&lvl=3&lin=f&keep=1&srchmode=1&unlock, which is a phylum, not a rank belonging to Arthropoda. I'm not sure what you have done to the source code. With your present command, it should only search 'Arthropoda'. I have no idea where 'Nematoda' came from. |
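For readers who hit the same message, a minimal sketch of how one might check whether a name resolves in the local NCBI taxonomy database. It assumes ete3 is installed and relies on its `NCBITaxa.get_name_translator` method; the helper names `find_unknown_names` and `ete3_translate` are made up for illustration and are not part of MitoZ.

```python
def find_unknown_names(names, translate):
    """Return the names for which `translate` reports no taxid.

    `translate` is any callable that maps a list of names to a dict of
    {name: [taxid, ...]}, leaving out names it cannot resolve.
    """
    found = translate(names)
    return [name for name in names if not found.get(name)]


def ete3_translate(names):
    """Resolve names through ete3's local NCBI taxonomy database.

    ete3 is imported lazily so that the helper above can be used without it.
    Note that creating NCBITaxa() for the first time builds the local
    database, which needs network access and disk space.
    """
    from ete3 import NCBITaxa
    return NCBITaxa().get_name_translator(names)
```

Calling `find_unknown_names(["Arthropoda", "Nematoda"], ete3_translate)` would list whichever of the two names the local database cannot resolve. If well-known names such as these come back as unknown, the local database itself is the more likely culprit than the spelling.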
What I said was a bit confusing. This error occurs whether I type Arthropoda or not. |
I see... It seems that your NCBI taxonomy database is broken. Maybe you don't have enough space in HOME for it? If that's the case, please check https://github.com/linzhi2013/taxonomy_ranks/blob/master/README.md. The NCBI taxonomy database is regularly updated, so its size keeps increasing; it may now need more than 600M. |
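A quick way to test the disk-space hypothesis is sketched below. It assumes the database lives at `~/.etetoolkit/taxa.sqlite`, which is where ete3 places it by default as far as I know; pass another path if your setup differs. The function name `taxonomy_db_report` is hypothetical, and the 600 MB threshold is only the rough figure mentioned above.

```python
import shutil
from pathlib import Path


def taxonomy_db_report(db_path=None, needed_mb=600):
    """Report whether the taxonomy database exists and how much space is free.

    db_path   -- location of the database file (assumed ete3 default if None)
    needed_mb -- rough amount of free space to require, in megabytes
    """
    db = Path(db_path) if db_path else Path.home() / ".etetoolkit" / "taxa.sqlite"

    # Measure free space on the nearest existing ancestor of the database path.
    probe = db
    while not probe.exists():
        probe = probe.parent
    free_mb = shutil.disk_usage(probe).free // (1024 * 1024)

    size_mb = db.stat().st_size // (1024 * 1024) if db.is_file() else None
    return {
        "db_path": str(db),
        "db_exists": db.is_file(),
        "db_size_mb": size_mb,
        "free_mb": free_mb,
        "enough_space": free_mb >= needed_mb,
    }
```

Printing `taxonomy_db_report()` shows at a glance whether the file is missing, suspiciously small, or sitting on a nearly full file system.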
Hello zskysafe, This problem is caused by a recent change in NCBI's taxonomy database, which broke an assertion in ete3 and makes it fail to parse the data. For a more official report, please refer to etetoolkit/ete#469. The fix is announced for release when ete4 is published, in mid-late 2020. If you need to fix this problem right now, maybe Prunoideae/MitoFlex#2 can also help you. |
Thanks @Prunoideae for pointing out the problem. This reminds me that I have already downloaded an older NCBI taxonomy database, and it works for another user. You can follow the instructions here (#72 (comment)) to re-prepare your NCBI taxonomy database. Cheers |
@linzhi2013 @Prunoideae |
Part of the operation can not be carried out normally.
running genewise
convert result to gff3 format
cat: './work71.hmmtblout.besthit.sim.fa.genewise//.genewise': No such file or directory
Sorry, the annotation finished with no result!
...
Error occured when running command:
/usr/lib/anaconda3/envs/mitozEnv/bin/python3 /apps/Mitoz/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/findmitoscaf/filter_taxonomy_by_CDS_annotation.py -fa work71.hmmtblout.besthit.sim.fa -MTsoft /apps/Mitoz/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/annotate/MT_annotation_BGI_V1.32/MT_annotation_BGI.pl -db /apps/Mitoz/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/profiles/MT_database/Animal_CDS_protein.fa -thread 8 -genetic_code 5 -requiring_taxa 'Arthropoda' -relax 0 -WISECONFIGDIR /apps/Mitoz/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/annotate/wisecfg -outf work71.hmmtblout.besthit.sim.filtered.fa |
Hi zskysafe, Please send me the |
I have not changed Animal_CDS_protein.fa. I did quality trimming on the clean data, so my read lengths differ. This should not affect the assembly, right? |
I'd like to know what is the content of the Then read length has not much effect in your case. |
Please raise a new issue. I'm closing this issue since the subject on "How to add reference sequences to reduce missing genes" has been resolved. |
Following your suggestions in #61 (comment), I manually expanded release_MitoZ_v2.4-alpha/bin/profiles/MT_database/Arthropoda_CDS_protein.fa.
As shown in the figure, the new entries follow the existing format, except for the ID.
The following error occurred:
“
can not find taxid for
can not find taxid for ['dispersus'], maybe it's a misspelling.
KeyError: 'Aleurodicus'
Error occured when running command:
/usr/lib/anaconda3/envs/mitozEnv/bin/python3 /apps/MitoZ/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/annotate/cds_ft_v2.py XJ3-1_L2_142142.cds.position.sorted.revised.filtered 5 XJ3-1_L2_142142_mitoscaf.fa.cds.ft
”
Where else do I need to make changes?
I would be very grateful if you could give me more tips to help me complete the annotation of some species.
I added a .txt suffix to Arthropoda_CDS_protein.fa so that it could be uploaded.
log.txt
Arthropoda_CDS_protein.fa.txt
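
Before rerunning the annotation, it can help to check that manually added records look like the existing ones. The sketch below is only a crude heuristic and not part of MitoZ: it flags FASTA headers whose number of whitespace-separated fields differs from the most common count in the file. It makes no assumption about what each field means, and the function names are hypothetical.

```python
from collections import Counter


def read_fasta_headers(path):
    """Return every header line of a FASTA file, without the leading '>'."""
    with open(path) as handle:
        return [line[1:].rstrip("\n") for line in handle if line.startswith(">")]


def odd_headers(headers):
    """Return headers whose field count differs from the most common one.

    Fields are split on whitespace, so this only catches structural
    differences such as a missing or extra token.
    """
    counts = Counter(len(header.split()) for header in headers)
    if not counts:
        return []
    usual = counts.most_common(1)[0][0]
    return [header for header in headers if len(header.split()) != usual]
```

For example, `odd_headers(read_fasta_headers("Arthropoda_CDS_protein.fa"))` lists the records worth inspecting by hand. A clean result does not prove the headers are valid, since an accession number in the wrong style has the same field count as a correct one.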