How to add reference sequences to reduce missing genes #79

Tianyou96 · 2020-09-08T13:45:31Z

Following your suggestions in #61 (comment), I manually expanded release_MitoZ_v2.4-alpha/bin/profiles/MT_database/Arthropoda_CDS_protein.fa.

As shown in the figure, the new entries are written in the existing header format, except for the ID.

The following error occurred:
“
can not find taxid for
can not find taxid for ['dispersus'], maybe it's a misspelling.
KeyError: 'Aleurodicus'
Error occured when running command:
/usr/lib/anaconda3/envs/mitozEnv/bin/python3 /apps/MitoZ/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/annotate/cds_ft_v2.py XJ3-1_L2_142142.cds.position.sorted.revised.filtered 5 XJ3-1_L2_142142_mitoscaf.fa.cds.ft
”

Where else do I need to make changes?
I would be very grateful for any further tips that would help me complete the annotation of some species.
I added a .txt suffix to Arthropoda_CDS_protein.fa so that I could upload it.
log.txt
Arthropoda_CDS_protein.fa.txt

linzhi2013 · 2020-09-08T15:54:04Z

Dear zskysafe,

Sorry for my late reply; there was a problem with my network.

I saw that you raised a new issue, #78. I think you were adding new amino acid sequences to release_MitoZ_v2.4-alpha/bin/profiles/MT_database/Arthropoda_CDS_protein.fa, and this should work.

I will run a test with your https://github.com/linzhi2013/MitoZ/files/5188671/Arthropoda_CDS_protein.fa.txt and check the code, then get back to you as soon as possible.

Cheers

linzhi2013 · 2020-09-09T08:05:14Z

Hi zskysafe,

That was due to an inconsistency in the format of NCBI accession numbers. The mitogenomes in the RefSeq database have accession numbers like NC_001620, while non-RefSeq mitogenomes have accession numbers like KT225300, and we use the _ character to split the header string.

For consistency, the accession numbers of non-RefSeq mitogenomes in the release_MitoZ_v2.4-alpha/bin/profiles/MT_database/Arthropoda_CDS_protein.fa file MUST also start with >gi_NC_, so that the headers look like this:

>gi_NC_KP861632_ND6_Chrysomya_pacifica_174_aa
>gi_NC_KX090381_ND6_Microthoracius_praelongiceps_157_aa
>gi_NC_EU583500_ND6_Euphausia_superba_173_aa

That is to say, your >gi_KR_063274_ATP6_Aleurodicus_dispersus_216_aa should be reformatted as >gi_NC_KR063274_ATP6_Aleurodicus_dispersus_216_aa.
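For a database with many added entries, the renaming can be scripted. The following is only a minimal sketch of the header rule described above: the regular expression, the function name `normalize_header`, and the assumption that non-RefSeq accessions consist of one or two capital letters followed by digits are illustrative and are not taken from the MitoZ code.

```python
import re

# Hypothetical pattern: ">gi_", an optional "NC_", a 1-2 letter accession
# prefix, an optional stray "_", the accession digits, then the rest of the
# header (gene, species, length). Other accession shapes are not handled.
HEADER_RE = re.compile(r"^>gi_(?:NC_)?([A-Z]{1,2})_?(\d+)_(.+)$")


def normalize_header(line: str) -> str:
    """Rewrite a FASTA header to the '>gi_NC_<accession>_...' layout."""
    line = line.rstrip("\n")
    match = HEADER_RE.match(line)
    if match is None:
        return line  # sequence line or unrecognized header: leave untouched
    prefix, digits, rest = match.groups()
    if prefix == "NC":
        # RefSeq accession such as NC_001620 is already in the right form.
        return f">gi_NC_{digits}_{rest}"
    # Non-RefSeq accession such as KR063274: join it and put NC_ in front.
    return f">gi_NC_{prefix}{digits}_{rest}"


def normalize_fasta(lines):
    """Yield every line of a FASTA file with its header normalized."""
    for line in lines:
        yield normalize_header(line)
```

Calling `normalize_header(">gi_KR_063274_ATP6_Aleurodicus_dispersus_216_aa")` gives the reformatted header shown above, while headers that already start with `>gi_NC_` pass through unchanged.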

Best

Tianyou96 · 2020-09-09T12:50:32Z

I couldn't wait to test whether it works, but another error stopped me.
The error is shown below:
''
can not find taxid for Nematoda
can not find taxid for ['Nematoda'], maybe it's a misspelling.
Please use other taxanomy name.
''
Using Arthropoda reports the same error.
I tried unloading and reloading mitozEnv and mitoz, but the problem persists: the taxid lookup terminates the program at the very start of the run.
I haven't changed that file since I reloaded mitoz.

python3 $DIR_mitoz/MitoZ.py all2 --genetic_code 5 --clade Arthropoda --outprefix $name \
--thread_number 8 \
--fq_size 5 \
--fastq1 $fq1 \
--fastq2 $fq2 \
--fastq_read_length 150 \
--insert_size 250 \
--run_mode 2 \
--filter_taxa_method 1 \
--requiring_taxa 'Arthropoda' >> mitoz.log 2>&1

linzhi2013 · 2020-09-09T13:24:06Z

It just means that "Nematoda" is not in the NCBI taxonomy database.

What is your full species name? When I searched the online NCBI taxonomy database, I could only find https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=6231&lvl=3&lin=f&keep=1&srchmode=1&unlock, which is a phylum and not a rank belonging to Arthropoda.

I'm not sure what you have done to the source code. With your present command, it should only search for 'Arthropoda'. I have no idea where 'Nematoda' came from.

Tianyou96 · 2020-09-09T14:00:17Z

What I said was a bit confusing.
Actually, I tried both Nematoda and Arthropoda. Both give the same error at the beginning of the run:
can not find taxid for Arthropoda
can not find taxid for ['Arthropoda'], maybe it's a misspelling.
Please use other taxanomy name.
This problem existed before I installed mitoz again.
After I reinstalled mitoz, I didn't make any changes to the source code.
The problem was still there, so I tried changing the input:
python3 $DIR_mitoz/MitoZ.py all2 --genetic_code 5 --outprefix $name \
--thread_number 8 \
--fq_size 5 \
--fastq1 $fq1 \
--fastq2 $fq2 \
--fastq_read_length 150 \
--insert_size 250 \
--run_mode 2 \
>> mitoz.log 2>&1

This error occurs whether I specify Arthropoda or not.
Then I tried running several copies of mitoz from different paths, none of which had been changed since they were unpacked.
The problem still occurs.
I am confused, because I didn't make any changes to mitozEnv.

I have also reloaded the NCBI taxonomy database.

linzhi2013 · 2020-09-09T14:21:14Z

I see...

It seems that your NCBI taxonomy database is broken. Maybe you don't have enough space in HOME for it? If that's the case, please check https://github.com/linzhi2013/taxonomy_ranks/blob/master/README.md.

The NCBI taxonomy database is updated regularly, so its size keeps increasing; it may now need more than 600 MB.
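A quick way to rule out the disk-space explanation is to check the free space on the filesystem that holds your home directory. This is a minimal sketch using only the Python standard library; the function name `has_enough_space` is made up for illustration, and the 600 MB threshold is simply the rough figure mentioned above, not a measured requirement.

```python
import shutil
from pathlib import Path

# Rough figure from this thread, treated as a lower bound (an assumption).
REQUIRED_BYTES = 600 * 1024 * 1024


def has_enough_space(path, required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    home = Path.home()
    free_mb = shutil.disk_usage(home).free // (1024 * 1024)
    print(f"free space under {home}: {free_mb} MB")
    print("enough for the taxonomy database:", has_enough_space(home))
```

If this reports plenty of free space, the database is probably failing for another reason.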

Prunoideae · 2020-09-09T15:58:34Z

Hello zskysafe,

This problem is caused by a recent change in NCBI's taxonomy database, which breaks an assertion in ete3 so that it fails to parse the data.

For a more official report, please refer to etetoolkit/ete#469.

The fix is announced for release together with ete4, in mid to late 2020. If you need to fix this problem right now, Prunoideae/MitoFlex#2 may also help you.

linzhi2013 · 2020-09-09T20:40:16Z

Thanks @Prunoideae for pointing out the problem.

This reminds me that I have already downloaded an older NCBI taxonomy database, and it worked for another user. You can follow the instructions here (#72 (comment)) to re-prepare your NCBI taxonomy database.

Cheers

Tianyou96 · 2020-09-10T01:07:36Z

@linzhi2013 @Prunoideae
Thank you very much. With your help, the problem has been solved. I downloaded an older NCBI taxonomy database provided by linzhi2013.

Tianyou96 · 2020-09-10T09:05:01Z

Part of the run still fails:
``
run the genewise shell file

running genewise

convert result to gff3 format

cat: './work71.hmmtblout.besthit.sim.fa.genewise//.genewise': No such file or directory

Sorry, the annotation finished with no result!

...

Error occured when running command:

/usr/lib/anaconda3/envs/mitozEnv/bin/python3 /apps/Mitoz/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/findmitoscaf/filter_taxonomy_by_CDS_annotation.py -fa work71.hmmtblout.besthit.sim.fa -MTsoft /apps/Mitoz/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/annotate/MT_annotation_BGI_V1.32/MT_annotation_BGI.pl -db /apps/Mitoz/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/profiles/MT_database/Animal_CDS_protein.fa -thread 8 -genetic_code 5 -requiring_taxa 'Arthropoda' -relax 0 -WISECONFIGDIR /apps/Mitoz/version_2.4-alpha/release_MitoZ_v2.4-alpha/bin/annotate/wisecfg -outf work71.hmmtblout.besthit.sim.filtered.fa
``

linzhi2013 · 2020-09-10T10:10:33Z

Hi zskysafe,

Please send me the release_MitoZ_v2.4-alpha/bin/profiles/MT_database/Animal_CDS_protein.fa file and your mitogenome sequences. Please send them to linzhi2012<mitoz>@<mitoz>gmail<mitoz>com

Tianyou96 · 2020-09-10T11:42:04Z

I have not changed Animal_CDS_protein.fa.
My mitogenome sequences? I chose the all2 mode, so I don't have an assembly result file yet.

Animal_CDS_protein.zip

I quality-trimmed the clean data, so my reads have different lengths. This should not affect the assembly, right?

python3 $DIR_mitoz/MitoZ.py all2 --genetic_code 5 --clade Arthropoda --outprefix $name \
--thread_number 8 \
--fq_size 5 \
--fastq1 $fq1 \
--fastq2 $fq2 \
--fastq_read_length 150 \
--insert_size 250 \
--run_mode 2 \
--filter_taxa_method 1 \
--requiring_taxa 'Arthropoda' >> mitoz.log 2>&1

linzhi2013 · 2020-09-10T14:57:30Z

I'd like to know what the work71.hmmtblout.besthit.sim.fa file contains. If it is empty, then no mitochondrial sequence was found in work71.ScafSeq. How many Gbp of data did you use for the mitogenome assembly?
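One way to answer the first question is to count the FASTA records in that file. The helper below is a small, hypothetical sketch (the names `count_fasta_records` and `count_in_file` are not part of MitoZ); it counts header lines starting with ">".

```python
def count_fasta_records(lines):
    """Count FASTA header lines ('>') in an iterable of text lines."""
    return sum(1 for line in lines if line.startswith(">"))


def count_in_file(path):
    """Open a FASTA file and count its records; 0 means no sequences."""
    with open(path, encoding="utf-8") as handle:
        return count_fasta_records(handle)
```

For example, `count_in_file("work71.hmmtblout.besthit.sim.fa")` returning 0 would mean that no candidate mitochondrial sequence reached the annotation step.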

The read length should not have much effect in your case.

linzhi2013 · 2020-09-10T14:59:50Z

Please raise a new issue.

I'm closing this issue because its original subject, "How to add reference sequences to reduce missing genes", has been resolved.

linzhi2013 mentioned this issue Sep 9, 2020

tRNA get well annotated but not CDS's #61

Closed

linzhi2013 closed this as completed Sep 10, 2020

linzhi2013 mentioned this issue Sep 11, 2020

can not find taxid for ['Chordata'], maybe it's a misspelling. #81

Closed

linzhi2013 mentioned this issue Jan 8, 2021

Problem using the Singularity image to run MitoZ v2.3 #99

Closed

linzhi2013 mentioned this issue Aug 9, 2021

can not find taxid for ['Arthropoda'], maybe it's a misspelling. #100

Closed

linzhi2013 mentioned this issue Mar 26, 2022

assembly is linear and atp 6 is missing #142

Closed
