Run with third-party taxonomy db #18
I'm interested in using MAPseq with SILVA 138 pre-clustered at 99% identity (SILVA_138_SSURef_NR99_tax_silva.fasta.gz from here). Here's what the SILVA files look like:
gzip -dc SILVA_138_SSURef_NR99_tax_silva.fasta.gz | head -n 2
>AY846380.1.2583 Eukaryota;Archaeplastida;Chloroplastida;Chlorophyta;Chlorophyceae;Monoraphidium minutum
AACCUG...
gzip -dc tax_ncbi_ssu_ref_nr99_138.txt.gz | head -n 4
root; 1 no rank
root;Viruses; 10239 superkingdom
root;Viruses;Caudovirales; 28883 order
root;Viruses;Caudovirales;Ackermannviridae; 2169529 family
gzip -dc taxmap_ncbi_ssu_ref_nr99_138.txt.gz | head -n 3
primaryAccession start stop Unclassified; submitted_name
BD359736 3 2150 root;cellular organisms;Eukaryota;Alveolata;Apicomplexa;Aconoidasida;Haemosporida;Plasmodiidae;Plasmodium <genus>;Plasmodium (Plasmodium);Plasmodium malariae; Plasmodium malariae
AB000278 1 1410 root;cellular organisms;Bacteria <prokaryotes>;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Photobacterium;Photobacterium iliopiscarium; Photobacterium iliopiscarium
There's a QIIME 2 plugin to normalize taxonomy levels, which could be helpful here.
@colinbrislawn @alexaibio
I have not figured out how to use custom databases, but I also have not worked on this since posting. I would be interested in updates, though.
Hi! To use a custom database, you need a file with the FASTA sequences (already provided with SILVA) and a taxonomy file with two tab-separated columns: one with the IDs of the FASTA sequences and one with the taxonomic label for each sequence. The taxonomic annotations should be normalized (equal number of ranks). That will get you a result. The problem is that there are still a lot of misannotations in SILVA sequences that will throw off mapseq, so for optimal results one would need to clean up the SILVA sequences and annotations a bit. Some collaborators have recently made such a set for SILVA, which we were planning to include in the next release. I can ask them for the dataset if you are interested, and try to push it out faster.
You can find examples of the taxonomy files (NCBI and our OTUs) included with mapseq: the NCBI taxonomy is mapref-2.2b.fna.ncbitax and the OTU "taxonomy" is mapref-2.2b.fna.otutax. You will want to copy the parameters in the NCBI taxonomy file in the line: these are needed to exclude hits based on identity cutoffs, and should also work for the SILVA set if you use 7 taxonomic levels.
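As a rough illustration of the two-column taxonomy file described above, here is a minimal Python sketch that reads the SILVA FASTA headers and writes one tab-separated line per sequence, with each lineage forced to 7 ranks. Several things here are assumptions and not confirmed mapseq behaviour: the function names, the "Unclassified" filler label, the use of ";" as the rank separator in the output, and the crude pad-or-truncate rule (truncation drops the deepest ranks of long eukaryotic lineages, so a proper rank mapping would be better). It also does not write the parameters line mentioned above; that would still need to be copied by hand from the NCBI taxonomy file shipped with mapseq.

```python
import gzip

N_RANKS = 7                   # assumption: seven levels, as suggested above
PLACEHOLDER = "Unclassified"  # hypothetical filler label for missing ranks


def normalize_lineage(lineage, n_ranks=N_RANKS, placeholder=PLACEHOLDER):
    """Pad or truncate a ';'-separated lineage to exactly n_ranks labels."""
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    ranks = ranks[:n_ranks]                          # drop ranks beyond n_ranks
    ranks += [placeholder] * (n_ranks - len(ranks))  # fill missing ranks
    return ";".join(ranks)


def fasta_headers_to_taxonomy(lines):
    """Yield (sequence_id, normalized_lineage) for each FASTA header line."""
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        # SILVA headers look like: ">ID<space>rank1;rank2;...;species"
        seq_id, _, lineage = line[1:].rstrip("\n").partition(" ")
        yield seq_id, normalize_lineage(lineage)


def write_taxonomy(fasta_gz_path, out_path):
    """Write a two-column, tab-separated taxonomy file from a gzipped FASTA."""
    with gzip.open(fasta_gz_path, "rt") as fasta, open(out_path, "w") as out:
        for seq_id, lineage in fasta_headers_to_taxonomy(fasta):
            out.write(f"{seq_id}\t{lineage}\n")
```

For example, write_taxonomy("SILVA_138_SSURef_NR99_tax_silva.fasta.gz", "silva138.tax") would produce the taxonomy file; the output name is arbitrary, and the result should be compared against mapref-2.2b.fna.ncbitax before use.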
@jfmrod Thanks a lot for your response. I have a follow-up question: if we use this, how do I use the output with Krona? I saw this discussed in some previous issue threads. Was the Krona output flag added? I don't see it in the help message. We do have the -otucounts and -otutables options, but when I import the generated -otutable in Krona, it says "|Unclassified| has no OTU code". I would be really grateful if you could help me with this. The question is whether it is going wrong in mapseq or in Krona. Thanks again!
Hi Joao,
Is it possible to run MAPseq with the Greengenes, RDP, or SILVA taxonomy databases?
It seems like they have a different format.
If so, could you please update the README file as well?
Best
Alex