Consider performing taxonomy search using vsearch #10

Closed
colinbrislawn opened this issue Apr 13, 2018 · 1 comment
Comments

@colinbrislawn
Collaborator

Taxonomy assignment is currently one of the longest steps in the pipeline. Running blastn on a single node takes about 6 hours to search 23k OTU centroids against the SILVA 128 database; vsearch does the same search in about 1.5 minutes.

This vsearch command is equivalent to the current blastn command, with a few exceptions:

vsearch --usearch_global "$input/OTUs.fna" \
  --db /pic/projects/mint/hundo/ref/silvamod128.fasta \
  --maxaccepts 25 --maxrejects 100 --threads 24 \
  --strand plus --id 0.60 \
  --blast6out "$output/vsearch-hits.txt"
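If the pipeline launches this search from Python, building the command as an argument list avoids shell quoting problems with paths. The sketch below uses only the vsearch flags from the command above; the function name `build_vsearch_cmd` and its defaults are hypothetical, not part of hundo.

```python
import shlex

def build_vsearch_cmd(query_fasta, db_fasta, hits_out, threads=24,
                      min_id=0.60, maxaccepts=25, maxrejects=100):
    """Return an argv list mirroring the vsearch search above.

    Hypothetical helper: only flags from the command above are used,
    and the defaults copy its values.
    """
    return [
        "vsearch",
        "--usearch_global", query_fasta,
        "--db", db_fasta,
        "--maxaccepts", str(maxaccepts),
        "--maxrejects", str(maxrejects),
        "--threads", str(threads),
        "--strand", "plus",
        "--id", f"{min_id:.2f}",   # 0.60 -> "0.60"
        "--blast6out", hits_out,
    ]

if __name__ == "__main__":
    cmd = build_vsearch_cmd("OTUs.fna", "silvamod128.fasta", "vsearch-hits.txt")
    # Print a shell-safe rendering of the command for logging.
    print(shlex.join(cmd))
    # To actually run it (requires vsearch on PATH):
    # subprocess.run(cmd, check=True)
```

Passing the list straight to `subprocess.run(cmd, check=True)` raises on a non-zero exit, so a failed search stops the pipeline instead of silently producing an empty hits file.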

One difference is that the vsearch -usearch_global alignment does not report the same information as blastn, so some output columns carry less information. For example, in the comparison below vsearch writes -1 for the e-value and 0 for the bit score.

program  qseqid    sseqid    pident  length  mismatch  gapopen  qstart  qend  sstart  send  evalue     bitscore
blast    OTU_9997  KF712870  92.254  284     18        3        8       289   1       282   8.10E-110  399
vsearch  OTU_9997  KF712870  91.8    282     23        0        1       289   1       1518  -1         0

The hits are not identical, and vsearch consistently reports percent identities about 1.5% lower than blastn.
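To measure that gap across a whole run, both hit files can be parsed as the 12-column BLAST tabular format and joined on (query, subject). This is a minimal sketch; the names `Hit`, `parse_blast6`, and `identity_deltas` are hypothetical, and `has_scores` simply detects the placeholder e-value/bit score pair seen in the vsearch row above.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    # Columns in BLAST -outfmt 6 order.
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float

    @property
    def has_scores(self) -> bool:
        # vsearch writes placeholders here (evalue -1, bitscore 0 in the
        # example above), so these columns cannot rank its hits.
        return not (self.evalue == -1 and self.bitscore == 0)

def parse_blast6(line: str) -> Hit:
    cols = line.split()  # handles tabs or the spaces used in the table above
    if len(cols) != 12:
        raise ValueError(f"expected 12 columns, got {len(cols)}: {line!r}")
    ints = [int(c) for c in cols[3:10]]  # length .. send
    return Hit(cols[0], cols[1], float(cols[2]), *ints,
               float(cols[10]), float(cols[11]))

def identity_deltas(blast_lines, vsearch_lines):
    """pident(blast) - pident(vsearch) for each (query, subject) pair in both.

    If a pair appears more than once in the blast file (several HSPs),
    the last line wins.
    """
    blast = {(h.qseqid, h.sseqid): h.pident for h in map(parse_blast6, blast_lines)}
    deltas = {}
    for h in map(parse_blast6, vsearch_lines):
        key = (h.qseqid, h.sseqid)
        if key in blast:
            deltas[key] = blast[key] - h.pident
    return deltas
```

Averaging `identity_deltas(...).values()` over all shared pairs gives a single number to compare against the roughly 1.5% offset.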


More importantly, hundo's LCA script can't parse the vsearch hits:

hundo lca --min-score -1 --top-fraction .95 OTUs.fna vsearch-hits.txt \
/pic/projects/mint/hundo/ref/silvamod128.map \
/pic/projects/mint/hundo/ref/silvamod128.tre \
$output/OTU_tax.fasta $output/OTU_tax_assignments.tsv

[2018-04-13 12:46 INFO] Parsing BLAST hits
Traceback (most recent call last):
  File "/people/bris469/.conda/envs/hundo/bin/hundo", line 11, in <module>
    sys.exit(cli())
  File "/people/bris469/.conda/envs/hundo/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/people/bris469/.conda/envs/hundo/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/people/bris469/.conda/envs/hundo/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/people/bris469/.conda/envs/hundo/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/people/bris469/.conda/envs/hundo/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/people/bris469/.conda/envs/hundo/lib/python3.6/site-packages/hundo/hundo.py", line 111, in run_lca
    lca_node = tree.get_common_ancestor(hits.names)
  File "/people/bris469/.conda/envs/hundo/lib/python3.6/site-packages/hundo/crest_classifier.py", line 163, in get_common_ancestor
    lca_path = paths[0]
IndexError: list index out of range
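The traceback shows that `paths` was an empty list when `get_common_ancestor` indexed `paths[0]`, so no hit contributed a lineage path. The sketch below is not hundo's `crest_classifier` code; it is a hypothetical longest-common-prefix LCA (`common_ancestor`) over root-to-leaf lineage tuples that returns `None` for an empty input instead of raising, leaving the caller to report the OTU as unassigned. The taxon names in the test are illustrative only.

```python
from typing import Optional, Sequence, Tuple

def common_ancestor(paths: Sequence[Sequence[str]]) -> Optional[Tuple[str, ...]]:
    """Longest shared prefix of root-to-leaf lineage paths.

    Returns None when there are no paths (e.g. every hit was filtered out
    or is missing from the taxonomy) instead of raising IndexError.
    """
    if not paths:
        return None
    prefix = []
    # zip(*paths) walks rank by rank and stops at the shortest path.
    for ranks in zip(*paths):
        if all(r == ranks[0] for r in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)
```

Returning `None` rather than an empty tuple keeps "no usable hits" distinct from "hits that disagree even at the root".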

@brwnj
Contributor

brwnj commented Sep 5, 2018

This is now supported using hundo annotate --aligner vsearch .... The merge commit is 3db5bba.

brwnj closed this as completed on Sep 5, 2018.